ELRA
  • Catalogue of Language Resources

    The ELRA Catalogue of Language Resources offers a repository of Language Resources (LRs) made available through ELRA.


    A growing number of LRs across the various fields of Human Language Technology (HLT) are distributed on behalf of ELRA by its operational body, ELDA, thanks to contributions from many players in the HLT community.

    Through this repository, we aim to spare researchers and developers the effort of rebuilding resources that already exist, and to help them identify and access those resources.

    Other resources identified, but not available through ELRA, can be viewed in the Universal Catalogue.

    If you have any suggestions or comments, or need further details about ELRA and its Catalogue of Language Resources, please refer to the Contact Us section.

    ELRA is a partner of OLAC (Open Language Archives Community). The catalogue can be viewed as an OLAC repository.

    New Resources
  • S0308 : Egyptian Arabic Speecon database
    The Egyptian Arabic Speecon database
    comprises recordings of 550 adult
    Egyptian speakers and 50 child Egyptian
    speakers, who uttered over 290 items
    and 210 items respectively (read and
    spontaneous).

  • T0374 : Terminology database of natural sciences
    This dictionary covers the three
    kingdoms: Animal, Vegetable, Mineral. It
    contains 50,000 species with numerous
    synonyms in French, English and Latin
    and many breeds and varieties. Minerals
    are given with their chemical formula.
    About 7,900 definitions in French are
    included. It also includes synonyms and
    linguistic variants.

  • W0053 : Catalan-Spanish Parallel Corpus
    This corpus contains more than 100
    million words drawn from 10 years of
    bilingual articles from “El Periódico
    de Catalunya”. The data are aligned at
    sentence level and stored in plain-text
    files, one sentence per line, with no
    encoding whatsoever.

  • S0307 : BABEL Polish database
    The BABEL Polish Database is a speech
    database that was produced by a research
    consortium funded by the European Union
    under the COPERNICUS programme
    (COPERNICUS Project 1304). It consists
    of the basic "common" set, which
    contains the Many Talker Set (30 males,
    30 females), the Few Talker Set (5
    males, 5 females) and the Very Few
    Talker Set (1 male, 1 female).

  • S0305 : EPAC Corpus: orthographic transcriptions
    This corpus consists of approx. 100
    hours of manual orthographic
    transcriptions, produced from 1,677
    hours of untranscribed recordings from
    the ESTER Evaluation Campaign
    (Technolangue programme). It also
    includes automatic transcriptions of
    the full 1,677 hours.

  • (last update: July 2010)

    Copyright © 2008 ELRA