ELRA
  • Catalogue of Language Resources

    ELRA releases free Language Resources.


    The ELRA Catalogue of Language Resources offers a repository of Language Resources (LRs) made available through ELRA.



    An increasing number of LRs in the various fields of Human Language Technology (HLT; see image on the left-hand side) are distributed on behalf of ELRA via its operational body, ELDA, thanks to contributions from various players in the HLT community.

    Through this repository, we aim to spare researchers and developers the effort of rebuilding resources that already exist, and to help them identify and access those resources.

    Other resources identified, but not available through ELRA, can be viewed in the Universal Catalogue.

    If you have any suggestions or comments, or need further details about ELRA and its Catalogue of Language Resources, please refer to the Contact Us section.

    ELRA is a partner of OLAC (Open Language Archives Community). The catalogue can be viewed as an OLAC repository.

    New Resources
  • ELRA-E0046 : ETAPE Evaluation Package
    The ETAPE Evaluation Package consists of ca. 30 hours of radio and TV data, selected to include mostly unplanned speech and a reasonable proportion of multi-speaker data. All data were carefully transcribed and annotated with named entities. The package contains the material used for the ETAPE evaluation campaign: resources, scoring tools, campaign results, and other items used or produced during the campaign. Its aim is to enable external players to evaluate their own systems and compare their results with those obtained during the campaign itself.

  • ELRA-S0387 : SALA II US English database (2000 speakers)
    The SALA II US English database comprises ca. 2,000 US English speakers (equally balanced between males and females, and including some speakers with Hispanic accents) recorded over the United States mobile telephone network.

  • ELRA-W0113 : TRAD Chinese-English Email Parallel corpus – Development Set
    This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and a reference translation in English. The source texts are a selection of private emails collected from the daily life and business domains.

  • ELRA-W0114 : TRAD Chinese-French Email Parallel corpus – Development Set
    This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and a reference translation in French. The source texts are a selection of private emails collected from the daily life and business domains.

  • ELRA-W0115 : TRAD Chinese-English Email Parallel corpus – Test Set
    This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in English. The source texts are a selection of private emails collected from the daily life and business domains.

    (last update: March 2017)

    Copyright © 2008 ELRA