  • Catalogue of Language Resources

    ELRA releases free Language Resources. (last update: January 24, 2013)

    The ELRA Catalogue of Language Resources offers a repository of Language Resources (LRs) made available through ELRA.

    An increasing number of LRs in the various fields of Human Language Technology (see image on the left-hand side) are distributed on behalf of ELRA via its operational body ELDA, thanks to contributions from various players in the HLT community.

    Our aim is to provide Language Resources through this repository, so that researchers and developers do not have to spend effort rebuilding resources that already exist, and to help them identify and access those resources.

    Other resources identified, but not available through ELRA, can be viewed in the Universal Catalogue.

    If you have any suggestions or comments, or need further details about ELRA and its Catalogue of Language Resources, please refer to the Contact Us section.

    ELRA is a partner of OLAC (Open Language Archives Community). The catalogue can be viewed as an OLAC repository.

    New Resources
  • ELRA-S0372 : GlobalPhone Thai Pronunciation Dictionary
    The GlobalPhone pronunciation
    dictionaries contain the pronunciations
    of all word forms found in the
    transcription data of the GlobalPhone
    speech & text database. The Thai
    dictionary is divided into two sets: a
    small set with 12,420 pronunciation
    entries for 12,420 different words,
    without pronunciation variants, and a
    larger set with 25,570 pronunciation
    entries for 22,462 different word
    units, including 3,108 entries of up to
    four pronunciation variants.

  • ELRA-W0078 : NE3L named entities Arabic corpus
    The Arabic corpus contains 103,363 words
    from articles published in 2004 in the
    newspaper “Le Monde Diplomatique”. Two
    named entity categories were taken into
    account: Time and Amount.

  • ELRA-W0079 : NE3L named entities Chinese corpus
    The Chinese corpus contains 79,302 words
    from articles published in 2001 in the
    newspaper “Le Monde Diplomatique”. Three
    named entity categories were taken into
    account: Person, Place and Organisation.

  • ELRA-W0080 : NE3L named entities Russian corpus
    The Russian corpus contains 75,784 words
    from articles published in 1995 in the
    newspaper “Izvestia”. Two named entity
    categories were taken into account: Time
    and Amount.

  • ELRA-S0371 : PortMedia French and Italian corpus
    This corpus contains 700 transcribed
    dialogues from about 140 French speakers
    and 604 transcribed dialogues from about
    150 Italian speakers (several dialogues
    per speaker). The corpus was built with
    a ‘Wizard of Oz’ (WoZ) system, which
    consists of simulating a natural
    language man-machine dialogue. The
    scenario was built in the domain of
    tourist information and reservations.
    A manual transcription and a semantic
    annotation of the corpus are provided
    with the corresponding wave files.

  • (last update: November 2014)

    Copyright © 2008 ELRA