  • Catalogue of Language Resources

    ELRA releases free Language Resources.

    The ELRA Catalogue of Language Resources offers a repository of Language Resources (LRs) made available through ELRA.

    An increasing number of LRs in the various fields of Human Language Technology (see image on the left-hand side) are distributed on behalf of ELRA via its operational body, ELDA, thanks to contributions from various players in the HLT community.

    Our aim in providing this repository is to spare researchers and developers the effort of rebuilding resources that already exist, and to help them identify and access those resources.

    Other resources that have been identified but are not available through ELRA can be viewed in the Universal Catalogue.

    If you have any suggestions or comments, or need further details about ELRA and its Catalogue of Language Resources, please refer to the Contact Us section.

    ELRA is a partner of OLAC (Open Language Archives Community). The catalogue can be viewed as an OLAC repository.

    New Resources
  • ELRA-W0119 : Helsinki Corpus of Swahili
    This is a 25-million-word text corpus of
    Swahili, annotated for part-of-speech,
    morphology and syntax. The corpus
    contains prose text from domains such as
    fiction, news media and government
    documents, from the period between 1953
    and 2016.

  • ELRA-W0120 : NUM 5M Mongolian written corpus
    This is a corpus of Mongolian text drawn
    mostly from online or printed daily
    newspapers, literature, and laws. Part
    of this corpus, about 2,800 sentences
    totalling 100,000 words, has been
    manually POS-tagged and stored in XML
    TEI format.

  • ELRA-S0393 : Persian Speech Corpus
    This speech corpus was recorded through
    a "Blubbery" model microphone by one
    male speaker in Persian (Tehrani accent)
    in a professional studio. Speech
    synthesized from this corpus has
    produced a high-quality, natural voice.
    The corpus consists of 399 utterances for
    a total of about 2.5 hours, with
    orthographic and phonetic

  • ELRA-W0118 : English-Persian parallel corpus
    The English-Persian parallel corpus
    contains more than 200,000 aligned
    sentences across a variety of text types
    from the domains of art, law, culture,
    science, religion, literature, medicine,
    idioms, politics and others. It is an
    extension of the English-Persian
    parallel corpus already distributed by
    ELRA (Catalogue Reference: ELRA-W0051).
    This new version of the corpus is
    distributed with a concordance program.

  • ELRA-S0392 : Pashto phonetic lexicon
    This is a phonetic lexicon of 21,560
    words in Pashto transcribed manually by
    a native Pashto speaker (Yusufzai
    dialect) using the IPA Pashto phoneme

  • (last update: August 2017)

    Copyright © 2008 ELRA