  • Catalogue of Language Resources

    The ELRA Catalogue of Language Resources offers a repository of Language Resources (LRs) made available through ELRA.



    An increasing number of LRs in the various fields of Human Language Technology (see image on the left-hand side) are distributed on behalf of ELRA via its operational body ELDA, thanks to the contribution of various players of the HLT community.

    Our aim is to provide Language Resources through this repository so that researchers and developers need not spend effort rebuilding resources that already exist, and to help them identify and access those resources.

    Other resources that have been identified but are not available through ELRA can be viewed in the Universal Catalogue.

    If you have any suggestions or comments, or need further details about ELRA and its Catalogue of Language Resources, please refer to the Contact Us section.

    ELRA is a partner of OLAC (Open Language Archives Community). The catalogue can be viewed as an OLAC repository.

    New Resources
  • ELRA-S0338 : ESTER 2 Corpus
    The ESTER 2 Corpus, produced within the
    ESTER 2 evaluation campaign, consists of
    a manually transcribed radio broadcast
    news corpus amounting to about 100 hours
    and quick transcriptions of African
    radio broadcasts amounting to about 6
    hours. Named-entity annotation is
    provided within the development data
    (about 6 hours).

  • ELRA-S0342 : Acoustic database for Polish concatenative speech synthesis
    This database consists of 1443 nonsense
    words covering all the diphones of the
    Polish language. For each diphone, the
    database records its name, its context,
    its phonetic transcription in SAMPA,
    the identifier of the wave file in
    which it is located, and three numbers
    marking the beginning, the middle and
    the end of the diphone.

  • ELRA-E0035 : DEFT'08 Evaluation Package
    DEFT (DEfi Fouille de Texte – Text
    Mining Challenge) organizes evaluation
    campaigns in the field of text mining.
    The topic of the DEFT 2008 edition is
    the classification of texts by topic
    and genre. The DEFT'08 Evaluation
    Package makes it possible to compare
    two corpora of different genres (a
    corpus of newspaper articles extracted
    from Le Monde and a corpus of
    encyclopaedic articles extracted from
    the free online encyclopaedia
    Wikipedia) on the basis of the same set
    of pre-defined categories.

  • ELRA-E0040 : MEDAR Evaluation Package
    The MEDAR Evaluation Package was
    produced within the MEDAR project
    (MEDiterranean ARabic language and
    speech technology), supported by the
    European Commission's ICT programme. It
    aims to enable the evaluation of SLT/MT
    (Machine Translation) systems on
    translation tasks in the
    English-to-Arabic direction.

  • ELRA-L0088 : Arabic Morphological Dictionary
    The Arabic Morphological Dictionary
    contains 7,912,551 entries, including
    6,247,291 nouns, 1,537,499 verbs,
    127,563 adjectives and 198 grammatical
    words. All files are provided as plain
    text in UTF-8 character encoding,
    amounting to about 154 MB of data.

  • (last update: May 2012)

    Copyright © 2008 ELRA