ELRA
  • Catalogue of Language Resources

    ELRA releases free Language Resources. (last update: January 24, 2013)


    The ELRA Catalogue of Language Resources offers a repository of Language Resources (LRs) made available through ELRA.


    An increasing number of LRs in the various fields of Human Language Technology (see image on the left-hand side) are distributed on behalf of ELRA via its operational body, ELDA, thanks to contributions from various players in the HLT community.

    Through this repository, we aim to spare researchers and developers the effort of rebuilding resources that already exist, and to help them identify and access those resources.

    Other resources identified, but not available through ELRA, can be viewed in the Universal Catalogue.

    If you have any suggestions or comments, or need further details about ELRA and its Catalogue of Language Resources, please refer to the Contact Us section.

    ELRA is a partner of OLAC (Open Language Archives Community). The catalogue can be viewed as an OLAC repository.

    New Resources
  • ELRA-L0094 : CEPLEXicon
    CEPLEXicon results from the automatic
    tagging of two corpora, using a tagger
    and the POS tag set. The automatic
    tagging was followed by a partial manual
    revision. This lexicon covers all the
    speech produced by seven monolingual
    Portuguese children aged 1;02.00 to
    3;11.12, for a total of 114 files, each
    corresponding to 40-50 minutes of
    child-adult interaction in a
    naturalistic setting. The lexicon is
    provided in .xls format and includes
    2201 lemmas, the number of occurrences
    of each lemma in three different age
    periods, the frequency of the lemma in
    each period, and the age of first
    occurrence for each child.

  • ELRA-W0083 : deL1L2IM corpus
    The deL1L2IM corpus is composed of 72
    dialogues, each of them having a
    duration of 20 to 45 minutes. The whole
    corpus contains ca. 52,000 words and
    4,800 messages and has a file size of
    0.5 Mb. Nine pairs of participants –
    i.e. nine learners and four native
    speakers – were required, with 8
    dialogues per pair. The interactions
    have undergone linguistic analysis,
    with annotation performed only on
    repair/correction sequences
    (incomplete learner error annotation).
    The corpus is delivered in one written
    text file (in XML format, customized
    under TEI P5).

  • ELRA-E0045 : MAURDOR Evaluation Package
    The MAURDOR project consists of
    evaluating systems for the automatic
    processing of written documents. The
    collected documents are scanned
    documents (printed, typewritten or
    handwritten). This package contains
    8,129 documents, which were manually
    annotated after collection. It contains
    the material provided to the evaluation
    campaign participants:
    - consistent development and test data
      corresponding to the application
      concerned;
    - tools for the automatic measurement
      of system performance;
    - a common assessment protocol
      applicable to each processing stage,
      along with a complete automatic
      processing chain for written
      documents.
    The documents are provided in TIFF
    format and the annotations in XML
    format.

  • ELRA-E0044 : REPERE Evaluation Package
    The REPERE Evaluation Package contains
    the visual annotation of 60 hours of
    French news TV shows, for the purpose of
    person recognition within TV programs.
    This annotation concerns both persons
    and written information appearing on
    screen. Provided data consists of:
    - video files with indexes and with
      manual transcriptions in XGTF format
      (Viper),
    - audio files compressed in WAV format
      with transcriptions in TRS format
      (Transcriber).

  • ELRA-W0082 : 88milSMS. A corpus of authentic text messages in French
    A multidisciplinary team of linguists
    and computer scientists collected more
    than 88,000 authentic French text
    messages in Montpellier (2011), as part
    of the sud4science LR project. The text
    messages were semi-automatically
    anonymised, before being partially
    transcoded (into standardised French)
    and annotated.

  • (last update: April 2015)

    Copyright © 2008 ELRA
    ELRACatalogue 0.8.0