  • Catalogue of Language Resources

    ELRA releases free Language Resources. (last update: January 24, 2013)

    The ELRA Catalogue of Language Resources offers a repository of Language Resources (LRs) made available through ELRA.

    An increasing number of LRs in the various fields of Human Language Technology (HLT; see image on the left-hand side) are distributed on behalf of ELRA via its operational body ELDA, thanks to contributions from many members of the HLT community.

    Through this repository, we aim to spare researchers and developers the effort of rebuilding resources that already exist, and to help them identify and access those resources.

    Other resources identified, but not available through ELRA, can be viewed in the Universal Catalogue.

    If you have any suggestions or comments, or need further details about ELRA and its Catalogue of Language Resources, please refer to the Contact Us section.

    ELRA is a partner of OLAC (Open Language Archives Community). The catalogue can be viewed as an OLAC repository.

    New Resources
  • ELRA-S0366 : LECTRA (LECture TRAnscriptions in European Portuguese)
    This corpus is composed of the audio and
    the manual transcriptions from seven
    1-semester University courses in
    Portuguese. The corpus contains a total
    of 28 hours of audio speech that were
    manually transcribed by several trained
    annotators. The corpus consists of
    technical university lectures.

  • ELRA-S0367 : CORAL Corpus
    The CORAL Corpus is a collection of
    spoken dialogues in European Portuguese.
    It consists of 56 dialogues about a
    predetermined subject: maps. One of the
    participants (giver) has a map with some
    landmarks and a route drawn between
    them; the other (follower) also has
    landmarks but no route, and consequently
    must reconstruct it. Only orthographic
    transcription was done for the whole
    corpus. A pilot recording was annotated
    at several levels.

  • ELRA-S0370 : MoveOn Speech and Noise Corpus
    The MoveOn Speech and Noise Corpus is a
    corpus recorded under the extreme
    conditions of the motorcycle environment
    within the MoveOn project. The speech
    utterances are in British English and
    address command-and-control and
    template-driven dialogue systems, with a
    focus on, but not limited to, the police
    domain. The
    major part of the corpus comprises noisy
    speech and environmental noise recorded
    on a motorcycle. Several clean speech
    recording sessions with the same
    recording setup (including the
    motorcycle helmet) in an office
    environment complete the corpus.

  • ELRA-S0368 : Nepali Spoken Corpus
    The Nepali Spoken Corpus contains audio
    recordings from different social
    activities within their natural settings
    as much as possible, with phonologically
    transcribed and annotated texts, and
    information about the participants. A
    total of 17 types of activity were
    recorded. The total temporal duration of
    the recorded material is 31 hours and 26

    CLIPS_MT_MANUAL is a sub-corpus of the
    original Italian CLIPS corpus (Corpora e
    Lessici dell'Italiano Parlato e
    Scritto). This corpus contains 3228
    inspected and partially repaired WAV
    signal files, each containing one
    dialogue turn (*.wav), 3228 corrected
    original CLIPS annotation files (*.acs,
    *.phn, *.std, *.wrd), 3228 BAS Partitur
    files containing the annotation tiers
    ORT, KAN and SAP (*.par), 3228 EMU
    database annotation files (*.vot, *.hlb)
    covering 30 maptask dialogues performed
    by 30 speakers (each speaker pair
    performing two different map tasks)
    recorded in 15 different locations in
    Italy in 2000-2004.

  • (last update: July 2014)

    Copyright © 2008 ELRA