    Catalog Reference : ELRA-S0272
    MEDIA speech database for French
    The MEDIA speech database for French was produced by ELDA within the French national project MEDIA (Automatic evaluation of man-machine dialogue systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT).

    It contains 1,258 transcribed dialogues from 250 adult speakers. The corpus was collected with a ‘Wizard of Oz’ (WoZ) setup, in which a human operator simulates a natural-language man-machine dialogue system. The scenarios cover the domain of tourism and hotel reservation.

    The database is formatted following the SpeechDat conventions and it includes the following items:
    • 1,258 recorded sessions for a total of 70 hours of speech. The signals are stored as stereo WAV files. Each of the two speech channels is sampled at 8 kHz with 16-bit quantization, stored as signed integers with the least significant byte first (“lohi” or Intel format).
    • Manual transcription of each session in XML format. Label files were created with the free transcription tool Transcriber (TRS files).
    • Phonetic lexicon containing all the words spoken in the database. Column 1 contains the orthography of the French word. Column 2 shows the frequency of the word. Column 3 contains the pronunciation in SAMPA format. Here is a sample entry of the lexicon:
    1) agitée 3 A/ Z i t e
    • Documentation and statistics are also provided with the database.
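    The three-column lexicon layout described above can be read with a few lines of Python. This is only a sketch: it assumes the columns are separated by whitespace and that the orthography contains no spaces, neither of which is stated in this description. The names `LexiconEntry` and `parse_lexicon_line` are hypothetical and do not come from the database documentation.

```python
from typing import List, NamedTuple


class LexiconEntry(NamedTuple):
    orthography: str      # column 1: French word as written
    frequency: int        # column 2: number of occurrences in the corpus
    phonemes: List[str]   # column 3: pronunciation, one symbol per item


def parse_lexicon_line(line: str) -> LexiconEntry:
    # Assumption: fields are whitespace-separated; everything after the
    # second field is treated as the pronunciation.
    word, freq, pronunciation = line.strip().split(maxsplit=2)
    return LexiconEntry(word, int(freq), pronunciation.split())


# The sample entry quoted in the description above.
entry = parse_lexicon_line("agitée 3 A/ Z i t e")
print(entry.orthography, entry.frequency, entry.phonemes)
```

    If the actual lexicon files use a different delimiter (for example tabs, with multi-word entries), the `split` call would need to be adapted accordingly.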

    The semantic annotation of the corpus is available in this catalogue and referenced ELRA-E0024 (MEDIA Evaluation Package).
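    As an illustration of the signal format given above (stereo, 8 kHz, 16-bit signed samples, least significant byte first), the following sketch separates the two channels of such a file using only the Python standard library. The function names are hypothetical, and the demo runs on a tiny synthetic file built in memory rather than on real corpus data. Which channel carries which party of the dialogue is not stated in this description, so the channels are simply numbered.

```python
import array
import io
import sys
import wave


def split_channels(source):
    """Return (sample_rate, channel_0, channel_1) for a 16-bit stereo WAV."""
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo file")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h")      # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":      # WAV data is little-endian ("lohi")
        samples.byteswap()
    # Stereo frames are interleaved: ch0, ch1, ch0, ch1, ...
    return rate, samples[0::2], samples[1::2]


def make_demo_wav():
    """Build a three-frame stereo file in memory (synthetic, for the demo only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(8000)
        data = array.array("h", [1, -1, 2, -2, 3, -3])
        if sys.byteorder == "big":
            data.byteswap()
        wav.writeframes(data.tobytes())
    buf.seek(0)
    return buf


rate, ch0, ch1 = split_channels(make_demo_wav())
print(rate, list(ch0), list(ch1))
```

    With a real session file, a path string can be passed to `split_channels` in place of the in-memory buffer.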

    ISLRN : 195-971-767-455-9
    Project : EVALDA
    Application Area : Tourism
    Technical Information
    Distribution medium : Downloadable
    File format : wav
    Contents : speech corpus
    Member Prices
    Academic - Commercial 5000.00 EUR
    Academic - Research 1000.00 EUR
    Commercial - Commercial 5000.00 EUR
    Commercial - Research 5000.00 EUR
    Non-Member Prices
    Academic - Commercial 10000.00 EUR
    Academic - Research 2000.00 EUR
    Commercial - Commercial 10000.00 EUR
    Commercial - Research 10000.00 EUR

    Copyright © 2008 ELRA