Catalog Reference : S0272
MEDIA speech database for French
The MEDIA speech database for French was produced by ELDA within the French national project MEDIA (Automatic evaluation of man-machine dialogue systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT).
It contains 1,258 transcribed dialogues from 250 adult speakers. The corpus was built with a ‘Wizard of Oz’ (WoZ) setup, in which a human operator simulates a natural-language man-machine dialogue system. The scenarios cover tourism and hotel reservation.
The database is formatted following the SpeechDat conventions and it includes the following items:
• 1,258 recorded sessions for a total of 70 hours of speech. The signals are stored as stereo wave files. Each of the two speech channels is sampled at 8 kHz with 16-bit quantization and stored as signed integers with the least significant byte first (“lohi” or Intel format).
• Manual transcription of each session in XML format. Label files were created with the free transcription tool Transcriber (TRS files).
• Phonetic lexicon containing all the words spoken in the database. Column 1 contains the orthography of the French word. Column 2 shows the frequency of the word. Column 3 contains the pronunciation in SAMPA format. Here is a sample entry of the lexicon:
1) agitée 3 A/ Z i t e
• Documentation and statistics are also provided with the database.
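The three-column lexicon layout described in the list above (orthography, frequency, SAMPA pronunciation) could be read along the following lines. This is a minimal sketch, not the tooling shipped with the database: it assumes that fields are separated by whitespace, that SAMPA symbols are space-separated, that the orthography contains no spaces, and that the leading `1)` in the sample is list numbering rather than part of the entry. The names `LexiconEntry` and `parse_lexicon_line` are hypothetical.

```python
from typing import List, NamedTuple


class LexiconEntry(NamedTuple):
    orthography: str      # column 1: French word as written
    frequency: int        # column 2: number of occurrences in the database
    phonemes: List[str]   # column 3: SAMPA symbols, one per list item


def parse_lexicon_line(line: str) -> LexiconEntry:
    """Parse one lexicon line into its three columns.

    Assumption: whitespace-delimited fields; the delimiter used in the
    distributed lexicon file may differ (for example, tabs only).
    """
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got {len(fields)}: {line!r}")
    word, freq, *sampa = fields
    return LexiconEntry(word, int(freq), sampa)


# The sample entry from the catalogue description, without the "1)" prefix.
entry = parse_lexicon_line("agitée 3 A/ Z i t e")
print(entry.orthography, entry.frequency, entry.phonemes)
```

Keeping the pronunciation as a list of symbols rather than a single string makes it straightforward to count phonemes or to map SAMPA symbols onto another phone set later.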
The semantic annotation of the corpus is available in this catalogue and referenced ELRA-E0024 (MEDIA Evaluation Package).
Production
Project: EVALDA
Applications
Application area: Tourism
Technical Information
Distribution medium: DVD
File format: wav
Contents
speech corpus
Language(s): French
Byte order: Lo-Hi
Data format: Signed integer
Duration: 70 hours
Quantisation: 16 bits
Sampling rate: 8 kHz
Source channel: Telephone
Task: Tourism and hotel reservation
Transcription entries: Orthographic
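The signal parameters given in this entry (two channels, 8 kHz, 16-bit signed integers, Lo-Hi byte order) can be checked against a file header and the two channels separated with the Python standard library. The sketch below assumes the files carry a standard RIFF/WAV header, which the entry suggests but does not spell out. The function name `read_stereo_channels` is hypothetical, and the demonstration uses a tiny synthetic file rather than corpus data. Which channel carries which speaker is not stated in the entry, so the code only calls them first and second.

```python
import array
import os
import sys
import tempfile
import wave


def read_stereo_channels(path):
    """Return (sample_rate, first_channel, second_channel) for a stereo 16-bit WAV.

    Raises ValueError if the header does not describe 2 channels of
    16-bit samples.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 2:
            raise ValueError(f"expected 2 channels, found {wav.getnchannels()}")
        if wav.getsampwidth() != 2:
            raise ValueError(f"expected 16-bit samples, found {8 * wav.getsampwidth()}-bit")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    samples = array.array("h")       # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":       # WAV data is little-endian (Lo-Hi)
        samples.byteswap()
    # Stereo frames are interleaved: first, second, first, second, ...
    return rate, samples[0::2], samples[1::2]


# Demonstration on a synthetic 3-frame file (not corpus data).
demo = array.array("h", [1, -1, 2, -2, 3, -3])
if sys.byteorder == "big":
    demo.byteswap()
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(demo.tobytes())

rate, first, second = read_stereo_channels(demo_path)
print(rate, list(first), list(second))
```

Validating the header before decoding surfaces a mismatched file early instead of silently producing misaligned channels.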
Members Prices
Academic - Commercial 5000.00 EUR
Academic - Research 1000.00 EUR
Commercial - Commercial 5000.00 EUR
Commercial - Research 5000.00 EUR
Non Member Prices
Academic - Commercial 10000.00 EUR
Academic - Research 2000.00 EUR
Commercial - Commercial 10000.00 EUR
Commercial - Research 10000.00 EUR
Copyright © 2008
ELRA