Catalog Reference : ELRA-S0157
NetDC Arabic BNSC (Broadcast News Speech Corpus)
The NetDC Arabic BNSC (Broadcast News Speech Corpus) was developed by ELDA within the European-funded project Network of Data Centres (NetDC). The project was carried out in collaboration with the LDC (Linguistic Data Consortium), which produced a similar corpus from news broadcast by Voice of America Arabic in the United States. The database contains ca. 22.5 hours of broadcast news speech recorded from Radio Orient (France) over a 3-month period between November 2001 and January 2002 (37 news broadcasts: 32 from the 5.55 pm news and 5 from the 10.55 pm news). The language is Standard Arabic from the Middle East region. The database is stored on 1 DVD-ROM. It was validated by SPEX, the Netherlands, to assess its compliance with the NetDC specifications.
Recordings were made through a Sangean ATS 909 radio receiver connected to a desktop PC. Encoding is 16 kHz, 16 bits, single channel. Files are stored as uncompressed linear PCM in .wav format, with header information.
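As a rough illustration of the encoding described above, the following Python sketch checks that a WAV file matches the stated parameters (16 kHz, 16 bits, single channel) and estimates the raw audio size for the stated 22.5 hours. The function names are hypothetical and not part of the corpus distribution. An in-memory file of silence stands in for a real recording, and the sketch assumes the corpus files can be read by Python's standard `wave` module.

```python
import io
import wave

# Values stated in the catalogue entry.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2   # 16 bits
EXPECTED_CHANNELS = 1             # single channel


def check_wav_format(source):
    """Return (ok, duration_seconds) for a WAV path or file-like object."""
    with wave.open(source, "rb") as wav:
        ok = (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
        )
        duration = wav.getnframes() / wav.getframerate()
    return ok, duration


def make_silent_wav(seconds):
    """Build an in-memory WAV of silence (a stand-in, not corpus data)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(EXPECTED_CHANNELS)
        wav.setsampwidth(EXPECTED_SAMPLE_WIDTH_BYTES)
        wav.setframerate(EXPECTED_RATE_HZ)
        wav.writeframes(b"\x00\x00" * int(seconds * EXPECTED_RATE_HZ))
    buffer.seek(0)
    return buffer


def uncompressed_size_bytes(hours):
    """Raw audio payload for the stated encoding, ignoring file headers."""
    return int(
        hours * 3600
        * EXPECTED_RATE_HZ
        * EXPECTED_SAMPLE_WIDTH_BYTES
        * EXPECTED_CHANNELS
    )


if __name__ == "__main__":
    ok, duration = check_wav_format(make_silent_wav(2))
    print(ok, duration)
    print(uncompressed_size_bytes(22.5))
```

Under these assumptions, 22.5 hours of audio comes to about 2.59 GB of raw samples, which is consistent with distribution on a single DVD-ROM.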
The corpus was segmented, labelled and transcribed manually using the “Transcriber” software, developed by DGA (Délégation Générale pour l'Armement, France) and LDC (Linguistic Data Consortium, USA), with an additional patch for Arabic. The transcriptions were made in Arabic characters, and the software automatically generated the transliterations. Transcriptions include speaker turns, topics and channel information.
Each speech file (extension .wav) is accompanied by an ASCII SAM label file with recording information (extension .sam) and by a file containing the transcription and channel information in XML format (extension .trs). A phonetic lexicon in Arabic SAMPA is also included.
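The file layout described above can be sketched in Python as follows. This is a minimal example under two assumptions that the catalogue entry does not confirm: that the .trs files follow the usual Transcriber XML layout (`Trans` / `Episode` / `Section` / `Turn` / `Sync`), and that the .sam and .trs files share the base name of their .wav file. The function names and the sample content are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical sample in the usual Transcriber layout, not corpus data.
SAMPLE_TRS = """<Trans version="1">
  <Speakers>
    <Speaker id="spk1" name="Anchor"/>
  </Speakers>
  <Episode>
    <Section type="report" startTime="0" endTime="7.5">
      <Turn speaker="spk1" startTime="0" endTime="7.5">
        <Sync time="0"/>
        first segment
        <Sync time="3.2"/>
        second segment
      </Turn>
    </Section>
  </Episode>
</Trans>
"""


def companion_files(wav_path):
    """Guess the label and transcription paths (assumes a shared base name)."""
    wav_path = Path(wav_path)
    return {
        "sam": wav_path.with_suffix(".sam"),
        "trs": wav_path.with_suffix(".trs"),
    }


def read_turns(root):
    """Collect speaker turns and their time-stamped text segments.

    Simplification: only the text directly following each <Sync> is read,
    so text placed after other inline elements would be missed.
    """
    turns = []
    for turn in root.iter("Turn"):
        segments = []
        for sync in turn.iter("Sync"):
            text = (sync.tail or "").strip()
            if text:
                segments.append((float(sync.get("time")), text))
        turns.append({
            "speaker": turn.get("speaker"),
            "start": float(turn.get("startTime")),
            "end": float(turn.get("endTime")),
            "segments": segments,
        })
    return turns


if __name__ == "__main__":
    for turn in read_turns(ET.fromstring(SAMPLE_TRS)):
        print(turn["speaker"], turn["start"], turn["end"], turn["segments"])
    print(companion_files("recordings/example.wav"))
```

For a real file, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE_TRS)`. The actual element and attribute names should be checked against the corpus documentation.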
Production
Project :
Network of Distribution Centers (NetDC)
Applications
Possible applications :
Discourse analysis, Speaker verification, Speech recognition
Application area :
Training, Research
Technical Information
Development mode :
Manual
Compression :
None
Distribution medium :
DVD
Platform :
PC
File format :
wav
Contents
speech corpus
Language(s) :
Arabic
Duration :
22.5 hours
Quantisation :
16 bits
Signal encoding :
Linear PCM
Sampling rate :
16 kHz
Source Channel :
Radio
Transcription :
Orthographic
Resource files
Validation report
Members Prices
Academic - Commercial 1350.00 EUR
Academic - Research 100.00 EUR
Commercial - Commercial 1350.00 EUR
Commercial - Research 1350.00 EUR
Non Member Prices
Academic - Commercial 2700.00 EUR
Academic - Research 200.00 EUR
Commercial - Commercial 2700.00 EUR
Commercial - Research 2700.00 EUR
Copyright © 2008 ELRA