    Catalog Reference: ELRA-S0219
    NEMLAR Broadcast News Speech Corpus
    This corpus was produced within the NEMLAR project (http://www.nemlar.org). Two other resources produced within the same project are also available: the NEMLAR Written Corpus (ELRA-W0042) and the NEMLAR Speech Synthesis Corpus (ELRA-S0220).

    The NEMLAR Broadcast News Speech Corpus consists of about 40 hours of Standard Arabic news broadcasts, recorded from four radio stations: Medi1, Radio Orient, RMC (Radio Monte Carlo) and RTM (Radio Television Maroc).

    Each broadcast contains 25 to 30 minutes of news and interviews; 259 distinct speakers were identified. The recordings were made during three separate periods between 30 June 2002 and 18 July 2005. All files are in linear PCM format, 16 kHz, 16-bit.
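
As a rough illustration of what the stated format implies, the sketch below estimates the raw audio size of about 40 hours of 16 kHz, 16-bit linear PCM. It assumes single-channel (mono) audio, which the catalogue entry does not state, and ignores file headers; the function name `pcm_bytes` is hypothetical.

```python
# Rough size estimate for uncompressed linear PCM audio.
# Assumption: mono (1 channel); the catalogue entry does not give the channel count.
SAMPLE_RATE_HZ = 16_000   # 16 kHz
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 1              # assumed mono
HOURS = 40                # "about 40 hours"


def pcm_bytes(hours, sample_rate=SAMPLE_RATE_HZ,
              sample_bytes=BYTES_PER_SAMPLE, channels=CHANNELS):
    """Return the raw size in bytes of `hours` of linear PCM audio (no headers)."""
    seconds = hours * 3600
    return int(seconds * sample_rate * sample_bytes * channels)


if __name__ == "__main__":
    size = pcm_bytes(HOURS)
    print(f"{size:,} bytes ~ {size / 1e9:.2f} GB")  # 4,608,000,000 bytes ~ 4.61 GB
```

Under these assumptions the audio alone comes to roughly 4.6 GB; the actual size will differ with the exact duration and channel layout.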

    Transcription was carried out with Transcriber, using an additional patch for Arabic: the transcriptions were entered in Arabic script and the software generated the transliterations automatically. The following annotation levels are included:
    • Orthographic transcription of speech (news segments only; music, commercials, etc. are not transcribed), including named entities
    • Speakers and speaker turns
    • Segment markers (segments of at most 10 seconds)
    • Topic/story boundaries
    • Background noises (stationary and instantaneous noise events)
    • Changes of background
    • Music/noise
    • Word boundaries

    A 62,000-word lexicon is also included, giving transliterations, word frequencies and SAMPA phonetic transcriptions for Arabic.

    The database is distributed on one ISO 9660 DVD-ROM. It has been validated by an external partner, and a validation report is provided.

    ISLRN: 479-507-036-103-9
    Project: NEMLAR (Network for Euro-Mediterranean LAnguage Resources)
    Technical Information
    Distribution medium: Downloadable
    Contents: speech corpus
    Resource files: Validation report
    Prices (EUR)
    Licence                     Members   Non-members
    Academic - Commercial       2000.00       4000.00
    Academic - Research          150.00        300.00
    Commercial - Commercial     2000.00       4000.00
    Commercial - Research        500.00       1000.00

    Special Prices

    Discounts apply when purchasing several NEMLAR resources (ELRA-W0042, ELRA-S0219 and ELRA-S0220):
    • 15% discount for 2 resources
    • 30% discount for all 3 resources

    Copyright © 2008 ELRA