Catalog Reference : S0239
N4 (NATO Native and Non Native) database
Speech technology covers an increasing number of languages, and systems are becoming more robust to speech variability such as speaking style and accent. However, real applications, especially in a multilingual and multinational context, need further robustness to regional and even non-native accents. Among the numerous corpora available for speech research, few have specifically addressed this issue.
The NATO Speech and Language Technology group decided to create a corpus geared towards the study of non-native accents. The group chose naval communications as the common task because it naturally includes a great deal of non-native speech and because there were training facilities where data could be collected in several countries.
The N4 NATO Native and Non-Native Speech corpus was developed by the NATO research group on Speech and Language Technology in order to provide a military-oriented database for multilingual and non-native speech processing studies.
Speech data was recorded in the naval transmission training centers of four countries (Germany, the Netherlands, the United Kingdom, and Canada) during naval communication training sessions in 2000-2002. The material consists of native and non-native speakers using NATO Naval English procedure between ships, where a typical sentence sounds like "This is alpha, whiskey, roger. I make two seven zero six hostile, two seven zero six. Out", and reading a text, "The North Wind and the Sun", in both English and the speaker's native language.
The audio material was recorded on DAT and downsampled to 16 kHz, 16-bit. All the audio files have been manually transcribed and annotated with speaker identities using the Transcriber tool. Navy procedure recordings and text readings are stored in separate files. The first digit in the filename indicates the type of speech.
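As a rough illustration of working with such annotations, the sketch below sums turn durations per speaker from a Transcriber transcript. It assumes the usual Transcriber XML layout, in which `Turn` elements carry `speaker`, `startTime`, and `endTime` attributes. The sample transcript, its speaker ids, and the function name `speech_seconds_by_speaker` are invented for the example and are not taken from the corpus.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample in the usual Transcriber (.trs) layout; not corpus data.
SAMPLE_TRS = """\
<Trans version="1">
  <Speakers>
    <Speaker id="spk1" name="operator A"/>
    <Speaker id="spk2" name="operator B"/>
  </Speakers>
  <Episode>
    <Section type="report" startTime="0" endTime="9.5">
      <Turn speaker="spk1" startTime="0" endTime="4.0">
        <Sync time="0"/>this is alpha whiskey roger
      </Turn>
      <Turn startTime="4.0" endTime="5.5">
        <Sync time="4.0"/>
      </Turn>
      <Turn speaker="spk2" startTime="5.5" endTime="9.5">
        <Sync time="5.5"/>out
      </Turn>
    </Section>
  </Episode>
</Trans>
"""


def speech_seconds_by_speaker(root):
    """Sum turn durations in seconds per speaker id."""
    totals = defaultdict(float)
    for turn in root.iter("Turn"):
        speakers = turn.get("speaker", "").split()
        if not speakers:
            continue  # turn without a speaker, e.g. silence or noise
        duration = float(turn.get("endTime")) - float(turn.get("startTime"))
        for speaker_id in speakers:  # overlapping speech lists several ids
            totals[speaker_id] += duration
    return dict(totals)


root = ET.fromstring(SAMPLE_TRS)
durations = speech_seconds_by_speaker(root)
print(durations)
```

For a transcript stored on disk, `ET.parse(path).getroot()` would take the place of the `ET.fromstring` call.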
Among speech segments, the duration of Navy procedure recordings ranges from 1.3 to 2.3 hours per country, for a total of 7.5 hours. The duration of the native-language text readings ranges from 1.5 minutes to 22.9 minutes per country, for a total of around one hour.
Duration (hours)   Canada   Germany   The Netherlands   United Kingdom     All
Signal               5.30      3.20              5.00             6.30   19.80
Silence              3.00      0.56              2.00             4.70
Speech               2.30      2.64              3.00             1.60    9.54
Navy proc            2.00      1.90              2.30             1.30
Read text            0.30      0.74              0.70             0.30    2.04
Non-native           0.27      0.37              0.32             0.00
Native               0.03      0.37              0.38             0.30
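The duration figures above appear to nest: signal splits into silence and speech, speech into Navy procedure and read text, and read text into non-native and native readings. This nesting is inferred from the numbers, not stated in the catalogue entry. The minimal sketch below checks it and derives the row totals, with the values copied from the table and the names `HOURS`, `NESTING`, `nesting_holds`, and `row_total` chosen for the example.

```python
import math

# Durations in hours, copied from the table.
# Column order: Canada, Germany, The Netherlands, United Kingdom.
HOURS = {
    "Signal":     [5.30, 3.20, 5.00, 6.30],
    "Silence":    [3.00, 0.56, 2.00, 4.70],
    "Speech":     [2.30, 2.64, 3.00, 1.60],
    "Navy proc":  [2.00, 1.90, 2.30, 1.30],
    "Read text":  [0.30, 0.74, 0.70, 0.30],
    "Non-native": [0.27, 0.37, 0.32, 0.00],
    "Native":     [0.03, 0.37, 0.38, 0.30],
}

# Assumed nesting: each parent row is the sum of its two child rows.
NESTING = {
    "Signal": ("Silence", "Speech"),
    "Speech": ("Navy proc", "Read text"),
    "Read text": ("Non-native", "Native"),
}


def nesting_holds(hours, nesting, tol=0.005):
    """Return True if every parent equals the sum of its children per country."""
    for parent, (first, second) in nesting.items():
        for p, a, b in zip(hours[parent], hours[first], hours[second]):
            if not math.isclose(p, a + b, abs_tol=tol):
                return False
    return True


def row_total(hours, row):
    """Sum one row over the four countries, rounded to two decimals."""
    return round(sum(hours[row]), 2)


print("nesting holds:", nesting_holds(HOURS, NESTING))
for row in HOURS:
    print(row, row_total(HOURS, row))
```

Under that assumption, the derived totals for Signal, Speech, and Read text can be compared with the "All" column, and the Navy procedure total with the 7.5 hours quoted in the text.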
There were 115 speakers in total. The database contains the following information about each speaker: gender, age, weight, height, possible speaking or hearing disorders, education level, living area, accent, second language, and (for non-native speakers) the year English was learned. Speaker accents vary widely from country to country. The speakers' average age was 22.6 years. Nineteen women participated, accounting for about 17% of the speakers.
             Canada   Germany   The Netherlands   United Kingdom     All
#Speakers        22        51                31               11     115
#Women            5         0                 9                5      19
Age           22-35     17-23             17-61            19-62   17-62
Age mean       28.3      20.1                21             27.5    22.6
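The overall figures can be recomputed from the per-country ones. The short sketch below does so, assuming each country's mean age is taken over all of that country's speakers, so that the overall mean is the speaker-weighted mean. The variable names are chosen for the example.

```python
# Per-country figures copied from the speaker table:
# (number of speakers, number of women, mean age in years).
SPEAKERS = {
    "Canada": (22, 5, 28.3),
    "Germany": (51, 0, 20.1),
    "The Netherlands": (31, 9, 21.0),
    "United Kingdom": (11, 5, 27.5),
}

total_speakers = sum(n for n, _, _ in SPEAKERS.values())
total_women = sum(w for _, w, _ in SPEAKERS.values())

# Share of women among all speakers, in percent.
women_share = 100 * total_women / total_speakers

# Overall mean age as the speaker-weighted mean of the country means.
mean_age = sum(n * age for n, _, age in SPEAKERS.values()) / total_speakers

print(total_speakers, total_women)
print(round(women_share, 1), round(mean_age, 1))
```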
Technical Information
Distribution medium :
DVD
Contents
speech corpus
Language(s) :
English
Quantisation : 16 bits
Sampling rate : 16 kHz
Source Channel :
Microphone
Members Prices
Academic - Research 400.00 EUR
Commercial - Research 400.00 EUR
Non Member Prices
Academic - Research 500.00 EUR
Commercial - Research 500.00 EUR
Copyright © 2008
ELRA