Search and Browse – ELRA Catalogue

English

ID: ELRA-S0389

The Accented English part of the GlobalPhone resources contains 63 recording sessions of Bulgarian, Chinese, German, and Indian native speakers reading 37 English sentences each, produced in GlobalPhone style, i.e. 16 kHz PCM-encoded audio recordings of utterance-segmented read speech from the new...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	600.00 €	3000.00 €
Licence: Commercial Use - ELRA VAR	3000.00 €	3000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	700.00 €	3600.00 €
Licence: Commercial Use - ELRA VAR	3600.00 €	3600.00 €

ACCOR - English audio

English

ID: ELRA-S0001

ISLRN: 936-783-643-804-4

ACCOR is a unique acoustic and articulatory database recorded as part of the ESPRIT-ACCOR project investigating cross-language acoustic-articulatory correlations in coarticulatory processes. The European languages covered are: Catalan, English, French, German, Irish Gaelic, Italian and Swedish. ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	25.00 €
Licence: Commercial Use - ELRA VAR	25.00 €	25.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	75.00 €
Licence: Commercial Use - ELRA VAR	75.00 €	75.00 €

ACL RD-TEC: A Reference Dataset for Terminology Extraction and Classification Research in Computational Linguistics text

English

ID: ELRA-T0375

ISLRN: 699-305-362-089-6

Automatic Term Recognition (ATR) is a research task that deals with the identification of domain-specific terms. Terms, put simply, are textual realizations of significant concepts in a domain of expertise. Additionally, domain-specific terms may be classified into a number of categories, in wh...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	1000.00 €
Licence: Commercial Use - ELRA VAR	1000.00 €	1000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	3000.00 €
Licence: Commercial Use - ELRA VAR	3000.00 €	3000.00 €

Acoustic database for Polish concatenative speech synthesis audio

Polish

ID: ELRA-S0342

ISLRN: 305-222-372-690-4

This database consists of 1443 nonsense words including all the diphones for the Polish language. The diphone is always placed in an unstressed syllable. The neighbourhood doesn’t influence the co-articulation of the diphone. The database includes information such as: the name of the diphone, co...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	35.00 €	35.00 €
Licence: Commercial Use - ELRA VAR	35.00 €	35.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	100.00 €	100.00 €
Licence: Commercial Use - ELRA VAR	100.00 €	100.00 €

Acoustic database for Polish unit selection speech synthesis audio

Polish

ID: ELRA-S0339

ISLRN: 981-910-282-065-4

This database contains parliamentary statements and newspaper reviews read by a semi-professional male speaker. It consists of a selection of 2150 sentences annotated and manually verified, including 100 rare phonemes in words. Prompts vary in length from 2.3 to 13.4 seconds, with an average leng...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	250.00 €	1000.00 €
Licence: Commercial Use - ELRA VAR	1000.00 €	1000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	300.00 €	2000.00 €
Licence: Commercial Use - ELRA VAR	2000.00 €	2000.00 €

aGender

German

ID: ELRA-S0365

ISLRN: 038-476-412-610-4

aGender contains speech sample recordings over public telephone lines with read and (semi-)spontaneous speech. Native German speakers called a voice portal from their private phones, read text, and answered some open questions. The purpose of the corpus is the automatic detection of gender and/or...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	327.00 €	8127.00 €
Licence: Commercial Use - ELRA VAR	8127.00 €	8127.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	455.00 €	8255.00 €
Licence: Commercial Use - ELRA VAR	8255.00 €	8255.00 €

Albayzin corpus audio

Spanish; Castilian

ID: ELRA-S0089

ISLRN: 443-392-902-600-9

This corpus consists of 3 sub-corpora of 16 kHz, 16-bit signals, recorded by 304 Castilian speakers. The 3 sub-corpora are: - Phonetic corpus: 6,800 utterances of phonetically balanced sentences, including 1,000 with phonetic segmentation. - Geographic corpus: 6,800 utterances of sentences ext...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1000.00 €	10000.00 €
Licence: Commercial Use - ELRA VAR	10000.00 €	10000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	2000.00 €	12000.00 €
Licence: Commercial Use - ELRA VAR	12000.00 €	12000.00 €

Special offers are also available.

Alcohol Language Corpus (BAS ALC) audio

German

ID: ELRA-S0299

ISLRN: 780-368-852-139-3

ALC contains recordings of German speakers who are either intoxicated or sober. The type of speech ranges from read single digits to full conversation style. Recordings were made during drinking tests in which speakers drank beer or wine to reach a self-chosen level of alcoholic intoxication. The ac...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	510.00 €	510.00 €
Licence: Commercial Use - ELRA VAR	510.00 €	510.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1020.00 €	1020.00 €
Licence: Commercial Use - ELRA VAR	1020.00 €	1020.00 €

Al-Hayat Arabic Corpus text

Arabic

ID: ELRA-W0030

ISLRN: 365-777-769-398-7

The corpus was developed in the course of a research project at the University of Essex, in collaboration with the Open University. The corpus contains Al-Hayat newspaper articles with value added for Language Engineering and Information Retrieval applications development purposes. The data have ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	480.00 €	960.00 €
Licence: Commercial Use - ELRA VAR	960.00 €	960.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	720.00 €	1440.00 €
Licence: Commercial Use - ELRA VAR	1440.00 €	1440.00 €

ALLIES Corpus audio

French

ID: ELRA-S0486

ISLRN: 397-116-696-859-2

The ALLIES Corpus was produced within the European CHIST-ERA project ALLIES. The ALLIES project made it possible to carry out a campaign for the evaluation of Broadcast News across-time diarization systems using French data. This project is an extension of the previous ESTER, REPERE and ETAPE evaluation c...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	375.00 €	6250.00 €
Licence: Commercial Use - ELRA VAR	25000.00 €	25000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	2500.00 €	9375.00 €
Licence: Commercial Use - ELRA VAR	31250.00 €	31250.00 €

American/Canadian English Speech Recognition Corpus (headset+mobile) audio

English

ID: ELRA-S0228-102

ISLRN: 992-319-311-431-0

This corpus comprises 12,974 entries uttered by 30 speakers (15 males and 15 females), recorded over 2 channels (headset and mobile in noisy restaurant/shopping mall/info center/hospital/station/car). Speech is stored as 16-bit, 48 kHz samples, for a total of 12 hours of speech per ch...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	3600.00 €	3600.00 €
Licence: Commercial Use - ELRA VAR	3600.00 €	3600.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	3600.00 €	3600.00 €
Licence: Commercial Use - ELRA VAR	3600.00 €	3600.00 €

American English Conversational Speech Recognition Corpus (Multi-Channel) audio

English

ID: ELRA-S0228-93

ISLRN: 576-996-121-023-5

This corpus was recorded by 20 speakers (10 males and 10 females), over 7 channels (multi-channel in quiet office/home). Speech is stored as 16-bit, 16 kHz samples, for a total of 10 hours of speech per channel.

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	5040.00 €	5040.00 €
Licence: Commercial Use - ELRA VAR	5040.00 €	5040.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	5040.00 €	5040.00 €
Licence: Commercial Use - ELRA VAR	5040.00 €	5040.00 €

American English Speech Recognition Corpus (Desktop) audio

English

ID: ELRA-S0228-79

ISLRN: 254-019-000-249-3

This corpus comprises 49,990 entries uttered by 50 speakers (25 males and 25 females), recorded over 2 channels (desktop in quiet office). Speech is stored as 16-bit, 16 kHz samples, for a total of 24.9 hours of speech per channel.

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	5400.00 €	5400.00 €
Licence: Commercial Use - ELRA VAR	5400.00 €	5400.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	5400.00 €	5400.00 €
Licence: Commercial Use - ELRA VAR	5400.00 €	5400.00 €

American English Speech Recognition Corpus (Desktop) audio

English

ID: ELRA-S0228-113

ISLRN: 703-568-790-770-5

This corpus was recorded in both quiet and noisy environments over 2 channels and collected from a total of 50 speakers, including 24 males and 26 females, all of whom have been carefully screened to ensure their standard and clear pronunciation. The audio scripts cover information such as text m...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	5400.00 €	5400.00 €
Licence: Commercial Use - ELRA VAR	5400.00 €	5400.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	5400.00 €	5400.00 €
Licence: Commercial Use - ELRA VAR	5400.00 €	5400.00 €

American English Speech Recognition Corpus (Mobile) - 14.67 hours audio

English

ID: ELRA-S0228-73

ISLRN: 817-988-141-738-4

This corpus comprises 14,988 entries uttered by 50 speakers (23 males and 27 females), recorded over the mobile telephone network. Speech is stored as 16-bit, 16 kHz samples, for a total of 14.67 hours of speech.

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	5400.00 €	5400.00 €
Licence: Commercial Use - ELRA VAR	5400.00 €	5400.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	5400.00 €	5400.00 €
Licence: Commercial Use - ELRA VAR	5400.00 €	5400.00 €

American English Wake-up Words Speech Recognition Corpus (Mobile) audio

English

ID: ELRA-S0228-58

ISLRN: 968-856-860-742-9

The corpus contains recordings of 38,718 utterances of American English mobile keyword speech data from 149 speakers (72 males and 77 females). Each speaker was to record 1 session of 260 utterances in total, in quiet or noisy environments. The total pure recording time is abo...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	2430.00 €	2430.00 €
Licence: Commercial Use - ELRA VAR	2430.00 €	2430.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	2430.00 €	2430.00 €
Licence: Commercial Use - ELRA VAR	2430.00 €	2430.00 €

American Spanish Recognition Corpus (Desktop+Mobile) audio

English

ID: ELRA-S0228-68

ISLRN: 100-009-143-020-4

This corpus comprises 33,527 entries uttered by 40 speakers (21 males and 19 females), recorded over 2 channels (desktop in quiet office and mobile in noisy restaurant). Speech is stored as 16-bit, 16 kHz samples, for a total of 14.7 hours of speech per channel.

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	4320.00 €	4320.00 €
Licence: Commercial Use - ELRA VAR	4320.00 €	4320.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	4320.00 €	4320.00 €
Licence: Commercial Use - ELRA VAR	4320.00 €	4320.00 €

ANITA (Audio eNhancement In Telecom Applications) audio

English
French
German
Spanish; Castilian

ID: ELRA-S0156

ISLRN: 537-894-870-719-4

ANITA (Audio eNhancement In secured Telecommunication Applications) is a European project launched on the initiative of EADS TELECOM with the objective of reducing acoustic noise in secured communications in adverse environments (sirens, alarms, engines, water pumps, stress situations, etc...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1000.00 €	2000.00 €
Licence: Commercial Use - ELRA VAR	2000.00 €	2000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1500.00 €	2500.00 €
Licence: Commercial Use - ELRA VAR	2500.00 €	2500.00 €

APASCI

Italian

ID: ELRA-S0039

ISLRN: 501-292-014-931-9

APASCI is an Italian speech database recorded in an insulated room with a Sennheiser MKH 416 T microphone. It includes 5,290 phonetically rich sentences and 10,800 isolated digits, for a total of 58,924 word occurrences (2,191 different words) and 641 minutes of speech. The speech material was re...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	800.00 €	20000.00 €
Licence: Commercial Use - ELRA VAR	20000.00 €	20000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1600.00 €	25000.00 €
Licence: Commercial Use - ELRA VAR	25000.00 €	25000.00 €

Arabic dictionary of inflected words text

Arabic

ID: ELRA-L0098

ISLRN: 049-623-948-389-2

The Arabic dictionary of inflected words consists of a list of 6 million inflected forms, fully vowelized, generated in compliance with the grammatical rules of Arabic and tagged with grammatical information which includes POS and grammatical features, including number, gender, case, definiteness...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	3000.00 €	10000.00 €
Licence: Commercial Use - ELRA VAR	10000.00 €	10000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	4500.00 €	15000.00 €
Licence: Commercial Use - ELRA VAR	15000.00 €	15000.00 €


707 Language Resources (Page 1 of 36)