Text (559)
Audio (438)
Video (19)
Available (952)
True (34)
TEI (4)
Tourism (3)
Science (1)

Resource Type:

Corpus:
Lexical/Conceptual:
Tool/Service:
Language Description:

Media Type:

Text:
Audio:
Image:
Video:
Text Numerical:
Text N-Gram:

952 Language Resources (Page 1 of 48)

« Previous | Next »Order by:

 Accented English GlobalPhone    
  • English

ID: ELRA-S0389

ISLRN: 574-579-221-841-3

The Accented English part of the GlobalPhone resources contains 63 recording sessions of Bulgarian, Chinese, German, and Indian native speakers reading 37 English sentences each, produced in GlobalPhone-style, i.e. 16kHz PCM encoded audio recordings of utterance-segmented read speech from the new...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
600.00 € submit
3000.00 € submit
Licence: Commercial Use - ELRA VAR
3000.00 € submit
3000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
700.00 € submit
3600.00 € submit
Licence: Commercial Use - ELRA VAR
3600.00 € submit
3600.00 € submit
 ACCOR - English    
  • English

ID: ELRA-S0001

ISLRN: 936-783-643-804-4

ACCOR is a unique acoustic and articulatory database recorded as part of the ESPRIT- ACCOR project investigating cross-language acoustic-articulatory correlations in coarticulatory processes. The European Languages covered are: Catalan, English, French, German, Irish Gaelic, Italian and Swedish. ...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
25.00 € submit
Licence: Commercial Use - ELRA VAR
25.00 € submit
25.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
75.00 € submit
Licence: Commercial Use - ELRA VAR
75.00 € submit
75.00 € submit
 ACL RD-TEC: A Reference Dataset for Terminology Extraction and Classification Research in Computational Linguistics    
  • English

ID: ELRA-T0375

ISLRN: 699-305-362-089-6

Automatic Term Recognition (ATR) is a research task that deals with the identification of domain-specific terms. Terms, in simple words, are textual realization of significant concepts in an expertise domain. Additionally, domain-specific terms may be classified into a number of categories, in wh...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
1000.00 € submit
Licence: Commercial Use - ELRA VAR
1000.00 € submit
1000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
3000.00 € submit
Licence: Commercial Use - ELRA VAR
3000.00 € submit
3000.00 € submit
 Acoustic database for Polish concatenative speech synthesis    
  • Polish

ID: ELRA-S0342

ISLRN: 305-222-372-690-4

This database consists of 1443 nonsense words including all the diphones for the Polish language. The diphone is always placed at an unstressed syllable. The neighbourhood doesn’t influence the co-articulation of the diphone. The database includes information such as: the name of the diphone, co...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
35.00 € submit
35.00 € submit
Licence: Commercial Use - ELRA VAR
35.00 € submit
35.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
100.00 € submit
100.00 € submit
Licence: Commercial Use - ELRA VAR
100.00 € submit
100.00 € submit
 Acoustic database for Polish unit selection speech synthesis    
  • Polish

ID: ELRA-S0339

ISLRN: 981-910-282-065-4

This database contains parliamentary statements and newspaper reviews read by a semi-professional male speaker. It consists of a selection of 2150 sentences annotated and manually verified, including 100 rare phonemes in words. Prompts vary in length from 2.3 to 13.4 seconds, with an average leng...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
250.00 € submit
1000.00 € submit
Licence: Commercial Use - ELRA VAR
1000.00 € submit
1000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
300.00 € submit
2000.00 € submit
Licence: Commercial Use - ELRA VAR
2000.00 € submit
2000.00 € submit
 aGender    
  • German

ID: ELRA-S0365

ISLRN: 038-476-412-610-4

aGender contains speech sample recordings over public telephone lines with read and (semi-)spontaneous speech. Native German speakers called a voice portal from their private phone, and read text + answered some open questions. The purpose of the corpus is the automatic detection of gender and/or...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
327.00 € submit
8127.00 € submit
Licence: Commercial Use - ELRA VAR
8127.00 € submit
8127.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
455.00 € submit
8255.00 € submit
Licence: Commercial Use - ELRA VAR
8255.00 € submit
8255.00 € submit
 Albayzin corpus    
  • Spanish; Castilian

ID: ELRA-S0089

ISLRN: 443-392-902-600-9

This corpus consists of 3 sub-corpora of 16 kHz 16 bits signals, recorded by 304 Castillian speakers. The 3 sub-corpora are: - Phonetic corpus: 6,800 utterances of phonetically balanced sentences, including 1000 with phonetic segmentation. - Geographic corpus: 6,800 utterances of sentences ext...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
1000.00 € submit
10000.00 € submit
Licence: Commercial Use - ELRA VAR
10000.00 € submit
10000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
2000.00 € submit
12000.00 € submit
Licence: Commercial Use - ELRA VAR
12000.00 € submit
12000.00 € submit

Special offers are also available. Check here for details.

 Alcohol Language Corpus (BAS ALC)    
  • German

ID: ELRA-S0299

ISLRN: 780-368-852-139-3

ALC contains recordings of German speakers that are either intoxicated or sober. The type of speech ranges from read single digits to full conversation style. Recordings were done during drinking test where speakers drank beer or wine to reach a self-chosen level of alcoholic intoxication. The ac...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
510.00 € submit
510.00 € submit
Licence: Commercial Use - ELRA VAR
510.00 € submit
510.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
1020.00 € submit
1020.00 € submit
Licence: Commercial Use - ELRA VAR
1020.00 € submit
1020.00 € submit
 Al-Hayat Arabic Corpus    
  • Arabic

ID: ELRA-W0030

ISLRN: 365-777-769-398-7

The corpus was developed in the course of a research project at the University of Essex, in collaboration with the Open University. The corpus contains Al-Hayat newspaper articles with value added for Language Engineering and Information Retrieval applications development purposes. The data have ...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
480.00 € submit
960.00 € submit
Licence: Commercial Use - ELRA VAR
960.00 € submit
960.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
720.00 € submit
1440.00 € submit
Licence: Commercial Use - ELRA VAR
1440.00 € submit
1440.00 € submit
 Amharic-English bilingual corpus    
  • Amharic
  • English

ID: ELRA-W0074

ISLRN: 590-255-335-719-0

The Amharic-English bilingual corpus contains parallel text from legal and news domains in Amharic script, in transliterated form and in English. The size of the corpus is of 232,653 words in Amharic and 291,701 in English. This parallel corpus contains documents from two domains, namely legal a...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
2000.00 € submit
Licence: Commercial Use - ELRA VAR
2000.00 € submit
2000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
4000.00 € submit
Licence: Commercial Use - ELRA VAR
4000.00 € submit
4000.00 € submit
 ANITA (Audio eNhancement In Telecom Applications)    
  • English
  • French
  • German
  • Spanish; Castilian

ID: ELRA-S0156

ISLRN: 537-894-870-719-4

ANITA (Audio eNhancement In secured Telecommunication Applications) is a European project launched on the initiative of EADS TELECOM with the objective of reducing audio acoustics noise in secured communications in adverse environments (sirens, alarms, engines, water pumps, stress situations, etc...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
1000.00 € submit
2000.00 € submit
Licence: Commercial Use - ELRA VAR
2000.00 € submit
2000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
1500.00 € submit
2500.00 € submit
Licence: Commercial Use - ELRA VAR
2500.00 € submit
2500.00 € submit
 APASCI    
  • Italian

ID: ELRA-S0039

ISLRN: 501-292-014-931-9

APASCI is an Italian speech database recorded in an insulated room with a Sennheiser MKH 416 T microphone. It includes 5,290 phonetically rich sentences and 10,800 isolated digits, for a total of 58,924 word occurrences (2,191 different words) and 641 minutes of speech. The speech material was re...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
800.00 € submit
20000.00 € submit
Licence: Commercial Use - ELRA VAR
20000.00 € submit
20000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
1600.00 € submit
25000.00 € submit
Licence: Commercial Use - ELRA VAR
25000.00 € submit
25000.00 € submit
 Arabic dictionary of inflected words    
  • Arabic

ID: ELRA-L0098

ISLRN: 049-623-948-389-2

The Arabic dictionary of inflected words consists of a list of 6 million inflected forms, fully vowelized, generated in compliance with the grammatical rules of Arabic and tagged with grammatical information which includes POS and grammatical features, including number, gender, case, definiteness...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
3000.00 € submit
10000.00 € submit
Licence: Commercial Use - ELRA VAR
10000.00 € submit
10000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
4500.00 € submit
15000.00 € submit
Licence: Commercial Use - ELRA VAR
15000.00 € submit
15000.00 € submit
 Arabic dictionary of inflected words with recognition of agglutinated clitics and inflection system    
  • Arabic

ID: ELRA-L0099

ISLRN: 963-860-792-289-9

This dictionary consists of 6 million inflected forms, fully vowelized, generated in compliance with the grammatical rules of Arabic and tagged with grammatical information which includes POS and grammatical features, including number, gender, case, definiteness, tense, mood and compatibility wit...

MEMBERacademiccommercial
Licence: Commercial Use - ELRA VAR
25000.00 € submit
25000.00 € submit
NON MEMBERacademiccommercial
Licence: Commercial Use - ELRA VAR
37000.00 € submit
37000.00 € submit
 Arabic Morphological Dictionary    
  • Arabic

ID: ELRA-L0088

ISLRN: 472-591-121-577-5

The Arabic Morphological Dictionary contains 4,912,749 entries, including: - 3,374,852 nouns, - 1,537,699 verbs, - 198 grammatical words. The dictionary is stored on 1 CD. All files are provided as plain text in UTF8 character encoding, which represents about 154 Mb of data. The dictionary form...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
250.00 € submit
6000.00 € submit
Licence: Commercial Use - ELRA VAR
6000.00 € submit
6000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
450.00 € submit
12000.00 € submit
Licence: Commercial Use - ELRA VAR
12000.00 € submit
12000.00 € submit
 Arabic Speech Corpus    
  • Arabic

ID: ELRA-S0384

ISLRN: 866-568-447-697-8

This speech corpus has been developed as part of a PhD work carried out by Nawar Halabi at the University of Southampton. The corpus was recorded through a Neumann TLM 103 Studio Microphone by one male speaker in South Levantine Arabic (Damascian accent) in a professional studio. The transcript w...

MEMBERacademiccommercial
Licence: Attribution, Non Commercial Use, Share Alike - CC-BY-NC-SA
0.00 € submit
0.00 € submit
Licence: Commercial Use - ELRA VAR
9000.00 € submit
9000.00 € submit
NON MEMBERacademiccommercial
Licence: Attribution, Non Commercial Use, Share Alike - CC-BY-NC-SA
0.00 € submit
0.00 € submit
Licence: Commercial Use - ELRA VAR
11200.00 € submit
11200.00 € submit
 Arboretum treebank    
  • Danish

ID: ELRA-W0084

ISLRN: 025-729-182-451-2

The Arboretum treebank is a morphologically and syntactically annotated repository of Danish sentences, taken from Korpus 90 and Korpus 2000, both compiled by the Society for Danish Language and Literature (http://ordnet.dk/korpusdk/fakta), and containing samples of written Danish from the 90'ies...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
1500.00 € submit
7000.00 € submit
Licence: Commercial Use - ELRA VAR
7000.00 € submit
7000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
2200.00 € submit
10000.00 € submit
Licence: Commercial Use - ELRA VAR
10000.00 € submit
10000.00 € submit
 ARCADE/ROMANSEVAL corpus    
  • English
  • French
  • Italian

ID: ELRA-W0018

ISLRN: 681-769-134-114-2

The ARCADE/ROMANSEVAL corpus was used as a reference corpus in two international competitions: · ARCADE, an exercise on multilingual text alignment financed by AUPELF-UREF · ROMANSEVAL, part of the SENSEVAL exercise sponsored by ACL-SIGLEX and EURALEX, on word sense disambiguation. The corpus con...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
2000.00 € submit
Licence: Commercial Use - ELRA VAR
2000.00 € submit
2000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
5000.00 € submit
Licence: Commercial Use - ELRA VAR
5000.00 € submit
5000.00 € submit
 A "scientific" corpus of modern French ("La Recherche" magazine) - Complete version    
  • French

ID: ELRA-W0025-02

ISLRN: 798-363-116-656-4

This "scientific" corpus of modern French was produced by the University of Nantes (France) within the European Commission funded project LRsP&P (Language Resources Production & Packaging - LE4-8335). The corpus contains all articles published in La Recherche magazine in 1998, including issues 30...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
400.00 € submit
3000.00 € submit
Licence: Commercial Use - ELRA VAR
3000.00 € submit
3000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
500.00 € submit
5000.00 € submit
Licence: Commercial Use - ELRA VAR
5000.00 € submit
5000.00 € submit
 A "scientific" corpus of modern French ("La Recherche" magazine) - Raw data    
  • French

ID: ELRA-W0025-01

ISLRN: 508-941-013-339-7

This "scientific" corpus of modern French was produced by the University of Nantes (France) through a funding from ELRA in the framework of the European Commission project LRsP&P (Language Resources Production & Packaging - LE4-8335). The corpus contains all articles published in La Recherche mag...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
240.00 € submit
1200.00 € submit
Licence: Commercial Use - ELRA VAR
1200.00 € submit
1200.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
310.00 € submit
1500.00 € submit
Licence: Commercial Use - ELRA VAR
1500.00 € submit
1500.00 € submit

« Previous | Next »