15 Language Resources

 Collins Multilingual database (MLD) - PhraseBank    
  • Arabic
  • Chinese
  • Croatian
  • Czech
  • Danish
  • Dutch; Flemish
  • English
  • Finnish
  • French
  • German
  • Hindi
  • Italian
  • Japanese
  • Korean
  • Modern Greek (1453-)
  • Norwegian
  • Persian
  • Polish
  • Portuguese
  • Russian
  • Spanish; Castilian
  • Swedish
  • Thai
  • Turkish
  • Vietnamese

ID: ELRA-T0377

ISLRN: 452-383-219-228-0

The Collins Multilingual database covers Real Life Daily vocabulary. It is composed of a multilingual lexicon in 32 languages (the WordBank, distributed separately under reference ELRA-T0376) and a multilingual set of sentences in 28 languages (the PhraseBank). The PhraseBank consists of 2,000 p...

MEMBER academic commercial
Licence: Non Commercial Use - ELRA END USER
1680.00 €
NON MEMBER academic commercial
Licence: Non Commercial Use - ELRA END USER
2240.00 €
 Collins Multilingual database (MLD) – PhraseBank with audio files    
  • Arabic
  • Chinese
  • Croatian
  • Czech
  • Danish
  • Dutch; Flemish
  • English
  • Finnish
  • French
  • German
  • Hindi
  • Italian
  • Japanese
  • Korean
  • Modern Greek (1453-)
  • Norwegian
  • Persian
  • Polish
  • Portuguese
  • Russian
  • Spanish; Castilian
  • Swedish
  • Thai
  • Turkish
  • Vietnamese

ID: ELRA-S0383

ISLRN: 398-655-047-044-5

The Collins Multilingual database covers Real Life Daily vocabulary. It is composed of a multilingual lexicon in 32 languages (the WordBank, see ELRA-T0376) and a multilingual set of sentences in 28 languages (the PhraseBank, see ELRA-T0377). This version includes the audio files corresponding t...

MEMBER academic commercial
Licence: Non Commercial Use - ELRA END USER
3360.00 €
NON MEMBER academic commercial
Licence: Non Commercial Use - ELRA END USER
4480.00 €
 Collins Multilingual database (MLD) - WordBank    
  • Arabic
  • Bengali
  • Chinese
  • Croatian
  • Czech
  • Danish
  • Dutch; Flemish
  • English
  • Finnish
  • French
  • German
  • Hindi
  • Italian
  • Japanese
  • Korean
  • Malayalam
  • Modern Greek (1453-)
  • Norwegian
  • Polish
  • Portuguese
  • Romanian; Moldavian; Moldovan
  • Russian
  • Spanish; Castilian
  • Swedish
  • Tamil
  • Thai
  • Turkish
  • Ukrainian
  • Vietnamese

ID: ELRA-T0376

ISLRN: 990-814-402-335-7

The Collins Multilingual database covers Real Life Daily vocabulary. It is composed of a multilingual lexicon in 32 languages (the WordBank) and a multilingual set of sentences in 28 languages (the PhraseBank, distributed separately under reference ELRA-T0377). The WordBank contains 10,000 words...

MEMBER academic commercial
Licence: Non Commercial Use - ELRA END USER
2400.00 €
NON MEMBER academic commercial
Licence: Non Commercial Use - ELRA END USER
3600.00 €
 GLOBAL Multilingual Lexical Data - Bilingual - Level 1    
  • Arabic
  • Chinese
  • Czech
  • Danish
  • Dutch; Flemish
  • English
  • French
  • German
  • Hebrew
  • Hindi
  • Italian
  • Japanese
  • Korean
  • Latin
  • Modern Greek (1453-)
  • Norwegian
  • Polish
  • Portuguese
  • Russian
  • Spanish; Castilian
  • Swedish
  • Thai
  • Turkish

ID: ELRA-M0111-04

ISLRN: 255-971-767-096-3

The GLOBAL Multilingual Lexical Data (references ELRA-M0111-01 to ELRA-M0111-06 in the ELRA Catalogue) consists of a network of lexicographic cores for major world languages, comprising diverse monolingual, bilingual and multilingual combinations, in different sizes, originally built for language...

MEMBER academic commercial
Licence: Commercial Use - ELRA VAR
6800.00 €
6800.00 €
NON MEMBER academic commercial
Licence: Commercial Use - ELRA VAR
7140.00 €
7140.00 €

Special offers are also available. Check here for details.

 GLOBAL Multilingual Lexical Data - Monolingual - Level 1    
  • Arabic
  • Chinese
  • Czech
  • Danish
  • Dutch; Flemish
  • English
  • French
  • German
  • Hebrew
  • Hindi
  • Italian
  • Japanese
  • Korean
  • Latin
  • Modern Greek (1453-)
  • Norwegian
  • Polish
  • Portuguese
  • Russian
  • Spanish; Castilian
  • Swedish
  • Thai
  • Turkish

ID: ELRA-M0111-01

ISLRN: 604-974-454-390-3

The GLOBAL Multilingual Lexical Data (references ELRA-M0111-01 to ELRA-M0111-06 in the ELRA Catalogue) consists of a network of lexicographic cores for major world languages, comprising diverse monolingual, bilingual and multilingual combinations, in different sizes, originally built for language...

MEMBER academic commercial
Licence: Commercial Use - ELRA VAR
4250.00 €
4250.00 €
NON MEMBER academic commercial
Licence: Commercial Use - ELRA VAR
4462.50 €
4462.50 €

Special offers are also available. Check here for details.

 Gram Vaani data set    
  • Hindi

ID: ELRA-S0405

ISLRN: 045-205-425-611-4

The Gram Vaani data set consists of 130 hours (21,000 different audio recordings) recorded by 4,000 unique Hindi speakers from the states of Bihar, Jharkhand, and Madhya Pradesh in India (20-25% female, 60% people under 30 years of age, mostly rural). The data set was collected via a voice-bas...

MEMBER academic commercial
Licence: Non Commercial Use - ELRA END USER
0.00 €
50000.00 €
Licence: Commercial Use - ELRA VAR
50000.00 €
50000.00 €
NON MEMBER academic commercial
Licence: Non Commercial Use - ELRA END USER
0.00 €
50000.00 €
Licence: Commercial Use - ELRA VAR
50000.00 €
50000.00 €
 Hindi Speech Data by Mobile Phone - 759 Hours    
  • Hindi

ID: ELRA-S0452

ISLRN: 942-490-066-841-8

The data is 759 hours long and was recorded by 1,425 Indian native speakers. The accent is authentic. The recording text is designed by language experts and covers general, interactive, car, home and other categories. The text is manually proofread, and the accuracy is high. Recording devices are...

MEMBER academic commercial
Licence: Commercial Use - ELRA VAR
115368.00 €
115368.00 €
NON MEMBER academic commercial
Licence: Commercial Use - ELRA VAR
115368.00 €
115368.00 €

Special offers are also available. Check here for details.

 Hindi Speech Data by Mobile Phone_R - 240 Hours    
  • Hindi

ID: ELRA-S0463

ISLRN: 037-729-898-638-1

The data comprises 240 hours of speech recorded by 401 Indian speakers. It was recorded in both quiet and noisy environments, which makes it better suited to real application scenarios. The recording content is rich, covering economics, entertainment, news, spoken language, etc. All texts are manually transcribed, with...

MEMBER academic commercial
Licence: Commercial Use - ELRA VAR
34200.00 €
34200.00 €
NON MEMBER academic commercial
Licence: Commercial Use - ELRA VAR
34200.00 €
34200.00 €

Special offers are also available. Check here for details.

 Hindi Speech Recognition Corpus (Desktop)    
  • Hindi

ID: ELRA-S0228-114

ISLRN: 198-341-627-529-5

This corpus was recorded in a quiet office environment over 4 channels and collected from a total of 196 speakers, including 95 males and 101 females, all of whom have been carefully screened to ensure their standard and clear pronunciation. The audio scripts cover information such as news and da...

MEMBER academic commercial
Licence: Non Commercial Use - ELRA END USER
18000.00 €
18000.00 €
Licence: Commercial Use - ELRA VAR
18000.00 €
18000.00 €
NON MEMBER academic commercial
Licence: Non Commercial Use - ELRA END USER
18000.00 €
18000.00 €
Licence: Commercial Use - ELRA VAR
18000.00 €
18000.00 €
 Hindi Speech Recognition Corpus (Mobile)    
  • Hindi

ID: ELRA-S0228-125

ISLRN: 078-014-181-343-9

This corpus was recorded in both quiet and noisy environments over 3 channels and collected from a total of 180 speakers, including 99 males and 81 females, all of whom have been carefully screened to ensure their standard and clear pronunciation. The audio scripts cover information such as news....

MEMBER academic commercial
Licence: Non Commercial Use - ELRA END USER
16200.00 €
16200.00 €
Licence: Commercial Use - ELRA VAR
16200.00 €
16200.00 €
NON MEMBER academic commercial
Licence: Non Commercial Use - ELRA END USER
16200.00 €
16200.00 €
Licence: Commercial Use - ELRA VAR
16200.00 €
16200.00 €
 MULTIGLOSS Multilingual Glossaries - L1-English pair    
  • Afrikaans
  • Arabic
  • Azerbaijani
  • Bulgarian
  • Catalan; Valencian
  • Chinese
  • Croatian
  • Czech
  • Danish
  • Dutch; Flemish
  • English
  • Estonian
  • Finnish
  • French
  • German
  • Hebrew
  • Hindi
  • Hungarian
  • Icelandic
  • Indonesian
  • Italian
  • Japanese
  • Korean
  • Latin
  • Latvian
  • Lithuanian
  • Malay (macrolanguage)
  • Modern Greek (1453-)
  • Norwegian
  • Persian
  • Polish
  • Portuguese
  • Romanian; Moldavian; Moldovan
  • Russian
  • Serbian
  • Slovak
  • Slovenian
  • Spanish; Castilian
  • Swedish
  • Thai
  • Turkish
  • Ukrainian
  • Urdu
  • Vietnamese
  • Western Frisian

ID: ELRA-M0112-01

ISLRN: 098-079-939-987-5

A series of innovative multilingual word-to-sense glossaries, based on a human-edited word-to-sense bilingual index of each language to English, which is linked automatically to the translation equivalents in 45 target languages. Each word and expression in every language is translated via its...

MEMBER academic commercial
Licence: Commercial Use - ELRA VAR
2500.00 €
2500.00 €
NON MEMBER academic commercial
Licence: Commercial Use - ELRA VAR
2625.00 €
2625.00 €

Special offers are also available. Check here for details.

 MULTIGLOSS Multilingual Glossaries - L1-English pair + 1 language    
  • Afrikaans
  • Arabic
  • Azerbaijani
  • Bulgarian
  • Catalan; Valencian
  • Chinese
  • Croatian
  • Czech
  • Danish
  • Dutch; Flemish
  • English
  • Estonian
  • Finnish
  • French
  • German
  • Hebrew
  • Hindi
  • Hungarian
  • Icelandic
  • Indonesian
  • Italian
  • Japanese
  • Korean
  • Latin
  • Latvian
  • Lithuanian
  • Malay (macrolanguage)
  • Modern Greek (1453-)
  • Norwegian
  • Persian
  • Polish
  • Portuguese
  • Romanian; Moldavian; Moldovan
  • Russian
  • Serbian
  • Slovak
  • Slovenian
  • Spanish; Castilian
  • Swedish
  • Thai
  • Turkish
  • Ukrainian
  • Urdu
  • Vietnamese
  • Western Frisian

ID: ELRA-M0112-02

ISLRN: 610-290-284-705-6

A series of innovative multilingual word-to-sense glossaries, based on a human-edited word-to-sense bilingual index of each language to English, which is linked automatically to the translation equivalents in 45 target languages. Each word and expression in every language is translated via its...

MEMBER academic commercial
Licence: Commercial Use - ELRA VAR
3750.00 €
3750.00 €
NON MEMBER academic commercial
Licence: Commercial Use - ELRA VAR
3937.50 €
3937.50 €

Special offers are also available. Check here for details.

 Parallel Corpora for 6 Indian Languages    
  • Bengali
  • English
  • Hindi
  • Malayalam
  • Tamil
  • Telugu
  • Urdu

ID: ELRA-W0320

ISLRN: 657-350-757-058-6

The Parallel Corpora for 6 Indian Languages contains data sets for Bengali (540,000 words – 20,000 parallel sentences), Hindi (1,200,000 words – 37,000 parallel sentences), Malayalam (660,000 words – 29,000 parallel sentences), Tamil (747,000 words – 35,000 parallel sentences), Telugu (951,000 wo...

MEMBER academic commercial
Licence: Attribution, Share Alike - CC-BY-SA-3.0
0.00 €
0.00 €
NON MEMBER academic commercial
Licence: Attribution, Share Alike - CC-BY-SA-3.0
0.00 €
0.00 €
 The EMILLE/CIIL Corpus    
  • Assamese
  • Bengali
  • English
  • Gujarati
  • Hindi
  • Kannada
  • Kashmiri
  • Malayalam
  • Marathi
  • Oriya (macrolanguage)
  • Panjabi; Punjabi
  • Sinhala; Sinhalese
  • Tamil
  • Telugu
  • Urdu

ID: ELRA-W0037

ISLRN: 039-846-040-604-0

The EMILLE/CIIL Corpus consists of three components: monolingual, parallel and annotated corpora. There are fourteen monolingual corpora, including both written and (for some languages) spoken data for fourteen South Asian languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Malayala...

MEMBER academic commercial
Licence: Non Commercial Use - ELRA END USER
0.00 €
NON MEMBER academic commercial
Licence: Non Commercial Use - ELRA END USER
0.00 €
 The EMILLE Lancaster Corpus    
  • Bengali
  • English
  • Gujarati
  • Hindi
  • Panjabi; Punjabi
  • Sinhala; Sinhalese
  • Tamil
  • Urdu

ID: ELRA-W0038

ISLRN: 438-045-014-925-0

The EMILLE Lancaster Corpus consists of three components: monolingual, parallel and annotated corpora. There are monolingual corpora for seven South Asian languages: Bengali, Gujarati, Hindi, Punjabi, Sinhala, Tamil, Urdu. The EMILLE monolingual corpora contain approximately 58,880,000 words (i...

MEMBER academic commercial
Licence: Commercial Use - ELRA VAR
7500.00 €
NON MEMBER academic commercial
Licence: Commercial Use - ELRA VAR
12000.00 €