Search and Browse – ELRA Catalogue

3 Language Resources

Collins Multilingual database (MLD) – PhraseBank with audio files (audio)

Arabic
Chinese
Croatian
Czech
Danish
Dutch; Flemish
English
Finnish
French
German
Hindi
Italian
Japanese
Korean
Modern Greek (1453-)
Norwegian
Persian
Polish
Portuguese
Russian
Spanish; Castilian
Swedish
Thai
Turkish
Vietnamese

ID: ELRA-S0383

ISLRN: 398-655-047-044-5

The Collins Multilingual database covers real-life daily vocabulary. It is composed of a multilingual lexicon in 32 languages (the WordBank, see ELRA-T0376) and a multilingual set of sentences in 28 languages (the PhraseBank, see ELRA-T0377). This version includes the audio files corresponding t...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	3360.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	4480.00 €

Gram Vaani data set (audio)

Hindi

ID: ELRA-S0405

ISLRN: 045-205-425-611-4

The Gram Vaani data set consists of 130 hours (21,000 different audio recordings) recorded by 4,000 unique Hindi speakers from the states of Bihar, Jharkhand, and Madhya Pradesh in India (20-25% female, 60% people under 30 years of age, mostly rural). The data set was collected via a voice-bas...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	50000.00 €
Licence: Commercial Use - ELRA VAR	50000.00 €	50000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	50000.00 €
Licence: Commercial Use - ELRA VAR	50000.00 €	50000.00 €

The EMILLE/CIIL Corpus (text)

Assamese
Bengali
English
Gujarati
Hindi
Kannada
Kashmiri
Malayalam
Marathi
Oriya (macrolanguage)
Panjabi; Punjabi
Sinhala; Sinhalese
Tamil
Telugu
Urdu

ID: ELRA-W0037

ISLRN: 039-846-040-604-0

The EMILLE/CIIL Corpus consists of three components: monolingual, parallel and annotated corpora. There are fourteen monolingual corpora, including both written and (for some languages) spoken data for fourteen South Asian languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Malayala...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €