4 Language Resources

 CLE Pakistan Urdu Speech Corpus    
  • Urdu

ID: ELRA-S0403

ISLRN: 572-070-066-634-8

The CLE Pakistan Urdu Speech Corpus consists of phonetically rich Urdu sentences extracted from the CLE Urdu Digest Corpus, plus additional sentences covering telephone numbers, addresses and personal names. This speech corpus is recorded with a variety of microphone types (built-in laptop, hands-free,...

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 12000.00 € / 18000.00 €
Licence: Commercial Use - ELRA VAR: 18000.00 € / 18000.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 15600.00 € / 23400.00 €
Licence: Commercial Use - ELRA VAR: 23400.00 € / 23400.00 €

 Parallel Corpora for 6 Indian Languages    
  • Bengali
  • English
  • Hindi
  • Malayalam
  • Tamil
  • Telugu
  • Urdu

ID: ELRA-W0320

ISLRN: 657-350-757-058-6

The Parallel Corpora for 6 Indian Languages contains data sets for Bengali (540,000 words – 20,000 parallel sentences), Hindi (1,200,000 words – 37,000 parallel sentences), Malayalam (660,000 words – 29,000 parallel sentences), Tamil (747,000 words – 35,000 parallel sentences), Telugu (951,000 wo...

MEMBER (academic / commercial)
Licence: Attribution, Share Alike - CC-BY-SA-3.0: 0.00 € / 0.00 €
NON MEMBER (academic / commercial)
Licence: Attribution, Share Alike - CC-BY-SA-3.0: 0.00 € / 0.00 €

 The EMILLE/CIIL Corpus    
  • Assamese
  • Bengali
  • English
  • Gujarati
  • Hindi
  • Kannada
  • Kashmiri
  • Malayalam
  • Marathi
  • Oriya (macrolanguage)
  • Panjabi; Punjabi
  • Sinhala; Sinhalese
  • Tamil
  • Telugu
  • Urdu

ID: ELRA-W0037

ISLRN: 039-846-040-604-0

The EMILLE/CIIL Corpus consists of three components: monolingual, parallel and annotated corpora. There are fourteen monolingual corpora, including both written and (for some languages) spoken data for fourteen South Asian languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Malayala...

MEMBER
Licence: Non Commercial Use - ELRA END USER: 0.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: 0.00 €

 The EMILLE Lancaster Corpus    
  • Bengali
  • English
  • Gujarati
  • Hindi
  • Panjabi; Punjabi
  • Sinhala; Sinhalese
  • Tamil
  • Urdu

ID: ELRA-W0038

ISLRN: 438-045-014-925-0

The EMILLE Lancaster Corpus consists of three components: monolingual, parallel and annotated corpora. There are monolingual corpora for seven South Asian languages: Bengali, Gujarati, Hindi, Punjabi, Sinhala, Tamil, Urdu. The EMILLE monolingual corpora contain approximately 58,880,000 words (i...

MEMBER
Licence: Commercial Use - ELRA VAR: 7500.00 €
NON MEMBER
Licence: Commercial Use - ELRA VAR: 12000.00 €