1681 Language Resources (Page 67 of 85)


 TRAD Pashto Broadcast News Speech Corpus    
  • Pushto; Pashto

ID: ELRA-S0381

ISLRN: 918-508-885-913-7

This corpus contains transcribed broadcast news recordings in Pashto. Recordings are collected from 5 sources: Ashna TV, Azadi Radio, Deewa Radio, Mashaal Radio and Shamshad TV. The corpus contains 108 hours of recordings covering more than 1,000 speakers. Transcriptions are provided together ...

MEMBER
Licence: Non Commercial Use - ELRA END USER: 2000.00 € (academic), 20000.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 20000.00 € (academic), 20000.00 € (commercial)
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: 3500.00 € (academic), 28000.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 28000.00 € (academic), 28000.00 € (commercial)
 TRAD Pashto-English News Articles Parallel corpus    
  • English
  • Pushto; Pashto

ID: ELRA-W0097

ISLRN: 612-936-517-010-2

This is a parallel corpus, which contains 10,000 Pashto words translated into English by two different translators. The source texts have been collected from the following news websites: Azadiradio, Mashaal and Voice of America Pashto. The content has also been translated into French (see ELRA-W...

MEMBER
Licence: Non Commercial Use - ELRA END USER: 350.00 € (academic), 1000.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 1000.00 € (academic), 1000.00 € (commercial)
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: 500.00 € (academic), 2000.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 2000.00 € (academic), 2000.00 € (commercial)
 TRAD Pashto-English Parallel corpus of transcribed Broadcast News Speech - Test data    
  • English
  • Pushto; Pashto

ID: ELRA-W0095

ISLRN: 006-102-605-738-4

This is a parallel corpus, which contains 10,000 Pashto words translated into English. The source texts come from 3 broadcast news transcriptions of the TRAD Pashto Broadcast News Speech Corpus (ELRA-S0381). These texts are VOA Ashna TV programs recorded on 15/01/2011, 18/01/2011 and 19/01/2011. ...

MEMBER
Licence: Non Commercial Use - ELRA END USER: 350.00 € (academic), 1000.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 1000.00 € (academic), 1000.00 € (commercial)
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: 500.00 € (academic), 2000.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 2000.00 € (academic), 2000.00 € (commercial)
 TRAD Pashto-French News Articles Parallel corpus    
  • French
  • Pushto; Pashto

ID: ELRA-W0096

ISLRN: 649-628-149-051-7

This is a parallel corpus, which contains 10,000 Pashto words translated into French by two different translators. The source texts have been collected from the following news websites: Azadiradio, Mashaal and Voice of America Pashto. The content has also been translated into English (see ELRA-W...

MEMBER
Licence: Non Commercial Use - ELRA END USER: 350.00 € (academic), 1000.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 1000.00 € (academic), 1000.00 € (commercial)
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: 500.00 € (academic), 2000.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 2000.00 € (academic), 2000.00 € (commercial)
 TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Test data    
  • French
  • Pushto; Pashto

ID: ELRA-W0094

ISLRN: 547-897-479-723-3

This is a parallel corpus, which contains 10,000 Pashto words translated into French by two different translators. The source texts come from 3 broadcast news transcriptions of the TRAD Pashto Broadcast News Speech Corpus (ELRA-S0381). These texts are VOA Ashna TV programs recorded on 15/01/2011,...

MEMBER
Licence: Non Commercial Use - ELRA END USER: 350.00 € (academic), 1000.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 1000.00 € (academic), 1000.00 € (commercial)
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: 500.00 € (academic), 2000.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 2000.00 € (academic), 2000.00 € (commercial)
 TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Training data    
  • French
  • Pushto; Pashto

ID: ELRA-W0093

ISLRN: 802-643-297-429-4

The corpus consists of the transcription of 106 hours of recordings in Pashto translated into French. The transcriptions are extracted from the TRAD Pashto Broadcast News Speech Corpus (ELRA-S0381). It contains about 832,000 source words and 747,000 target words. No audio file is provided. Pasht...

MEMBER
Licence: Non Commercial Use - ELRA END USER: 3000.00 € (academic), 10000.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 10000.00 € (academic), 10000.00 € (commercial)
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: 4000.00 € (academic), 18000.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 18000.00 € (academic), 18000.00 € (commercial)
 TRAD Pashto Monolingual text Corpus    
  • Pushto; Pashto

ID: ELRA-W0092

ISLRN: 394-903-293-388-0

This is a monolingual text corpus in Pashto. The corpus contains about 112,000,000 tokens collected from 46 different blogs and websites. Sources that were either identified and negotiated or freely available were crawled in 2012, cleaned and XML-formatted. Pashto is an Indo-Iranian language spoken by th...

MEMBER
Licence: Non Commercial Use - ELRA END USER: 1200.00 € (academic), 3500.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 3500.00 € (academic), 3500.00 € (commercial)
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: 2000.00 € (academic), 5000.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 5000.00 € (academic), 5000.00 € (commercial)
 Training and test data for Arabizi detection and transliteration    
  • Arabic
  • English

ID: ELRA-W0126

ISLRN: 986-364-744-303-9

The dataset is composed of two distinct resources: 1) A collection of mixed English and Arabizi text intended to train and test a system for the automatic detection of code-switching in mixed English and Arabizi texts. The training part of the corpus contains: 522 tweets composed of 5,207 token...

MEMBER
Licence: Non Commercial Use - ELRA END USER: 0.00 € (academic), 500.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 500.00 € (academic), 500.00 € (commercial)
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: 0.00 € (academic), 650.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 650.00 € (academic), 650.00 € (commercial)
 Translanguage English Database (TED) Transcripts database    
  • English

ID: ELRA-S0120

ISLRN: 502-719-830-448-5

LDC reference: https://catalog.ldc.upenn.edu/LDC2002T03
The Translanguage English Database (TED) Transcripts corpus contains transcriptions of thirty-nine of the 188 speeches of the TED Corpus made at Eurospeech'93 in Berlin. The thirty-nine transcripts in this publication are in Universal Tra...

MEMBER
Licence: Non Commercial Use - ELRA END USER: 0.00 € (academic), 0.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 0.00 € (academic), 0.00 € (commercial)
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: 0.00 € (academic), 0.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 0.00 € (academic), 0.00 € (commercial)
 Translation memories from The Ministry of Foreign Affairs of Norway (Processed)    
  • English
  • Norwegian

ID: ELRA-W0156

ISLRN: 909-695-133-060-3

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Translation memories containing translations of EU legis...

MEMBER
Licence: Attribution - CC-BY-4.0: 0.00 € (academic), 0.00 € (commercial)
NON MEMBER
Licence: Attribution - CC-BY-4.0: 0.00 € (academic), 0.00 € (commercial)
 Translation memory from Swedish National Audit Office (NAO) - Riksrevisionen (Processed)    
  • English
  • Swedish

ID: ELRA-W0236

ISLRN: 709-518-556-855-4

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Translation memory from Swedish National Audit Office

MEMBER
Licence: Other - Public Domain: 0.00 € (academic), 0.00 € (commercial)
NON MEMBER
Licence: Other - Public Domain: 0.00 € (academic), 0.00 € (commercial)
 Translations of Lithuanian legislation from Seimas of the Republic of Lithuania (Processed)    
  • English
  • Lithuanian

ID: ELRA-W0165

ISLRN: 691-158-541-313-8

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Translation Memories of Lithuanian legislation from Seim...

MEMBER
Licence: Attribution - CC-BY-4.0: 0.00 € (academic), 0.00 € (commercial)
NON MEMBER
Licence: Attribution - CC-BY-4.0: 0.00 € (academic), 0.00 € (commercial)
 Trilingual Documents related to International Judicial Cooperation in Civil Matters (Greek-English-French) (Processed)    
  • English
  • French
  • Modern Greek (1453-)

ID: ELRA-W0307

ISLRN: 954-287-236-137-4

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Trilingual (Greek-English-French) documents - standard f...

MEMBER
Licence: Other - Public Domain: 0.00 € (academic), 0.00 € (commercial)
NON MEMBER
Licence: Other - Public Domain: 0.00 € (academic), 0.00 € (commercial)
 TSNLP (Test Suites for NLP Testing)    
  • English
  • French
  • German

ID: ELRA-W0013

ISLRN: 717-350-913-018-8

The TSNLP project (LRE 62-089) has produced a database of test suites for English, French and German containing over 4,000 test items (sentences or fragments of sentences) per language which have been constructed for evaluating natural language processing systems, but which may also be useful for ...

MEMBER
Licence: Non Commercial Use - ELRA END USER: 0.00 € (academic), 100.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 100.00 € (academic), 100.00 € (commercial)
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: 0.00 € (academic), 100.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 100.00 € (academic), 100.00 € (commercial)
 TUNA Corpus    
  • English

ID: ELRA-W0048

ISLRN: 799-660-957-954-5

TUNA (Towards a UNified Algorithm for the generation of referring expressions) is a research project funded by the UK's Engineering and Physical Sciences Research Council (EPSRC). The TUNA Corpus of Referring Expressions is built with contributions from 50 native or fluent speakers of Engl...

MEMBER
Licence: Non Commercial Use - ELRA END USER: 0.00 € (academic), 45.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 45.00 € (academic), 45.00 € (commercial)
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: 0.00 € (academic), 45.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 45.00 € (academic), 45.00 € (commercial)
 Turkish Continuous and Isolated Word Speech Database    
  • Turkish

ID: ELRA-S0121

ISLRN: 192-049-804-522-1

This Turkish speech database was produced by the department of Théorie des Circuits et Traitement de Signal at the Faculté Polytechnique de Mons. The corpus was designed to provide read speech data for speech recognition purposes. The database contains 14 hours of speech (1618 words) from 43 Turk...

MEMBER
Licence: Non Commercial Use - ELRA END USER: 400.00 € (academic), 3000.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 3000.00 € (academic), 3000.00 € (commercial)
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: 800.00 € (academic), 6000.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 6000.00 € (academic), 6000.00 € (commercial)
 Turkish Speecon database    
  • Turkish

ID: ELRA-S0178

ISLRN: 539-782-381-710-6

The Turkish Speecon database is divided into 2 sets: 1) The first set comprises the recordings of 550 adult Turkish speakers (280 males, 270 females), recorded over 4 microphone channels in 4 recording environments (office, entertainment, car, public place). 2) The second set comprises the reco...

MEMBER
Licence: Non Commercial Use - ELRA END USER: 50000.00 € (academic), 67000.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 67000.00 € (academic), 67000.00 € (commercial)
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: 60000.00 € (academic), 75000.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 75000.00 € (academic), 75000.00 € (commercial)
 Twin database - TWINDB1    
  • French

ID: ELRA-S0088

ISLRN: 167-303-567-257-8

The Twin database named TWINDB1 includes recordings of 45 French speakers, consisting of 9 pairs of identical twins (8 males and 10 females) with similar voices, and 27 other speakers (13 males and 14 females) including 4 non-twin siblings. Each twin or sibling spoke for a total of 24 to 30 minu...

MEMBER
Licence: Non Commercial Use - ELRA END USER: 200.00 € (academic), 400.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 400.00 € (academic), 400.00 € (commercial)
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: 400.00 € (academic), 800.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 800.00 € (academic), 800.00 € (commercial)
 UAE Arabic Speech Recognition Corpus (Mobile)    
  • Arabic

ID: ELRA-S0228-130

ISLRN: 737-957-734-087-9

This corpus was recorded in a quiet office/home environment over 2 channels and collected from a total of 168 speakers, including 94 males and 74 females, all of whom have been carefully screened to ensure their standard and clear pronunciation. The audio scripts cover information such as news and...

MEMBER
Licence: Non Commercial Use - ELRA END USER: 36000.00 € (academic), 36000.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 36000.00 € (academic), 36000.00 € (commercial)
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: 36000.00 € (academic), 36000.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 36000.00 € (academic), 36000.00 € (commercial)
 UK English Speecon database    
  • English

ID: ELRA-S0215

ISLRN: 773-101-261-598-6

The UK English Speecon database is divided into 2 sets: 1) The first set comprises the recordings of 606 adult UK English speakers (325 males, 281 females), recorded over 4 microphone channels in 4 recording environments (office, entertainment, car, public place), and consisting of about 195 ho...

MEMBER
Licence: Non Commercial Use - ELRA END USER: 50000.00 € (academic), 67000.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 67000.00 € (academic), 67000.00 € (commercial)
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: 60000.00 € (academic), 75000.00 € (commercial)
Licence: Commercial Use - ELRA VAR: 75000.00 € (academic), 75000.00 € (commercial)
