1680 Language Resources (Page 55 of 84)

 Portuguese Speech Recognition Corpus (Desktop)    
  • Portuguese

ID: ELRA-S0228-83

ISLRN: 044-289-806-584-3

This corpus comprises 49,988 entries uttered by 50 speakers (26 males and 24 females), recorded over 2 channels (desktop, in a quiet office). Speech samples are stored as 16-bit, 48 kHz audio, for a total of 26.41 hours of speech per channel.

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 5400.00 € / 5400.00 €
Licence: Commercial Use - ELRA VAR: 5400.00 € / 5400.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 5400.00 € / 5400.00 €
Licence: Commercial Use - ELRA VAR: 5400.00 € / 5400.00 €
 Portuguese Speech Recognition Corpus (Desktop+Mobile)    
  • Portuguese

ID: ELRA-S0228-122

ISLRN: 733-763-220-983-6

This corpus was recorded in a quiet office environment over 2 channels and collected from a total of 200 speakers, including 102 males and 98 females, all of whom have been carefully screened to ensure their standard and clear pronunciation. The audio scripts cover information such as keywords. S...

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 21600.00 € / 21600.00 €
Licence: Commercial Use - ELRA VAR: 21600.00 € / 21600.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 21600.00 € / 21600.00 €
Licence: Commercial Use - ELRA VAR: 21600.00 € / 21600.00 €
 Portuguese Speecon database    
  • Portuguese

ID: ELRA-S0180

ISLRN: 824-839-200-501-4

The Portuguese Speecon database is divided into 2 sets: 1) The first set comprises the recordings of 553 adult Portuguese speakers (266 males, 287 females), recorded over 4 microphone channels in 4 recording environments (office, entertainment, car, public place). 2) The second set comprises th...

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 50000.00 € / 67000.00 €
Licence: Commercial Use - ELRA VAR: 67000.00 € / 67000.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 60000.00 € / 75000.00 €
Licence: Commercial Use - ELRA VAR: 75000.00 € / 75000.00 €
 PRESS 65    
  • Swedish

ID: ELRA-W0010

ISLRN: 860-303-374-818-4

Språkdata has made available the first of its many Swedish corpora, PRESS 65. It consists of one million running words taken from Swedish newspapers from the year 1965. It has been categorised according to text type and is annotated down to the sentence level.

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 12000.00 € / 12000.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 20000.00 € / 20000.00 €
 Pronunciation lexicon of British place names, surnames and first names      
  • English

ID: ELRA-S0091

ISLRN: 095-481-429-979-3

The Pronunciation lexicon of British place names, surnames and first names was produced by the University of Poitiers (France) with funding from ELRA in the framework of the European Commission project LRsP&P (Language Resources Production & Packaging - LE4-8335). This lexicon is an SGML-enc...

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 5000.00 € / 25000.00 €
Licence: Commercial Use - ELRA VAR: 25000.00 € / 25000.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 15000.00 € / 40000.00 €
Licence: Commercial Use - ELRA VAR: 40000.00 € / 40000.00 €
 PTPARL Corpus    
  • Portuguese

ID: ELRA-W0060

ISLRN: 294-303-577-819-2

The PTPARL Corpus contains 1,076 texts consisting of adapted transcriptions of the Portuguese Parliament sessions. The corpus contains 1,000,441 tokens. The corpus is delivered in one file, in two different formats. The txt version has one sentence per line, an identification number for each ...

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € / 0.00 €
Licence: Commercial Use - ELRA VAR: 0.00 € / 0.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € / 0.00 €
Licence: Commercial Use - ELRA VAR: 0.00 € / 0.00 €
 Public Procurement Dataset 1 (Processed)    
  • English
  • Polish

ID: ELRA-W0187

ISLRN: 141-723-057-887-8

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. A collection of parallel Polish-English texts published ...

MEMBER (academic / commercial)
Licence: Other - Public Domain: 0.00 € / 0.00 €
NON MEMBER (academic / commercial)
Licence: Other - Public Domain: 0.00 € / 0.00 €
 Public Procurement Dataset 2 (Processed)    
  • English
  • Polish

ID: ELRA-W0185

ISLRN: 865-835-648-658-1

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. A collection of parallel Polish-English texts published ...

MEMBER (academic / commercial)
Licence: Other - Public Domain: 0.00 € / 0.00 €
NON MEMBER (academic / commercial)
Licence: Other - Public Domain: 0.00 € / 0.00 €
 Quaero Broadcast News Extended Named Entity corpus    
  • French

ID: ELRA-S0349

ISLRN: 074-668-446-920-0

The Quaero Broadcast News Extended Named Entity corpus consists of the manual annotation of (i) the ESTER 2 corpus (see ELRA-S0338) and (ii) the Quaero Speech Recognition Evaluation corpus (manual and automatic transcriptions coming from 3 different ASR systems). The first part is the training co...

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € / 3000.00 €
Licence: Commercial Use - ELRA VAR: 3000.00 € / 3000.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € / 5000.00 €
Licence: Commercial Use - ELRA VAR: 5000.00 € / 5000.00 €
 Quaero Old Press Extended Named Entity corpus    
  • French

ID: ELRA-W0073

ISLRN: 864-217-681-552-4

The Quaero Old Press Extended Named Entity corpus consists of the manual annotation of 76 newspaper issues published in 1890-1891 and provided by the French National Library (Bibliothèque Nationale de France). Three different titles are used (Le Temps, La Croix and Le Figaro) for a total of 295 p...

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € / 3000.00 €
Licence: Commercial Use - ELRA VAR: 3000.00 € / 3000.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € / 5000.00 €
Licence: Commercial Use - ELRA VAR: 5000.00 € / 5000.00 €
 Qualified POS Tagged Corpus    
  • Korean

ID: ELRA-W0034

ISLRN: 079-092-657-220-3

Monolingual Korean corpus in .txt format, produced by KAIST KORTERM, containing 1,020,000 eojeols (Korean terms). The corpus is morphologically analyzed, POS tagged, and corrected three times by specialists.

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 667.00 € / 4000.00 €
Licence: Commercial Use - ELRA VAR: 4000.00 € / 4000.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 1333.00 € / 8000.00 €
Licence: Commercial Use - ELRA VAR: 8000.00 € / 8000.00 €
 Quarterly Reports of the Parliamentary Budget Office (Hellenic Parliament) (Processed)    
  • English
  • Modern Greek (1453-)

ID: ELRA-W0243

ISLRN: 497-530-909-088-2

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. A collection of 32 reports (16 in EL and 16 in EN) of th...

MEMBER (academic / commercial)
Licence: Other - Open Under-PSI: 0.00 € / 0.00 €
NON MEMBER (academic / commercial)
Licence: Other - Open Under-PSI: 0.00 € / 0.00 €
 REPERE Evaluation Package    
  • French

ID: ELRA-E0044

ISLRN: 360-758-359-485-0

The REPERE project (REconnaissance de PERsonnes dans des Emissions audiovisuelles) consists of a series of 3 evaluation campaigns for multimedia information processing systems. The project was funded by the DGA (Délégation Générale de l’Armement, France). The REPERE Evaluation Package contains t...

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 300.00 € / 5000.00 €
Licence: Evaluation Use - ELRA EVALUATION: 1000.00 €
Licence: Commercial Use - ELRA VAR: 20000.00 € / 20000.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 2000.00 € / 7500.00 €
Licence: Evaluation Use - ELRA EVALUATION: 6500.00 €
Licence: Commercial Use - ELRA VAR: 25000.00 € / 25000.00 €
 ROCO Romanian journalistic corpus    
  • Romanian; Moldavian; Moldovan

ID: ELRA-W0085

ISLRN: 312-617-089-348-7

ROCO is a Romanian journalistic corpus containing approximately 7.1 million tokens, the number of types being 231,626. It is rich in proper names, numerals and named entities. The corpus contains morphosyntactic information (MSD annotations) which has been assigned automatically with the high...

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € / 3000.00 €
Licence: Commercial Use - ELRA VAR: 3000.00 € / 3000.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € / 5000.00 €
Licence: Commercial Use - ELRA VAR: 5000.00 € / 5000.00 €
 Romanian-English corpus with studies, reports and statistical data in the field of culture from the National Institute for Cultural Research and Training website (Processed)    
  • English
  • Romanian; Moldavian; Moldovan

ID: ELRA-W0270

ISLRN: 131-157-185-289-5

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Romanian-English corpus with studies, reports and statis...

MEMBER (academic / commercial)
Licence: Other - Open Under-PSI: 0.00 € / 0.00 €
NON MEMBER (academic / commercial)
Licence: Other - Open Under-PSI: 0.00 € / 0.00 €
 Romanian - English literature corpus (Processed)    
  • English
  • Romanian; Moldavian; Moldovan

ID: ELRA-W0192

ISLRN: 050-476-818-226-7

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Bilingual Romanian – English literature corpus built fro...

MEMBER (academic / commercial)
Licence: Attribution - CC-BY-4.0: 0.00 € / 0.00 €
NON MEMBER (academic / commercial)
Licence: Attribution - CC-BY-4.0: 0.00 € / 0.00 €
 Romanian – English New Criminal Procedure Code (Processed)    
  • English
  • Romanian; Moldavian; Moldovan

ID: ELRA-W0170

ISLRN: 085-350-774-090-4

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. The New Civil Procedure Code in Romanian and English (bi...

MEMBER (academic / commercial)
Licence: Attribution - CC-BY-4.0: 0.00 € / 0.00 €
NON MEMBER (academic / commercial)
Licence: Attribution - CC-BY-4.0: 0.00 € / 0.00 €
 Romanian - English news corpus (Processed)    
  • English
  • Romanian; Moldavian; Moldovan

ID: ELRA-W0194

ISLRN: 100-905-126-706-7

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Bilingual Romanian – English news corpus built from Sout...

MEMBER (academic / commercial)
Licence: Attribution - CC-BY-4.0: 0.00 € / 0.00 €
NON MEMBER (academic / commercial)
Licence: Attribution - CC-BY-4.0: 0.00 € / 0.00 €
 Romanian Ombudsman archive (Processed)    
  • English
  • Romanian; Moldavian; Moldovan

ID: ELRA-W0206

ISLRN: 422-693-047-625-3

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel aligned corpus in tmx format built from the Rom...

MEMBER (academic / commercial)
Licence: Attribution - CC-BY-4.0: 0.00 € / 0.00 €
NON MEMBER (academic / commercial)
Licence: Attribution - CC-BY-4.0: 0.00 € / 0.00 €
