Search and Browse – ELRA Catalogue

Portuguese-English bilingual corpus from the Portuguese Constitution (Processed) text

English
Portuguese

ID: ELRA-W0246

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Complete text of the Portuguese Constitution in Portugue...

MEMBER	academic	commercial
Licence: Other - Open Under-PSI	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Other - Open Under-PSI	0.00 €	0.00 €

Portuguese Speaking English Speech Data by Mobile Phone - 209 Hours audio

English

ID: ELRA-S0428

ISLRN: 982-703-021-937-6

532 Portuguese speakers recorded speaking authentic English in a relatively quiet environment. The recording script was designed by linguists and covers a wide range of topics, including generic, interactive, on-board and home. The text is manually proofread with high accuracy. It matches with mainstream Android...

MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	49637.50 €	49637.50 €

NON MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	49637.50 €	49637.50 €

Special offers are also available.

Portuguese SpeechDat(II) FDB-4000 audio

Portuguese

ID: ELRA-S0092

ISLRN: 886-605-380-771-9

The Portuguese SpeechDat(II) FDB-4000 comprises 4027 Portuguese speakers (1861 males, 2166 females) recorded over the Portuguese fixed telephone network. This database is partitioned into 11 CDs. The speech databases made within the SpeechDat(II) project were validated by SPEX, the Netherlands, t...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	28000.00 €	40000.00 €
Licence: Commercial Use - ELRA VAR	40000.00 €	40000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	48000.00 €	56000.00 €
Licence: Commercial Use - ELRA VAR	56000.00 €	56000.00 €

Portuguese SpeechDat(M) database audio

Portuguese

ID: ELRA-S0068

ISLRN: 181-020-544-041-9

The Portuguese SpeechDat(M) database contains the recordings of 1,001 speakers (453 males, 548 females). This speech database was collected by Portugal Telecom within the European SpeechDat project. Speech signals are stored as sequences of 8 kHz, 8-bit A-law. Files are stored according to the f...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	11000.00 €	14000.00 €
Licence: Commercial Use - ELRA VAR	14000.00 €	14000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	14000.00 €	20000.00 €
Licence: Commercial Use - ELRA VAR	20000.00 €	20000.00 €

Portuguese Speech Recognition Corpus (Desktop) audio

Portuguese

ID: ELRA-S0228-83

ISLRN: 044-289-806-584-3

This corpus comprises 49,988 entries uttered by 50 speakers (26 males and 24 females), recorded over 2 channels (desktop in quiet office). Speech samples are stored as 16-bit, 48 kHz audio, for a total of 26.41 hours of speech per channel.

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	5400.00 €	5400.00 €
Licence: Commercial Use - ELRA VAR	5400.00 €	5400.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	5400.00 €	5400.00 €
Licence: Commercial Use - ELRA VAR	5400.00 €	5400.00 €

Portuguese Speech Recognition Corpus (Desktop+Mobile) audio

Portuguese

ID: ELRA-S0228-122

ISLRN: 733-763-220-983-6

This corpus was recorded in a quiet office environment over 2 channels and collected from a total of 200 speakers, including 102 males and 98 females, all of whom have been carefully screened to ensure their standard and clear pronunciation. The audio scripts cover information such as keywords. S...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	21600.00 €	21600.00 €
Licence: Commercial Use - ELRA VAR	21600.00 €	21600.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	21600.00 €	21600.00 €
Licence: Commercial Use - ELRA VAR	21600.00 €	21600.00 €

Portuguese Speecon database audio

Portuguese

ID: ELRA-S0180

ISLRN: 824-839-200-501-4

The Portuguese Speecon database is divided into 2 sets: 1) The first set comprises the recordings of 553 adult Portuguese speakers (266 males, 287 females), recorded over 4 microphone channels in 4 recording environments (office, entertainment, car, public place). 2) The second set comprises th...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	50000.00 €	67000.00 €
Licence: Commercial Use - ELRA VAR	67000.00 €	67000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	60000.00 €	75000.00 €
Licence: Commercial Use - ELRA VAR	75000.00 €	75000.00 €

PRESS 65

Swedish

ID: ELRA-W0010

ISLRN: 860-303-374-818-4

Språkdata has made available the first of its many Swedish corpora, PRESS 65. It consists of one million running words taken from Swedish newspapers from the year 1965. It has been categorised according to text type and is annotated down to the sentence level.

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	12000.00 €	12000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	20000.00 €	20000.00 €

Pronunciation lexicon of British place names, surnames and first names text

English

ID: ELRA-S0091

ISLRN: 095-481-429-979-3

The Pronunciation lexicon of British place names, surnames and first names was produced by the University of Poitiers (France) with funding from ELRA in the framework of the European Commission project LRsP&P (Language Resources Production & Packaging - LE4-8335). This lexicon is an SGML-enc...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	5000.00 €	25000.00 €
Licence: Commercial Use - ELRA VAR	25000.00 €	25000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	15000.00 €	40000.00 €
Licence: Commercial Use - ELRA VAR	40000.00 €	40000.00 €

PTPARL Corpus text

Portuguese

ID: ELRA-W0060

ISLRN: 294-303-577-819-2

The PTPARL Corpus contains 1,076 texts consisting of adapted transcriptions of the Portuguese Parliament sessions. The corpus contains 1,000,441 tokens. The corpus is delivered in one file, in two different formats. The txt version has one sentence per line, an identification number for each ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	0.00 €	0.00 €

Public Procurement Dataset 1 (Processed) text

English
Polish

ID: ELRA-W0187

ISLRN: 141-723-057-887-8

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. A collection of parallel Polish-English texts published ...

MEMBER	academic	commercial
Licence: Other - Public Domain	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Other - Public Domain	0.00 €	0.00 €

Public Procurement Dataset 2 (Processed) text

English
Polish

ID: ELRA-W0185

ISLRN: 865-835-648-658-1

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. A collection of parallel Polish-English texts published ...

MEMBER	academic	commercial
Licence: Other - Public Domain	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Other - Public Domain	0.00 €	0.00 €

Quaero Broadcast News Extended Named Entity corpus audio

French

ID: ELRA-S0349

ISLRN: 074-668-446-920-0

The Quaero Broadcast News Extended Named Entity corpus consists of the manual annotation of (i) the ESTER 2 corpus (see ELRA-S0338) and (ii) the Quaero Speech Recognition Evaluation corpus (manual and automatic transcriptions coming from 3 different ASR systems). The first part is the training co...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	3000.00 €
Licence: Commercial Use - ELRA VAR	3000.00 €	3000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	5000.00 €
Licence: Commercial Use - ELRA VAR	5000.00 €	5000.00 €

Quaero Old Press Extended Named Entity corpus text

French

ID: ELRA-W0073

ISLRN: 864-217-681-552-4

The Quaero Old Press Extended Named Entity corpus consists of the manual annotation of 76 newspaper issues published in 1890-1891 and provided by the French National Library (Bibliothèque Nationale de France). Three different titles are used (Le Temps, La Croix and Le Figaro) for a total of 295 p...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	3000.00 €
Licence: Commercial Use - ELRA VAR	3000.00 €	3000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	5000.00 €
Licence: Commercial Use - ELRA VAR	5000.00 €	5000.00 €

Qualified POS Tagged Corpus text

Korean

ID: ELRA-W0034

ISLRN: 079-092-657-220-3

Monolingual Korean corpus in .txt format, produced by KAIST KORTERM, containing 1,020,000 eojeols (Korean terms). This corpus is morphologically analyzed, POS tagged, and rectified 3 times by specialists.

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	667.00 €	4000.00 €
Licence: Commercial Use - ELRA VAR	4000.00 €	4000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1333.00 €	8000.00 €
Licence: Commercial Use - ELRA VAR	8000.00 €	8000.00 €

Quarterly Reports of the Parliamentary Budget Office (Hellenic Parliament) (Processed) text

English
Modern Greek (1453-)

ID: ELRA-W0243

ISLRN: 497-530-909-088-2

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. A collection of 32 reports (16 in EL and 16 in EN) of th...

MEMBER	academic	commercial
Licence: Other - Open Under-PSI	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Other - Open Under-PSI	0.00 €	0.00 €

REPERE Evaluation Package audio

French

ID: ELRA-E0044

ISLRN: 360-758-359-485-0

The REPERE project (REconnaissance de PERsonnes dans des Emissions audiovisuelles) consists of a series of 3 evaluation campaigns for multimedia information processing systems. The project was funded by the DGA (Délégation Générale de l’Armement, France). The REPERE Evaluation Package contains t...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	300.00 €	5000.00 €
Licence: Evaluation Use - ELRA EVALUATION		1000.00 €
Licence: Commercial Use - ELRA VAR	20000.00 €	20000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	2000.00 €	7500.00 €
Licence: Evaluation Use - ELRA EVALUATION		6500.00 €
Licence: Commercial Use - ELRA VAR	25000.00 €	25000.00 €

ReSSInt-EMG (Spanish EMG and Speech Database) audio

Spanish; Castilian

ID: ELRA-S0498

ISLRN: 057-914-072-202-4

ReSSInt-EMG (Spanish EMG and Speech Database) has been generated in the framework of the ReSSInt project (Voice restoration with silent EMG speech interfaces) and its continuation project DeepRestore (Deep learning approaches for speech restoration from face movement biosignals), coordinated rese...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	0.00 €	0.00 €

ROCO Romanian journalistic corpus text

Romanian; Moldavian; Moldovan

ID: ELRA-W0085

ISLRN: 312-617-089-348-7

ROCO is a Romanian journalistic corpus containing approximately 7.1 million tokens, the number of types being 231,626. It is rich in proper names, numerals and named entities. The corpus contains morphosyntactic information (MSD annotations) which has been assigned automatically with the high...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	3000.00 €
Licence: Commercial Use - ELRA VAR	3000.00 €	3000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	5000.00 €
Licence: Commercial Use - ELRA VAR	5000.00 €	5000.00 €

1685 Language Resources (Page 55 of 85)