Search and Browse – ELRA Catalogue

Natolin European Centre Dataset (Processed) text

English
Polish

ID: ELRA-W0176

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. The Polish-English parallel corpus is composed of three ...

MEMBER	academic	commercial
Licence: Attribution - CC-BY-4.0	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Attribution - CC-BY-4.0	0.00 €	0.00 €

Nautilus Speaker Characterization (NSC) Corpus audio

German

ID: ELRA-S0395

ISLRN: 157-037-166-491-1

The Nautilus Speaker Characterization (NSC) Corpus comprises clean microphone recordings of conversational speech from 300 German speakers (126 males and 174 females) aged 18 to 35 years, with no marked dialect/accent. The recordings were performed in the acoustically-isolated room "Nautilus" (wh...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €

NE3L named entities Arabic corpus text

Arabic

ID: ELRA-W0078

ISLRN: 398-979-151-557-0

The NE3L project (Named Entities 3 Languages) consisted of annotating several corpora in different languages with named entities. Text-format data were extracted from newspapers and deal with various topics. Three languages were annotated: Arabic, Chinese and Russian. For this project, 5...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	5000.00 €	5000.00 €
Licence: Commercial Use - ELRA VAR	5000.00 €	5000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	5000.00 €	5000.00 €
Licence: Commercial Use - ELRA VAR	5000.00 €	5000.00 €

NE3L named entities Chinese corpus text

Chinese

ID: ELRA-W0079

ISLRN: 187-154-782-686-9

The NE3L project (Named Entities 3 Languages) consisted of annotating several corpora in different languages with named entities. Text-format data were extracted from newspapers and deal with various topics. Three languages were annotated: Arabic, Chinese and Russian. For this project, 5...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	5000.00 €	5000.00 €
Licence: Commercial Use - ELRA VAR	5000.00 €	5000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	5000.00 €	5000.00 €
Licence: Commercial Use - ELRA VAR	5000.00 €	5000.00 €

NE3L named entities Russian corpus text

Russian

ID: ELRA-W0080

ISLRN: 024-620-556-146-2

The NE3L project (Named Entities 3 Languages) consisted of annotating several corpora in different languages with named entities. Text-format data were extracted from newspapers and deal with various topics. Three languages were annotated: Arabic, Chinese and Russian. For this project, 5...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	5000.00 €	5000.00 €
Licence: Commercial Use - ELRA VAR	5000.00 €	5000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	5000.00 €	5000.00 €
Licence: Commercial Use - ELRA VAR	5000.00 €	5000.00 €

NEMLAR Broadcast News Speech Corpus audio

Arabic

ID: ELRA-S0219

ISLRN: 479-507-036-103-9

This corpus was produced within the NEMLAR project (http://www.nemlar.org). Two other resources, produced within the same project, are also available: NEMLAR Written Corpus (ELRA-W0042) and the NEMLAR Speech Synthesis Corpus (ELRA-S0220). The NEMLAR Broadcast News Speech Corpus consists of about...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	150.00 €	500.00 €
Licence: Commercial Use - ELRA VAR	2000.00 €	2000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	300.00 €	1000.00 €
Licence: Commercial Use - ELRA VAR	4000.00 €	4000.00 €

Special offers are also available.

NEMLAR Speech Synthesis Corpus audio

Arabic

ID: ELRA-S0220

ISLRN: 361-216-121-305-9

This corpus was produced within the NEMLAR project (http://www.nemlar.org). Two other resources, produced within the same project, are also available: NEMLAR Written Corpus (ELRA-W0042) and the NEMLAR Broadcast News Speech Corpus (ELRA-S0219). The NEMLAR Speech Synthesis Corpus contains the reco...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	500.00 €	1250.00 €
Licence: Commercial Use - ELRA VAR	5000.00 €	5000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1000.00 €	2500.00 €
Licence: Commercial Use - ELRA VAR	10000.00 €	10000.00 €

Special offers are also available.

NEMLAR Written Corpus text

Arabic

ID: ELRA-W0042

ISLRN: 050-693-158-326-9

This corpus was produced within the NEMLAR project (http://www.nemlar.org). Two other resources, produced within the same project, are also available: NEMLAR Broadcast News Speech Corpus (ELRA-S0219) and the NEMLAR Speech Synthesis Corpus (ELRA-S0220). The NEMLAR Written Corpus consists of about...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	150.00 €	250.00 €
Licence: Commercial Use - ELRA VAR	1000.00 €	1000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	300.00 €	500.00 €
Licence: Commercial Use - ELRA VAR	2000.00 €	2000.00 €

Special offers are also available.

Nepali Monolingual written corpus text

Nepali (macrolanguage)

ID: ELRA-W0076

ISLRN: 325-796-965-405-9

The Nepali Monolingual written corpus is one of the 3 resources that constitute the Nepali National Corpus. The Nepali National Corpus was produced in 2006 in the framework of the project Bhasha Sanchar (“language communication”), also known as Nelralec, for Nepali Language Resources and Localiza...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

Nepali Spoken Corpus audio

Nepali (macrolanguage)

ID: ELRA-S0368

ISLRN: 688-800-566-571-0

The Nepali Spoken Corpus is one of the 3 resources that constitute the Nepali National Corpus. The Nepali National Corpus was produced in 2006 in the framework of the project Bhasha Sanchar (“language communication”), also known as Nelralec, for Nepali Language Resources and Localization for Educ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

NetDC Arabic BNSC (Broadcast News Speech Corpus) audio

Arabic

ID: ELRA-S0157

ISLRN: 663-177-513-755-1

The NetDC Arabic BNSC (Broadcast News Speech Corpus) is a corpus developed by ELDA in the framework of the European-funded project Network of Data Centres (NetDC). The project was done in collaboration with the LDC (Linguistic Data Consortium), which has produced a similar corpus from the news br...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	100.00 €	1350.00 €
Licence: Commercial Use - ELRA VAR	1350.00 €	1350.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	200.00 €	2700.00 €
Licence: Commercial Use - ELRA VAR	2700.00 €	2700.00 €

NEWBASE - Extended version of ELRA-T0090 GEOBASE text

English
French

ID: ELRA-T0362

ISLRN: 761-442-215-246-0

Extended version of ELRA-T0090 GEOBASE. The terms were selected and collated by Dr M.S.N. CARPENTER during the course of his translation activities over the past ten years. The terms have been validated by publication in the scientific literature. Conceived as a bilingual terminological resource,...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	3420.00 €	4788.00 €
Licence: Commercial Use - ELRA VAR	4788.00 €	4788.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	4788.00 €	6840.00 €
Licence: Commercial Use - ELRA VAR	6840.00 €	6840.00 €

New Oxford Dictionary of English, 2nd Edition text

English

ID: ELRA-L0045

ISLRN: 044-694-748-731-5

This is Oxford University Press's most comprehensive single-volume dictionary, with 170,000 entries covering all varieties of English worldwide. The NODE data set constitutes a fully integrated range of formal data types suitable for language engineering and NLP applications: It is available in X...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	6125.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	8750.00 €

New Oxford Thesaurus of English text

English

ID: ELRA-L0047

ISLRN: 869-866-137-463-6

The New Oxford Thesaurus of English is a completely new top-of-the-range thesaurus offering more alternative and opposite words than any of its competitors. The synonyms are arranged in order of "relevance" to the look-up word, starting with an individually tagged core synonym, and followed by la...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	4900.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	7000.00 €

NODE+DIMAP text

English

ID: ELRA-L0046

ISLRN: 003-258-865-840-0

The DIMAP version of NODE (first edition) is a machine-tractable version of the machine-readable dictionary files in the DIMAP dictionary maintenance programs, adding syntactic and semantic information in the conversion. In addition, DIMAP provides several mechanisms that will allow research into...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	7000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	10000.00 €

Non-Hispanic Spanish Speech Data by Mobile Phone - 762 Hours audio

Spanish; Castilian

ID: ELRA-S0444

ISLRN: 469-588-696-069-6

1,630 native Spanish speakers of non-Spanish nationality, such as Mexicans and Colombians, participated in the recording with authentic accents. The recording script was designed by linguists and covers a wide range of topics including generic, interactive, in-vehicle and home. The text is manually proofr...

MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	180975.00 €	180975.00 €

NON MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	180975.00 €	180975.00 €

Special offers are also available.

Normalized Arabic Fragments for Inestimable Stemming (NAFIS) text

Arabic

ID: ELRA-W0127

ISLRN: 305-450-745-774-1

Normalized Arabic Fragments for Inestimable Stemming (NAFIS) is an Arabic stemming gold-standard corpus composed of a collection of sentences, selected to be representative of Arabic stemming tasks and manually annotated. Indeed, NAFIS is: Comprehensive: The content of NAFIS can be generalized...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	0.00 €	0.00 €

Norwegian EUROM1 audio

Norwegian

ID: ELRA-S0301

ISLRN: 184-180-634-505-7

EUROM1 is the first really multilingual speech database produced in Europe. Equivalent corpora for each of the European languages were collected with the same number of speakers selected in the same way, and recorded in the same conditions with common file formats. Initially eight European countr...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	800.00 €	800.00 €
Licence: Commercial Use - ELRA VAR	800.00 €	800.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1600.00 €	1600.00 €
Licence: Commercial Use - ELRA VAR	1600.00 €	1600.00 €

Norwegian SpeechDat(II) FDB-1000 audio

Norwegian

ID: ELRA-S0081

ISLRN: 231-756-812-990-0

The Norwegian SpeechDat(II) FDB-1000 comprises 1016 Norwegian speakers (517 males, 499 females) recorded over the Norwegian fixed telephone network. The FDB-1000 database is partitioned into 4 CDs. The speech databases made within the SpeechDat(II) project were validated by SPEX, the Netherlands,...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	15000.00 €	18000.00 €
Licence: Commercial Use - ELRA VAR	18000.00 €	18000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	25000.00 €	25000.00 €
Licence: Commercial Use - ELRA VAR	25000.00 €	25000.00 €

NPChunks

Portuguese

ID: ELRA-W0089

ISLRN: 412-883-442-173-8

NPChunks is a training corpus containing approximately 1,000 sentences, with a total of 24,243 tokens, selected randomly from the written part of the CINTIL corpus. For more information on the CINTIL corpus, see ELRA-W0050, ISLRN: 176-775-844-396-0. The corpus is PoS-annotated at token level, ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	0.00 €	0.00 €

1686 Language Resources (Page 46 of 85)