1681 Language Resources (Page 46 of 85)

 NE3L named entities Russian corpus    
  • Russian

ID: ELRA-W0080

ISLRN: 024-620-556-146-2

The NE3L project (Named Entities 3 Languages) consisted of annotating several corpora in different languages with named entities. Text-format data were extracted from newspapers and deal with various topics. Three languages were annotated: Arabic, Chinese and Russian. For this project, 5...

MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 5000.00 € | 5000.00 €
Licence: Commercial Use - ELRA VAR | 5000.00 € | 5000.00 €
NON MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 5000.00 € | 5000.00 €
Licence: Commercial Use - ELRA VAR | 5000.00 € | 5000.00 €
 NEMLAR Broadcast News Speech Corpus    
  • Arabic

ID: ELRA-S0219

ISLRN: 479-507-036-103-9

This corpus was produced within the NEMLAR project (http://www.nemlar.org). Two other resources, produced within the same project, are also available: NEMLAR Written Corpus (ELRA-W0042) and the NEMLAR Speech Synthesis Corpus (ELRA-S0220). The Nemlar Broadcast News Speech Corpus consists of about...

MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 150.00 € | 500.00 €
Licence: Commercial Use - ELRA VAR | 2000.00 € | 2000.00 €
NON MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 300.00 € | 1000.00 €
Licence: Commercial Use - ELRA VAR | 4000.00 € | 4000.00 €

Special offers are also available.

 NEMLAR Speech Synthesis Corpus    
  • Arabic

ID: ELRA-S0220

ISLRN: 361-216-121-305-9

This corpus was produced within the NEMLAR project (http://www.nemlar.org). Two other resources, produced within the same project, are also available: NEMLAR Written Corpus (ELRA-W0042) and the NEMLAR Broadcast News Speech Corpus (ELRA-S0219). The NEMLAR Speech Synthesis Corpus contains the reco...

MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 500.00 € | 1250.00 €
Licence: Commercial Use - ELRA VAR | 5000.00 € | 5000.00 €
NON MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 1000.00 € | 2500.00 €
Licence: Commercial Use - ELRA VAR | 10000.00 € | 10000.00 €

Special offers are also available.

 NEMLAR Written Corpus    
  • Arabic

ID: ELRA-W0042

ISLRN: 050-693-158-326-9

This corpus was produced within the NEMLAR project (http://www.nemlar.org). Two other resources, produced within the same project, are also available: NEMLAR Broadcast News Speech Corpus (ELRA-S0219) and the NEMLAR Speech Synthesis Corpus (ELRA-S0220). The NEMLAR Written Corpus consists of about...

MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 150.00 € | 250.00 €
Licence: Commercial Use - ELRA VAR | 1000.00 € | 1000.00 €
NON MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 300.00 € | 500.00 €
Licence: Commercial Use - ELRA VAR | 2000.00 € | 2000.00 €

Special offers are also available.

 Nepali Monolingual written corpus    
  • Nepali (macrolanguage)

ID: ELRA-W0076

ISLRN: 325-796-965-405-9

The Nepali Monolingual written corpus is one of the 3 resources that constitute the Nepali National Corpus. The Nepali National Corpus was produced in 2006 in the framework of the project Bhasha Sanchar (“language communication”), also known as Nelralec, for Nepali Language Resources and Localiza...

MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 0.00 €
NON MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 0.00 €
 Nepali Spoken Corpus    
  • Nepali (macrolanguage)

ID: ELRA-S0368

ISLRN: 688-800-566-571-0

The Nepali Spoken Corpus is one of the 3 resources that constitute the Nepali National Corpus. The Nepali National Corpus was produced in 2006 in the framework of the project Bhasha Sanchar (“language communication”), also known as Nelralec, for Nepali Language Resources and Localization for Educ...

MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 0.00 €
NON MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 0.00 €
 NetDC Arabic BNSC (Broadcast News Speech Corpus)    
  • Arabic

ID: ELRA-S0157

ISLRN: 663-177-513-755-1

The NetDC Arabic BNSC (Broadcast News Speech Corpus) is a corpus developed by ELDA in the framework of the European-funded project Network of Data Centres (NetDC). The project was done in collaboration with the LDC (Linguistic Data Consortium), which has produced a similar corpus from the news br...

MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 100.00 € | 1350.00 €
Licence: Commercial Use - ELRA VAR | 1350.00 € | 1350.00 €
NON MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 200.00 € | 2700.00 €
Licence: Commercial Use - ELRA VAR | 2700.00 € | 2700.00 €
 NEWBASE - Extended version of ELRA-T0090 GEOBASE    
  • English
  • French

ID: ELRA-T0362

ISLRN: 761-442-215-246-0

Extended version of ELRA-T0090 GEOBASE. The terms were selected and collated by Dr M.S.N. CARPENTER during the course of his translation activities over the past ten years. The terms have been validated by publication in the scientific literature. Conceived as a bilingual terminological resource,...

MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 3420.00 € | 4788.00 €
Licence: Commercial Use - ELRA VAR | 4788.00 € | 4788.00 €
NON MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 4788.00 € | 6840.00 €
Licence: Commercial Use - ELRA VAR | 6840.00 € | 6840.00 €
 New Oxford Dictionary of English, 2nd Edition    
  • English

ID: ELRA-L0045

ISLRN: 044-694-748-731-5

This is Oxford University Press's most comprehensive single-volume dictionary, with 170,000 entries covering all varieties of English worldwide. The NODE data set constitutes a fully integrated range of formal data types suitable for language engineering and NLP applications: It is available in X...

MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 6125.00 €
NON MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 8750.00 €
 New Oxford Thesaurus of English    
  • English

ID: ELRA-L0047

ISLRN: 869-866-137-463-6

The New Oxford Thesaurus of English is a completely new top-of-the-range thesaurus offering more alternative and opposite words than any of its competitors. The synonyms are arranged in order of "relevance" to the look-up word, starting with an individually tagged core synonym, and followed by la...

MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 4900.00 €
NON MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 7000.00 €
 NODE+DIMAP    
  • English

ID: ELRA-L0046

ISLRN: 003-258-865-840-0

The DIMAP version of NODE (first edition) is a machine-tractable version of the machine-readable dictionary files in the DIMAP dictionary maintenance programs, adding syntactic and semantic information in the conversion. In addition, DIMAP provides several mechanisms that will allow research into...

MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 7000.00 €
NON MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 10000.00 €
 Non-Hispanic Spanish Speech Data by Mobile Phone - 762 Hours    
  • Spanish; Castilian

ID: ELRA-S0444

ISLRN: 469-588-696-069-6

1,630 native Spanish speakers of non-Spanish nationality, such as Mexicans and Colombians, participated in the recording with authentic accents. The recorded script was designed by linguists and covers a wide range of topics including generic, interactive, in-vehicle and home. The text is manually proofr...

MEMBER | academic | commercial
Licence: Commercial Use - ELRA VAR | 180975.00 € | 180975.00 €
NON MEMBER | academic | commercial
Licence: Commercial Use - ELRA VAR | 180975.00 € | 180975.00 €

Special offers are also available.

 Normalized Arabic Fragments for Inestimable Stemming (NAFIS)    
  • Arabic

ID: ELRA-W0127

ISLRN: 305-450-745-774-1

Normalized Arabic Fragments for Inestimable Stemming (NAFIS) is an Arabic stemming gold standard corpus composed of a collection of sentences, selected to be representative of Arabic stemming tasks and manually annotated. Indeed, NAFIS is: Comprehensive: The content of NAFIS can be generalized...

MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 0.00 €
Licence: Commercial Use - ELRA VAR | 0.00 € | 0.00 €
NON MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 0.00 €
Licence: Commercial Use - ELRA VAR | 0.00 € | 0.00 €
 Norwegian EUROM1    
  • Norwegian

ID: ELRA-S0301

ISLRN: 184-180-634-505-7

EUROM1 is the first really multilingual speech database produced in Europe. Equivalent corpora for each of the European languages were collected with the same number of speakers selected in the same way, and recorded in the same conditions with common file formats. Initially eight European countr...

MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 800.00 € | 800.00 €
Licence: Commercial Use - ELRA VAR | 800.00 € | 800.00 €
NON MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 1600.00 € | 1600.00 €
Licence: Commercial Use - ELRA VAR | 1600.00 € | 1600.00 €
 Norwegian SpeechDat(II) FDB-1000    
  • Norwegian

ID: ELRA-S0081

ISLRN: 231-756-812-990-0

The Norwegian SpeechDat(II) FDB-1000 comprises 1016 Norwegian speakers (517 males, 499 females) recorded over the Norwegian fixed telephone network. The FDB-1000 database is partitioned into 4 CDs. The speech databases made within the SpeechDat(II) project were validated by SPEX, the Netherlands,...

MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 15000.00 € | 18000.00 €
Licence: Commercial Use - ELRA VAR | 18000.00 € | 18000.00 €
NON MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 25000.00 € | 25000.00 €
Licence: Commercial Use - ELRA VAR | 25000.00 € | 25000.00 €
 NPChunks    
  • Portuguese

ID: ELRA-W0089

ISLRN: 412-883-442-173-8

NPChunks is a training corpus containing approximately 1,000 sentences, with a total of 24,243 tokens, selected randomly from the written part of the CINTIL corpus. For more information on the CINTIL corpus, see ELRA-W0050, ISLRN: 176-775-844-396-0. The corpus is PoS-annotated at token level, ...

MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 0.00 €
Licence: Commercial Use - ELRA VAR | 0.00 € | 0.00 €
NON MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 0.00 €
Licence: Commercial Use - ELRA VAR | 0.00 € | 0.00 €
 NRC Emotion Lexicon - Revised version    
  • English

ID: ELRA-L0130

ISLRN: 007-544-786-822-8

The NRC Emotion Lexicon was originally built by Saif M. Mohammad and Peter D. Turney through crowdsourcing. The NRC was created in order to assist with emotion analysis as other emotion lexicons were smaller at the time. In order to be able to fix this problem, Saif crowdsourced a huge collection...

MEMBER | academic | commercial
Licence: Non Commercial Use - CC-BY-NC-4.0
NON MEMBER | academic | commercial
Licence: Non Commercial Use - CC-BY-NC-4.0
 NUM 5M Mongolian written corpus    
  • Mongolian

ID: ELRA-W0120

ISLRN: 492-817-146-504-9

This is a corpus of Mongolian text mostly from domains like online or printed daily newspapers, literature, and laws. The collected raw text was reduced from 5 to 4.8 million words after cleaning. The cleaned corpus comprises: - 144 texts from laws until 2009, - 288 texts from literature t...

MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 5000.00 €
Licence: Commercial Use - ELRA VAR | 5000.00 € | 5000.00 €
NON MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 7000.00 €
Licence: Commercial Use - ELRA VAR | 7000.00 € | 7000.00 €
 Offensive Word Filter 1    
  • English

ID: ELRA-L0059

ISLRN: 496-636-168-539-2

Oxford University Press has developed two lists of offensive words and expressions, specifically developed for filter applications in the contexts of web pages and email. Each list features a grading system describing vocabulary type and offensive strength for each term, plus collocational infor...

MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 4000.00 €
NON MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 5000.00 €
 Offensive Word Filter 2    
  • English

ID: ELRA-L0060

ISLRN: 359-137-869-557-3

Oxford University Press has developed two lists of offensive words and expressions, specifically developed for filter applications in the contexts of web pages and email. Each list features a grading system describing vocabulary type and offensive strength for each term, plus collocational inform...

MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 2000.00 €
NON MEMBER | academic | commercial
Licence: Non Commercial Use - ELRA END USER | 2500.00 €