112 Language Resources (Page 1 of 6)

 Al-Hayat Arabic Corpus    
  • Arabic

ID: ELRA-W0030

ISLRN: 365-777-769-398-7

The corpus was developed in the course of a research project at the University of Essex, in collaboration with the Open University. It contains Al-Hayat newspaper articles enriched for use in developing Language Engineering and Information Retrieval applications. The data have ...

MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 480.00 € | 960.00 €
Licence: Commercial Use - ELRA VAR: 960.00 € | 960.00 €
NON MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 720.00 € | 1440.00 €
Licence: Commercial Use - ELRA VAR: 1440.00 € | 1440.00 €
 Amharic-English bilingual corpus    
  • Amharic
  • English

ID: ELRA-W0074

ISLRN: 590-255-335-719-0

The Amharic-English bilingual corpus contains parallel text from legal and news domains in Amharic script, in transliterated form and in English. The corpus comprises 232,653 words in Amharic and 291,701 in English. This parallel corpus contains documents from two domains, namely legal...

MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € | 2000.00 €
Licence: Commercial Use - ELRA VAR: 2000.00 € | 2000.00 €
NON MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € | 4000.00 €
Licence: Commercial Use - ELRA VAR: 4000.00 € | 4000.00 €
 Annotated tweet corpus in Arabizi, French and English    
  • Arabic
  • English
  • French

ID: ELRA-W0323

ISLRN: 482-848-308-105-6

The annotated tweet corpus in Arabizi, French and English was built by ELDA on behalf of INSA Rouen Normandie (Normandie Université, LITIS team), in the framework of the SAPhIRS project (System for the Analysis of Information Propagation in Social Networks), funded by the DGE (Direction Générale ...

MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € | 7000.00 €
Licence: Commercial Use - ELRA VAR: 7000.00 € | 7000.00 €
NON MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € | 10000.00 €
Licence: Commercial Use - ELRA VAR: 10000.00 € | 10000.00 €
 Arbobanko (Esperanto Treebank)    
  • Esperanto

ID: ELRA-W0129

ISLRN: 185-602-618-699-2

The Arbobanko (Esperanto Treebank) is a 52,000-token dependency treebank of Esperanto built from texts in the MONATO news magazine, consisting of random excerpts from the period 2000-2010. All words were annotated for lemma, part-of-speech, inflection, compounding and affixing, syntactic function, de...

MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € | 900.00 €
Licence: Commercial Use - ELRA VAR: 900.00 € | 900.00 €
NON MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € | 1500.00 €
Licence: Commercial Use - ELRA VAR: 1500.00 € | 1500.00 €
 Arboretum treebank    
  • Danish

ID: ELRA-W0084

ISLRN: 025-729-182-451-2

The Arboretum treebank is a morphologically and syntactically annotated repository of Danish sentences, taken from Korpus 90 and Korpus 2000, both compiled by the Society for Danish Language and Literature (http://ordnet.dk/korpusdk/fakta), and containing samples of written Danish from the 90'ies...

MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 1500.00 € | 7000.00 €
Licence: Commercial Use - ELRA VAR: 7000.00 € | 7000.00 €
NON MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 2200.00 € | 10000.00 €
Licence: Commercial Use - ELRA VAR: 10000.00 € | 10000.00 €
 ARCADE/ROMANSEVAL corpus    
  • English
  • French
  • Italian

ID: ELRA-W0018

ISLRN: 681-769-134-114-2

The ARCADE/ROMANSEVAL corpus was used as a reference corpus in two international competitions: ARCADE, an exercise on multilingual text alignment financed by AUPELF-UREF, and ROMANSEVAL, part of the SENSEVAL exercise sponsored by ACL-SIGLEX and EURALEX, on word sense disambiguation. The corpus ...

MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € | 2000.00 €
Licence: Commercial Use - ELRA VAR: 2000.00 € | 2000.00 €
NON MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € | 5000.00 €
Licence: Commercial Use - ELRA VAR: 5000.00 € | 5000.00 €
 Archives of "El Mundo" Newspaper – Year 2020    
  • Spanish; Castilian

ID: ELRA-W0333

ISLRN: 573-498-319-304-6

This corpus consists of 15,073 articles in Spanish from the electronic archives of "El Mundo" Newspaper published in the year 2020. A few articles come from related publications: El Mundo Alicante, El Mundo Andalucía, El Mundo Baleares, El Mundo Catalunya, El Mundo Valéncia and E...

MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 700.00 € | 2000.00 €
Licence: Commercial Use - ELRA VAR: 2000.00 € | 2000.00 €
NON MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 800.00 € | 3000.00 €
Licence: Commercial Use - ELRA VAR: 3000.00 € | 3000.00 €
 Archives of "El Mundo" Newspaper – Year 2021    
  • Spanish; Castilian

ID: ELRA-W0334

ISLRN: 196-909-664-343-4

This corpus consists of 14,461 articles in Spanish from the electronic archives of "El Mundo" Newspaper published in the year 2021. A few articles come from related publications: El Mundo Alicante, El Mundo Andalucía, El Mundo Baleares, El Mundo Catalunya, El Mundo Valéncia and E...

MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 700.00 € | 2000.00 €
Licence: Commercial Use - ELRA VAR: 2000.00 € | 2000.00 €
NON MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 800.00 € | 3000.00 €
Licence: Commercial Use - ELRA VAR: 3000.00 € | 3000.00 €
 Archives of "El Mundo" Newspaper – Year 2022    
  • Spanish; Castilian

ID: ELRA-W0335

ISLRN: 261-537-224-628-2

This corpus consists of 16,124 articles in Spanish from the electronic archives of "El Mundo" Newspaper published in the year 2022. A few articles come from related publications: El Mundo Alicante, El Mundo Andalucía, El Mundo Baleares, El Mundo Catalunya, El Mundo Valéncia and E...

MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 700.00 € | 2000.00 €
Licence: Commercial Use - ELRA VAR: 2000.00 € | 2000.00 €
NON MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 800.00 € | 3000.00 €
Licence: Commercial Use - ELRA VAR: 3000.00 € | 3000.00 €
 Archives of "El Mundo" Newspaper – Years 2020-2022    
  • Spanish; Castilian

ID: ELRA-W0332

ISLRN: 124-545-396-179-3

This corpus consists of 45,658 articles in Spanish from the electronic archives of "El Mundo" Newspaper published between 2020 and 2022. A few articles come from related publications: El Mundo Alicante, El Mundo Andalucía, El Mundo Baleares, El Mundo Catalunya, El Mundo Valéncia and Expans...

MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 2065.00 € | 5900.00 €
Licence: Commercial Use - ELRA VAR: 5900.00 € | 5900.00 €
NON MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 2360.00 € | 8850.00 €
Licence: Commercial Use - ELRA VAR: 8850.00 € | 8850.00 €
 A "scientific" corpus of modern French ("La Recherche" magazine) - Complete version    
  • French

ID: ELRA-W0025-02

ISLRN: 798-363-116-656-4

This "scientific" corpus of modern French was produced by the University of Nantes (France) within the European Commission funded project LRsP&P (Language Resources Production & Packaging - LE4-8335). The corpus contains all articles published in La Recherche magazine in 1998, including issues 30...

MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 400.00 € | 3000.00 €
Licence: Commercial Use - ELRA VAR: 3000.00 € | 3000.00 €
NON MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 500.00 € | 5000.00 €
Licence: Commercial Use - ELRA VAR: 5000.00 € | 5000.00 €
 A "scientific" corpus of modern French ("La Recherche" magazine) - Raw data    
  • French

ID: ELRA-W0025-01

ISLRN: 508-941-013-339-7

This "scientific" corpus of modern French was produced by the University of Nantes (France) with funding from ELRA in the framework of the European Commission project LRsP&P (Language Resources Production & Packaging - LE4-8335). The corpus contains all articles published in La Recherche mag...

MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 240.00 € | 1200.00 €
Licence: Commercial Use - ELRA VAR: 1200.00 € | 1200.00 €
NON MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 310.00 € | 1500.00 €
Licence: Commercial Use - ELRA VAR: 1500.00 € | 1500.00 €
 Catalan Corpus of News Articles    
  • Catalan; Valencian

ID: ELRA-W0047

ISLRN: 000-089-517-382-8

The Catalan Corpus of News Articles comprises articles in Catalan from 1 January 1999 to 31 March 2007. The articles are grouped by trimester, with no chronological ordering within each group. The DVD contains one folder per year. Each folder is divided into subfolders, containing the archives per tri...

MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 2975.00 € | 14855.00 €
Licence: Commercial Use - ELRA VAR: 14855.00 € | 14855.00 €
NON MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 3930.00 € | 19315.00 €
Licence: Commercial Use - ELRA VAR: 19315.00 € | 19315.00 €
 Catalan-Spanish Parallel Corpus    
  • Catalan; Valencian
  • Spanish; Castilian

ID: ELRA-W0053

ISLRN: 124-613-721-890-1

This corpus contains more than 100 million words, drawn from 10 years of bilingual articles from “El Periódico de Catalunya”. The two language versions are rather close, as the Catalan text is a translation of the Spanish one, produced partly by machine translation and then post-edited. The...

MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 2000.00 € | 20000.00 €
Licence: Commercial Use - ELRA VAR: 20000.00 € | 20000.00 €
NON MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 3000.00 € | 24000.00 €
Licence: Commercial Use - ELRA VAR: 24000.00 € | 24000.00 €
 Chinese-Vietnamese Parallel Corpus    
  • Chinese
  • Vietnamese

ID: ELRA-W0312

ISLRN: 128-772-037-486-0

The Chinese-Vietnamese Parallel Corpus consists of 200,000 sentence pairs, with an average length of 15 words per sentence. The corpus is provided in XML format and is annotated according to TEI-encoding guidelines.

MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 200.00 € | 400.00 €
Licence: Commercial Use - ELRA VAR: 1400.00 € | 1400.00 €
NON MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 300.00 € | 600.00 €
Licence: Commercial Use - ELRA VAR: 2100.00 € | 2100.00 €
 CINTIL-DeepBank    
  • Portuguese

ID: ELRA-W0062

ISLRN: 368-672-631-502-0

The CINTIL-DeepBank (Branco et al., 2010) is a corpus of sentences annotated with their full-fledged deep grammatical representations, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), and novels (399 sentences; 3,082...

MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € | 3000.00 €
Licence: Commercial Use - ELRA VAR: 3000.00 € | 3000.00 €
NON MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € | 3000.00 €
Licence: Commercial Use - ELRA VAR: 3000.00 € | 3000.00 €
 CINTIL-DependencyBank    
  • Portuguese

ID: ELRA-W0061

ISLRN: 133-035-138-613-6

The CINTIL-DependencyBank (Silva and Branco, 2012) is a corpus of sentences annotated with their syntactic dependency graphs and grammatical function tags, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), novels (399 ...

MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € | 3000.00 €
Licence: Commercial Use - ELRA VAR: 3000.00 € | 3000.00 €
NON MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € | 3000.00 €
Licence: Commercial Use - ELRA VAR: 3000.00 € | 3000.00 €
 CINTIL-PropBank    
  • Portuguese

ID: ELRA-W0056

ISLRN: 723-486-478-286-6

The CINTIL-PropBank is a corpus of sentences annotated with their constituency structure and semantic role tags, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), and novels (399 sentences; 3,082 tokens). In addition,...

MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € | 3000.00 €
Licence: Commercial Use - ELRA VAR: 3000.00 € | 3000.00 €
NON MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € | 3000.00 €
Licence: Commercial Use - ELRA VAR: 3000.00 € | 3000.00 €
 CINTIL-TreeBank    
  • Portuguese

ID: ELRA-W0055

ISLRN: 411-691-515-701-9

The CINTIL-TreeBank is a corpus of syntactic constituency trees of Portuguese texts, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens) and novels (399 sentences; 3,082 tokens). In addition, there are 779 sentences (5,654 t...

MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € | 3000.00 €
Licence: Commercial Use - ELRA VAR: 3000.00 € | 3000.00 €
NON MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € | 3000.00 €
Licence: Commercial Use - ELRA VAR: 3000.00 € | 3000.00 €
 Corpus for fine-grained analysis and automatic detection of irony on Twitter    
  • English

ID: ELRA-W0337

ISLRN: 478-366-550-085-8

The Corpus for fine-grained analysis and automatic detection of irony on Twitter was carefully annotated by trained annotators (Master’s students in Linguistics) using a detailed annotation scheme for irony categorization, which describes four labels: ‘ironic by means of a polarity contrast’, ‘si...

MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € | 100.00 €
Licence: Commercial Use - ELRA VAR: 100.00 € | 100.00 €
NON MEMBER (academic | commercial)
Licence: Non Commercial Use - ELRA END USER: 0.00 € | 200.00 €
Licence: Commercial Use - ELRA VAR: 200.00 € | 200.00 €
