Resource Type:
Corpus: | ![]() |
Lexical/Conceptual: | ![]() |
Tool/Service: | ![]() |
Language Description: | ![]() |
Media Type:
Text: | ![]() |
Audio: | ![]() |
Image: | ![]() |
Video: | ![]() |
Text Numerical: | ![]() |
Text N-Gram: | ![]() |
93 Language Resources (Page 1 of 5)
« Previous | Next »Order by:


- Arabic
ID: ELRA-W0030
ISLRN: 365-777-769-398-7The corpus was developed in the course of a research project at the University of Essex, in collaboration with the Open University. The corpus contains Al-Hayat newspaper articles with value added for Language Engineering and Information Retrieval applications development purposes. The data have ...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
480.00 €
![]() |
960.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
960.00 €
![]() |
960.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
720.00 €
![]() |
1440.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
1440.00 €
![]() |
1440.00 €
![]() |


- Amharic
- English
ID: ELRA-W0074
ISLRN: 590-255-335-719-0The Amharic-English bilingual corpus contains parallel text from legal and news domains in Amharic script, in transliterated form and in English. The size of the corpus is of 232,653 words in Amharic and 291,701 in English. This parallel corpus contains documents from two domains, namely legal...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
2000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
2000.00 €
![]() |
2000.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
4000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
4000.00 €
![]() |
4000.00 €
![]() |


- Esperanto
ID: ELRA-W0129
ISLRN: 185-602-618-699-2The Arbobanko (Esperanto Treebank) is a 52,000 token dependency treebank of Esperanto with texts from the MONATO news magazine, consisting of random excerpts from the period 2000-2010. All words were annotated for lemma, part-of-speech, inflection, compounding and affixing, syntactic function, de...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
900.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
900.00 €
![]() |
900.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
1500.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
1500.00 €
![]() |
1500.00 €
![]() |


- Danish
ID: ELRA-W0084
ISLRN: 025-729-182-451-2The Arboretum treebank is a morphologically and syntactically annotated repository of Danish sentences, taken from Korpus 90 and Korpus 2000, both compiled by the Society for Danish Language and Literature (http://ordnet.dk/korpusdk/fakta), and containing samples of written Danish from the 90'ies...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
1500.00 €
![]() |
7000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
7000.00 €
![]() |
7000.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
2200.00 €
![]() |
10000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
10000.00 €
![]() |
10000.00 €
![]() |


- English
- French
- Italian
ID: ELRA-W0018
ISLRN: 681-769-134-114-2The ARCADE/ROMANSEVAL corpus was used as a reference corpus in two international competitions: · ARCADE, an exercise on multilingual text alignment financed by AUPELF-UREF · ROMANSEVAL, part of the SENSEVAL exercise sponsored by ACL-SIGLEX and EURALEX, on word sense disambiguation. The corpus ...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
2000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
2000.00 €
![]() |
2000.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
5000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
5000.00 €
![]() |
5000.00 €
![]() |


- French
ID: ELRA-W0025-02
ISLRN: 798-363-116-656-4This "scientific" corpus of modern French was produced by the University of Nantes (France) within the European Commission funded project LRsP&P (Language Resources Production & Packaging - LE4-8335). The corpus contains all articles published in La Recherche magazine in 1998, including issues 30...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
400.00 €
![]() |
3000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
3000.00 €
![]() |
3000.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
500.00 €
![]() |
5000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
5000.00 €
![]() |
5000.00 €
![]() |


- French
ID: ELRA-W0025-01
ISLRN: 508-941-013-339-7This "scientific" corpus of modern French was produced by the University of Nantes (France) through a funding from ELRA in the framework of the European Commission project LRsP&P (Language Resources Production & Packaging - LE4-8335). The corpus contains all articles published in La Recherche mag...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
240.00 €
![]() |
1200.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
1200.00 €
![]() |
1200.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
310.00 €
![]() |
1500.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
1500.00 €
![]() |
1500.00 €
![]() |


- Catalan; Valencian
ID: ELRA-W0047
ISLRN: 000-089-517-382-8The Catalan Corpus of News Articles comprises articles in Catalan from 1 January 1999 to 31 March 2007. These articles are grouped per trimester without chronological order inside. The DVD contains one folder per year. Each folder has been divided into subfolders, containing the archives per tri...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
2975.00 €
![]() |
14855.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
14855.00 €
![]() |
14855.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
3930.00 €
![]() |
19315.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
19315.00 €
![]() |
19315.00 €
![]() |


- Catalan; Valencian
- Spanish; Castilian
ID: ELRA-W0053
ISLRN: 124-613-721-890-1This corpus contains more than 100 million words and it contains 10 years of bilingual articles from “El Periódico de Catalunya”. Both language data are rather close as the Catalan text is a translation of the Spanish one, partly achieved by means of Machine translation and then post-edited. The...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
2000.00 €
![]() |
20000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
20000.00 €
![]() |
20000.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
3000.00 €
![]() |
24000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
24000.00 €
![]() |
24000.00 €
![]() |


- Portuguese
ID: ELRA-W0062
ISLRN: 368-672-631-502-0The CINTIL-DeepBank (Branco et al., 2010) is a corpus of sentences annotated with their full-fledged deep grammatical representations, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), and novels (399 sentences; 3,082...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
3000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
3000.00 €
![]() |
3000.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
3000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
3000.00 €
![]() |
3000.00 €
![]() |


- Portuguese
ID: ELRA-W0061
ISLRN: 133-035-138-613-6The CINTIL-DependencyBank (Silva and Branco, 2012) is a corpus of sentences annotated with their syntactic dependency graphs and grammatical function tags composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), novels (399 ...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
3000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
3000.00 €
![]() |
3000.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
3000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
3000.00 €
![]() |
3000.00 €
![]() |


- Portuguese
ID: ELRA-W0056
ISLRN: 723-486-478-286-6The CINTIL-PropBank is a corpus of sentences annotated with their constituency structure and semantic role tags, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), and novels (399 sentences; 3,082 tokens). In addition,...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
3000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
3000.00 €
![]() |
3000.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
3000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
3000.00 €
![]() |
3000.00 €
![]() |


- Portuguese
ID: ELRA-W0055
ISLRN: 411-691-515-701-9The CINTIL-TreeBank is a corpus of syntactic constituency trees of Portuguese texts composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), novels (399 sentences; 3,082 tokens). In addition, there are 779 sentences (5,654 t...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
3000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
3000.00 €
![]() |
3000.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
3000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
3000.00 €
![]() |
3000.00 €
![]() |


- Spanish; Castilian
ID: ELRA-W0041
ISLRN: 837-873-214-287-0This corpus consists of 11 novels written in Castilian Spanish by Inmaculada Ferrer-Vidal Turull, a contemporaneous author. The list of novels consists of: - La búsqueda: 113,639 words - Tristeza: 41,125 words - Cuarto menguante: 42,419 words - Recuerdos: 55,694 words - Sucedió en Abril: 46,040 w...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
400.00 €
![]() |
800.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
800.00 €
![]() |
800.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
500.00 €
![]() |
1000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
1000.00 €
![]() |
1000.00 €
![]() |


- English
- French
- Spanish; Castilian
ID: ELRA-W0033
ISLRN: 052-466-219-226-4The CRATER corpus was built upon the foundations of an earlier project, ET10/63, which was funded in the final phase of the Eurotra programme. The Corpus Resources and Terminology Extraction project (MLAP-93 20) extended the bilingual annotated English-French International Telecommunications Unio...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
25.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
25.00 €
![]() |
25.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
125.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
125.00 €
![]() |
125.00 €
![]() |


- English
- French
- Spanish; Castilian
ID: ELRA-W0003
ISLRN: 645-721-607-031-5The Corpus Resources and Terminology Extraction project (MLAP-93 20) has extended the bilingual annotated English-French International Telecommunications Union corpus to include Spanish, and has also debugged the existing corpus. The offer consists of a multi-lingual aligned corpus of 1,000,000 t...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
20.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
20.00 €
![]() |
20.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
100.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
100.00 €
![]() |
100.00 €
![]() |


- Danish
ID: ELRA-W0117
ISLRN: 213-212-351-142-5The Danish Propbank (DPB) is a multi-layer treebank, annotated not only with morphosyntactic, but also with semantic information, in particular propositions/frames with VerbNet classes and semantic roles for both arguments and satellites. In addition, the corpus has been annotated with 20 Named E...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
150.00 €
![]() |
3000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
5000.00 €
![]() |
5000.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
800.00 €
![]() |
7000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
7000.00 €
![]() |
7000.00 €
![]() |


- German
ID: ELRA-W0083
ISLRN: 339-799-085-669-8The deL1L2IM corpus, created between May and August 2012 and last updated in August 2014, has been collected within the framework of a PhD project on the development of a learning method implying conversations with an artificial companion. This PhD work is presented as a qualitative investigation...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
0.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
0.00 €
![]() |
0.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
0.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
0.00 €
![]() |
0.00 €
![]() |


- Dutch; Flemish
ID: ELRA-W0019
ISLRN: 440-290-917-102-7The Dutch PAROLE Distributable Corpus is a 3 million words selection from the 20 million words Dutch PAROLE Reference corpus. The Dutch corpus annotation and checking was made accordingly to the common core PAROLE tagset. The Dutch data were also checked for type. The Dutch PAROLE Distributable...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
270.00 €
![]() |
800.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
1600.00 €
![]() |
1600.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
300.00 €
![]() |
1300.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
2500.00 €
![]() |
2500.00 €
![]() |
Special offers are also available. Check here for details.


- English
- Persian
ID: ELRA-W0118
ISLRN: 074-825-114-781-7The English-Persian parallel corpus contains more than 200,000 aligned sentences across a variety of text types from the domains of art, law, culture, science, religion, literature, medicine, idioms, politics and others. It is an extension of the English-Persian parallel corpus already distribute...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
1000.00 €
![]() |
5000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
5000.00 €
![]() |
5000.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
1200.00 €
![]() |
6000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
6000.00 €
![]() |
6000.00 €
![]() |
« Previous | Next »