Resource Type:
Corpus: | ![]() |
Lexical/Conceptual: | ![]() |
Tool/Service: | ![]() |
Language Description: | ![]() |
Media Type:
Text: | ![]() |
Audio: | ![]() |
Image: | ![]() |
Video: | ![]() |
Text Numerical: | ![]() |
Text N-Gram: | ![]() |
57 Language Resources (Page 1 of 3)
« Previous | Next »Order by:


- Arabic
ID: ELRA-W0030
ISLRN: 365-777-769-398-7The corpus was developed in the course of a research project at the University of Essex, in collaboration with the Open University. The corpus contains Al-Hayat newspaper articles with value added for Language Engineering and Information Retrieval applications development purposes. The data have ...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
480.00 €
![]() |
960.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
960.00 €
![]() |
960.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
720.00 €
![]() |
1440.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
1440.00 €
![]() |
1440.00 €
![]() |


- Esperanto
ID: ELRA-W0129
ISLRN: 185-602-618-699-2The Arbobanko (Esperanto Treebank) is a 52,000 token dependency treebank of Esperanto with texts from the MONATO news magazine, consisting of random excerpts from the period 2000-2010. All words were annotated for lemma, part-of-speech, inflection, compounding and affixing, syntactic function, de...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
900.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
900.00 €
![]() |
900.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
1500.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
1500.00 €
![]() |
1500.00 €
![]() |


- Danish
ID: ELRA-W0084
ISLRN: 025-729-182-451-2The Arboretum treebank is a morphologically and syntactically annotated repository of Danish sentences, taken from Korpus 90 and Korpus 2000, both compiled by the Society for Danish Language and Literature (http://ordnet.dk/korpusdk/fakta), and containing samples of written Danish from the 90'ies...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
1500.00 €
![]() |
7000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
7000.00 €
![]() |
7000.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
2200.00 €
![]() |
10000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
10000.00 €
![]() |
10000.00 €
![]() |


- French
ID: ELRA-W0025-02
ISLRN: 798-363-116-656-4This "scientific" corpus of modern French was produced by the University of Nantes (France) within the European Commission funded project LRsP&P (Language Resources Production & Packaging - LE4-8335). The corpus contains all articles published in La Recherche magazine in 1998, including issues 30...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
400.00 €
![]() |
3000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
3000.00 €
![]() |
3000.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
500.00 €
![]() |
5000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
5000.00 €
![]() |
5000.00 €
![]() |


- French
ID: ELRA-W0025-01
ISLRN: 508-941-013-339-7This "scientific" corpus of modern French was produced by the University of Nantes (France) through a funding from ELRA in the framework of the European Commission project LRsP&P (Language Resources Production & Packaging - LE4-8335). The corpus contains all articles published in La Recherche mag...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
240.00 €
![]() |
1200.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
1200.00 €
![]() |
1200.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
310.00 €
![]() |
1500.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
1500.00 €
![]() |
1500.00 €
![]() |


- Catalan; Valencian
ID: ELRA-W0047
ISLRN: 000-089-517-382-8The Catalan Corpus of News Articles comprises articles in Catalan from 1 January 1999 to 31 March 2007. These articles are grouped per trimester without chronological order inside. The DVD contains one folder per year. Each folder has been divided into subfolders, containing the archives per tri...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
2975.00 €
![]() |
14855.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
14855.00 €
![]() |
14855.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
3930.00 €
![]() |
19315.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
19315.00 €
![]() |
19315.00 €
![]() |


- Catalan; Valencian
- Spanish; Castilian
ID: ELRA-W0053
ISLRN: 124-613-721-890-1This corpus contains more than 100 million words and it contains 10 years of bilingual articles from “El Periódico de Catalunya”. Both language data are rather close as the Catalan text is a translation of the Spanish one, partly achieved by means of Machine translation and then post-edited. The...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
2000.00 €
![]() |
20000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
20000.00 €
![]() |
20000.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
3000.00 €
![]() |
24000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
24000.00 €
![]() |
24000.00 €
![]() |


- Portuguese
ID: ELRA-W0062
ISLRN: 368-672-631-502-0The CINTIL-DeepBank (Branco et al., 2010) is a corpus of sentences annotated with their full-fledged deep grammatical representations, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), and novels (399 sentences; 3,082...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
3000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
3000.00 €
![]() |
3000.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
3000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
3000.00 €
![]() |
3000.00 €
![]() |


- Portuguese
ID: ELRA-W0061
ISLRN: 133-035-138-613-6The CINTIL-DependencyBank (Silva and Branco, 2012) is a corpus of sentences annotated with their syntactic dependency graphs and grammatical function tags composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), novels (399 ...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
3000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
3000.00 €
![]() |
3000.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
3000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
3000.00 €
![]() |
3000.00 €
![]() |


- Portuguese
ID: ELRA-W0056
ISLRN: 723-486-478-286-6The CINTIL-PropBank is a corpus of sentences annotated with their constituency structure and semantic role tags, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), and novels (399 sentences; 3,082 tokens). In addition,...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
3000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
3000.00 €
![]() |
3000.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
3000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
3000.00 €
![]() |
3000.00 €
![]() |


- Portuguese
ID: ELRA-W0055
ISLRN: 411-691-515-701-9The CINTIL-TreeBank is a corpus of syntactic constituency trees of Portuguese texts composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), novels (399 sentences; 3,082 tokens). In addition, there are 779 sentences (5,654 t...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
3000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
3000.00 €
![]() |
3000.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
3000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
3000.00 €
![]() |
3000.00 €
![]() |


- Spanish; Castilian
ID: ELRA-W0041
ISLRN: 837-873-214-287-0This corpus consists of 11 novels written in Castilian Spanish by Inmaculada Ferrer-Vidal Turull, a contemporaneous author. The list of novels consists of: - La búsqueda: 113,639 words - Tristeza: 41,125 words - Cuarto menguante: 42,419 words - Recuerdos: 55,694 words - Sucedió en Abril: 46,040 w...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
400.00 €
![]() |
800.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
800.00 €
![]() |
800.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
500.00 €
![]() |
1000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
1000.00 €
![]() |
1000.00 €
![]() |


- Danish
ID: ELRA-W0117
ISLRN: 213-212-351-142-5The Danish Propbank (DPB) is a multi-layer treebank, annotated not only with morphosyntactic, but also with semantic information, in particular propositions/frames with VerbNet classes and semantic roles for both arguments and satellites. In addition, the corpus has been annotated with 20 Named E...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
150.00 €
![]() |
3000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
5000.00 €
![]() |
5000.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
800.00 €
![]() |
7000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
7000.00 €
![]() |
7000.00 €
![]() |


- German
ID: ELRA-W0083
ISLRN: 339-799-085-669-8The deL1L2IM corpus, created between May and August 2012 and last updated in August 2014, has been collected within the framework of a PhD project on the development of a learning method implying conversations with an artificial companion. This PhD work is presented as a qualitative investigation...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
0.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
0.00 €
![]() |
0.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
0.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
0.00 €
![]() |
0.00 €
![]() |


- Dutch; Flemish
ID: ELRA-W0019
ISLRN: 440-290-917-102-7The Dutch PAROLE Distributable Corpus is a 3 million words selection from the 20 million words Dutch PAROLE Reference corpus. The Dutch corpus annotation and checking was made accordingly to the common core PAROLE tagset. The Dutch data were also checked for type. The Dutch PAROLE Distributable...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
270.00 €
![]() |
800.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
1600.00 €
![]() |
1600.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
300.00 €
![]() |
1300.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
2500.00 €
![]() |
2500.00 €
![]() |
Special offers are also available. Check here for details.


- Swahili (macrolanguage)
ID: ELRA-W0119
ISLRN: 941-187-059-145-7This is a text corpus of Swahili language of 25 million words, annotated for part-of-speech, morphology and syntax. The corpus contains prose text from fiction, news media and government documents domains, from the period between 1953 and 2016. This package contains: - the Helsinki Corpus of Swa...
MEMBER | academic | commercial |
---|---|---|
Licence: Commercial Use - ELRA VAR |
7500.00 €
![]() |
7500.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Commercial Use - ELRA VAR |
15000.00 €
![]() |
15000.00 €
![]() |


- Italian
ID: ELRA-W0044
ISLRN: 927-246-660-947-9ISST comprises 89,941 tokens for the financial-domain part and 215,606 tokens for the general part. It is formatted in XML. ISST has a five-level structure covering orthographic, morpho-syntactic, syntactic and semantic levels of linguistic description. Syntactic annotation is distributed over t...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
100.00 €
![]() |
1500.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
1500.00 €
![]() |
1500.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
150.00 €
![]() |
2500.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
2500.00 €
![]() |
2500.00 €
![]() |


- German
ID: ELRA-W0016
ISLRN: 628-817-117-400-1The "Karl-May-Korpus" is a monolingual German corpus, available in an SGML-tagged ASCII text format. It contains the works of the German author Karl May (1842-1912) and consists of around 1.6 million words (divided into 9 subcorpora of about 180,000 words each). The corpus was created between 199...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
400.00 €
![]() |
2500.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
2500.00 €
![]() |
2500.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
800.00 €
![]() |
3500.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
3500.00 €
![]() |
3500.00 €
![]() |


- English
ID: ELRA-W0081
ISLRN: 764-036-829-417-7The Manually Annotated Reference Corpus is a collection of English web documents annotated with key entities (such as disease, drug), built in the framework of the Khresmoi project, funded by the European Commission. It has been constructed by first annotating these entities with an imperfect aut...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
0.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
0.00 €
![]() |
0.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
![]() |
0.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
0.00 €
![]() |
0.00 €
![]() |


- Arabic
ID: ELRA-W0049
ISLRN: 124-139-628-259-2This corpus contains 102,960 vowelised, lemmatised and tagged words (58 texts from Le Monde Diplomatique Arabic, see also ELRA-W0036-04). To each text are associated 3 files : - raw text in Arabic, - vowelized text in Arabic, - one XML file containing the morphological annotation of the text. ...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
185.00 €
![]() |
975.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
975.00 €
![]() |
975.00 €
![]() |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
400.00 €
![]() |
2000.00 €
![]() |
Licence: Commercial Use - ELRA VAR |
2000.00 €
![]() |
2000.00 €
![]() |
« Previous | Next »