94 Language Resources (Page 1 of 5)


- French
ID: ELRA-W0082
ISLRN: 024-713-187-947-8
A multidisciplinary team of linguists and computer scientists (Rachel Panckhurst, Catherine Détrie, Cédric Lopez, Claudine Moïse, Mathieu Roche, Bertrand Verine; Praxiling, Lirmm, Lidilem, Tetis, Viseo) collected more than 88,000 authentic French text messages in Montpellier (2011), as part of th...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - Non Standard Licence Terms | 0.00 € | 0.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - Non Standard Licence Terms | 0.00 € | 0.00 € |


- Arabic
ID: ELRA-W0030
ISLRN: 365-777-769-398-7
The corpus was developed in the course of a research project at the University of Essex, in collaboration with the Open University. The corpus contains Al-Hayat newspaper articles with value added for Language Engineering and Information Retrieval applications development purposes. The data have ...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 480.00 € | 960.00 € |
Licence: Commercial Use - ELRA VAR | 960.00 € | 960.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 720.00 € | 1440.00 € |
Licence: Commercial Use - ELRA VAR | 1440.00 € | 1440.00 € |


- French
ID: ELRA-W0029
ISLRN: 786-395-313-491-8
Launched at the end of 1995, the AMARYLLIS project aimed at evaluating information retrieval software for French text corpora in order to provide a methodology for the evaluation of other similar tools. AMARYLLIS was organised by the Institut de l'Information Scientifique et Technique (INIST) wit...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 45.00 € | 100.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 45.00 € | 100.00 € |


- Arabic
ID: ELRA-W0027
ISLRN: 083-457-618-309-8
The An-Nahar Lebanon Newspaper Text Corpus comprises articles in standard Arabic from 1995 to 2000 (6 years) stored as HTML files on CD-ROM media. Each year contains 45,000 articles and 24 million words. Each article includes information such as title, newspaper's name, date, country, type, page, ...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 2016.00 € | 3192.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 3024.00 € | 4788.00 € |
Special offers are also available. Check here for details.


- Esperanto
ID: ELRA-W0129
ISLRN: 185-602-618-699-2
The Arbobanko (Esperanto Treebank) is a 52,000 token dependency treebank of Esperanto with texts from the MONATO news magazine, consisting of random excerpts from the period 2000-2010. All words were annotated for lemma, part-of-speech, inflection, compounding and affixing, syntactic function, de...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 900.00 € |
Licence: Commercial Use - ELRA VAR | 900.00 € | 900.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 1500.00 € |
Licence: Commercial Use - ELRA VAR | 1500.00 € | 1500.00 € |


- Danish
ID: ELRA-W0084
ISLRN: 025-729-182-451-2
The Arboretum treebank is a morphologically and syntactically annotated repository of Danish sentences, taken from Korpus 90 and Korpus 2000, both compiled by the Society for Danish Language and Literature (http://ordnet.dk/korpusdk/fakta), and containing samples of written Danish from the 90'ies...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 1500.00 € | 7000.00 € |
Licence: Commercial Use - ELRA VAR | 7000.00 € | 7000.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 2200.00 € | 10000.00 € |
Licence: Commercial Use - ELRA VAR | 10000.00 € | 10000.00 € |


- French
ID: ELRA-W0025-02
ISLRN: 798-363-116-656-4
This "scientific" corpus of modern French was produced by the University of Nantes (France) within the European Commission funded project LRsP&P (Language Resources Production & Packaging - LE4-8335). The corpus contains all articles published in La Recherche magazine in 1998, including issues 30...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 400.00 € | 3000.00 € |
Licence: Commercial Use - ELRA VAR | 3000.00 € | 3000.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 500.00 € | 5000.00 € |
Licence: Commercial Use - ELRA VAR | 5000.00 € | 5000.00 € |


- French
ID: ELRA-W0025-01
ISLRN: 508-941-013-339-7
This "scientific" corpus of modern French was produced by the University of Nantes (France) with funding from ELRA in the framework of the European Commission project LRsP&P (Language Resources Production & Packaging - LE4-8335). The corpus contains all articles published in La Recherche mag...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 240.00 € | 1200.00 € |
Licence: Commercial Use - ELRA VAR | 1200.00 € | 1200.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 310.00 € | 1500.00 € |
Licence: Commercial Use - ELRA VAR | 1500.00 € | 1500.00 € |


- Catalan; Valencian
ID: ELRA-W0047
ISLRN: 000-089-517-382-8
The Catalan Corpus of News Articles comprises articles in Catalan from 1 January 1999 to 31 March 2007. These articles are grouped per trimester without chronological order inside. The DVD contains one folder per year. Each folder has been divided into subfolders, containing the archives per tri...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 2975.00 € | 14855.00 € |
Licence: Commercial Use - ELRA VAR | 14855.00 € | 14855.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 3930.00 € | 19315.00 € |
Licence: Commercial Use - ELRA VAR | 19315.00 € | 19315.00 € |


- Catalan; Valencian
- Spanish; Castilian
ID: ELRA-W0053
ISLRN: 124-613-721-890-1
This corpus contains more than 100 million words, covering 10 years of bilingual articles from “El Periódico de Catalunya”. The two language versions are rather close, as the Catalan text is a translation of the Spanish one, partly produced by machine translation and then post-edited. The...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 2000.00 € | 20000.00 € |
Licence: Commercial Use - ELRA VAR | 20000.00 € | 20000.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 3000.00 € | 24000.00 € |
Licence: Commercial Use - ELRA VAR | 24000.00 € | 24000.00 € |


- French
ID: ELRA-E0019
ISLRN: 154-799-255-123-0
The CESART Evaluation Package was produced within the French national project CESART (Evaluation of terminology extraction tools), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The CESART project made it possible to carry out a campaign for th...
MEMBER | academic | commercial |
---|---|---|
Licence: Evaluation Use - ELRA EVALUATION | 150.00 € | 500.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Evaluation Use - ELRA EVALUATION | 300.00 € | 1000.00 € |


- Portuguese
ID: ELRA-W0062
ISLRN: 368-672-631-502-0
The CINTIL-DeepBank (Branco et al., 2010) is a corpus of sentences annotated with their full-fledged deep grammatical representations, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), and novels (399 sentences; 3,082...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 3000.00 € |
Licence: Commercial Use - ELRA VAR | 3000.00 € | 3000.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 3000.00 € |
Licence: Commercial Use - ELRA VAR | 3000.00 € | 3000.00 € |


- Portuguese
ID: ELRA-W0061
ISLRN: 133-035-138-613-6
The CINTIL-DependencyBank (Silva and Branco, 2012) is a corpus of sentences annotated with their syntactic dependency graphs and grammatical function tags composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), novels (399 ...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 3000.00 € |
Licence: Commercial Use - ELRA VAR | 3000.00 € | 3000.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 3000.00 € |
Licence: Commercial Use - ELRA VAR | 3000.00 € | 3000.00 € |


- Portuguese
ID: ELRA-W0056
ISLRN: 723-486-478-286-6
The CINTIL-PropBank is a corpus of sentences annotated with their constituency structure and semantic role tags, composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), and novels (399 sentences; 3,082 tokens). In addition,...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 3000.00 € |
Licence: Commercial Use - ELRA VAR | 3000.00 € | 3000.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 3000.00 € |
Licence: Commercial Use - ELRA VAR | 3000.00 € | 3000.00 € |


- Portuguese
ID: ELRA-W0055
ISLRN: 411-691-515-701-9
The CINTIL-TreeBank is a corpus of syntactic constituency trees of Portuguese texts composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), novels (399 sentences; 3,082 tokens). In addition, there are 779 sentences (5,654 t...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 3000.00 € |
Licence: Commercial Use - ELRA VAR | 3000.00 € | 3000.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 3000.00 € |
Licence: Commercial Use - ELRA VAR | 3000.00 € | 3000.00 € |


- English
ID: ELRA-E0042
ISLRN: 749-772-201-451-6
The CLEF Initiative (Conference and Labs of the Evaluation Forum) promotes the systematic evaluation of information access systems through experimentation on shared tasks, with an emphasis on multilingual and multimodal information. The CLEFeHealth 2013 Task 3 Evaluation Package contains data ...
MEMBER | academic | commercial |
---|---|---|
Licence: Evaluation Use - ELRA EVALUATION | 0.00 € | 0.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Evaluation Use - ELRA EVALUATION | 0.00 € | 0.00 € |


- English
ID: ELRA-E0043
ISLRN: 725-020-897-275-7
The CLEF Initiative (Conference and Labs of the Evaluation Forum) promotes the systematic evaluation of information access systems through experimentation on shared tasks, with an emphasis on multilingual and multimodal information. It is structured in two main parts: a series of Evaluation Labs ...
MEMBER | academic | commercial |
---|---|---|
Licence: Evaluation Use - ELRA EVALUATION | 0.00 € | 0.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Evaluation Use - ELRA EVALUATION | 0.00 € | 0.00 € |


- English
- French
- Spanish; Castilian
ID: ELRA-E0039
ISLRN: 460-370-870-489-0
The Cross-Language Evaluation Forum (CLEF) promotes R&D in multilingual information access (MLIA) by (i) developing an infrastructure for the testing, tuning and evaluation of information retrieval systems operating on European languages in both monolingual and cross-language contexts, and (ii) c...
MEMBER | academic | commercial |
---|---|---|
Licence: Evaluation Use - ELRA EVALUATION | 150.00 € | 500.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Evaluation Use - ELRA EVALUATION | 300.00 € | 1000.00 € |
Special offers are also available. Check here for details.


- Spanish; Castilian
ID: ELRA-W0041
ISLRN: 837-873-214-287-0
This corpus consists of 11 novels written in Castilian Spanish by Inmaculada Ferrer-Vidal Turull, a contemporary author. The list of novels consists of: - La búsqueda: 113,639 words - Tristeza: 41,125 words - Cuarto menguante: 42,419 words - Recuerdos: 55,694 words - Sucedió en Abril: 46,040 w...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 400.00 € | 800.00 € |
Licence: Commercial Use - ELRA VAR | 800.00 € | 800.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 500.00 € | 1000.00 € |
Licence: Commercial Use - ELRA VAR | 1000.00 € | 1000.00 € |


- Icelandic
ID: ELRA-W0298
ISLRN: 420-670-865-427-1
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Corpus of Icelandic texts from the Central Bank of Icela...
MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Other - Open Under-PSI | 0.00 € | 0.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Other - Open Under-PSI | 0.00 € | 0.00 € |