Resource Type:
Corpus: | |
Lexical/Conceptual: | |
Tool/Service: | |
Language Description: |
Media Type:
Text: | |
Audio: | |
Image: | |
Video: | |
Text Numerical: | |
Text N-Gram: |
18 Language Resources
Order by:
- Arabic
- English
- French
ID: ELRA-W0323
ISLRN: 482-848-308-105-6The annotated tweet corpus in Arabizi, French and English was built by ELDA on behalf of INSA Rouen Normandie (Normandie Université, LITIS team), in the framework of the SAPhIRS project (System for the Analysis of Information Propagation in Social Networks), funded by the DGE (Direction Générale ...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
7000.00 €
|
Licence: Commercial Use - ELRA VAR |
7000.00 €
|
7000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
10000.00 €
|
Licence: Commercial Use - ELRA VAR |
10000.00 €
|
10000.00 €
|
- English
- French
- Italian
ID: ELRA-W0018
ISLRN: 681-769-134-114-2The ARCADE/ROMANSEVAL corpus was used as a reference corpus in two international competitions: · ARCADE, an exercise on multilingual text alignment financed by AUPELF-UREF · ROMANSEVAL, part of the SENSEVAL exercise sponsored by ACL-SIGLEX and EURALEX, on word sense disambiguation. The corpus ...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
2000.00 €
|
Licence: Commercial Use - ELRA VAR |
2000.00 €
|
2000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
5000.00 €
|
Licence: Commercial Use - ELRA VAR |
5000.00 €
|
5000.00 €
|
- Dutch; Flemish
- English
- Finnish
ID: ELRA-S0410
ISLRN: 072-357-063-759-1A multi-lingual speech corpus used for modeling language acquisition called CAREGIVER has been designed and recorded within the framework of the EU funded Acquisition of Communication and Recognition Skills (ACORNS) project. The motivation behind the corpus and its design relies on current knowle...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
- Arabic
- Chinese
- Croatian
- Czech
- Danish
- Dutch; Flemish
- English
- Finnish
- French
- German
- Hindi
- Italian
- Japanese
- Korean
- Modern Greek (1453-)
- Norwegian
- Persian
- Polish
- Portuguese
- Russian
- Spanish; Castilian
- Swedish
- Thai
- Turkish
- Vietnamese
ID: ELRA-S0383
ISLRN: 398-655-047-044-5The Collins Multilingual database covers Real Life Daily vocabulary. It is composed of a multilingual lexicon in 32 languages (the WordBank, see ELRA-T0376) and a multilingual set of sentences in 28 languages (the PhraseBank, see ELRA-T0377). This version includes the audio files corresponding t...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
3360.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
4480.00 €
|
- Arabic
- Chinese
- Croatian
- Czech
- Danish
- Dutch; Flemish
- English
- Finnish
- French
- German
- Italian
- Japanese
- Korean
- Modern Greek (1453-)
- Norwegian
- Polish
- Portuguese
- Russian
- Spanish; Castilian
- Swedish
- Thai
- Turkish
- Vietnamese
ID: ELRA-S0382
ISLRN: 309-438-781-042-2The Collins Multilingual database covers Real Life Daily vocabulary. It is composed of a multilingual lexicon in 32 languages (the WordBank, see ELRA-T0376) and a multilingual set of sentences in 28 languages (the PhraseBank, see ELRA-T0377). This version includes the corresponding audio files c...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
3640.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
5200.00 €
|
- English
- French
- Norwegian
- Spanish; Castilian
ID: ELRA-S0414
ISLRN: 631-345-309-445-9The Corpus of Interactions between Seniors and an Empathic Virtual Coach in Spanish, French and Norwegian was built within the EMPATHIC project (Empathic, Expressive, Advanced Virtual Coach to Improve Independent Healthy-Life-Years of the Elderly), funded within the European Union’s Horizon 2020 ...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
500.00 €
|
25000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
500.00 €
|
25000.00 €
|
Special offers are also available. Check here for details.
- English
- French
- Spanish; Castilian
ID: ELRA-W0033
ISLRN: 052-466-219-226-4The CRATER corpus was built upon the foundations of an earlier project, ET10/63, which was funded in the final phase of the Eurotra programme. The Corpus Resources and Terminology Extraction project (MLAP-93 20) extended the bilingual annotated English-French International Telecommunications Unio...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
25.00 €
|
Licence: Commercial Use - ELRA VAR |
25.00 €
|
25.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
125.00 €
|
Licence: Commercial Use - ELRA VAR |
125.00 €
|
125.00 €
|
- English
- French
- Spanish; Castilian
ID: ELRA-W0003
ISLRN: 645-721-607-031-5The Corpus Resources and Terminology Extraction project (MLAP-93 20) has extended the bilingual annotated English-French International Telecommunications Union corpus to include Spanish, and has also debugged the existing corpus. The offer consists of a multi-lingual aligned corpus of 1,000,000 t...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
20.00 €
|
Licence: Commercial Use - ELRA VAR |
20.00 €
|
20.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
100.00 €
|
Licence: Commercial Use - ELRA VAR |
100.00 €
|
100.00 €
|
- Albanian
- Bulgarian
- Chinese
- Czech
- Danish
- Dutch; Flemish
- English
- Estonian
- French
- German
- Italian
- Japanese
- Latin
- Lithuanian
- Malay (macrolanguage)
- Modern Greek (1453-)
- Norwegian
- Portuguese
- Russian
- Scottish Gaelic; Gaelic
- Serbian
- Spanish; Castilian
- Swedish
- Turkish
- Uzbek
ID: ELRA-W0004
ISLRN: 511-168-567-582-5The European Corpus Initiative (ECI) was founded to oversee the acquisition and preparation of a large multilingual corpus, and supports existing and projected national and international efforts to carefully design, collect and publish large-scale multilingual written and spoken corpora. ECI has ...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
50.00 €
|
50.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
50.00 €
|
50.00 €
|
- Chinese
- English
- Vietnamese
ID: ELRA-W0314
ISLRN: 637-630-726-817-9The English-Chinese-Vietnamese Trilingual Parallel Corpus consists of 20,046 trilingual sets of sentence pairs. The corpus is provided in XML format and is annotated according to TEI-encoding guidelines.
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
150.00 €
|
500.00 €
|
Licence: Commercial Use - ELRA VAR |
1000.00 €
|
1000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
225.00 €
|
750.00 €
|
Licence: Commercial Use - ELRA VAR |
1500.00 €
|
1500.00 €
|
- English
- Italian
- Spanish; Castilian
ID: ELRA-S0323
ISLRN: 716-168-855-843-2The EPIC corpus is a parallel corpus of European Parliament speeches and their corresponding simultaneous interpretations. This corpus includes source speeches in Italian, English and Spanish and interpreted speeches in all possible combinations and directions (from English into Italian and Spani...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
- Arabic
- English
- French
ID: ELRA-E0045
ISLRN: 364-018-517-901-2The MAURDOR project consists in evaluating systems for automatic processing of written documents. Collected written documents are scanned documents (printed, typewritten or manuscripts). In order to get images for the evaluation of automatic analysis systems, 10,000 original documents were c...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
500.00 €
|
10000.00 €
|
Licence: Evaluation Use - ELRA EVALUATION |
5000.00 €
| |
Licence: Commercial Use - ELRA VAR |
10000.00 €
|
10000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
750.00 €
|
15000.00 €
|
Licence: Evaluation Use - ELRA EVALUATION |
7500.00 €
| |
Licence: Commercial Use - ELRA VAR |
15000.00 €
|
15000.00 €
|
- Dutch; Flemish
- English
- French
- German
ID: ELRA-S0238
ISLRN: 189-835-264-931-4In 1996, some 75 Dutch people participated in recording a multi-purpose continuous speech database. Most of them were recruited from the TNO Human Factors Research Institute, where the recordings were made. The main part of the database consisted of Dutch sentences. However, most speakers partici...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
400.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
500.00 €
|
- Danish
- Dutch; Flemish
- English
- French
- German
- Italian
- Modern Greek (1453-)
- Portuguese
- Spanish; Castilian
ID: ELRA-W0023
ISLRN: 963-635-729-341-8The MLCC text corpus has two main components - one set to allow comparable studies to be carried out in different languages and one set as the basis for translation studies. The first set is referred as the Polylingual Document Collection, a collection of newspaper articles from financial new...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
1600.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
3600.00 €
|
- English
- French
- German
- Italian
- Spanish; Castilian
ID: ELRA-W0017
ISLRN: 900-482-746-635-0This CD-ROM contains a part of the corpus developed in the MULTEXT project financed by the European Commission (LRE 62-050). This part contains raw, tagged and aligned data from the Written Questions and Answers of the Official Journal of the European Community. The corpus contains approx. 5 mill...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
2000.00 €
|
Licence: Commercial Use - ELRA VAR |
2000.00 €
|
2000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
5000.00 €
|
Licence: Commercial Use - ELRA VAR |
5000.00 €
|
5000.00 €
|
- Chinese
- English
- Korean
ID: ELRA-W0035
ISLRN: 731-151-596-869-3Multilingual parallel corpus produced by Kaist Korterm containing 60 000 expressions in Korean, Chinese and English.
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
750.00 €
|
3000.00 €
|
Licence: Commercial Use - ELRA VAR |
3000.00 €
|
3000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
1500.00 €
|
6000.00 €
|
Licence: Commercial Use - ELRA VAR |
6000.00 €
|
6000.00 €
|
- Assamese
- Bengali
- English
- Gujarati
- Hindi
- Kannada
- Kashmiri
- Malayalam
- Marathi
- Oriya (macrolanguage)
- Panjabi; Punjabi
- Sinhala; Sinhalese
- Tamil
- Telugu
- Urdu
ID: ELRA-W0037
ISLRN: 039-846-040-604-0The EMILLE/CIIL Corpus consists of three components: monolingual, parallel and annotated corpora. There are fourteen monolingual corpora, including both written and (for some languages) spoken data for fourteen South Asian languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Malayala...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
- English
- French
- German
ID: ELRA-W0013
ISLRN: 717-350-913-018-8The TSNLP project (LRE 62-089) has produced a database of test suites for English, French and German containing over 4,000 test items (sentences or fragment of sentences) per language which have been constructed for evaluating natural language processing systems, but which may also be useful for ...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
100.00 €
|
Licence: Commercial Use - ELRA VAR |
100.00 €
|
100.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
100.00 €
|
Licence: Commercial Use - ELRA VAR |
100.00 €
|
100.00 €
|