Search and Browse – ELRA Catalogue

5 Language Resources

deL1L2IM corpus text

German

ID: ELRA-W0083

ISLRN: 339-799-085-669-8

The deL1L2IM corpus, created between May and August 2012 and last updated in August 2014, was collected within the framework of a PhD project on the development of a learning method involving conversations with an artificial companion. This PhD work is presented as a qualitative investigation...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	0.00 €	0.00 €

Monolingual Vietnamese Annotated Corpus text

Vietnamese

ID: ELRA-W0310

ISLRN: 004-081-406-421-7

The Monolingual Vietnamese Annotated Corpus consists of 100,000 sentences, manually annotated with word boundaries, POS, named entities, with an average length of 20 words per sentence. The corpus is provided in XML format and is annotated according to TEI-encoding guidelines.

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	500.00 €	900.00 €
Licence: Commercial Use - ELRA VAR	1800.00 €	1800.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	800.00 €	1300.00 €
Licence: Commercial Use - ELRA VAR	2500.00 €	2500.00 €

NUM 5M Mongolian written corpus text

Mongolian

ID: ELRA-W0120

ISLRN: 492-817-146-504-9

This is a corpus of Mongolian text, mostly from domains such as online or printed daily newspapers, literature, and laws. The collected raw text was reduced from 5 to 4.8 million words after cleaning. The cleaned corpus comprises: - 144 texts from laws until 2009, - 288 texts from literature t...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	5000.00 €
Licence: Commercial Use - ELRA VAR	5000.00 €	5000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	7000.00 €
Licence: Commercial Use - ELRA VAR	7000.00 €	7000.00 €

PAROLE Italian Corpus text

Italian

ID: ELRA-W0043

ISLRN: 608-362-291-385-1

The PAROLE Italian Corpus comprises 3,135,651 words collected from four different domains: • newspapers: 2,179,800 words from La Stampa, La Repubblica, Il Corriere della Sera, L’Unione Sarda, Il Sole 24ore, between 1992 and 1996, • periodicals: 143,810 words from Casaviva, 100cose, Epoca, Espan...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	100.00 €	100.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	150.00 €	150.00 €

Persian 1984 corpus (Multext-East framework) text

Persian

ID: ELRA-W0054

ISLRN: 851-240-629-673-1

This corpus contains the Persian (Farsi) translation of a part of the novel “1984” (G. Orwell) annotated in the Multext-East framework (Multilingual Text Tools and Corpora for Eastern and Central European Languages). The aim of the Multext-East project was to develop standardized language resourc...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	45.00 €	2000.00 €
Licence: Commercial Use - ELRA VAR	2000.00 €	2000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	100.00 €	5000.00 €
Licence: Commercial Use - ELRA VAR	5000.00 €	5000.00 €