Search and Browse – ELRA Catalogue

Chinese
Vietnamese

ID: ELRA-W0312

The Chinese-Vietnamese Parallel Corpus consists of 200,000 sentence pairs, with an average length of 15 words per sentence. The corpus is provided in XML format and is annotated according to TEI-encoding guidelines.

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	200.00 €	400.00 €
Licence: Commercial Use - ELRA VAR	1400.00 €	1400.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	300.00 €	600.00 €
Licence: Commercial Use - ELRA VAR	2100.00 €	2100.00 €

deL1L2IM corpus text

German

ID: ELRA-W0083

ISLRN: 339-799-085-669-8

The deL1L2IM corpus, created between May and August 2012 and last updated in August 2014, has been collected within the framework of a PhD project on the development of a learning method implying conversations with an artificial companion. This PhD work is presented as a qualitative investigation...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	0.00 €	0.00 €

English-Chinese-Vietnamese Trilingual Parallel Corpus text

Chinese
English
Vietnamese

ID: ELRA-W0314

ISLRN: 637-630-726-817-9

The English-Chinese-Vietnamese Trilingual Parallel Corpus consists of 20,046 trilingual sets of sentence pairs. The corpus is provided in XML format and is annotated according to TEI-encoding guidelines.

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	150.00 €	500.00 €
Licence: Commercial Use - ELRA VAR	1000.00 €	1000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	225.00 €	750.00 €
Licence: Commercial Use - ELRA VAR	1500.00 €	1500.00 €

English-Vietnamese Parallel Corpus text

English
Vietnamese

ID: ELRA-W0124

ISLRN: 838-483-738-912-8

This is a corpus of 500,000 English-Vietnamese sentence pairs, built to develop SMT (Statistical Machine Translation) systems. The parallel corpus contains English documents translated by professional translators into Vietnamese. The source texts include books, dictionaries, newspapers, online ne...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	600.00 €	1200.00 €
Licence: Commercial Use - ELRA VAR	6000.00 €	6000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1000.00 €	2000.00 €
Licence: Commercial Use - ELRA VAR	8000.00 €	8000.00 €

English-Vietnamese Parallel Corpus text

English
Vietnamese

ID: ELRA-W0311

ISLRN: 893-470-491-825-6

The English-Vietnamese Parallel Corpus consists of 1,000,000 sentence pairs, with an average length of 20 words per sentence. The corpus is provided in XML format and is annotated according to TEI-encoding guidelines.

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	900.00 €	1800.00 €
Licence: Commercial Use - ELRA VAR	9000.00 €	9000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1500.00 €	3000.00 €
Licence: Commercial Use - ELRA VAR	12000.00 €	12000.00 €

Korean-Vietnamese Parallel Corpus text

Korean
Vietnamese

ID: ELRA-W0313

ISLRN: 365-128-449-700-7

The Korean-Vietnamese Parallel Corpus consists of 200,000 sentence pairs, with an average length of 15 words per sentence. The corpus is provided in XML format and is annotated according to TEI-encoding guidelines.

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	200.00 €	400.00 €
Licence: Commercial Use - ELRA VAR	1400.00 €	1400.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	300.00 €	600.00 €
Licence: Commercial Use - ELRA VAR	2100.00 €	2100.00 €

Monolingual Vietnamese Annotated Corpus text

Vietnamese

ID: ELRA-W0310

ISLRN: 004-081-406-421-7

The Monolingual Vietnamese Annotated Corpus consists of 100,000 sentences, manually annotated with word boundaries, POS, named entities, with an average length of 20 words per sentence. The corpus is provided in XML format and is annotated according to TEI-encoding guidelines.

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	500.00 €	900.00 €
Licence: Commercial Use - ELRA VAR	1800.00 €	1800.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	800.00 €	1300.00 €
Licence: Commercial Use - ELRA VAR	2500.00 €	2500.00 €

NUM 5M Mongolian written corpus text

Mongolian

ID: ELRA-W0120

ISLRN: 492-817-146-504-9

This is a corpus of Mongolian text mostly from domains like online or printed daily newspapers, literature, and laws. The collected raw texts was reduced from 5 to 4.8 million words after cleaning. The cleaned corpus comprises: - 144 texts from laws until 2009, - 288 texts from literature t...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	5000.00 €
Licence: Commercial Use - ELRA VAR	5000.00 €	5000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	7000.00 €
Licence: Commercial Use - ELRA VAR	7000.00 €	7000.00 €

PAROLE Italian Corpus text

Italian

ID: ELRA-W0043

ISLRN: 608-362-291-385-1

The PAROLE Italian Corpus comprises 3,135,651 words collected from four different domains: • newspapers: 2,179,800 words from La Stampa, La Repubblica, Il Corriere della Sera, L’Unione Sarda, Il Sole 24ore, between 1992 and 1996, • periodicals: 143,810 words from Casaviva, 100cose, Epoca, Espan...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	100.00 €	100.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	150.00 €	150.00 €

Persian 1984 corpus (Multext-East framework) text

Persian

ID: ELRA-W0054

ISLRN: 851-240-629-673-1

This corpus contains the Persian (Farsi) translation of a part of the novel “1984” (G. Orwell) annotated in the Multext-East framework (Multilingual Text Tools and Corpora for Eastern and Central European Languages). The aim of the Multext-East project was to develop standardized language resourc...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	45.00 €	2000.00 €
Licence: Commercial Use - ELRA VAR	2000.00 €	2000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	100.00 €	5000.00 €
Licence: Commercial Use - ELRA VAR	5000.00 €	5000.00 €

Corpus:
Lexical/Conceptual:
Tool/Service:
Language Description:

Text:
Audio:
Image:
Video:
Text Numerical:
Text N-Gram:

Resource Type:

Media Type:

10 Language Resources