Search and Browse – ELRA Catalogue

TRAD Chinese-French Email Parallel corpus – Development Set text

Chinese
French

ID: ELRA-W0114

This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and a reference translation in French. The source texts are a selection of emails from the Speechocean King-NLP-001 corpus, a corpus of private emails collected from the daily life and business domains. The c...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	150.00 €	500.00 €
Licence: Commercial Use - ELRA VAR	500.00 €	500.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	300.00 €	1000.00 €
Licence: Commercial Use - ELRA VAR	1000.00 €	1000.00 €

TRAD Chinese-French Email Parallel corpus – Test Set text

Chinese
French

ID: ELRA-W0116

ISLRN: 239-027-077-538-0

This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in French. The source texts are a selection of emails from the Speechocean King-NLP-001 corpus, a corpus of private emails collected from the daily life and business domains. The tr...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	150.00 €	500.00 €
Licence: Commercial Use - ELRA VAR	500.00 €	500.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	300.00 €	1000.00 €
Licence: Commercial Use - ELRA VAR	1000.00 €	1000.00 €

TRAD Chinese-French News Articles Parallel corpus text

Chinese
French

ID: ELRA-W0111

ISLRN: 153-566-144-442-2

This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in French. The source texts are newspaper articles from the Chinese version of Voice of America. Articles are dated from 2011 and 2012. The translation has been conducted by two dif...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	150.00 €	500.00 €
Licence: Commercial Use - ELRA VAR	500.00 €	500.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	300.00 €	1000.00 €
Licence: Commercial Use - ELRA VAR	1000.00 €	1000.00 €

TRAD Chinese-French Parallel Text - Blog text

Chinese
French

ID: ELRA-W0125

ISLRN: 713-266-631-883-0

TRAD Chinese-French Parallel Text - Blog was developed by ELDA as part of the PEA-TRAD project. It contains French translations of a subset of approximately 10,000 Chinese words from GALE Phase 1 Chinese Blog Parallel Text (available here: https://catalog.ldc.upenn.edu/LDC2008T06). The PEA-TR...

MEMBER	academic	commercial
Licence: Non Commercial Use - Non Standard Licence Terms

NON MEMBER	academic	commercial
Licence: Non Commercial Use - Non Standard Licence Terms

TRAD Chinese-French Web domain (blogs) Parallel corpus text

Chinese
French

ID: ELRA-W0109

ISLRN: 464-017-697-777-3

This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in French. The source texts are blog articles dealing with various subjects such as economy, environment, society, technologies, etc. Articles are dated from June 2013. The translat...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	150.00 €	500.00 €
Licence: Commercial Use - ELRA VAR	500.00 €	500.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	300.00 €	1000.00 €
Licence: Commercial Use - ELRA VAR	1000.00 €	1000.00 €

TRAD Pashto Broadcast News Speech Corpus audio

Pushto; Pashto

ID: ELRA-S0381

ISLRN: 918-508-885-913-7

This corpus contains transcribed broadcast news recordings in Pashto. Recordings are collected from 5 sources: Ashna TV, Azadi Radio, Deewa Radio, Mashaal Radio and Shamshad TV. The corpus contains 108 hours of recordings covering more than 1,000 speakers. Transcriptions are provided together ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	2000.00 €	20000.00 €
Licence: Commercial Use - ELRA VAR	20000.00 €	20000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	3500.00 €	28000.00 €
Licence: Commercial Use - ELRA VAR	28000.00 €	28000.00 €

TRAD Pashto-English News Articles Parallel corpus text

English
Pushto; Pashto

ID: ELRA-W0097

ISLRN: 612-936-517-010-2

This is a parallel corpus, which contains 10,000 Pashto words translated into English by two different translators. The source texts have been collected from the following news websites: Azadiradio, Mashaal and Voice of America Pashto. The content has also been translated into French (see ELRA-W...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	350.00 €	1000.00 €
Licence: Commercial Use - ELRA VAR	1000.00 €	1000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	500.00 €	2000.00 €
Licence: Commercial Use - ELRA VAR	2000.00 €	2000.00 €

TRAD Pashto-English Parallel corpus of transcribed Broadcast News Speech - Test data text

English
Pushto; Pashto

ID: ELRA-W0095

ISLRN: 006-102-605-738-4

This is a parallel corpus, which contains 10,000 Pashto words translated into English. The source texts come from 3 broadcast news transcriptions of the TRAD Pashto Broadcast News Speech Corpus (ELRA-S0381). These texts are VOA Ashna TV programs recorded on 15/01/2011, 18/01/2011 and 19/01/2011. ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	350.00 €	1000.00 €
Licence: Commercial Use - ELRA VAR	1000.00 €	1000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	500.00 €	2000.00 €
Licence: Commercial Use - ELRA VAR	2000.00 €	2000.00 €

TRAD Pashto-French News Articles Parallel corpus text

French
Pushto; Pashto

ID: ELRA-W0096

ISLRN: 649-628-149-051-7

This is a parallel corpus, which contains 10,000 Pashto words translated into French by two different translators. The source texts have been collected from the following news websites: Azadiradio, Mashaal and Voice of America Pashto. The content has also been translated into English (see ELRA-W...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	350.00 €	1000.00 €
Licence: Commercial Use - ELRA VAR	1000.00 €	1000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	500.00 €	2000.00 €
Licence: Commercial Use - ELRA VAR	2000.00 €	2000.00 €

TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Test data text

French
Pushto; Pashto

ID: ELRA-W0094

ISLRN: 547-897-479-723-3

This is a parallel corpus, which contains 10,000 Pashto words translated into French by two different translators. The source texts come from 3 broadcast news transcriptions of the TRAD Pashto Broadcast News Speech Corpus (ELRA-S0381). These texts are VOA Ashna TV programs recorded on 15/01/2011,...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	350.00 €	1000.00 €
Licence: Commercial Use - ELRA VAR	1000.00 €	1000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	500.00 €	2000.00 €
Licence: Commercial Use - ELRA VAR	2000.00 €	2000.00 €

TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Training data text

French
Pushto; Pashto

ID: ELRA-W0093

ISLRN: 802-643-297-429-4

The corpus consists of the transcription of 106 hours of recordings in Pashto translated into French. The transcriptions are extracted from the TRAD Pashto Broadcast News Speech Corpus (ELRA-S0381). It contains about 832,000 source words and 747,000 target words. No audio file is provided. Pasht...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	3000.00 €	10000.00 €
Licence: Commercial Use - ELRA VAR	10000.00 €	10000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	4000.00 €	18000.00 €
Licence: Commercial Use - ELRA VAR	18000.00 €	18000.00 €

TRAD Pashto Monolingual text Corpus text

Pushto; Pashto

ID: ELRA-W0092

ISLRN: 394-903-293-388-0

This is a monolingual text corpus in Pashto. The corpus contains about 112,000,000 tokens collected from 46 different blogs and websites. Sources that were either identified and negotiated or freely available were crawled in 2012, cleaned, and formatted as XML. Pashto is an Indo-Iranian language spoken by th...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1200.00 €	3500.00 €
Licence: Commercial Use - ELRA VAR	3500.00 €	3500.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	2000.00 €	5000.00 €
Licence: Commercial Use - ELRA VAR	5000.00 €	5000.00 €

Training and test data for Arabizi detection and transliteration text

Arabic
English

ID: ELRA-W0126

ISLRN: 986-364-744-303-9

The dataset is composed of two distinct resources: 1) A collection of mixed English and Arabizi text intended to train and test a system for the automatic detection of code-switching in mixed English and Arabizi texts. The training part of the corpus contains: 522 tweets composed of 5,207 token...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	500.00 €
Licence: Commercial Use - ELRA VAR	500.00 €	500.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	650.00 €
Licence: Commercial Use - ELRA VAR	650.00 €	650.00 €

Translanguage English Database (TED) Transcripts database audio

English

ID: ELRA-S0120

ISLRN: 502-719-830-448-5

LDC reference: https://catalog.ldc.upenn.edu/LDC2002T03 The Translanguage English Database (TED) Transcripts corpus contains transcriptions of thirty-nine of the 188 speeches of the TED Corpus made at Eurospeech'93 in Berlin. The thirty-nine transcripts in this publication are in Universal Tra...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	0.00 €	0.00 €

Translation memories from The Ministry of Foreign Affairs of Norway (Processed) text

English
Norwegian

ID: ELRA-W0156

ISLRN: 909-695-133-060-3

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Translation memories containing translations of EU legis...

MEMBER	academic	commercial
Licence: Attribution - CC-BY-4.0	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Attribution - CC-BY-4.0	0.00 €	0.00 €

Translation memory from Swedish National Audit Office (NAO) - Riksrevisionen (Processed) text

English
Swedish

ID: ELRA-W0236

ISLRN: 709-518-556-855-4

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Translation memory from the Swedish National Audit Office.

MEMBER	academic	commercial
Licence: Other - Public Domain	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Other - Public Domain	0.00 €	0.00 €

Translations of Lithuanian legislation from Seimas of the Republic of Lithuania (Processed) text

English
Lithuanian

ID: ELRA-W0165

ISLRN: 691-158-541-313-8

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Translation Memories of Lithuanian legislation from Seim...

MEMBER	academic	commercial
Licence: Attribution - CC-BY-4.0	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Attribution - CC-BY-4.0	0.00 €	0.00 €

Trilingual Documents related to International Judicial Cooperation in Civil Matters (Greek-English-French) (Processed) text

English
French
Modern Greek (1453-)

ID: ELRA-W0307

ISLRN: 954-287-236-137-4

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Trilingual (Greek-English-French) documents - standard f...

MEMBER	academic	commercial
Licence: Other - Public Domain	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Other - Public Domain	0.00 €	0.00 €

TSNLP (Test Suites for NLP Testing) text

English
French
German

ID: ELRA-W0013

ISLRN: 717-350-913-018-8

The TSNLP project (LRE 62-089) has produced a database of test suites for English, French and German containing over 4,000 test items (sentences or sentence fragments) per language, which were constructed for evaluating natural language processing systems, but which may also be useful for ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	100.00 €
Licence: Commercial Use - ELRA VAR	100.00 €	100.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	100.00 €
Licence: Commercial Use - ELRA VAR	100.00 €	100.00 €

TUNA Corpus text

English

ID: ELRA-W0048

ISLRN: 799-660-957-954-5

TUNA (Towards a UNified Algorithm for the generation of referring expressions) is a research project funded by the UK's Engineering and Physical Sciences Research Council (EPSRC). The TUNA Corpus of Referring Expressions was built with contributions from 50 native or fluent speakers of Engl...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	45.00 €
Licence: Commercial Use - ELRA VAR	45.00 €	45.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	45.00 €
Licence: Commercial Use - ELRA VAR	45.00 €	45.00 €

1686 Language Resources (Page 67 of 85)