Catalog Reference : E0020
CESTA Evaluation Package
The CESTA Evaluation Package was produced within the French national project CESTA (Evaluation of MT systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The CESTA project carried out an evaluation campaign for machine translation systems, using English and Arabic texts translated into French.
This package contains the material used or produced during the CESTA evaluation campaign, including resources, protocols, scoring tools and the results of the campaign. The aim of these evaluation packages is to enable external players to evaluate their own systems.
The campaign was divided into two actions:
1) Evaluation on a restrictive vocabulary: an evaluation protocol was introduced for two translation directions, English into French and Arabic into French.
2) Evaluation on a specialised domain (evaluation after terminology enrichment): this consists of observing the impact of adapting the systems to the specialised domain.
The CESTA evaluation package contains the following data and tools:
1) Test run data:
- English-French parallel corpus: 21,590 English words and 23,554 French words extracted from the Official Journal of the European Communities, 1993, Written Questions section of the European Parliament, from the MLCC corpus (catalogue ref. ELRA-W0023).
- Arabic-French parallel corpus: 15,603 Arabic words and 18,257 French words extracted from Le Monde Diplomatique 2002 (catalogue ref. ELRA-W0036).
2) First campaign data:
- English-French parallel corpus: test corpus of 20,658 English words and 22,774 French words extracted from the Official Journal of the European Communities, 1993, Written Questions section of the European Parliament, from the MLCC corpus (catalogue ref. ELRA-W0023). Four translations in French are available.
- Arabic-French parallel corpus: test corpus of 23,763 Arabic words and 28,664 French words extracted from Le Monde Diplomatique 2002 and 2003 (catalogue ref. ELRA-W0036). Four translations in French are available.
3) Second campaign data:
- English-French parallel corpus: adaptation corpus of 19,383 English words and 22,741 French words, extracted from the Santé Canada website. One translation in French is available.
- Arabic-French parallel corpus: adaptation corpus of 19,560 Arabic words and 22,533 French words extracted from the UNICEF, WHO and FHI websites. One translation in French is available.
- English-French parallel corpus: test corpus of 18,880 English words and 23,411 French words, extracted from the Santé Canada website. Four translations in French are available.
- Arabic-French parallel corpus: test corpus of 17,305 Arabic words and 20,885 French words extracted from the UNICEF, WHO and FHI websites. Four translations in French are available.
4) Anonymised submissions of systems and human judgments with adequacy and fluency annotations.
5) French corpus of 13,000 words with adequacy and fluency tags.
6) Evaluation infrastructure for human judgments and for automatic evaluation.
7) Project documentation and publications.
A description of the project is available at the following address:
http://www.technolangue.net/article.php3?id_article=199
(in French)
Production
Project: EVALDA
Contents
Written corpus
Number of languages: Bilingual
Language(s): English >>>> French; Arabic >>>> French
Alignment: bilingual
Members Prices
Academic - Evaluation 150.00 EUR
Commercial - Evaluation 500.00 EUR
Non Member Prices
Academic - Evaluation 300.00 EUR
Commercial - Evaluation 1000.00 EUR
Copyright © 2008 ELRA