Maltese-English website parallel corpus (Processed)

See COPYRIGHT file, which contains Source owners
693-091-524-649-2

ID: ELRA-W0232

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu.
This is a parallel corpus of bilingual texts crawled from multilingual websites. It contains 26,622 translation units (TUs).
Date of crawling: 16/12/2016
A strict validation process has been followed, which resulted in discarding:
- TUs from crawled websites that do not comply with the PSI directive,
- TUs identified as erroneous during the manual validation process, and all the TUs from websites whose error rate in the sample extracted for manual validation is strictly above the following thresholds:
  - 50% of TUs with language identification errors,
  - 50% of TUs with alignment errors,
  - 50% of TUs with tokenization errors,
  - 20% of TUs identified as machine translated content,
  - 50% of TUs with translation errors.
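
The threshold rule above can be illustrated with a minimal Python sketch. It is not the ELRC validation tooling: the key names, the function names, and the reading that exceeding any single threshold is enough to discard a website are all assumptions made for illustration. Only the threshold values come from the description.

```python
# Thresholds from the description above, as fractions of the manually
# validated sample. The key names are hypothetical.
THRESHOLDS = {
    "language_identification": 0.50,
    "alignment": 0.50,
    "tokenization": 0.50,
    "machine_translated": 0.20,
    "translation": 0.50,
}


def exceeded_thresholds(sample_rates):
    """Return the error types whose sample rate is strictly above its threshold.

    `sample_rates` maps an error type to the fraction of sampled TUs showing
    that error; error types missing from the mapping count as 0.0.
    """
    return [
        error_type
        for error_type, limit in THRESHOLDS.items()
        if sample_rates.get(error_type, 0.0) > limit
    ]


def keep_website(sample_rates):
    """Keep a website only if no threshold is exceeded.

    Assumption: one exceeded threshold is enough to discard every TU
    from that website.
    """
    return not exceeded_thresholds(sample_rates)
```

Because the comparison is strict, a website whose sample sits exactly at a threshold (for example, 50% alignment errors) would be kept under this reading, while one with 21% machine translated content would be discarded.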

Licence: Other - Open Under-PSI
Member price: 0.00 € (academic), 0.00 € (commercial)
Non-member price: 0.00 € (academic), 0.00 € (commercial)
Availability: Downloadable (27/02/2020)
