Maltese-English website parallel corpus (Processed)
See COPYRIGHT file, which contains Source owners
ISLRN: 693-091-524-649-2
ID: ELRA-W0232
This dataset was created within the framework of the European Language Resource Coordination (ELRC) action of the Connecting Europe Facility - Automated Translation (CEF.AT). For further information on the project, see http://lr-coordination.eu.
This is a parallel corpus of bilingual texts crawled from multilingual websites, containing 26,622 translation units (TUs).
Date of crawling: 16/12/2016
A strict validation process was followed, which resulted in discarding:
- TUs from crawled websites that do not comply with the PSI directive;
- TUs flagged during the manual validation process, and all TUs from websites whose error rate in the sample extracted for manual validation is strictly above any of the following thresholds:
  - 50% of TUs with language identification errors,
  - 50% of TUs with alignment errors,
  - 50% of TUs with tokenization errors,
  - 20% of TUs identified as machine-translated content,
  - 50% of TUs with translation errors.
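The discard rule above amounts to a per-website filter. The following Python sketch is a hypothetical illustration, not the ELRC validation tooling: the class and function names, the error-type keys, the sample layout, and the `(tu_id, source, target)` TU tuple format are all assumptions. Only the thresholds, the "strictly above" comparison, and the three discard conditions come from the description. Exact fractions are used so that a rate sitting exactly on a threshold (e.g. 20 machine-translated TUs out of 100) is kept, not rejected.

```python
from dataclasses import dataclass, field
from fractions import Fraction

# Thresholds from the description; a website is rejected when its sample
# error rate is STRICTLY ABOVE any of them. Key names are hypothetical.
THRESHOLDS = {
    "language_identification": Fraction(1, 2),
    "alignment": Fraction(1, 2),
    "tokenization": Fraction(1, 2),
    "machine_translated": Fraction(1, 5),
    "translation": Fraction(1, 2),
}


@dataclass
class SampleStats:
    """Hypothetical summary of one website's manually validated sample."""
    sample_size: int
    error_counts: dict = field(default_factory=dict)  # error type -> TU count


@dataclass
class Website:
    """Hypothetical container for one crawled website."""
    psi_compliant: bool
    sample: SampleStats
    tus: list = field(default_factory=list)  # (tu_id, source, target) tuples
    flagged_ids: set = field(default_factory=set)  # TUs flagged in validation


def website_is_rejected(stats: SampleStats) -> bool:
    """True if any sample error rate is strictly above its threshold."""
    if stats.sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return any(
        Fraction(stats.error_counts.get(kind, 0), stats.sample_size) > limit
        for kind, limit in THRESHOLDS.items()
    )


def filter_corpus(sites: list) -> list:
    """Keep TUs from PSI-compliant, non-rejected sites, minus flagged TUs."""
    kept = []
    for site in sites:
        if not site.psi_compliant or website_is_rejected(site.sample):
            continue  # discard every TU from this website
        kept.extend(tu for tu in site.tus if tu[0] not in site.flagged_ids)
    return kept
```

For example, a sample of 100 TUs with 21 flagged as machine-translated rejects the whole website (21% > 20%), while exactly 20 does not.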
MEMBER | academic | commercial |
---|---|---|
Licence: Other - Open Under-PSI | 0.00 € | 0.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Other - Open Under-PSI | 0.00 € | 0.00 € |