ELRA-S0338 : ESTER 2 Corpus The ESTER 2 Corpus, produced within the ESTER 2 evaluation campaign, consists of a manually transcribed radio broadcast news corpus amounting to about 100 hours and quick transcriptions of African radio broadcasts amounting to about 6 hours. Named-entity annotation is provided for the development data (about 6 hours).
|
ELRA-S0342 : Acoustic database for Polish concatenative speech synthesis This database consists of 1443 nonsense words covering all the diphones of the Polish language. For each diphone, the database records its name, its context, its phonetic transcription in SAMPA, the identifier of the wave file that contains it, and three numbers marking the beginning, the middle and the end of the diphone.
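As an illustration only, the per-diphone information listed above could be modelled as a small record type. The sketch below is not the database's actual file format. The field names, the tab-separated layout, the integer type of the three boundary values and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiphoneEntry:
    """One diphone record; all field names are hypothetical."""
    name: str      # name of the diphone
    context: str   # context (e.g. the carrier nonsense word)
    sampa: str     # phonetic transcription in SAMPA
    wave_id: str   # identifier of the wave file containing the diphone
    begin: int     # beginning of the diphone (unit assumed, e.g. samples)
    middle: int    # middle of the diphone
    end: int       # end of the diphone

    def is_ordered(self) -> bool:
        # Sanity check: the three boundaries should not be out of order.
        return self.begin <= self.middle <= self.end


def parse_entry(line: str, sep: str = "\t") -> DiphoneEntry:
    """Parse one line of an assumed seven-column, tab-separated index."""
    name, context, sampa, wave_id, begin, middle, end = line.rstrip("\n").split(sep)
    return DiphoneEntry(name, context, sampa, wave_id,
                        int(begin), int(middle), int(end))


# Hypothetical sample line, not taken from the database.
entry = parse_entry("a-b\tkabat\tkabat\tw0001\t1200\t1850\t2400")
```

A check such as `is_ordered()` is a cheap way to catch misaligned columns when loading an index of this kind.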
|
ELRA-E0035 : DEFT'08 Evaluation Package DEFT (DEfi Fouille de Texte, Text Mining Challenge) organizes evaluation campaigns in the field of text mining. The DEFT 2008 edition addresses the classification of texts by topic and genre. The DEFT'08 Evaluation Package makes it possible to compare two corpora of different genres (a corpus of newspaper articles extracted from Le Monde and a corpus of encyclopaedic articles extracted from the free online encyclopaedia Wikipedia) on the basis of the same set of pre-defined categories.
|
ELRA-E0040 : MEDAR Evaluation Package The MEDAR Evaluation Package was produced within the MEDAR project (MEDiterranean ARabic language and speech technology), supported by the European Commission's ICT programme. It aims to enable the evaluation of SLT/MT (Machine Translation) systems on translation tasks in the English-to-Arabic direction.
|
ELRA-L0088 : Arabic Morphological Dictionary The Arabic Morphological Dictionary contains 7,912,551 entries: 6,247,291 nouns, 1,537,499 verbs, 127,563 adjectives and 198 grammatical words. All files are provided as plain text in UTF-8 character encoding, amounting to about 154 MB of data.
|
| (last update: May 2012) |