ELRA
Products meeting the search criteria
    AURORA-CD0002 AURORA Project Database 2.0 - Evaluation Package
    The Aurora Project Database 2.0 is a revised version of the Noisy TI digits database, produced to follow on from the work of ETSI. This CD set replaces the previous one (version 1.0 consisted of 2 CDs, while version 2.0 consists of 4 CDs). The database is intended for the evaluation of front-end feature extraction algorithms in background noise, but may also be used more widely by speech researchers to evaluate and compare the performance of noise-robust speech recognition algorithms.

    AURORA-CD0003-01 AURORA Project database - Subset of SpeechDat-Car - Finnish database - Evaluation Package
    This database is a subset of the Finnish SpeechDat-Car database, which was collected as part of the European Union-funded SpeechDat-Car project. It contains isolated and connected Finnish digits spoken in different driving conditions inside a car.

    AURORA-CD0003-02 AURORA Project database - Subset of SpeechDat-Car - Spanish database - Evaluation Package
    This database is a subset of the Spanish SpeechDat-Car database, which was collected as part of the European Union-funded SpeechDat-Car project. It contains isolated and connected Spanish digits spoken in different noise and driving conditions inside a car.

    AURORA-CD0003-03 AURORA Project database - Subset of SpeechDat-Car - German database - Evaluation Package
    This database is a subset of the German SpeechDat-Car database, which was collected as part of the European Union-funded SpeechDat-Car project. It contains isolated and connected German digits spoken in different noise and driving conditions inside a car.

    AURORA-CD0003-04 AURORA Project database - Subset of SpeechDat-Car - Danish database - Evaluation Package
    This database is a subset of the Danish SpeechDat-Car database, which was collected as part of the European Union-funded SpeechDat-Car project. It contains isolated and connected Danish digits spoken in different noise and driving conditions inside a car.

    AURORA-CD0003-05 AURORA Project database - Subset of SpeechDat-Car - Italian database - Evaluation Package
    This database is a subset of the Italian SpeechDat-Car database, which was collected as part of the European Union-funded SpeechDat-Car project. It contains 2200 Italian connected digit utterances, divided into training and testing utterances, spoken in different noise and driving conditions inside a car.

    AURORA-CD0004-01 AURORA Project Database - Aurora 4a - Evaluation Package
    The Aurora project has released a number of list files for performing training and testing on the Wall Street Journal (WSJ0) data at two sampling rates: 8 kHz and 16 kHz. The Aurora 4a database is based on WSJ0, with noise artificially added over a range of signal-to-noise ratios. It contains both clean and multicondition training sets and 14 evaluation sets with different noise types and microphones.

    AURORA-CD0004-02 AURORA Project Database - Aurora 4b - Evaluation Package
    The Aurora project has released a number of list files for performing training and testing on the Wall Street Journal (WSJ0) data at two sampling rates: 8 kHz and 16 kHz. The Aurora 4b database contains noisy versions of the Nov'92 WSJ0 development set.

    AURORA-CD0005 AURORA-5
    The AURORA-5 database was developed mainly to investigate how hands-free speech input in noisy room environments affects the performance of automatic speech recognition. In addition, two test conditions are included to study the influence of transmitting the speech over a mobile communication system.
    It contains artificially distorted versions of the recordings of adult speakers in the TI-Digits speech database, downsampled to a sampling frequency of 8000 Hz; a set of recordings containing sequences of digits uttered by different speakers in hands-free mode in a meeting room; and a set of scripts for running recognition experiments on these speech data. The experiments rely on the freely available HTK software package, which is not part of this resource.

    B0001 PHONOLEX (BAS/DFKI)
    Approx. 1.6 million entries with orthographic forms (capital nouns, old German, spelling, ...), phonetic transcriptions (by rules and exception list) and other linguistic information (e.g. grammatical categories).

    B0001 SIelex (Siemens Phonetic lexicon)
    186,600 entries, including proper names, place names, non-native entries and abbreviations, with phonetic transcriptions, main stress markers and syllable boundary markers, drawn from the political and economic sections of the German newspapers 'Süddeutsche Zeitung' and 'Frankfurter Allgemeine Zeitung'.

    B0002 LusoLEX European Portuguese Lexicon
    LusoLEX: Multifunctional monolingual lexicon of the European variety of Portuguese, consisting of about 61,000 entries (lemmas) and 1,600 corresponding inflexion paradigms. The set of entries includes compound words, and the inflexion paradigms include information regarding enclitics, augmentatives and diminutives. Morphological information is encoded with maximum granularity and conforms to the EAGLES recommendations.

    B0002 BrasiLEX Brazilian Portuguese lexicon
    BrasiLEX: Multifunctional monolingual lexicon of the Brazilian variety of Portuguese, consisting of about 65,000 entries (lemmas) and 1,600 corresponding inflexion paradigms. The set of entries includes compound words, and the inflexion paradigms include information regarding enclitics and augmentative/diminutive degree. Morphological information is encoded with maximum granularity and conforms to the EAGLES recommendations.

    B0003 Austrian SpeechDat(AT) FDB-1000 database
    This speech database contains the recordings of 1,000 Austrian speakers recorded over the fixed telephone network. Each speaker uttered around 60 read and spontaneous items.

    B0003 Austrian SpeechDat(AT) MDB-1000 database
    This speech database contains the recordings of 1,000 Austrian speakers recorded over the Austrian mobile telephone network. Each speaker uttered around 60 read and spontaneous items.

    B0004 OrienTel French as spoken in Morocco database
    This speech database contains the recordings of 530 Moroccan speakers of French recorded over the Moroccan fixed and mobile telephone network. Each speaker uttered around 47 read and spontaneous items.

    B0004 OrienTel Morocco MSA (Modern Standard Arabic) database
    This speech database contains the recordings of 530 Moroccan speakers recorded over the Moroccan fixed and mobile telephone network. Each speaker uttered around 49 read and spontaneous items.

    B0004 OrienTel Morocco MCA (Modern Colloquial Arabic) database
    This speech database contains the recordings of 772 Moroccan speakers recorded over the Moroccan fixed and mobile telephone network. Each speaker uttered around 49 read and spontaneous items.

    B0005 OrienTel Tunisia MCA (Modern Colloquial Arabic) database
    This speech database contains the recordings of 792 Tunisian speakers recorded over the Tunisian fixed and mobile telephone network. Each speaker uttered around 49 read and spontaneous items.

    B0005 OrienTel Tunisia MSA (Modern Standard Arabic) database
    This speech database contains the recordings of 598 Tunisian speakers recorded over the Tunisian fixed and mobile telephone network. Each speaker uttered around 49 read and spontaneous items.

    B0005 OrienTel French as spoken in Tunisia database
    This speech database contains the recordings of 576 Tunisian speakers of French recorded over the Tunisian fixed and mobile telephone network. Each speaker uttered around 47 read and spontaneous items.

    B0006 OrienTel Egypt MCA (Modern Colloquial Arabic) database
    This speech database contains the recordings of 750 Egyptian speakers recorded over the Egyptian fixed and mobile telephone network. Each speaker uttered around 49 read and spontaneous items.

    B0006 OrienTel Egypt MSA (Modern Standard Arabic) database
    This speech database contains the recordings of 500 Egyptian speakers recorded over the Egyptian fixed and mobile telephone network. Each speaker uttered around 49 read and spontaneous items.

    B0006 OrienTel English as spoken in Egypt database
    This speech database contains the recordings of 500 Egyptian speakers of English recorded over the Egyptian fixed and mobile telephone network. Each speaker uttered around 47 read and spontaneous items.

    B0007 IDIOLOGOS 1 “Bootstrap” (NEOLOGOS Project)
    The IDIOLOGOS 1 “Bootstrap” database was produced within the French national project NEOLOGOS, as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). It comprises 1000 adult French speakers (470 males, 530 females) recorded over the French fixed telephone network.

    B0007 IDIOLOGOS 2 “Eingenspeakers” (NEOLOGOS Project)
    The IDIOLOGOS 2 “Eingenspeakers” database was produced within the French national project NEOLOGOS, as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). It comprises 200 adult French speakers (97 males, 103 females) recorded over the French fixed telephone network.

    B0008 LC-STAR Spanish phonetic lexicon
    The LC-STAR Spanish phonetic lexicon comprises more than 100,000 words, including a set of 55,854 common words, a set of 45,403 proper names (including person names, family names, cities, streets, companies and brand names) and a list of 7,498 special application words. The lexicon is provided in XML format and includes phonetic transcriptions in SAMPA.

    B0008 LC-STAR Catalan phonetic lexicon
    The LC-STAR Catalan phonetic lexicon comprises more than 100,000 words, including a set of 53,225 common words, a set of 45,306 proper names (including person names, family names, cities, streets, companies and brand names) and a list of 7,498 special application words. The lexicon is provided in XML format and includes phonetic transcriptions in SAMPA.

    B0009 TC-STAR English Training Corpora for ASR: Transcriptions of EPPS Speech
    This corpus consists of transcriptions from 92 hours of EPPS (European Parliament Plenary Sessions) speeches held or interpreted in European English (a mixture of native and non-native English). The transcription files are stored in Transcriber XML file format.

    For corresponding recordings, see ELRA-S0251.

    B0009 TC-STAR English Training Corpora for ASR: Recordings of EPPS Speech
    This corpus consists of around 290 hours of recordings of EPPS (European Parliament Plenary Sessions) speeches held or interpreted in European English, 92 hours of which were transcribed (the transcriptions are not provided in the present package). Each file contains a single channel with 16-bit resolution at a sample rate of 16 kHz.

    For corresponding transcriptions, see ELRA-S0249.

    B0010 OrienTel United Arab Emirates MCA (Modern Colloquial Arabic)
    This speech database contains the recordings of 750 Arabic speakers recorded over the United Arab Emirates' fixed and mobile telephone network. Each speaker uttered around 48 read and spontaneous items.

    B0010 OrienTel United Arab Emirates MSA (Modern Standard Arabic)
    This speech database contains the recordings of 500 Arabic speakers recorded over the United Arab Emirates' fixed and mobile telephone network. Each speaker uttered around 49 read and spontaneous items.

    B0010 OrienTel English as spoken in the United Arab Emirates
    This speech database contains the recordings of 535 speakers of English recorded over the United Arab Emirates' fixed and mobile telephone network. Each speaker uttered around 51 read and spontaneous items.

    B0011 OrienTel Jordan MCA (Modern Colloquial Arabic) database
    This speech database contains the recordings of 757 Jordanian speakers recorded over the Jordanian fixed and mobile telephone network. Each speaker uttered around 49 read and spontaneous items.

    B0011 OrienTel Jordan MSA (Modern Standard Arabic) database
    This speech database contains the recordings of 556 Jordanian speakers recorded over the Jordanian fixed and mobile telephone network. Each speaker uttered around 49 read and spontaneous items.

    B0011 OrienTel English as spoken in Jordan database
    This speech database contains the recordings of 578 Jordanian speakers of English recorded over the Jordanian fixed and mobile telephone network. Each speaker uttered around 47 read and spontaneous items.

    B0012 CHIL 2004 Evaluation Package
    The CHIL Seminars are scientific presentations given by students, faculty members or invited speakers in the field of multimodal interfaces and speech processing. The language is European English spoken by non-native speakers. The recordings comprise the following: videos of the speaker and the audience from 4 fixed cameras, frontal close-ups of the speaker, and close-talking and far-field microphone data of the speaker’s voice and background sounds.

    The database consists of:
    1) Audio and Video Recordings of 10 seminars.
    2) Video annotations made on 1 out of every 10 pictures in sequence, for the 4 cameras.
    3) Transcriptions using both TRS and STMUID formats.

    B0012 CHIL 2005 Evaluation Package
    The CHIL Seminars are scientific presentations given by students, faculty members or invited speakers in the field of multimodal interfaces and speech processing. The language is European English spoken by non-native speakers. The recordings comprise the following: videos of the speaker and the audience from 4 fixed cameras, frontal close-ups of the speaker, and close-talking and far-field microphone data of the speaker’s voice and background sounds.

    The database consists of:
    1) Contents of the CHIL 2004 Evaluation Package (see catalogue reference ELRA-E0009 for description).
    2) Audio and Video Recordings: 5 seminars recorded in November 2004.
    3) Stereo Video Recordings of 10 subjects that move in the camera’s field of view while performing pointing gestures.
    4) Video annotations.
    5) Transcriptions.

    B0012CHIL 2004 Evaluation Package
    The CHIL Seminars are scientific presentations given by students, faculty members or invited speakers in the field of multimodal interfaces and speech processing. The language is European English spoken by non native speakers. The recordings comprise the following: videos of the speaker and the audience from 4 fixed cameras, frontal close ups of the speaker, close talking and far-field microphone data of the speaker’s voice and background sounds.

    The database consists of:
    1) Audio and Video Recordings of 10 seminars
    2) Video annotations done displaying 1 over 10 pictures in sequence, for the 4 cameras.
    3) Transcriptions using both TRS and STMUID formats.
    B0012CHIL 2004 Evaluation Package
    The CHIL Seminars are scientific presentations given by students, faculty members or invited speakers in the field of multimodal interfaces and speech processing. The language is European English spoken by non native speakers. The recordings comprise the following: videos of the speaker and the audience from 4 fixed cameras, frontal close ups of the speaker, close talking and far-field microphone data of the speaker’s voice and background sounds.

    The database consists of:
    1) Audio and Video Recordings of 10 seminars
    2) Video annotations done displaying 1 over 10 pictures in sequence, for the 4 cameras.
    3) Transcriptions using both TRS and STMUID formats.
    B0012CHIL 2005 Evaluation Package
    The CHIL Seminars are scientific presentations given by students, faculty members or invited speakers in the field of multimodal interfaces and speech processing. The language is European English spoken by non native speakers. The recordings comprise the following: videos of the speaker and the audience from 4 fixed cameras, frontal close ups of the speaker, close talking and far-field microphone data of the speaker’s voice and background sounds.

    The database consists of:
    1) Contents of the CHIL 2004 Evaluation Package (see catalogue reference ELRA-E0009 for description).
    2) Audio and Video Recordings: 5 seminars recorded in November 2004.
    3) Stereo Video Recordings of 10 subjects who move in the camera’s field of view while performing pointing gestures.
    4) Video annotations.
    5) Transcriptions.
    B0012 CHIL 2006 Evaluation Package
    The CHIL Seminars are scientific presentations given by students, faculty members or invited speakers in the field of multimodal interfaces and speech processing. The language is European English spoken by non-native speakers. The recordings comprise the following: videos of the speaker and the audience from 4 fixed cameras, frontal close-ups of the speaker, and close-talking and far-field microphone data of the speaker’s voice and background sounds.

    The CHIL 2006 Evaluation Package consists of:
    1) A set of audiovisual recordings of seminars (called non-interactive seminars) and of highly interactive small working-group seminars (called interactive seminars). The recordings were made between 2004 and 2005 according to the “CHIL Room Setup” specification.
    2) Video annotations.
    3) Orthographic transcriptions.

    B0013 Hungarian Speecon database
    The Hungarian Speecon database comprises the recordings of 555 adult Hungarian speakers and 50 child Hungarian speakers who uttered over 290 items and 210 items, respectively (read and spontaneous).

    B0013 Czech Speecon database
    The Czech Speecon database comprises the recordings of 550 adult Czech speakers and 50 child Czech speakers who uttered over 290 items and 210 items, respectively (read and spontaneous).

    E0002 TC-STAR 2005 Evaluation Package - ASR English
    This package includes the material used for the TC-STAR 2005 Automatic Speech Recognition (ASR) first evaluation campaign for the English language.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0003 TC-STAR 2005 Evaluation Package - ASR Spanish
    This package includes the material used for the TC-STAR 2005 Automatic Speech Recognition (ASR) first evaluation campaign for the Spanish language.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0004 TC-STAR 2005 Evaluation Package - ASR Mandarin Chinese
    This package includes the material used for the TC-STAR 2005 Automatic Speech Recognition (ASR) first evaluation campaign for the Mandarin Chinese language.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0005 TC-STAR 2005 Evaluation Package - SLT English-to-Spanish
    This package includes the material used for the TC-STAR 2005 Spoken Language Translation (SLT) first evaluation campaign for English-to-Spanish translation.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0006 TC-STAR 2005 Evaluation Package - SLT Spanish-to-English
    This package includes the material used for the TC-STAR 2005 Spoken Language Translation (SLT) first evaluation campaign for Spanish-to-English translation.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0007 TC-STAR 2005 Evaluation Package - SLT Chinese-to-English
    This package includes the material used for the TC-STAR 2005 Spoken Language Translation (SLT) first evaluation campaign for Chinese-to-English translation.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0008 The CLEF Test Suite for the CLEF 2000-2003 Campaigns – Evaluation Package
    The CLEF Test Suite contains the data used for the main tracks of the CLEF campaigns carried out from 2000 to 2003: Multilingual text retrieval, Bilingual text retrieval, Monolingual text retrieval, and Domain-specific text retrieval. It contains multilingual corpora in English, French, German, Italian, Spanish, Dutch, Swedish, Finnish, Russian, and Portuguese.

    E0009 CHIL 2004 Evaluation Package
    The CHIL Seminars are scientific presentations given by students, faculty members or invited speakers in the field of multimodal interfaces and speech processing. The language is European English spoken by non-native speakers. The recordings comprise the following: videos of the speaker and the audience from 4 fixed cameras, frontal close-ups of the speaker, and close-talking and far-field microphone data of the speaker’s voice and background sounds.

    The database consists of:
    1) Audio and Video Recordings of 10 seminars.
    2) Video annotations made on 1 out of every 10 frames in sequence, for each of the 4 cameras.
    3) Transcriptions using both TRS and STMUID formats.

    E0010 CHIL 2005 Evaluation Package
    The CHIL Seminars are scientific presentations given by students, faculty members or invited speakers in the field of multimodal interfaces and speech processing. The language is European English spoken by non-native speakers. The recordings comprise the following: videos of the speaker and the audience from 4 fixed cameras, frontal close-ups of the speaker, and close-talking and far-field microphone data of the speaker’s voice and background sounds.

    The database consists of:
    1) Contents of the CHIL 2004 Evaluation Package (see catalogue reference ELRA-E0009 for description).
    2) Audio and Video Recordings: 5 seminars recorded in November 2004.
    3) Stereo Video Recordings of 10 subjects who move in the camera’s field of view while performing pointing gestures.
    4) Video annotations.
    5) Transcriptions.

    E0011 TC-STAR 2006 Evaluation Package - ASR English
    This package includes the material used for the TC-STAR 2006 Automatic Speech Recognition (ASR) second evaluation campaign for the English language.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0012 TC-STAR 2006 Evaluation Package - ASR Spanish - CORTES
    This package includes the material used for the TC-STAR 2006 Automatic Speech Recognition (ASR) second evaluation campaign for the Spanish language within the CORTES task.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0012-01 TC-STAR 2006 Evaluation Package - ASR Spanish - CORTES
    This package includes the material used for the TC-STAR 2006 Automatic Speech Recognition (ASR) second evaluation campaign for the Spanish language within the CORTES task.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0012-02 TC-STAR 2006 Evaluation Package - ASR Spanish - EPPS
    This package includes the material used for the TC-STAR 2006 Automatic Speech Recognition (ASR) second evaluation campaign for the Spanish language within the EPPS task.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0013 TC-STAR 2006 Evaluation Package - ASR Mandarin Chinese
    This package includes the material used for the TC-STAR 2006 Automatic Speech Recognition (ASR) second evaluation campaign for the Mandarin Chinese language.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0014 TC-STAR 2006 Evaluation Package - SLT English-to-Spanish
    This package includes the material used for the TC-STAR 2006 Spoken Language Translation (SLT) second evaluation campaign for English-to-Spanish translation.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0015 TC-STAR 2006 Evaluation Package - SLT Spanish-to-English - CORTES
    This package includes the material used for the TC-STAR 2006 Spoken Language Translation (SLT) second evaluation campaign for Spanish-to-English translation within the CORTES task.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0015-01 TC-STAR 2006 Evaluation Package - SLT Spanish-to-English - CORTES
    This package includes the material used for the TC-STAR 2006 Spoken Language Translation (SLT) second evaluation campaign for Spanish-to-English translation within the CORTES task.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0015-02 TC-STAR 2006 Evaluation Package - SLT Spanish-to-English - EPPS
    This package includes the material used for the TC-STAR 2006 Spoken Language Translation (SLT) second evaluation campaign for Spanish-to-English translation within the EPPS task.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0016 TC-STAR 2006 Evaluation Package - SLT Chinese-to-English
    This package includes the material used for the TC-STAR 2006 Spoken Language Translation (SLT) second evaluation campaign for Chinese-to-English translation.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0017 CHIL 2006 Evaluation Package
    The CHIL Seminars are scientific presentations given by students, faculty members or invited speakers in the field of multimodal interfaces and speech processing. The language is European English spoken by non-native speakers. The recordings comprise the following: videos of the speaker and the audience from 4 fixed cameras, frontal close-ups of the speaker, and close-talking and far-field microphone data of the speaker’s voice and background sounds.

    The CHIL 2006 Evaluation Package consists of:
    1) A set of audiovisual recordings of seminars (called non-interactive seminars) and of highly interactive small working-group seminars (called interactive seminars). The recordings were made between 2004 and 2005 according to the “CHIL Room Setup” specification.
    2) Video annotations.
    3) Orthographic transcriptions.

    E0018 ARCADE II Evaluation Package
    The ARCADE II Evaluation Package was produced within the French national project ARCADE II (Evaluation of parallel text alignment systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The ARCADE II project made it possible to carry out an evaluation campaign in the field of multilingual alignment.
    This package includes the material that was used for the ARCADE II evaluation campaign. It includes resources, protocols, scoring tools, results of the campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system.
    The campaign is distributed over two actions: sentence alignment and translation of named entities.

    E0019 CESART Evaluation Package
    The CESART Evaluation Package was produced within the French national project CESART (Evaluation of terminology extraction tools), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The CESART project made it possible to carry out a campaign for the evaluation of terminological resources acquisition tools.
    This package includes the material that was used for the CESART evaluation campaign. It includes resources, protocols, scoring tools, results of the campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system.
    The campaign is distributed over two actions: term extraction and relation extraction.

    E0020 CESTA Evaluation Package
    The CESTA Evaluation Package was produced within the French national project CESTA (Evaluation of MT systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The CESTA project made it possible to carry out a campaign for the evaluation of machine translation technologies.
    This package includes the material that was used for the CESTA evaluation campaign. It includes resources, protocols, scoring tools, results of the campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system.
    The campaign is distributed over two actions: an evaluation on an unrestricted vocabulary and an evaluation on a specialised domain (evaluation after terminology enrichment).

    E0021 ESTER Evaluation Package
    The ESTER Evaluation Package was produced within the French national project ESTER (Evaluation of Broadcast News enriched transcription systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The ESTER project made it possible to carry out a campaign for the evaluation of Broadcast News enriched transcription systems for French.
    This package includes the material that was used for the ESTER evaluation campaign. It includes resources, protocols, scoring tools, results of the campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.
    The campaign is distributed over three actions: orthographic transcription, segmentation and information extraction (named entity tracking).
    For research or commercial use, please refer to ELRA-S0241 ESTER Corpus.

    E0022 EQueR Evaluation Package
    The EQueR Evaluation Package was produced within the French national project EQueR (Evaluation campaign for Question-Answering systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The EQueR project made it possible to carry out a campaign for the evaluation of Question-Answering systems in French.
    This package includes the material that was used for the EQueR evaluation campaign. It includes resources, protocols, scoring tools, results of the campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.
    The campaign is distributed over two actions: one generic task and one specialised task (medical domain).

    E0023 EvaSy Evaluation Package
    The EvaSy Evaluation Package was produced within the French national project EvaSy (Evaluation of speech synthesis systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The EvaSy project made it possible to carry out a campaign for the evaluation of speech synthesis systems using French text data.
    This package includes the material that was used for the EvaSy evaluation campaign. It includes resources, protocols, scoring tools, results of the campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.
    The campaign is distributed over three actions: evaluation of grapheme-to-phoneme conversion, evaluation of prosody, and global evaluation of the quality of speech synthesis systems.

    E0024 MEDIA Evaluation Package
    The MEDIA Evaluation Package was produced within the French national project MEDIA (Automatic evaluation of man-machine dialogue systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The MEDIA project made it possible to carry out a campaign for the evaluation of man-machine dialogue systems for French.
    This package includes the material that was used for the MEDIA evaluation campaign. It includes resources, protocols, scoring tools, results of the campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.
    The campaign is distributed over two actions: an evaluation taking into account the dialogue context and an evaluation not taking into account the dialogue context.

    E0025 TC-STAR 2007 Evaluation Package - ASR English
    This package includes the material used for the TC-STAR 2007 Automatic Speech Recognition (ASR) third evaluation campaign for the English language.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0026-01 TC-STAR 2007 Evaluation Package - ASR Spanish - CORTES
    This package includes the material used for the TC-STAR 2007 Automatic Speech Recognition (ASR) third evaluation campaign for the Spanish language within the CORTES task.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0026-02 TC-STAR 2007 Evaluation Package - ASR Spanish - EPPS
    This package includes the material used for the TC-STAR 2007 Automatic Speech Recognition (ASR) third evaluation campaign for the Spanish language within the EPPS task.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0027 TC-STAR 2007 Evaluation Package - ASR Mandarin Chinese
    This package includes the material used for the TC-STAR 2007 Automatic Speech Recognition (ASR) third evaluation campaign for the Mandarin Chinese language.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0028 TC-STAR 2007 Evaluation Package - SLT English-to-Spanish
    This package includes the material used for the TC-STAR 2007 Spoken Language Translation (SLT) third evaluation campaign for English-to-Spanish translation.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0029-01 TC-STAR 2007 Evaluation Package - SLT Spanish-to-English - CORTES
    This package includes the material used for the TC-STAR 2007 Spoken Language Translation (SLT) third evaluation campaign for Spanish-to-English translation within the CORTES task.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0029-02 TC-STAR 2007 Evaluation Package - SLT Spanish-to-English - EPPS
    This package includes the material used for the TC-STAR 2007 Spoken Language Translation (SLT) third evaluation campaign for Spanish-to-English translation within the EPPS task.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0030 TC-STAR 2007 Evaluation Package - SLT Chinese-to-English
    This package includes the material used for the TC-STAR 2007 Spoken Language Translation (SLT) third evaluation campaign for Chinese-to-English translation.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0031 TC-STAR 2006 Evaluation Package – End-to-End
    This package includes the material used for the TC-STAR 2006 evaluation campaign within the end-to-end task.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0032 TC-STAR 2007 Evaluation Package – End-to-End
    This package includes the material used for the TC-STAR 2007 evaluation campaign within the end-to-end task.
    It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.

    E0033 CHIL 2007 Evaluation Package
    The CHIL Seminars are scientific presentations given by students, faculty members or invited speakers in the field of multimodal interfaces and speech processing. The language is European English spoken by non-native speakers. The recordings comprise the following: videos of the speaker and the audience from 4 fixed cameras, frontal close-ups of the speaker, and close-talking and far-field microphone data of the speaker’s voice and background sounds.

    The CHIL 2007 Evaluation Package consists of:
    1) A set of audiovisual recordings of interactive seminars. The recordings were made between June and September 2006 according to the “CHIL Room Setup” specification.
    2) Video annotations.
    3) Orthographic transcriptions.

    E0034 EASy Evaluation Package
    The EASy Evaluation Package was produced within the French national project EASy (Evaluation of syntactic parsers of French), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The project made it possible to carry out a campaign for the evaluation of syntactic parsers of French.
    This package includes the material that was used for the EASy evaluation campaign. It includes resources, protocols, scoring tools, results of the campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.
    The campaign is distributed over two actions: evaluation of constituent and dependency relation annotations.

    L0000 DST German lexicon

    L0001 DICO-MORPH_Lemme
    Lexicon for morphological works of over 400,000 French entries divided into 55,000 nouns, 8,000 verbs, 16,850 adjectives, 2,000 adverbs.

    L0002 DICO-MORPH_Collocation
    Collocation lexicon. Up to 35,000 entries in French. A supplement to ELRA-L0001.

    L0003 DICO-SYNT
    90,000 French inflectional forms divided into 25,000 nouns, 8,000 verbs that generate 25,000 model verbs, 1,000 adjectives, 1,500 adverbs. Morphosyntactical information in addition to L0001.

    L0004 Dutch Lexicon
    64,000 entries from general vocabulary divided into 50,000 nouns, 7,000 verbs, 6,000 adjectives, 1,000 adverbs. Morphological, syntactical & semantic information.

    L0005 French Lexicon
    50,000 entries from general vocabulary divided into 36,000 nouns, 6,000 verbs, 7,000 adjectives, 1,000 adverbs. Morphological, syntactical & semantic information.

    L0006 ILC Italian Morphological Lexicon
    Set of lemmas/lexical entries (about 60,000) with the corresponding inflected word-forms, and a morphological engine for morphological analysis and generation.

    L0007 LexIn 2:e Swedish Lexicon
    Lexicon of 28,000 headwords and 21,000 senses.

    L0008 Monolingual Danish Lexicon
    25,000 entries. Each lexeme contains the word class, inflection, semantic features, syntactical frames (for verbs), and complement (for nouns & adj.).

    L0009 Monolingual Portuguese Lexicon
    60,000 entries with morphological information, plus a software engine for generating inflected forms.

    L0010 MULTEXT Lexicons
    This CD-ROM contains a set of lexicons developed in the MULTEXT project financed by the European Commission (LRE 62-050). The set contains the following languages:
    English: 66,214 Word forms
    French: 306,795 Word forms
    German: 233,861 Word forms
    Italian: 145,530 Word forms
    Spanish: 510,710 Word forms

    L0012 Spanish gilcUB-M Dictionary
    60,000 lemmas of general vocabulary with morphosyntactical information (9,700 verbs, 35,500 nouns, 14,300 adjectives & 120 adverbs) plus 10,000 full-form adverbs.

    L0013-01 THAMUS Generic Italian Dictionary - canonical forms
    A Generic monolingual Italian dictionary of 87,000 canonical forms. Multi-word terms contain morphological coding for the headword.

    L0013-02 THAMUS. Generic Italian Dictionary - inflected forms
    A Generic monolingual Italian dictionary of 612,000 inflected forms. Multi-word terms contain morphological coding for the headword.

    L0013-03 THAMUS. Generic Italian Dictionary - canonical forms - technical domain
    A Generic monolingual Italian dictionary of 48,000 canonical forms (Technical). Multi-word terms contain morphological coding for the headword.

    L0013-04 THAMUS. Generic Italian Dictionary - inflected forms - technical domain
    A Generic monolingual Italian dictionary of 96,000 inflected forms (Technical). Multi-word terms contain morphological coding for the headword.

    L0014 Adverbial Equivalence Dictionary
    1,200 entries giving simplified equivalents for French fixed expressions (e.g. “laid comme un crapaud” has the equivalent “très laid”).

    L0015 Nominalisation Dictionary
    2,300 entries consisting of nouns derived from French verbs.

    L0016 Tri-, quadri-, pentagrams dictionaries
    The dictionaries consist of a list of 5,487 sequences of 3, 4 or 5 consecutive characters that occur in French words. In particular, they make it possible to locate misspelt sequences.

    L0017"N de N" Dictionary
    Generic dictionary. 21,000 entries of French uninflected noun phrases classified in 1,000 human entries, 4,200 concrete entries, 6,000 abstract entries.

    L0018German lexicon
    466,300 entries with a list of inflected words (97,000 nouns, 236,200 verbs, 130,500 adjectives/adverbs, 1,700 grammatical words, 40 punctuation marks, 400 prefixes, 370 suffixes).

    L0019English lexicon
    160,000 entries with a list of inflected words derived from 93,500 nouns, 35,800 verbs, 46,600 adjectives, 8,865 grammatical words.

    L0020-01DST Dictionary - String Dictionary
    DST:  550,000 inflected forms in French (43,000 common nouns, 10,938 proper nouns, 19,500 adjectives, 8,150 nouns-adjectives, 6,800 verbs, 6,200 compound nouns, etc.). Syntactical, semantic, lexicological information.
    The DST is distributed in different sub-sets:
    L0020-01 String dictionary
    L0020-02 Part of speech (optional)
    L0020-03 Gender, number, conjugation (optional)
    L0020-04 Lemma (optional)
    L0020-05 Semantical information (optional)
    L0020-06 Syntactical information (optional)
    L0020-07 Prep/adv. phrases (optional)
    L0020-08 Compound nouns (optional)
    L0020-09 The whole dictionary

    L0020-02DST Dictionary - Part of Speech (optional)
    DST:  550,000 inflected forms in French (43,000 common nouns, 10,938 proper nouns, 19,500 adjectives, 8,150 nouns-adjectives, 6,800 verbs, 6,200 compound nouns, etc.). Syntactical, semantic, lexicological information.
    The DST is distributed in different sub-sets:
    L0020-01 String dictionary
    L0020-02 Part of speech (optional)
    L0020-03 Gender, number, conjugation (optional)
    L0020-04 Lemma (optional)
    L0020-05 Semantical information (optional)
    L0020-06 Syntactical information (optional)
    L0020-07 Prep/adv. phrases (optional)
    L0020-08 Compound nouns (optional)
    L0020-09 The whole dictionary

    L0020-03DST Dictionary - Gender, number, conjugation (optional)
    DST:  550,000 inflected forms in French (43,000 common nouns, 10,938 proper nouns, 19,500 adjectives, 8,150 nouns-adjectives, 6,800 verbs, 6,200 compound nouns, etc.). Syntactical, semantic, lexicological information.
    The DST is distributed in different sub-sets:
    L0020-01 String dictionary
    L0020-02 Part of speech (optional)
    L0020-03 Gender, number, conjugation (optional)
    L0020-04 Lemma (optional)
    L0020-05 Semantical information (optional)
    L0020-06 Syntactical information (optional)
    L0020-07 Prep/adv. phrases (optional)
    L0020-08 Compound nouns (optional)
    L0020-09 The whole dictionary

    L0020-04DST Dictionary - Lemma (optional)
    DST:  550,000 inflected forms in French (43,000 common nouns, 10,938 proper nouns, 19,500 adjectives, 8,150 nouns-adjectives, 6,800 verbs, 6,200 compound nouns, etc.). Syntactical, semantic, lexicological information.
    The DST is distributed in different sub-sets:
    L0020-01 String dictionary
    L0020-02 Part of speech (optional)
    L0020-03 Gender, number, conjugation (optional)
    L0020-04 Lemma (optional)
    L0020-05 Semantical information (optional)
    L0020-06 Syntactical information (optional)
    L0020-07 Prep/adv. phrases (optional)
    L0020-08 Compound nouns (optional)
    L0020-09 The whole dictionary

    L0020-05DST Dictionary - Semantical information (optional)
    550,000 inflected forms in French (43,000 common nouns, 10,938 proper nouns, 19,500 adjectives, 8,150 nouns-adjectives, 6,800 verbs, 6,200 compound nouns, etc.). Syntactical, semantic, lexicological information.
    The DST is distributed in different sub-sets:
    L0020-01 String dictionary
    L0020-02 Part of speech (optional)
    L0020-03 Gender, number, conjugation (optional)
    L0020-04 Lemma (optional)
    L0020-05 Semantical information (optional)
    L0020-06 Syntactical information (optional)
    L0020-07 Prep/adv. phrases (optional)
    L0020-08 Compound nouns (optional)
    L0020-09 The whole dictionary

    L0020-06DST Dictionary - Syntactical information (optional)
    DST:  550,000 inflected forms in French (43,000 common nouns, 10,938 proper nouns, 19,500 adjectives, 8,150 nouns-adjectives, 6,800 verbs, 6,200 compound nouns, etc.). Syntactical, semantic, lexicological information.
    The DST is distributed in different sub-sets:
    L0020-01 String dictionary
    L0020-02 Part of speech (optional)
    L0020-03 Gender, number, conjugation (optional)
    L0020-04 Lemma (optional)
    L0020-05 Semantical information (optional)
    L0020-06 Syntactical information (optional)
    L0020-07 Prep/adv. phrases (optional)
    L0020-08 Compound nouns (optional)
    L0020-09 The whole dictionary

    L0020-07DST Dictionary - Prep./Adv. phrases (optional)
    DST:  550,000 inflected forms in French (43,000 common nouns, 10,938 proper nouns, 19,500 adjectives, 8,150 nouns-adjectives, 6,800 verbs, 6,200 compound nouns, etc.). Syntactical, semantic, lexicological information.
    The DST is distributed in different sub-sets:
    L0020-01 String dictionary
    L0020-02 Part of speech (optional)
    L0020-03 Gender, number, conjugation (optional)
    L0020-04 Lemma (optional)
    L0020-05 Semantical information (optional)
    L0020-06 Syntactical information (optional)
    L0020-07 Prep/adv. phrases (optional)
    L0020-08 Compound nouns (optional)
    L0020-09 The whole dictionary

    L0020-08DST Dictionary - Compound nouns (optional)
    DST:  550,000 inflected forms in French (43,000 common nouns, 10,938 proper nouns, 19,500 adjectives, 8,150 nouns-adjectives, 6,800 verbs, 6,200 compound nouns, etc.). Syntactical, semantic, lexicological information.
    The DST is distributed in different sub-sets:
    L0020-01 String dictionary
    L0020-02 Part of speech (optional)
    L0020-03 Gender, number, conjugation (optional)
    L0020-04 Lemma (optional)
    L0020-05 Semantical information (optional)
    L0020-06 Syntactical information (optional)
    L0020-07 Prep/adv. phrases (optional)
    L0020-08 Compound nouns (optional)
    L0020-09 The whole dictionary

    L0020-09DST Dictionary - The whole dictionary
    DST:  550,000 inflected forms in French (43,000 common nouns, 10,938 proper nouns, 19,500 adjectives, 8,150 nouns-adjectives, 6,800 verbs, 6,200 compound nouns, etc.). Syntactical, semantic, lexicological information.
    The DST is distributed in different sub-sets:
    L0020-01 String dictionary
    L0020-02 Part of speech (optional)
    L0020-03 Gender, number, conjugation (optional)
    L0020-04 Lemma (optional)
    L0020-05 Semantical information (optional)
    L0020-06 Syntactical information (optional)
    L0020-07 Prep/adv. phrases (optional)
    L0020-08 Compound nouns (optional)
    L0020-09 The whole dictionary

    L0021Dictionary of French verbs (SINEQUA - Jean Dubois)
    25,610 verbs with usage domains, level of language, conjugation, auxiliary, verbal adjectives in -able, -ant or -e, encoded syntactical constructions, sample phrases, synonyms, operators enabling semantic-syntactic classification, encoding of derived forms in -age, -ment, -tion, -oir, -ure, deverbal nouns, base words from which verbs can be derived, a scale of usage ranging from 1 to 6.

    L0022Dictionary of words (SINEQUA - Jean Dubois)
    126,844 French words with usage domains, grammatical category (gender, number, uncountable, collective, adjectival, nominal, verbal, adverbial derived forms).

    L0023Dictionary of affixes (SINEQUA - Jean Dubois)
    4,286 suffixes and prefixes, plus information on their verbal, nominal or adjectival bases or on the verbal base of Greco-Latin items.

    L0024Dictionary of verb phrases (SINEQUA - Jean Dubois)
    3,480 entries based on the model of the dictionary of French verbs (ELRA-L0021).

    L0025Dictionary of invariable forms and phrases (SINEQUA - Jean Dubois)
    4,783 entries based on the model of the dictionary of words (ELRA-L0022).

    L0026Dictionary of exclamatory stereotyped phrases (SINEQUA - Jean Dubois)
    1,901 entries based on the model of the dictionary of invariable forms and phrases (ELRA-L0025).

    L0027Dictionary of French local authorities (SINEQUA - Jean Dubois)
    38,965 entries in lower case with accents, checked against the Michelin guide, without localities.

    L0028Dictionary of noun phrases and plural-only words (SINEQUA - Jean Dubois)
    2,138 compound nouns and 1,397 entries of plural-only words.

    L0029-01CELEX Dutch lexical database - Complete set
    Dutch lexical database containing lemmas (124136 entries), wordforms (381292 entries), abbreviations (1622 entries), syllables (31358 entries).
    The database is divided into different subsets:
    L0029-01 Complete set of data;
    L0029-02 Subset Orthography;
    L0029-03 Subset Phonology;
    L0029-04 Subset Morphology Infl.;
    L0029-05 Subset Morphology Der.;
    L0029-06 Subset Syntax;
    L0029-07 Subset Frequency.

    L0029-02CELEX Dutch lexical database - Orthography Subset
    Dutch lexical database containing lemmas (124136 entries), wordforms (381292 entries), abbreviations (1622 entries), syllables (31358 entries).
    The database is divided into different subsets:
    L0029-01 Complete set of data;
    L0029-02 Subset Orthography;
    L0029-03 Subset Phonology;
    L0029-04 Subset Morphology Infl.;
    L0029-05 Subset Morphology Der.;
    L0029-06 Subset Syntax;
    L0029-07 Subset Frequency.

    L0029-03CELEX Dutch lexical database - Phonology Subset
    Dutch lexical database containing lemmas (124136 entries), wordforms (381292 entries), abbreviations (1622 entries), syllables (31358 entries).
    The database is divided into different subsets:
    L0029-01 Complete set of data;
    L0029-02 Subset Orthography;
    L0029-03 Subset Phonology;
    L0029-04 Subset Morphology Infl.;
    L0029-05 Subset Morphology Der.;
    L0029-06 Subset Syntax;
    L0029-07 Subset Frequency.

    L0029-04CELEX Dutch lexical database - Inflectional Morphology Subset
    Dutch lexical database containing lemmas (124136 entries), wordforms (381292 entries), abbreviations (1622 entries), syllables (31358 entries).
    The database is divided into different subsets:
    L0029-01 Complete set of data;
    L0029-02 Subset Orthography;
    L0029-03 Subset Phonology;
    L0029-04 Subset Inflectional Morphology;
    L0029-05 Subset Derivational Morphology;
    L0029-06 Subset Syntax;
    L0029-07 Subset Frequency.

    L0029-05CELEX Dutch lexical database - Derivational Morphology Subset
    Dutch lexical database containing lemmas (124136 entries), wordforms (381292 entries), abbreviations (1622 entries), syllables (31358 entries).
    The database is divided into different subsets:
    L0029-01 Complete set of data;
    L0029-02 Subset Orthography;
    L0029-03 Subset Phonology;
    L0029-04 Subset Inflectional Morphology;
    L0029-05 Subset Derivational Morphology;
    L0029-06 Subset Syntax;
    L0029-07 Subset Frequency.

    L0029-06CELEX Dutch lexical database - Syntax Subset
    Dutch lexical database containing lemmas (124136 entries), wordforms (381292 entries), abbreviations (1622 entries), syllables (31358 entries).
    The database is divided into different subsets:
    L0029-01 Complete set of data;
    L0029-02 Subset Orthography;
    L0029-03 Subset Phonology;
    L0029-04 Subset Morphology Infl.;
    L0029-05 Subset Morphology Der.;
    L0029-06 Subset Syntax;
    L0029-07 Subset Frequency.

    L0029-07CELEX Dutch lexical database - Frequency Subset
    Dutch lexical database containing lemmas (124136 entries), wordforms (381292 entries), abbreviations (1622 entries), syllables (31358 entries).
    The database is divided into different subsets:
    L0029-01 Complete set of data;
    L0029-02 Subset Orthography;
    L0029-03 Subset Phonology;
    L0029-04 Subset Morphology Infl.;
    L0029-05 Subset Morphology Der.;
    L0029-06 Subset Syntax;
    L0029-07 Subset Frequency.

    L0030Bulgarian Morphological Dictionary
    67,500 entries divided into 242 inflectional types (including proper nouns), morphosyntactic information for each entry, and a morphological engine (MS-DOS and Windows 95/NT) for morphological analysis and generation.

    L0031Dutch PAROLE lexicon
    The entry list of the lexicon consists of about 20,200 entries distributed over 13 parts of speech (POS). The entries have been described along the dimensions of morphosyntax and syntax, according to the specifications of the PAROLE project. The lexicon is set up as an SGML file.

    L0032PAROLE Greek Lexicon
    The PAROLE Greek lexicon has two layers, morphological and syntactic. It includes the most frequent words found in a 9 million word corpus, coded according to the PAROLE specifications. The morphological layer contains a total of 20,149 morphological units. The syntactic layer contains 25,092 syntactic units.

    L0033LusoLEX European Portuguese Lexicon
    LusoLEX:  Multifunctional monolingual lexicon of the European variety of Portuguese, consisting of about 61,000 entries (lemmas) and 1,600 corresponding inflexion paradigms. The set of entries includes compound words and the inflexion paradigms include information regarding enclitics, augmentatives and diminutives. Morphological information is encoded with maximum granularity and conforms to the EAGLES recommendations.

    L0034BrasiLEX Brazilian Portuguese lexicon
    BrasiLEX:  Multifunctional monolingual lexicon of the Brazilian variety of Portuguese, consisting of about 65,000 entries (lemmas) and 1,600 corresponding inflexion paradigms. The set of entries includes compound words and the inflexion paradigms include information regarding enclitics and augmentative/diminutive degree. Morphological information is encoded with maximum granularity and conforms to the EAGLES recommendations.

    L0035PAROLE Portuguese Lexicon
    The PAROLE Portuguese Lexicon consists of 20,000 entries encoded morphosyntactically and syntactically according to the PAROLE common encoding standards. The data is in SGML format.

    L0036Japanese Word Dictionary
    The Japanese Word Dictionary is composed of 260,000 Japanese word records arranged alphabetically according to the Japanese syllabary.

    L0037English Word Dictionary
    The English Word Dictionary is composed of 190,000 English word records arranged alphabetically.

    L0038Concept Dictionary
    The Concept Dictionary provides 400,000 concepts that are referenced in the Japanese and English Word Dictionaries (ref. ELRA-L0036 and L0037), the Japanese-English and English-Japanese Bilingual Dictionaries (ref. ELRA-M0023 and M0024) as well as in the Japanese and English Co-occurrence Dictionaries (ref. ELRA-L0039 and L0040). The Concept Dictionary is composed of three separate dictionaries:
    - the Headconcept Dictionary gives a description of each concept in words
    - the Concept Classification Dictionary contains a classification of concepts that have a super-sub relation
    - the Concept Description Dictionary provides all other information regarding the relation between concepts.

    L0039Japanese Co-occurrence Dictionary
    The Japanese Co-occurrence Dictionary is composed of 900,000 headphrase notations arranged according to the Japanese syllabary. Appendix to the Japanese Co-occurrence Dictionary: The Japanese Corpus.

    L0040English Co-occurrence Dictionary
    The English Co-occurrence Dictionary (ref. ELRA-L0040) is composed of 460,000 alphabetically arranged headphrases. Appendix to the English Co-occurrence Dictionary: The English Corpus.

    L0041Technical Terms Dictionary (Information processing)
    The Technical Terms Dictionary (Information processing) contains 80,000 technical terms in English and 120,000 technical terms in Japanese from the field of information processing.

    L0042PAROLE Spanish Lexicon
    The PAROLE Spanish lexicon follows standard PAROLE architecture. It contains about 22,000 morphological units, of which 12,209 are common nouns, 3,367 verbs, 4,996 adjectives.

    L0043PAROLE English lexicon
    The PAROLE English lexicon consists of 22,000 morphological units extracted from the CRL-LKB and COBUILD dictionaries: 12,998 are common nouns, 40 proper nouns, 4,195 verbs, 3,208 adjectives, 606 adverbs, 71 adpositions, 2 articles, 21 conjunctions, 25 determiners and 53 pronouns.

    L0044Korean Lexicon
    This monolingual lexicon produced by Kaist Korterm consists of 31,476 compound nouns in Korean.

    L0045New Oxford Dictionary of English, 2nd Edition
    NODE:  The NODE contains 170,000 entries covering all varieties of English worldwide. It has been designed for language engineering and to be used in NLP applications, and is available in XML or in SGML. The NODE data set includes morphological information linked to the lemma, phrases and idioms, subject classification, with over 200 key domains, semantic relationships, etc.

    L0046NODE+DIMAP
    The first edition of the DIMAP version of NODE is a machine-tractable version of the machine-readable dictionary files in the DIMAP dictionary maintenance programs, adding syntactic and semantic information in the conversion. Apart from mechanisms which will allow research into representational formalisms and explorations of the use of these representations in extending the lexical database and in processing text for information extraction, text summarization, discourse analysis and other LE applications, DIMAP also includes semantic links between entries, thus making NODE+DIMAP a semantic network of the English language.

    L0047New Oxford Thesaurus of English
    NOTE:  This thesaurus contains 628,000 alternative words, including 573,000 synonyms, the rest being antonyms, related terms, combining forms, and hyponyms, and is available in SGML. Nearly 38,000 senses are also presented with a corpus-based example.

    L0048Oxford Paperback Thesaurus, 2nd edition
    The Oxford Paperback Thesaurus, available in SGML, contains 15,000 headwords, over 300,000 synonyms, and 29,000 different senses presented with corpus-based examples.

    L0049SCIPER-FR-EURADIC French Monolingual Dictionary
    SCIPER-FR-EURADIC:  This French monolingual dictionary was expanded and improved within the French national project EurRADic (European and Arabic Dictionaries and Corpora), as part of the Technolangue programme funded by the French Ministry of Industry. It contains 112,216 lemmas (694,673 inflected forms), with their part of speech and some information related to their inflexion. The data are presented in a table format, where the fields of each entry are separated by ";".

    See also ELRA-L0050, ELRA-L0051, ELRA-L0052, ELRA-L0053, ELRA-M0033, ELRA-M0034, ELRA-M0035, ELRA-M0036, ELRA-M0037, ELRA-M0038.

    L0050SCIPER-AN-EURADIC English Monolingual Dictionary
    SCIPER-AN-EURADIC:  This English monolingual dictionary was expanded and improved within the French national project EurRADic (European and Arabic Dictionaries and Corpora), as part of the Technolangue programme funded by the French Ministry of Industry. It contains 171,713 lemmas (365,823 inflected forms), with their part of speech and some information related to their inflexion. The data are presented in a table format, where the fields of each entry are separated by ";".

    See also ELRA-L0049, ELRA-L0051, ELRA-L0052, ELRA-L0053, ELRA-M0033, ELRA-M0034, ELRA-M0035, ELRA-M0036, ELRA-M0037, ELRA-M0038.

    L0051SCIPER-AL-EURADIC German Monolingual Dictionary
    SCIPER-AL-EURADIC:  This German monolingual dictionary was developed within the French national project EurRADic (European and Arabic Dictionaries and Corpora), as part of the Technolangue programme funded by the French Ministry of Industry. It contains 157,810 lemmas (17,634,834 inflected forms), with their part of speech and some information related to their inflexion. The data are presented in a table format, where information related to each entry is separated by ";".

    See also ELRA-L0049, ELRA-L0050, ELRA-L0052, ELRA-L0053, ELRA-M0033, ELRA-M0034, ELRA-M0035, ELRA-M0036, ELRA-M0037, ELRA-M0038.

    L0052SCIPER-ES-EURADIC Spanish Monolingual Dictionary
    SCIPER-ES-EURADIC:  This Spanish monolingual dictionary was expanded and improved within the French national project EurRADic (European and Arabic Dictionaries and Corpora), as part of the Technolangue programme funded by the French Ministry of Industry. It contains 83,952 lemmas (838,391 inflected forms), with their part of speech and some information related to their inflexion. The data are presented in a table format, where the fields of each entry are separated by ";".

    See also ELRA-L0049, ELRA-L0050, ELRA-L0051, ELRA-L0053, ELRA-M0033, ELRA-M0034, ELRA-M0035, ELRA-M0036, ELRA-M0037, ELRA-M0038.

    L0053SCIPER-IT-EURADIC Italian Monolingual Dictionary
    SCIPER-IT-EURADIC:  This Italian monolingual dictionary was developed within the French national project EurRADic (European and Arabic Dictionaries and Corpora), as part of the Technolangue programme funded by the French Ministry of Industry. It contains 70,951 lemmas (557,204 inflected forms), with their part of speech and some information related to their inflexion. The data are presented in a table format, where information related to each entry is separated by ";".

    See also ELRA-L0049, ELRA-L0050, ELRA-L0051, ELRA-L0052, ELRA-M0033, ELRA-M0034, ELRA-M0035, ELRA-M0036, ELRA-M0037, ELRA-M0038.

    L0054LABEL-LEX (MW)
    LABEL-LEX (MW) is a Portuguese formalized lexicon, containing 88,619 inflected multiword lexical units (formally, sequences of simple words).

    L0055LABEL-LEX (SW)
    LABEL-LEX (SW) is a Portuguese formalized lexicon, containing 1,545,156 simple inflected words. Each dictionary entry is associated with a lemma; information about POS and morphological attributes - such as gender, number, person, case (for personal pronouns), tense, mood, diminutives, augmentatives, and superlative - is systematically formalized for each lexical entry.

    L0056STO SprogTeknologisk Ordbase (Danish Lexicon for NLP/HLT Applications)
    STO:  The STO Lexicon is the most comprehensive computational lexicon of Danish, comprising approx. 81,530 entry words with morphological, syntactical and semantic information. It is well integrated with European activities in the field of lexicon development, building on experience obtained from the PAROLE and SIMPLE projects. The model and descriptive method of the STO lexicon are kept compatible with the architecture and descriptive language of PAROLE/SIMPLE. A number of refinements, adaptations and language-specific extensions to the basic model are implemented in STO.

    L0057Euskararen Datu-Base Lexikala (EDBL) – Lexical Database for Basque
    EDBL (Lexical database for Basque) is made up of about 75,000 entries divided into dictionary entries, verb forms and dependent morphemes, all of them with their respective morphological information. It was first developed as a lexical support for the spelling checker and corrector XUXEN, and later for the morphological analyser MORFEUS and the lemmatiser EUSLEM.

    L0058British English Source Lexicon (BESL) version 2.2
    BESL consists of over 230,000 lemmas, over 350,000 word forms, 60,000 proper nouns, 3,000 abbreviations, and 58,000 multi-word compound nouns. Each headword is provided with a full listing of all inflected forms and other morphological variation. Every word form is marked for part of speech (using Penn TreeBank notation). Most single-word forms include a representation of IPA pronunciation. BESL covers both British and American English, and other spelling variants, with cross-references between corresponding forms. BESL is provided in XML.

    L0059Offensive Word Filter 1
    This list features 4,500 words and expressions for UK and US English usage with a grading system describing vocabulary type and offensive strength for each term, plus collocational information to help identify the terms in context. The list is provided in tab-delimited ASCII.

    L0060Offensive Word Filter 2
    This list features 2,000 words and expressions, classified into 13 categories, for UK and US English usage with a grading system describing vocabulary type and offensive strength for each term, plus collocational information to help identify the terms in context. The list is provided in an Excel spreadsheet.

    L0061The Oxford Spanish Dictionary
    This dictionary consists of 300,000 words and phrases, 500,000 translations, for 24 regional varieties of Spanish. It includes thousands of real, authentic example sentences carefully selected to illustrate the full range of meanings and typical contexts. The dictionary is provided in XML or SGML.

    L0062French Source Lexicon
    This source lexicon contains morphological and phonetic data for French. It consists of over 90,000 headwords/lemmas, 400,000 wordforms, 1,000 abbreviations, and 35,000 proper nouns. Each headword lemma is provided with a full listing of its possible syntactic forms and spelling variants, along with information on their relationship to the headword form. In addition, a representation of the IPA pronunciation is given for every form. There is also information on domains in which the headwords are used, e.g. Computing, Engineering, Zoology. The lexicon is provided in SGML.

    L0063Spanish Source Lexicon
    This source lexicon contains morphological and phonetic data for Spanish. It consists of over 575,000 wordforms, 1,000 abbreviations, and 25,000 proper nouns. Each headword lemma is provided with a full listing of its possible syntactic forms and spelling variants, along with information on their relationship to the headword form. In addition, a representation of the IPA pronunciation is given for every form. There is also information on domains in which the headwords are used, e.g. Computing, Engineering, Zoology. The lexicon is provided in SGML.

    L0064Italian Source Lexicon
    This source lexicon contains morphological and phonetic data for Italian. It consists of over 115,000 headwords/lemmas and 925,000 wordforms. Each headword lemma is provided with a full listing of its possible syntactic forms and spelling variants, along with information on their relationship to the headword form. In addition, a representation of the IPA pronunciation is given for every form. There is also information on domains in which the headwords are used, e.g. Computing, Engineering, Zoology. The lexicon is provided in SGML.

    L0065KORLEX – Croatian Lexicon
    The KORLEX - Croatian Lexicon provides a list of 118,252 Croatian lemmas (including 52,450 nouns, 8,985 adverbs, 14,937 verbs and 41,161 adjectives, as well as pronouns, determiners, prepositions/postpositions, conjunctions and numerals), i.e., words in canonical form, annotated with part-of-speech (POS) tag and lexical features.
    The lexicon data is compiled with the objective of covering the majority of text circulating in everyday use, such as in the news, in business, technological documentation, legal documentation, and politics. The resource is a flat textual file in which each textual line contains information about one lemma. The resource is encoded using ISO-8859-2 encoding, and sorted according to the standard Croatian lexicographic order.

    L0066KORLEX – Serbian Lexicon
    The KORLEX - Serbian Lexicon provides a list of 108,491 Serbian lemmas (including 52,027 nouns, 9,153 adverbs, 15,522 verbs and 31,052 adjectives, as well as pronouns, determiners, prepositions/postpositions, conjunctions and numerals), i.e., words in canonical form, annotated with part-of-speech (POS) tag and lexical features.
    The lexicon data is compiled with the objective of covering the majority of text circulating in everyday use, such as in the news, in business, technological documentation, legal documentation, and politics. The resource is a flat textual file in which each textual line contains information about one lemma. The resource is encoded using ISO-8859-2 encoding, and sorted according to the standard Serbian lexicographic order.

    L0067English lexicon with morphological information
    This English lexicon is made up of 174,000 inflected forms corresponding to 68,000 simple word lemmas (including 31,900 nouns, 11,800 verbs, 19,900 adjectives, 4,100 adverbs, 300 pronouns, articles, prepositions/postpositions and conjunctions). Each line in the resource file shows an inflected form, its part of speech, its related lemma and its morphological information.

    L0068French lexicon with morphological information
    This French lexicon is made up of 424,000 inflected forms corresponding to 55,000 simple word lemmas (including 34,400 nouns, 7,300 verbs, 11,700 adjectives, 1,400 adverbs, 200 pronouns, articles, prepositions/postpositions and conjunctions). Each line in the resource file shows an inflected form, its part of speech, its related lemma and its morphological information.

    L0069Italian lexicon with morphological information
    This Italian lexicon is made up of 862,500 inflected forms corresponding to 112,000 simple word lemmas (including 66,340 nouns, 12,030 verbs, 28,080 adjectives, 4,890 adverbs, 660 pronouns, articles, prepositions/postpositions and conjunctions). Each line in the resource file shows an inflected form, its part of speech, its related lemma and its morphological information.

    L0070Italian lexicon with morphological information and clitic verbs
    This Italian lexicon is the same as the one described in ELRA-L0069, but with the addition of clitic verbs, which increases the number of inflected forms to 1,800,000 (still corresponding to 112,000 simple word lemmas). It contains 66,340 nouns, 12,030 verbs, 28,080 adjectives, 4,890 adverbs, 660 pronouns, articles, prepositions/postpositions and conjunctions. Each line in the resource file shows an inflected form, its part of speech, its related lemma and its morphological information.

    L0071Spanish lexicon with morphological information
    This Spanish lexicon is made up of 816,000 inflected forms corresponding to 104,000 simple word lemmas (including 52,000 nouns, 9,800 verbs, 21,200 adjectives, 20,500 adverbs, 500 pronouns, articles, prepositions/postpositions and conjunctions). Each line in the resource file shows an inflected form, its part of speech, its related lemma and its morphological information.

    L0072-01PAROLE-SIMPLE-CLIPS PISA Italian Lexicon – Full lexicon
    PAROLE-SIMPLE-CLIPS is a four-level, general purpose lexicon that has been elaborated over three different projects. The PAROLE-SIMPLE-CLIPS Pisa Italian Lexicon comprises a total of 387,267 phonetic units, 53,044 morphological units (53,044 lemmas), 37,406 syntactic units (28,111 lemmas) and 28,346 semantic units (19,216 lemmas). The PAROLE-SIMPLE-CLIPS Pisa Italian Lexicon was encoded at the semantic level, in full accordance with the international standards set out in the PAROLE-SIMPLE model and based on EAGLES. Syntactic and semantic encoding were performed jointly with Thamus (Consortium for Multilingual Documentary Engineering), which is responsible for 25,000 extra entries (to be released soon).

    This lexicon is subdivided into five different subsets:
    L0072-01 Full lexicon
    L0072-02 Phonetic layer
    L0072-03 Morphological layer
    L0072-04 Syntactic layer
    L0072-05 Semantic layer

    L0072-02PAROLE-SIMPLE-CLIPS PISA Italian Lexicon – Phonetic layer
    PAROLE-SIMPLE-CLIPS is a four-level, general purpose lexicon that has been elaborated over three different projects. The PAROLE-SIMPLE-CLIPS Pisa Italian Lexicon comprises a total of 387,267 phonetic units, 53,044 morphological units (53,044 lemmas), 37,406 syntactic units (28,111 lemmas) and 28,346 semantic units (19,216 lemmas). The PAROLE-SIMPLE-CLIPS Pisa Italian Lexicon was encoded at the semantic level, in full accordance with the international standards set out in the PAROLE-SIMPLE model and based on EAGLES. Syntactic and semantic encoding were performed jointly with Thamus (Consortium for Multilingual Documentary Engineering), which is responsible for 25,000 extra entries (to be released soon).

    This lexicon is subdivided into five different subsets:
    L0072-01 Full lexicon
    L0072-02 Phonetic layer
    L0072-03 Morphological layer
    L0072-04 Syntactic layer
    L0072-05 Semantic layer

    L0072-03PAROLE-SIMPLE-CLIPS PISA Italian Lexicon – Morphological layer
    PAROLE-SIMPLE-CLIPS is a four-level, general purpose lexicon that has been elaborated over three different projects. The PAROLE-SIMPLE-CLIPS Pisa Italian Lexicon comprises a total of 387,267 phonetic units, 53,044 morphological units (53,044 lemmas), 37,406 syntactic units (28,111 lemmas) and 28,346 semantic units (19,216 lemmas). The PAROLE-SIMPLE-CLIPS Pisa Italian Lexicon was encoded at the semantic level, in full accordance with the international standards set out in the PAROLE-SIMPLE model and based on EAGLES. Syntactic and semantic encoding were performed jointly with Thamus (Consortium for Multilingual Documentary Engineering), which is responsible for 25,000 extra entries (to be released soon).

    This lexicon is subdivided into five different subsets:
    L0072-01 Full lexicon
    L0072-02 Phonetic layer
    L0072-03 Morphological layer
    L0072-04 Syntactic layer
    L0072-05 Semantic layer

    L0072-04PAROLE-SIMPLE-CLIPS PISA Italian Lexicon – Syntactic layer
    PAROLE-SIMPLE-CLIPS is a four-level, general purpose lexicon that has been elaborated over three different projects. The PAROLE-SIMPLE-CLIPS Pisa Italian Lexicon comprises a total of 387,267 phonetic units, 53,044 morphological units (53,044 lemmas), 37,406 syntactic units (28,111 lemmas) and 28,346 semantic units (19,216 lemmas). The PAROLE-SIMPLE-CLIPS Pisa Italian Lexicon was encoded at the semantic level, in full accordance with the international standards set out in the PAROLE-SIMPLE model and based on EAGLES. Syntactic and semantic encoding were performed jointly with Thamus (Consortium for Multilingual Documentary Engineering), which is responsible for 25,000 extra entries (to be released soon).

    This lexicon is subdivided into five different subsets:
    L0072-01 Full lexicon
    L0072-02 Phonetic layer
    L0072-03 Morphological layer
    L0072-04 Syntactic layer
    L0072-05 Semantic layer

    L0072-05PAROLE-SIMPLE-CLIPS PISA Italian Lexicon – Semantic layer
    PAROLE-SIMPLE-CLIPS is a four-level, general purpose lexicon that has been elaborated over three different projects. The PAROLE-SIMPLE-CLIPS Pisa Italian Lexicon comprises a total of 387,267 phonetic units, 53,044 morphological units (53,044 lemmas), 37,406 syntactic units (28,111 lemmas) and 28,346 semantic units (19,216 lemmas). The PAROLE-SIMPLE-CLIPS Pisa Italian Lexicon was encoded at the semantic level, in full accordance with the international standards set out in the PAROLE-SIMPLE model and based on EAGLES. Syntactic and semantic encoding were performed jointly with Thamus (Consortium for Multilingual Documentary Engineering), which is responsible for 25,000 extra entries (to be released soon).

    This lexicon is subdivided into five different subsets:
    L0072-01 Full lexicon
    L0072-02 Phonetic layer
    L0072-03 Morphological layer
    L0072-04 Syntactic layer
    L0072-05 Semantic layer

    L0073DIINAR.1 - Arabic Lexical Resource
    DIINAR.1 is an Arabic Lexical Resource which includes a total number of 119,693 lemmas, fully vowelled, and distributed as follows: 29,534 nouns and adjectives, 19,457 verbs, 70,702 deverbals (including 23,274 infinitive forms, 17,904 active participles, 13,373 passive participles, 5,781 ‘analogous adjectives’, 10,370 ‘nouns of place & time’). The data is provided in Excel files and was generated with inflected forms. Each entry has been associated with morpho-syntactic specifiers.

    L0074POLEX Polish Lexicon
    The POLEX Polish Lexicon is a morphological dictionary of the Polish language comprising about 100,000 entries. It covers the core Polish vocabulary of general interest and is based on a precise machine-interpretable formalism (coding system) that is the same for all categories (parts of speech). Dictionary entries have the following form:
    BASIC_FORM+LIST_OF_STEMS+PARADIGMATIC_CODE+DISTRIBUTION_OF_STEMS
    It contains more than 42,000 nouns, 12,000 verbs, 15,000 adjectives, 25,000 participles, and about 200 pronouns. A simple lemmatiser (in the form of a PROLOG prototype) is also included.
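
As a rough illustration of the four-field entry form shown above, the sketch below splits one entry into named fields. It assumes that "+" is the literal field separator and that no field contains it; neither is confirmed by the catalogue description. The names `PolexEntry` and `parse_entry` and the sample line are hypothetical placeholders, not real POLEX data or tooling.

```python
from typing import NamedTuple


class PolexEntry(NamedTuple):
    """Hypothetical container for the four fields named in the entry form."""
    basic_form: str
    stems: str
    paradigm_code: str
    stem_distribution: str


def parse_entry(line: str, sep: str = "+") -> PolexEntry:
    """Split one entry line into its four fields.

    Assumes `sep` is the literal field separator and that no field
    contains it; both are guesses, not documented POLEX behaviour.
    """
    parts = line.strip().split(sep)
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {line!r}")
    return PolexEntry(*parts)


# Placeholder values only; this is not real POLEX data.
entry = parse_entry("FORM+STEMS+CODE+DISTRIBUTION")
print(entry.basic_form, entry.paradigm_code)
```

If the distributed files use a different delimiter, only the `sep` argument would need to change.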

    L0075Bulgarian Linguistic Database
    This database contains 81,647 entries in Bulgarian with a linguistic environment tool (for Windows XP). The data may be used for morphological analysis and synthesis, syntactic agreement checking, and phonetic stress determination.

    L0076Polderland Dutch Lexicon of Abbreviations and Acronyms
    The lexicon contains 2,180 Dutch abbreviations and acronyms. It complies with the official Dutch Spelling (2005/6). Each entry consists of an ID, word form, lemma and part of speech.

    L0077Polderland Dutch General Lexicon
    The lexicon contains 400,463 Dutch words, comprising 236,369 nouns, 90,882 adjectives, 69,744 verbs, 2,120 adverbs, and 1,348 items from other categories (pronouns, determiners, articles, adpositions, conjunctions, numerals, etc.). It complies with the official Dutch Spelling (2005/6). The lexicon contains an ID, word form, lemma and part of speech.

    L0078Polderland Dutch Lexicon of Names
    The lexicon contains 24,247 Dutch proper names. Various sorts of proper names are included, such as first names, last names, geographical names etc. Each entry contains an ID, word form, lemma, part of speech and proper name type.

    L0079Polderland Dutch Lexicon of Business Terminology
    The lexicon contains 15,987 Dutch words from the business domain, comprising 13,774 nouns, 1,267 adjectives, 895 verbs, 9 adverbs, and 42 items from other categories. The lexicon complies with the official Dutch Spelling (2005). Each entry contains an ID, word form and part of speech.

    L0080Polderland Dutch Lexicon of Legal Terminology
    The lexicon contains 6,207 Dutch words from the legal domain, comprising 4,781 nouns, 810 adjectives, 573 verbs, 12 adverbs and 31 items from other categories. It complies with the official Dutch Spelling (2005/6). Each entry contains an ID, word form and part of speech.

    L0081Polderland Dutch Lexicon of Medical Terminology
    The lexicon contains 17,115 Dutch words from the medical domain, comprising 12,638 nouns, 3,107 adjectives, 1,273 verbs, 11 adverbs and 86 items from other categories. It complies with the official Dutch Spelling (2005/6). Each entry contains an ID, word form and part of speech.

    L0082Polderland Dutch Lexicon of Social Terminology
    The lexicon contains 12,551 Dutch words from the social domain, comprising 9,984 nouns, 1,306 adjectives, 1,161 verbs, 56 adverbs and 44 items from other categories. It complies with the official Dutch Spelling (2005/6). Each entry contains an ID, word form and part of speech.

    L0083Polderland Dutch Lexicon of Technical Terminology
    The lexicon contains 9,940 Dutch words from the technical/scientific domain, comprising 8,832 nouns, 950 adjectives, 111 verbs, 2 adverbs and 45 items from other categories. It complies with the official Dutch Spelling (2005/6). Each entry contains an ID, word form and part of speech.

    L0084Macedonian Morphological Lexicon (MACPLEX)
    MACPLEX comprises two dictionaries: a dictionary of lemmas (over 80,000 entries) and a dictionary of word forms (over 1,300,000 entries). Morphological information is available for each entry (PoS; gender, case, definiteness and number for nouns; tense, person, etc. for verbs). Of the more than 1,300,000 word forms, 345,350 are nouns, 467,744 adjectives, 500,220 verbs and 19,472 adverbs. The remaining entries are pronouns, adpositions, conjunctions and numerals. The lexicon is available in Unicode.

    L0085euLEX (Lexical Database for Basque)
    euLEX is a general lexicon which contains 115,000 entries, divided into 94,000 dictionary entries or lemmas, 12,000 allomorphs, 7,500 verb forms and about 1,200 dependent morphemes. All entries include linguistic information such as morphology and usage. The lexicon is in XML.

    M0001Basic multilingual lexicon (MEMODATA)
    30,000 entries (linked by meaning) for French, English, Italian, German and Spanish, with lexical categories.

    M0002-01Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Economics, law & business management
    10,642 entries (with morphological information) for Economics, law & business management.

    M0002-02Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Leisure, Tourism, Sports, Food
    3,144 entries (with morphological information) for Leisure, Tourism, Sports, Food.

    M0002-03Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Geography, History, Arts
    4,116 entries (with morphological information) for Geography, History, Arts.

    M0002-04Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Sociology, Psychology, Pedagogy
    4,089 entries (with morphological information) for Sociology, Psychology, Pedagogy.

    M0002-05Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Natural and medical sciences
    10,535 entries (with morphological information) for Natural and medical sciences.

    M0002-06Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Exact sciences, Physics, Chemistry, Geology
    10,616 entries (with morphological information) for Exact sciences, Physics, Chemistry, Geology.

    M0002-07Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Data Processing, Electronics, Telecoms
    4,904 entries (with morphological information) for Data Processing, Electronics, Telecoms.

    M0002-08Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Technology, Engineering & Construction
    11,953 entries (with morphological information) for Technology, Engineering & Construction.

    M0002-09Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Economics
    1,320 entries (with morphological information) for Economics.

    M0002-10Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Data Processing
    3,565 entries (with morphological information) for Data Processing.

    M0002-11Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Telecommunications
    3,733 entries (with morphological information) for Telecommunications.

    M0002-12Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Electrical Engineering
    1,760 entries (with morphological information) for Electrical Engineering.

    M0002-13Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Plastics and Chemistry
    9,022 entries (with morphological information) for Plastics and Chemistry.

    M0002-14Bilingual Spanish-English and English-Spanish lexicons (INCYTA) - Aeronautics, Navigation, Mechanical Engineering
    23,170 entries (with morphological information) for Aeronautics, Navigation, Mechanical Engineering.

    M0003Danish-German dictionary (Institut for Erhvervsinformatik)
    10,000 entries giving the German lexeme and Danish equivalent with word class, subject area, indication of structural changes, developed for machine translation.

    M0004-01Dutch-French Lexicon (LanTmark)
    Vocabularies for transfer: General Vocabulary, 26,000 entries.
    Each entry contains domain information, source language disambiguation, features, target language actions.

    M0004-02Dutch-French Lexicon (LanTmark)
    Vocabularies for transfer: Administrative, 32,000 entries.
    Each entry contains domain information, source language disambiguation, features, target language actions.

    M0004-03Dutch-French Lexicon (LanTmark)
    Vocabularies for transfer: Data processing, 10,000 entries.
    Each entry contains domain information, source language disambiguation, features, target language actions.

    M0005English-French Lexicon (LanTmark)
    General vocabulary for transfer. 33,287 entries consisting of nouns (about 14,000), verbs (about 7,000), adjectives (about 5,000) and adverbs (about 1,000), including domain information, source language disambiguation, features and target language actions.

    M0006-01French-Dutch Lexicon (LanTmark)
    Vocabularies for transfer: General Vocabulary, 34,000 entries.
    Each entry contains source language disambiguation, features, and target language actions, developed for automatic translation.

    M0006-02French-Dutch Lexicon (LanTmark)
    Vocabularies for transfer: Administrative, 18,000 entries.
    Each entry contains source language disambiguation, features, and target language actions, developed for automatic translation.

    M0006-03French-Dutch Lexicon (LanTmark)
    Vocabularies for transfer: Data processing, 10,000 entries.
    Each entry contains source language disambiguation, features, and target language actions, developed for automatic translation.

    M0007French-English Lexicon (LanTmark)
    General vocabulary for transfer. 39,453 entries: nouns (about 21,000), verbs (about 9,000), adjectives (about 3,000), adverbs (about 1,000), including domain information, source language disambiguation, features, and target language actions, developed for automatic translation.

    M0008-01German-Danish dictionaries (Institut for Erhvervsinformatik)
    6,800 technical entries giving the German lexeme and Danish equivalent with word class, subject area, indication of structural changes, developed for machine translation.

    M0008-02German-Danish dictionaries (Institut for Erhvervsinformatik)
    15,500 general entries giving the German lexeme and Danish equivalent with word class, subject area, indication of structural changes, developed for machine translation.

    M0009-01THAMUS Bilingual dictionaries - Computer Science
    Computer Science, canonical forms: 17,800 entries, German=>Italian. Data contain morphological coding.

    M0009-02THAMUS Bilingual dictionaries - Computer Science
    Computer Science, canonical forms: 17,800 entries, Italian=>German. Data contain morphological coding.

    M0009-03THAMUS Bilingual dictionaries - Computer Science
    Computer Science, inflected forms: 35,000 entries, German=>Italian. Data contain morphological coding.

    M0009-04THAMUS Bilingual dictionaries - Computer Science
    Computer Science, inflected forms: 35,000 entries, Italian=>German. Data contain morphological coding.

    M0010-01THAMUS Bilingual dictionaries - Aeronautics
    Aeronautics: 6,300 entries, English=>Italian. Data contain morphological coding.

    M0010-02THAMUS Bilingual dictionaries - Aeronautics
    Aeronautics: 6,300 entries, Italian=>English. Data contain morphological coding.

    M0010-03THAMUS Bilingual dictionaries - Law
    Law, canonical forms: 8,900 entries, English=>Italian. Data contain morphological coding.

    M0010-04THAMUS Bilingual dictionaries - Law
    Law, canonical forms: 8,900 entries, Italian=>English. Data contain morphological coding.

    M0010-05THAMUS Bilingual dictionaries - Law
    Law, inflected forms: 18,000 entries, English=>Italian. Data contain morphological coding.

    M0010-06THAMUS Bilingual dictionaries - Law
    Law, inflected forms: 18,000 entries, Italian=>English. Data contain morphological coding.

    M0010-07THAMUS Bilingual dictionaries - Computer science
    Computer science, canonical forms: 15,700 entries, English=>Italian. Data contain morphological coding.

    M0010-08THAMUS Bilingual dictionaries - Computer science
    Computer science, canonical forms: 15,700 entries, Italian=>English. Data contain morphological coding.

    M0010-09THAMUS Bilingual dictionaries - Computer science
    Computer science, inflected forms: 32,000 entries, English=>Italian. Data contain morphological coding.

    M0010-10THAMUS Bilingual dictionaries - Computer science
    Computer science, inflected forms: 32,000 entries, Italian=>English. Data contain morphological coding.

    M0010-11THAMUS Bilingual dictionaries - Medicine
    Medicine, canonical forms: 20,000 entries, English=>Italian. Data contain morphological coding.

    M0010-12THAMUS Bilingual dictionaries - Medicine
    Medicine, canonical forms: 20,000 entries, Italian=>English. Data contain morphological coding.

    M0010-13THAMUS Bilingual dictionaries - Economics
    Economics, canonical forms: 50,000 entries, English=>Italian. Data contain morphological coding.

    M0010-14THAMUS Bilingual dictionaries - Economics
    Economics, canonical forms: 50,000 entries, Italian=>English. Data contain morphological coding.

    M0010-15THAMUS Bilingual dictionaries - Economics
    Economics, inflected forms: 86,000 entries, English=>Italian. Data contain morphological coding.

    M0010-16THAMUS Bilingual dictionaries - Economics
    Economics, inflected forms: 86,000 entries, Italian=>English. Data contain morphological coding.

    M0010-17THAMUS Bilingual dictionaries - Engineering
    Engineering, canonical forms: 13,000 entries, English=>Italian. Data contain morphological coding.

    M0010-18THAMUS Bilingual dictionaries - Engineering
    Engineering, canonical forms: 13,000 entries, Italian=>English. Data contain morphological coding.

    M0010-19THAMUS Bilingual dictionaries - Engineering
    Engineering, inflected forms: 27,000 entries, English=>Italian. Data contain morphological coding.

    M0010-20THAMUS Bilingual dictionaries - Engineering
    Engineering, inflected forms: 27,000 entries, Italian=>English. Data contain morphological coding.

    M0013Bilingual Collocational Dictionary (Horst Bogatz)
    The bilingual English-German collocational dictionary consists of around 40,000 English headwords, including concepts expressed with more than one word (e.g. "the awareness of the environment" or "lame duck") and hyphenated compounds. It contains verbs, adjectives, synonyms and phrases that collocate with the headword. It provides the German equivalents for the headwords as well as their English synonyms.

    M0014-01Bilingual Dictionaries - English <=> Spanish I
    25,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-02Bilingual Dictionaries - English <=> Spanish II
    60,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-03Bilingual Dictionaries - English <=> Spanish III
    100,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-04Bilingual Dictionaries - English <=> Spanish IV
    200,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-05Bilingual Dictionaries - English <=> French I
    40,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-06Bilingual Dictionaries - English <=> French II
    80,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-07Bilingual Dictionaries - English <=> French III
    100,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-08Bilingual Dictionaries - English <=> French IV
    200,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-09Bilingual Dictionaries - English <=> German I
    40,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-10Bilingual Dictionaries - English <=> German II
    80,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-11Bilingual Dictionaries - English <=> German III
    126,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-12Bilingual Dictionaries - English <=> Italian I
    20,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-13Bilingual Dictionaries - English <=> Italian II
    40,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-14Bilingual Dictionaries - English <=> Brazilian Portuguese I
    40,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-15Bilingual Dictionaries - English <=> Brazilian Portuguese II
    80,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-16Bilingual Dictionaries - English <=> Brazilian Portuguese III
    400,000+ entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-17Bilingual Dictionaries - English <=> Portuguese I
    40,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-18Bilingual Dictionaries - English <=> Portuguese II
    80,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-19Bilingual Dictionaries - English <=> Portuguese III
    110,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-20Bilingual Dictionaries - English <=> Portuguese IV
    234,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-21Bilingual Dictionaries - English <=> Dutch I
    40,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-22Bilingual Dictionaries - English <=> Dutch II
    80,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-23Bilingual Dictionaries - English <=> Dutch III
    110,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-24Bilingual Dictionaries - English <=> Danish I
    40,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-25Bilingual Dictionaries - English <=> Danish II
    80,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-26Bilingual Dictionaries - English <=> Danish III
    110,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-27Bilingual Dictionaries - English <=> Swedish I
    40,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-28Bilingual Dictionaries - English <=> Swedish II
    80,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-29Bilingual Dictionaries - English <=> Swedish III
    110,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-30Bilingual Dictionaries - English <=> Finnish I
    30,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-31Bilingual Dictionaries - English <=> Icelandic I
    40,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-32Bilingual Dictionaries - English <=> Icelandic II
    80,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-33Bilingual Dictionaries - English <=> Icelandic III
    100,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-34Bilingual Dictionaries - English <=> Russian I
    40,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-35Bilingual Dictionaries - English <=> Russian II
    72,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-36Bilingual Dictionaries - English <=> Russian III
    120,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-37Bilingual Dictionaries - English <=> Russian Business
    60,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage and semantic features.

    M0014-38Bilingual Dictionaries - English <=> Russian Aerospace and Aeronautics
    60,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage and semantic features.

    M0014-39Bilingual Dictionaries - English <=> Russian Automotive
    40,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage and semantic features.

    M0014-40Bilingual Dictionaries - English <=> Russian Minerals & Mining
    60,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage and semantic features.

    M0014-41Bilingual Dictionaries - English <=> Polish I
    30,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-42Bilingual Dictionaries - English <=> Polish II
    80,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-43Bilingual Dictionaries - English <=> Polish III
    124,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-44Bilingual Dictionaries - English <=> Polish IV
    150,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-45Bilingual Dictionaries - English <=> Hungarian I
    30,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-46Bilingual Dictionaries - English <=> Hungarian II
    80,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-47Bilingual Dictionaries - English <=> Hungarian III
    124,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-48Bilingual Dictionaries - English <=> Czech I
    40,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-49Bilingual Dictionaries - English <=> Romanian Starter
    10,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-50Bilingual Dictionaries - English <=> Croatian I
    30,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-51Bilingual Dictionaries - English <=> Bosnian I
    30,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-52Bilingual Dictionaries - English <=> Serbian I (Latin or Cyrillic)
    30,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-53Bilingual Dictionaries - English <=> Japanese I
    40,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0014-54Bilingual Dictionaries - English <=> Greek
    60,000 entries.
    Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features.

    M0015EuroWordNet English Addition to English WordNet
    Each EuroWordNet database is composed of the following:
    - The Inter-Lingual-Index, a list of records (ILI-records) in the form of synsets, mainly taken from WordNet1.5 or manually created.
    - A top-ontology of 63 basic semantic classes based on fundamental distinctions.
    - A domain-ontology of subject domains optionally assigned to ILI-records.
    - A selection of ILI-records, the so-called Base-Concepts, which play a major role in the different wordnets.
    - WordNet1.5 (91,591 synsets; 168,217 meanings; 126,520 entry words) in EuroWordNet format.
    Number of synsets for M0015: 16,361.

    M0016EuroWordNet Dutch
    Each EuroWordNet database is composed of the following:
    - The Inter-Lingual-Index, which is a list of records (ILI-records), in the form of synsets mainly taken from WordNet1.5 or manually created.
    - A top-ontology which consists of an ontology of 63 basic semantic classes based on fundamental distinctions.
    - A domain-ontology which consists of an ontology of subject-domains optionally assigned to ILI-records.
    - A selection of ILI-records, the so-called Base-Concepts, which play a major role in the different wordnets.
    - WordNet1.5 (91591 synsets; 168217 meanings; 126520 entry words) in EuroWordNet format.
    Number of synsets for M0016 = 44015 synsets

    M0017EuroWordNet Spanish
    Each EuroWordNet database is composed of the following:
    - The Inter-Lingual-Index, which is a list of records (ILI-records), in the form of synsets mainly taken from WordNet1.5 or manually created.
    - A top-ontology which consists of an ontology of 63 basic semantic classes based on fundamental distinctions.
    - A domain-ontology which consists of an ontology of subject-domains optionally assigned to ILI-records.
    - A selection of ILI-records, the so-called Base-Concepts, which play a major role in the different wordnets.
    - WordNet1.5 (91591 synsets; 168217 meanings; 126520 entry words) in EuroWordNet format.
    Number of synsets for M0017 = 23370 synsets

    M0018EuroWordNet Italian
    Each EuroWordNet database is composed of the following:
    - The Inter-Lingual-Index, which is a list of records (ILI-records), in the form of synsets mainly taken from WordNet1.5 or manually created.
    - A top-ontology which consists of an ontology of 63 basic semantic classes based on fundamental distinctions.
    - A domain-ontology which consists of an ontology of subject-domains optionally assigned to ILI-records.
    - A selection of ILI-records, the so-called Base-Concepts, which play a major role in the different wordnets.
    - WordNet1.5 (91591 synsets; 168217 meanings; 126520 entry words) in EuroWordNet format.
    Number of synsets for M0018 = 48529 synsets

    M0019EuroWordNet German
    Each EuroWordNet database is composed of the following:
    - The Inter-Lingual-Index, which is a list of records (ILI-records), in the form of synsets mainly taken from WordNet1.5 or manually created.
    - A top-ontology which consists of an ontology of 63 basic semantic classes based on fundamental distinctions.
    - A domain-ontology which consists of an ontology of subject-domains optionally assigned to ILI-records.
    - A selection of ILI-records, the so-called Base-Concepts, which play a major role in the different wordnets.
    - WordNet1.5 (91591 synsets; 168217 meanings; 126520 entry words) in EuroWordNet format.
    Number of synsets for M0019 = 15132 synsets

    M0020EuroWordNet French
    Each EuroWordNet database is composed of the following:
    - The Inter-Lingual-Index, which is a list of records (ILI-records), in the form of synsets mainly taken from WordNet1.5 or manually created.
    - A top-ontology which consists of an ontology of 63 basic semantic classes based on fundamental distinctions.
    - A domain-ontology which consists of an ontology of subject-domains optionally assigned to ILI-records.
    - A selection of ILI-records, the so-called Base-Concepts, which play a major role in the different wordnets.
    - WordNet1.5 (91591 synsets; 168217 meanings; 126520 entry words) in EuroWordNet format.
    Number of synsets for M0020 = 22745 synsets

    M0021EuroWordNet Czech
    Each EuroWordNet database is composed of the following:
    - The Inter-Lingual-Index, which is a list of records (ILI-records), in the form of synsets mainly taken from WordNet1.5 or manually created.
    - A top-ontology which consists of an ontology of 63 basic semantic classes based on fundamental distinctions.
    - A domain-ontology which consists of an ontology of subject-domains optionally assigned to ILI-records.
    - A selection of ILI-records, the so-called Base-Concepts, which play a major role in the different wordnets.
    - WordNet1.5 (91591 synsets; 168217 meanings; 126520 entry words) in EuroWordNet format.
    Number of synsets for M0021 = 12824 synsets

    M0022EuroWordNet Estonian
    Each EuroWordNet database is composed of the following:
    - The Inter-Lingual-Index, which is a list of records (ILI-records), in the form of synsets mainly taken from WordNet1.5 or manually created.
    - A top-ontology which consists of an ontology of 63 basic semantic classes based on fundamental distinctions.
    - A domain-ontology which consists of an ontology of subject-domains optionally assigned to ILI-records.
    - A selection of ILI-records, the so-called Base-Concepts, which play a major role in the different wordnets.
    - WordNet1.5 (91591 synsets; 168217 meanings; 126520 entry words) in EuroWordNet format.
    Number of synsets for M0022 = 9317 synsets

    M0025Bilingual English-Russian Russian-English Dictionaries
    Produced with funding from ELRA within the framework of the European Commission project LRsP&P (Language Resources Production & Packaging - LE4-8335), these bilingual dictionaries contain more than 350,000 pairs of words (in tabular form) in XML format:
    - Russian-English dictionary - more than 130,000 entries
    - English-Russian dictionary - more than 95,000 entries
    Each entry contains: source word (lemma); part of speech of source word; target word(s) (lemma(s)), grouped by same meaning; part of speech of target word(s); domain(s).

    M0026-01MultiWordNet database (included semantic fields) (MultiWordNet)
    MultiWordNet:  MultiWordNet contains information about the following aspects of the English and Italian lexicons: lexical relations between words, semantic relations between lexical concepts, correspondences between Italian and English lexical concepts, and semantic fields. Information about 51,000 Italian word meanings and 28,000 synsets (in correspondence with the English equivalents) is included. MultiWordNet can be used for NLP applications such as information retrieval, semantic tagging, disambiguation, terminology, etc.

    M0026-02Labelling of WordNet 1.6 with semantic fields (WordNet Domains)
    MultiWordNet:  MultiWordNet contains information about the following aspects of the English and Italian lexicons: lexical relations between words, semantic relations between lexical concepts, correspondences between Italian and English lexical concepts, and semantic fields. Information about 51,000 Italian word meanings and 28,000 synsets (in correspondence with the English equivalents) is included. MultiWordNet can be used for NLP applications such as information retrieval, semantic tagging, disambiguation, terminology, etc.

    M0027Oxford French Minidictionary
    Over 100,000 words, phrases and translations are included in this bilingual minidictionary, which is available in SGML. Complementary information, such as usage notes, is also provided.

    M0028Concise Oxford-Duden German Dictionary
    This bilingual dictionary contains 150,000 words and phrases, and 240,000 translations, and is available in XML and SGML.

    M0029Pocket Oxford Italian Dictionary
    This is a mid-sized dictionary to cover essential terms and vocabulary, available in XML and SGML. It contains 80,000 words and phrases, and 115,000 translations.

    M0030Concise Oxford Spanish Dictionary
    The coverage of this concise Oxford Spanish dictionary includes 24 varieties of Spanish as it is written and spoken throughout the Spanish-speaking world. This bilingual dictionary contains 170,000 words and phrases and 240,000 translations. It is available in SGML and XML.

    M0031Oxford Business French Dictionary
    This dictionary covers the general language of Business across a range of core areas. It contains over 50,000 words and phrases, and is available in SGML.

    M0032Oxford Business Spanish Dictionary
    This dictionary covers the general language of Business across a range of core areas. It contains over 50,000 words and phrases, and is available in SGML.

    M0033SCI-FRAN-EURADIC French-English Bilingual Dictionary
    SCI-FRAN-EURADIC:  This bilingual dictionary was enlarged and improved within the French national project EurRADic (European and Arabic Dictionaries and Corpora), as part of the Technolangue programme funded by the French Ministry of Industry. It contains 243,539 pairs of French-English terms, with their part of speech. The data are presented in a table format, where information related to each entry is separated by ";".

    See also ELRA-L0049, ELRA-L0050, ELRA-L0051, ELRA-L0052, ELRA-L0053, ELRA-M0034, ELRA-M0035, ELRA-M0036, ELRA-M0037, ELRA-M0038.

    M0034SCI-FRAL-EURADIC French-German Bilingual Dictionary
    SCI-FRAL-EURADIC:  This bilingual dictionary was developed within the French national project EurRADic (European and Arabic Dictionaries and Corpora), as part of the Technolangue programme funded by the French Ministry of Industry. It contains 170,967 pairs of French-German terms, with their part of speech. The data are presented in a table format, where information related to each entry is separated by ";".

    See also ELRA-L0049, ELRA-L0050, ELRA-L0051, ELRA-L0052, ELRA-L0053, ELRA-M0033, ELRA-M0035, ELRA-M0036, ELRA-M0037, ELRA-M0038.

    M0035SCI-FRES-EURADIC French-Spanish Bilingual Dictionary
    SCI-FRES-EURADIC:  This bilingual dictionary was enlarged and improved within the French national project EurRADic (European and Arabic Dictionaries and Corpora), as part of the Technolangue programme funded by the French Ministry of Industry. It contains 102,941 pairs of French-Spanish terms, with their part of speech. The data are presented in a table format, where information related to each entry is separated by ";".

    See also ELRA-L0049, ELRA-L0050, ELRA-L0051, ELRA-L0052, ELRA-L0053, ELRA-M0033, ELRA-M0034, ELRA-M0036, ELRA-M0037, ELRA-M0038.

    M0036SCI-FRIT-EURADIC French-Italian Bilingual Dictionary
    SCI-FRIT-EURADIC:  This bilingual dictionary was developed within the French national project EurRADic (European and Arabic Dictionaries and Corpora), as part of the Technolangue programme funded by the French Ministry of Industry. It contains 116,587 pairs of French-Italian terms, with their part of speech. The data are presented in a table format, where information related to each entry is separated by ";".

    See also ELRA-L0049, ELRA-L0050, ELRA-L0051, ELRA-L0052, ELRA-L0053, ELRA-M0033, ELRA-M0034, ELRA-M0035, ELRA-M0037, ELRA-M0038.

    M0037SCI-ANES English-Spanish Bilingual Dictionary
    SCI-ANES:  This bilingual dictionary contains around 60,000 pairs of English-Spanish terms, with their part of speech. The data are presented in a table format, where information related to each entry is separated by ";".

    See also ELRA-L0049, ELRA-L0050, ELRA-L0051, ELRA-L0052, ELRA-L0053, ELRA-M0033, ELRA-M0034, ELRA-M0035, ELRA-M0036, ELRA-M0038.

    M0038SCI-AN-ALL English-German Bilingual Dictionary
    SCI-AN-ALL:  This bilingual dictionary contains 59,758 pairs of English-German terms, with their part of speech. The data are presented in a table format, where information related to each entry is separated by ";".

    See also ELRA-L0049, ELRA-L0050, ELRA-L0051, ELRA-L0052, ELRA-L0053, ELRA-M0033, ELRA-M0034, ELRA-M0035, ELRA-M0036, ELRA-M0037.

    M0039SCI-ALRU German-Russian Bilingual Dictionary
    SCI-ALRU:  This bilingual dictionary contains around 80,000 pairs of German-Russian terms, with their part of speech.
    The data are presented in a table format, where information related to each entry is separated by ";".

    M0040DixAF (Bilingual Dictionary French Arabic, Arabic French)
    DixAF:  DixAF is a French-Arabic, Arabic-French dictionary, which consists of around 125,000 binary links between ca. 43,000 French entries and ca. 35,000 Arabic entries.

    M0041Bulgarian WordNet
    The Bulgarian WordNet was initially developed within the framework of the BalkaNet project "Multilingual Semantic Network for the Balkan Languages" (IST-2000-29388) and later on under the scope of the BulNet project, funded at the national level. It models nouns, verbs, adjectives, and (occasionally) adverbs, and contains 23,715 word senses (synsets).

    M0042ItalWordNet (Italian WordNet)
    ItalWordNet (Italian WordNet) is an updated version of the EuroWordNet Italian database. The ItalWordNet database was produced within a national Italian programme called SI-TAL. It contains a total of 49,360 synsets. The ItalWordNet is provided in XML format. The original EuroWordNet Italian database is also included in this package.

    M0043Russian => English MT optimized lexicon in OLIF XML
    This lexicon is provided as structured XML in OLIF (Open Lexicon Interchange Format). It comprises 99,211 entries in its source language (Russian) and 134,828 entries in its target language (English). The source entries are distributed as follows: 64,487 nouns, 11,470 adjectives, 19,724 verbs, 1,762 adverbs, and 1,768 closed-class elements (interjections, special prefixes, suffixes, etc.). Nouns contain gender and number information and verbs provide details on aspect and reflexivity. The entries contain semantic information in terms of domain specification or style information (e.g., colloquial, regional use, etc.). Moreover, definitions are available for 59,775 entries, as well as collocational information for 39,148 entries.

    M0044English => Swahili Bilingual Lexicon
    This lexicon is provided as structured XML in OLIF (Open Lexicon Interchange Format). It comprises 58,247 entries in English and 58,300 in Swahili. The source entries are distributed as follows: 36,046 nouns, 3,013 adjectives, 18,308 verbs and 880 closed-class entries. The entries contain semantic information in terms of domain specification or style information (e.g., colloquial, regional use, etc.). Collocational information is also available for 17,570 entries.

    M0045Cebuano => English Bilingual Lexicon
    This lexicon is provided as structured XML in OLIF (Open Lexicon Interchange Format). It comprises 1,988 entries in Cebuano and 1,990 in English. The source entries are distributed as follows: 1,052 nouns, 462 adjectives, 405 verbs and 69 closed-class entries. The entries contain semantic information in terms of domain specification or style information (e.g., colloquial, regional use, etc.). Collocational information is also available for 500 entries.

    M0046English => Czech Bilingual Lexicon
    This lexicon is provided as structured XML in OLIF (Open Lexicon Interchange Format). It comprises 31,718 entries in English and 32,125 in Czech. The source entries are distributed as follows: 17,797 nouns, 7,748 adjectives, 6,039 verbs and 134 closed-class entries. The entries contain semantic information in terms of domain specification or style information (e.g., colloquial, regional use, etc.). Collocational information is also available for 3,065 entries.

    M0047Czech WordNet
    The Czech WordNet captures nouns, verbs, adjectives, and partly adverbs, and contains 28,201 word senses (synsets). Every synset encodes the equivalence relation between several literals (at least one is present), having a unique meaning, belonging to one and the same part of speech, and expressing the same lexical meaning. Each Czech synset is related to the corresponding synset in the Princeton WordNet 2.0 via its identification number ID. There is at least one language-internal relation between a synset and another synset in the database.

    M0048LatinWordNet
    LatinWordNet contains information about the following aspects of the Latin and English lexicon: lexical relations between words, semantic relations between lexical concepts, correspondences between Latin and English lexical concepts. LatinWordNet covers nouns, verbs, adjectives and adverbs, and contains 8,978 synsets in correspondence with the English equivalents (and with all the MultiWordNet-based wordnets).

    M0049Basque WordNet
    The Basque WordNet models nouns, verbs and adjectives. Each sense is linked to a so-called synset (for a total of 30,281 synsets). Every synset encodes the synonymy relation between (possibly) several words (synonyms), having a unique meaning, belonging to one and the same part of speech (specified in the POS tag value), and expressing the same lexical meaning. Each synset is related to the corresponding synset in the English WordNet 1.6 via its identification number ID, which includes the synset number and the POS tag. The only exceptions are newly created synsets to account for cultural concepts not present in WordNet 1.6.

    M0050The MWN.PT - MultiWordnet of Portuguese
    MWN.PT - MultiWordnet of Portuguese (version 1) spans over 17,200 manually validated concepts/synsets, linked under the semantic relations of hyponymy and hypernymy. These concepts are made of over 21,000 word senses/word forms and 16,000 lemmas from both European and American variants of Portuguese. They are aligned with the translationally equivalent concepts of the English Princeton WordNet and, transitively, of the MultiWordNets of Italian, Spanish, Hebrew, Romanian and Latin.

    S0001ACCOR - English
    Acoustic and articulatory multilingual database recorded as part of the ESPRIT-ACCOR project investigating cross-language acoustic-articulatory correlations in coarticulatory processes. Only English is available.

    S0003BDLEX 23000
    A phonetically transcribed French lexicon of 23,000 canonical entries (leading to over 270,000 forms) with the corresponding graphemical, phonological and morphosyntactical attributes.

    S0004BDLEX
    Lexicon for written and spoken French including 440,000 inflected forms with spelling, pronunciation (phonology) and morphosyntactic attributes.

    S0005BDSONS Base de données des sons du français
    BDSONS:  Speech database with two subsets: evaluation (sentences, logatomes, numbers, digits, etc.) & acoustic modelling (sequences of CVCV, various types of sentences, etc.). The corpus consists of 16 male and 16 female speakers.

    S0006BREF-80
    BREF Sub-corpus containing training data of 5,330 sentences read by 80 French speakers. Texts were selected from the French newspaper Le Monde (over 20,000 words).

    S0007BREF-POLYGLOT
    BREF Sub-corpus containing training data of 3,193 sentences read by 6 French speakers. The sentences were selected to cover a wide range of phonetic contexts.

    S0008COLLECT
    500 speakers, half of whom called from Turin and the other half from all over Italy, automatically prompted to utter the 10 Italian digits and 5 command words.

    S0009COST232
    Multi-English Speech database - 797 successful calls received in Italy and in the UK, using different types of collecting equipment. Repetition of the same vocabulary, the "TI (Texas Instruments) words" (digits + yes, no, go, etc.).

    S0010Dutch Polyphone Database
    Telephone speech from 5,050 Dutch speakers. Approx. 44 items per speaker. Read & spontaneous speech (isolated words, digits, sentences, etc.).

    S0011English SpeechDat Polyphone database DB1
    Phonetically rich sentences & application oriented utterances such as keywords, digits, etc. 1,000 speakers recorded over digital telephone lines using fixed telephone sets.

    S0012English SpeechDat(M) Polyphone database DB2
    Phonetically rich sentences sub-set.
    See ELRA-S0011

    S0013Erlanger Bahnansage - ERBA
    ERBA:  Over 10,000 utterances read by over 100 German speakers. Domain of train inquiries.

    S0014-01EUROM1f French
    The multilingual European speech database. The first truly multilingual speech database produced in Europe. Over 60 speakers per language pronounced numbers, sentences and isolated words using a close-talking microphone.

    S0014-02EUROM1e English
    The multilingual European speech database. The first truly multilingual speech database produced in Europe. Over 60 speakers per language pronounced numbers, sentences and isolated words using a close-talking microphone.

    S0014-03EUROM1g German
    The multilingual European speech database. The first truly multilingual speech database produced in Europe. Over 60 speakers per language pronounced numbers, sentences and isolated words using a close-talking microphone.

    S0015EUROM1i
    The multilingual European speech database. The first truly multilingual speech database produced in Europe. Over 60 speakers per language pronounced numbers, sentences and isolated words using a close-talking microphone.

    S0016FRESCO: French Polyphone Database (SpeechDat(M)) DB1
    Phonetically rich sentences & application oriented utterances such as keywords, digits, etc. French SpeechDat (Polyphone) database containing 35,000 utterances from 1,000 callers over the telephone in France.

    S0017FRESCO: French Polyphone Database (SpeechDat(M)) DB2
    French (SpeechDat(M)) polyphone database.
    Phonetically rich sentences sub-set. See ELRA-S0016

    S0018German Polyphone Database (SpeechDat(M)) DB1
    Phonetically rich sentences & application oriented utterances such as keywords, digits, etc. German read and spontaneous speech from 1,000 speakers.

    S0019German Polyphone Database (SpeechDat(M)) DB2
    German Polyphone Database (SpeechDat(M))
    Phonetically rich sentences sub-set. See ELRA-S0018

    S0020GRONINGEN
    Over 20 hours of Dutch read speech material (short texts, short sentences, etc.), from 238 speakers.

    S0021M2VTS Speaker Verification Database
    Multi Modal Verification for Teleservices and Security applications project. Multilingual database designed to facilitate access control using multimodal identification of human faces (speech & image).

    S0022Onomastica
    Onomastica Multi-Language Pronunciation Dictionaries covering city & town names, street names, family names, first names, product names, for 11 European languages. Only German is available now.

    S0023PHONDAT 1 - PD1 (2nd edition)
    Read speech from 201 German speakers who read 450 different sentences each. Eight of them read the whole sentence corpus.

    S0024PHONDAT 2 - PD2 (2nd edition)
    200 different sentences from a train inquiry task read by 16 German speakers, provided with phonological segmentation by hand plus other labelling.

    S0025SIEMENS 100 - SI100
    Approx. 100 sentences extracted from the German newspaper Süddeutsche Zeitung and read by 101 speakers.

    S0026SIEMENS 1000 - SI1000
    Approx. 1,000 sentences extracted from the German newspaper Süddeutsche Zeitung and read by 10 speakers.

    S0027SieTill (Siemens Tillman)
    Telephone speech database with 730 speakers (338 female, 392 male), and 36,000 utterances (digit sequences, dates, spelled names, ...).

    S0028The "SIVA" Speech Database for Speaker Verification and Identification
    Speech Database for Speaker Verification and Identification. Over 2,000 calls in Italian, collected over the fixed telephone network.

    S0029Strange Corpus 1 - SC1 (ACCENTS)
    'Nordwind und Sonne' story read by 72 speakers with a foreign accent and 16 native German speakers.

    S0030-01Swiss-French Polyphone Database 1000 speakers
    This speech database contains the recordings of 1,000 speakers who answered around 10 questions leading to spontaneous speech, and read about 28 items from a form supplied by IDIAP.

    S0030-02Swiss-French Polyphone Database 4000 speakers
    This speech database contains the recordings of 4,000 speakers who answered around 10 questions leading to spontaneous speech, and read about 28 items from a form supplied by IDIAP.

    S0031TED Translanguage English Database
    Translanguage English Database.
    Recordings of 188 oral presentations in English, given at Eurospeech'93 in Berlin (high percentage of non-native English speakers).

    S0032TEDphone (Polyphone-like Translanguage English Database)
    TEDPhone:  Polyphone/SpeechDat-like recordings of 64 speakers in English and in their native language.

    S0033BDBRUIT
    Recordings of French speech, corrupted with perturbations due to noisy environments, especially the Lombard effect. 5 male and 5 female speakers uttered sentences, digits, etc.

    S0034-01VERBMOBIL - VM CD 1.0.3 (original edition)
    Spontaneous speech databases recorded in a dialogue task.
    63 Dialogues, 209 Appointments, 1,840 Turns.
    1 CDROM.

    S0034-02VERBMOBIL - VM CD 1.1 (new edition)
    Spontaneous speech databases recorded in a dialogue task.
    63 Dialogues, 209 Appointments, 1,840 Turns. This new edition contains the transliterations of all dialogues, signal files with PhonDat 2 Header structure, software and speaker documentation. All files were validated according to BAS guidelines.
    1 CDROM.

    S0034-03VERBMOBIL - VM CD 2.0 (original edition)
    Spontaneous speech databases recorded in a dialogue task.
    81 Dialogues, 227 Appointments, 1,538 Turns.
    1 CDROM.

    S0034-04VERBMOBIL - VM CD 2.1 (new edition)
    Spontaneous speech databases recorded in a dialogue task.
    81 Dialogues, 227 Appointments, 1,538 Turns. This new edition contains the transliterations of all dialogues, signal files with PhonDat 2 Header structure, software and speaker documentation. All files were validated according to BAS guidelines.
    1 CDROM.

    S0034-05VERBMOBIL - VM CD 3.0 (original edition)
    Spontaneous speech databases recorded in a dialogue task.
    45 Dialogues, 184 Appointments, 1,214 Turns.
    1 CDROM.

    S0034-06VERBMOBIL - VM CD 3.1 (new edition)
    Spontaneous speech databases recorded in a dialogue task.
    45 Dialogues, 184 Appointments, 1,214 Turns. This new edition contains the transliterations of all dialogues, signal files with PhonDat 2 Header structure, software and speaker documentation. All files were validated according to BAS guidelines.
    1 CDROM.

    S0034-07VERBMOBIL - VM CD 4.0 (original edition)
    Spontaneous speech databases recorded in a dialogue task.
    72 Dialogues, 181 Appointments, 1,588 Turns.
    1 CDROM.

    S0034-08VERBMOBIL - VM CD 4.1 (new edition)
    Spontaneous speech databases recorded in a dialogue task.
    72 Dialogues, 181 Appointments, 1,588 Turns. This new edition contains the transliterations of all dialogues, signal files with PhonDat 2 Header structure, software and speaker documentation and partitur files. All files were validated according to BAS guidelines.
    1 CDROM.

    S0034-09VERBMOBIL - VM CD 5.0 (original edition)
    Spontaneous speech databases recorded in a dialogue task.
    101 Dialogues, 256 Appointments, 2,154 Turns.
    1 CDROM.

    S0034-10VERBMOBIL - VM CD 5.1 (new edition)
    Spontaneous speech databases recorded in a dialogue task.
    101 Dialogues, 256 Appointments, 2,154 Turns. This new edition contains the transliterations of all dialogues, signal files with PhonDat 2 Header structure, software and speaker documentation and partitur files. All files were validated according to BAS guidelines.
    1 CDROM.

    S0034-11VERBMOBIL - VM CD 6.0 (original edition)
    Spontaneous speech databases recorded in a dialogue task.
    146 Dialogues, 191 Appointments, 1,828 Turns.
    1 CDROM.

    S0034-12VERBMOBIL - VM CD 6.1 (new edition)
    Spontaneous speech databases recorded in a dialogue task.
    146 Dialogues, 191 Appointments, 1,828 Turns. This new edition contains the transliterations of all dialogues, signal files with PhonDat 1 Header structure, software and speaker documentation. All files were validated according to BAS guidelines.
    1 CDROM.

    S0034-13VERBMOBIL - VM CD 7.0 (original edition)
    Spontaneous speech databases recorded in a dialogue task.
    68 Dialogues, 238 Appointments, 1,739 Turns.
    1 CDROM.

    S0034-14VERBMOBIL - VM CD 7.1 (new edition)
    Spontaneous speech databases recorded in a dialogue task.
    68 Dialogues, 238 Appointments, 1,739 Turns. This new edition contains the transliterations of all dialogues, signal files with PhonDat 2 Header structure, software and speaker documentation and partitur files. All files were validated according to BAS guidelines.
    1 CDROM.

    S0034-16VERBMOBIL - VM CD 8.1 (new edition)
    Spontaneous speech databases recorded in a dialogue task.
    167 Dialogues, 167 Appointments, 1,181 Turns. This new edition contains the transliterations of all dialogues, signal files with PhonDat 1 Header structure, software and speaker documentation. All files were validated according to BAS guidelines.
    1 CDROM.

    S0034-17VERBMOBIL - VM CD 12.0 (original edition)
    Spontaneous speech databases recorded in a dialogue task.
    207 Dialogues, 207 Appointments, 2,154 Turns.
    1 CDROM.

    S0034-18VERBMOBIL - VM CD 12.1 (new edition)
    Spontaneous speech databases recorded in a dialogue task.
    207 Dialogues, 207 Appointments, 2,154 Turns. This new edition contains the transliterations of all dialogues, signal files with PhonDat 2 Header structure, software and speaker documentation and partitur files. All files were validated according to BAS guidelines.
    1 CDROM.

    S0034-20VERBMOBIL - VM CD 13.1 (new edition)
    Spontaneous speech databases recorded in a dialogue task.
    90 speakers, 1714 turns, 200 spontaneous dialogues, transliteration.
    1 CDROM.

    S0034-21VERBMOBIL - VM CD 14.0 (original edition)
    Spontaneous speech databases recorded in a dialogue task.
    97 speakers, 1891 turns, 156 spontaneous dialogues, transliteration.
    1 CDROM.

    S0034-22VERBMOBIL - VM CD 14.1 (new edition)
    Spontaneous speech databases recorded in a dialogue task.
    97 speakers, 1891 turns, 156 spontaneous dialogues, transliteration, PhonDat 2 headers, partitur files.
    1 CDROM.

    S0034-23VERBMOBIL - VM CD 16.0 (new edition)
    Spontaneous speech databases recorded in a dialogue task.
    78 speakers, 3311 turns, 200 spontaneous dialogues, transliteration (Kanji/Kana and Roman/Latin).

    S0034-24VERBMOBIL - VM CD 17.0 (new edition)
    Spontaneous speech databases recorded in a dialogue task.
    84 speakers, 2741 turns, 200 spontaneous dialogues, transliteration (Kanji/Kana and Roman/Latin).
    1 CDROM.

    S0034-25VERBMOBIL - VM CD 18.0 (new edition)
    Spontaneous speech databases recorded in a dialogue task.
    80 speakers, 2345 turns, 200 spontaneous dialogues, transliteration (Kanji/Kana and Roman/Latin).
    1 CDROM.

    S0034-26VERBMOBIL - VM CD 19.0 (new edition)
    Spontaneous speech databases recorded in a dialogue task.
    82 speakers, 2911 turns, 200 spontaneous dialogues, transliteration (Kanji/Kana and Roman/Latin).
    1 CDROM.

    S0034-27VERBMOBIL - VM CD S 1.0 (original edition)
    Spontaneous speech databases recorded in a dialogue task.
    26 Free Dialogues (with overlap, stereo recordings), 2227 Turns.
    1 CDROM.

    S0034-28VERBMOBIL II - VM CD15.1 - VM15.1 (new edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - German - 19 spontaneous dialogues (19 close mic, 19 room mic, 19 telephone (fixed network, GSM)), 3117 turns, transliteration (VM II format), NIST headers, partitur files.
    1 CDROM.

    S0034-29VERBMOBIL II - VM CD20.1 - VM20.1 (new edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - German - 30 spontaneous dialogues (10 close mic, 27 room mic, 10 phone line (GSM)), 1957 turns, transliteration (VM II format), NIST headers, partitur files.
    1 CDROM.

    S0034-30VERBMOBIL II - VM CD21.1 - VM21.1 (new edition)
    Verbmobil II - German - 38 spontaneous dialogues (38 close mic, 2 room mic, 22 phone line (GSM)), 2331 turns, transliteration (VM II format), NIST headers, partitur files.
    1 CDROM.

    S0034-31VERBMOBIL II - VM CD 22.1 - VM22.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - German - 60 spontaneous dialogues (28 close mic, 5 room mic, 27 phone line (GSM) recordings), 2004 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-32VERBMOBIL II - VM CD 23.1 - VM23.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - American English - 28 spontaneous dialogues (28 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 2727 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-33VERBMOBIL II - VM CD 24.1 - VM24.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - German - 58 spontaneous dialogues (36 close mic, 0 room mic, 22 phone line (GSM) recordings), 2231 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-34VERBMOBIL II - VM CD 25.1 - VM25.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - Japanese - 10 spontaneous dialogues (10 close mic, 0 room mic, 0 phone line (GSM) recordings), 1654 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-35VERBMOBIL II - VM CD 26.1 - VM26.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - Japanese - 16 spontaneous dialogues (16 close mic, 0 room mic, 0 phone line (GSM) recordings), 1319 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-36VERBMOBIL II - VM CD 27.1 - VM27.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - Japanese - 24 spontaneous dialogues (24 close mic, 0 room mic, 0 phone line (GSM) recordings), 1149 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-37VERBMOBIL II - VM CD 28.1 - VM28.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - American English - 28 spontaneous dialogues (28 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 2408 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-38VERBMOBIL II - VM CD 30.1 - VM30.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - German - 33 spontaneous dialogues (33 close mic, 21 room mic, 25 phone line (fixed network, GSM) recordings), 4176 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-39VERBMOBIL II - VM CD 31.1 - VM31.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - American English - 32 spontaneous dialogues (32 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 2512 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-40VERBMOBIL II - VM CD 32.1 - VM32.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - Multilingual - 17 spontaneous dialogues (17 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 992 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-41VERBMOBIL II - VM CD 33.1 - VM33.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - Japanese, 25 spontaneous dialogues (25 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 1050 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-42VERBMOBIL II - VM CD 34.1 - VM34.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - Japanese, 28 spontaneous dialogues (28 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 1437 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-43VERBMOBIL II - VM CD 35.1 - VM35.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - Japanese, 27 spontaneous dialogues (27 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 1645 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-44VERBMOBIL II - VM CD 38.1 - VM38.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - German, 33 spontaneous dialogues (33 close mic, 28 room mic, 28 phone line (fixed network, GSM) recordings), 5115 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-45VERBMOBIL II - VM CD 39.1 - VM39.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - German, 28 spontaneous dialogues (28 close mic, 17 room mic, 20 phone line (fixed network, GSM) recordings), 3360 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-46VERBMOBIL II - VM CD 29.1 - VM29.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - German, 25 spontaneous dialogues (25 close mic, 20 room mic, 20 phone line (fixed network, GSM) recordings), 2708 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-47VERBMOBIL II - VM CD 42.1 - VM42.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - American English, 20 spontaneous dialogues (20 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 1874 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-48VERBMOBIL II - VM CD 43.1 - VM43.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - Japanese - 24 spontaneous dialogues (24 close mic, 0 room mic, 0 phone line (GSM) recordings), 1149 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-49VERBMOBIL II - VM CD 49.1 - VM49.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - German, 24 spontaneous dialogues (24 close mic, 12 room mic, 12 phone line (fixed network, GSM) recordings), 2597 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-50VERBMOBIL II - VM CD 50.1 - VM50.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - American English, 8 spontaneous dialogues (8 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 679 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-51VERBMOBIL II - VM CD 48.1 - VM48.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - German, 28 spontaneous dialogues (28 close mic, 23 room mic, 27 phone line (fixed network, GSM) recordings), 4238 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-52VERBMOBIL II - VM CD 44.1 - VM44.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - Japanese, 19 spontaneous dialogues (19 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 920 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-53VERBMOBIL II - VM CD 45.1 - VM45.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - Japanese, 21 spontaneous dialogues (21 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 1293 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-54VERBMOBIL II - VM CD 46.1 - VM46.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - Multilingual Japanese/German, 11 spontaneous dialogues (11 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 607 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-55VERBMOBIL II - VM CD 47.1 - VM47.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - Multilingual with human interpreter (3 channels) English/German, 17 spontaneous dialogues (17 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 853 turns, transliteration (Verbmobil II Format).
    1 CDROM.

    S0034-56VERBMOBIL II - VM Bonus CD - VMBONUS (BAS edition)
    Additional data and documentation that is not included in the regular VM volumes.
    1 CD-ROM.

    S0034-57VERBMOBIL II - VM Lexicon database - VMLEX (BAS edition)
    Verbmobil lexicon database of the University of Bielefeld.

    S0034-58VERBMOBIL II - VM CD 15.1 - VM15.1 (new edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - German - 19 spontaneous dialogues (19 close mic, 19 room mic, 19 phone line (GSM)), 3117 turns, transliteration (VM II format), NIST headers, partitur files.
    1 CDROM.

    S0034-59VERBMOBIL II - VM CD 16.1 - VM16.1 (new edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - Japanese, 200 dialogues, 200 appointment schedulings - 3311 turns.
    1 CDROM.

    S0034-60VERBMOBIL II - VM CD 17.1 - VM17.1 (new edition)
    Spontaneous speech databases recorded in a dialogue task.
    Verbmobil II - Japanese, 200 dialogues, 200 appointment schedulings - 2741 turns.
    1 CDROM.

    S0034-61VERBMOBIL II - VM CD 18.1 - VM18.1 (new edition)
    Spontaneous speech databases recorded in a dialogue task.
    Japanese, 200 dialogues, 200 appointment schedulings - 2345 turns.
    1 CDROM.

    S0034-62VERBMOBIL II - VM CD 19.1 - VM19.1 (new edition)
    Spontaneous speech databases recorded in a dialogue task.
    Japanese, 200 dialogues, 200 appointment schedulings - 2911 turns.
    1 CDROM.

    S0034-63VERBMOBIL II - VM CD 53.1 - VM53.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    German, 16 spontaneous dialogues (16 close mic, 8 room mic, 8 phone line (GSM) recordings) - 1771 turns, transliteration (VM II Format).
    1 CDROM.

    S0034-64VERBMOBIL II - VM CD 60.1 - VM60.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Japanese - 10 spontaneous dialogues (10 close mic, 0 room mic, 0 phone line (GSM) recordings) - 501 turns, transliteration (VM II Format).
    1 CDROM.

    S0034-65VERBMOBIL II - VM CD 61.1 - VM61.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Japanese - 19 spontaneous dialogues (19 close mic, 0 room mic, 0 phone line (GSM) recordings) - 946 turns, transliteration (VM II Format).
    1 CDROM.

    S0034-66VERBMOBIL II - VM CD 62.1 - VM62.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Japanese - 21 spontaneous dialogues (21 close mic, 0 room mic, 0 phone line (GSM) recordings) - 981 turns, transliteration (VM II Format).
    1 CDROM.

    S0034-67VERBMOBIL II - VM CD 51.1 - VM51.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Multilingual German/English with human interpreter (3 channels) - 15 spontaneous dialogues (15 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings) - 856 turns, transliteration (VM II Format).
    1 CDROM.

    S0034-68VERBMOBIL II - VM CD 52.1 - VM52.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Multilingual German/English with human interpreter (3 channels) - 13 spontaneous dialogues (13 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings) - 728 turns, transliteration (VM II Format).
    1 CDROM.

    S0034-69VERBMOBIL II - VM CD 55.1 - VM55.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Multilingual German/English with human interpreter (3 channels) - 11 spontaneous dialogues (11 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings) - 518 turns, transliteration (VM II Format).
    1 CDROM.

    S0034-70VERBMOBIL II - VM CD 56.1 - VM56.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Multilingual German/English with human interpreter (3 channels) - 12 spontaneous dialogues (12 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings) - 620 turns, transliteration (VM II Format).
    1 CDROM.

    S0034-71VERBMOBIL II - VM CD 57.1 - VM57.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Multilingual German/Japanese with 2 human interpreters (4 channels) - 11 spontaneous dialogues (11 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings) - 702 turns, transliteration (VM II Format).
    1 CDROM.

    S0034-72VERBMOBIL II - VM CD 58.1 - VM58.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Multilingual German/Japanese with 2 human interpreters (4 channels) - 7 spontaneous dialogues (7 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings) - 421 turns, transliteration (VM II Format).
    1 CDROM.

    S0034-73VERBMOBIL II - VM CD 59.1 - VM59.1 (BAS edition)
    Spontaneous speech databases recorded in a dialogue task.
    Multilingual German/Japanese with 2 human interpreters (4 channels) - 7 spontaneous dialogues (7 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings) - 354 turns, transliteration (VM II Format).
    1 CDROM.

    S0034-74VERBMOBIL II - VM CD 63.0 - VM63.0 (original edition)
    Spontaneous speech databases recorded in a dialogue task.
    German - 14 WOZ dialogues designed to evoke emotions (mainly anger) - transliteration, emotion labeling.
    1 CDROM.

    S0034-75VERBMOBIL II - VM CD 64.0 - VM64.0 (original edition)
    Spontaneous speech databases recorded in a dialogue task.
    German - 13 WOZ dialogues designed to evoke emotions (mainly anger) - transliteration, emotion labeling.
    1 CDROM.

    S0034-76VERBMOBIL II - VM CD 65.0 - VM65.0 (original edition)
    Spontaneous speech databases recorded in a dialogue task.
    German - 13 WOZ dialogues designed to evoke emotions (mainly anger) - transliteration, emotion labeling.
    1 CDROM.

    S0035PHONOLEX (BAS/DFKI)
    Approx. 1.6 million entries with orthographic forms (capital nouns, old German, spelling, ...), phonetic transcription (by rules and exception list) and other linguistic information (e.g. grammatical categories).

    S0038Siemens VoiceMail
    This speech database contains the recordings of 921 American speakers recorded over the fixed telephone network. It consists of read acoustic speech divided into 9.5 hours of transliterated speech and 8 hours of non-transliterated speech. Orthographic transliterations for about 25,000 utterances are included.

    S0039APASCI
    Italian acoustic database recorded in an insulated room. It includes ca. 16,090 utterances and digits, 58,924 words (2,191 different words), 641 minutes of speech. The data is uttered by 50 male and 50 female speakers. 42 male and 12 female speakers repeated 10 isolated digits 20 times.

    S0040Danish SpeechDat(M) database - DB1
    Phonetically rich sentences & application oriented utterances such as keywords, digits, etc.
    This speech database contains the recordings of 1,523 Danish speakers, recorded over the Danish fixed telephone network. Each speaker uttered around 100 read and spontaneous items.

    S0041Danish SpeechDat(M) database - DB2