| Products meeting the search criteria |
 |
|
 |
| The Aurora project 2.0 is a revised version of the Noisy TI digits database that follows on from the work of ETSI. This CD set replaces the previous set (version 1.0 consisted of 2 CDs, while version 2.0 consists of 4 CDs). This database is intended for the evaluation of front-end feature extraction algorithms in background noise, but may also be used more widely by speech researchers to evaluate and compare the performance of noise-robust speech recognition algorithms. |
|
|
| This database is a subset of the Finnish SpeechDat-Car database, which was collected as part of the European Union-funded SpeechDat-Car project. It contains isolated and connected Finnish digits spoken in different driving conditions inside a car. |
|
|
| This database is a subset of the Spanish SpeechDat-Car database, which was collected as part of the European Union-funded SpeechDat-Car project. It contains isolated and connected Spanish digits spoken in different noise and driving conditions inside a car. |
|
|
| This database is a subset of the German SpeechDat-Car database, which was collected as part of the European Union-funded SpeechDat-Car project. It contains isolated and connected German digits spoken in different noise and driving conditions inside a car. |
|
|
| This database is a subset of the Danish SpeechDat-Car database, which was collected as part of the European Union-funded SpeechDat-Car project. It contains isolated and connected Danish digits spoken in different noise and driving conditions inside a car. |
|
|
| This database is a subset of the Italian SpeechDat-Car database, which was collected as part of the European Union-funded SpeechDat-Car project. It contains 2200 Italian connected digit utterances, divided into training and testing utterances, in different noise and driving conditions inside a car. |
|
|
| The Aurora project has released a number of list files for performing training and testing on the Wall Street Journal (WSJ0) data at two sampling rates, 8 kHz and 16 kHz. The Aurora 4a database is based on WSJ0 with noise artificially added over a range of signal-to-noise ratios. It contains both clean and multicondition training sets and 14 evaluation sets with different noise types and microphones. |
|
|
| The Aurora project has released a number of list files for performing training and testing on the Wall Street Journal (WSJ0) data at two sampling rates, 8 kHz and 16 kHz. The Aurora 4b database contains noisy versions of the Nov'92 WSJ0 development set. |
|
|
The AURORA-5 database has been developed mainly to investigate how hands-free speech input in noisy room environments affects the performance of automatic speech recognition. Two further test conditions are included to study the influence of transmitting the speech over a mobile communication system.
It contains artificially distorted versions of the recordings from adult speakers in the TI-Digits speech database, downsampled to a sampling frequency of 8000 Hz; a set of recordings that contain sequences of digits uttered by different speakers in hands-free mode in a meeting room; and a set of scripts for running recognition experiments on those speech data. The experiments are based on the freely available software package HTK; HTK itself is not part of this resource. |
|
|
| Approx. 1.6 million entries with orthographic forms (capital nouns, old German, spelling, ...), phonetic transcription (by rules and exception list) and other linguistic information (e.g. grammatical categories). |
|
|
| 186,600 entries, including proper names, place names, non-native entries and abbreviations, with phonetic transcriptions, main stress markers and syllable boundary markers, from the political and economic sections of the German newspapers 'Süddeutsche Zeitung' and 'Frankfurter Allgemeine Zeitung'. |
|
|
| LusoLEX: Multifunctional monolingual lexicon of the European variety of Portuguese, consisting of about 61,000 entries (lemmas) and 1,600 corresponding inflexion paradigms. The set of entries includes compound words, and the inflexion paradigms include information regarding enclitics, augmentatives and diminutives. Morphological information is encoded with maximum granularity and conforms to the EAGLES recommendations. |
|
|
| BrasiLEX: Multifunctional monolingual lexicon of the Brazilian variety of Portuguese, consisting of about 65,000 entries (lemmas) and 1,600 corresponding inflexion paradigms. The set of entries includes compound words, and the inflexion paradigms include information regarding enclitics and augmentative/diminutive degree. Morphological information is encoded with maximum granularity and conforms to the EAGLES recommendations. |
|
|
| This speech database contains the recordings of 1,000 Austrian speakers recorded over the fixed telephone network. Each speaker uttered around 60 read and spontaneous items. |
|
|
| This speech database contains the recordings of 1,000 Austrian speakers recorded over the Austrian mobile telephone network. Each speaker uttered around 60 read and spontaneous items. |
|
|
| This speech database contains the recordings of 530 Moroccan speakers of French recorded over the Moroccan fixed and mobile telephone network. Each speaker uttered around 47 read and spontaneous items. |
|
|
| This speech database contains the recordings of 530 Moroccan speakers recorded over the Moroccan fixed and mobile telephone network. Each speaker uttered around 49 read and spontaneous items. |
|
|
| This speech database contains the recordings of 772 Moroccan speakers recorded over the Moroccan fixed and mobile telephone network. Each speaker uttered around 49 read and spontaneous items. |
|
|
| This speech database contains the recordings of 792 Tunisian speakers recorded over the Tunisian fixed and mobile telephone network. Each speaker uttered around 49 read and spontaneous items. |
|
|
| This speech database contains the recordings of 598 Tunisian speakers recorded over the Tunisian fixed and mobile telephone network. Each speaker uttered around 49 read and spontaneous items. |
|
|
| This speech database contains the recordings of 576 Tunisian speakers of French recorded over the Tunisian fixed and mobile telephone network. Each speaker uttered around 47 read and spontaneous items. |
|
|
| This speech database contains the recordings of 750 Egyptian speakers recorded over the Egyptian fixed and mobile telephone network. Each speaker uttered around 49 read and spontaneous items. |
|
|
| This speech database contains the recordings of 500 Egyptian speakers recorded over the Egyptian fixed and mobile telephone network. Each speaker uttered around 49 read and spontaneous items. |
|
|
| This speech database contains the recordings of 500 Egyptian speakers of English recorded over the Egyptian fixed and mobile telephone network. Each speaker uttered around 47 read and spontaneous items. |
|
|
| The IDIOLOGOS 1 “Bootstrap” database was produced within the French national project NEOLOGOS, as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). It comprises 1000 adult French speakers (470 males, 530 females) recorded over the French fixed telephone network. |
|
|
| The IDIOLOGOS 2 “Eigenspeakers” database was produced within the French national project NEOLOGOS, as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). It comprises 200 adult French speakers (97 males, 103 females) recorded over the French fixed telephone network. |
|
|
| The LC-STAR Spanish phonetic lexicon comprises more than 100,000 words, including a set of 55,854 common words, a set of 45,403 proper names (including person names, family names, cities, streets, companies and brand names) and a list of 7,498 special application words. The lexicon is provided in XML format and includes phonetic transcriptions in SAMPA. |
|
|
| The LC-STAR Catalan phonetic lexicon comprises more than 100,000 words, including a set of 53,225 common words, a set of 45,306 proper names (including person names, family names, cities, streets, companies and brand names) and a list of 7,498 special application words. The lexicon is provided in XML format and includes phonetic transcriptions in SAMPA. |
|
|
This corpus consists of transcriptions from 92 hours of EPPS (European Parliament Plenary Sessions) speeches held or interpreted in European English (a mixture of native and non-native English). The transcription files are stored in Transcriber XML file format.
For corresponding recordings, see ELRA-S0251 |
|
|
This corpus consists of around 290 hours of recordings from EPPS (European Parliament Plenary Sessions) speeches held or interpreted in European English, 92 hours of which were transcribed (the transcriptions are not provided in the present package). Each file contains a single channel with 16-bit resolution at a sample rate of 16 kHz.
For corresponding transcriptions, see ELRA-S0249. |
|
|
| This speech database contains the recordings of 750 Arabic speakers recorded over the United Arab Emirates' fixed and mobile telephone network. Each speaker uttered around 48 read and spontaneous items. |
|
|
| This speech database contains the recordings of 500 Arabic speakers recorded over the United Arab Emirates' fixed and mobile telephone network. Each speaker uttered around 49 read and spontaneous items. |
|
|
| This speech database contains the recordings of 535 speakers of English recorded over the United Arab Emirates' fixed and mobile telephone network. Each speaker uttered around 51 read and spontaneous items. |
|
|
| This speech database contains the recordings of 757 Jordanian speakers recorded over the Jordanian fixed and mobile telephone network. Each speaker uttered around 49 read and spontaneous items. |
|
|
| This speech database contains the recordings of 556 Jordanian speakers recorded over the Jordanian fixed and mobile telephone network. Each speaker uttered around 49 read and spontaneous items. |
|
|
| This speech database contains the recordings of 578 Jordanian speakers of English recorded over the Jordanian fixed and mobile telephone network. Each speaker uttered around 47 read and spontaneous items. |
|
|
The CHIL Seminars are scientific presentations given by students, faculty members or invited speakers in the field of multimodal interfaces and speech processing. The language is European English spoken by non-native speakers. The recordings comprise the following: videos of the speaker and the audience from 4 fixed cameras, frontal close-ups of the speaker, and close-talking and far-field microphone data of the speaker’s voice and background sounds.
The database consists of:
1) Audio and video recordings of 10 seminars.
2) Video annotations, made on 1 out of every 10 pictures in sequence, for the 4 cameras.
3) Transcriptions in both TRS and STMUID formats. |
|
|
The CHIL Seminars are scientific presentations given by students, faculty members or invited speakers in the field of multimodal interfaces and speech processing. The language is European English spoken by non-native speakers. The recordings comprise the following: videos of the speaker and the audience from 4 fixed cameras, frontal close-ups of the speaker, and close-talking and far-field microphone data of the speaker’s voice and background sounds.
The database consists of:
1) Contents of the CHIL 2004 Evaluation Package (see catalogue reference ELRA-E0009 for description).
2) Audio and video recordings of 5 seminars recorded in November 2004.
3) Stereo video recordings of 10 subjects who move in the camera’s field of view while performing pointing gestures.
4) Video annotations.
5) Transcriptions. |
|
|
The CHIL Seminars are scientific presentations given by students, faculty members or invited speakers in the field of multimodal interfaces and speech processing. The language is European English spoken by non-native speakers. The recordings comprise the following: videos of the speaker and the audience from 4 fixed cameras, frontal close-ups of the speaker, and close-talking and far-field microphone data of the speaker’s voice and background sounds.
The CHIL 2006 Evaluation Package consists of:
1) A set of audiovisual recordings of seminars, called non-interactive seminars, and of highly interactive small working group seminars, called interactive seminars. The recordings were made between 2004 and 2005 according to the “CHIL Room Setup” specification.
2) Video annotations.
3) Orthographic transcriptions. |
|
|
| The Hungarian Speecon database comprises the recordings of 555 adult Hungarian speakers and 50 child Hungarian speakers, who uttered over 290 items and 210 items respectively (read and spontaneous). |
|
|
| The Czech Speecon database comprises the recordings of 550 adult Czech speakers and 50 child Czech speakers, who uttered over 290 items and 210 items respectively (read and spontaneous). |
|
|
This package includes the material used for the TC-STAR 2005 Automatic Speech Recognition (ASR) first evaluation campaign for the English language.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
This package includes the material used for the TC-STAR 2005 Automatic Speech Recognition (ASR) first evaluation campaign for the Spanish language.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
This package includes the material used for the TC-STAR 2005 Automatic Speech Recognition (ASR) first evaluation campaign for the Mandarin Chinese language.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
This package includes the material used for the TC-STAR 2005 Spoken Language Translation (SLT) first evaluation campaign for English-to-Spanish translation.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
This package includes the material used for the TC-STAR 2005 Spoken Language Translation (SLT) first evaluation campaign for Spanish-to-English translation.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
This package includes the material used for the TC-STAR 2005 Spoken Language Translation (SLT) first evaluation campaign for Chinese-to-English translation.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
| The CLEF Test Suite contains the data used for the main tracks of the CLEF campaigns carried out from 2000 to 2003: Multilingual text retrieval, Bilingual text retrieval, Monolingual text retrieval, and Domain-specific text retrieval. It contains multilingual corpora in English, French, German, Italian, Spanish, Dutch, Swedish, Finnish, Russian, and Portuguese. |
|
|
The CHIL Seminars are scientific presentations given by students, faculty members or invited speakers in the field of multimodal interfaces and speech processing. The language is European English spoken by non-native speakers. The recordings comprise the following: videos of the speaker and the audience from 4 fixed cameras, frontal close-ups of the speaker, close-talking and far-field microphone data of the speaker’s voice and background sounds.
The database consists of:
1) Audio and Video Recordings of 10 seminars
2) Video annotations made on 1 picture out of every 10 in sequence, for each of the 4 cameras.
3) Transcriptions using both TRS and STMUID formats. |
|
|
The CHIL Seminars are scientific presentations given by students, faculty members or invited speakers in the field of multimodal interfaces and speech processing. The language is European English spoken by non-native speakers. The recordings comprise the following: videos of the speaker and the audience from 4 fixed cameras, frontal close-ups of the speaker, close-talking and far-field microphone data of the speaker’s voice and background sounds.
The database consists of:
1) Contents of the CHIL 2004 Evaluation Package (see catalogue reference ELRA-E0009 for description).
2) Audio and Video Recordings (5 seminars recorded in November 2004).
3) Stereo Video Recordings of 10 subjects that move in the camera’s field of view while performing pointing gestures.
4) Video annotations.
5) Transcriptions. |
|
|
This package includes the material used for the TC-STAR 2006 Automatic Speech Recognition (ASR) second evaluation campaign for the English language.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
This package includes the material used for the TC-STAR 2006 Automatic Speech Recognition (ASR) second evaluation campaign for the Spanish language within the CORTES task.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
This package includes the material used for the TC-STAR 2006 Automatic Speech Recognition (ASR) second evaluation campaign for the Spanish language within the CORTES task.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
This package includes the material used for the TC-STAR 2006 Automatic Speech Recognition (ASR) second evaluation campaign for the Spanish language within the EPPS task.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
This package includes the material used for the TC-STAR 2006 Automatic Speech Recognition (ASR) second evaluation campaign for the Mandarin Chinese language.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
| This package includes the material used for the TC-STAR 2006 Spoken Language Translation (SLT) second evaluation campaign for English-to-Spanish translation. It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
This package includes the material used for the TC-STAR 2006 Spoken Language Translation (SLT) second evaluation campaign for Spanish-to-English translation within the CORTES task.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
This package includes the material used for the TC-STAR 2006 Spoken Language Translation (SLT) second evaluation campaign for Spanish-to-English translation within the CORTES task.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
This package includes the material used for the TC-STAR 2006 Spoken Language Translation (SLT) second evaluation campaign for Spanish-to-English translation within the EPPS task.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
This package includes the material used for the TC-STAR 2006 Spoken Language Translation (SLT) second evaluation campaign for Chinese-to-English translation.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
The CHIL Seminars are scientific presentations given by students, faculty members or invited speakers in the field of multimodal interfaces and speech processing. The language is European English spoken by non-native speakers. The recordings comprise the following: videos of the speaker and the audience from 4 fixed cameras, frontal close-ups of the speaker, close-talking and far-field microphone data of the speaker’s voice and background sounds.
The CHIL 2006 Evaluation Package consists of:
1) A set of audiovisual recordings of seminars (called non-interactive seminars) and of highly interactive small working-group seminars (called interactive seminars). The recordings were made between 2004 and 2005 according to the “CHIL Room Setup” specification.
2) Video annotations.
3) Orthographic transcriptions. |
|
|
The ARCADE II Evaluation Package was produced within the French national project ARCADE II (Evaluation of parallel text alignment systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The ARCADE II project made it possible to carry out an evaluation campaign in the field of multilingual alignment.
This package includes the material that was used for the ARCADE II evaluation campaign. It includes resources, protocols, scoring tools, results of the campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system.
The campaign is distributed over two actions: sentence alignment and translation of named entities. |
|
|
The CESART Evaluation Package was produced within the French national project CESART (Evaluation of terminology extraction tools), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The CESART project made it possible to carry out a campaign for the evaluation of terminological resource acquisition tools.
This package includes the material that was used for the CESART evaluation campaign. It includes resources, protocols, scoring tools, results of the campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system.
The campaign is distributed over two actions: term extraction and relation extraction. |
|
|
The CESTA Evaluation Package was produced within the French national project CESTA (Evaluation of MT systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The CESTA project made it possible to carry out a campaign for the evaluation of machine translation technologies.
This package includes the material that was used for the CESTA evaluation campaign. It includes resources, protocols, scoring tools, results of the campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system.
The campaign is distributed over two actions: evaluation on a non-restrictive vocabulary and evaluation on a specialised domain (evaluation after terminology enrichment). |
|
|
The ESTER Evaluation Package was produced within the French national project ESTER (Evaluation of Broadcast News enriched transcription systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The ESTER project made it possible to carry out a campaign for the evaluation of Broadcast News enriched transcription systems for French.
This package includes the material that was used for the ESTER evaluation campaign. It includes resources, protocols, scoring tools, results of the campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.
The campaign is distributed over three actions: orthographic transcription, segmentation and information extraction (named entity tracking).
For research or commercial use, please refer to ELRA-S0241 ESTER Corpus. |
|
|
The EQueR Evaluation Package was produced within the French national project EQueR (Evaluation campaign for Question-Answering systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The EQueR project made it possible to carry out a campaign for the evaluation of Question-Answering systems in French.
This package includes the material that was used for the EQueR evaluation campaign. It includes resources, protocols, scoring tools, results of the campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.
The campaign is distributed over two actions: one generic task and one specialised task (medical domain). |
|
|
The EvaSy Evaluation Package was produced within the French national project EvaSy (Evaluation of speech synthesis systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The EvaSy project made it possible to carry out a campaign for the evaluation of speech synthesis systems using French text data.
This package includes the material that was used for the EvaSy evaluation campaign. It includes resources, protocols, scoring tools, results of the campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.
The campaign is distributed over three actions: evaluation of grapheme-to-phoneme conversion, evaluation of prosody, and global evaluation of the quality of speech synthesis systems. |
|
|
The MEDIA Evaluation Package was produced within the French national project MEDIA (Automatic evaluation of man-machine dialogue systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The MEDIA project made it possible to carry out a campaign for the evaluation of man-machine dialogue systems for French.
This package includes the material that was used for the MEDIA evaluation campaign. It includes resources, protocols, scoring tools, results of the campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself.
The campaign is distributed over two actions: an evaluation taking into account the dialogue context and an evaluation not taking into account the dialogue context. |
|
|
This package includes the material used for the TC-STAR 2007 Automatic Speech Recognition (ASR) third evaluation campaign for the English language.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
This package includes the material used for the TC-STAR 2007 Automatic Speech Recognition (ASR) third evaluation campaign for the Spanish language within the CORTES task.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
This package includes the material used for the TC-STAR 2007 Automatic Speech Recognition (ASR) third evaluation campaign for the Spanish language within the EPPS task.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
This package includes the material used for the TC-STAR 2007 Automatic Speech Recognition (ASR) third evaluation campaign for the Mandarin Chinese language.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
| This package includes the material used for the TC-STAR 2007 Spoken Language Translation (SLT) third evaluation campaign for English-to-Spanish translation. It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
This package includes the material used for the TC-STAR 2007 Spoken Language Translation (SLT) third evaluation campaign for Spanish-to-English translation within the CORTES task.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
This package includes the material used for the TC-STAR 2007 Spoken Language Translation (SLT) third evaluation campaign for Spanish-to-English translation within the EPPS task.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
This package includes the material used for the TC-STAR 2007 Spoken Language Translation (SLT) third evaluation campaign for Chinese-to-English translation.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
This package includes the material used for the TC-STAR 2006 evaluation campaign within the end-to-end task.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
This package includes the material used for the TC-STAR 2007 evaluation campaign within the end-to-end task.
It includes resources, protocols, scoring tools, results of the official campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. |
|
|
The CHIL Seminars are scientific presentations given by students, faculty members or invited speakers in the field of multimodal interfaces and speech processing. The language is European English spoken by non-native speakers. The recordings comprise the following: videos of the speaker and the audience from 4 fixed cameras, frontal close-ups of the speaker, close-talking and far-field microphone data of the speaker’s voice and background sounds.
The CHIL 2007 Evaluation Package consists of:
1) A set of audiovisual recordings of interactive seminars. The recordings were made between June and September 2006 according to the “CHIL Room Setup” specification.
2) Video annotations.
3) Orthographic transcriptions. |
|
|
| The EASy Evaluation Package was produced within the French national project EASy (Evaluation of syntactic parsers of French), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The project made it possible to carry out a campaign for the evaluation of syntactic parsers of French. This package includes the material that was used for the EASy evaluation campaign. It includes resources, protocols, scoring tools, results of the campaign, etc., that were used or produced during the campaign. The aim of these evaluation packages is to enable external players to evaluate their own system and compare their results with those obtained during the campaign itself. The campaign is distributed over two actions: evaluation of constituent and dependency relation annotations. |
|
|
| Lexicon for morphological work containing over 400,000 French entries, divided into 55,000 nouns, 8,000 verbs, 16,850 adjectives, 2,000 adverbs. |
|
|
| Collocation lexicon. Up to 35,000 entries in French. A supplement to ELRA-L0001. |
|
|
| 90,000 French inflectional forms divided into 25,000 nouns, 8,000 verbs that generate 25,000 model verbs, 1,000 adjectives, 1,500 adverbs. Morphosyntactical information in addition to L0001. |
|
|
| 64,000 entries from general vocabulary divided into 50,000 nouns, 7,000 verbs, 6,000 adjectives, 1,000 adverbs. Morphological, syntactical & semantic information. |
|
|
| 50,000 entries from general vocabulary divided into 36,000 nouns, 6,000 verbs, 7,000 adjectives, 1,000 adverbs. Morphological, syntactical & semantic information. |
|
|
| Set of lemmas/lexical entries (about 60,000) with the corresponding inflected word-forms, and a morphological engine for morphological analysis and generation. |
|
|
Lexicon
28,000 headwords and 21,000 senses |
|
|
| 25,000 entries. Each lexeme contains the word class, inflection, semantic features, syntactical frames (for verbs), and complement (for nouns & adj.). |
|
|
| 60,000 entries with morphological information, plus a software engine for generating inflected forms. |
|
|
This CD-ROM contains a set of lexicons developed in the MULTEXT project financed by the European Commission (LRE 62-050). The set contains the following languages:
English: 66,214 Word forms
French: 306,795 Word forms
German: 233,861 Word forms
Italian: 145,530 Word forms
Spanish: 510,710 Word forms |
|
|
| 60,000 lemmas of general vocabulary with morphosyntactical information (9,700 verbs, 35,500 nouns, 14,300 adjectives & 120 adverbs) plus 10,000 full-form adverbs. |
|
|
| A Generic monolingual Italian dictionary of 87,000 canonical forms. Multi-word terms contain morphological coding for the headword. |
|
|
| A Generic monolingual Italian dictionary of 612,000 inflected forms. Multi-word terms contain morphological coding for the headword. |
|
|
| A Generic monolingual Italian dictionary of 48,000 canonical forms (Technical). Multi-word terms contain morphological coding for the headword. |
|
|
| A Generic monolingual Italian dictionary of 96,000 inflected forms (Technical). Multi-word terms contain morphological coding for the headword. |
|
|
| 1,200 entries of simplified equivalents for French fixed expressions (e.g. “laid comme un crapaud” has the equivalent “très laid”). |
|
|
| 2,300 entries consisting of substantives of French verbs. |
|
|
| The dictionaries consist of a list of 5,487 sequences of 3, 4 or 5 characters which follow each other in French-language words. In particular, they make it possible to locate misspelt sequences. |
|
|
| Generic dictionary. 21,000 entries of French uninflected noun phrases classified in 1,000 human entries, 4,200 concrete entries, 6,000 abstract entries. |
|
|
| 466,300 entries with a list of inflected words (97,000 nouns, 236,200 verbs, 130,500 adjectives/adverbs, 1,700 grammatical words, 40 punctuation marks, 400 prefixes, 370 suffixes). |
|
|
| 160,000 entries with a list of inflected words derived from 93,500 nouns, 35,800 verbs, 46,600 adjectives, 8,865 grammatical words. |
|
|
DST: 550,000 inflected forms in French (43,000 common nouns, 10,938 proper nouns, 19,500 adjectives, 8,150 nouns-adjectives, 6,800 verbs, 6,200 compound nouns, etc.). Syntactical, semantic, lexicological information.
The DST is distributed in different sub-sets:
L0020-01 String dictionary
L0020-02 Part of speech (optional)
L0020-03 Gender, number, conjugation (optional)
L0020-04 Lemma (optional)
L0020-05 Semantical information (optional)
L0020-06 Syntactical information (optional)
L0020-07 Prep/adv. phrases (optional)
L0020-08 Compound nouns (optional)
L0020-09 The whole dictionary |
|
|
DST: 550,000 inflected forms in French (43,000 common nouns, 10,938 proper nouns, 19,500 adjectives, 8,150 nouns-adjectives, 6,800 verbs, 6,200 compound nouns, etc.). Syntactical, semantic, lexicological information.
The DST is distributed in different sub-sets:
L0020-01 String dictionary
L0020-02 Part of speech (optional)
L0020-03 Gender, number, conjugation (optional)
L0020-04 Lemma (optional)
L0020-05 Semantical information (optional)
L0020-06 Syntactical information (optional)
L0020-07 Prep/adv. phrases (optional)
L0020-08 Compound nouns (optional)
L0020-09 The whole dictionary |
|
|
DST: 550,000 inflected forms in French (43,000 common nouns, 10,938 proper nouns, 19,500 adjectives, 8,150 nouns-adjectives, 6,800 verbs, 6,200 compound nouns, etc.). Syntactical, semantic, lexicological information.
The DST is distributed in different sub-sets:
L0020-01 String dictionary
L0020-02 Part of speech (optional)
L0020-03 Gender, number, conjugation (optional)
L0020-04 Lemma (optional)
L0020-05 Semantical information (optional)
L0020-06 Syntactical information (optional)
L0020-07 Prep/adv. phrases (optional)
L0020-08 Compound nouns (optional)
L0020-09 The whole dictionary |
|
|
DST: 550,000 inflected forms in French (43,000 common nouns, 10,938 proper nouns, 19,500 adjectives, 8,150 nouns-adjectives, 6,800 verbs, 6,200 compound nouns, etc.). Syntactical, semantic, lexicological information.
The DST is distributed in different sub-sets:
L0020-01 String dictionary
L0020-02 Part of speech (optional)
L0020-03 Gender, number, conjugation (optional)
L0020-04 Lemma (optional)
L0020-05 Semantical information (optional)
L0020-06 Syntactical information (optional)
L0020-07 Prep/adv. phrases (optional)
L0020-08 Compound nouns (optional)
L0020-09 The whole dictionary |
|
|
DST: 550,000 inflected forms in French (43,000 common nouns, 10,938 proper nouns, 19,500 adjectives, 8,150 nouns-adjectives, 6,800 verbs, 6,200 compound nouns, etc.). Syntactical, semantic, lexicological information.
The DST is distributed in different sub-sets:
L0020-01 String dictionary
L0020-02 Part of speech (optional)
L0020-03 Gender, number, conjugation (optional)
L0020-04 Lemma (optional)
L0020-05 Semantical information (optional)
L0020-06 Syntactical information (optional)
L0020-07 Prep/adv. phrases (optional)
L0020-08 Compound nouns (optional)
L0020-09 The whole dictionary |
|
|
DST: 550,000 inflected forms in French (43,000 common nouns, 10,938 proper nouns, 19,500 adjectives, 8,150 nouns-adjectives, 6,800 verbs, 6,200 compound nouns, etc.). Syntactical, semantic, lexicological information.
The DST is distributed in different sub-sets:
L0020-01 String dictionary
L0020-02 Part of speech (optional)
L0020-03 Gender, number, conjugation (optional)
L0020-04 Lemma (optional)
L0020-05 Semantical information (optional)
L0020-06 Syntactical information (optional)
L0020-07 Prep/adv. phrases (optional)
L0020-08 Compound nouns (optional)
L0020-09 The whole dictionary |
|
|
DST: 550,000 inflected forms in French (43,000 common nouns, 10,938 proper nouns, 19,500 adjectives, 8,150 nouns-adjectives, 6,800 verbs, 6,200 compound nouns, etc.). Syntactical, semantic, lexicological information.
The DST is distributed in different sub-sets:
L0020-01 String dictionary
L0020-02 Part of speech (optional)
L0020-03 Gender, number, conjugation (optional)
L0020-04 Lemma (optional)
L0020-05 Semantical information (optional)
L0020-06 Syntactical information (optional)
L0020-07 Prep/adv. phrases (optional)
L0020-08 Compound nouns (optional)
L0020-09 The whole dictionary |
|
|
DST: 550,000 inflected forms in French (43,000 common nouns, 10,938 proper nouns, 19,500 adjectives, 8,150 nouns-adjectives, 6,800 verbs, 6,200 compound nouns, etc.). Syntactical, semantic, lexicological information.
The DST is distributed in different sub-sets:
L0020-01 String dictionary
L0020-02 Part of speech (optional)
L0020-03 Gender, number, conjugation (optional)
L0020-04 Lemma (optional)
L0020-05 Semantical information (optional)
L0020-06 Syntactical information (optional)
L0020-07 Prep/adv. phrases (optional)
L0020-08 Compound nouns (optional)
L0020-09 The whole dictionary |
|
|
DST: 550,000 inflected forms in French (43,000 common nouns, 10,938 proper nouns, 19,500 adjectives, 8,150 nouns-adjectives, 6,800 verbs, 6,200 compound nouns, etc.). Syntactical, semantic, lexicological information.
The DST is distributed in different sub-sets:
L0020-01 String dictionary
L0020-02 Part of speech (optional)
L0020-03 Gender, number, conjugation (optional)
L0020-04 Lemma (optional)
L0020-05 Semantical information (optional)
L0020-06 Syntactical information (optional)
L0020-07 Prep/adv. phrases (optional)
L0020-08 Compound nouns (optional)
L0020-09 The whole dictionary |
|
|
| 25,610 verbs with usage domains, level of language, conjugation, auxiliary, verbal adjectives in -able, -ant or -e, encoded syntactical constructions, sample phrases, synonyms, operators enabling semantic-syntactic classification, encoding of derived forms in -age, -ment, -tion, -oir, -ure, deverbal nouns, base words from which verbs can be derived, a scale of usage ranging from 1 to 6. |
|
|
| 126,844 French words with usage domains, grammatical category (gender, number, uncountable, collective, adjectival, nominal, verbal, adverbial derived forms). |
|
|
| 4,286 suffixes and prefixes, plus information on their verbal, nominal or adjectival bases or on the verbal basis of Greco-Latin items. |
|
|
| 3,480 entries based on the model of the dictionary of French verbs (ELRA-L0021). |
|
|
| 4,783 entries based on the model of the dictionary of words (ELRA-L0022). |
|
|
| 1,901 entries based on the model of the dictionary of invariable forms and phrases (ELRA-L0025). |
|
|
| 38,965 entries in lower case with accents, checked against the Michelin guide, without localities. |
|
|
| 2,138 compound names and 1,397 entries of plural-only words. |
|
|
Dutch lexical database containing lemmas (124136 entries), wordforms (381292 entries), abbreviations (1622 entries), syllables (31358 entries).
The database is divided into different subsets:
L0029-01 Complete set of data;
L0029-02 Subset Orthography;
L0029-03 Subset Phonology;
L0029-04 Subset Morphology Infl.;
L0029-05 Subset Morphology Der.;
L0029-06 Subset Syntax;
L0029-07 Subset Frequency. |
|
|
Dutch lexical database containing lemmas (124136 entries), wordforms (381292 entries), abbreviations (1622 entries), syllables (31358 entries).
The database is divided into different subsets:
L0029-01 Complete set of data;
L0029-02 Subset Orthography;
L0029-03 Subset Phonology;
L0029-04 Subset Morphology Infl.;
L0029-05 Subset Morphology Der.;
L0029-06 Subset Syntax;
L0029-07 Subset Frequency. |
|
|
Dutch lexical database containing lemmas (124136 entries), wordforms (381292 entries), abbreviations (1622 entries), syllables (31358 entries).
The database is divided into different subsets:
L0029-01 Complete set of data;
L0029-02 Subset Orthography;
L0029-03 Subset Phonology;
L0029-04 Subset Morphology Infl.;
L0029-05 Subset Morphology Der.;
L0029-06 Subset Syntax;
L0029-07 Subset Frequency. |
|
|
Dutch lexical database containing lemmas (124136 entries), wordforms (381292 entries), abbreviations (1622 entries), syllables (31358 entries).
The database is divided into different subsets:
L0029-01 Complete set of data;
L0029-02 Subset Orthography;
L0029-03 Subset Phonology;
L0029-04 Subset Inflectional Morphology;
L0029-05 Subset Derivational Morphology;
L0029-06 Subset Syntax;
L0029-07 Subset Frequency. |
|
|
Dutch lexical database containing lemmas (124136 entries), wordforms (381292 entries), abbreviations (1622 entries), syllables (31358 entries).
The database is divided into different subsets:
L0029-01 Complete set of data;
L0029-02 Subset Orthography;
L0029-03 Subset Phonology;
L0029-04 Subset Inflectional Morphology;
L0029-05 Subset Derivational Morphology;
L0029-06 Subset Syntax;
L0029-07 Subset Frequency. |
|
|
Dutch lexical database containing lemmas (124136 entries), wordforms (381292 entries), abbreviations (1622 entries), syllables (31358 entries).
The database is divided into different subsets:
L0029-01 Complete set of data;
L0029-02 Subset Orthography;
L0029-03 Subset Phonology;
L0029-04 Subset Morphology Infl.;
L0029-05 Subset Morphology Der.;
L0029-06 Subset Syntax;
L0029-07 Subset Frequency. |
|
|
Dutch lexical database containing lemmas (124136 entries), wordforms (381292 entries), abbreviations (1622 entries), syllables (31358 entries).
The database is divided into different subsets:
L0029-01 Complete set of data;
L0029-02 Subset Orthography;
L0029-03 Subset Phonology;
L0029-04 Subset Morphology Infl.;
L0029-05 Subset Morphology Der.;
L0029-06 Subset Syntax;
L0029-07 Subset Frequency. |
|
|
| 67,500 entries divided into 242 inflectional types (including proper nouns), morphosyntactic information for each entry, and a morphological engine (MS DOS and WINDOWS 95/NT) for morphological analysis and generation. |
|
|
| The entry list of the lexicon consists of about 20,200 entries distributed over 13 parts of speech (POS). The entries have been described along the dimensions of morphosyntax and syntax, according to the specifications of the PAROLE project. The lexicon is set up as an SGML file. |
|
|
| The PAROLE Greek lexicon has two layers, morphological and syntactic. It includes the most frequent words found in a 9 million word corpus, coded according to the PAROLE specifications. The Morphological layer contains a total of 20149 Morphological units. The Syntactic layer contains 25092 Syntactic units. |
|
|
| LusoLEX: Multifunctional monolingual lexicon of the European variety of Portuguese, consisting of about 61,000 entries (lemmas) and 1,600 corresponding inflexion paradigms. The set of entries includes compound words and the inflexion paradigms include information regarding enclitics, augmentatives and diminutives. Morphological information is encoded with maximum granularity and conforms to the EAGLES recommendations. |
|
|
| BrasiLEX: Multifunctional monolingual lexicon of the Brazilian variety of Portuguese, consisting of about 65,000 entries (lemmas) and 1,600 corresponding inflexion paradigms. The set of entries includes compound words and the inflexion paradigms include information regarding enclitics and augmentative/diminutive degree. Morphological information is encoded with maximum granularity and conforms to the EAGLES recommendations. |
|
|
| The PAROLE Portuguese Lexicon consists of 20,000 entries encoded morphosyntactically and syntactically according to the PAROLE common encoding standards. The data is in SGML format. |
|
|
| The Japanese Word Dictionary is composed of 260,000 Japanese word records arranged alphabetically according to the Japanese syllabary. |
|
|
| The English Word Dictionary, composed of 190,000 English word records arranged alphabetically. |
|
|
The Concept Dictionary provides 400,000 concepts that are referred to in the Japanese and English Word Dictionaries (ref. ELRA-L0036 and L0037), the Japanese-English and English-Japanese Bilingual Dictionaries (ref. ELRA-M0023 and M0024) as well as in the Japanese and English Co-occurrence Dictionaries (ref. ELRA-L0039 and L0040). The Concept Dictionary is composed of three separate dictionaries:
- the Headconcept Dictionary gives a description of each concept in words
- the Concept Classification Dictionary contains a classification of concepts that have a super-sub relation
- the Concept Description Dictionary provides all other information regarding the relation between concepts. |
|
|
| The Japanese Co-occurrence Dictionary, composed of 900,000 headphrase notations arranged according to the Japanese syllabary. Appendix to the Japanese Co-occurrence Dictionary: The Japanese Corpus. |
|
|
| The English Co-occurrence Dictionary (ref. ELRA-L0040), composed of 460,000 alphabetically arranged headphrases. Appendix to the English Co-occurrence Dictionary: The English Corpus. |
|
|
| The Technical Terms Dictionary (Information processing) contains 80,000 technical terms in English and 120,000 technical terms in Japanese from the field of information processing. |
|
|
| The PAROLE Spanish lexicon follows standard PAROLE architecture. It contains about 22,000 morphological units, of which 12,209 are common nouns, 3,367 verbs, 4,996 adjectives. |
|
|
| The PAROLE English lexicon consists of 22 000 morphological units extracted from the CRL-LKB and COBUILD dictionaries: 12998 are common nouns, 40 proper nouns, 4195 verbs, 3208 adjectives, 606 adverbs, 71 adpositions, 2 articles, 21 conjunctions, 25 determiners and 53 pronouns. |
|
|
| This monolingual lexicon produced by Kaist Korterm consists of 31 476 compound nouns in Korean. |
|
|
| NODE: The NODE contains 170,000 entries covering all varieties of English worldwide. It has been designed for language engineering and to be used in NLP applications, and is available in XML or in SGML. The NODE data set includes morphological information linked to the lemma, phrases and idioms, subject classification, with over 200 key domains, semantic relationships, etc. |
|
|
| The first edition of the DIMAP version of NODE is a machine-tractable version of the machine-readable dictionary files in the DIMAP dictionary maintenance programs, adding syntactic and semantic information in the conversion. Apart from mechanisms which will allow research into representational formalisms and explorations of the use of these representations in extending the lexical database and in processing text for information extraction, text summarization, discourse analysis and other LE applications, DIMAP also includes semantic links between entries, thus making NODE+DIMAP a semantic network of the English language. |
|
|
NOTE: This thesaurus contains 628,000 alternative words, including 573,000 synonyms, the rest being antonyms, related terms, combining forms, and hyponyms. Nearly 38,000 senses are also presented with a corpus-based example.
It is available in SGML. |
|
|
| The Oxford Paperback Thesaurus, available in SGML, contains 15,000 headwords, over 300,000 synonyms, and 29,000 different senses presented with corpus-based examples. |
|
|
SCIPER-FR-EURADIC: This French monolingual dictionary was expanded and improved within the French national project EurRADic (European and Arabic Dictionaries and Corpora), as part of the Technolangue programme funded by the French Ministry of Industry. It contains 112,216 lemmas (694,673 inflected forms), with their part of speech and some information related to their inflexion. The data are presented in a table format, where information related to each entry is separated by ";".
See also ELRA-L0050, ELRA-L0051, ELRA-L0052, ELRA-L0053, ELRA-M0033, ELRA-M0034, ELRA-M0035, ELRA-M0036, ELRA-M0037, ELRA-M0038. |
|
|
SCIPER-AN-EURADIC: This English monolingual dictionary was expanded and improved within the French national project EurRADic (European and Arabic Dictionaries and Corpora), as part of the Technolangue programme funded by the French Ministry of Industry. It contains 171,713 lemmas (365,823 inflected forms), with their part of speech and some information related to their inflexion. The data are presented in a table format, where information related to each entry is separated by ";".
See also ELRA-L0049, ELRA-L0051, ELRA-L0052, ELRA-L0053, ELRA-M0033, ELRA-M0034, ELRA-M0035, ELRA-M0036, ELRA-M0037, ELRA-M0038. |
|
|
SCIPER-AL-EURADIC: This German monolingual dictionary was developed within the French national project EurRADic (European and Arabic Dictionaries and Corpora), as part of the Technolangue programme funded by the French Ministry of Industry. It contains 157,810 lemmas (17,634,834 inflected forms), with their part of speech and some information related to their inflexion. The data are presented in a table format, where information related to each entry is separated by ";".
See also ELRA-L0049, ELRA-L0050, ELRA-L0052, ELRA-L0053, ELRA-M0033, ELRA-M0034, ELRA-M0035, ELRA-M0036, ELRA-M0037, ELRA-M0038. |
|
|
SCIPER-ES-EURADIC: This Spanish monolingual dictionary was expanded and improved within the French national project EurRADic (European and Arabic Dictionaries and Corpora), as part of the Technolangue programme funded by the French Ministry of Industry. It contains 83,952 lemmas (838,391 inflected forms), with their part of speech and some information related to their inflexion. The data are presented in a table format, where information related to each entry is separated by ";".
See also ELRA-L0049, ELRA-L0050, ELRA-L0051, ELRA-L0053, ELRA-M0033, ELRA-M0034, ELRA-M0035, ELRA-M0036, ELRA-M0037, ELRA-M0038. |
|
|
SCIPER-IT-EURADIC: This Italian monolingual dictionary was developed within the French national project EurRADic (European and Arabic Dictionaries and Corpora), as part of the Technolangue programme funded by the French Ministry of Industry. It contains 70,951 lemmas (557,204 inflected forms), with their part of speech and some information related to their inflexion. The data are presented in a table format, where information related to each entry is separated by ";".
See also ELRA-L0049, ELRA-L0050, ELRA-L0051, ELRA-L0052, ELRA-M0033, ELRA-M0034, ELRA-M0035, ELRA-M0036, ELRA-M0037, ELRA-M0038. |
|
|
| LABEL-LEX (MW) is a Portuguese formalized lexicon, containing 88 619 inflected multiword lexical units (formally, sequences of simple words). |
|
|
| LABEL-LEX (SW) is a Portuguese formalized lexicon, containing 1,545,156 simple inflected words. Each dictionary entry is associated with a lemma; information about POS and morphological attributes - such as gender, number, person, case (for personal pronouns), tense, mood, diminutives, augmentatives, and superlative - is systematically formalized for each lexical entry. |
|
|
| STO: The STO Lexicon is the most comprehensive computational lexicon of Danish, comprising approx. 81,530 entry words with morphological, syntactic and semantic information. It is well integrated with European activities in the field of lexicon development, building on experience obtained from the PAROLE and SIMPLE projects. The model and descriptive method of the STO lexicon are kept compatible with the architecture and descriptive language of PAROLE/SIMPLE. A number of refinements, adaptations and language-specific extensions to the basic model are implemented in STO. |
|
|
| EDBL (Lexical database for Basque) is made up of about 75,000 entries divided into dictionary entries, verb forms and dependent morphemes, all of them with their respective morphological information. It was first developed as a lexical support for the spelling checker and corrector XUXEN, and later for the morphological analyser MORFEUS and the lemmatiser EUSLEM. |
|
|
| BESL consists of over 230,000 lemmas, over 350,000 word forms, 60,000 proper nouns, 3,000 abbreviations, and 58,000 multi-word compound nouns. Each headword is provided with a full listing of all inflected forms and other morphological variation. Every word form is marked for part of speech (using Penn TreeBank notation). Most single-word forms include a representation of IPA pronunciation. BESL covers both British and American English, and other spelling variants, with cross-references between corresponding forms. BESL is provided in XML. |
|
|
| This list features 4,500 words and expressions for UK and US English usage with a grading system describing vocabulary type and offensive strength for each term, plus collocational information to help identify the terms in context. The list is provided in tab-delimited ASCII. |
|
|
| This list features 2,000 words and expressions, classified into 13 categories, for UK and US English usage with a grading system describing vocabulary type and offensive strength for each term, plus collocational information to help identify the terms in context. The list is provided in an Excel spreadsheet. |
|
|
| This dictionary consists of 300,000 words and phrases, 500,000 translations, for 24 regional varieties of Spanish. It includes thousands of real, authentic example sentences carefully selected to illustrate the full range of meanings and typical contexts. The dictionary is provided in XML or SGML. |
|
|
| This source lexicon contains morphological and phonetic data for French. It consists of over 90,000 headwords/lemmas, 400,000 wordforms, 1,000 abbreviations, and 35,000 proper nouns. Each headword lemma is provided with a full listing of its possible syntactic forms and spelling variants, along with information on their relationship to the headword form. In addition, a representation of the IPA pronunciation is given for every form. There is also information on domains in which the headwords are used, e.g. Computing, Engineering, Zoology. The lexicon is provided in SGML. |
|
|
| This source lexicon contains morphological and phonetic data for Spanish. It consists of over 575,000 wordforms, 1,000 abbreviations, and 25,000 proper nouns. Each headword lemma is provided with a full listing of its possible syntactic forms and spelling variants, along with information on their relationship to the headword form. In addition, a representation of the IPA pronunciation is given for every form. There is also information on domains in which the headwords are used, e.g. Computing, Engineering, Zoology. The lexicon is provided in SGML. |
|
|
| This source lexicon contains morphological and phonetic data for Italian. It consists of over 115,000 headwords/lemmas and 925,000 wordforms. Each headword lemma is provided with a full listing of its possible syntactic forms and spelling variants, along with information on their relationship to the headword form. In addition, a representation of the IPA pronunciation is given for every form. There is also information on domains in which the headwords are used, e.g. Computing, Engineering, Zoology. The lexicon is provided in SGML. |
|
|
The KORLEX - Croatian Lexicon provides a list of 118,252 Croatian lemmas (including 52,450 nouns, 8,985 adverbs, 14,937 verbs and 41,161 adjectives, as well as pronouns, determiners, prepositions/postpositions, conjunctions and numerals), i.e., words in canonical form, annotated with part-of-speech (POS) tag and lexical features.
The lexicon data is compiled with the objective of covering the majority of text circulating in everyday use, such as in the news, in business, technological documentation, legal documentation, and politics. The resource is a flat textual file in which each textual line contains information about one lemma. The resource is encoded using ISO-8859-2 encoding, and sorted according to the standard Croatian lexicographic order. |
|
|
The KORLEX - Serbian Lexicon provides a list of 108,491 Serbian lemmas (including 52,027 nouns, 9,153 adverbs, 15,522 verbs and 31,052 adjectives, as well as pronouns, determiners, prepositions/postpositions, conjunctions and numerals), i.e., words in canonical form, annotated with part-of-speech (POS) tag and lexical features.
The lexicon data is compiled with the objective of covering the majority of text circulating in everyday use, such as in the news, in business, technological documentation, legal documentation, and politics. The resource is a flat textual file in which each textual line contains information about one lemma. The resource is encoded using ISO-8859-2 encoding, and sorted according to the standard Serbian lexicographic order. |
|
|
| This English lexicon is made up of 174,000 inflected forms corresponding to 68,000 simple word lemmas (including 31,900 nouns, 11,800 verbs, 19,900 adjectives, 4,100 adverbs, 300 pronouns, articles, prepositions/postpositions and conjunctions). Each line in the resource file shows an inflected form, its part of speech, its related lemma and its morphological information. |
|
|
| This French lexicon is made up of 424,000 inflected forms corresponding to 55,000 simple word lemmas (including 34,400 nouns, 7,300 verbs, 11,700 adjectives, 1,400 adverbs, 200 pronouns, articles, prepositions/postpositions and conjunctions). Each line in the resource file shows an inflected form, its part of speech, its related lemma and its morphological information. |
|
|
| This Italian lexicon is made up of 862,500 inflected forms corresponding to 112,000 simple word lemmas (including 66,340 nouns, 12,030 verbs, 28,080 adjectives, 4,890 adverbs, 660 pronouns, articles, prepositions/postpositions and conjunctions). Each line in the resource file shows an inflected form, its part of speech, its related lemma and its morphological information. |
|
|
| This Italian lexicon is the same as the one described in ELRA-L0069, but with the addition of clitic verbs, which increases the number of inflected forms to 1,800,000 (still corresponding to 112,000 simple word lemmas). It contains 66,340 nouns, 12,030 verbs, 28,080 adjectives, 4,890 adverbs, 660 pronouns, articles, prepositions/postpositions and conjunctions. Each line in the resource file shows an inflected form, its part of speech, its related lemma and its morphological information. |
|
|
| This Spanish lexicon is made up of 816,000 inflected forms corresponding to 104,000 simple word lemmas (including 52,000 nouns, 9,800 verbs, 21,200 adjectives, 20,500 adverbs, 500 pronouns, articles, prepositions/postpositions and conjunctions). Each line in the resource file shows an inflected form, its part of speech, its related lemma and its morphological information. |
|
|
PAROLE-SIMPLE-CLIPS is a four-level, general purpose lexicon that has been elaborated over three different projects. The PAROLE-SIMPLE-CLIPS Pisa Italian Lexicon comprises a total of 387,267 phonetic units, 53,044 morphological units (53,044 lemmas), 37,406 syntactic units (28,111 lemmas) and 28,346 semantic units (19,216 lemmas). The PAROLE-SIMPLE-CLIPS Pisa Italian Lexicon was encoded at the semantic level, in full accordance with the international standards set out in the PAROLE-SIMPLE model and based on EAGLES. Syntactic and semantic encoding were performed jointly with Thamus (Consortium for Multilingual Documentary Engineering), which is responsible for 25,000 extra entries (to be released soon).
This lexicon is subdivided into five different subsets:
L0072-01 Full lexicon
L0072-02 Phonetic layer
L0072-03 Morphological layer
L0072-04 Syntactic layer
L0072-05 Semantic layer |
|
|
PAROLE-SIMPLE-CLIPS is a four-level, general purpose lexicon that has been elaborated over three different projects. The PAROLE-SIMPLE-CLIPS Pisa Italian Lexicon comprises a total of 387,267 phonetic units, 53,044 morphological units (53,044 lemmas), 37,406 syntactic units (28,111 lemmas) and 28,346 semantic units (19,216 lemmas). The PAROLE-SIMPLE-CLIPS Pisa Italian Lexicon was encoded at the semantic level, in full accordance with the international standards set out in the PAROLE-SIMPLE model and based on EAGLES. Syntactic and semantic encoding were performed jointly with Thamus (Consortium for Multilingual Documentary Engineering), which is responsible for 25,000 extra entries (to be released soon).
This lexicon is subdivided into five different subsets:
L0072-01 Full lexicon
L0072-02 Phonetic layer
L0072-03 Morphological layer
L0072-04 Syntactic layer
L0072-05 Semantic layer |
|
|
PAROLE-SIMPLE-CLIPS is a four-level, general purpose lexicon that has been elaborated over three different projects. The PAROLE-SIMPLE-CLIPS Pisa Italian Lexicon comprises a total of 387,267 phonetic units, 53,044 morphological units (53,044 lemmas), 37,406 syntactic units (28,111 lemmas) and 28,346 semantic units (19,216 lemmas). The PAROLE-SIMPLE-CLIPS Pisa Italian Lexicon was encoded at the semantic level, in full accordance with the international standards set out in the PAROLE-SIMPLE model and based on EAGLES. Syntactic and semantic encoding were performed jointly with Thamus (Consortium for Multilingual Documentary Engineering), which is responsible for 25,000 extra entries (to be released soon).
This lexicon is subdivided into five different subsets:
L0072-01 Full lexicon
L0072-02 Phonetic layer
L0072-03 Morphological layer
L0072-04 Syntactic layer
L0072-05 Semantic layer |
|
|
PAROLE-SIMPLE-CLIPS is a four-level, general purpose lexicon that has been elaborated over three different projects. The PAROLE-SIMPLE-CLIPS Pisa Italian Lexicon comprises a total of 387,267 phonetic units, 53,044 morphological units (53,044 lemmas), 37,406 syntactic units (28,111 lemmas) and 28,346 semantic units (19,216 lemmas). The PAROLE-SIMPLE-CLIPS Pisa Italian Lexicon was encoded at the semantic level, in full accordance with the international standards set out in the PAROLE-SIMPLE model and based on EAGLES. Syntactic and semantic encoding were performed jointly with Thamus (Consortium for Multilingual Documentary Engineering), which is responsible for 25,000 extra entries (to be released soon).
This lexicon is subdivided into five different subsets:
L0072-01 Full lexicon
L0072-02 Phonetic layer
L0072-03 Morphological layer
L0072-04 Syntactic layer
L0072-05 Semantic layer |
|
|
PAROLE-SIMPLE-CLIPS is a four-level, general purpose lexicon that has been elaborated over three different projects. The PAROLE-SIMPLE-CLIPS Pisa Italian Lexicon comprises a total of 387,267 phonetic units, 53,044 morphological units (53,044 lemmas), 37,406 syntactic units (28,111 lemmas) and 28,346 semantic units (19,216 lemmas). The PAROLE-SIMPLE-CLIPS Pisa Italian Lexicon was encoded at the semantic level, in full accordance with the international standards set out in the PAROLE-SIMPLE model and based on EAGLES. Syntactic and semantic encoding were performed jointly with Thamus (Consortium for Multilingual Documentary Engineering), which is responsible for 25,000 extra entries (to be released soon).
This lexicon is subdivided into five different subsets:
L0072-01 Full lexicon
L0072-02 Phonetic layer
L0072-03 Morphological layer
L0072-04 Syntactic layer
L0072-05 Semantic layer |
|
|
| DIINAR.1 is an Arabic Lexical Resource which includes a total number of 119,693 lemmas, fully vowelled, and distributed as follows: 29,534 nouns and adjectives, 19,457 verbs, 70,702 deverbals (including 23,274 infinitive forms, 17,904 active participles, 13,373 passive participles, 5,781 ‘analogous adjectives’, 10,370 ‘nouns of place & time’). The data is provided in Excel files and was generated with inflected forms. Each entry has been associated with morpho-syntactic specifiers. |
|
|
The POLEX Polish Lexicon is a morphological dictionary of the Polish language. It comprises about 100,000 entries. The POLEX dictionary includes the core Polish vocabulary of general interest. It is based on a precise machine-interpretable formalism (coding system), the same for all categories (classes of speech). The dictionary entries are of the following form:
BASIC_FORM+LIST_OF_STEMS+PARADIGMATIC_CODE+DISTRIBUTION_OF_STEMS
It contains more than 42,000 nouns, 12,000 verbs, 15,000 adjectives, 25,000 participles, and about 200 pronouns. A simple lemmatiser (in the form of a PROLOG prototype) is also included. |
|
|
| This database contains 81,647 entries in Bulgarian with a linguistic environment tool (for WINDOWS XP). The data may be used for morphological analysis and synthesis, syntactic agreement checking, and phonetic stress determination. |
|
|
| The lexicon contains 2,180 Dutch abbreviations and acronyms. It complies with the official Dutch Spelling (2005/6). Each entry consists of an ID, word form, lemma and part of speech. |
|
|
| The lexicon contains 400,463 Dutch words, comprising 236,369 nouns, 90,882 adjectives, 69,744 verbs, 2,120 adverbs, and 1,348 items from other categories (pronouns, determiners, articles, adpositions, conjunctions, numerals, etc.). It complies with the official Dutch Spelling (2005/6). The lexicon contains an ID, word form, lemma and part of speech. |
|
|
| The lexicon contains 24,247 Dutch proper names. Various sorts of proper names are included, such as first names, last names, geographical names etc. Each entry contains an ID, word form, lemma, part of speech and proper name type. |
|
|
| The lexicon contains 15,987 Dutch words from the business domain, comprising 13,774 nouns, 1,267 adjectives, 895 verbs, 9 adverbs, and 42 items from other categories. The lexicon complies with the official Dutch Spelling (2005). Each entry contains an ID, word form and part of speech. |
|
|
| The lexicon contains 6,207 Dutch words from the legal domain, comprising 4,781 nouns, 810 adjectives, 573 verbs, 12 adverbs and 31 items from other categories. It complies with the official Dutch Spelling (2005/6). Each entry contains an ID, word form and part of speech. |
|
|
| The lexicon contains 17,115 Dutch words from the medical domain, comprising 12,638 nouns, 3,107 adjectives, 1,273 verbs, 11 adverbs and 86 items from other categories. It complies with the official Dutch Spelling (2005/6). Each entry contains an ID, word form and part of speech. |
|
|
| The lexicon contains 12,551 Dutch words from the social domain, comprising 9,984 nouns, 1,306 adjectives, 1,161 verbs, 56 adverbs and 44 items from other categories. It complies with the official Dutch Spelling (2005/6). Each entry contains an ID, word form and part of speech. |
|
|
| The lexicon contains 9,940 Dutch words from the technical/scientific domain, comprising 8,832 nouns, 950 adjectives, 111 verbs, 2 adverbs and 45 items from other categories. It complies with the official Dutch Spelling (2005/6). Each entry contains an ID, word form and part of speech. |
|
|
| MACPLEX comprises two dictionaries: a dictionary of lemmas (over 80,000 entries) and a dictionary of word forms (over 1,300,000 entries). Morphological information (PoS, gender, case, definiteness, number for nouns, tense, person, etc. for verbs) is available for each entry. Out of the more than 1,300,000 word forms, there are 345,350 nouns, 467,744 adjectives, 500,220 verbs and 19,472 adverbs. The remaining entries correspond to pronouns, adpositions, conjunctions and numerals. The lexicon is available in Unicode. |
|
|
| euLEX is a general lexicon which contains 115,000 entries, divided into 94,000 dictionary entries or lemmas, 12,000 allomorphs, 7,500 verb forms and about 1,200 dependent morphemes. All entries include linguistic information such as morphology and usage. The lexicon is in XML. |
|
|
| 30,000 entries (linked by meaning) for French, English, Italian, German, Spanish with lexical categories. |
|
|
| 10,642 entries (with morphological information) for Economics, law & business management. |
|
|
| 3,144 entries (with morphological information) for Leisure, Tourism, Sports, Food. |
|
|
| 4,116 entries (with morphological information) for Geography, History, Arts. |
|
|
| 4,089 entries (with morphological information) for Sociology, Psychology, Pedagogy. |
|
|
| 10,535 entries (with morphological information) for Natural and medical sciences. |
|
|
| 10,616 entries (with morphological information) for Exact sciences, Physics, Chemistry, Geology. |
|
|
| 4,904 entries (with morphological information) for Data Processing, Electronics, Telecoms. |
|
|
| 11,953 entries (with morphological information) for Technology, Engineering & Construction. |
|
|
| 1,320 entries (with morphological information) for Economics. |
|
|
| 3,565 entries (with morphological information) for Data Processing. |
|
|
| 3,733 entries (with morphological information) for Telecommunications. |
|
|
| 1,760 entries (with morphological information) for Electrical Engineering. |
|
|
| 9,022 entries (with morphological information) for Plastics and Chemistry. |
|
|
| 23,170 entries (with morphological information) for Aeronautics, Navigation, Mechanical Engineering. |
|
|
| 10,000 entries giving the German lexeme and Danish equivalent with word class, subject area, indication of structural changes, developed for machine translation. |
|
|
Vocabularies for transfer: General Vocabulary, 26,000 entries.
Each entry contains domain information, source language disambiguation, features, target language actions. |
|
|
Vocabularies for transfer: Administrative, 32,000 entries.
Each entry contains domain information, source language disambiguation, features, target language actions. |
|
|
Vocabularies for transfer: Data processing, 10,000 entries.
Each entry contains domain information, source language disambiguation, features, target language actions. |
|
|
| General vocabulary for transfer. 33,287 entries consisting of nouns (about 14,000), verbs (about 7,000), adjectives (about 5,000), adverbs (about 1,000), including domain information, source language disambiguation, features, target language actions. |
|
|
Vocabularies for transfer: General Vocabulary, 34,000 entries.
Each entry contains source language disambiguation, features, and target language actions, developed for automatic translation. |
|
|
Vocabularies for transfer: Administrative, 18,000 entries.
Each entry contains source language disambiguation, features, and target language actions, developed for automatic translation. |
|
|
Vocabularies for transfer: Data processing, 10,000 entries.
Each entry contains source language disambiguation, features, and target language actions, developed for automatic translation. |
|
|
| General vocabulary for transfer. 39,453 entries: nouns (about 21,000), verbs (about 9,000), adjectives (about 3,000), adverbs (about 1,000), including domain information, source language disambiguation, features, and target language actions, developed for automatic translation. |
|
|
| 6,800 technical entries giving the German lexeme and Danish equivalent with word class, subject area, indication of structural changes, developed for machine translation. |
|
|
| 15,500 general entries giving the German lexeme and Danish equivalent with word class, subject area, indication of structural changes, developed for machine translation. |
|
|
| Computer Science, canonical forms: 17,800 entries, German=>Italian. Data contain morphological coding. |
|
|
| Computer Science, canonical forms: 17,800 entries, Italian=>German. Data contain morphological coding. |
|
|
| Computer science, inflected forms: 35,000 entries, German=>Italian. Data contain morphological coding. |
|
|
| Computer Science, inflected forms: 35,000 entries, Italian=>German. Data contain morphological coding. |
|
|
| Aeronautics: 6,300 entries, English=>Italian. Data contain morphological coding. |
|
|
| Aeronautics: 6,300 entries, Italian=>English. Data contain morphological coding. |
|
|
| Law, canonical forms: 8,900 entries, English=>Italian. Data contain morphological coding. |
|
|
| Law, canonical forms: 8,900 entries, Italian=>English. Data contain morphological coding. |
|
|
| Law, inflected forms: 18,000 entries, English=>Italian. Data contain morphological coding. |
|
|
| Law, inflected forms: 18,000 entries, Italian=>English. Data contain morphological coding. |
|
|
| Computer science, canonical forms: 15,700 entries, English=>Italian. Data contain morphological coding. |
|
|
| Computer science, canonical forms: 15,700 entries, Italian=>English. Data contain morphological coding. |
|
|
| Computer science, inflected forms: 32,000 entries, English=>Italian. Data contain morphological coding. |
|
|
| Computer science, inflected forms: 32,000 entries, Italian=>English. Data contain morphological coding. |
|
|
| Medicine, canonical forms: 20,000 entries, English=>Italian. Data contain morphological coding. |
|
|
| Medicine, canonical forms: 20,000 entries, Italian=>English. Data contain morphological coding. |
|
|
| Economics, canonical forms: 50,000 entries, English=>Italian. Data contain morphological coding. |
|
|
| Economics, canonical forms: 50,000 entries, Italian=>English. Data contain morphological coding. |
|
|
| Economics, inflected forms: 86,000 entries, English=>Italian. Data contain morphological coding. |
|
|
| Economics, inflected forms: 86,000 entries, Italian=>English. Data contain morphological coding. |
|
|
| Engineering, canonical forms: 13,000 entries, English=>Italian. Data contain morphological coding. |
|
|
| Engineering, canonical forms: 13,000 entries, Italian=>English. Data contain morphological coding. |
|
|
| Engineering, inflected forms: 27,000 entries, English=>Italian. Data contain morphological coding. |
|
|
| Engineering, inflected forms: 27,000 entries, Italian=>English. Data contain morphological coding. |
|
|
| The bilingual English-German collocational dictionary consists of around 40,000 English headwords, including concepts expressed with more than one word (e.g. "the awareness of the environment" or "lame duck") and hyphenated compounds. It contains verbs, adjectives, synonyms and phrases that collocate with the headword. It provides the German equivalents for the headwords as well as their English synonyms. |
|
|
25,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
60,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
100,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
200,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
40,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
80,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
100,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
200,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
40,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
80,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
126,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
20,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
40,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
40,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
80,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
400,000+ entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
40,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
80,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
110,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
234,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
40,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
80,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
110,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
40,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
80,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
110,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
40,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
80,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
110,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
30,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
40,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
80,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
100,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
40,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
72,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
120,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
60,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage and semantic features. |
|
|
60,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage and semantic features. |
|
|
40,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage and semantic features. |
|
|
60,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage and semantic features. |
|
|
30,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
80,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
124,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
150,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
30,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
80,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
124,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
40,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
10,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
30,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
30,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
30,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
40,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
60,000 entries.
Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features. |
|
|
Each EuroWordNet database is composed of the following:
- The Inter-Lingual-Index, which is a list of records (ILI-records), in the form of synsets mainly taken from WordNet1.5 or manually created.
- A top-ontology which consists of an ontology of 63 basic semantic classes based on fundamental distinctions.
- A domain-ontology which consists of an ontology of subject-domains optionally assigned to ILI-records.
- A selection of ILI-records, the so-called Base-Concepts, which play a major role in the different wordnets.
- WordNet1.5 (91591 synsets; 168217 meanings; 126520 entry words) in EuroWordNet format.
Number of synsets for M0015 = 16361 synsets |
|
|
Each EuroWordNet database is composed of the following:
- The Inter-Lingual-Index, which is a list of records (ILI-records), in the form of synsets mainly taken from WordNet1.5 or manually created.
- A top-ontology which consists of an ontology of 63 basic semantic classes based on fundamental distinctions.
- A domain-ontology which consists of an ontology of subject-domains optionally assigned to ILI-records.
- A selection of ILI-records, the so-called Base-Concepts, which play a major role in the different wordnets.
- WordNet1.5 (91591 synsets; 168217 meanings; 126520 entry words) in EuroWordNet format.
Number of synsets for M0016 = 44015 synsets |
|
|
Each EuroWordNet database is composed of the following:
- The Inter-Lingual-Index, which is a list of records (ILI-records), in the form of synsets mainly taken from WordNet1.5 or manually created.
- A top-ontology which consists of an ontology of 63 basic semantic classes based on fundamental distinctions.
- A domain-ontology which consists of an ontology of subject-domains optionally assigned to ILI-records.
- A selection of ILI-records, the so-called Base-Concepts, which play a major role in the different wordnets.
- WordNet1.5 (91591 synsets; 168217 meanings; 126520 entry words) in EuroWordNet format.
Number of synsets for M0017 = 23370 synsets |
|
|
Each EuroWordNet database is composed of the following:
- The Inter-Lingual-Index, which is a list of records (ILI-records), in the form of synsets mainly taken from WordNet1.5 or manually created.
- A top-ontology which consists of an ontology of 63 basic semantic classes based on fundamental distinctions.
- A domain-ontology which consists of an ontology of subject-domains optionally assigned to ILI-records.
- A selection of ILI-records, the so-called Base-Concepts, which play a major role in the different wordnets.
- WordNet1.5 (91591 synsets; 168217 meanings; 126520 entry words) in EuroWordNet format.
Number of synsets for M0018 = 48529 synsets |
|
|
Each EuroWordNet database is composed of the following:
- The Inter-Lingual-Index, which is a list of records (ILI-records), in the form of synsets mainly taken from WordNet1.5 or manually created.
- A top-ontology which consists of an ontology of 63 basic semantic classes based on fundamental distinctions.
- A domain-ontology which consists of an ontology of subject-domains optionally assigned to ILI-records.
- A selection of ILI-records, the so-called Base-Concepts, which play a major role in the different wordnets.
- WordNet1.5 (91591 synsets; 168217 meanings; 126520 entry words) in EuroWordNet format.
Number of synsets for M0019 = 15132 synsets |
|
|
Each EuroWordNet database is composed of the following:
- The Inter-Lingual-Index, which is a list of records (ILI-records), in the form of synsets mainly taken from WordNet1.5 or manually created.
- A top-ontology which consists of an ontology of 63 basic semantic classes based on fundamental distinctions.
- A domain-ontology which consists of an ontology of subject-domains optionally assigned to ILI-records.
- A selection of ILI-records, the so-called Base-Concepts, which play a major role in the different wordnets.
- WordNet1.5 (91591 synsets; 168217 meanings; 126520 entry words) in EuroWordNet format.
Number of synsets for M0020 = 22745 synsets |
|
|
Each EuroWordNet database is composed of the following:
- The Inter-Lingual-Index, which is a list of records (ILI-records), in the form of synsets mainly taken from WordNet1.5 or manually created.
- A top-ontology which consists of an ontology of 63 basic semantic classes based on fundamental distinctions.
- A domain-ontology which consists of an ontology of subject-domains optionally assigned to ILI-records.
- A selection of ILI-records, the so-called Base-Concepts, which play a major role in the different wordnets.
- WordNet1.5 (91591 synsets; 168217 meanings; 126520 entry words) in EuroWordNet format.
Number of synsets for M0021 = 12824 synsets |
|
|
Each EuroWordNet database is composed of the following:
- The Inter-Lingual-Index, which is a list of records (ILI-records), in the form of synsets mainly taken from WordNet1.5 or manually created.
- A top-ontology which consists of an ontology of 63 basic semantic classes based on fundamental distinctions.
- A domain-ontology which consists of an ontology of subject-domains optionally assigned to ILI-records.
- A selection of ILI-records, the so-called Base-Concepts, which play a major role in the different wordnets.
- WordNet1.5 (91591 synsets; 168217 meanings; 126520 entry words) in EuroWordNet format.
Number of synsets for M0022 = 9317 synsets |
|
|
Produced with funding from ELRA in the framework of the European Commission project LRsP&P (Language Resources Production & Packaging - LE4-8335), these bilingual dictionaries contain more than 350,000 pairs of words (in tabular form) in XML format:
- Russian-English dictionary - more than 130,000 entries
- English-Russian dictionary - more than 95,000 entries
Each entry contains: source word (lemma); part of speech of source word; target word(s) (lemma(s)), grouped by same meaning; part of speech of target word(s); domain(s). |
|
|
| MultiWordNet: MultiWordNet contains information about the following aspects of the English and Italian lexicon: lexical relations between words, semantic relations between lexical concepts, correspondences between Italian and English lexical concepts, semantic fields. Information about 51,000 Italian word meanings and 28,000 synsets (in correspondence with the English equivalents) is included. MultiWordNet can be used for NLP applications such as information retrieval, semantic tagging, disambiguation, terminology, etc. |
|
|
| MultiWordNet: MultiWordNet contains information about the following aspects of the English and Italian lexicon: lexical relations between words, semantic relations between lexical concepts, correspondences between Italian and English lexical concepts, semantic fields. Information about 51,000 Italian word meanings and 28,000 synsets (in correspondence with the English equivalents) is included. MultiWordNet can be used for NLP applications such as information retrieval, semantic tagging, disambiguation, terminology, etc. |
|
|
| Over 100,000 words, phrases and translations are included in this bilingual minidictionary, which is available in SGML. Complementary information, such as usage notes, is also provided. |
|
|
| This bilingual dictionary contains 150,000 words and phrases, and 240,000 translations, and is available in XML and SGML. |
|
|
| This is a mid-sized dictionary covering essential terms and vocabulary, available in XML and SGML. It contains 80,000 words and phrases, and 115,000 translations. |
|
|
| The coverage of this concise Oxford Spanish dictionary includes 24 varieties of Spanish as it is written and spoken throughout the Spanish-speaking world. This bilingual dictionary contains 170,000 words and phrases and 240,000 translations. It is available in SGML and XML. |
|
|
| This dictionary covers the general language of Business across a range of core areas. It contains over 50,000 words and phrases, and is available in SGML. |
|
|
| This dictionary covers the general language of Business across a range of core areas. It contains over 50,000 words and phrases, and is available in SGML. |
|
|
SCI-FRAN-EURADIC: This bilingual dictionary was enlarged and improved within the French national project EurRADic (European and Arabic Dictionaries and Corpora), as part of the Technolangue programme funded by the French Ministry of Industry. It contains 243,539 pairs of French-English terms, with their part of speech. The data are presented in a table format, where information related to each entry is separated by ";".
See also ELRA-L0049, ELRA-L0050, ELRA-L0051, ELRA-L0052, ELRA-L0053, ELRA-M0034, ELRA-M0035, ELRA-M0036, ELRA-M0037, ELRA-M0038. |
|
|
SCI-FRAL-EURADIC: This bilingual dictionary was developed within the French national project EurRADic (European and Arabic Dictionaries and Corpora), as part of the Technolangue programme funded by the French Ministry of Industry. It contains 170,967 pairs of French-German terms, with their part of speech. The data are presented in a table format, where information related to each entry is separated by ";".
See also ELRA-L0049, ELRA-L0050, ELRA-L0051, ELRA-L0052, ELRA-L0053, ELRA-M0033, ELRA-M0035, ELRA-M0036, ELRA-M0037, ELRA-M0038. |
|
|
SCI-FRES-EURADIC: This bilingual dictionary was enlarged and improved within the French national project EurRADic (European and Arabic Dictionaries and Corpora), as part of the Technolangue programme funded by the French Ministry of Industry. It contains 102,941 pairs of French-Spanish terms, with their part of speech. The data are presented in a table format, where information related to each entry is separated by ";".
See also ELRA-L0049, ELRA-L0050, ELRA-L0051, ELRA-L0052, ELRA-L0053, ELRA-M0033, ELRA-M0034, ELRA-M0036, ELRA-M0037, ELRA-M0038. |
|
|
SCI-FRIT-EURADIC: This bilingual dictionary was developed within the French national project EurRADic (European and Arabic Dictionaries and Corpora), as part of the Technolangue programme funded by the French Ministry of Industry. It contains 116,587 pairs of French-Italian terms, with their part of speech. The data are presented in a table format, where information related to each entry is separated by ";".
See also ELRA-L0049, ELRA-L0050, ELRA-L0051, ELRA-L0052, ELRA-L0053, ELRA-M0033, ELRA-M0034, ELRA-M0035, ELRA-M0037, ELRA-M0038. |
|
|
SCI-ANES: This bilingual dictionary contains around 60,000 pairs of English-Spanish terms, with their part of speech. The data are presented in a table format, where information related to each entry is separated by ";".
See also ELRA-L0049, ELRA-L0050, ELRA-L0051, ELRA-L0052, ELRA-L0053, ELRA-M0033, ELRA-M0034, ELRA-M0035, ELRA-M0036, ELRA-M0038. |
|
|
SCI-AN-ALL: This bilingual dictionary contains 59,758 pairs of English-German terms, with their part of speech. The data are presented in a table format, where information related to each entry is separated by ";".
See also ELRA-L0049, ELRA-L0050, ELRA-L0051, ELRA-L0052, ELRA-L0053, ELRA-M0033, ELRA-M0034, ELRA-M0035, ELRA-M0036, ELRA-M0037. |
|
|
SCI-ALRU: This bilingual dictionary contains around 80,000 pairs of German-Russian terms, with their part of speech.
The data are presented in a table format, where information related to each entry is separated by ";". |
|
|
| DixAF: DixAF is a French-Arabic, Arabic-French dictionary, which consists of around 125,000 binary links between ca. 43,000 French entries and ca. 35,000 Arabic entries. |
|
|
| The Bulgarian WordNet was initially developed within the framework of the BalkaNet project "Multilingual Semantic Network for the Balkan Languages" (IST-2000-29388) and later on under the scope of the BulNet project, funded at the national level. It models nouns, verbs, adjectives, and (occasionally) adverbs, and contains 23,715 word senses (synsets). |
|
|
| ItalWordNet (Italian WordNet) is an updated version of the EuroWordNet Italian database. The ItalWordNet database was produced within a national Italian programme called SI-TAL. It contains a total of 49,360 synsets. The ItalWordNet is provided in XML format. The original EuroWordNet Italian database is also included in this package. |
|
|
| This lexicon is provided in structured XML following the OLIF (Open Lexicon Interchange Format) format. It comprises 99,211 entries in its source language (Russian) and 134,828 entries in its target language (English). The source entries are distributed as follows: 64,487 nouns, 11,470 adjectives, 19,724 verbs, 1,762 adverbs, and 1,768 closed-class elements (interjections, special prefixes, suffixes, etc.). Nouns contain gender and number information and verbs provide details on aspect and reflexivity. The entries contain semantic information in terms of domain specification or style information (e.g., colloquial, regional use, etc.). Moreover, definitions are available for 59,775 entries, as well as collocational information for 39,148 entries. |
|
|
| This lexicon is provided in structured XML following the OLIF (Open Lexicon Interchange Format) format. It comprises 58,247 entries in English and 58,300 in Swahili. The source entries are distributed as follows: 36,046 nouns, 3,013 adjectives, 18,308 verbs and 880 closed-class entries. The entries contain semantic information in terms of domain specification or style information (e.g., colloquial, regional use, etc.). Collocational information is also available for 17,570 entries. |
|
|
| This lexicon is provided in structured XML following the OLIF (Open Lexicon Interchange Format) format. It comprises 1,988 entries in Cebuano and 1,990 in English. The source entries are distributed as follows: 1,052 nouns, 462 adjectives, 405 verbs and 69 closed-class entries. The entries contain semantic information in terms of domain specification or style information (e.g., colloquial, regional use, etc.). Collocational information is also available for 500 entries. |
|
|
| This lexicon is provided in structured XML following the OLIF (Open Lexicon Interchange Format) format. It comprises 31,718 entries in English and 32,125 in Czech. The source entries are distributed as follows: 17,797 nouns, 7,748 adjectives, 6,039 verbs and 134 closed-class entries. The entries contain semantic information in terms of domain specification or style information (e.g., colloquial, regional use, etc.). Collocational information is also available for 3,065 entries. |
|
|
| The Czech WordNet captures nouns, verbs, adjectives, and partly adverbs, and contains 28,201 word senses (synsets). Every synset encodes the equivalence relation between several literals (at least one is present), having a unique meaning, belonging to one and the same part of speech, and expressing the same lexical meaning. Each Czech synset is related to the corresponding synset in the Princeton WordNet 2.0 via its identification number ID. There is at least one language-internal relation between a synset and another synset in the database. |
|
|
| LatinWordNet contains information about the following aspects of the Latin and English lexicon: lexical relations between words, semantic relations between lexical concepts, correspondences between Latin and English lexical concepts. LatinWordNet covers nouns, verbs, adjectives and adverbs, and contains 8,978 synsets in correspondence with the English equivalents (and with all the MultiWordNet-based wordnets). |
|
|
| The Basque WordNet models nouns, verbs and adjectives. Each sense is linked to a so-called synset (for a total of 30,281 synsets). Every synset encodes the synonymy relation between (possibly) several words (synonyms), having a unique meaning, belonging to one and the same part of speech (specified in the POS tag value), and expressing the same lexical meaning. Each synset is related to the corresponding synset in the English WordNet 1.6 via its identification number ID, which includes the synset number and the POS tag. The only exceptions are newly created synsets to account for cultural concepts not present in WordNet 1.6. |
|
|
| MWN.PT - MultiWordnet of Portuguese (version 1) spans over 17,200 manually validated concepts/synsets, linked under the semantic relations of hyponymy and hypernymy. These concepts are made of over 21,000 word senses/word forms and 16,000 lemmas from both European and American variants of Portuguese. They are aligned with the translationally equivalent concepts of the English Princeton WordNet and, transitively, of the MultiWordNets of Italian, Spanish, Hebrew, Romanian and Latin. |
|
|
| Acoustic and articulatory multilingual database recorded as part of the ESPRIT-ACCOR project investigating cross-language acoustic-articulatory correlations in coarticulatory processes. Only English is available. |
|
|
| A phonetically transcribed French lexicon of 23,000 canonical entries (leading to over 270,000 forms) with the corresponding graphemical, phonological and morphosyntactical attributes. |
|
|
| Lexicon for written and spoken French including 440,000 inflected forms with spelling, pronunciation (phonology) and morphosyntactic attributes. |
|
|
| BDSONS: Speech database with two subsets: evaluation (sentences, logatoms, numbers, digits, etc.) and acoustic modelling (CVCV sequences, various types of sentences, etc.). The corpus was recorded by 16 male and 16 female speakers. |
|
|
| BREF Sub-corpus containing training data of 5,330 sentences read by 80 French speakers. Texts were selected from the French newspaper Le Monde (over 20,000 words). |
|
|
| BREF Sub-corpus containing training data of 3,193 sentences read by 6 French speakers. The sentences were selected to cover a wide range of phonetic contexts. |
|
|
| 500 speakers, half of whom called from Turin and the other half from all over Italy, automatically prompted to utter the 10 Italian digits and 5 command words. |
|
|
| Multi-English Speech database - 797 successful calls received in Italy and in the UK, using different types of collecting equipment. Repetition of the same vocabulary, the "TI (Texas Instruments) words" (digits + yes, no, go, etc.). |
|
|
| Telephone speech from 5,050 Dutch speakers. Approx. 44 items per speaker. Read & spontaneous speech (isolated words, digits, sentences, etc.). |
|
|
| Phonetically rich sentences & application-oriented utterances such as keywords, digits, etc. 1,000 speakers recorded over digital telephone lines using fixed telephone sets. |
|
|
Phonetically rich sentences sub-set.
See ELRA-S0011 |
|
|
| ERBA: Over 10,000 utterances read by over 100 German speakers. Domain of train inquiries. |
|
|
| The multilingual European speech database. The first truly multilingual speech database produced in Europe. Over 60 speakers per language who pronounced numbers, sentences and isolated words using a close-talking microphone. |
|
|
| The multilingual European speech database. The first truly multilingual speech database produced in Europe. Over 60 speakers per language who pronounced numbers, sentences and isolated words using a close-talking microphone. |
|
|
| The multilingual European speech database. The first truly multilingual speech database produced in Europe. Over 60 speakers per language who pronounced numbers, sentences and isolated words using a close-talking microphone. |
|
|
| The multilingual European speech database. The first truly multilingual speech database produced in Europe. Over 60 speakers per language who pronounced numbers, sentences and isolated words using a close-talking microphone. |
|
|
| Phonetically rich sentences & application-oriented utterances such as keywords, digits, etc. French SpeechDat (Polyphone) database containing 35,000 utterances from 1,000 callers over the telephone in France. |
|
|
French (SpeechDat(M)) polyphone database.
Phonetically rich sentences sub-set. See ELRA-S0016 |
|
|
| Phonetically rich sentences & application-oriented utterances such as keywords, digits, etc. German read and spontaneous speech from 1,000 speakers. |
|
|
German Polyphone Database (SpeechDat(M))
Phonetically rich sentences sub-set. See ELRA-S0018 |
|
|
| Over 20 hours of Dutch read speech material (short texts, short sentences, etc.), from 238 speakers. |
|
|
| Multi Modal Verification for Teleservices and Security applications project. Multilingual database designed to facilitate access control using multimodal identification of human faces (speech & image). |
|
|
| Onomastica Multi-Language Pronunciation Dictionaries covering city & town names, street names, family names, first names, product names, for 11 European languages. Only German is available now. |
|
|
| Read speech from 201 German speakers who read 450 different sentences each. Eight of them read the whole sentence corpus. |
|
|
| 200 different sentences from a train inquiry task read by 16 German speakers, provided with phonological segmentation by hand plus other labelling. |
|
|
| Approx. 100 sentences extracted from the German newspaper Süddeutsche Zeitung and read by 101 speakers. |
|
|
| Approx. 1,000 sentences extracted from the German newspaper Süddeutsche Zeitung and read by 10 speakers. |
|
|
| Telephone speech database with 730 speakers (338 female, 392 male), and 36,000 utterances (digit sequences, dates, spelled names, ...). |
|
|
| Speech Database for Speaker Verification and Identification. Over 2,000 calls in Italian, collected over the fixed telephone network. |
|
|
| 'Nordwind und Sonne' story read by 72 speakers with a foreign accent and 16 native German speakers. |
|
|
| This speech database contains the recordings of 1,000 speakers who answered around 10 questions leading to spontaneous speech, and read about 28 items from a form supplied by IDIAP. |
|
|
| This speech database contains the recordings of 4,000 speakers who answered around 10 questions leading to spontaneous speech, and read about 28 items from a form supplied by IDIAP. |
|
|
Translanguage English Database.
Recordings of 188 oral presentations in English given at Eurospeech'93 in Berlin (high percentage of non-native English speakers). |
|
|
| TEDPhone: Polyphone/SpeechDat-like recordings of 64 speakers in English and in their native language. |
|
|
| Recordings of French speech, corrupted with perturbations due to noisy environments, especially the Lombard effect. 5 male and 5 female speakers uttered sentences, digits, etc. |
|
|
Spontaneous speech databases recorded in a dialogue task.
63 Dialogues, 209 Appointments, 1840 Turns.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
63 Dialogues, 209 Appointments, 1840 Turns. This new edition contains the transliterations of all dialogues, signal files with PhonDat 2 Header structure, software and speaker documentation. All files were validated according to BAS guidelines.
1 CDROM |
|
|
Spontaneous speech databases recorded in a dialogue task.
81 Dialogues, 227 Appointments, 1538 Turns.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
81 Dialogues, 227 Appointments, 1538 Turns. This new edition contains the transliterations of all dialogues, signal files with PhonDat 2 Header structure, software and speaker documentation. All files were validated according to BAS guidelines.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
45 Dialogues, 184 Appointments, 1214 Turns.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
45 Dialogues, 184 Appointments, 1214 Turns. This new edition contains the transliterations of all dialogues, signal files with PhonDat 2 Header structure, software and speaker documentation. All files were validated according to BAS guidelines.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
72 Dialogues, 181 Appointments, 1,588 Turns.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
72 Dialogues, 181 Appointments, 1,588 Turns. This new edition contains the transliterations of all dialogues, signal files with PhonDat 2 Header structure, software and speaker documentation and partitur files. All files were validated according to BAS guidelines.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
101 Dialogues, 256 Appointments, 2,154 Turns.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
101 Dialogues, 256 Appointments, 2,154 Turns. This new edition contains the transliterations of all dialogues, signal files with PhonDat 2 Header structure, software and speaker documentation and partitur files. All files were validated according to BAS guidelines.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
146 Dialogues, 191 Appointments, 1,828 Turns.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
146 Dialogues, 191 Appointments, 1,828 Turns. This new edition contains the transliterations of all dialogues, signal files with PhonDat 1 Header structure, software and speaker documentation. All files were validated according to BAS guidelines.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
68 Dialogues, 238 Appointments, 1,739 Turns.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
68 Dialogues, 238 Appointments, 1,739 Turns. This new edition contains the transliterations of all dialogues, signal files with PhonDat 2 Header structure, software and speaker documentation and partitur files. All files were validated according to BAS guidelines.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
167 Dialogues, 167 Appointments, 1,181 Turns. This new edition contains the transliterations of all dialogues, signal files with PhonDat 1 Header structure, software and speaker documentation. All files were validated according to BAS guidelines.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
207 Dialogues, 207 Appointments, 2,154 Turns.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
207 Dialogues, 207 Appointments, 2,154 Turns. This new edition contains the transliterations of all dialogues, signal files with PhonDat 2 Header structure, software and speaker documentation and partitur files. All files were validated according to BAS guidelines.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
90 speakers, 1714 turns, 200 spontaneous dialogues, transliteration.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
97 speakers, 1891 turns, 156 spontaneous dialogues, transliteration.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
97 speakers, 1891 turns, 156 spontaneous dialogues, transliteration, PhonDat 2 headers, partitur files.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
78 speakers, 3311 turns, 200 spontaneous dialogues, transliteration (Kanji/Kana and Roman/Latin). |
|
|
Spontaneous speech databases recorded in a dialogue task.
84 speakers, 2741 turns, 200 spontaneous dialogues, transliteration (Kanji/Kana and Roman/Latin).
1 CDROM |
|
|
Spontaneous speech databases recorded in a dialogue task.
80 speakers, 2345 turns, 200 spontaneous dialogues, transliteration (Kanji/Kana and Roman/Latin).
1 CDROM |
|
|
Spontaneous speech databases recorded in a dialogue task.
82 speakers, 2911 turns, 200 spontaneous dialogues, transliteration (Kanji/Kana and Roman/Latin).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
26 Free Dialogues (with overlap, stereo recordings), 2227 Turns.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - German - 19 spontaneous dialogues (19 close mic, 19 room mic, 19 telephone (fixed network, GSM)), 3117 turns, transliteration (VM II format), NIST headers, partitur files.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - German - 30 spontaneous dialogues (10 close mic, 27 room mic, 10 phone line (GSM)), 1957 turns, transliteration (VM II format), NIST headers, partitur files.
1 CDROM. |
|
|
Verbmobil II - German - 38 spontaneous dialogues (38 close mic, 2 room mic, 22 phone line (GSM)), 2331 turns, transliteration (VM II format), NIST headers, partitur files.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - German - 60 spontaneous dialogues (28 close mic, 5 room mic, 27 phone line (GSM) recordings), 2004 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - American English - 28 spontaneous dialogues (28 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 2727 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - German - 58 spontaneous dialogues (36 close mic, 0 room mic, 22 phone line (GSM) recordings), 2231 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - Japanese - 10 spontaneous dialogues (10 close mic, 0 room mic, 0 phone line (GSM) recordings), 1654 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - Japanese - 16 spontaneous dialogues (16 close mic, 0 room mic, 0 phone line (GSM) recordings), 1319 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - Japanese - 24 spontaneous dialogues (24 close mic, 0 room mic, 0 phone line (GSM) recordings), 1149 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - American English - 28 spontaneous dialogues (28 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 2408 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - German - 33 spontaneous dialogues (33 close mic, 21 room mic, 25 phone line (fixed network, GSM) recordings), 4176 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - American English - 32 spontaneous dialogues (32 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 2512 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - Multilingual - 17 spontaneous dialogues (17 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 992 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - Japanese, 25 spontaneous dialogues (25 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 1050 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - Japanese, 28 spontaneous dialogues (28 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 1437 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - Japanese, 27 spontaneous dialogues (27 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 1645 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - German, 33 spontaneous dialogues (33 close mic, 28 room mic, 28 phone line (fixed network, GSM) recordings), 5115 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - German, 28 spontaneous dialogues (28 close mic, 17 room mic, 20 phone line (fixed network, GSM) recordings), 3360 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - German, 25 spontaneous dialogues (25 close mic, 20 room mic, 20 phone line (fixed network, GSM) recordings), 2708 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - American English, 20 spontaneous dialogues (20 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 1874 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - Japanese - 24 spontaneous dialogues (24 close mic, 0 room mic, 0 phone line (GSM) recordings), 1149 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - German, 24 spontaneous dialogues (24 close mic, 12 room mic, 12 phone line (fixed network, GSM) recordings), 2597 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - American-English, 8 spontaneous dialogues (8 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 679 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - German, 28 spontaneous dialogues (28 close mic, 23 room mic, 27 phone line (fixed network, GSM) recordings), 4238 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - Japanese, 19 spontaneous dialogues (19 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 920 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - Japanese, 21 spontaneous dialogues (21 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 1293 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - Multilingual Japanese/German, 11 spontaneous dialogues (11 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 607 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - Multilingual with human interpreter (3 channels) English/German, 17 spontaneous dialogues (17 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings), 853 turns, transliteration (Verbmobil II Format).
1 CDROM. |
|
|
Additional data and documentation that is not included in the regular VM volumes.
1 CD-ROM. |
|
|
| Verbmobil lexicon database of the University of Bielefeld. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - German - 19 spontaneous dialogues (19 close mic, 19 room mic, 19 phone line (GSM)), 3117 turns, transliteration (VM II format), NIST headers, partitur files.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - Japanese, 200 dialogues, 200 appointment schedulings - 3311 turns.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Verbmobil II - Japanese, 200 dialogues, 200 appointment schedulings - 2741 turns.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Japanese, 200 dialogues, 200 appointment schedulings - 2345 turns.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Japanese, 200 dialogues, 200 appointment schedulings - 2911 turns.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
German, 16 spontaneous dialogues (16 close mic, 8 room mic, 8 phone line (GSM) recordings) - 1771 turns, transliteration (VM II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Japanese - 10 spontaneous dialogues (10 close mic, 0 room mic, 0 phone line (GSM) recordings) - 501 turns, transliteration (VM II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Japanese - 19 spontaneous dialogues (19 close mic, 0 room mic, 0 phone line (GSM) recordings) - 946 turns, transliteration (VM II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Japanese - 21 spontaneous dialogues (21 close mic, 0 room mic, 0 phone line (GSM) recordings) - 981 turns, transliteration (VM II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Multilingual German/English with human interpreter (3 channels) - 15 spontaneous dialogues (15 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings) - 856 turns, transliteration (VM II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Multilingual German/English with human interpreter (3 channels) - 13 spontaneous dialogues (13 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings) - 728 turns, transliteration (VM II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Multilingual German/English with human interpreter (3 channels) - 11 spontaneous dialogues (11 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings) - 518 turns, transliteration (VM II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Multilingual German/English with human interpreter (3 channels) - 12 spontaneous dialogues (12 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings) - 620 turns, transliteration (VM II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Multilingual German/Japanese with 2 human interpreters (4 channels) - 11 spontaneous dialogues (11 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings) - 702 turns, transliteration (VM II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Multilingual German/Japanese with 2 human interpreters (4 channels) - 7 spontaneous dialogues (7 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings) - 421 turns, transliteration (VM II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
Multilingual German/Japanese with 2 human interpreters (4 channels) - 7 spontaneous dialogues (7 close mic, 0 room mic, 0 phone line (fixed network, GSM) recordings) - 354 turns, transliteration (VM II Format).
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
German - 14 WOZ dialogues designed to evoke emotions (mainly anger) - transliteration, emotion labeling.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
German - 13 WOZ dialogues designed to evoke emotions (mainly anger) - transliteration, emotion labeling.
1 CDROM. |
|
|
Spontaneous speech databases recorded in a dialogue task.
German - 13 WOZ dialogues designed to evoke emotions (mainly anger) - transliteration, emotion labeling.
1 CDROM. |
|
|
| Approx. 1.6 million entries with orthographic forms (capital nouns, old German, spelling, ...), phonetic transcription (by rules and exception list) and other linguistic information (e.g. grammatical categories). |
|
|
| This speech database contains the recordings of 921 American speakers recorded over the fixed telephone network. It consists of read acoustic speech divided into 9.5 hours of transliterated speech and 8 hours of non-transliterated speech. Orthographic transliteration for about 25,000 utterances is included. |
|
|
| Italian acoustic database recorded in an insulated room. It includes ca. 16,090 utterances and digits, 58,924 words (2,191 different words), 641 minutes of speech. The data is uttered by 50 male and 50 female speakers. 42 male and 12 female speakers repeated 10 isolated digits 20 times. |
|
|
Phonetically rich sentences & application oriented utterances such as keywords, digits, etc.
This speech database contains the recordings of 1,523 Danish speakers, recorded over the Danish fixed telephone network. Each speaker uttered around 100 read and spontaneous items. |
|
|
| |