19 Language Resources
- Catalan; Valencian
ID: ELRA-W0327
ISLRN: 186-654-762-852-8
The AnCora Catalan Corpus 2.0.0 is a corpus of 500,000 words annotated at different levels: - Lemma and Part of Speech, - Syntactic constituents and functions, - Argument structure and thematic roles, - Semantic classes of the verb, - Denotative type of deverbal nouns, - Nouns related to W...

MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Commercial Use - GPL | 0.00 € | 0.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Commercial Use - GPL | 0.00 € | 0.00 € |
- Spanish; Castilian
ID: ELRA-W0326
ISLRN: 252-495-813-736-1
The AnCora Spanish Corpus 2.0.0 is a corpus of 500,000 words annotated at different levels: - Lemma and Part of Speech, - Syntactic constituents and functions, - Argument structure and thematic roles, - Semantic classes of the verb, - Denotative type of deverbal nouns, - Nouns related to W...

MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Commercial Use - GPL | 0.00 € | 0.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Commercial Use - GPL | 0.00 € | 0.00 € |
- Arabic
ID: ELRA-S0384
ISLRN: 866-568-447-697-8
This speech corpus has been developed as part of a PhD work carried out by Nawar Halabi at the University of Southampton. The corpus was recorded through a Neumann TLM 103 Studio Microphone by one male speaker in South Levantine Arabic (Damascian accent) in a professional studio. The transcript w...

MEMBER | academic | commercial |
---|---|---|
Licence: Commercial Use - ELRA VAR | 9000.00 € | |
Licence: Attribution - CC-BY | 0.00 € | 0.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Commercial Use - ELRA VAR | 11200.00 € | |
Licence: Attribution - CC-BY | 0.00 € | 0.00 € |
- Bulgarian
ID: ELRA-W0329
ISLRN: 832-960-876-604-2
The Bulgarian Event Corpus is composed of 324,905 tokens appropriate for training Named Entity Recognition (NER), Named Entity Linking (NEL) and Event Recognition models for Bulgarian in a multidomain context within Humanities. The texts are domain related. They include documents from the area of So...

MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Share Alike - CC-BY-SA-3.0 | 0.00 € | 0.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: ? - CC-BY-SA-3.0 | 0.00 € | |
Licence: Attribution, Share Alike - CC-BY-SA-3.0 | 0.00 € |
- Bulgarian
ID: ELRA-W0328
ISLRN: 761-430-854-533-2
The Bulgarian Treebank Corpus is composed of 156,149 tokens (11,138 sentences) coming from three main sources in the domain of Grammar Notebooks (1,391 sentences), News (6,698 sentences), Other (3,049 sentences). It is available with syntactical and morphological annotation on a sentence basis in...

MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Share Alike - CC-BY-SA-3.0 | 0.00 € | 0.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Share Alike - CC-BY-SA-3.0 | 0.00 € | 0.00 € |
- Bulgarian
ID: ELRA-L0132
ISLRN: 188-702-981-369-5
The Bulgarian Valency Frame Lexicon is composed of 9547 lexical entries organized by frames with 960 mappings to Princeton WordNet available in XML format. It is a treebank-driven resource of extracted valency frames from BulTreeBank. The frames were manually curated. The frames followed the surf...

MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Share Alike - CC-BY-SA-3.0 | 0.00 € | 0.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Share Alike - CC-BY-SA-3.0 | 0.00 € | 0.00 € |
- Arabic
ID: ELRA-L0133
ISLRN: 462-532-124-988-8
Comprehensive Arabic LEMmas is a lexicon covering a large list of Arabic lemmas and their corresponding inflected word forms (stems) with details (POS + Root). Each lexical entry represents a lemma followed by all its possible stems and each stem is enriched by its morphological features, especia...

MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use, No Derivatives - CC-BY-NC-ND | 0.00 € | 0.00 € |
Licence: Commercial Use - ELRA VAR | 5000.00 € | 5000.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use, No Derivatives - CC-BY-NC-ND | 0.00 € | 0.00 € |
Licence: Commercial Use - ELRA VAR | 7500.00 € | 7500.00 € |
- Icelandic
ID: ELRA-W0298
ISLRN: 420-670-865-427-1
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Corpus of Icelandic texts from the Central Bank of Icela...

MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Other - Open Under-PSI | 0.00 € | 0.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Other - Open Under-PSI | 0.00 € | 0.00 € |
- Danish
ID: ELRA-W0318
ISLRN: 024-504-318-388-3
The Danish Gigaword Project (DAGW) maintains a corpus for Danish with over a billion words. The general goals are to create a dataset that is: 1. representative; 2. accessible; 3. a suitable common starting point for Danish NLP models. The present version 1.0 was collected from various webs...

MEMBER | academic | commercial |
---|---|---|
Licence: Attribution - CC-BY-4.0 | 0.00 € | 0.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution - CC-BY-4.0 | 0.00 € | 0.00 € |
- English
- Manipuri
ID: ELRA-W0316
ISLRN: 588-170-827-016-7
The Ema-lon Manipuri Corpus consists of a set of resources for Manipuri language (locally known as Meiteilon) for the purpose of machine translation. The main source for these resources is the Sangai Express news website. The resources that constitute the present corpus are listed below: 1. EM C...

MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use - CC-BY-NC-4.0 | 0.00 € | 0.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use - CC-BY-NC-4.0 | 0.00 € | 0.00 € |
- German
ID: ELRA-W0330
ISLRN: 381-445-879-769-5
This corpus consists of a collection of political speeches in German crawled from the online archive of the German presidency (Bundespräsident) and the Chancellery (Bundesregierung). For the German Presidency the speeches are available from July 1, 1984 to February 17, 2012 and the corpus con...

MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Share Alike - CC-BY-SA | 0.00 € | 0.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Share Alike - CC-BY-SA | 0.00 € | 0.00 € |
- Catalan; Valencian
ID: ELRA-S0407
ISLRN: 780-617-066-913-1
Glissando-ca includes more than 12 hours of speech in Catalan, recorded under optimal acoustic conditions, orthographically transcribed, phonetically aligned and annotated with prosodic information (location of the stressed syllables and prosodic phrasing). The corpus was recorded by 8 profession...

MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use, Share Alike - CC-BY-NC-SA | 0.00 € | 0.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use, Share Alike - CC-BY-NC-SA | 0.00 € | 0.00 € |
- Spanish; Castilian
ID: ELRA-S0406
ISLRN: 024-286-962-247-6
Glissando-sp includes more than 12 hours of speech in Spanish, recorded under optimal acoustic conditions, orthographically transcribed, phonetically aligned and annotated with prosodic information (location of the stressed syllables and prosodic phrasing). The corpus was recorded by 8 profession...

MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use, Share Alike - CC-BY-NC-SA | 0.00 € | 0.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use, Share Alike - CC-BY-NC-SA | 0.00 € | 0.00 € |
- American Sign Language
- English
ID: ELRA-S0416
ISLRN: 583-408-694-292-6
The How2Sign dataset consists of a parallel corpus of speech and transcriptions of instructional videos and their corresponding American Sign Language (ASL) translation videos and annotations. It has been produced by recording 11 persons (6 males and 5 females) with various hearing status (5 self...
MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use - CC-BY-NC-4.0 |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use - CC-BY-NC-4.0 |
- French
ID: ELRA-S0379
ISLRN: 371-240-320-910-4
The JV_TDM corpus provides a phonetic annotation of 37 chapters of the original French version of “Around the World in 80 Days” by Jules Verne read by a single speaker. Each chapter has been annotated in a separate .TextGrid file. The audio files are not included in this release. They are availab...

MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use, Share Alike - CC-BY-NC-SA | 0.00 € | 0.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use, Share Alike - CC-BY-NC-SA | 0.00 € | 0.00 € |
- Arabic
ID: ELRA-L0134
ISLRN: 977-057-254-691-5
Moroccan Arabic Dialect Electronic Dictionary (MADED) is an electronic lexicon containing almost 11,500 entries. They are written in Arabic script wherein each Modern Standard Arabic (MSA) lemma is provided with its corresponding Moroccan Arabic equivalent. In addition, MADED entries are annotate...

MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use, No Derivatives - CC-BY-NC-ND | 0.00 € | 0.00 € |
Licence: Commercial Use - ELRA VAR | 1000.00 € | 1000.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use, No Derivatives - CC-BY-NC-ND | 0.00 € | 0.00 € |
Licence: Commercial Use - ELRA VAR | 2000.00 € | 2000.00 € |
- Lithuanian
ID: ELRA-W0299
ISLRN: 268-109-862-136-1
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Monolingual documents received from the Government of th...

MEMBER | academic | commercial |
---|---|---|
Licence: Attribution - CC-BY-4.0 | 0.00 € | 0.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution - CC-BY-4.0 | 0.00 € | 0.00 € |
- Arabic
ID: ELRA-L0135
ISLRN: 064-194-729-767-0
The Moroccan Morphological vocabulary is a lexicon containing more than 4.6 million entries, each describing a Moroccan Arabic word with fourteen (14) morphological and semantic features: the word orthographic form, the segmentation (prefix and suffix), part-of-speech (POS), gender, number, tense and t...

MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use, No Derivatives - CC-BY-NC-ND | 0.00 € | 0.00 € |
Licence: Commercial Use - ELRA VAR | 6000.00 € | 6000.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use, No Derivatives - CC-BY-NC-ND | 0.00 € | 0.00 € |
Licence: Commercial Use - ELRA VAR | 12000.00 € | 12000.00 € |
- Persian
ID: ELRA-S0393
ISLRN: 068-845-898-304-0
This single-speaker speech corpus of about 2.5 hours has been developed using the same methodologies used in the PhD work carried out by Nawar Halabi at the University of Southampton. The corpus was recorded in Persian (Tehrani accent) by one male speaker using a professional studio, through a "Blubb...

MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use, Share Alike - CC-BY-NC-SA | 0.00 € | 0.00 € |
Licence: Commercial Use - ELRA VAR | 4000.00 € | 4000.00 € |

NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use, Share Alike - CC-BY-NC-SA | 0.00 € | 0.00 € |
Licence: Commercial Use - ELRA VAR | 5000.00 € | 5000.00 € |