10 Language Resources
- Catalan; Valencian
ID: ELRA-W0327
ISLRN: 186-654-762-852-8
The AnCora Catalan Corpus 2.0.0 is a corpus of 500,000 words annotated at different levels: - Lemma and Part of Speech, - Syntactic constituents and functions, - Argument structure and thematic roles, - Semantic classes of the verb, - Denotative type of deverbal nouns, - Nouns related to W...
MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Commercial Use - GPL | 0.00 € | 0.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Commercial Use - GPL | 0.00 € | 0.00 € |
- Spanish; Castilian
ID: ELRA-W0326
ISLRN: 252-495-813-736-1
The AnCora Spanish Corpus 2.0.0 is a corpus of 500,000 words annotated at different levels: - Lemma and Part of Speech, - Syntactic constituents and functions, - Argument structure and thematic roles, - Semantic classes of the verb, - Denotative type of deverbal nouns, - Nouns related to W...
MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Commercial Use - GPL | 0.00 € | 0.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Commercial Use - GPL | 0.00 € | 0.00 € |
- Bulgarian
ID: ELRA-W0329
ISLRN: 832-960-876-604-2
The Bulgarian Event Corpus is composed of 324,905 tokens appropriate for training Named Entity Recognition (NER), Named Entity Linking (NEL) and Event Recognition models for Bulgarian in a multidomain context within Humanities. The texts are domain related. They include documents from the area of So...
MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Share Alike - CC-BY-SA-3.0 | 0.00 € | 0.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: ? - CC-BY-SA-3.0 | 0.00 € | |
Licence: Attribution, Share Alike - CC-BY-SA-3.0 | 0.00 € |
- Bulgarian
ID: ELRA-W0328
ISLRN: 761-430-854-533-2
The Bulgarian Treebank Corpus is composed of 156,149 tokens (11,138 sentences) coming from three main sources in the domain of Grammar Notebooks (1,391 sentences), News (6,698 sentences), Other (3,049 sentences). It is available with syntactical and morphological annotation on a sentence basis in...
MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Share Alike - CC-BY-SA-3.0 | 0.00 € | 0.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Share Alike - CC-BY-SA-3.0 | 0.00 € | 0.00 € |
- Icelandic
ID: ELRA-W0298
ISLRN: 420-670-865-427-1
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Corpus of Icelandic texts from the Central Bank of Icela...
MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Other - Open Under-PSI | 0.00 € | 0.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Other - Open Under-PSI | 0.00 € | 0.00 € |
- Danish
ID: ELRA-W0318
ISLRN: 024-504-318-388-3
The Danish Gigaword Project (DAGW) maintains a corpus for Danish with over a billion words. The general goals are to create a dataset that is: 1. representative; 2. accessible; 3. a suitable common starting point for Danish NLP models. The present version 1.0 was collected from various webs...
MEMBER | academic | commercial |
---|---|---|
Licence: Attribution - CC-BY-4.0 | 0.00 € | 0.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution - CC-BY-4.0 | 0.00 € | 0.00 € |
- English
- Manipuri
ID: ELRA-W0316
ISLRN: 588-170-827-016-7
The Ema-lon Manipuri Corpus consists of a set of resources for the Manipuri language (locally known as Meiteilon) for the purpose of machine translation. The main source for these resources is the Sangai Express news website. The resources that constitute the present corpus are listed below: 1. EM C...
MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use - CC-BY-NC-4.0 | 0.00 € | 0.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use - CC-BY-NC-4.0 | 0.00 € | 0.00 € |
- German
ID: ELRA-W0330
ISLRN: 381-445-879-769-5
This corpus consists of a collection of political speeches in German crawled from the online archive of the German presidency (Bundespräsident) and the Chancellery (Bundesregierung). For the German Presidency the speeches are available from July 1, 1984 to February 17, 2012 and the corpus con...
MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Share Alike - CC-BY-SA | 0.00 € | 0.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Share Alike - CC-BY-SA | 0.00 € | 0.00 € |
- American Sign Language
- English
ID: ELRA-S0416
ISLRN: 583-408-694-292-6
The How2Sign dataset consists of a parallel corpus of speech and transcriptions of instructional videos and their corresponding American Sign Language (ASL) translation videos and annotations. It has been produced by recording 11 persons (6 males and 5 females) with various hearing status (5 self...
MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use - CC-BY-NC-4.0 |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use - CC-BY-NC-4.0 |
- Lithuanian
ID: ELRA-W0299
ISLRN: 268-109-862-136-1
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Monolingual documents received from the Government of th...
MEMBER | academic | commercial |
---|---|---|
Licence: Attribution - CC-BY-4.0 | 0.00 € | 0.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution - CC-BY-4.0 | 0.00 € | 0.00 € |