39 Language Resources (Page 2 of 2)

 "Le Monde Diplomatique" Text corpus in Arabic    
  • Arabic

ID: ELRA-W0036-04

ISLRN: 231-368-326-920-2

Electronic archive of "Le Monde Diplomatique" articles in Arabic from 2000. The corpus is available in HTML. Each HTML file contains one article. Number of articles available per year: • 2000: 61 articles (November and December only) (75,305 words) • 2001: 346 articles (479,435 ...

MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 46.00 €, commercial 46.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 69.00 €, commercial 69.00 €

 MGB-5 Moroccan Dialect    
  • Arabic

ID: ELRA-S0404

ISLRN: 938-639-614-524-5

The MGB-5 Moroccan Dialect comprises 14 hours of Moroccan Arabic speech extracted from 93 YouTube videos distributed across seven genres: comedy, cooking, family/children, fashion, drama, sports, and science clips. Given that dialectal Arabic does not have a clearly defined orthography, differ...

MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 0.00 €, commercial 1500.00 €
Licence: Commercial Use - ELRA VAR: academic 1500.00 €, commercial 1500.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 0.00 €, commercial 2000.00 €
Licence: Commercial Use - ELRA VAR: academic 2000.00 €, commercial 2000.00 €

 NE3L named entities Arabic corpus    
  • Arabic

ID: ELRA-W0078

ISLRN: 398-979-151-557-0

The NE3L project (Named Entities 3 Languages) consisted of annotating several corpora in different languages with named entities. Text format data were extracted from newspapers and deal with various topics. Three languages were annotated: Arabic, Chinese and Russian. For this project, 5...

MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 5000.00 €, commercial 5000.00 €
Licence: Commercial Use - ELRA VAR: academic 5000.00 €, commercial 5000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 5000.00 €, commercial 5000.00 €
Licence: Commercial Use - ELRA VAR: academic 5000.00 €, commercial 5000.00 €

 NEMLAR Broadcast News Speech Corpus    
  • Arabic

ID: ELRA-S0219

ISLRN: 479-507-036-103-9

This corpus was produced within the NEMLAR project (http://www.nemlar.org). Two other resources, produced within the same project, are also available: NEMLAR Written Corpus (ELRA-W0042) and the NEMLAR Speech Synthesis Corpus (ELRA-S0220). The NEMLAR Broadcast News Speech Corpus consists of about...

MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 150.00 €, commercial 500.00 €
Licence: Commercial Use - ELRA VAR: academic 2000.00 €, commercial 2000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 300.00 €, commercial 1000.00 €
Licence: Commercial Use - ELRA VAR: academic 4000.00 €, commercial 4000.00 €

Special offers are also available. Check here for details.

 NEMLAR Speech Synthesis Corpus    
  • Arabic

ID: ELRA-S0220

ISLRN: 361-216-121-305-9

This corpus was produced within the NEMLAR project (http://www.nemlar.org). Two other resources, produced within the same project, are also available: NEMLAR Written Corpus (ELRA-W0042) and the NEMLAR Broadcast News Speech Corpus (ELRA-S0219). The NEMLAR Speech Synthesis Corpus contains the reco...

MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 500.00 €, commercial 1250.00 €
Licence: Commercial Use - ELRA VAR: academic 5000.00 €, commercial 5000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 1000.00 €, commercial 2500.00 €
Licence: Commercial Use - ELRA VAR: academic 10000.00 €, commercial 10000.00 €

Special offers are also available. Check here for details.

 NEMLAR Written Corpus    
  • Arabic

ID: ELRA-W0042

ISLRN: 050-693-158-326-9

This corpus was produced within the NEMLAR project (http://www.nemlar.org). Two other resources, produced within the same project, are also available: NEMLAR Broadcast News Speech Corpus (ELRA-S0219) and the NEMLAR Speech Synthesis Corpus (ELRA-S0220). The NEMLAR Written Corpus consists of about...

MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 150.00 €, commercial 250.00 €
Licence: Commercial Use - ELRA VAR: academic 1000.00 €, commercial 1000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 300.00 €, commercial 500.00 €
Licence: Commercial Use - ELRA VAR: academic 2000.00 €, commercial 2000.00 €

Special offers are also available. Check here for details.

 NetDC Arabic BNSC (Broadcast News Speech Corpus)    
  • Arabic

ID: ELRA-S0157

ISLRN: 663-177-513-755-1

The NetDC Arabic BNSC (Broadcast News Speech Corpus) is a corpus developed by ELDA in the framework of the European-funded project Network of Data Centres (NetDC). The project was done in collaboration with the LDC (Linguistic Data Consortium), which has produced a similar corpus from the news br...

MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 100.00 €, commercial 1350.00 €
Licence: Commercial Use - ELRA VAR: academic 1350.00 €, commercial 1350.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 200.00 €, commercial 2700.00 €
Licence: Commercial Use - ELRA VAR: academic 2700.00 €, commercial 2700.00 €

 Normalized Arabic Fragments for Inestimable Stemming (NAFIS)    
  • Arabic

ID: ELRA-W0127

ISLRN: 305-450-745-774-1

Normalized Arabic Fragments for Inestimable Stemming (NAFIS) is an Arabic stemming gold standard corpus composed of a collection of sentences, selected to be representative of Arabic stemming tasks and manually annotated. Indeed, NAFIS is: Comprehensive: The content of NAFIS can be generalized...

MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 0.00 €, commercial 0.00 €
Licence: Commercial Use - ELRA VAR: academic 0.00 €, commercial 0.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 0.00 €, commercial 0.00 €
Licence: Commercial Use - ELRA VAR: academic 0.00 €, commercial 0.00 €

 OrienTel Arabic as spoken in Israel database    
  • Arabic

ID: ELRA-S0190

ISLRN: 627-343-367-534-7

The OrienTel Arabic as spoken in Israel database comprises 750 Arabic speakers (375 males, 375 females) recorded over the Israeli fixed and mobile telephone network. This database is partitioned into 2 DVDs. The speech databases made within the OrienTel project were validated by SPEX, the Netherl...

MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 37875.00 €, commercial 40000.00 €
Licence: Commercial Use - ELRA VAR: academic 40000.00 €, commercial 40000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 39843.00 €, commercial 43125.00 €
Licence: Commercial Use - ELRA VAR: academic 43125.00 €, commercial 43125.00 €

 OrienTel Egypt MCA (Modern Colloquial Arabic) database    
  • Arabic

ID: ELRA-S0221

ISLRN: 036-535-444-454-5

The OrienTel Egypt MCA (Modern Colloquial Arabic) database comprises 750 Egyptian speakers (398 males, 352 females) recorded over the Egyptian fixed and mobile telephone network. This database is partitioned into 1 CD and 1 DVD. The speech databases made within the OrienTel project were validated...

MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 18000.00 €, commercial 24000.00 €
Licence: Commercial Use - ELRA VAR: academic 24000.00 €, commercial 24000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 22500.00 €, commercial 30000.00 €
Licence: Commercial Use - ELRA VAR: academic 30000.00 €, commercial 30000.00 €

This resource is also available in a bundle. Check here for bundled pricing.

 OrienTel Egypt MSA (Modern Standard Arabic) database    
  • Arabic

ID: ELRA-S0222

ISLRN: 830-378-677-910-7

The OrienTel Egypt MSA (Modern Standard Arabic) database comprises 500 Egyptian speakers (254 males, 246 females) recorded over the Egyptian fixed and mobile telephone network. This database is partitioned into 1 CD and 1 DVD. The speech databases made within the OrienTel project were validated b...

MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 12000.00 €, commercial 16000.00 €
Licence: Commercial Use - ELRA VAR: academic 16000.00 €, commercial 16000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 15000.00 €, commercial 20000.00 €
Licence: Commercial Use - ELRA VAR: academic 20000.00 €, commercial 20000.00 €

This resource is also available in a bundle. Check here for bundled pricing.

 OrienTel Jordan MCA (Modern Colloquial Arabic) database    
  • Arabic

ID: ELRA-S0289

ISLRN: 172-662-950-237-0

The OrienTel Jordan MCA (Modern Colloquial Arabic) database comprises 757 Jordanian speakers (393 males, 364 females) recorded over the Jordanian fixed and mobile telephone network. This database is stored on 1 DVD. The speech databases made within the OrienTel project were validated by SPEX, the...

MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 18000.00 €, commercial 24000.00 €
Licence: Commercial Use - ELRA VAR: academic 24000.00 €, commercial 24000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 22500.00 €, commercial 30000.00 €
Licence: Commercial Use - ELRA VAR: academic 30000.00 €, commercial 30000.00 €

This resource is also available in a bundle. Check here for bundled pricing.

 OrienTel Jordan MSA (Modern Standard Arabic) database    
  • Arabic

ID: ELRA-S0290

ISLRN: 259-713-018-372-4

The OrienTel Jordan MSA (Modern Standard Arabic) database comprises 556 Jordanian speakers (288 males, 268 females) recorded over the Jordanian fixed and mobile telephone network. This database is stored on 1 DVD. The speech databases made within the OrienTel project were validated by SPEX, the N...

MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 12000.00 €, commercial 16000.00 €
Licence: Commercial Use - ELRA VAR: academic 16000.00 €, commercial 16000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 15000.00 €, commercial 20000.00 €
Licence: Commercial Use - ELRA VAR: academic 20000.00 €, commercial 20000.00 €

This resource is also available in a bundle. Check here for bundled pricing.

 OrienTel Morocco MCA (Modern Colloquial Arabic) database    
  • Arabic

ID: ELRA-S0183

ISLRN: 613-578-868-832-2

The OrienTel Morocco MCA (Modern Colloquial Arabic) database comprises 772 Moroccan speakers (383 males, 389 females) recorded over the Moroccan fixed and mobile telephone network. This database is partitioned into 1 CD and 1 DVD. The speech databases made within the OrienTel project were validat...

MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 0.00 €, commercial 12000.00 €
Licence: Commercial Use - ELRA VAR: academic 12000.00 €, commercial 12000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 9000.00 €, commercial 15000.00 €
Licence: Commercial Use - ELRA VAR: academic 15000.00 €, commercial 15000.00 €

This resource is also available in a bundle. Check here for bundled pricing.

 OrienTel Morocco MSA (Modern Standard Arabic) database    
  • Arabic

ID: ELRA-S0184

ISLRN: 978-839-138-181-8

The OrienTel Morocco MSA (Modern Standard Arabic) database comprises 530 Moroccan speakers (264 males, 266 females) recorded over the Moroccan fixed and mobile telephone network. This database is partitioned into 1 CD and 1 DVD. The speech databases made within the OrienTel project were validated...

MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 0.00 €, commercial 8000.00 €
Licence: Commercial Use - ELRA VAR: academic 8000.00 €, commercial 8000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 6000.00 €, commercial 10000.00 €
Licence: Commercial Use - ELRA VAR: academic 10000.00 €, commercial 10000.00 €

This resource is also available in a bundle. Check here for bundled pricing.

 OrienTel Tunisia MCA (Modern Colloquial Arabic) database    
  • Arabic

ID: ELRA-S0186

ISLRN: 297-705-745-294-4

The OrienTel Tunisia MCA (Modern Colloquial Arabic) database comprises 792 Tunisian speakers (426 males, 366 females) recorded over the Tunisian fixed and mobile telephone network. This database is partitioned into 1 CD and 1 DVD. The speech databases made within the OrienTel project were validat...

MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 0.00 €, commercial 12000.00 €
Licence: Commercial Use - ELRA VAR: academic 12000.00 €, commercial 12000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 9000.00 €, commercial 15000.00 €
Licence: Commercial Use - ELRA VAR: academic 15000.00 €, commercial 15000.00 €

This resource is also available in a bundle. Check here for bundled pricing.

 OrienTel Tunisia MSA (Modern Standard Arabic) database    
  • Arabic

ID: ELRA-S0187

ISLRN: 926-401-827-806-5

The OrienTel Tunisia MSA (Modern Standard Arabic) database comprises 598 Tunisian speakers (359 males, 239 females) recorded over the Tunisian fixed and mobile telephone network. This database is partitioned into 1 CD and 1 DVD. The speech databases made within the OrienTel project were validated...

MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 0.00 €, commercial 8000.00 €
Licence: Commercial Use - ELRA VAR: academic 8000.00 €, commercial 8000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 6000.00 €, commercial 10000.00 €
Licence: Commercial Use - ELRA VAR: academic 10000.00 €, commercial 10000.00 €

This resource is also available in a bundle. Check here for bundled pricing.

 Training and test data for Arabizi detection and transliteration    
  • Arabic
  • English

ID: ELRA-W0126

ISLRN: 986-364-744-303-9

The dataset is composed of two distinct resources: 1) A collection of mixed English and Arabizi text intended to train and test a system for the automatic detection of code-switching in mixed English and Arabizi texts. The training part of the corpus contains: 522 tweets composed of 5,207 token...

MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 0.00 €, commercial 500.00 €
Licence: Commercial Use - ELRA VAR: academic 500.00 €, commercial 500.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 0.00 €, commercial 650.00 €
Licence: Commercial Use - ELRA VAR: academic 650.00 €, commercial 650.00 €

 Wojood - A corpus for nested Arabic Named Entity Recognition    
  • Arabic

ID: ELRA-W0325

ISLRN: 688-718-284-176-0

Wojood consists of about 550,000 tokens (Modern Standard Arabic and dialect) that are manually annotated with 21 entity types (person, group of people, occupation, organization, geopolitical entity, location, facility, event, date, time, language, website, law, product, cardinal number, ordinal n...

MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 0.00 €, commercial 8000.00 €
Licence: Commercial Use - ELRA VAR: academic 8000.00 €, commercial 8000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER: academic 0.00 €, commercial 10000.00 €
Licence: Commercial Use - ELRA VAR: academic 10000.00 €, commercial 10000.00 €
