Search and Browse – ELRA Catalogue

Training and test data for Arabizi detection and transliteration

Media type: text

Languages: Arabic, English

ID: ELRA-W0126

ISLRN: 986-364-744-303-9

The dataset is composed of two distinct resources: 1) A collection of mixed English and Arabizi text intended to train and test a system for the automatic detection of code-switching in mixed English and Arabizi texts. The training part of the corpus contains: 522 tweets composed of 5,207 token...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	500.00 €
Licence: Commercial Use - ELRA VAR	500.00 €	500.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	650.00 €
Licence: Commercial Use - ELRA VAR	650.00 €	650.00 €