Catalog Reference : ELRA-W0054
Persian 1984 corpus (Multext-East framework)
This corpus contains the Persian (Farsi) translation of a part of the novel “1984” (G. Orwell) annotated in the Multext-East framework (Multilingual Text Tools and Corpora for Eastern and Central European Languages). The aim of the Multext-East project was to develop standardized language resources.
The package comprises:
(i) the specifications for morphosyntactic encoding of the Persian language, based on the EAGLES/MULTEXT model and on specific resources of MULTEXT-East,
(ii) the annotated Persian version of Orwell’s 1984 corpus.
The corpus contains extensive headers and markup for document structure, sentences, and various sub-sentence annotations, encoded in XML following the TEI guidelines. Annotation includes part-of-speech (POS) tags and lemmas. The corpus contains approximately 100,000 words (6,604 sentences, 13,247 lemmas) and can easily be aligned with other corpora in the MULTEXT-East framework.
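As an illustration of how such TEI-style markup might be read, the following is a minimal sketch, not the distributed format itself. It assumes that sentences are encoded as `<s>` elements and words as `<w>` elements carrying `lemma` and `ana` (morphosyntactic tag) attributes in the TEI namespace; the sample fragment, its tokens, and its tags are hypothetical placeholders, and the actual element names and tagset should be checked against the specifications shipped with the package.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element names, attributes, tokens and tags are
# assumptions for illustration, not copied from the actual corpus.
SAMPLE = """<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <s>
      <w lemma="lemma1" ana="#Nc">token1</w>
      <w lemma="lemma2" ana="#Vm">token2</w>
      <c>.</c>
    </s>
  </body>
</text>"""

TEI = "{http://www.tei-c.org/ns/1.0}"  # TEI P5 namespace in ElementTree notation


def read_sentences(root):
    """Return one list per <s> element of (token, lemma, tag) tuples."""
    sentences = []
    for s in root.iter(TEI + "s"):
        words = [
            # Drop the leading '#' if the tag is written as a pointer.
            (w.text or "", w.get("lemma"), (w.get("ana") or "").lstrip("#"))
            for w in s.iter(TEI + "w")
        ]
        sentences.append(words)
    return sentences


def count_lemmas(sentences):
    """Count distinct lemmas across all sentences."""
    return len({lemma for sent in sentences for _, lemma, _ in sent})


root = ET.fromstring(SAMPLE)
sentences = read_sentences(root)
print(len(sentences), "sentence(s),", count_lemmas(sentences), "distinct lemma(s)")
```

For a real file, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE)`; punctuation elements such as `<c>` are skipped here because only `<w>` elements are collected.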
Period of coverage :
Version history :
First created in 2006; compiled and finalized on 27/02/2009. Last update: 15/04/2009.
Creation date :
First created in 2006, compiled and finalized on 27/02/2009
Applications possible :
Distribution medium :
Number of languages
Character set :
Number of tokens :
Member Prices
Academic - Commercial 2000.00 EUR
Academic - Research 45.00 EUR
Commercial - Commercial 2000.00 EUR
Commercial - Research 2000.00 EUR
Non Member Prices
Academic - Commercial 5000.00 EUR
Academic - Research 100.00 EUR
Commercial - Commercial 5000.00 EUR
Commercial - Research 5000.00 EUR
Copyright © 2008