Catalog Reference: ELRA-W0051
English-Persian parallel Corpus
Please refer to ELRA-W0118 for the latest version of this corpus.
This version consists of about 3,500,000 English and Persian (Farsi) words aligned at the sentence level (about 100,000 sentences, distributed over 50,021 entries). The files are encoded in Unicode. The corpus was originally created with SQL Server, but it is distributed as a Microsoft Access file. The texts in the corpus cover a variety of text types, which are distributed as follows:
- Art: 1804 entries (3.61%)
- Culture: 5097 entries (10.19%)
- Idiom: 435 entries (0.87%)
- Law: 2266 entries (4.53%)
- Literature: 11470 entries (22.93%)
- Medicine: 1089 entries (2.18%)
- Others: 16989 entries (33.96%)
- Poetry: 692 entries (1.38%)
- Politics: 5493 entries (10.98%)
- Proverb: 292 entries (0.58%)
- Religion: 686 entries (1.37%)
- Science: 3708 entries (7.41%)
Distribution medium:
Number of languages
English >>>> Persian
Number of tokens:
3,500,000 words (about 100,000 sentences)
Member Prices
Academic - Commercial 2500.00 EUR
Academic - Research 500.00 EUR
Commercial - Commercial 2500.00 EUR
Commercial - Research 2500.00 EUR
Non Member Prices
Academic - Commercial 3000.00 EUR
Academic - Research 600.00 EUR
Commercial - Commercial 3000.00 EUR
Commercial - Research 3000.00 EUR
Thursday 24 August, 2017
Copyright © 2008