  • Catalog Reference : ELRA-W0120
    NUM 5M Mongolian written corpus
    This is a corpus of Mongolian text drawn mostly from online and printed daily newspapers, literature, and laws.

    The collected raw text was reduced from 5 million to 4.8 million words after cleaning. The cleaned corpus comprises:
    - 144 texts from laws until 2009,
    - 288 literary texts currently used in primary and secondary school textbooks in Mongolia (including stories, novels, and novelettes),
    - 1,134 editorials from the printed newspaper "Unen" dating from 1984 to 1989,
    - 2,477 online newswire texts dating from 2003 to 2009.

    A part of this corpus, about 2,800 sentences totalling 100,000 words, has been manually POS-tagged and stored in TEI XML format.

    ISLRN : 492-817-146-504-9
    Technical Information
    Distribution medium : Downloadable
    Fileformat : Plain text
    Contents : written corpus
    Member Prices
    Academic - Commercial 5000.00 EUR
    Academic - Research Free
    Commercial - Commercial 5000.00 EUR
    Commercial - Research 5000.00 EUR
    Non Member Prices
    Academic - Commercial 7000.00 EUR
    Academic - Research Free
    Commercial - Commercial 7000.00 EUR
    Commercial - Research 7000.00 EUR

    Copyright © 2008 ELRA