  • Catalog Reference : W0023
    MLCC Multilingual and Parallel Corpora
    The MLCC text corpus has two main components: one set that supports comparative studies across different languages, and one set that serves as the basis for translation studies.

    The first set is referred to as the Polylingual Document Collection, a collection of articles from financial newspapers in 6 languages (Dutch, English, French, German, Italian and Spanish). It consists of the following sub-corpora:

    Dutch - Het Financieele Dagblad - 1992-1993 (Samples)
    The corpus contains articles from the Dutch financial newspaper Het Financieele Dagblad editions of 2nd January 1992 through to 24th December 1993. It contains around 8.5 million words of text.

    English - The Financial Times - 1993 (Samples)
    The corpus contains articles from the British financial newspaper The Financial Times editions from the year 1993. The corpus contains around 30 million words.

    French - Le Monde - 1992-1993 (Samples)
    A corpus of articles from the French newspaper Le Monde, consisting of two years' worth (1992-1993) of articles on financial subjects, approximately 10 million words.

    German - Handelsblatt - 1986-1988 (Samples)
    This subcorpus consists of articles from the period 02.01.1986 to 15.06.1988. It contains some 33 million words. It may be possible to obtain more recent articles from Handelsblatt.

    Italian - Il Sole 24 Ore - 1992-1993 (Samples)
    The corpus described here contains articles from the Italian financial newspaper Il Sole 24 Ore from the year 1992. This corpus contains some 1.88 million words. The SGML-markup was done by the University of Edinburgh.

    Spanish - Expansion - 1994 (Samples)
    This subcorpus contains articles from the Spanish financial newspaper Expansion editions from 21.10.1991 to 24.10.1991 and 14.05.1994 to 27.12.1994. It contains some 10 million words.

    The second set is a Multilingual Parallel Corpus consisting of translated data in nine European languages: Danish, Dutch, English, French, German, Greek, Italian, Portuguese and Spanish. The parallel data, provided by the European Commission, comprises two sub-corpora from the Official Journal of the European Communities:

    Official Journal of the European Commission, C Series: Written Questions 1993
    Records of questions and answers regarding European Community matters. The data is regularly published as one section of the C Series of the Official Journal of the European Community in all official languages (nine at the time). This corpus contains written questions asked by members of the European Parliament and the corresponding answers from the European Commission in 9 parallel versions. The total size of the corpus is approximately 10.2 million words (ca. 1.1 million words per language).

    Official Journal of the European Commission, Annex: Debates of the European Parliament 1992-1994
    This parallel corpus consists of the records of Parliamentary sittings, published as an annex to the Official Journal of the European Community under the title Debates of the European Parliament. The Parliamentary Debates record what was said by members at the sittings as well as written input provided to them. The original data from which the translations are produced consist of a transcript of the sittings, with each member speaking in the language of his choice. The final version consists of nine parallel versions of the material. The texts delivered comprise the Debates of Parliament from January 1992 to July 1994. This sub-corpus contains some 5 to 8 million words per language.
    Identification
    Period of coverage : 1986-1994
    Version :
    Version history :
    Technical Information
    Distribution medium : CD-ROM
    Platform : PC, Unix, Macintosh
    Contents
    written corpus #1704
    written corpus #2704
    Resource files
  • Text sample - Debates in Portuguese
  • Text sample - Debates in Dutch
  • Text sample - Debates in Italian
  • Text sample - Debates in French
  • Text sample - Debates in Spanish
  • Text sample - Handelsblatt
  • Text sample - Het Financieele Dagblad
  • Text sample - The Financial Times
  • Text sample - Expansion
  • Text sample - Le Monde
  • Text sample - Il Sole 24 Ore
  • Text sample - Debates in English
  • Text sample - Debates in Greek
  • Text sample - Debates in German
  • Text sample - Debates in Danish
  • Text sample - Questions in Danish
  • Text sample - Questions in German
  • Text sample - Questions in English
  • Text sample - Questions in Spanish
  • Text sample - Questions in Greek
  • Text sample - Questions in Italian
  • Text sample - Questions in Dutch
  • Text sample - Questions in Portuguese
  • Text sample - Questions in French
    Members Prices
    Academic - Research 450.00 EUR
    Commercial - Research 1600.00 EUR
    Non Member Prices
    Academic - Research 1200.00 EUR
    Commercial - Research 3600.00 EUR

    Copyright © 2008 ELRA