1,328 language resources at your disposal
This is the new version of the ELRA Catalogue of Language Resources.
An increasing number of LRs in the various fields of Human Language Technology are distributed on behalf of ELRA via its operational body ELDA, thanks to the contributions of various players in the HLT community.
Through this repository, we aim to provide Language Resources so that researchers and developers need not invest effort in rebuilding resources that already exist, and to help them identify and access those resources.
Latest Resources
English-Swedish parallel corpus from the Annual Overview of Sweden’s Official aid Agency SIDA Activities (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Source PDF files as parallel documents. The original texts are always in Swedish; the English texts are their translations.
Polish-English parallel corpus from the website of the Ministry of Agriculture and Rural Development (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Polish-English parallel corpus from the website of the Ministry of Agriculture and Rural Development, Republic of Poland (http://www.minrol.gov.pl)
General Romanian-English bilingual corpus (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Romanian – English corpus built from a Wikipedia dump.
Letter of rights for persons arrested and/or detained (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Collection of translation units (1906 in total) in 21 language pairs extracted from 7 Police forms (one form 12 pages ...
Polish-English parallel corpus from the website of the Ministry of Foreign Affairs (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Polish-English parallel corpus from the website of the Ministry of Foreign Affairs, Republic of Poland (https://mfa.gov.pl/en/)
EUIPO - list of goods and services Italian and French (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. EUIPO list of goods and services format: TMX
Polish-English parallel corpus from the website of the Polish Tourism Organisation (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Polish-English parallel corpus from the website of the Polish Tourism Organisation (https://pot.gov.pl/en)
English-Swedish corpus from Finnish Information Bank (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. http://www.infopankki.fi - Finland in your language - Information about Finland - Moving to Finland - Living in Finland
Croatian-English corpus with studies on the challenges to the Croatian Accession to the European Union from the Croatian Institute of Public Finance website (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Croatian-English corpus with studies on the challenges to the Croatian Accession to the European Union from the Croatian Institute of ...
Slovenian-English corpus with statistical reports from the Statistical Office of the Republic of Slovenia website (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Slovenian-English corpus with statistical reports from the Statistical Office of the Republic of Slovenia website. The resource contains pdf files ...
Polish-English parallel corpus from the website "geoportal.gov.pl" (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Polish-English parallel corpus from the website "geoportal.gov.pl" (https://www.geoportal.gov.pl)
BMI Brochures and Website 2016 (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Bilingual tmx file of German to English translations of the Federal Ministry of the Interior's website and brochures. Topics include ...
Bilingual Bulgarian-English corpus from the 2018 Proposal for a National Climate Change Adaptation Strategy and Action Plan from the website of the Bulgarian Ministry of Environment and Water (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Bilingual Bulgarian-English corpus from the 2018 Proposal for a National Climate Change Adaptation Strategy and Action Plan
Polish-English parallel corpus from the website of the National Centre for Research and Development (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Polish-English parallel corpus from the website of the National Centre for Research and Development (https://www.ncbr.gov.pl)
Croatian-English parallel corpus from the website of the Ministry of Foreign and European Affairs, Republic of Croatia (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Croatian-English parallel corpus from the website of the Ministry of Foreign and European Affairs, Republic of Croatia (http://www.mvep.hr)
English-Swedish parallel corpus from the web site of the Swedish Migration Board - Migrationsverket (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. All texts have been collected from the website of the Swedish Migration Board. The original text is always in Swedish, ...
EUIPO - list of goods and services French and English (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. EUIPO list of goods and services format: TMX
Parallel corpus (en-pl) from the Export Promotion Portal of Poland (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. A parallel corpus constructed from data acquired from the *.trade.gov.pl websites
Parallel texts from the Swedish Competition Authority - Konkurrensverket (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel texts. The original texts are always in Swedish; the English texts are their translations.
English-Lithuanian EASTIN-CL Multilingual Ontology of Assistive Technology (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. EASTIN-CL Multilingual Ontology of Assistive Technology was created within the EASTIN-CL project aimed at applying language technologies to portal of ...
ENGLISH/POLISH PHRASE BOOK FOR ADMINISTRATIVE STAFF of LOCAL GOVERNMENT UNITS (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. An English/Polish phrase book for the administrative staff of local government units (LGUs).
International Agreements (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. International Agreements have been translated into national languages. The local translations were collected from the national legislation website. Attempt was ...
EUIPO - list of goods and services German and Italian (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. EUIPO list of goods and services format: TMX
Public Procurement Dataset 1 (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. A collection of parallel Polish-English texts published by the Polish Public Procurement Office. Sentence-level alignment of translation segments was carried ...
Maltese-English website parallel corpus (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. This is a parallel corpus of bilingual texts crawled from multilingual websites, which contains 26,622 TUs. Date of crawling : ...
Financial Stability Reports from the National Bank of Poland (2015-16) (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Financial Stability Reports from the National Bank of Poland (2015-16)
DA-EN Danish Ministry of Higher Education and Science 2 (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel texts Danish-English from the Danish Ministry of Higher Education and Science, size 115,000 words, topic: research policy (Processed)
Czech Banking Association Terminology (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Terms in Czech - English relating to finance
Trilingual Documents related to International Judicial Cooperation in Civil Matters (Greek-English-French) (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Trilingual (Greek-English-French) documents - standard forms: requests, certificates, summaries, notices related to the implementation of Hague Convention of 1980, European ...
Polish-English parallel corpus from the website of the Office of the Commissioner for Human Rights (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Polish-English parallel corpus from the website of the Office of the Commissioner for Human Rights (https://www.rpo.gov.pl/en)
Corpus of State-related content from the Latvian Web (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Latvian Web, home pages of ministries and state public services, army, etc. were crawled, and parallel Latvian-English content was collected. ...
Translation memory from Swedish National Audit Office (NAO) - Riksrevisionen (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Translation memory from Swedish National Audit Office
English-Estonian EASTIN-CL Multilingual Ontology of Assistive Technology (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. EASTIN-CL Multilingual Ontology of Assistive Technology was created within the EASTIN-CL project aimed at applying language technologies to portal of ...
Polish-English parallel corpus from the website of the Citizens Information Board (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Polish-English parallel corpus from the website of the Citizens Information Board, Ireland (http://www.citizensinformation.ie)
SIP Internal dictionary (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Collection of 228 translated terms between English, German and French, but not always in all three languages
BMVI Website (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. tmx file, 2718 TUs, bilingual German/English, texts from the website of the Federal Ministry of Transport and Digital Infrastructure (BMVI) ...
Macroeconomic Developments (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Bulletins of Macroeconomic Developments
Letter of rights for persons arrested on the basis of a European Arrest Warrant (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Letter of rights for persons arrested on the basis of a European Arrest Warrant (EAW), 1 page, (Processed)
English-Slovak parallel corpus of texts from The Ministry of Culture of the Slovak Republic (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Dataset of various English-Slovak legal texts within the agenda of the Ministry, plain text format aligned at the sentence level, the ...
English-Swedish parallel texts from The Swedish Agency for Economic and Regional Growth - Tillväxtverket (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel texts from The Swedish Agency for Economic and Regional Growth (Tillväxtverket). Original texts are in Swedish, the English texts ...
Bilingual hr-en parallel corpus from the National and University Library in Zagreb website (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Contents of http://www.nsk.hr were crawled, aligned on document and sentence level and converted into a parallel corpus
Polish-English parallel corpus from the website of the Ministry of Justice (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Polish-English parallel corpus from the website of the Ministry of Justice, Republic of Poland (https://www.ms.gov.pl)
EUIPO - list of goods and services German and English (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. EUIPO list of goods and services format: TMX
Croatian-English parallel corpus from the website of the Embassy of Finland, Zagreb (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Croatian-English parallel corpus from the website of the Embassy of Finland, Zagreb (http://www.finland.hr)
Polish-English parallel corpus from the website of the Ministry of Digital Affairs (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Polish-English parallel corpus from the website of the Ministry of Digital Affairs (http://archiwum.mc.gov.pl and http://krmc.mc.gov.pl)
Polish-English parallel corpus from the website of the U.S. EMBASSY and CONSULATE IN POLAND (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Polish-English parallel corpus from the website of the U.S. EMBASSY and CONSULATE IN POLAND (https://pl.usembassy.gov/)
Bilingual hr-en parallel corpus from Croatian National Bank website (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Contents of http://www.hnb.hr were crawled, aligned on document and sentence level and converted into a parallel corpus
Parallel corpus (Greek - English) in the law domain (Processed) (Part1)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel (el-en) corpus of 1979 translation units in the law domain.
Polish-English parallel corpus from the website of the State Marine Accident Investigation Commission (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Polish-English parallel corpus from the website of the State Marine Accident Investigation Commission (http://pkbwm.gov.pl)
DA-EN Danish Ministry of Higher Education and Science (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel texts Danish-English from the Danish Ministry of Higher Education and Science, size: 120,000 words, topic: innovation, science (Processed)
Parallel Global Voices (English - Polish) (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel Global Voices EN-PL is a parallel corpus generated from the Global Voices multilingual group of websites (http://globalvoices.org/), where volunteers ...
EUIPO - list of goods and services Italian and English (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. EUIPO list of goods and services format: TMX
Hallituskausi 2011-2015 -- Finnish-English Translation Memory (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Information on the "Hallituskausi 2011–" translation memory: The "Hallituskausi 2011–" translation memory is intended for those translating administrative texts between ...
Corpus of Icelandic texts from the Central Bank of Iceland (Processed)
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Corpus of Icelandic texts from the Central Bank of Iceland (Processed)