TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Training data – ELRA Catalogue

Last view: 2025-06-27



Resource name in French: Corpus TRAD parallèle pachto-français (transcriptions d’actualités radio et télédiffusées) - Données d'entraînement

ISLRN: 802-643-297-429-4

ID: ELRA-W0093

The corpus consists of transcriptions of 106 hours of Pashto recordings, translated into French. The transcriptions are taken from the TRAD Pashto Broadcast News Speech Corpus (ELRA-S0381). The corpus contains about 832,000 source (Pashto) words and 747,000 target (French) words. No audio files are provided.

Pashto is an Indo-Iranian language spoken by the Pashtun people, mainly in Pakistan and Afghanistan.

This corpus was produced by ELDA within the PEA TRAD project, supported by the Direction Générale de l'Armement (DGA), the defence procurement agency of the French Ministry of Defence. It was used as training data for language modelling in machine translation.

Resource description (French version, translated):

The corpus contains the French translation of 106 hours of transcriptions from the TRAD Pashto Broadcast News Speech Corpus (ELRA-S0381). The corpus comprises about 832,000 source words and 747,000 target words. No audio files are provided.

Pashto (or Pashtu) belongs to the Indo-Iranian language family. It is spoken by the Pashtuns, mainly in Pakistan and Afghanistan.

This corpus was produced by ELDA within the PEA TRAD project, with the support of the Direction Générale de l'Armement (DGA). It was used as training data to build language models for machine translation.

ELRA members	Academic	Commercial
Non-commercial use licence (ELRA END USER)	3,000.00 €	10,000.00 €
Commercial use licence (ELRA VAR)	10,000.00 €	10,000.00 €

Non-members	Academic	Commercial
Non-commercial use licence (ELRA END USER)	4,000.00 €	18,000.00 €
Commercial use licence (ELRA VAR)	18,000.00 €	18,000.00 €

Distribution

Availability start date: 06/04/2016

Contact Person

Valérie Mapelli

Type: text (bilingual text corpus)

Languages

French; Pushto (Pashto)

Linguality

Linguality type: Bilingual

Size

27.6 MB

Metadata

Created: 05/12/2005

Metadata Language: French, English (fr, en)

Version

Version: 1.0

Last Updated: 04/06/2016
