Persian Kids’ Speech Corpus – ELRA Catalogue

Last view: 2024-07-24

Last update: 2023-06-20

Persian Kids’ Speech Corpus

Name in French: Corpus oral du persan parlé par des enfants

ISLRN: 822-550-731-416-7

ID:

ELRA-S0487

The Persian Kids’ Speech Corpus consists of speech recorded from 286 children (141 girls, 145 boys) aged 6 to 9, using an Andreas Mic Anti-Noise microphone and a Premium Speechmike headphone. The speech was recorded with CoolEdit Pro 2.1 at 16 kHz, single-channel, 16-bit resolution, and saved as WAV files. Because the recordings were made in a school environment, some audio files contain real background noise. The recorded data was manually checked and labeled; segmentation and labeling were performed in Praat. The resulting corpus contains 162,395 samples with a total duration of 33 hours and 44 minutes. The samples are distributed as follows:
- 29,057 Words (478 minutes),
- 17,429 SubWords (260 minutes),
- 43,838 Syllables (485 minutes),
- 70,078 Phonemes (765 minutes),
- 1,993 Extra Vocabulary (36 minutes).
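The per-category figures can be cross-checked against the stated totals. The following is a minimal Python sketch: the names `UNITS` and `corpus_totals` are illustrative and not part of the corpus distribution, and the numbers are copied from the list above.

```python
# Sample counts and durations (in minutes) copied from the catalogue description.
# UNITS and corpus_totals are illustrative names, not part of the corpus.
UNITS = {
    "Words": (29_057, 478),
    "SubWords": (17_429, 260),
    "Syllables": (43_838, 485),
    "Phonemes": (70_078, 765),
    "Extra Vocabulary": (1_993, 36),
}


def corpus_totals(units):
    """Return (total samples, whole hours, remaining minutes)."""
    samples = sum(count for count, _ in units.values())
    minutes = sum(duration for _, duration in units.values())
    hours, rest = divmod(minutes, 60)
    return samples, hours, rest


samples, hours, rest = corpus_totals(UNITS)
print(f"{samples:,} samples, {hours} h {rest} min")
```

The categories sum to 162,395 samples and 2,024 minutes, which agrees with the stated total of 33 hours and 44 minutes.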
The corpus covers all 29 Persian phonemes, as well as 118 syllables, 56 sub-words, and 711 words, and is particularly suited to speech recognition and linguistic studies.
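The recording format given in the description (16 kHz, single-channel, 16-bit WAV) can be checked on an individual file with Python's standard `wave` module. This is only a sketch: the helper name `matches_corpus_format` is hypothetical, and it assumes the distributed files are uncompressed PCM WAV, which the description suggests but does not state explicitly.

```python
import wave

# Format stated in the description: 16 kHz, mono, 16-bit samples.
EXPECTED_RATE_HZ = 16_000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits


def matches_corpus_format(path):
    """Return True if the WAV file at `path` has the stated format.

    Hypothetical helper; assumes uncompressed PCM WAV readable by `wave`.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getnchannels() == EXPECTED_CHANNELS
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
        )
```

A check like this could be run over every file before training a recognizer, so that any file with a different sampling rate or channel count is caught early.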

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	50.00 €	1000.00 €
Licence: Commercial Use - ELRA VAR	1000.00 €	1000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	100.00 €	1500.00 €
Licence: Commercial Use - ELRA VAR	1500.00 €	1500.00 €

Distribution

Availability start date: 20/06/2023

Contact Person

Valérie Mapelli

audio

Monolingual audio corpus

Languages

Persian

Linguality

Linguality type: Monolingual

Size

162,395 Entries

Metadata

Created: 06/20/2023

Last Updated: 06/20/2023

Metadata Language: French, English (fr, en)