Development of a Speech-to-Text (STT) System for the Breton Language

Please use this identifier to cite or link to this item: https://repository.esi-sba.dz/jspui/handle/123456789/851

Title:	Development of a Speech-to-Text (STT) System for the Breton Language
Authors:	ZEMOURI, Ouassim
Keywords:	Deep Learning; Automatic Speech Recognition; Breton Language; Model Evaluation; Low-resource Languages; Data Cleaning
Issue Date:	2025
Abstract:	This thesis explores Automatic Speech Recognition (ASR) for Breton, a low-resource language with significant dialectal variation. We evaluate several ASR models, including OpenAI’s Whisper family with a focus on Whisper-Large, on two datasets: Mozilla Common Voice 21 and La Banque Sonore des Dialectes Bretons (BSDB). Experiments were conducted with and without text cleaning, using Word Error Rate (WER) and Character Error Rate (CER) as evaluation metrics. Because full fine-tuning of Whisper proved challenging under resource limitations, we used Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA. The resulting fine-tuned Whisper-Small model demonstrates the effectiveness of PEFT for under-resourced languages. This work underlines the potential of modern ASR models and efficient adaptation techniques to improve speech recognition for the Breton language and offers insights applicable to other low-resource languages.
URI:	https://repository.esi-sba.dz/jspui/handle/123456789/851
Appears in Collections:	Master

Files in This Item:

File	Description	Size	Format
Zemouri_Ouassim_Thesis_Master-1-1.pdf		84,35 kB	Adobe PDF
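
The abstract names Word Error Rate (WER) and Character Error Rate (CER) as the evaluation metrics and compares results with and without text cleaning. As a rough illustration only, the sketch below computes both metrics from a Levenshtein edit distance in plain Python. The `clean_text` rules (lowercasing, stripping punctuation other than apostrophes) and the sample strings are assumptions made for this example; the record does not state the cleaning rules or tooling actually used in the thesis.

```python
import re
import unicodedata


def clean_text(text: str) -> str:
    """Hypothetical cleaning step; the thesis's actual rules are not given here."""
    text = unicodedata.normalize("NFC", text).lower()
    # Keep word characters, whitespace, and apostrophes (Breton uses forms like c'h).
    text = re.sub(r"[^\w\s']", " ", text)
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Illustrative strings only, not taken from Common Voice or BSDB.
    reference = clean_text("Demat d'an holl!")
    hypothesis = clean_text("demat d'an hol")
    print(f"WER = {wer(reference, hypothesis):.3f}")
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

Scoring the same transcripts with and without `clean_text` shows how much of the measured error comes from casing and punctuation rather than from recognition mistakes, which is the kind of comparison the abstract describes.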