Multilingual Domain Adaptation for Speech Recognition Using LLMs


Ulu E. N., Derya E., Tümer D., Demirel B., Karamanlıoğlu A.

28th International Conference on Text, Speech, and Dialogue, TSD 2025, Erlangen, Germany, 25-28 August 2025, vol. 16029 LNAI, pp. 381-393 (Full Text)

  • Publication Type: Conference Paper / Full Text
  • Volume: 16029 LNAI
  • DOI: 10.1007/978-3-032-02548-7_32
  • City: Erlangen
  • Country: Germany
  • Page Numbers: pp.381-393
  • Keywords: Domain Adaptation, Large Language Models, Multilingual Speech Recognition, Whisper
  • Hacettepe University Affiliated: No

Abstract

We present a practical pipeline for multilingual domain adaptation in automatic speech recognition (ASR) that combines the Whisper model with large language models (LLMs). Using Aya-23-8B, Common Voice transcripts in 22 languages are automatically classified into the Law and Healthcare domains, producing high-quality domain labels at a fraction of the manual cost. These labels drive parameter-efficient (LoRA) fine-tuning of Whisper and deliver consistent relative Word Error Rate (WER) reductions of up to 14.3% for languages that contribute at least 800 in-domain utterances. A data-volume analysis reveals a clear breakpoint: gains become reliably large once that 800-utterance threshold is crossed, while monolingual tuning still rescues performance in truly low-resource settings. The workflow therefore shifts the key success factor from expensive hand labelling to scalable data acquisition, and can be replicated in new domains with minimal human intervention.
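The headline result is a relative Word Error Rate (WER) reduction of up to 14.3%. As a quick reference, the sketch below shows how word-level WER and the relative reduction can be computed; the function names are illustrative and not taken from the paper's code.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[j] = edit distance between ref[:i] and hyp[:j], one row at a time.
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev_diag, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            prev_diag, dp[j] = dp[j], min(
                dp[j] + 1,          # deletion
                dp[j - 1] + 1,      # insertion
                prev_diag + cost,   # substitution (or match)
            )
    return dp[-1] / max(len(ref), 1)

def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative WER reduction in percent, as in 'up to 14.3% relative'."""
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer
```

For example, a baseline WER of 20.0% improved to 17.14% by domain adaptation corresponds to a 14.3% relative reduction.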