Turkish dialect recognition in terms of prosodic by long short-term memory neural networks

IŞIK, GÜLTEKİN; ARTUNER, HARUN

doi:10.17341/gazimmfd.453677

JOURNAL OF THE FACULTY OF ENGINEERING AND ARCHITECTURE OF GAZI UNIVERSITY, vol.35, no.1, pp.213-224, 2020 (SCI-Expanded, Scopus, TRDizin)

Publication Type: Article / Full Article
Volume: 35 Issue: 1
Publication Date: 2020
DOI: 10.17341/gazimmfd.453677
Journal Name: JOURNAL OF THE FACULTY OF ENGINEERING AND ARCHITECTURE OF GAZI UNIVERSITY
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Art Source, Compendex, TR DİZİN (ULAKBİM)
Pages: pp.213-224
Open Archive Collection: AVESİS Open Access Collection
Hacettepe University Affiliated: Yes

Abstract

Dialects are varieties of speech that differ from the language they belong to in certain characteristics and are specific to a particular region of a country. Extracting dialect-specific characteristics and using them to recognize dialects is a popular topic in speech processing. In particular, the dialect of an utterance is often identified first in order to improve the performance of large-scale speech recognition systems. Languages and dialects are distinguished from one another by prosodic features such as intonation, stress and rhythm. These perceptual features are obtained by measuring pitch, energy and duration, respectively, at the physical level. In recent years, with the increasing popularity of deep neural networks, Long Short-Term Memory (LSTM) neural networks have been used frequently in sequence classification and language modeling problems. LSTM neural networks are successful at modeling long-term contextual information. In this study, Turkish dialect recognition was performed with LSTM neural networks using prosodic features. LSTM neural networks were used both as sequence classifiers and as language models. The proposed methods achieved an accuracy of 78.7% on a Turkish dataset consisting of the Ankara, Alanya, Kibris and Trabzon dialects.
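
To illustrate what "measuring pitch and energy at the physical level" can look like in practice, the following is a minimal sketch that turns a waveform into a frame-by-frame sequence of (pitch, energy) pairs, the kind of sequence a classifier such as an LSTM could consume. It is not the authors' feature extraction pipeline: the autocorrelation pitch estimator, the 40 ms frame and 20 ms hop, the 75 to 400 Hz search range, and all function names are assumptions chosen for illustration. Duration features are not covered.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def frame_energy(frame):
    """Mean squared amplitude of one frame (a simple energy measure)."""
    return sum(x * x for x in frame) / len(frame)

def frame_pitch(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate pitch in Hz from the strongest autocorrelation peak.

    Only lags corresponding to frequencies in [fmin, fmax] are searched
    (assumed range). Returns 0.0 if no positive peak is found.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag]
                    for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0

def prosodic_sequence(samples, sample_rate, frame_ms=40, hop_ms=20):
    """Return one (pitch_hz, energy) pair per frame of the signal."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    return [(frame_pitch(f, sample_rate), frame_energy(f))
            for f in frame_signal(samples, frame_len, hop)]

# Synthetic check: a 200 Hz sine tone, 0.2 s long, sampled at 8 kHz.
sr = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr // 5)]
features = prosodic_sequence(tone, sr)
```

On real speech, unvoiced or silent frames would need separate handling (for example, a voicing threshold on the autocorrelation peak), and a dedicated pitch tracker would usually be preferred over this simple estimator.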