Turkish dialect recognition in terms of prosodic by long short-term memory neural networks


Creative Commons License

IŞIK G., ARTUNER H.

JOURNAL OF THE FACULTY OF ENGINEERING AND ARCHITECTURE OF GAZI UNIVERSITY, vol.35, no.1, pp.213-224, 2020 (SCI-Expanded)

Abstract

Dialects are varieties of speech that differ from the language they belong to in certain characteristics and that are specific to a particular region of a country. Extracting dialect-specific characteristics and using them to recognize dialects is a popular topic in speech processing. In particular, identifying the dialect of the speech first is desirable as a way to improve the performance of large-scale speech recognition systems. Languages and dialects are distinguished from one another by prosodic features such as intonation, stress and rhythm. These perceptual features are obtained by measuring pitch, energy and duration, respectively, at the physical level. In recent years, with the increasing popularity of deep neural networks, Long Short-Term Memory (LSTM) neural networks have been used frequently in sequence classification and language modeling problems, because they are successful at modeling long-term contextual information. In this study, Turkish dialect recognition was performed with LSTM neural networks using prosodic features. The LSTM networks were used both as sequence classifiers and as language models. The proposed methods achieved an accuracy of 78.7% on a Turkish dataset consisting of the Ankara, Alanya, Kibris and Trabzon dialects.
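
As an illustration of how pitch and energy can be measured at the physical level, the following minimal sketch computes a frame-by-frame sequence of (pitch, log-energy) pairs of the kind that could be fed to a sequence classifier. It is not the feature extraction used in the study: the autocorrelation pitch estimator, the 40 ms frame length, the 10 ms hop, the 75-400 Hz search range and all function names are assumptions chosen for the example, and duration features are not computed.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def log_energy(frame):
    """Log of the mean squared amplitude; the small floor avoids log(0) on silence."""
    power = sum(x * x for x in frame) / len(frame)
    return math.log(power + 1e-10)

def pitch_autocorr(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate F0 in Hz as the lag with the largest autocorrelation in [fmin, fmax].

    Assumption: a plain, unnormalized autocorrelation search, used here only
    for illustration. Returns 0.0 when no positive peak is found.
    """
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else 0.0

def prosodic_sequence(samples, sample_rate, frame_ms=40, hop_ms=10):
    """Return one (pitch_hz, log_energy) pair per frame."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    return [(pitch_autocorr(f, sample_rate), log_energy(f))
            for f in frame_signal(samples, frame_len, hop)]

# Synthetic input: a quarter second of a 200 Hz sine tone sampled at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(sr // 4)]
features = prosodic_sequence(tone, sr)
```

On real speech, unvoiced and silent frames would need separate handling, and a dedicated pitch tracker would normally replace the simple autocorrelation search.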