A Syllable-Based Turkish Speech Recognition System by Using Time Delay Neural Networks (TDNNs)


International Conference of Soft Computing and Pattern Recognition (SoCPaR), Ha-Noi, Vietnam, 15 December 2013 - 18 December 2015, pp.219-224

  • Publication Type: Conference Paper / Full Text
  • Doi Number: 10.1109/socpar.2013.7054130
  • City: Ha-Noi
  • Country: Vietnam
  • Page Numbers: pp.219-224
  • Hacettepe University Affiliated: Yes


In this paper, we present a model for Turkish speech recognition. The model is syllable-based: recognition is performed with syllables as the speech recognition units. The main goal of the model is to recognize as much as possible of a given continuous speech signal by identifying only a small set of syllables in the language. For that purpose, only the syllable types with higher frequency are selected for recognition. Using recognition units longer than phonemes increases recognition success, since it is easier to detect the endpoints of syllables than those of phonemes. On the other hand, word-based recognition requires a very large dataset that includes all the words and word forms in the language, which poses a challenge of its own. We therefore take advantage of Turkish being an orthographically transparent and syllabified language. Our model employs time delay neural networks (TDNNs) for learning syllables. We achieve an accuracy of 65.6% on our large vocabulary continuous speech corpus. In addition, we define an algorithm for the automatic detection of syllable boundaries, which gives an accuracy of 44%. The automatic syllable boundary detection module is used for the recognition of isolated syllables rather than continuous speech.
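The idea of covering most of a text with a small set of frequent syllable types can be illustrated with a short sketch. It is not the authors' implementation: the function names `syllabify` and `coverage_of_top_k` are hypothetical, the syllabification rule is a simplified one that is assumed to hold only for native Turkish words written in lowercase, and the word list in the usage note is made up for illustration.

```python
from collections import Counter

# Lowercase Turkish vowels. Input is assumed to be lowercase already,
# because Python's default str.lower() does not follow the Turkish
# dotted/dotless I rules.
VOWELS = set("aeıioöuü")


def syllabify(word):
    """Split a word into syllables with a simplified orthographic rule.

    Assumption: every syllable has exactly one vowel, and each syllable
    after the first starts at the consonant directly before its vowel
    (or at the vowel itself when the preceding letter is also a vowel).
    """
    vowel_positions = [i for i, ch in enumerate(word) if ch in VOWELS]
    if len(vowel_positions) < 2:
        return [word]
    starts = [0]
    for pos in vowel_positions[1:]:
        starts.append(pos if word[pos - 1] in VOWELS else pos - 1)
    ends = starts[1:] + [len(word)]
    return [word[a:b] for a, b in zip(starts, ends)]


def coverage_of_top_k(words, k):
    """Return the k most frequent syllable types and the fraction of all
    syllable occurrences in `words` that they account for."""
    counts = Counter(s for w in words for s in syllabify(w))
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    top = counts.most_common(k)
    covered = sum(c for _, c in top)
    return [s for s, _ in top], covered / total
```

For example, calling `coverage_of_top_k(["merhaba", "okul", "kitap", "okula", "araba"], 2)` selects the two most frequent syllable types in that toy list and reports the share of syllable occurrences they cover. On a real corpus, raising `k` until the coverage levels off is one way to choose how many syllable types a recognizer would need to learn.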