Stem-based PoS Tagging for Agglutinative Languages


Creative Commons License

BÖLÜCÜ N., CAN BUĞLALILAR B.

25th Signal Processing and Communications Applications Conference (SIU), Antalya, Türkiye, 15 - 18 Mayıs 2017 identifier identifier

  • Yayın Türü: Bildiri / Tam Metin Bildiri
  • Cilt numarası:
  • Doi Numarası: 10.1109/siu.2017.7960386
  • Basıldığı Şehir: Antalya
  • Basıldığı Ülke: Türkiye
  • Hacettepe Üniversitesi Adresli: Evet

Özet

Words are made up of morphemes being glued together in agglutinative languages. This makes it difficult to perform part-of-speech tagging for these languages due to sparsity. In this paper, we present two Hidden Markov Model based Bayesian PoS tagging models for agglutinative languages. Our first model is word-based and the second model is stem-based where the stems of the words are obtained from other two unsupervised stemmers: HPS stemmer and Morfessor FlatCat. The results show that stemming improves the accuracy in PoS tagging. We present the results for Turkish as an agglutinative language and English as a morphologically poor language.