Stem-based PoS Tagging for Agglutinative Languages



BÖLÜCÜ N., CAN BUĞLALILAR B.

25th Signal Processing and Communications Applications Conference (SIU), Antalya, Turkey, 15 - 18 May 2017

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1109/siu.2017.7960386
  • City: Antalya
  • Country: Turkey
  • Hacettepe University Affiliated: Yes

Abstract

In agglutinative languages, words are formed by gluing morphemes together. The resulting large number of distinct word forms causes data sparsity, which makes part-of-speech (PoS) tagging difficult for these languages. In this paper, we present two Bayesian PoS tagging models based on Hidden Markov Models for agglutinative languages. The first model is word-based; the second is stem-based, with the stems obtained from two unsupervised stemmers: the HPS stemmer and Morfessor FlatCat. The results show that stemming improves PoS tagging accuracy. We report results for Turkish, an agglutinative language, and for English, a morphologically poor language.
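
The sparsity argument can be illustrated with a small sketch. It is not the paper's model: it only counts how many distinct emission symbols a tagger would have to model when it sees full word forms versus stems. The stem table `TOY_STEMS` and the function `emission_vocabulary` are hypothetical names introduced for this illustration, and the table is written by hand rather than produced by the HPS stemmer or Morfessor FlatCat.

```python
# Toy illustration only. TOY_STEMS is a hand-written, hypothetical stand-in
# for an unsupervised stemmer; it is NOT output from HPS or Morfessor FlatCat.
TOY_STEMS = {
    "ev": "ev", "evler": "ev", "evde": "ev", "evlerde": "ev",
    "kitap": "kitap", "kitaplar": "kitap", "kitapta": "kitap",
}


def emission_vocabulary(tokens, stem=False):
    """Return the set of distinct emission symbols a tagger would observe."""
    if stem:
        # Map each word form to its stem; unknown forms are kept unchanged.
        return {TOY_STEMS.get(tok, tok) for tok in tokens}
    return set(tokens)


# Inflected forms of two Turkish nouns: "ev" (house) and "kitap" (book).
tokens = ["ev", "evler", "evde", "evlerde", "kitap", "kitaplar", "kitapta"]

word_vocab = emission_vocabulary(tokens)
stem_vocab = emission_vocabulary(tokens, stem=True)

print(len(word_vocab), len(stem_vocab))
```

In this toy corpus, seven surface forms collapse to two stems, so each emission symbol is observed more often. This is the intuition behind a stem-based tagger, under the assumption that the stemmer groups inflected forms correctly.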