Unsupervised Morphological Segmentation Using Neural Word Embeddings


Creative Commons License

Ustun A., CAN BUĞLALILAR B.

4th International Conference on Statistical Language and Speech Processing (SLSP), Pilsen, Czech Republic, 11-12 October 2016, vol. 9918, pp. 43-53

  • Publication Type: Conference Paper / Full-Text Paper
  • Volume: 9918
  • DOI: 10.1007/978-3-319-45925-7_4
  • City of Publication: Pilsen
  • Country of Publication: Czech Republic
  • Pages: pp. 43-53
  • Affiliated with Hacettepe University: Yes

Abstract

We present a fully unsupervised method for morphological segmentation. Unlike many morphological segmentation systems, our method relies on semantic features rather than orthographic features. To capture word meanings, we obtain word embeddings from a two-level neural network [11]. The semantic similarity between words, computed from these neural word embeddings, forms our baseline segmentation model. Using the initial segmentations from the baseline, we then model morphotactics with a bigram language model based on maximum likelihood estimates. Results show that semantic features improve morphological segmentation, especially in agglutinating languages such as Turkish. Our method performs competitively with other unsupervised morphological segmentation systems.
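The abstract names two components: a similarity-based baseline segmenter and a bigram model over morphs estimated by maximum likelihood. The sketch below illustrates how such a pipeline might fit together. It is not the authors' implementation. The greedy suffix-stripping rule, the `threshold` and `min_stem` parameters, the function names, and the two-dimensional toy vectors are all assumptions made for illustration; real embeddings would come from a trained model.

```python
import math
from collections import Counter

# Toy 2-d "embeddings" (hypothetical values, chosen only for illustration).
# Turkish: ev = house, evler = houses, evlerde = in the houses,
# kale = castle, kalem = pen (orthographically close, semantically unrelated).
EMBEDDINGS = {
    "ev": [1.0, 0.1],
    "evler": [0.9, 0.2],
    "evlerde": [0.8, 0.3],
    "kale": [0.1, 1.0],
    "kalem": [1.0, -0.2],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def baseline_segment(word, embeddings, threshold=0.5, min_stem=2):
    """Assumed baseline: strip characters from the end of the word and accept
    a split whenever the remaining prefix is a known word whose embedding is
    similar enough to the current form."""
    if word not in embeddings:
        return [word]
    suffixes = []
    current = word
    end = len(current) - 1
    while end >= min_stem:
        candidate = current[:end]
        if (candidate in embeddings
                and cosine(embeddings[current], embeddings[candidate]) >= threshold):
            suffixes.insert(0, current[end:])  # record the stripped suffix
            current = candidate                # continue from the shorter form
            end = len(current) - 1
        else:
            end -= 1
    return [current] + suffixes


def train_bigram(segmentations):
    """Maximum likelihood bigram model over morph sequences:
    P(cur | prev) = count(prev, cur) / count(prev as context)."""
    context_counts, bigram_counts = Counter(), Counter()
    for morphs in segmentations:
        seq = ["<s>"] + morphs + ["</s>"]
        for prev, cur in zip(seq, seq[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1

    def prob(prev, cur):
        total = context_counts[prev]
        return bigram_counts[(prev, cur)] / total if total else 0.0

    return prob


# Segment a small word list with the baseline, then estimate morphotactics
# from those initial segmentations.
words = ["ev", "evler", "evlerde", "kalem"]
segmentations = [baseline_segment(w, EMBEDDINGS) for w in words]
prob = train_bigram(segmentations)
```

In this toy setting, `evlerde` is split into `ev`, `ler`, `de` because each shorter form is a known word with a similar vector, while `kalem` stays whole: `kale` is a known prefix, but its vector points in a different direction. The unsmoothed bigram estimates assign zero probability to unseen morph pairs, so a practical system would likely need smoothing or a fallback.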