Clustering Word Roots Syntactically



Ozturk M. B., CAN BUĞLALILAR B.

24th Signal Processing and Communication Application Conference (SIU), Zonguldak, Turkey, 16 - 19 May 2016, pp.1461-1464, (Full Text)

  • Publication Type: Conference Paper / Full Text
  • Doi Number: 10.1109/siu.2016.7496026
  • City: Zonguldak
  • Country: Turkey
  • Page Numbers: pp.1461-1464
  • Hacettepe University Affiliated: Yes

Abstract

Distributional representations of words are used for both syntactic and semantic tasks. This paper presents two methods for clustering word roots. The first method applies the distributional model word2vec [1] to word roots, although distributional approaches are generally applied to words rather than roots. To this end, the distributional similarities of roots are modeled, and the roots are divided into syntactic categories (noun, verb, etc.). The second method proposes two models: an information-theoretic model and a probabilistic model. Similarities between word roots are computed with a metric [8] based on mutual information and with another metric based on Jensen-Shannon divergence, and clustering is performed using these metrics. Clustering word roots plays a significant role in natural language processing applications such as machine translation and question answering, and in other applications that involve language generation. The resulting clusters reach a purity of 0.92.
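
As a rough illustration of two quantities named in the abstract, the following minimal Python sketch computes the Jensen-Shannon divergence between the context distributions of two roots and the purity of a clustering against gold syntactic labels. It is not the authors' implementation: the function names, the choice of suffix-like context features, and the toy counts are all assumptions made up for this example, and the mutual-information metric [8] is not reproduced here.

```python
import math
from collections import Counter


def normalize(counts):
    """Turn raw context counts into a probability distribution."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete
    distributions given as dicts; ranges from 0 (identical) to 1 (disjoint)."""
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        # KL(A || M); terms with zero probability contribute nothing
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def purity(clusters, gold):
    """Fraction of items that carry the majority gold label of their cluster."""
    total = sum(len(c) for c in clusters)
    majority = sum(
        max(Counter(gold[item] for item in c).values()) for c in clusters if c
    )
    return majority / total


# Hypothetical toy data: counts of context features observed with each root.
# These numbers are invented for illustration only.
contexts = {
    "kitap": normalize({"-lar": 8, "-da": 6, "-di": 1}),
    "ev": normalize({"-lar": 7, "-da": 8, "-di": 1}),
    "gel": normalize({"-di": 9, "-iyor": 7, "-lar": 1}),
}

noun_noun = js_divergence(contexts["kitap"], contexts["ev"])
noun_verb = js_divergence(contexts["kitap"], contexts["gel"])

# Hypothetical clustering and gold categories for the purity measure.
gold = {"kitap": "noun", "ev": "noun", "gel": "verb", "git": "verb"}
example_purity = purity([["kitap", "ev", "gel"], ["git"]], gold)
```

In this sketch a lower divergence means two roots occur in more similar contexts, so a clustering algorithm could use it as a distance; purity then scores the result by crediting each cluster with its most frequent gold category.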