Synset expansion on translation graph for automatic wordnet construction

ERCAN, GÖNENÇ; Haziyev, Farid

doi:10.1016/j.ipm.2018.10.002

Synset expansion on translation graph for automatic wordnet construction

ERCAN G., Haziyev F.

INFORMATION PROCESSING & MANAGEMENT, cilt.56, sa.1, ss.130-150, 2019 (SCI-Expanded, SSCI, Scopus)

Yayın Türü: Makale / Tam Makale
Cilt numarası: 56 Sayı: 1
Basım Tarihi: 2019
Doi Numarası: 10.1016/j.ipm.2018.10.002
Dergi Adı: INFORMATION PROCESSING & MANAGEMENT
Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Scopus
Sayfa Sayıları: ss.130-150
Hacettepe Üniversitesi Adresli: Evet

Özet

Research on clustering algorithms in synonymy graphs of a single language yields promising results, however, this idea is not yet explored in a multilingual setting. Nevertheless, moving the problem to a multilingual translation graph enables the use of more clues and techniques not possible in a monolingual synonymy graph. This article explores the potential of sense induction methods in a massively multilingual translation graph. For this purpose, the performance of graph clustering methods in synset detection are investigated. In the context of translation graphs, the use of existing Wordnets in different languages is an important clue for synset detection which cannot be utilized in a monolingual setting. Casting the problem into an unsupervised synset expansion task rather than a clustering or community detection task improves the results substantially. Furthermore, instead of a greedy unsupervised expansion algorithm guided by heuristics, we devise a supervised learning algorithm able to learn synset expansion patterns from the words in existing Wordnets to achieve superior results. As the training data is formed of already existing Wordnets, as opposed to previous work, manual labeling is not required. To evaluate our methods, Wordnets for Slovenian, Persian, German and Russian are built from scratch and compared to their manually built Wordnets or labeled test-sets. Results reveal a clear improvement over 2 state-of-the-art algorithms targeting massively multilingual Wordnets and competitive results with Wordnet construction methods targeting a single language. The system is able to produce Wordnets from scratch with a Wordnet base concept coverage ranging from 20% to 88% for 51 languages and expands existing Wordnets up to 30%.