Unsupervised learning of allomorphs in Turkish

CAN BUĞLALILAR, BURCU

doi:10.3906/elk-1605-216

TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, vol.25, no.4, pp.3253-3260, 2017 (SCI-Expanded, Scopus, TRDizin)

Publication Type: Article / Full Article
Volume: 25, Issue: 4
Publication Date: 2017
DOI Number: 10.3906/elk-1605-216
Journal Name: TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, TR DİZİN (ULAKBİM)
Page Numbers: pp.3253-3260
Open Archive Collection: AVESİS Open Access Collection
Hacettepe University Affiliated: Yes

Abstract

A single morpheme may have several surface forms, called allomorphs. In English, ed and d are surface forms of the past tense morpheme, and s, es, and ies are surface forms of the plural or present tense morpheme. Turkish has a large number of allomorphs because of its morphophonemic processes: one morpheme can have dozens of different surface forms. This leads to a sparsity problem in Turkish natural language processing tasks. The detection of allomorphs has received little attention because it is difficult. For example, tu and di are Turkish allomorphs of the same morpheme (the past tense), yet they share no letters. This paper presents an unsupervised model to extract the allomorphs in Turkish. We obtain an F-measure of 73.71% in the detection of allomorphs, and our model outperforms previous unsupervised models on morpheme clustering.
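
To illustrate why surface forms such as tu and di can share no letters, the following is a minimal rule-based sketch of how Turkish vowel harmony and consonant voicing select the past tense suffix. It is not the unsupervised model presented in the paper. The function name `past_tense_suffix` and the character sets are hypothetical helpers introduced only for this illustration, and the sketch assumes a lowercase stem that contains at least one vowel.

```python
# Illustrative sketch only: rule-based choice of the Turkish past tense
# allomorph. This is NOT the unsupervised model described in the paper.
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")
ROUNDED_VOWELS = set("ouöü")
VOICELESS_CONSONANTS = set("çfhkpsşt")


def past_tense_suffix(stem: str) -> str:
    """Return the past tense surface form for a lowercase stem (assumed to contain a vowel)."""
    vowels = BACK_VOWELS | FRONT_VOWELS
    # Vowel harmony depends on the last vowel of the stem.
    last_vowel = next(ch for ch in reversed(stem) if ch in vowels)
    # The suffix consonant is t after a voiceless consonant, otherwise d.
    consonant = "t" if stem[-1] in VOICELESS_CONSONANTS else "d"
    if last_vowel in BACK_VOWELS:
        vowel = "u" if last_vowel in ROUNDED_VOWELS else "ı"
    else:
        vowel = "ü" if last_vowel in ROUNDED_VOWELS else "i"
    return consonant + vowel


if __name__ == "__main__":
    for stem in ["gel", "koş", "al", "gör", "bak", "git"]:
        print(stem, "+", past_tense_suffix(stem))
```

Under these rules, gel takes di (geldi) while koş takes tu (koştu), so two realizations of one morpheme have no characters in common. An unsupervised model has to discover such groupings without being given the rules.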