Automatic rule learning exploiting morphological features for named entity recognition in Turkish

Tatar, Serhan; ÇİÇEKLİ, İLYAS

doi:10.1177/0165551511398573

Tatar S., ÇİÇEKLİ İ.

JOURNAL OF INFORMATION SCIENCE, vol.37, no.2, pp.137-151, 2011 (SCI-Expanded, SSCI, Scopus)

Publication Type: Article / Full Article
Volume: 37, Issue: 2
Publication Date: 2011
DOI: 10.1177/0165551511398573
Journal Name: JOURNAL OF INFORMATION SCIENCE
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Scopus
Page Numbers: pp.137-151
Open Archive Collection: AVESİS Open Access Collection
Affiliated with Hacettepe University: Yes

Abstract

Named entity recognition (NER) is one of the basic tasks in the automatic extraction of information from natural language texts. In this paper, we describe an automatic rule learning method that exploits different features of the input text to identify the named entities in natural language texts. Moreover, we explore the use of morphological features for extracting named entities from Turkish texts. We believe that the developed system can also be used for other agglutinative languages. The paper also provides a comprehensive overview of the field by reviewing the NER research literature. We conducted our experiments on the TurkIE dataset, a corpus of articles collected from different Turkish newspapers. Our method achieved an average F-score of 91.08% on the dataset. The results of the comparative experiments demonstrate that the developed technique is successfully applicable to the task of automatic NER and that exploiting morphological features can significantly improve NER for Turkish, an agglutinative language.