Automatic rule learning exploiting morphological features for named entity recognition in Turkish


Tatar S., ÇİÇEKLİ İ.

JOURNAL OF INFORMATION SCIENCE, vol. 37, pp. 137-151, 2011 (journal indexed in SCI)

  • Volume: 37, Issue: 2
  • Publication Date: 2011
  • DOI: 10.1177/0165551511398573
  • Journal Name: JOURNAL OF INFORMATION SCIENCE
  • Pages: pp. 137-151

Abstract

Named entity recognition (NER) is one of the basic tasks in the automatic extraction of information from natural language texts. In this paper, we describe an automatic rule learning method that exploits different features of the input text to identify the named entities in natural language texts. Moreover, we explore the use of morphological features for extracting named entities from Turkish texts. We believe that the developed system can also be used for other agglutinative languages. The paper also provides a comprehensive overview of the field by reviewing the NER research literature. We conducted our experiments on the TurkIE dataset, a corpus of articles collected from different Turkish newspapers. Our method achieved an average F-score of 91.08% on the dataset. The results of the comparative experiments demonstrate that the developed technique is successfully applicable to the task of automatic NER and that exploiting morphological features can significantly improve NER in Turkish, an agglutinative language.
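The F-score reported in the abstract is the harmonic mean of precision and recall. The abstract does not state the matching criterion or how the scores are averaged, so the following is only a minimal sketch assuming exact entity-level matching (a predicted entity counts as correct only if its span and type both match a gold entity). The `Entity` tuple layout, the `prf` function and the toy data are hypothetical illustrations, not the authors' evaluation code.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (start_token, end_token, entity_type).
Entity = Tuple[int, int, str]


def prf(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1, assuming exact span-and-type matching."""
    tp = len(gold & predicted)  # predictions whose span and type both match a gold entity
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy example: 3 gold entities, 4 predictions, 2 exact matches.
    gold = {(0, 1, "PERSON"), (4, 4, "LOCATION"), (7, 8, "ORGANIZATION")}
    predicted = {(0, 1, "PERSON"), (4, 4, "LOCATION"), (6, 8, "ORGANIZATION"), (10, 10, "DATE")}
    p, r, f = prf(gold, predicted)
    print(f"P={p:.4f} R={r:.4f} F={f:.4f}")
```

In the toy example, the boundary error on the organization span counts as both a false positive and a false negative, giving P = 0.5, R ≈ 0.667 and F ≈ 0.571. Partial-match schemes, which some NER evaluations use, would score this case more leniently.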