Two learning approaches for protein name extraction


Tatar S., Cicekli I.

JOURNAL OF BIOMEDICAL INFORMATICS, vol. 42, pp. 1046-1055, 2009 (Journal indexed in SCI)

  • Volume: 42, Issue: 6
  • Publication Date: 2009
  • DOI: 10.1016/j.jbi.2009.05.004
  • Journal Name: JOURNAL OF BIOMEDICAL INFORMATICS
  • Pages: 1046-1055

Abstract

Protein name extraction, one of the basic tasks in automatically extracting information from biological texts, remains challenging. In this paper, we explore two different machine learning techniques and present the results of our experiments. In the first method, a bigram language model is used to extract protein names. In the second, we use an automatic rule learning method that can identify protein names in biological texts. In both cases, we generalize protein names by using hierarchically categorized syntactic token types.
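A minimal sketch of the idea of generalizing tokens into syntactic types and scoring them with a bigram model, in Python. Everything concrete here is an assumption made for illustration: the type names (`WORD/ALL_UPPER`, `MIXED/ALNUM`, ...), their regular expressions, add-one smoothing, the toy training phrases, and the hypothetical `is_protein_name` rule that compares a "protein" model against an "other" model. The abstract does not specify the paper's actual type hierarchy, smoothing, or decoding. The `/`-separated names only hint at a hierarchy (a coarse category such as `WORD` refined into `WORD/ALL_UPPER`), and this sketch works at the fine level only.

```python
import math
import re
from collections import Counter

# Hypothetical two-level token-type hierarchy written as "COARSE/FINE" names.
# These categories are illustrative, not the paper's actual types.
TOKEN_TYPES = [
    ("NUM/DIGITS", r"\d+"),
    ("WORD/ALL_UPPER", r"[A-Z]+"),
    ("WORD/CAPITALIZED", r"[A-Z][a-z]+"),
    ("WORD/LOWER", r"[a-z]+"),
    ("MIXED/ALNUM", r"[A-Za-z]+\d+[A-Za-z\d]*"),
    ("PUNCT", r"[^\w\s]"),
]
OTHER = "OTHER"
START = "<s>"
END = "</s>"
# Closed set of symbols that can follow any state: all types, OTHER, and END.
VOCAB_SIZE = len(TOKEN_TYPES) + 2


def token_type(token: str) -> str:
    """Generalize a token to the first syntactic type whose pattern it fully matches."""
    for name, pattern in TOKEN_TYPES:
        if re.fullmatch(pattern, token):
            return name
    return OTHER


class BigramModel:
    """Bigram model over token types with add-one (Laplace) smoothing."""

    def __init__(self, type_sequences):
        self.context = Counter()  # how often each symbol is the first element of a bigram
        self.bigrams = Counter()
        for seq in type_sequences:
            padded = [START] + seq + [END]
            self.context.update(padded[:-1])
            self.bigrams.update(zip(padded[:-1], padded[1:]))

    def log_prob(self, seq):
        """Sum of smoothed log P(cur | prev) over the padded type sequence."""
        padded = [START] + seq + [END]
        return sum(
            math.log((self.bigrams[(prev, cur)] + 1) / (self.context[prev] + VOCAB_SIZE))
            for prev, cur in zip(padded[:-1], padded[1:])
        )


def train(phrases):
    """Build a model from tokenized phrases after mapping each token to its type."""
    return BigramModel([[token_type(t) for t in p] for p in phrases])


def is_protein_name(tokens, protein_model, other_model):
    """Hypothetical decision rule: choose the model that scores the type sequence higher."""
    types = [token_type(t) for t in tokens]
    return protein_model.log_prob(types) > other_model.log_prob(types)


# Toy, made-up training phrases (already tokenized); not data from the paper.
protein_model = train([["IL", "-", "2"], ["p53"], ["TNF", "alpha"], ["BRCA1"]])
other_model = train([["the", "cells"], ["was", "observed"], ["in", "vitro"]])

print(is_protein_name(["IL", "-", "6"], protein_model, other_model))  # True
print(is_protein_name(["the", "protein"], protein_model, other_model))  # False
```

In this sketch every token is mapped onto a small closed set of types, so an unseen name such as `IL-6` is scored by its shape (upper-case word, punctuation, digits) rather than by its spelling. Fixing the type set in advance also gives both models the same vocabulary size, which keeps their smoothed scores comparable.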