Two learning approaches for protein name extraction


Tatar S., Cicekli I.

JOURNAL OF BIOMEDICAL INFORMATICS, vol.42, no.6, pp.1046-1055, 2009 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 42 Issue: 6
  • Publication Date: 2009
  • DOI: 10.1016/j.jbi.2009.05.004
  • Journal Name: JOURNAL OF BIOMEDICAL INFORMATICS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Pages: pp.1046-1055
  • Hacettepe University Affiliated: No

Abstract

Protein name extraction, one of the basic tasks in automatic extraction of information from biological texts, remains challenging. In this paper, we explore the use of two different machine learning techniques and present the results of our experiments. In the first method, a bigram language model is used to extract protein names. In the second, we use an automatic rule learning method that identifies protein names in biological texts. In both cases, we generalize protein names by using hierarchically categorized syntactic token types.
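
The idea of scoring generalized token sequences with a bigram model can be illustrated with a minimal sketch. This is not the authors' implementation: the type hierarchy (ALPHA, UPPER, LOWER, CAP, MIXED, NUM, ALNUM, OTHER), the add-one smoothing, and the names type_path, generalize and BigramModel are all assumptions chosen for illustration, and the training names are toy examples.

```python
import math
import re
from collections import Counter


def type_path(token):
    """Return a hypothetical general-to-specific type path for a token."""
    if re.fullmatch(r"[0-9]+", token):
        return ("NUM",)
    if re.fullmatch(r"[A-Za-z]+", token):
        if token.isupper():
            return ("ALPHA", "UPPER")
        if token.islower():
            return ("ALPHA", "LOWER")
        if token[0].isupper() and token[1:].islower():
            return ("ALPHA", "CAP")
        return ("ALPHA", "MIXED")
    if re.fullmatch(r"[A-Za-z0-9]+", token):
        return ("ALNUM",)
    return ("OTHER",)


def generalize(token, level=1):
    """Pick the type at the given depth, or the deepest one available."""
    path = type_path(token)
    return path[min(level, len(path) - 1)]


class BigramModel:
    """Bigram model over type sequences with add-one smoothing (assumed)."""

    def __init__(self, sequences):
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        self.vocab = set()
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            self.vocab.update(padded)
            for prev, cur in zip(padded, padded[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def log_prob(self, seq):
        padded = ["<s>"] + list(seq) + ["</s>"]
        size = len(self.vocab) + 1  # one extra slot for unseen types
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            num = self.bigram_counts[(prev, cur)] + 1
            den = self.context_counts[prev] + size
            total += math.log(num / den)
        return total


# Toy training data: tokenized protein names (illustrative only).
names = [["IL", "2"], ["IL", "4"], ["p53"]]
model = BigramModel([[generalize(t) for t in name] for name in names])

candidate = [generalize(t) for t in ["TNF", "1"]]   # same shape as "IL 2"
distractor = [generalize(t) for t in ["the", "cell"]]
print(model.log_prob(candidate) > model.log_prob(distractor))
```

Because the model is trained on token types rather than surface strings, an unseen name such as "TNF 1" shares the UPPER NUM pattern with "IL 2" and receives a higher score than an ordinary lowercase word pair. Passing level=0 to generalize backs off to the coarser type (for example ALPHA instead of UPPER), which is one simple way a hierarchy of token types could be used.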