Two learning approaches for protein name extraction

Tatar S., Cicekli I.

JOURNAL OF BIOMEDICAL INFORMATICS, vol.42, no.6, pp.1046-1055, 2009 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 42 Issue: 6
  • Publication Date: 2009
  • Doi Number: 10.1016/j.jbi.2009.05.004
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.1046-1055
  • Hacettepe University Affiliated: No


Protein name extraction, one of the basic tasks in the automatic extraction of information from biological texts, remains challenging. In this paper, we explore two different machine learning techniques and present the results of our experiments. In the first method, a bigram language model is used to extract protein names. In the second, we use an automatic rule learning method that identifies protein names in biological texts. In both cases, we generalize protein names by using hierarchically categorized syntactic token types.
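The abstract does not give the model details, so what follows is only a minimal sketch, not the authors' method, of how generalizing tokens to syntactic types can feed a bigram model. It assumes a hypothetical two-level type hierarchy (`TYPE_PATTERNS`, `token_type`), a two-label scheme (`P` for protein-name tokens, `O` otherwise), add-alpha smoothing, and Viterbi decoding. In effect this is a bigram HMM-style tagger, swapped in for illustration. The training sentences are toy examples, not data from the paper.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical two-level token-type hierarchy (illustrative, not the paper's).
# level=1 returns the specific type; level=0 backs off to the coarse category.
TYPE_PATTERNS = [
    (r"[A-Z]+[0-9]+", "ALNUM.UPPER_DIGIT"),  # e.g. MDM2, IL2
    (r"[a-z]+[0-9]+", "ALNUM.LOWER_DIGIT"),  # e.g. p53
    (r"[A-Z]{2,}", "ALPHA.ALLCAPS"),         # e.g. TNF
    (r"[A-Z][a-z]+", "ALPHA.CAPITALIZED"),   # e.g. The
    (r"[a-z]+", "ALPHA.LOWER"),              # e.g. binds
    (r"[0-9]+", "NUM"),
]

def token_type(tok, level=1):
    for pattern, ttype in TYPE_PATTERNS:
        if re.fullmatch(pattern, tok):
            return ttype if level == 1 else ttype.split(".")[0]
    return "OTHER"

class BigramTagger:
    """Bigram label transitions plus label -> token-type emissions (assumed design)."""

    def __init__(self, alpha=1.0, level=1):
        self.alpha, self.level = alpha, level
        self.trans = defaultdict(Counter)  # previous label -> next label counts
        self.emit = defaultdict(Counter)   # label -> generalized token-type counts
        self.types, self.labels = set(), set()

    def fit(self, sentences):
        for sent in sentences:
            prev = "<s>"
            for tok, label in sent:
                t = token_type(tok, self.level)
                self.trans[prev][label] += 1
                self.emit[label][t] += 1
                self.types.add(t)
                self.labels.add(label)
                prev = label
        return self

    def _logp(self, table, key, value, n_outcomes):
        # Add-alpha smoothed conditional log-probability.
        counts = table[key]
        total = sum(counts.values())
        return math.log((counts[value] + self.alpha) / (total + self.alpha * n_outcomes))

    def predict(self, tokens):
        if not tokens:
            return []
        labels = sorted(self.labels)
        n_types = len(self.types) + 1  # +1 reserves mass for unseen types
        first = token_type(tokens[0], self.level)
        best = {l: (self._logp(self.trans, "<s>", l, len(labels))
                    + self._logp(self.emit, l, first, n_types), [l]) for l in labels}
        for tok in tokens[1:]:  # Viterbi recursion over label sequences
            t = token_type(tok, self.level)
            new = {}
            for l in labels:
                score, path = max((best[p][0] + self._logp(self.trans, p, l, len(labels)),
                                   best[p][1]) for p in labels)
                new[l] = (score + self._logp(self.emit, l, t, n_types), path + [l])
            best = new
        return max(best.values())[1]

def extract_names(tokens, labels):
    """Group consecutive tokens labelled 'P' into protein-name strings."""
    names, current = [], []
    for tok, lab in zip(tokens, labels):
        if lab == "P":
            current.append(tok)
        elif current:
            names.append(" ".join(current))
            current = []
    if current:
        names.append(" ".join(current))
    return names

# Toy training data (invented for illustration).
train = [
    [("The", "O"), ("p53", "P"), ("protein", "O"), ("binds", "O"), ("MDM2", "P")],
    [("We", "O"), ("studied", "O"), ("BRCA1", "P"), ("expression", "O")],
    [("IL2", "P"), ("and", "O"), ("TNF", "P"), ("were", "O"), ("measured", "O")],
]
tagger = BigramTagger().fit(train)
tokens = ["Levels", "of", "TP53", "rose"]
labels = tagger.predict(tokens)
print(labels, extract_names(tokens, labels))  # → ['O', 'O', 'P', 'O'] ['TP53']
```

Because emissions are conditioned on token types rather than surface words, the unseen name `TP53` is tagged through its `ALNUM.UPPER_DIGIT` type. Passing `level=0` would back off to the coarser categories, trading precision for coverage on sparse data.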