A content-based citation analysis study based on text categorization


SCIENTOMETRICS, vol.114, no.1, pp.335-357, 2018 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 114 Issue: 1
  • Publication Date: 2018
  • DOI Number: 10.1007/s11192-017-2560-2
  • Journal Name: SCIENTOMETRICS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Scopus
  • Page Numbers: pp.335-357
  • Keywords: Content-based citation analysis, Qualitative research evaluation, Text categorization, Weka, SCIENCE, COUNTS, INCENTIVES, AGREEMENT, RETRIEVAL, QUALITY, IMPACT
  • Hacettepe University Affiliated: Yes


Publications and citations are important components for measuring research performance. Academics receive incentives, tenure, or awards based on the number of citations they receive; however, using citations to evaluate research and researchers can give rise to unethical practices and manipulation. Consequently, it is necessary to change the current approach to the use of citations. The main aim of this study was to conduct a content-based citation analysis of Turkish citations. To achieve this aim, 423 peer-reviewed articles published in the library and information science literature in Turkey, together with their 12,881 references and 101,019 sentences, were thoroughly examined. The citations were divided into four main categories: citation meaning, citation purpose, citation shape, and citation array. Each category was then further divided into sub-categories. A tagging process with inter-annotator agreement was conducted, and the citation categories of the citation sentences were determined. Weka software was used to apply the text categorization methods. Automatic classification of citation sentences achieved a success rate of at least 90% for all citation classes, demonstrating that computational linguistics can be used to develop new techniques for evaluating citation contexts and that such techniques give more detailed results.
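The tagging and categorization steps described in the abstract can be illustrated with a small sketch. The abstract does not name the agreement statistic or the Weka classifier that was used, so the code below assumes Cohen's kappa for agreement between two annotators and a multinomial Naive Bayes classifier (a common text-categorization baseline, also available in Weka as `NaiveBayesMultinomial`), written in plain Python rather than Weka. The labels `supporting` and `criticizing`, the training sentences, and the names `NaiveBayesCitationClassifier` and `TRAIN` are hypothetical placeholders, not the study's sub-categories, data, or code.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens (letters/digits)."""
    return re.findall(r"\w+", sentence.lower())


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who tagged the same sentences."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from each annotator's label frequencies
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label; kappa is undefined, treat as perfect
        return 1.0
    return (observed - expected) / (1 - expected)


class NaiveBayesCitationClassifier:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            tokens = tokenize(sentence)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, sentence):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior of the class
            for token in tokenize(sentence):
                if token not in self.vocab:
                    continue  # ignore words never seen in training
                freq = self.word_counts[label][token]
                score += math.log((freq + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy citation sentences with placeholder labels
TRAIN = [
    ("These results are consistent with the findings of Smith (2010).", "supporting"),
    ("Our findings confirm the model proposed by Lee (2012).", "supporting"),
    ("Similar results were reported by Kaya (2015).", "supporting"),
    ("However, the method of Smith (2010) fails to explain these cases.", "criticizing"),
    ("Unlike Lee (2012), we found that this approach is not reliable.", "criticizing"),
    ("The sample used by Kaya (2015) was too small to generalize.", "criticizing"),
]

if __name__ == "__main__":
    sentences, labels = zip(*TRAIN)
    clf = NaiveBayesCitationClassifier().fit(sentences, labels)
    print(clf.predict("These findings are consistent with earlier results."))  # supporting
    print(clf.predict("This approach is not reliable for small samples."))  # criticizing
    print(cohen_kappa(["s", "s", "c", "c"], ["s", "c", "c", "c"]))  # 0.5
```

The study itself ran its classification in Weka; in a typical Weka workflow, raw sentences are first converted to word-count attributes with the `StringToWordVector` filter before a classifier is trained, which plays the role of `tokenize` and the per-class counts here.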