Binary background model with geometric mean for author-independent authorship verification


Canbay P., SEZER E., Sever H.

JOURNAL OF INFORMATION SCIENCE, cilt.49, sa.2, ss.448-464, 2023 (SCI-Expanded) identifier identifier

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 49 Sayı: 2
  • Basım Tarihi: 2023
  • Doi Numarası: 10.1177/01655515211007710
  • Dergi Adı: JOURNAL OF INFORMATION SCIENCE
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Scopus, Academic Search Premier, FRANCIS, IBZ Online, Periodicals Index Online, ABI/INFORM, Aerospace Database, Analytical Abstracts, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Communication Abstracts, Compendex, Computer & Applied Sciences, EBSCO Education Source, Education Abstracts, Index Islamicus, Information Science and Technology Abstracts, INSPEC, Library and Information Science Abstracts, Library Literature and Information Science, Library, Information Science & Technology Abstracts (LISTA), Metadex, Civil Engineering Abstracts
  • Sayfa Sayıları: ss.448-464
  • Anahtar Kelimeler: Authorship verification, binary background model, document-pair verification, forensic authorship, geometric mean, Turkish blog authorship corpus
  • Hacettepe Üniversitesi Adresli: Evet

Özet

Authorship verification (AV) is one of the main problems of authorship analysis and digital text forensics. The classical AV problem is to decide whether or not a particular author wrote the document in question. However, if there is one and relatively short document as the author's known document, the verification problem becomes more difficult than the classical AV and needs a generalised solution. Regarding to decide AV of the given two unlabeled documents (2D-AV), we proposed a system that provides an author-independent solution with the help of a Binary Background Model (BBM). The BBM is a supervised model that provides an informative background to distinguish document pairs written by the same or different authors. To evaluate the document pairs in one representation, we also proposed a new, simple and efficient document combination method based on the geometric mean of the stylometric features. We tested the performance of the proposed system for both author-dependent and author-independent AV cases. In addition, we introduced a new, well-defined, manually labelled Turkish blog corpus to be used in subsequent studies about authorship analysis. Using a publicly available English blog corpus for generating the BBM, the proposed system demonstrated an accuracy of over 90% from both trained and unseen authors' test sets. Furthermore, the proposed combination method and the system using the BBM with the English blog corpus were also evaluated with other genres, which were used in the international PAN AV competitions, and achieved promising results.