LogoSENSE: A companion HOG based logo detection scheme for phishing web page and E-mail brand recognition


BOZKIR A. S., AYDOS M.

COMPUTERS & SECURITY, vol. 95, 2020 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 95
  • Publication Date: 2020
  • DOI: 10.1016/j.cose.2020.101855
  • Journal Name: COMPUTERS & SECURITY
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, PASCAL, ABI/INFORM, Aerospace Database, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Communication Abstracts, Compendex, Computer & Applied Sciences, Criminal Justice Abstracts, INSPEC, Metadex, Civil Engineering Abstracts
  • Hacettepe University Affiliated: Yes

Abstract

With the advent of the Internet and the opportunities of e-commerce, phishing, a cyber-attack that exploits visual perception, has become one of the major problems of the cyber world, since it aims to obtain user credentials in order to gain illegal financial profit and steal sensitive personal data. To fight this security threat, various studies have utilized different sources of information, such as URLs, text content, DOM trees, or visual features of web pages. Unlike other works, we propose a companion scheme that recognizes the brands of "zero hour" phishing web pages by localizing and classifying the target brand logos in page screenshots, using solely computer vision methods in an object detection manner. For this purpose, Histogram of Oriented Gradients (HOG) features have been employed to obtain visual representations of target brand logos in a scale-invariant fashion. In addition, for classification, an SVM classifier equipped with a max-margin loss has been used in order to work with a low number of training images and to decrease the number of false positives. Moreover, we prepared a publicly available dataset having a total of 3060 training and 1979 unique phishing and legitimate web page/e-mail snapshots, along with their bounding box annotations, for evaluation and further academic use. Detailed experiments show that, in its best configuration, our scheme, named "LogoSENSE", achieves 93.50% precision and 77.94% recall, with an F1 score of 85.02%. The experiments also show that the proposed approach outperforms SIFT-based detection and yields results comparable to those of a state-of-the-art deep learning based object detection method. As a result, LogoSENSE delivers promising results in terms of detection accuracy and run-time efficiency, yielding a companion tool that can be used as a brand recognition mechanism for phishing web pages and e-mails. (C) 2020 Elsevier Ltd. All rights reserved.
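
To make the HOG representation mentioned in the abstract more concrete, the following is a minimal sketch of how an orientation histogram could be computed for a single image cell. It is not the authors' implementation: the function name `hog_cell_histogram`, the choice of 9 unsigned orientation bins, the central-difference gradients, and the hard (non-interpolated) bin voting are all assumptions made for illustration. Block normalization across neighboring cells and the multi-scale search that a full detector would need are omitted.

```python
import math


def hog_cell_histogram(cell, n_bins=9):
    """Return an L2-normalized histogram of unsigned gradient orientations.

    `cell` is a list of rows of grayscale intensities (hypothetical input
    format). Each pixel votes for one orientation bin, weighted by its
    gradient magnitude. This is an illustrative sketch only.
    """
    height, width = len(cell), len(cell[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for y in range(height):
        for x in range(width):
            # Central differences, with neighbor indices clamped at the borders.
            gx = cell[y][min(x + 1, width - 1)] - cell[y][max(x - 1, 0)]
            gy = cell[min(y + 1, height - 1)][x] - cell[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue  # flat pixels carry no orientation information
            # Unsigned orientation folded into [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The modulo guards against a rounding result of exactly 180.0.
            hist[int(angle // bin_width) % n_bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


# Usage: an 8x8 cell whose intensity increases from left to right.
horizontal_ramp = [[10 * x for x in range(8)] for _ in range(8)]
print(hog_cell_histogram(horizontal_ramp))
```

For the left-to-right ramp, every gradient points along the x axis, so all of the histogram's energy falls into the first bin. In a detector of the kind the abstract describes, descriptors like this one, concatenated over a grid of cells, would serve as the input to the SVM classifier.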