Leveraging cross-resolution attention for effective extreme low-resolution video action recognition


Oguz O., İKİZLER CİNBİŞ N.

Signal, Image and Video Processing, vol.18, no.1, pp.399-406, 2024 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 18 Issue: 1
  • Publication Date: 2024
  • DOI: 10.1007/s11760-023-02766-x
  • Journal Name: Signal, Image and Video Processing
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, zbMATH
  • Page Numbers: pp.399-406
  • Keywords: Cross-resolution attention, Extreme low-resolution action recognition, Knowledge distillation
  • Hacettepe University Affiliated: Yes

Abstract

Recognizing human actions in extremely low-resolution (eLR) videos poses a formidable challenge in the action recognition domain due to the lack of temporal and spatial information in the corresponding eLR frames. In this work, we propose a novel architecture for human action recognition in eLR videos. The proposed approach and its variants utilize an expanded knowledge distillation scheme that provides the essential flow of information from high-resolution (HR) frames to eLR frames. To further improve the generalization capability, we integrate cross-resolution attention modules that can operate without HR information at inference time. Additionally, we investigate the impact of an eLR data preprocessing pipeline that leverages a super-resolution algorithm and experimentally show the efficacy of the proposed models in eLR space. Our experiments indicate the importance of examining eLR human action recognition and demonstrate that the proposed methods can surpass or compete with the current state-of-the-art methods, achieving effective generalization capabilities on both the UCF-101 and HMDB-51 datasets.
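
For readers unfamiliar with knowledge distillation, the idea of passing information from an HR model to an eLR model can be illustrated with a generic, textbook-style distillation loss. This is a minimal sketch and not the expanded scheme described in the abstract: the function names, the temperature, and the weighting factor `alpha` are all assumptions chosen for illustration, with a teacher that sees HR frames and a student that sees eLR frames.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Generic distillation loss (hypothetical helper, not the paper's method).

    student_logits: class scores from a model fed eLR frames, shape (N, C)
    teacher_logits: class scores from a model fed HR frames, shape (N, C)
    labels:         ground-truth class indices, shape (N,)
    """
    eps = 1e-12
    # Soft targets: KL divergence between temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps)),
                axis=-1)
    # Hard targets: ordinary cross-entropy on the student's predictions.
    p_hard = softmax(student_logits)
    labels = np.asarray(labels)
    ce = -np.log(p_hard[np.arange(len(labels)), labels] + eps)
    # temperature**2 keeps the soft-target term on a comparable scale.
    return float(np.mean(alpha * temperature ** 2 * kl + (1.0 - alpha) * ce))
```

In such a setup only the student is needed at test time, which is consistent with the abstract's requirement that the model operate without HR information during inference.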