Leveraging cross-resolution attention for effective extreme low-resolution video action recognition

Oguz, Oguzhan; İKİZLER CİNBİŞ, NAZLI

doi:10.1007/s11760-023-02766-x

Signal, Image and Video Processing, vol. 18, no. 1, pp. 399-406, 2024 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 18 Issue: 1
Publication Date: 2024
DOI: 10.1007/s11760-023-02766-x
Journal Name: Signal, Image and Video Processing
Indexes Covering the Journal: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, zbMATH
Pages: pp. 399-406
Keywords: Cross-resolution attention, Extreme low-resolution action recognition, Knowledge distillation
Hacettepe University Affiliated: Yes

Abstract

Recognizing human actions in extremely low-resolution (eLR) videos poses a formidable challenge in the action recognition domain due to the lack of temporal and spatial information in the corresponding eLR frames. In this work, we propose a novel eLR video human action recognition architecture that recognizes actions in an eLR setup. The proposed approach and its variants utilize an expanded knowledge distillation scheme that provides the essential flow of information from high-resolution (HR) frames to eLR frames. To further improve the generalization capability, we integrate cross-resolution attention modules that can operate without HR information at inference time. Additionally, we investigate the impact of an eLR data preprocessing pipeline that leverages a super-resolution algorithm and experimentally show the efficacy of the proposed models in eLR space. Our experiments indicate the importance of examining eLR human action recognition and demonstrate that the proposed methods can surpass or compete with the current state-of-the-art methods, achieving effective generalization on both the UCF-101 and HMDB-51 datasets.
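The abstract does not spell out the "expanded" distillation scheme, so the following is only a minimal sketch of the generic soft-target knowledge distillation idea it builds on: a student that sees eLR frames is trained to match both the ground-truth label and the softened class distribution of a teacher that sees HR frames. The function names, the `temperature` and `alpha` parameters, and the loss weighting are assumptions for illustration and are not taken from the paper.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum for z in scaled]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Generic soft-target distillation loss (hypothetical helper).

    student_logits: class scores from the eLR (student) branch
    teacher_logits: class scores from the HR (teacher) branch
    label:          index of the ground-truth action class
    temperature, alpha: illustrative hyperparameters, not from the paper
    """
    # Hard-label cross-entropy on the student's unsoftened prediction.
    ce = -log_softmax(student_logits)[label]

    # KL(teacher || student) between temperature-softened distributions.
    log_t = log_softmax(teacher_logits, temperature)
    log_s = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))

    # The temperature**2 factor keeps the soft-target gradient scale
    # comparable to the hard-label term.
    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl


if __name__ == "__main__":
    student = [2.0, 0.5, -1.0]   # made-up eLR-branch scores
    teacher = [3.0, 0.2, -2.0]   # made-up HR-branch scores
    print(distillation_loss(student, teacher, label=0))
```

Because the teacher is used only inside the loss, the student can be run on eLR input alone once training is finished, which is consistent with the abstract's requirement that no HR information be needed at inference time.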