Leveraging cross-resolution attention for effective extreme low-resolution video action recognition


Oguz O., İKİZLER CİNBİŞ N.

Signal, Image and Video Processing, vol.18, no.1, pp.399-406, 2024 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 18 Issue: 1
  • Publication Date: 2024
  • Doi Number: 10.1007/s11760-023-02766-x
  • Journal Name: Signal, Image and Video Processing
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, zbMATH
  • Page Numbers: pp.399-406
  • Keywords: Cross-resolution attention, Extreme low-resolution action recognition, Knowledge distillation
  • Hacettepe University Affiliated: Yes

Abstract

Recognizing human actions in extremely low-resolution (eLR) videos is a formidable challenge in action recognition because eLR frames lack much of the spatial and temporal information that higher-resolution frames carry. In this work, we propose a novel architecture for human action recognition in eLR videos. The proposed approach and its variants use an expanded knowledge distillation scheme that transfers essential information from high-resolution (HR) frames to eLR frames. To further improve generalization, we integrate cross-resolution attention modules that can operate without HR information at inference time. Additionally, we investigate the impact of an eLR data preprocessing pipeline that leverages a super-resolution algorithm, and we experimentally demonstrate the efficacy of the proposed models in eLR space. Our experiments highlight the importance of studying eLR human action recognition and show that the proposed methods surpass or are competitive with current state-of-the-art methods, generalizing effectively on both the UCF-101 and HMDB-51 datasets.
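The abstract does not give the distillation objective, so the following is only a rough illustration of the general idea, not the authors' method. It shows the widely used soft-target knowledge distillation loss, in which an HR teacher's temperature-softened class probabilities supervise an eLR student alongside the ordinary label loss. The function names `softmax` and `kd_loss`, and the `temperature` and `alpha` values, are hypothetical choices made for this sketch.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 4.0,  # hypothetical value, not taken from the paper
    alpha: float = 0.5,        # hypothetical weight between hard and soft targets
) -> float:
    """Hypothetical HR-to-eLR distillation loss for a single clip.

    student_logits: class scores from the eLR student network
    teacher_logits: class scores from the HR teacher network (training only)
    label: ground-truth class index
    """
    # Hard-label cross-entropy on the student's ordinary (T = 1) prediction.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[label])

    # KL(teacher || student) between the temperature-softened distributions.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(t * (math.log(t) - math.log(s)) for t, s in zip(p_t, p_s) if t > 0)

    # The T^2 factor keeps soft-target gradients comparable across temperatures.
    return alpha * ce + (1.0 - alpha) * (temperature ** 2) * kl


# Usage: one training clip with three action classes (illustrative logits).
loss = kd_loss([2.0, 0.5, -1.0], [3.0, 0.2, -2.0], label=0)
```

Because the teacher term is only needed to compute the training loss, the student can be run on eLR input alone at test time, which is consistent with the abstract's requirement that no HR information be available during inference. The cross-resolution attention modules are not modeled here.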