Comparison of 2D and 3D attention mechanisms for human (collective) activity recognition


ZALLUHOĞLU C., İKİZLER CİNBİŞ N.

SIGNAL IMAGE AND VIDEO PROCESSING, vol.16, no.4, pp.865-872, 2022 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 16 Issue: 4
  • Publication Date: 2022
  • Doi Number: 10.1007/s11760-021-02028-8
  • Journal Name: SIGNAL IMAGE AND VIDEO PROCESSING
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, zbMATH
  • Page Numbers: pp.865-872
  • Keywords: Collective activity recognition, Action recognition, Convolutional neural networks, Attention
  • Hacettepe University Affiliated: Yes

Abstract

In this study, we use attention mechanisms to leverage the spatio-temporal information available in videos for the action recognition and collective activity recognition tasks. In this context, we explore 2D and 3D attention mechanisms and investigate their effect on capturing the relevant action information. To this end, we introduce a framework for incorporating 2D and 3D attention into two distinct 3D-ConvNet architectures: standard 3D-ConvNets (C3D) and inflated 3D-ConvNets (I3D). We evaluate this framework on four benchmark datasets: UCF101 and HMDB51 for action recognition, and CAD and C-Sports for collective activity recognition. Experimental results show that the 3D attention-based ConvNets improve performance on all datasets compared with the architectures that do not use any attention mechanism. Our results also indicate that the 3D attention mechanism yields higher recognition performance than its 2D counterpart.
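To make the 2D/3D distinction concrete, the following is a minimal pure-Python sketch of soft attention pooling over a 3D-ConvNet feature map. It is not the authors' implementation: the feature layout `features[t][h][w]`, the dot-product scoring against a query vector `w`, and the helper names `attention_2d` and `attention_3d` are all hypothetical. The sketch only contrasts an independent softmax over the spatial positions of each frame (2D) with a single softmax over all spatio-temporal positions (3D).

```python
import math
from typing import List

# Hypothetical layout: features[t][h][w] is a channel vector of length C,
# standing in for one block's output of a 3D-ConvNet such as C3D or I3D.
Vector = List[float]
FeatureMap = List[List[List[Vector]]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a flat list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score(vec: Vector, w: Vector) -> float:
    """Assumed scoring function: dot product with a learned query vector w."""
    return sum(v * q for v, q in zip(vec, w))


def weighted_sum(vectors: List[Vector], weights: List[float]) -> Vector:
    """Attention pooling: returns sum_i weights[i] * vectors[i]."""
    out = [0.0] * len(vectors[0])
    for vec, a in zip(vectors, weights):
        for c, v in enumerate(vec):
            out[c] += a * v
    return out


def attention_2d(features: FeatureMap, w: Vector) -> Vector:
    """2D (spatial) attention: an independent softmax over the H*W positions
    of each frame; the per-frame pooled vectors are then averaged over time."""
    per_frame = []
    for frame in features:
        positions = [vec for row in frame for vec in row]
        weights = softmax([score(v, w) for v in positions])
        per_frame.append(weighted_sum(positions, weights))
    num_frames = len(per_frame)
    return [sum(f[c] for f in per_frame) / num_frames
            for c in range(len(per_frame[0]))]


def attention_3d(features: FeatureMap, w: Vector) -> Vector:
    """3D (spatio-temporal) attention: a single softmax over all T*H*W
    positions, so positions in different frames compete for weight."""
    positions = [vec for frame in features for row in frame for vec in row]
    weights = softmax([score(v, w) for v in positions])
    return weighted_sum(positions, weights)


if __name__ == "__main__":
    # Toy map with T=2, H=1, W=2, C=2: frame 0 is uninformative,
    # frame 1 has one salient position.
    toy = [
        [[[0.0, 0.0], [0.0, 0.0]]],
        [[[4.0, 0.0], [0.0, 0.0]]],
    ]
    query = [1.0, 0.0]
    print("2D:", attention_2d(toy, query))
    print("3D:", attention_3d(toy, query))
```

On this toy input, `attention_2d` returns roughly [1.96, 0.0], because the uninformative frame still contributes half of the temporal average, whereas `attention_3d` returns roughly [3.79, 0.0], since the single softmax lets the salient position in frame 1 take most of the weight. This only illustrates the structural difference between the two pooling schemes; it is not evidence for the results reported in the paper.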