Comparison of 2D and 3D attention mechanisms for human (collective) activity recognition


ZALLUHOĞLU C., İKİZLER CİNBİŞ N.

SIGNAL IMAGE AND VIDEO PROCESSING, vol.16, no.4, pp.865-872, 2022 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 16 Issue: 4
  • Publication Date: 2022
  • DOI: 10.1007/s11760-021-02028-8
  • Journal Name: SIGNAL IMAGE AND VIDEO PROCESSING
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, zbMATH
  • Pages: pp.865-872
  • Keywords: Collective activity recognition, Action recognition, Convolutional neural networks, Attention
  • Hacettepe University Affiliated: Yes

Abstract

In this study, we utilize attention mechanisms to leverage the spatio-temporal information available in videos for the action recognition and collective activity recognition tasks. In this context, we explore 2D and 3D attention mechanisms and investigate their effect on capturing the related action information. To this end, we introduce a framework for incorporating 2D and 3D attention into two distinct 3D-ConvNet architectures: standard 3D-ConvNets (C3D) and inflated 3D-ConvNets (I3D). We evaluate this framework on four benchmark datasets: UCF101 and HMDB51 for action recognition, and CAD and C-Sports for collective activity recognition. Experimental results show that the 3D attention-based ConvNets improve performance on all datasets compared to the architectures that do not leverage any attention mechanism. Our results also indicate that the 3D attention mechanism yields higher recognition performance than its 2D counterpart.
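
The abstract does not specify how the attention weights are computed, so the following is only an illustrative sketch of the general distinction between 2D and 3D attention, not the authors' implementation. It assumes softmax-normalized attention over a toy score volume of shape T x H x W; the function names (`attention_2d`, `attention_3d`, `total`) and the toy scores are hypothetical. Under this assumption, 2D attention normalizes each frame on its own, while 3D attention normalizes jointly over time and space, so it can also weight informative frames more heavily.

```python
import math

def softmax(values):
    """Numerically stable softmax over a flat list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    denom = sum(exps)
    return [e / denom for e in exps]

def attention_2d(scores):
    """Assumed 2D variant: normalize each frame independently (per-frame sum is 1)."""
    out = []
    for frame in scores:
        width = len(frame[0])
        flat = softmax([s for row in frame for s in row])
        out.append([flat[i:i + width] for i in range(0, len(flat), width)])
    return out

def attention_3d(scores):
    """Assumed 3D variant: normalize jointly over T, H, W (whole clip sums to 1)."""
    height, width = len(scores[0]), len(scores[0][0])
    per_frame = height * width
    flat = softmax([s for frame in scores for row in frame for s in row])
    out = []
    for t in range(len(scores)):
        chunk = flat[t * per_frame:(t + 1) * per_frame]
        out.append([chunk[i:i + width] for i in range(0, per_frame, width)])
    return out

def total(frame):
    """Sum of all attention weights in one frame."""
    return sum(sum(row) for row in frame)

# Toy attention scores for a clip of T=2 frames, each H=2 x W=2 (made-up values).
scores = [
    [[0.0, 0.0], [0.0, 0.0]],  # frame 0: no salient location
    [[2.0, 0.0], [0.0, 0.0]],  # frame 1: one salient location
]
w2d = attention_2d(scores)
w3d = attention_3d(scores)
```

In this toy setting, the 2D weights give every frame the same total mass regardless of content, whereas the 3D weights assign more total mass to the frame containing the salient location. This is one plausible intuition for why joint spatio-temporal attention could help, stated here as an assumption rather than as the paper's explanation.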