Comparison of 2D and 3D attention mechanisms for human (collective) activity recognition

ZALLUHOĞLU, CEMİL; İKİZLER CİNBİŞ, NAZLI

doi:10.1007/s11760-021-02028-8

SIGNAL IMAGE AND VIDEO PROCESSING, vol.16, no.4, pp.865-872, 2022 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 16, Issue: 4
Publication Date: 2022
DOI: 10.1007/s11760-021-02028-8
Journal Name: SIGNAL IMAGE AND VIDEO PROCESSING
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, zbMATH
Page Numbers: pp.865-872
Keywords: Collective activity recognition, Action recognition, Convolutional neural networks, Attention
Hacettepe University Affiliated: Yes

Abstract

In this study, we utilize attention mechanisms to leverage the spatio-temporal information available in videos for the action recognition and collective activity recognition tasks. In this context, we explore 2D and 3D attention mechanisms and investigate their effect on capturing the relevant action information. To this end, we introduce a framework for incorporating 2D and 3D attention into two distinct 3D-ConvNet architectures: standard 3D-ConvNets (C3D) and inflated 3D-ConvNets (I3D). We evaluate this framework on four benchmark datasets: UCF101 and HMDB51 for action recognition, and CAD and C-Sports for collective activity recognition. Experimental results show that the 3D attention-based ConvNets improve performance on all datasets compared to the architectures that do not use any attention mechanism. Our results also indicate that the 3D attention mechanism yields higher recognition performance than its 2D counterpart.
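The abstract does not state how the attention maps are computed, so the following is only a generic sketch of the distinction between 2D and 3D attention, not the authors' method. It assumes attention is a softmax over raw scores: the 2D variant normalizes each frame's spatial map on its own, while the 3D variant normalizes jointly over time and space. The function names and the toy scores are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def attention_2d(scores):
    """Normalize each frame's H x W score map independently (spatial only)."""
    out = []
    for frame in scores:
        width = len(frame[0])
        flat = softmax([s for row in frame for s in row])
        out.append([flat[i:i + width] for i in range(0, len(flat), width)])
    return out

def attention_3d(scores):
    """Normalize all T x H x W scores jointly (spatio-temporal)."""
    height, width = len(scores[0]), len(scores[0][0])
    per_frame = height * width
    flat = softmax([s for frame in scores for row in frame for s in row])
    out = []
    for t in range(len(scores)):
        chunk = flat[t * per_frame:(t + 1) * per_frame]
        out.append([chunk[i:i + width] for i in range(0, per_frame, width)])
    return out

# Hypothetical scores for a 2-frame clip with 2 x 2 spatial positions.
scores = [
    [[0.0, 0.0], [0.0, 0.0]],  # frame 0: nothing stands out
    [[0.0, 0.0], [0.0, 3.0]],  # frame 1: one salient location
]
w2d = attention_2d(scores)
w3d = attention_3d(scores)

# Total attention weight assigned to each frame under both schemes.
frame_mass_2d = [sum(sum(row) for row in frame) for frame in w2d]
frame_mass_3d = [sum(sum(row) for row in frame) for frame in w3d]
```

Under these assumptions, the 2D variant gives every frame the same total weight, so it cannot favor one frame over another. The 3D variant lets frames compete, so the frame containing the salient location receives most of the weight.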