IMAGE AND VISION COMPUTING, cilt.94, 2020 (SCI-Expanded)
Collective activity recognition is an important subtask of human action recognition, where the existing datasets are mostly limited. In this paper, we look into this issue and introduce the Collective Sports (C-Sports) dataset, which is a novel benchmark dataset for multi-task recognition of both collective activity and sports categories. Various state-of-the-art techniques are evaluated on this dataset, together with multi-task variants which demonstrate increased performance. From the experimental results, we can say that while sports categories of the videos are inferred accurately, there is still room for improvement for collective activity recognition, especially regarding the generalization ability beyond previously unseen sports categories. In order to evaluate this ability, we introduce a novel evaluation protocol called unseen sports, where the training and test are carried out on disjoint sets of sports categories. The relatively lower recognition performances in this evaluation protocol indicate that the recognition models tend to be influenced by the surrounding context, rather than focusing on the essence of the collective activities. We believe that C-Sports dataset will stir further interest in this research direction. (C) 2020 Elsevier B.V. All rights reserved.