SIGNAL IMAGE AND VIDEO PROCESSING, 2022 (SCI-Expanded)
Localizing actions in instructional web videos is difficult because such videos contain background scenes unrelated to the task being demonstrated. Separating background from actions could reduce incorrect predictions of action step labels. Yet discriminating actions from background is challenging because the same activity can be performed in many different styles. In this study, we aim to improve action localization by learning the actionness of video clips, that is, the likelihood that a clip contains an action. We present a method that learns an actionness score for each video clip and uses it to post-process the baseline clip-to-step-label assignment scores. We also propose an auxiliary representation, formed from the baseline clip-to-step-label assignment scores, to reinforce the discrimination of video clips. Experiments on the CrossTask and COIN datasets show that our actionness score improves the performance of both action step localization and action segmentation.
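
The post-processing idea can be illustrated with a minimal sketch. The abstract does not state how the actionness score is combined with the baseline scores, so everything below is an assumption made for illustration: the function name `localize_steps`, the multiplicative weighting, the rule that each step is assigned to its single highest-scoring clip, and the toy numbers are all hypothetical and are not taken from the paper.

```python
from typing import List


def localize_steps(step_scores: List[List[float]],
                   actionness: List[float]) -> List[int]:
    """For each step, return the index of the clip with the highest
    actionness-weighted score.

    step_scores[t][k] is the baseline score of clip t for step k.
    actionness[t] is the assumed likelihood that clip t contains an action.
    Multiplying the two is an illustrative choice, not the paper's stated rule.
    """
    if not step_scores:
        raise ValueError("step_scores must contain at least one clip")
    if len(step_scores) != len(actionness):
        raise ValueError("need exactly one actionness value per clip")

    num_steps = len(step_scores[0])
    picks = []
    for k in range(num_steps):
        # Down-weight clips that look like background before choosing a clip.
        weighted = [a * row[k] for row, a in zip(step_scores, actionness)]
        picks.append(max(range(len(weighted)), key=weighted.__getitem__))
    return picks


# Toy example: 3 clips, 2 steps. Clip 0 scores highest for step 0 in the
# baseline, but its low actionness marks it as likely background.
baseline = [[0.9, 0.1],
            [0.6, 0.2],
            [0.1, 0.7]]
actionness = [0.2, 0.9, 0.8]

without_actionness = localize_steps(baseline, [1.0, 1.0, 1.0])
with_actionness = localize_steps(baseline, actionness)
```

In this toy case the baseline alone assigns step 0 to clip 0, while the weighted scores move it to clip 1, which is the kind of background-induced error the abstract describes. A learned actionness model and the auxiliary representation are outside the scope of this sketch.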