Action Recognition and Localization by Hierarchical Space-Time Segments


Ma S., Zhang J., Ikizler-Cinbis N., Sclaroff S.

IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, 1-8 December 2013, pp. 2744-2751

  • Publication Type: Conference Paper / Full Text
  • Doi Number: 10.1109/iccv.2013.341
  • City: Sydney
  • Country: Australia
  • Page Numbers: pp.2744-2751

Abstract

We propose Hierarchical Space-Time Segments as a new representation for action recognition and localization. This representation has a two-level hierarchy. The first level comprises the root space-time segments that may contain a human body. The second level comprises multi-grained space-time segments that contain parts of the root. We present an unsupervised method to generate this representation from video, which extracts both static and non-static relevant space-time segments and preserves their hierarchical and temporal relationships. Using a simple linear SVM on the resulting bag of hierarchical space-time segments representation, we attain action recognition performance better than, or comparable to, the state of the art on two challenging benchmark datasets, and at the same time produce good action localization results.
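
To make the two-level hierarchy and the "bag of segments" idea concrete, the following is a minimal sketch in Python and not the authors' implementation. The abstract does not specify the segment features, how a codebook is built, or how the histogram is pooled, so everything here is an assumption made for illustration: the `Segment` and `RootSegment` classes, the `descriptor` vectors, the use of one codebook per level, nearest-codeword assignment, and L1 normalisation.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Segment:
    """A space-time segment: a feature vector plus the frames it spans."""
    descriptor: Sequence[float]  # hypothetical feature vector (assumption)
    start_frame: int
    end_frame: int


@dataclass
class RootSegment(Segment):
    """First level: a segment that may contain a whole human body."""
    # Second level: finer segments covering parts of this root.
    parts: List[Segment] = field(default_factory=list)


def nearest_codeword(descriptor, codebook):
    """Return the index of the codebook entry closest to `descriptor`
    in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)),
               key=lambda k: sq_dist(descriptor, codebook[k]))


def bag_of_segments(roots, root_codebook, part_codebook):
    """Build an L1-normalised histogram for one video.

    Root-level bins come first, followed by part-level bins. Using a
    separate codebook per level is an assumption of this sketch.
    """
    n_root = len(root_codebook)
    hist = [0.0] * (n_root + len(part_codebook))
    for root in roots:
        hist[nearest_codeword(root.descriptor, root_codebook)] += 1.0
        for part in root.parts:
            idx = nearest_codeword(part.descriptor, part_codebook)
            hist[n_root + idx] += 1.0
    total = sum(hist)
    # An empty video yields an all-zero histogram instead of dividing by zero.
    return [h / total for h in hist] if total else hist
```

A histogram produced this way is a fixed-length vector per video, which is the kind of input a linear classifier expects; for example, it could be passed to a linear SVM such as scikit-learn's `LinearSVC`. Because each root keeps its own `parts` list and frame range, the hierarchical and temporal relationships stay available for localization even after the histogram has been computed.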