Intelligent Systems Conference (IntelliSys), London, Canada, 6-7 September 2018, vol. 868, pp. 1102-1115
Action classification from video streams is a challenging problem, especially when only a limited amount of training data is available for each action. Recent deep learning based methods have achieved high classification accuracy on many problems in different domains, yet they still perform poorly when the dataset is small. In this work, we compared the performance of Hidden Markov Models (HMM) and long short-term memory (LSTM) based recurrent neural network models within the same sequence classification framework on the well-known KTH action dataset. KTH contains few training examples and therefore challenges deep learning based techniques even when transfer learning is applied for feature extraction. Our experiments show that, when a pre-trained convolutional network (SqueezeNet) is fine-tuned and used for feature extraction, an HMM performs better in sequence modeling than an LSTM based model. Using the same fine-tuned SqueezeNet features, we obtained 99.30% accuracy with an HMM, which is the best classification accuracy reported so far on this dataset, whereas the best performing LSTM configuration reached 81.92%.
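To make the HMM side of the comparison concrete, the following is a minimal sketch of one common way to classify sequences with HMMs: keep one model per action class, score a feature sequence under each with the forward algorithm, and pick the class with the highest log-likelihood. The abstract does not specify the HMM topology or emission model, so the diagonal Gaussian emissions, the two-state models, the `CLASS_HMMS` dictionary, and all parameter values are assumptions for illustration only; the one-dimensional vectors merely stand in for per-frame SqueezeNet features.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of finite values."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def sequence_log_likelihood(seq, hmm):
    """Forward algorithm in log space: returns log p(seq | hmm).

    Assumes every start and transition probability is strictly positive.
    """
    n = len(hmm["start"])
    emit = lambda s, x: log_gauss(x, hmm["means"][s], hmm["vars"][s])
    # Initialisation: start probability times emission of the first frame.
    alpha = [math.log(hmm["start"][s]) + emit(s, seq[0]) for s in range(n)]
    # Recursion: sum over predecessor states, then emit the current frame.
    for x in seq[1:]:
        alpha = [logsumexp([alpha[p] + math.log(hmm["trans"][p][s])
                            for p in range(n)]) + emit(s, x)
                 for s in range(n)]
    # Termination: sum over all final states.
    return logsumexp(alpha)

def classify(seq, class_hmms):
    """Return the label whose HMM assigns the sequence the highest likelihood."""
    return max(class_hmms,
               key=lambda label: sequence_log_likelihood(seq, class_hmms[label]))

# Hypothetical toy models with made-up parameters (not taken from the paper).
CLASS_HMMS = {
    "walking": {"start": [0.6, 0.4], "trans": [[0.7, 0.3], [0.4, 0.6]],
                "means": [[0.0], [1.0]], "vars": [[0.5], [0.5]]},
    "running": {"start": [0.5, 0.5], "trans": [[0.6, 0.4], [0.3, 0.7]],
                "means": [[3.0], [4.0]], "vars": [[0.5], [0.5]]},
}

if __name__ == "__main__":
    print(classify([[0.1], [0.9], [0.2]], CLASS_HMMS))
    print(classify([[3.2], [3.9], [4.1]], CLASS_HMMS))
```

In practice the per-class parameters would be estimated from training sequences (for example with Baum-Welch) rather than written by hand, and the feature vectors would have as many dimensions as the chosen network layer produces.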