Using independently recurrent networks for reinforcement learning based unsupervised video summarization


MULTIMEDIA TOOLS AND APPLICATIONS, 2021 (Journal Indexed in SCI)


The sigmoid and hyperbolic tangent activation functions in the long short-term memory (LSTM) and gated recurrent unit (GRU) based models used in recent studies on video summarization may cause gradient decay over layers. Moreover, interpreting and developing such network models is difficult because the neurons in a recurrent neural network (RNN) layer are entangled with one another. To address these issues, we propose a method that combines deep reinforcement learning with independently recurrent neural networks (IndRNN) for unsupervised video summarization. The method uses the Leaky Rectified Linear Unit (Leaky ReLU) as its activation function to deal with the decaying gradient and dying neuron problems. The model does not rely on any labels or user interaction, and it is trained with a reward function that jointly accounts for the uniformity, diversity and representativeness of the generated summaries. As a result, the model can create summaries that are as uniform as possible, can have more layers, and can be trained for more steps without gradient-related problems. In experiments conducted on two benchmark datasets, we observe that the proposed method achieves better summarization performance than state-of-the-art video summarization methods.
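
The abstract does not give the recurrence itself, so the following is only a minimal sketch of a single IndRNN layer with Leaky ReLU, assuming the standard IndRNN formulation in which each neuron has one scalar recurrent weight, h_t = LeakyReLU(W x_t + u * h_{t-1} + b) with an element-wise product. The function names, dimensions, slope value and random inputs are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    # Keeps a small gradient for negative inputs, which avoids "dead" neurons.
    return np.where(x > 0, x, slope * x)

def indrnn_layer(inputs, W, u, b, slope=0.01):
    """Run one IndRNN layer over a sequence.

    inputs: (T, d_in) sequence of frame features (assumed shape)
    W:      (d_hidden, d_in) input weights
    u:      (d_hidden,) one recurrent weight per neuron
    b:      (d_hidden,) bias
    """
    h = np.zeros(u.shape[0])
    states = []
    for x_t in inputs:
        # Element-wise recurrence: neuron i only sees its own previous state.
        h = leaky_relu(W @ x_t + u * h + b, slope)
        states.append(h)
    return np.stack(states)

# Toy example with random stand-ins for per-frame features.
rng = np.random.default_rng(0)
T, d_in, d_hidden = 5, 4, 3
frames = rng.standard_normal((T, d_in))
W = rng.standard_normal((d_hidden, d_in)) * 0.1
u = rng.uniform(-1.0, 1.0, size=d_hidden)
b = np.zeros(d_hidden)

states = indrnn_layer(frames, W, u, b)
print(states.shape)
```

Because the recurrent term is an element-wise product rather than a full matrix product, changing one entry of `u` affects only that neuron's trajectory within the layer. This is the sense in which the neurons are independent, and it is what makes each neuron's behaviour easier to inspect.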