Using independently recurrent networks for reinforcement learning based unsupervised video summarization

Yaliniz, Gokhan; İKİZLER CİNBİŞ, NAZLI

doi:10.1007/s11042-020-10293-x

Using independently recurrent networks for reinforcement learning based unsupervised video summarization

Atıf İçin Kopyala

Yaliniz G., İKİZLER CİNBİŞ N.

MULTIMEDIA TOOLS AND APPLICATIONS, cilt.80, sa.12, ss.17827-17847, 2021 (SCI-Expanded)

Yayın Türü: Makale / Tam Makale
Cilt numarası: 80 Sayı: 12
Basım Tarihi: 2021
Doi Numarası: 10.1007/s11042-020-10293-x
Dergi Adı: MULTIMEDIA TOOLS AND APPLICATIONS
Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus, FRANCIS, ABI/INFORM, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC, zbMATH
Sayfa Sayıları: ss.17827-17847
Anahtar Kelimeler: Video summarization, Recurrent neural networks, Reinforcement learning, Unsupervised learning, KEYFRAME SELECTION
Hacettepe Üniversitesi Adresli: Evet

Özet

Sigmoid and hyperbolic activation functions in long short-term memory (LSTM) and gated recurrent unit (GRU) based models used in recent studies on video summarization, may cause gradient decay over layers. Moreover, interpreting and developing network models are difficult because of entanglement of neurons on recurrent neural network (RNN). To solve these issues, in this study, we propose a method that uses deep reinforcement learning together with independently recurrent neural networks (IndRNN) for unsupervised video summarization. In this method, Leaky Rectified Linear Unit (Leaky ReLU) is used as an activation function to deal with decaying gradient and dying neuron problems. The model, which does not rely on any labels or user interaction, is designed with a reward function that jointly accounts for uniformity, diversity and representativeness of generated summaries. In this way, our model can create summaries as uniform as possible, has more layers and can be trained with more steps without having any problem related to gradients. Based on the experiments conducted on two benchmark datasets, we observe that, compared to the state-of-the-art methods on video summarization task, better summarization performance can be obtained.