MSVD-Turkish: A Large-Scale Dataset for Video Captioning in Turkish

Citamak B., Kuyu M., Erdem A., Erdem E.

27th Signal Processing and Communications Applications Conference (SIU), Sivas, Türkiye, 24 - 26 Nisan 2019, (Tam Metin Bildiri)

Yayın Türü: Bildiri / Tam Metin Bildiri
Cilt numarası:
Doi Numarası: 10.1109/siu.2019.8806555
Basıldığı Şehir: Sivas
Basıldığı Ülke: Türkiye
Hacettepe Üniversitesi Adresli: Evet

Özet

Automatically generating natural language descriptions for videos, aka video captioning, has been recently introduced as a challenging integrated vision and language problem. Although researchers have demonstrated numerous solutions for English, to date there has been no study on Turkish language due to the lack of suitable datasets to train Turkish video captioning models. To tackle this, in this study we construct a largescale Turkish benchmark dataset by carefully translating English descriptions from MSVD dataset to Turkish. Moreover, we implement several neural models, including LSTM-based sequence-to-sequence architectures with temporal attention mechanisms, and report the performances of these strong baselines on our dataset. We hope that our dataset will serve as a good resource for future efforts on Turkish video captioning.