Alignment of Image-Text and Video-Text Datasets (Görüntü-Metin ve Video-Metin Veri Kümelerinin Hizalanması)

Özköse Y. E., Gökçe Z., DUYGULU ŞAHİN P.

31st IEEE Conference on Signal Processing and Communications Applications, SIU 2023, İstanbul, Türkiye, 5-8 July 2023

Publication Type: Conference Paper / Full-Text Paper
DOI: 10.1109/siu59756.2023.10224043
City of Publication: İstanbul
Country of Publication: Türkiye
Keywords: dataset alignment, deep learning, machine learning
Affiliated with Hacettepe University: Yes

Abstract

This study addresses the alignment of video-text and image-text datasets. First, similarities are computed between the texts of the two datasets. A retrieval setup based on visual similarity is then applied to the subset selected by these text similarities. Text embeddings are obtained with a BERT-based method applied to both the raw and the pure texts. As visual features, object-based and CLIP-based methods are used to represent video frames. According to the results, alignment with CLIP features performs best on the subset created by filtering with raw text.
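
The two-stage procedure described in the abstract (filter by text similarity, then retrieve by visual similarity) might be sketched roughly as follows. This is an illustration only, not the authors' implementation: the use of cosine similarity, the threshold value, the function names, and the toy two-dimensional vectors are all assumptions. In the study the text vectors come from a BERT-based method and the visual vectors from object-based or CLIP-based features; here they are replaced by hand-written placeholders so the example runs without any model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def filter_by_text(video_text_emb, image_text_embs, threshold):
    """Stage 1: keep image-text items whose caption embedding is close
    to the video caption embedding. The threshold is hypothetical."""
    return [
        i for i, emb in enumerate(image_text_embs)
        if cosine(video_text_emb, emb) >= threshold
    ]


def rank_by_visual(video_visual_emb, image_visual_embs, candidates):
    """Stage 2: rank the surviving candidates by visual similarity."""
    scored = [
        (i, cosine(video_visual_emb, image_visual_embs[i]))
        for i in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def align(video_text_emb, video_visual_emb,
          image_text_embs, image_visual_embs, threshold=0.8):
    """Run both stages for a single video-text item."""
    candidates = filter_by_text(video_text_emb, image_text_embs, threshold)
    return rank_by_visual(video_visual_emb, image_visual_embs, candidates)


# Toy placeholder embeddings (stand-ins for BERT and CLIP outputs).
image_text_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
image_visual_embs = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
video_text_emb = [1.0, 0.0]
video_visual_emb = [1.0, 0.0]

candidates = filter_by_text(video_text_emb, image_text_embs, 0.8)
ranking = align(video_text_emb, video_visual_emb,
                image_text_embs, image_visual_embs)
print(candidates)
print([i for i, _ in ranking])
```

In this toy data, item 2 matches the video visually but is removed in the text stage, which shows why the choice of text used for filtering (raw or pure) can change the final retrieval result.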