Alignment of Image-Text and Video-Text Datasets (Görüntü-Metin ve Video-Metin Veri Kümelerinin Hizalanması)


Özköse Y. E., Gökçe Z., Duygulu Şahin P.

31st IEEE Conference on Signal Processing and Communications Applications, SIU 2023, İstanbul, Türkiye, 5-8 July 2023

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1109/siu59756.2023.10224043
  • City of Publication: İstanbul
  • Country of Publication: Türkiye
  • Keywords: dataset alignment, deep learning, machine learning
  • Hacettepe University Affiliation: Yes

Abstract

In this study, we investigate the alignment of video-text and image-text datasets. First, similarities are computed between the texts of the two datasets. These text similarities are used to select a subset of candidate pairs, and a retrieval step based on visual similarity is then applied within that subset. Text embeddings are obtained with a BERT-based method, applied to both the raw and the cleaned texts. For visual features, video frames are represented with object-based and CLIP-based descriptors. The results show that alignment with CLIP features performs best on the subset obtained by filtering with raw-text similarities.
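The two-stage procedure in the abstract (filter by text similarity, then retrieve by visual similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the text embeddings (e.g. from a BERT-based encoder) and visual features (CLIP-based or object-based) are already computed and passed in as plain vectors, it mean-pools per-frame features into one video vector, and it uses a hypothetical top-`k` cutoff for the text-filtering step. The paper may use a different filter (for instance a similarity threshold) or a different frame aggregation; the names `align`, `mean_pool`, and `k` are illustrative only.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_pool(frames: List[Vector]) -> List[float]:
    """Average per-frame visual features into one video-level vector (assumed pooling choice)."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def align(
    video_text: Dict[str, Vector],          # hypothetical: video id -> text embedding
    video_frames: Dict[str, List[Vector]],  # hypothetical: video id -> per-frame visual features
    image_text: Dict[str, Vector],          # hypothetical: image id -> caption embedding
    image_visual: Dict[str, Vector],        # hypothetical: image id -> visual feature
    k: int = 2,                             # hypothetical size of the text-filtered subset
) -> Dict[str, str]:
    """Match each video to one image: text-similarity filtering, then visual retrieval."""
    matches: Dict[str, str] = {}
    for vid, t_emb in video_text.items():
        # Stage 1: keep the k images whose caption embeddings are closest to the video's text.
        candidates = sorted(
            image_text, key=lambda img: cosine(t_emb, image_text[img]), reverse=True
        )[:k]
        # Stage 2: within that subset, pick the image with the most similar visual feature.
        v_emb = mean_pool(video_frames[vid])
        matches[vid] = max(candidates, key=lambda img: cosine(v_emb, image_visual[img]))
    return matches


if __name__ == "__main__":
    # Toy vectors standing in for real BERT / CLIP outputs.
    video_text = {"v1": [1.0, 0.0, 0.0]}
    video_frames = {"v1": [[0.0, 1.0], [0.0, 0.8]]}
    image_text = {"a": [0.9, 0.1, 0.0], "b": [0.8, 0.2, 0.0], "c": [0.0, 0.0, 1.0]}
    image_visual = {"a": [1.0, 0.0], "b": [0.1, 1.0], "c": [0.0, 1.0]}
    print(align(video_text, video_frames, image_text, image_visual, k=2))  # {'v1': 'b'}
```

In the toy example, image `c` is visually the closest to the video but is never considered, because its caption is dissimilar and the text filter removes it. This illustrates why the way the subset is built (for example, from raw rather than cleaned text) can change the final alignment even when the visual features are fixed.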