Automatic Data Augmentation for Cooking Videos


Ozkose Y. E., DUYGULU ŞAHİN P.

32nd IEEE Signal Processing and Communications Applications Conference (SIU), Mersin, Türkiye, 15 - 18 May 2024, (Full Text Paper)

  • Publication Type: Conference Paper / Full Text Paper
  • DOI Number: 10.1109/siu61531.2024.10600845
  • City of Publication: Mersin
  • Country of Publication: Türkiye
  • Affiliated with Hacettepe University: Yes

Abstract

Text-video joint models trained on state-of-the-art text-video datasets perform well on general text-video datasets that contain general actions. However, domain-specific datasets are hard to build because of data collection and labeling costs. In this work, we propose a pipeline to automatically extend YouCook2, a dataset collected for describing and retrieving cooking videos. Related videos are found using 89 recipe categories, segments are then prepared by random sampling, and segment-text pairs are obtained with a captioning model. Experiments on the extended dataset show that the automatically created data improves recall results. Code and data are made available at https://github.com/EmreOzkose/hucook
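
The segment preparation and captioning steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (see the repository above for that). The function names `sample_segments` and `build_pairs`, the segment length bounds, and the `caption_fn` callable standing in for the captioning model are all assumptions made for illustration.

```python
import random


def sample_segments(duration, n_segments, min_len=5.0, max_len=20.0, seed=None):
    """Randomly sample (start, end) segments, in seconds, inside one video.

    The length bounds are illustrative assumptions, not values from the paper.
    """
    if duration < min_len:
        raise ValueError("video is shorter than the minimum segment length")
    rng = random.Random(seed)
    segments = []
    for _ in range(n_segments):
        # Draw a segment length, capped so it always fits inside the video.
        length = rng.uniform(min_len, min(max_len, duration))
        # Draw a start time that leaves room for the whole segment.
        start = rng.uniform(0.0, duration - length)
        segments.append((start, start + length))
    return sorted(segments)


def build_pairs(video_id, duration, caption_fn, n_segments=4, seed=None):
    """Pair each sampled segment with a caption.

    caption_fn is a hypothetical callable (video_id, start, end) -> str that
    stands in for whatever captioning model is used.
    """
    pairs = []
    for start, end in sample_segments(duration, n_segments, seed=seed):
        pairs.append({
            "video_id": video_id,
            "start": start,
            "end": end,
            "caption": caption_fn(video_id, start, end),
        })
    return pairs
```

For example, `build_pairs("some_video", 120.0, my_captioner, n_segments=4)` would return four segment-text records for a two-minute video, where `my_captioner` is a placeholder for the user's own captioning function.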