Few-shot Audio Classification using Contrastive Training


Cigdem E. F., YALIM KELEŞ H.

32nd IEEE Signal Processing and Communications Applications Conference (SIU), Mersin, Türkiye, 15 - 18 May 2024

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1109/siu61531.2024.10600788
  • City of Publication: Mersin
  • Country of Publication: Türkiye
  • Hacettepe University Affiliation: Yes

Abstract

This study addresses few-shot audio classification in scenarios with limited labeled data and presents experiments on the GSC and ESC-50 audio datasets. Three experimental setups are examined, built around training scenarios with 5, 10, and 15 samples. In all of these experiments, 5-way classification accuracy is compared using 1 and 5 audio recordings per class (1-shot and 5-shot). Models were trained with three different loss functions, and the effect of simple feature transformations on classification performance was assessed for each training configuration. The findings indicate that these feature transformations improve classification accuracy on both datasets. Notably, the hybrid approach, which jointly optimizes a contrastive loss and a few-shot cross-entropy loss, achieved the highest classification performance in the fine-tuned scenario. In the 5-way 5-shot tests, accuracies ranged from 86% to 91% on ESC-50 and from 91% to 95% on GSC, depending on the number of samples used in training.
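The abstract does not specify which contrastive loss, which few-shot cross-entropy, or how the two terms are weighted. The following is therefore only a minimal NumPy sketch of what such a hybrid objective could look like for a single 5-way episode, under stated assumptions: a SupCon-style supervised contrastive loss over all episode embeddings, a prototypical-network-style cross-entropy over negative squared Euclidean distances to class prototypes, and a hypothetical weight `lam`. All function names, `temperature`, `lam`, and the random embeddings are illustrative and not taken from the paper.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Project embeddings onto the unit sphere so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits, axis=-1):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def supervised_contrastive_loss(z, labels, temperature=0.1):
    # Assumption: SupCon-style loss; every other sample with the same label is a positive.
    z = l2_normalize(z)
    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, z @ z.T / temperature)  # drop self-similarity
    log_prob = log_softmax(sim, axis=1)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0  # anchors without positives are skipped
    # Mean log-probability of the positives for each anchor.
    mean_log_pos = np.where(pos_mask, log_prob, 0.0).sum(axis=1)[valid] / pos_counts[valid]
    return -mean_log_pos.mean()

def prototypical_cross_entropy(support, support_labels, query, query_labels, n_way):
    # Assumption: class prototypes are mean support embeddings; logits are
    # negative squared Euclidean distances from each query to each prototype.
    prototypes = np.stack([support[support_labels == c].mean(axis=0) for c in range(n_way)])
    dists = ((query[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    log_p = log_softmax(-dists, axis=1)
    return -log_p[np.arange(len(query)), query_labels].mean()

def hybrid_loss(support, support_labels, query, query_labels, n_way, lam=0.5, temperature=0.1):
    # Hypothetical combination: few-shot CE plus a weighted contrastive term.
    z = np.concatenate([support, query])
    y = np.concatenate([support_labels, query_labels])
    ce = prototypical_cross_entropy(support, support_labels, query, query_labels, n_way)
    con = supervised_contrastive_loss(z, y, temperature)
    return ce + lam * con

# Usage on one random 5-way 5-shot episode with 3 queries per class and 64-d embeddings.
rng = np.random.default_rng(0)
n_way, k_shot, n_query, dim = 5, 5, 3, 64
support_labels = np.repeat(np.arange(n_way), k_shot)
query_labels = np.repeat(np.arange(n_way), n_query)
support = rng.normal(size=(n_way * k_shot, dim))
query = rng.normal(size=(n_way * n_query, dim))
loss = hybrid_loss(support, support_labels, query, query_labels, n_way)
print(loss)
```

In a real training loop the embeddings would come from an audio encoder and the loss would be minimized with an autodiff framework; NumPy is used here only to keep the sketch self-contained.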