Towards Better Communication: Refining Hand Pose Estimation in Low-Resolution Sign Language Videos


Tasyurek U. M., Kiziltepe T., Yalım Keleş H.

18th International Conference on Automatic Face and Gesture Recognition (FG), İstanbul, Türkiye, 27–31 May 2024

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1109/fg59268.2024.10582003
  • City: İstanbul
  • Country: Türkiye
  • Affiliated with Hacettepe University: Yes

Abstract

In this study, we present a novel methodology that enhances hand keypoint extraction in low-resolution sign language datasets, a challenge that has remained largely unexplored in sign language research. By addressing the limitations of existing pose extraction models such as OpenPose and MediaPipe, which frequently struggle to accurately detect hand keypoints in low-resolution footage, our method marks a notable advancement in this specialized field. Our methodology adapts the U-Net and Attention U-Net architectures to improve the resolution of sign language videos while reducing undetected hand presence (UHP) in low-resolution footage. The key innovation is a progressive training procedure that focuses on hand movements, utilizing datasets from the SRF DSGS and ShowTV Main News domains. Through comprehensive experimentation and cross-dataset evaluations, our findings demonstrate a significant reduction in the UHP ratio, most notably for the Attention U-Net model trained with our proposed loss function, which is tailored to enhance hand keypoint detection. In benchmark tests using low-resolution TV news broadcasts, our fine-tuned models, particularly the BWA-UNet, showed marked improvements in hand keypoint accuracy compared to standard upsampling methods. These results underscore the effectiveness of our approach in practical, real-world scenarios, highlighting its potential to substantially improve hand keypoint detection in sign language videos.
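Since the paper's exact definition of the UHP metric is not reproduced here, the sketch below shows one plausible way to compute such a ratio, assuming UHP is the fraction of video frames in which a keypoint extractor returns no hand landmarks, and using MediaPipe Hands (one of the extractors named in the abstract) as the detector. The function name `uhp_ratio` and its parameters are illustrative, not taken from the paper.

```python
import cv2
import mediapipe as mp

def uhp_ratio(video_path: str) -> float:
    """Return the fraction of frames in which no hand is detected
    (a proxy for the undetected hand presence ratio)."""
    cap = cv2.VideoCapture(video_path)
    total_frames = 0
    undetected_frames = 0
    # MediaPipe Hands in video mode; thresholds are illustrative defaults.
    with mp.solutions.hands.Hands(static_image_mode=False,
                                  max_num_hands=2,
                                  min_detection_confidence=0.5) as hands:
        while True:
            ok, frame_bgr = cap.read()
            if not ok:
                break
            total_frames += 1
            # OpenCV decodes frames as BGR; MediaPipe expects RGB input.
            results = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
            if not results.multi_hand_landmarks:
                undetected_frames += 1
    cap.release()
    return undetected_frames / total_frames if total_frames else 0.0

# Example: compare UHP on a low-resolution clip and its super-resolved version.
# print(uhp_ratio("news_clip_lowres.mp4"), uhp_ratio("news_clip_sr.mp4"))
```

A drop in this ratio on super-resolved frames relative to naively upsampled ones is the kind of improvement the abstract reports for the fine-tuned models.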