18th International Conference on Automatic Face and Gesture Recognition (FG), İstanbul, Türkiye, 27-31 May 2024
In this study, we present a novel methodology that enhances hand keypoint extraction in low-resolution sign language datasets, a challenge that has remained largely unexplored in sign language research. Existing pose extraction models such as OpenPose and MediaPipe frequently fail to detect hand keypoints accurately in low-resolution footage; by addressing this limitation, our method marks a notable advance in this specialized field. Our methodology adapts the U-Net and Attention U-Net architectures to increase the resolution of sign language videos while reducing undetected hand presence (UHP) in low-resolution footage. The key innovation is a progressive training procedure that concentrates on hand movements, using datasets from the SRF DSGS and ShowTV Main News domains. Through comprehensive experiments and cross-dataset evaluations, we demonstrate a significant reduction in the UHP ratio, most notably for the Attention U-Net model trained with our proposed loss function, which is tailored to improve hand keypoint detection. In benchmark tests on low-resolution TV news broadcasts, our fine-tuned models, particularly the BWA-UNet, show marked improvements in hand keypoint accuracy over standard upsampling methods. These results underscore the effectiveness of our approach in practical, real-world scenarios and highlight its potential to substantially improve hand keypoint detection in sign language videos.
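The abstract does not spell out the form of the hand-tailored loss, but the idea of upweighting reconstruction error in hand regions can be illustrated concretely. Below is a minimal PyTorch sketch, assuming a super-resolution setup where a binary hand mask is available per frame; the function name `hand_weighted_l1_loss` and the `hand_weight` parameter are hypothetical and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def hand_weighted_l1_loss(sr_frame, hr_frame, hand_mask, hand_weight=5.0):
    """Hypothetical hand-weighted reconstruction loss (illustrative only).

    sr_frame:    super-resolved output, shape (B, C, H, W)
    hr_frame:    ground-truth high-resolution frame, shape (B, C, H, W)
    hand_mask:   binary mask of hand regions, shape (B, 1, H, W)
    hand_weight: extra weight applied to pixels inside hand regions
    """
    # Per-pixel L1 error between the reconstruction and the ground truth.
    pixel_err = F.l1_loss(sr_frame, hr_frame, reduction="none")

    # Pixels outside hands keep weight 1.0; hand pixels get hand_weight,
    # steering the network toward sharper hand reconstructions.
    weights = 1.0 + (hand_weight - 1.0) * hand_mask
    return (weights * pixel_err).mean()
```

A loss of this shape would bias the U-Net or Attention U-Net toward recovering hand detail, which is consistent with the reported reduction in the UHP ratio, though the paper's actual formulation may differ.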