Pattern Recognition, vol. 151, 2024 (SCI-Expanded)
There are over 150 sign languages worldwide, each with numerous local variants and thousands of signs. However, collecting annotated data to train a model for each sign language is laborious and depends on expert annotators. To address this issue, this paper introduces the problem of few-shot sign language recognition (FSSLR) in a cross-lingual setting. The central goal is to recognize a novel sign from a small set of examples, even if it belongs to a sign language unseen during training. To tackle this problem, we propose a novel embedding-based framework that extracts a spatio-temporal visual representation based on video and hand features, as well as hand landmark estimates. To establish a comprehensive test bed, we propose three meta-learning FSSLR benchmarks that span multiple languages, and we extensively evaluate the proposed framework on them. The experimental results demonstrate the effectiveness and superiority of the proposed approach for few-shot sign language recognition in both monolingual and cross-lingual settings.