Isolated Sign Language Recognition with Multi-scale Features using LSTM

Mercanoglu Sincan O., Tur A. O., Yalim Keles H.

27th Signal Processing and Communications Applications Conference (SIU), Sivas, Turkey, 24 - 26 April 2019

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1109/siu.2019.8806467
  • City: Sivas
  • Country: Turkey
  • Keywords: convolutional neural networks, long short-term memory, feature pooling module, sign language recognition


Sign language recognition systems automatically convert signs in video streams to text. In this work, an original isolated sign language recognition model is built from Convolutional Neural Networks (CNNs), a Feature Pooling Module (FPM), and Long Short-Term Memory networks (LSTMs). In the CNN part, a pre-trained VGG-16 model, with its weights adapted to the dataset, is used identically in two parallel branches that extract features from the color (RGB) and depth streams. The extracted features are passed to the FPM to generate multi-scale features. The feature matrices are then reduced to representative feature vectors using Global Average Pooling (GAP). The feature vectors obtained from the RGB and depth streams are concatenated, instance-normalized, and passed to the LSTM. The proposed model achieves 93.15% test accuracy on the Montalbano Italian sign language dataset, which is comparable to recent state-of-the-art methods.
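
The per-frame fusion step described in the abstract (GAP on each stream, concatenation, then instance normalization) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names, the toy feature maps, the `eps` value, and the choice to normalize each fused vector over its own elements are all assumptions made for illustration. The VGG-16 branches, the FPM, and the LSTM are omitted; the nested lists stand in for feature maps that those stages would produce.

```python
from statistics import fmean, pstdev


def global_average_pool(feature_maps):
    """Reduce C feature maps (each H x W) to a C-dimensional vector of means."""
    return [fmean(v for row in fmap for v in row) for fmap in feature_maps]


def instance_normalize(vector, eps=1e-5):
    """Shift one vector to zero mean and scale it to roughly unit variance.

    Assumption: statistics are computed over the elements of this single
    vector; eps is a hypothetical stabilizer for near-constant vectors.
    """
    mean = fmean(vector)
    std = (pstdev(vector) ** 2 + eps) ** 0.5
    return [(v - mean) / std for v in vector]


def fuse_frame(rgb_maps, depth_maps):
    """Pool both streams, concatenate them, and normalize the result."""
    fused = global_average_pool(rgb_maps) + global_average_pool(depth_maps)
    return instance_normalize(fused)


# Toy stand-ins for one frame: two 2x2 RGB feature maps, one 2x2 depth map.
rgb_maps = [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 2.0], [2.0, 2.0]]]
depth_maps = [[[0.0, 4.0], [4.0, 8.0]]]

pooled_rgb = global_average_pool(rgb_maps)
fused = fuse_frame(rgb_maps, depth_maps)
```

In a full model, one such fused vector per video frame would form the input sequence to the LSTM; that recurrent stage is outside the scope of this sketch.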