27th Signal Processing and Communications Applications Conference (SIU), Sivas, Türkiye, 24-26 April 2019
Sign language recognition systems automatically convert the signs in video streams to text. In this work, an original isolated sign language recognition model is built from Convolutional Neural Networks (CNNs), a Feature Pooling Module (FPM), and Long Short-Term Memory networks (LSTMs). In the CNN part, a pre-trained VGG-16 model, with its weights adapted to the dataset, is used identically in two parallel branches that extract features from the color (RGB) and depth streams. The extracted features are fed to the FPM to generate multi-scale features, and the resulting feature matrices are reduced to representative feature vectors by Global Average Pooling (GAP). The RGB and depth feature vectors are then concatenated and, after instance normalization, passed to the LSTM. The proposed model achieves 93.15% test accuracy on the Montalbano Italian sign language dataset, which is comparable to recent state-of-the-art methods.
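The sketch below illustrates the described pipeline in PyTorch. It is a minimal, assumption-laden rendering, not the paper's implementation: the FPM is approximated by parallel poolings at several scales followed by 1x1 convolutions, the layer sizes and class names (FeaturePoolingModule, RGBDSignModel) are illustrative, and hyperparameters such as the number of scales and LSTM hidden size are placeholders.

```python
# Minimal PyTorch sketch of the described RGB + depth pipeline.
# Assumptions: FPM = parallel average poolings at several scales + 1x1 convs;
# all layer sizes are illustrative, not the paper's exact values.
import torch
import torch.nn as nn
from torchvision import models


class FeaturePoolingModule(nn.Module):
    """Multi-scale feature pooling (assumed design, not the paper's exact FPM)."""
    def __init__(self, in_channels=512, out_channels=128, scales=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(s),                     # pool to an s x s grid
                nn.Conv2d(in_channels, out_channels, kernel_size=1),
                nn.ReLU(inplace=True),
            )
            for s in scales
        ])

    def forward(self, x):
        # Global Average Pooling of each branch yields one vector per scale.
        pooled = [branch(x).mean(dim=(2, 3)) for branch in self.branches]
        return torch.cat(pooled, dim=1)                      # (batch, out_channels * len(scales))


class RGBDSignModel(nn.Module):
    def __init__(self, num_classes=20, feat_dim=384, hidden=256):
        super().__init__()
        # Two identical pre-trained VGG-16 backbones, one per stream.
        self.rgb_cnn = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features
        self.depth_cnn = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features
        self.rgb_fpm = FeaturePoolingModule()
        self.depth_fpm = FeaturePoolingModule()
        self.inorm = nn.InstanceNorm1d(2 * feat_dim)         # instance normalization over time
        self.lstm = nn.LSTM(2 * feat_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, rgb, depth):
        # rgb, depth: (batch, time, 3, H, W) clips of the isolated sign
        b, t = rgb.shape[:2]
        rgb, depth = rgb.flatten(0, 1), depth.flatten(0, 1)
        f_rgb = self.rgb_fpm(self.rgb_cnn(rgb))              # (b*t, feat_dim)
        f_depth = self.depth_fpm(self.depth_cnn(depth))      # (b*t, feat_dim)
        feats = torch.cat([f_rgb, f_depth], dim=1).view(b, t, -1)
        feats = self.inorm(feats.transpose(1, 2)).transpose(1, 2)
        out, _ = self.lstm(feats)
        return self.classifier(out[:, -1])                   # class logits for the clip
```

With 224x224 inputs, VGG-16 features are 512x7x7, so each stream's FPM output is 3 x 128 = 384 values; concatenating the two streams gives the 768-dimensional per-frame vector fed to the LSTM in this sketch.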