Hand and Face Segmentation with Deep Convolutional Networks using Limited Labelled Data

Sincan O. M., Gencoglu S., Bacak M., Keles H.

3rd International Symposium on Multidisciplinary Studies and Innovative Technologies, ISMSIT 2019, Ankara, Türkiye, 11 - 13 October 2019

Publication Type: Conference Paper / Full-Text Paper
DOI Number: 10.1109/ismsit.2019.8932835
City of Publication: Ankara
Country of Publication: Türkiye
Keywords: CNN, deep learning, face segmentation, gesture segmentation, sign language, U-net, VGG
Affiliated with Hacettepe University: No

Abstract

© 2019 IEEE. Segmentation is a crucial step in many classification problems. Many researchers have approached it with classical computer vision methods, but deep learning approaches have recently been used more frequently across domains. In this paper, we propose two segmentation networks that mark the face and hands in static images for sign language recognition using only a small amount of training data. Both networks have an encoder-decoder structure built from convolutional, max pooling, and upsampling layers; the first is a U-Net-based network and the second is a VGG-based network. We evaluate our models on two sign language datasets: our Ankara University Turkish Sign Language dataset (AU-TSL) and the Montalbano Italian gesture dataset. Both datasets contain background and illumination variations and were recorded with different signers. We train our models using only 400 images randomly selected from video frames. Our experiments show that even when we halve the training data, we still obtain satisfactory results. The proposed methods achieve more than 98% precision using 400 frames on both datasets. Our code is available at https://github.com/au-cvml-lab/Hands-and-Face-Segmentation-With-Limited-Data.
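
The evaluation setup described in the abstract (a random selection of 400 frames, a halved training set, and a precision score) can be illustrated with a minimal sketch. It is not the authors' code. The function names `sample_frames` and `pixel_precision` are hypothetical, the pool of 1000 frames is an arbitrary toy figure, and precision is assumed here to be pixel-wise over binary masks, which the abstract does not specify.

```python
import random


def sample_frames(frame_ids, k, seed=0):
    """Pick k distinct frame ids uniformly at random (seeded for repeatability)."""
    rng = random.Random(seed)
    return sorted(rng.sample(list(frame_ids), k))


def pixel_precision(pred_mask, true_mask):
    """Pixel-wise precision TP / (TP + FP) for two binary masks of equal shape.

    Assumption: 1 marks a hand/face pixel and 0 marks background.
    """
    tp = fp = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p == 1:
                if t == 1:
                    tp += 1
                else:
                    fp += 1
    # Return 0.0 when nothing was predicted as foreground.
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy masks: 4 pixels predicted as foreground, 3 of them correct.
true_mask = [[0, 1, 1],
             [0, 1, 0]]
pred_mask = [[0, 1, 1],
             [1, 1, 0]]
precision = pixel_precision(pred_mask, true_mask)

# Draw 400 training frames from a hypothetical pool of 1000 frame ids,
# then a random half of those for the reduced-data experiment.
train_ids = sample_frames(range(1000), 400)
half_ids = sample_frames(train_ids, 200, seed=1)
```

For real segmentation outputs the same quantity would normally be computed over arrays with a numerical library; plain lists are used here only to keep the sketch dependency-free.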