Using Motion History Images with 3D Convolutional Networks in Isolated Sign Language Recognition


Creative Commons License

Mercanoglu Sincan O., Keles H.

IEEE Access, vol.10, pp.18608-18618, 2022 (SCI-Expanded, Scopus)

  • Publication Type: Article
  • Volume: 10
  • Publication Date: 2022
  • Doi Number: 10.1109/access.2022.3151362
  • Journal Name: IEEE Access
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Page Numbers: pp.18608-18618
  • Keywords: 3D-CNN, attention, deep learning, motion history image, sign language recognition
  • Open Archive Collection: AVESIS Open Access Collection
  • Hacettepe University Affiliated: Yes

Abstract

© 2013 IEEE. Sign language recognition using computational models is a challenging problem that requires simultaneous spatio-temporal modeling of multiple sources, e.g., the face, hands, and body. In this paper, we propose an isolated sign language recognition approach built on a model trained with Motion History Images (MHI) that are generated from RGB video frames. RGB-MHI images effectively represent a spatio-temporal summary of each sign video in a single RGB image. We propose two different approaches using this RGB-MHI model. In the first approach, we use the RGB-MHI model as a motion-based spatial attention module integrated into a 3D-CNN architecture. In the second approach, we combine the RGB-MHI model features with the features of a 3D-CNN model using a late fusion technique. We perform extensive experiments on two recently released large-scale isolated sign language datasets, namely AUTSL and BosphorusSign22k. Our experiments show that our models, which use only RGB data, can compete with state-of-the-art models in the literature that use multi-modal data.
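
To illustrate how a motion history image condenses a video into a single image, the following is a minimal sketch of the classic grayscale MHI update rule, in which pixels that moved in the latest frame are set to a maximum value and all other pixels decay by one step. It is not the authors' RGB-MHI procedure, which the abstract does not specify. The function name `motion_history_image` and the parameters `threshold` and `duration` are assumptions made for this example.

```python
import numpy as np


def motion_history_image(frames, threshold=30, duration=None):
    """Collapse a grayscale clip of shape (T, H, W) into one MHI in [0, 1].

    Hypothetical helper: classic frame-differencing MHI, not the
    RGB-MHI variant described in the paper.
    """
    # Use a signed type so that frame differences cannot wrap around.
    frames = np.asarray(frames, dtype=np.int16)
    num_frames = frames.shape[0]
    if duration is None:
        # Assumption: let motion from the first frame pair remain visible.
        duration = num_frames - 1

    mhi = np.zeros(frames.shape[1:], dtype=np.float32)
    for t in range(1, num_frames):
        # A pixel counts as "moving" if its intensity changed enough.
        moving = np.abs(frames[t] - frames[t - 1]) > threshold
        # Moving pixels are reset to the maximum; the rest decay by one.
        mhi = np.where(moving, float(duration), np.maximum(mhi - 1.0, 0.0))

    # Normalize so the most recent motion maps to 1.0.
    return mhi / max(duration, 1)
```

Given a clip in which a bright dot moves across the image, the returned array is brightest where the dot was most recently and fades along its earlier path, which is the sense in which one image summarizes both where and when motion occurred.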