Sign Language Video Synthesis using Skeleton Sequence / Skelet Sekans ile İşaret Dili Videosu Sentezleme


Gencoglu S., Keles H.

28th Signal Processing and Communications Applications Conference (SIU 2020), Gaziantep, Turkey, 5-7 October 2020

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1109/siu49456.2020.9302436
  • City: Gaziantep
  • Country: Turkey
  • Keywords: conditional generative adversarial networks, convolutional neural networks, generative adversarial networks, video-to-video synthesis
  • Hacettepe University Affiliated: No

Abstract

© 2020 IEEE. Generative Adversarial Networks (GANs) can generate realistic synthetic images. However, the majority of research in this domain focuses on the image-to-image synthesis problem. The aim of this study is to develop a model that generates high-quality video frames with true motion dynamics, using only a reference image frame and a skeleton sequence. To this end, the Ankara University Turkish Sign Language dataset is used to synthesize new sign videos from a reference frame of a given signer and a skeleton stream. To solve this challenging problem, a conditional GAN is designed in which the skeletal data serves as the condition. With the trained model, we are able to generate sign video streams of the given signer in which the motion dynamics are encoded successfully and smoothly. We also evaluated the quality of the generated images using the Fréchet Inception Distance (FID) metric and obtained an FID score of 26.
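The reported FID score compares the statistics of real and generated frames. In the standard definition, feature vectors are extracted from both image sets, each set is modeled as a Gaussian with mean μ and covariance Σ, and the distance is ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). The following is a minimal sketch of that formula only, not the authors' evaluation code. It assumes the feature vectors have already been extracted (standard FID uses Inception-v3 pooling features, which this sketch does not compute), and the function names are hypothetical.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative eigenvalues caused by round-off
    return (vecs * np.sqrt(vals)) @ vecs.T  # V diag(sqrt(vals)) V^T


def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1 = np.atleast_1d(np.asarray(mu1, dtype=float))
    mu2 = np.atleast_1d(np.asarray(mu2, dtype=float))
    sigma1 = np.atleast_2d(np.asarray(sigma1, dtype=float))
    sigma2 = np.atleast_2d(np.asarray(sigma2, dtype=float))
    diff = mu1 - mu2
    # Tr((S1 S2)^{1/2}) equals Tr((S1^{1/2} S2 S1^{1/2})^{1/2}); the inner matrix
    # is symmetric PSD, so its eigenvalues are real and non-negative.
    root1 = _sqrtm_psd(sigma1)
    middle = root1 @ sigma2 @ root1
    middle = (middle + middle.T) / 2.0  # re-symmetrize against round-off
    eigs = np.clip(np.linalg.eigvalsh(middle), 0.0, None)
    tr_covmean = float(np.sum(np.sqrt(eigs)))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


def fid_from_features(feats_real, feats_fake) -> float:
    """FID from two (N, D) arrays of pre-extracted feature vectors (one row per image)."""
    feats_real = np.asarray(feats_real, dtype=float)
    feats_fake = np.asarray(feats_fake, dtype=float)
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sig_r = np.cov(feats_real, rowvar=False)  # sample covariance over images
    sig_f = np.cov(feats_fake, rowvar=False)
    return frechet_distance(mu_r, sig_r, mu_f, sig_f)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 8))          # stand-in features, not real data
    fake = real + 0.5                         # shifted copy of the same features
    print(fid_from_features(real, real))      # close to 0 for identical sets
    print(fid_from_features(real, fake))      # close to 8 * 0.5**2 = 2.0
```

Lower values indicate that the generated frames are statistically closer to the real ones. Computing the matrix square root through the symmetric form S1^{1/2} S2 S1^{1/2} keeps the eigenvalues real, which avoids the small imaginary residues a general matrix square root can produce.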