NOVAction23: Addressing the data diversity gap by uniquely generated synthetic sequences for real-world human action recognition


Tasoren A. E., Çelikcan U.

Computers and Graphics (Pergamon), vol. 118, pp. 1-10, 2024 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 118
  • Publication Date: 2024
  • DOI: 10.1016/j.cag.2023.10.011
  • Journal Name: Computers and Graphics (Pergamon)
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, Civil Engineering Abstracts
  • Page Numbers: pp. 1-10
  • Keywords: Data diversity gap, Human action recognition, Procedural generation, Synthetic data
  • Hacettepe University Affiliated: Yes

Abstract

Recognition of human actions using machine learning requires extensive datasets to develop robust models. Nevertheless, obtaining real-world data presents challenges due to the costly and time-consuming process involved. Additionally, existing datasets mostly contain indoor videos due to the challenges of capturing pose data outdoors. Synthetic data have been used to overcome these difficulties, yet the currently available synthetic datasets for human action recognition lack photorealism and diversity in their features. Addressing these shortcomings, we develop the NOVAction engine to generate highly diversified and photorealistic synthetic human action sequences. We use NOVAction to create the NOVAction23 dataset comprising 25,415 human action sequences with corresponding poses and labels (available at https://github.com/celikcan-cglab/NOVAction23). In NOVAction23, the performed motions and viewpoints are varied on the fly through procedural generation to ensure that, for a given action class, each generated sequence features a distinct motion performed by one of the 1,105 synthetic humans and captured from a unique viewpoint. Moreover, each synthetic human is unique in terms of body shape (height and weight), skin tone, gender, hair, facial hair, clothing, shoes, and accessories. To further increase data diversity, the motion sequences are rendered under various weather conditions and at different times of day, across three outdoor and two indoor settings. We evaluate NOVAction23 by training three state-of-the-art recognizers on it, in addition to the NTU 120 dataset, and corroborating the results with real-world videos from YouTube. Our results confirm that the NOVAction23 dataset can improve the performance of state-of-the-art human action recognition models.
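
To illustrate the kind of per-sequence procedural variation the abstract describes (distinct motion, actor, viewpoint, setting, weather, and time of day for every generated sequence), the sketch below samples one hypothetical rendering configuration. The field names, value ranges, and category lists are illustrative assumptions for exposition only, not the actual NOVAction engine interface or the schema of the released dataset.

```python
# Hypothetical sketch of per-sequence procedural variation as described in the
# abstract; all names, ranges, and categories here are illustrative assumptions.
import random
from dataclasses import dataclass


@dataclass
class SequenceConfig:
    action_class: str
    actor_id: int                # one of the 1,105 unique synthetic humans
    height_cm: float             # body-shape variation
    weight_kg: float
    camera_azimuth_deg: float    # unique viewpoint per sequence
    camera_elevation_deg: float
    setting: str                 # three outdoor and two indoor settings
    weather: str
    time_of_day: str


def sample_config(action_class: str, rng: random.Random) -> SequenceConfig:
    """Draw one randomized configuration so each generated sequence is distinct."""
    return SequenceConfig(
        action_class=action_class,
        actor_id=rng.randrange(1105),
        height_cm=rng.uniform(150.0, 195.0),
        weight_kg=rng.uniform(45.0, 110.0),
        camera_azimuth_deg=rng.uniform(0.0, 360.0),
        camera_elevation_deg=rng.uniform(0.0, 45.0),
        setting=rng.choice(["park", "street", "courtyard", "office", "living_room"]),
        weather=rng.choice(["clear", "overcast", "rain", "fog"]),
        time_of_day=rng.choice(["morning", "noon", "evening", "night"]),
    )


if __name__ == "__main__":
    rng = random.Random(23)
    print(sample_config("waving", rng))
```

In this kind of pipeline, the sampled configuration would be handed to the renderer for each sequence, so diversity comes from drawing a fresh configuration every time rather than from re-recording real footage.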