Learning from Synthetic Human Group Activities

Publication Date: 6/17/2024

Event: CVPR 2024

Reference: pp. 21922-21932, 2024

Authors: Che-Jui Chang, Rutgers University; Danrui Li, Rutgers University; Deep Patel, NEC Laboratories America, Inc.; Parth Goel, Rutgers University; Honglu Zhou, NEC Laboratories America, Inc.; Seonghyeon Moon, Rutgers University; Samuel S. Sohn, Rutgers University; Sejong Yoon, The College of New Jersey; Vladimir Pavlovic, Rutgers University; Mubbasir Kapadia, Rutgers University

Abstract: The study of complex human interactions and group activities has become a focal point in human-centric computer vision. However, progress in related tasks is often hindered by the challenges of obtaining large-scale labeled datasets from real-world scenarios. To address this limitation, we introduce M3Act, a synthetic data generator for multi-view multi-group multi-person human atomic actions and group activities. Powered by the Unity Engine, M3Act features multiple semantic groups, highly diverse and photorealistic images, and a comprehensive set of annotations, which facilitates the learning of human-centered tasks across single-person, multi-person, and multi-group conditions. We demonstrate the advantages of M3Act across three core experiments. The results suggest that our synthetic dataset can significantly improve the performance of several downstream methods and can replace real-world datasets to reduce cost. Notably, M3Act improves the state-of-the-art MOTRv2 on the DanceTrack dataset, leading to a hop on the leaderboard from 10th to 2nd place. Moreover, M3Act opens new research directions for controllable 3D group activity generation. We define multiple metrics and propose a competitive baseline for this novel task. Our code and data are available at our project page: http://cjerry1243.github.io/M3Act.

Publication Link: https://openaccess.thecvf.com/content/CVPR2024/papers/Chang_Learning_from_Synthetic_Human_Group_Activities_CVPR_2024_paper.pdf