Pose Tracking refers to the process of determining the spatial position and orientation of an object or entity in a given environment over time. This involves tracking the object’s movements, rotations, and changes in position relative to a reference point or coordinate system. Pose tracking has applications in various fields, including computer vision, robotics, augmented reality, and human-computer interaction.

Posts

Learning from Synthetic Human Group Activities

The study of complex human interactions and group activities has become a focal point in human-centric computer vision. However, progress in related tasks is often hindered by the challenges of obtaining large-scale labeled datasets from real-world scenarios. To address the limitation, we introduce M3Act, a synthetic data generator for multi-view multi-group multi-person human atomic actions and group activities. Powered by Unity Engine, M3Act features multiple semantic groups, highly diverse and photorealistic images, and a comprehensive set of annotations, which facilitates the learning of human-centered tasks across single-person, multi-person, and multi-group conditions. We demonstrate the advantages of M3Act across three core experiments. The results suggest our synthetic dataset can significantly improve the performance of several downstream methods and replace real-world datasets to reduce cost. Notably, M3Act improves the state-of-the-art MOTRv2 on DanceTrack dataset, leading to a hop on the leaderboard from 10t?h to 2n?d place. Moreover, M3Act opens new research for controllable 3D group activity generation. We define multiple metrics and propose a competitive baseline for the novel task. Our code and data are available at our project page: http://cjerry1243.github.io/M3Act.

15 Keypoints Is All You Need

Pose-tracking is an important problem that requires identifying unique human pose-instances and matching them temporally across different frames in a video. However, existing pose-tracking methods are unable to accurately model temporal relationships and require significant computation, often computing the tracks offline. We present an efficient multi-person pose-tracking method, KeyTrack that only relies on keypoint information without using any RGB or optical flow to locate and track human keypoints in real-time. KeyTrack is a top-down approach that learns spatio-temporal pose relationships by modeling the multi-person pose-tracking problem as a novel Pose Entailment task using a Transformer-based architecture. Furthermore, KeyTrack uses a novel, parameter-free, keypoint refinement technique that improves the keypoint estimates used by the Transformers. We achieved state-of-the-art results on PoseTrack’17 and PoseTrack’18 benchmarks while using only a fraction of the computation used by most other methods for computing the tracking information.