[Audio] Hi everyone, I'm Siyu Mo. I'm currently a master's student at Beijing Sport University. Today, I would like to introduce our research: TaiChi action capture and performance analysis with multi-view RGB cameras.
[Audio] Nowadays, vision-based action quality assessment falls short of what sports motion scoring and intelligent sports training demand. Deep learning needs large datasets to train a good model, yet there is still a lack of 3D action datasets for sports scoring or quality evaluation. Furthermore, existing studies focus more on recognizing regular actions or assessing competitive sports, with limited attention to intricate sports like TaiChi. To address this, we created a professional TaiChi dataset. We also developed an effective 3D human modelling framework and a normalized modeling method for skeleton sequences based on motion transfer.
[Audio] The whole system we propose consists of five components: Motion Capture, Multi-Camera Calibration, 3D Skeleton Fusion, 3D Surface Reconstruction, and Performance Analysis.
[Audio] First, before motion capture, we designed a multi-camera calibration tool based on the 2D planar checkerboard calibration method, which calibrates the system to obtain each camera's intrinsic matrix and the relative poses between the cameras. Based on the camera projection model, we constructed a minimization objective over the re-projection error to solve for the camera parameters, and bundle adjustment is used for further optimization.
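To make the calibration step concrete, here is a minimal sketch of per-camera intrinsic calibration from checkerboard images using OpenCV. The board size, square size, and file paths are assumptions for illustration; the talk's actual tool may differ in these details.

```python
# Sketch: intrinsic calibration of one camera from checkerboard images.
# Board dimensions and paths are hypothetical.
import glob
import cv2
import numpy as np

BOARD = (9, 6)      # inner corners of the checkerboard (assumed)
SQUARE = 0.025      # square size in meters (assumed)

# 3D coordinates of the board corners in the board's own frame
objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE

obj_points, img_points = [], []
for path in glob.glob("calib/cam00/*.png"):   # hypothetical layout
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, BOARD)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# calibrateCamera minimizes the re-projection error over the intrinsic
# matrix K, distortion coefficients, and per-view extrinsics.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print(f"RMS re-projection error: {rms:.3f} px")
```

Multi-camera extrinsics can then be chained from the per-view board poses, with bundle adjustment refining all parameters jointly.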
[Audio] Then we captured TaiChi motion with our multi-camera system. The TaiChi dataset is organized as follows: it contains 23,232 action samples captured simultaneously by 32 RGB cameras from 32 different views and an RGB-D camera. During data acquisition, 11 subjects performed the 24-form TaiChi actions.
[Audio] To address the occlusion problem, we perform 3D skeleton fusion with the direct linear transformation (DLT) algorithm. Here are the 2D images rendered after 3D skeleton fusion, and the formula below shows the main calculation process.
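For reference, here is a minimal NumPy sketch of DLT triangulation: each calibrated view contributes two linear constraints on the 3D joint, and the homogeneous system is solved by SVD. This is the standard algorithm, not the authors' exact code.

```python
# Sketch: DLT triangulation of one joint from its 2D detections
# in several calibrated views.
import numpy as np

def triangulate_dlt(proj_mats, points_2d):
    """proj_mats: list of 3x4 camera projection matrices P = K[R|t].
    points_2d: list of (u, v) pixel coordinates of the same joint.
    Returns the 3D point X minimizing the algebraic DLT error."""
    A = []
    for P, (u, v) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on X:
        #   u * (P[2] @ X) - P[0] @ X = 0
        #   v * (P[2] @ X) - P[1] @ X = 0
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    A = np.asarray(A)
    # Homogeneous solution: right singular vector of the smallest
    # singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]   # dehomogenize
```

With 32 views, occluded joints in some cameras can simply be dropped from the system, since any two or more views already determine the point.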
[Audio] Considering the excellent modeling and rendering capabilities of neural radiance fields, we perform 3D human surface reconstruction by jointly using the traditional Colmap pipeline and the deep-learning-based Instant NeRF.
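A hedged sketch of that pipeline is below: COLMAP estimates camera poses from the multi-view frames, which are then converted and fed to Instant-NGP (Instant NeRF) for training. The folder layout and the conversion/training scripts are assumptions based on the public instant-ngp repository; the authors' exact commands may differ.

```python
# Sketch: COLMAP sparse reconstruction followed by Instant-NGP training.
# Paths are hypothetical; run from an instant-ngp checkout.
import subprocess

IMAGES = "frames/"   # hypothetical folder of multi-view frames
DB = "colmap.db"

# 1. Sparse reconstruction with COLMAP (features -> matches -> map).
subprocess.run(["colmap", "feature_extractor",
                "--database_path", DB, "--image_path", IMAGES], check=True)
subprocess.run(["colmap", "exhaustive_matcher",
                "--database_path", DB], check=True)
subprocess.run(["colmap", "mapper", "--database_path", DB,
                "--image_path", IMAGES, "--output_path", "sparse/"],
               check=True)
subprocess.run(["colmap", "model_converter", "--input_path", "sparse/0",
                "--output_path", "sparse_text", "--output_type", "TXT"],
               check=True)

# 2. Convert COLMAP poses to NeRF's transforms.json and train Instant-NGP.
subprocess.run(["python", "scripts/colmap2nerf.py",
                "--images", IMAGES, "--text", "sparse_text"], check=True)
subprocess.run(["python", "scripts/run.py", "--scene", "."], check=True)
```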
[Audio] TaiChi performance analysis has two parts: data precision analysis and performance analysis.
[Audio] To validate our multi-camera system's accuracy against an IMU-based motion capture system, we ran synchronized experiments with both systems: a subject wearing the IMU equipment performed six limb exercises at frame rates of 30 fps and 60 fps. Images A and B show the sensor positions and the extracted skeleton joints, where discrepancies in the joint positions from the two Mocap systems are evident. We compared the Mocap data by analyzing the coordinates of the shoulder, elbow, hip, and knee joints. Image C shows the mean squared error (MSE) between the two systems: average angle and position errors of 6.18° and 6.81 cm at 30 fps, and 6.68° and 6.41 cm at 60 fps. These deviations stem from the differing joint positions captured by the two Mocap systems.
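As an illustration of this comparison (not the authors' exact evaluation code), the sketch below computes per-joint position MSE and joint angles for two time-synchronized Mocap streams expressed in a common world frame.

```python
# Sketch: comparing two synchronized Mocap streams over the shoulder,
# elbow, hip, and knee joints.
import numpy as np

def joint_mse(optical, imu):
    """optical, imu: arrays of shape (frames, joints, 3) in a common
    world frame at the same frame rate. Returns per-joint position MSE."""
    assert optical.shape == imu.shape
    return ((optical - imu) ** 2).sum(axis=-1).mean(axis=0)  # (joints,)

def joint_angle(a, b, c):
    """Angle at joint b (degrees) formed by segments b->a and b->c,
    e.g. the elbow angle from shoulder, elbow, and wrist trajectories."""
    u, v = a - b, c - b
    cos = (u * v).sum(-1) / (np.linalg.norm(u, axis=-1)
                             * np.linalg.norm(v, axis=-1))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
```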
[Audio] For TaiChi performance analysis, we implement action transfer without using any paired data for supervision. The transfer network encodes the input skeleton sequence into three orthogonal invariant properties: structure, motion, and view angle. These codes are then decoded into target skeletons, which we use to evaluate the trajectory and angle changes of the four joints mentioned above.
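Conceptually, the network looks like the PyTorch sketch below: three temporal-convolution encoders factor a skeleton sequence into a time-varying motion code and static structure and view-angle codes, and a decoder recombines codes (possibly taken from different sequences) into a re-targeted skeleton. Layer sizes and joint count are illustrative assumptions, not the paper's architecture.

```python
# Sketch: disentangling motion-transfer network with three encoders.
import torch
import torch.nn as nn

class MotionRetargetNet(nn.Module):
    def __init__(self, joints=15, dim=128):
        super().__init__()
        c_in = joints * 2                     # (x, y) per joint, per frame
        def enc():                            # temporal conv encoder
            return nn.Sequential(
                nn.Conv1d(c_in, dim, 7, padding=3), nn.ReLU(),
                nn.Conv1d(dim, dim, 7, padding=3))
        self.enc_motion = enc()               # time-varying motion code
        self.enc_struct = enc()               # static body-structure code
        self.enc_view = enc()                 # static view-angle code
        self.dec = nn.Sequential(
            nn.Conv1d(3 * dim, dim, 7, padding=3), nn.ReLU(),
            nn.Conv1d(dim, c_in, 7, padding=3))

    def forward(self, seq_motion, seq_struct, seq_view):
        # Inputs: (batch, joints*2, frames). Static codes are pooled over
        # time, then broadcast back along the frame axis before decoding.
        m = self.enc_motion(seq_motion)
        s = self.enc_struct(seq_struct).mean(-1, keepdim=True)
        v = self.enc_view(seq_view).mean(-1, keepdim=True)
        T = m.shape[-1]
        z = torch.cat([m, s.expand(-1, -1, T), v.expand(-1, -1, T)], dim=1)
        return self.dec(z)                    # re-targeted skeleton sequence
```

Feeding a student's motion code with the coach's structure and view codes (or vice versa) places both performances in a common frame for trajectory and angle comparison.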
[Audio] Here we can see the comparison before and after motion transfer.
[Audio] These are the visual analysis results of the re-targeted skeletons of two students compared with the coach, which show that the TaiChi motion of student 1 is more standard.
[Audio] We believe that our work can help improve training and performance for TaiChi practitioners. In the future, we will investigate more accurate and robust analysis methods using data from fewer camera views. Thank you for your attention.