A Body Part Embedding Model with Datasets for Measuring 2D Human Motion Similarity
Jonghyuk Park† Sukhyun Cho† Dongwoo Kim‡ Oleksandr Bailo‡ Heewoong Park† Sanghoon Hong‡ Jonghun Park†
Seoul National University† Kakao Brain‡
in IEEE Access
Paper | Code | BibTeX
SARA Dataset
NTU RGB+D 120 Similarity Annotations
Abstract
Human motion similarity is used in many fields, including action recognition, anomaly detection, and human performance evaluation. While many computer vision tasks have benefited from deep learning, measuring motion similarity has attracted less attention, particularly due to the lack of large datasets. To address this problem, we introduce two datasets: a synthetic motion dataset for model training and a dataset containing human annotations of real-world video clip pairs for motion similarity evaluation. Furthermore, in order to compute motion similarity from these datasets, we propose a deep learning model that produces motion embeddings suitable for measuring the similarity between different motions of each human body part. The network is trained with the proposed motion variation loss to robustly distinguish even subtly different motions. The proposed approach outperforms the other baselines considered in terms of correlation between motion similarity predictions and human annotations, while being suitable for real-time action analysis. Both datasets and code will be released to the public.
Method
First, we disentangle the motion attribute from the skeleton and view attributes for each body part.
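
As a rough illustration of this per-part disentanglement, the sketch below splits the 2D keypoint sequence of one body part into three embeddings using separate convolutional branches. This is a minimal sketch only: the module names, branch architecture, and embedding size are assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class BodyPartEncoder(nn.Module):
    """Hypothetical per-body-part encoder that splits a 2D keypoint
    sequence into motion, skeleton, and view embeddings."""

    def __init__(self, in_channels, emb_dim=128):
        super().__init__()
        # One temporal-convolution branch per attribute (an assumption).
        self.motion_branch = self._branch(in_channels, emb_dim)
        self.skeleton_branch = self._branch(in_channels, emb_dim)
        self.view_branch = self._branch(in_channels, emb_dim)

    @staticmethod
    def _branch(in_channels, emb_dim):
        return nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.Conv1d(64, emb_dim, kernel_size=7, padding=3),
            nn.AdaptiveMaxPool1d(1),  # pool over time -> fixed-size embedding
        )

    def forward(self, x):
        # x: (batch, 2 * num_joints_in_part, time) -- flattened 2D keypoints
        motion = self.motion_branch(x).squeeze(-1)
        skeleton = self.skeleton_branch(x).squeeze(-1)
        view = self.view_branch(x).squeeze(-1)
        return motion, skeleton, view
```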

Then, similarity is computed using only the motion embeddings.
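
For instance, two clips can be scored by comparing the motion embeddings of corresponding body parts with cosine similarity. The sketch below assumes the embeddings are given as per-part dictionaries and averages the per-part scores into one overall score; both choices are illustrative assumptions rather than the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def motion_similarity(emb_a, emb_b):
    """Score the similarity of two motions from their per-body-part
    motion embeddings, ignoring skeleton and view embeddings.

    emb_a, emb_b: dicts mapping a body-part name (e.g. 'left_arm')
    to a 1D motion embedding tensor for that part.
    """
    scores = {
        part: F.cosine_similarity(emb_a[part], emb_b[part], dim=0).item()
        for part in emb_a
    }
    # One overall score as the mean over body parts (an assumption;
    # per-part scores can also be reported separately).
    overall = sum(scores.values()) / len(scores)
    return overall, scores
```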

See the paper for more details.
Datasets
We created two datasets for training and evaluating motion similarity models.
- For model training, synthetic motion sequences were generated using Adobe Mixamo. (dataset link)
- For model evaluation, similarity annotations for pairs of motion video clips from NTU RGB+D 120 were collected; a sketch of how they can be used for evaluation follows this list. (dataset link)
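
As a sketch of how such annotations can drive evaluation, the snippet below correlates a model's predicted similarities with the human scores using Spearman rank correlation, in line with the correlation-based evaluation described in the abstract. The CSV layout, column names, and the `predict_similarity` callable are hypothetical.

```python
import csv
from scipy.stats import spearmanr

def evaluate(annotation_csv, predict_similarity):
    """Correlate predicted similarities with human annotations.

    annotation_csv: CSV with columns clip_a, clip_b, human_score
                    (a hypothetical layout for the released annotations).
    predict_similarity: callable (clip_a, clip_b) -> float.
    """
    human, predicted = [], []
    with open(annotation_csv, newline="") as f:
        for row in csv.DictReader(f):
            human.append(float(row["human_score"]))
            predicted.append(predict_similarity(row["clip_a"], row["clip_b"]))
    rho, _ = spearmanr(human, predicted)
    return rho
```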
Application
Here are examples of measuring the similarity between two dances!
Acknowledgments
- This work was supported by Kakao and Kakao Brain corporations.
- Model implementation code borrows heavily from 2D-Motion-Retargeting.
- Portions of the research used the NTU RGB+D 120 Action Recognition Dataset made available by the ROSE Lab at the Nanyang Technological University, Singapore.