Skip-Pose Vectors: Pose-based motion embedding using Encoder-Decoder models

Cited by: 0

Authors:

Shirakawa, Yuta [1]

Kozakaya, Tatsuo [1]

Affiliations:

[1] Toshiba Co Ltd, Corp Res & Dev Ctr, Tokyo, Japan

Source:

PROCEEDINGS OF MVA 2019 16TH INTERNATIONAL CONFERENCE ON MACHINE VISION APPLICATIONS (MVA) | 2019

DOI:

10.23919/mva.2019.8757937

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper proposes a pose-based unsupervised embedding learning method for action recognition. To classify human actions by the similarity of their motions, it is important to establish a feature space in which similar motions are mapped to similar vector representations. However, learning a feature space with this property through a supervised approach requires a very large number of training samples, tailored keypoint annotations, and action category labels. Although the cost of labeling keypoints is falling steadily as 2D pose estimation methods improve, labeling video categories remains difficult because of the variety of categories and the ambiguity and variation of videos. To avoid the need for such expensive category labeling, we follow the success of "Skip-Thought Vectors", an unsupervised approach to modeling the similarity of sentences, and apply its idea to contiguous pose sequences to learn feature representations for measuring motion similarity. Because human actions are handled as 2D poses instead of images, the model can be small and easy to handle, and the training data can be augmented by projecting 3D motion capture data to 2D. Through evaluation on the JHMDB dataset, we explore various design choices, such as whether to handle actions as a sequence of poses or as a sequence of images. Our approach leverages pose sequences from 3D motion capture and improves its performance as much as 61.6% on JHMDB.

Pages: 6
