Skip-Pose Vectors: Pose-based motion embedding using Encoder-Decoder models

Authors
Shirakawa, Yuta [1]
Kozakaya, Tatsuo [1]
Affiliations
[1] Toshiba Co Ltd, Corp Res & Dev Ctr, Tokyo, Japan
DOI
10.23919/mva.2019.8757937
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a pose-based unsupervised embedding learning method for action recognition. To classify human actions by the similarity of their motions, it is important to establish a feature space in which similar motions are mapped to similar vector representations. However, learning such a feature space with a supervised approach requires a huge number of training samples, tailored keypoint annotations, and action category labels. Although the cost of labeling keypoints is steadily decreasing as 2D pose estimation methods improve, labeling video categories remains problematic because of the variety of categories and the ambiguity and variation of videos. To avoid such expensive category labeling, we follow the success of "Skip-Thought Vectors", an unsupervised approach to modeling sentence similarity, and apply its idea to contiguous pose sequences to learn feature representations for measuring motion similarity. Because human action is handled as 2D poses instead of images, the model can be small and easy to handle, and the training data can be augmented by projecting 3D motion capture data to 2D. Through evaluation on the JHMDB dataset, we explore various design choices, such as whether to handle actions as a sequence of poses or as a sequence of images. Our approach leverages pose sequences from 3D motion capture and improves its performance as much as 61.6% on JHMDB.
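The two data-preparation ideas named in the abstract, projecting 3D motion capture to 2D and using contiguous pose sequences in the Skip-Thought manner, might be sketched roughly as follows. This is an illustration only, not the authors' implementation: the function names `project_to_2d` and `skip_triplets`, the pinhole camera with a random yaw angle, the `focal` and `depth` values, and the fixed segment length are all assumptions that the abstract does not specify.

```python
import math
import random

def project_to_2d(joints_3d, yaw, focal=1.0, depth=5.0):
    """Rotate 3D joints about the vertical (y) axis by `yaw` radians, then
    apply a pinhole projection. `focal` and `depth` are assumed values."""
    c, s = math.cos(yaw), math.sin(yaw)
    pose_2d = []
    for x, y, z in joints_3d:
        xr = c * x + s * z            # x after rotation
        zr = -s * x + c * z + depth   # z after rotation, shifted in front of the camera
        pose_2d.append((focal * xr / zr, focal * y / zr))
    return pose_2d

def skip_triplets(poses, seg_len):
    """Cut a pose sequence into non-overlapping contiguous segments and return
    (previous, current, next) triplets, mirroring Skip-Thought's use of the
    neighbouring sentences as prediction targets for the current one."""
    segs = [poses[i:i + seg_len]
            for i in range(0, len(poses) - seg_len + 1, seg_len)]
    return [(segs[i - 1], segs[i], segs[i + 1]) for i in range(1, len(segs) - 1)]

# Toy motion-capture clip (hypothetical data): 12 frames, 2 joints per frame.
clip_3d = [[(0.1 * t, 1.0, 0.0), (0.1 * t, 0.0, 0.2)] for t in range(12)]

rng = random.Random(0)
yaw = rng.uniform(-math.pi, math.pi)   # one random viewpoint per clip
clip_2d = [project_to_2d(frame, yaw) for frame in clip_3d]

triplets = skip_triplets(clip_2d, seg_len=3)
print(len(triplets), "training triplets")
```

In a setup like this, an encoder would embed the middle segment and decoders would be trained to reconstruct the previous and next segments; sampling a new `yaw` per clip is one plausible way to obtain many 2D views from a single 3D recording.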
Pages: 6