Skip-Pose Vectors: Pose-based motion embedding using Encoder-Decoder models

Authors
Shirakawa, Yuta [1]
Kozakaya, Tatsuo [1]
Affiliations
[1] Toshiba Co Ltd, Corp Res & Dev Ctr, Tokyo, Japan
DOI
10.23919/mva.2019.8757937
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a pose-based unsupervised embedding learning method for action recognition. To classify human actions based on the similarity of motions, it is important to establish a good feature space in which similar motions are mapped to similar vector representations. However, learning a feature space with this property through a supervised approach requires a huge number of training samples, tailored keypoint supervision, and action categories. Although the cost of labeling keypoints is decreasing steadily as 2D pose estimation methods improve, labeling video categories remains difficult because of the variety of categories and the ambiguity and variation of videos. To avoid the need for such expensive category labeling, we follow the success of "Skip-Thought Vectors", an unsupervised approach to modeling the similarity of sentences, and apply its idea to contiguous pose sequences to learn feature representations for measuring motion similarity. Because human actions are handled as 2D poses instead of images, the model can be small and easy to handle, and the training data can be augmented by projecting 3D motion capture data to 2D. Through evaluation on the JHMDB dataset, we explore various design choices, such as whether to handle actions as a sequence of poses or as a sequence of images. Our approach leverages pose sequences from 3D motion capture and improves its performance as much as 61.6% on JHMDB.
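The two data-preparation ideas mentioned in the abstract, projecting 3D motion capture data to 2D and working on contiguous pose sequences in the style of Skip-Thought Vectors, can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the orthographic projection, the rotation about the vertical axis, the segment length, the joint count, and the helper names `rotation_y`, `project_to_2d`, and `skip_triplets` are all assumptions, and the encoder-decoder network itself is omitted.

```python
import numpy as np

def rotation_y(theta):
    """Rotation matrix about the vertical (y) axis; the axis choice is an assumption."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def project_to_2d(motion_3d, theta):
    """Rotate a (T, J, 3) motion to a viewpoint and drop depth.

    Orthographic projection is assumed here purely for simplicity.
    """
    rotated = motion_3d @ rotation_y(theta).T
    return rotated[..., :2]

def skip_triplets(poses_2d, seg_len):
    """Cut a (T, J, 2) sequence into contiguous segments and return
    (previous, current, next) triplets, mirroring how Skip-Thought
    pairs a sentence with its neighbours."""
    n_seg = poses_2d.shape[0] // seg_len
    segs = [poses_2d[i * seg_len:(i + 1) * seg_len] for i in range(n_seg)]
    return [(segs[i - 1], segs[i], segs[i + 1]) for i in range(1, n_seg - 1)]

# Synthetic stand-in for one motion capture clip: 40 frames, 15 joints (illustrative sizes).
rng = np.random.default_rng(0)
motion = rng.normal(size=(40, 15, 3))

# One randomly chosen viewpoint yields one augmented 2D pose sequence.
view = project_to_2d(motion, theta=rng.uniform(0.0, 2.0 * np.pi))
triplets = skip_triplets(view, seg_len=10)
print(len(triplets), triplets[0][1].shape)
```

In a Skip-Thought-style setup, an encoder would map the middle segment of each triplet to a vector, and decoders would be trained to reconstruct the neighbouring segments from that vector; sampling several viewpoints per clip would multiply the number of 2D training sequences.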
Pages: 6