A Spatio-temporal Transformer for 3D Human Motion Prediction

Cited by: 115
Authors:
Aksan, Emre [1]
Kaufmann, Manuel [1]
Cao, Peng [2,3]
Hilliges, Otmar [1]
Affiliations:
[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
[2] MIT, Cambridge, MA 02139 USA
[3] Peking Univ, Beijing, Peoples R China
DOI:
10.1109/3DV53792.2021.00066
CLC number:
TP18 [Artificial intelligence theory]
Discipline codes:
081104; 0812; 0835; 1405
Abstract:
We propose a novel Transformer-based architecture for generative modelling of 3D human motion. Previous work commonly relies on RNN-based models that consider short forecast horizons and quickly converge to a stationary, often implausible, state. Recent studies show that implicit temporal representations in the frequency domain are also effective, but only for predictions over a predetermined horizon. Our focus lies on learning spatio-temporal representations autoregressively, and hence on generating plausible future motion over both short and long horizons. The proposed model learns high-dimensional embeddings for skeletal joints and composes a temporally coherent pose via a decoupled temporal and spatial self-attention mechanism. This dual-attention concept lets the model access current and past information directly and capture both structural and temporal dependencies explicitly. We show empirically that this effectively learns the underlying motion dynamics and reduces the error accumulation over time observed in auto-regressive models. Our model makes accurate short-term predictions and generates plausible motion sequences over long horizons. We make our code publicly available at https://github.com/eth-ait/motion-transformer.
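The abstract describes a decoupled temporal and spatial self-attention mechanism operating on per-joint embeddings. Below is a minimal PyTorch-style sketch of that idea; the module name SpatioTemporalBlock, the layer sizes, the causal masking details, and the summation used to fuse the two attention streams are assumptions made for illustration and do not reproduce the authors' released implementation (see https://github.com/eth-ait/motion-transformer for that).

import torch
import torch.nn as nn


class SpatioTemporalBlock(nn.Module):
    """Sketch of one decoupled spatio-temporal self-attention block (illustrative only)."""

    def __init__(self, joint_dim: int, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        # Per-joint embedding into a higher-dimensional space.
        self.joint_embed = nn.Linear(joint_dim, embed_dim)
        # Temporal attention: each joint attends over its own history.
        self.temporal_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Spatial attention: joints within one frame attend to each other.
        self.spatial_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(embed_dim)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.ffn = nn.Sequential(
            nn.Linear(embed_dim, 4 * embed_dim),
            nn.ReLU(),
            nn.Linear(4 * embed_dim, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, joints, joint_dim)
        B, T, J, _ = x.shape
        h = self.joint_embed(x)                       # (B, T, J, E)
        E = h.shape[-1]

        # Temporal stream: attend over time, independently per joint (causally masked).
        ht = h.permute(0, 2, 1, 3).reshape(B * J, T, E)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        t_out, _ = self.temporal_attn(ht, ht, ht, attn_mask=causal)
        t_out = t_out.reshape(B, J, T, E).permute(0, 2, 1, 3)

        # Spatial stream: attend over joints, independently per frame.
        hs = h.reshape(B * T, J, E)
        s_out, _ = self.spatial_attn(hs, hs, hs)
        s_out = s_out.reshape(B, T, J, E)

        # Fuse the two streams (summation is an assumption), then a position-wise FFN.
        h = self.norm1(h + t_out + s_out)
        h = self.norm2(h + self.ffn(h))
        return h


# Example usage (hypothetical sizes): 24 joints, 9-D rotation features per joint, 120 frames.
block = SpatioTemporalBlock(joint_dim=9)
poses = torch.randn(2, 120, 24, 9)
out = block(poses)   # (2, 120, 24, 64)

In this sketch the temporal stream lets every joint attend to its own past, while the spatial stream lets all joints of a frame attend to each other, mirroring the dual-attention concept described in the abstract.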
Pages: 565-574
Page count: 10
Related papers (showing 10 of 50):
  • [1] Yu, Hua; Fan, Xuanzhe; Hou, Yaqing; Pei, Wenbin; Ge, Hongwei; Yang, Xin; Zhou, Dongsheng; Zhang, Qiang; Zhang, Mengjie. Toward Realistic 3D Human Motion Prediction With a Spatio-Temporal Cross-Transformer Approach. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33(10): 5707-5720.
  • [2] Yang, Jingyu; Guo, Xin; Li, Kun; Wang, Meiyuan; Lai, Yu-Kun; Wu, Feng. Spatio-Temporal Reconstruction for 3D Motion Recovery. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2020, 30(6): 1583-1596.
  • [3] Lv, FJ; Nevatia, R; Lee, MW. 3D human action recognition using spatio-temporal motion templates. COMPUTER VISION IN HUMAN-COMPUTER INTERACTION, PROCEEDINGS, 2005, 3766: 120-130.
  • [4] Sedmidubsky, Jan; Zezula, Pavel. Augmenting Spatio-Temporal Human Motion Data for Effective 3D Action Recognition. 2019 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM 2019), 2019: 204-207.
  • [5] Hao, Feng; Zhong, Fujin; Yu, Hong; Hu, Jun; Yang, Yan. STAFFormer: Spatio-temporal adaptive fusion transformer for efficient 3D human pose estimation. IMAGE AND VISION COMPUTING, 2024, 149.
  • [6] Ruiz, A. Hernandez; Gall, J.; Moreno-Noguer, F. Human Motion Prediction via Spatio-Temporal Inpainting. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 7133-7142.
  • [7] Medjaouri, Omar; Desai, Kevin. HR-STAN: High-Resolution Spatio-Temporal Attention Network for 3D Human Motion Prediction. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2022), 2022: 2539-2548.
  • [8] Kujawinska, M; Pawlowski, M. Spatio-temporal approach to shape and motion measurements of 3D objects. LASER INTERFEROMETRY X: TECHNIQUES AND ANALYSIS AND APPLICATIONS, PTS A AND B, 2000, 4101: 21-28.
  • [9] Kwolek, Bogdan; Krzeszowski, Tomasz; Michalczuk, Agnieszka; Josinski, Henryk. 3D Gait Recognition Using Spatio-Temporal Motion Descriptors. INTELLIGENT INFORMATION AND DATABASE SYSTEMS, PT II, 2014, 8398: 595-604.
  • [10] Lee, Gwo Giun; Wang, Ming-Jiun; Lin, He-Yuan; Su, Drew Wei-Chi; Lin, Bo-Yun. A 3D spatio-temporal motion estimation algorithm for video coding. 2006 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME 2006), VOLS 1-5, PROCEEDINGS, 2006: 741+.