A Spatio-temporal Transformer for 3D Human Motion Prediction

Cited by: 115
Authors
Aksan, Emre [1]
Kaufmann, Manuel [1]
Cao, Peng [2, 3]
Hilliges, Otmar [1]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
[2] MIT, Cambridge, MA 02139 USA
[3] Peking Univ, Beijing, Peoples R China
DOI
10.1109/3DV53792.2021.00066
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We propose a novel Transformer-based architecture for the task of generative modelling of 3D human motion. Previous work commonly relies on RNN-based models considering shorter forecast horizons reaching a stationary and often implausible state quickly. Recent studies show that implicit temporal representations in the frequency domain are also effective in making predictions for a predetermined horizon. Our focus lies on learning spatio-temporal representations autoregressively and hence generation of plausible future developments over both short and long term. The proposed model learns high dimensional embeddings for skeletal joints and how to compose a temporally coherent pose via a decoupled temporal and spatial self-attention mechanism. Our dual attention concept allows the model to access current and past information directly and to capture both the structural and the temporal dependencies explicitly. We show empirically that this effectively learns the underlying motion dynamics and reduces error accumulation over time observed in auto-regressive models. Our model is able to make accurate short-term predictions and generate plausible motion sequences over long horizons. We make our code publicly available at https://github.com/eth-ait/motion-transformer.
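The decoupled temporal and spatial self-attention described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see their repository for that); it only shows the general idea under stated assumptions: joint embeddings are stored as a `(T, J, D)` array, the temporal branch uses a causal mask so each joint sees only its current and past frames, the spatial branch lets the joints within one frame attend to each other, and the two branches are merged by a residual sum. The function names, the shared `(D, D)` projection matrices, and the way the branches are combined are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v, mask=None):
    """Scaled dot-product attention; the last two axes are (sequence, features)."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    if mask is not None:
        # Positions where mask is False get a very low score, so their weight is ~0.
        scores = np.where(mask, scores, -1e9)
    return softmax(scores, axis=-1) @ v


def dual_attention(x, w_t, w_s):
    """Illustrative decoupled attention over a (T, J, D) array of joint embeddings.

    w_t and w_s are dicts holding hypothetical (D, D) projection matrices
    under the keys 'q', 'k', 'v' for the temporal and spatial branches.
    """
    T, J, D = x.shape

    # Temporal branch: each joint attends over its own history (causal mask).
    xt = np.swapaxes(x, 0, 1)                      # (J, T, D)
    causal = np.tril(np.ones((T, T), dtype=bool))  # frame t sees frames <= t
    temporal = attention(xt @ w_t["q"], xt @ w_t["k"], xt @ w_t["v"], causal)
    temporal = np.swapaxes(temporal, 0, 1)         # back to (T, J, D)

    # Spatial branch: within each frame, every joint attends to all joints.
    spatial = attention(x @ w_s["q"], x @ w_s["k"], x @ w_s["v"])

    # Assumption: merge the two branches with a residual sum.
    return x + temporal + spatial


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, J, D = 5, 4, 8  # frames, joints, embedding size (arbitrary demo values)
    x = rng.normal(size=(T, J, D))
    w_t = {name: rng.normal(size=(D, D)) for name in "qkv"}
    w_s = {name: rng.normal(size=(D, D)) for name in "qkv"}
    print(dual_attention(x, w_t, w_s).shape)
```

Because the temporal branch is causally masked and the spatial branch only mixes joints within a single frame, the output at frame `t` in this sketch depends only on frames up to `t`, which is the property an autoregressive predictor needs.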
Pages: 565-574
Number of pages: 10