A Spatio-temporal Transformer for 3D Human Motion Prediction

Citations: 115
Authors
Aksan, Emre [1]
Kaufmann, Manuel [1]
Cao, Peng [2, 3]
Hilliges, Otmar [1]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
[2] MIT, Cambridge, MA 02139 USA
[3] Peking Univ, Beijing, Peoples R China
DOI
10.1109/3DV53792.2021.00066
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a novel Transformer-based architecture for the task of generative modelling of 3D human motion. Previous work commonly relies on RNN-based models that consider shorter forecast horizons and quickly reach a stationary and often implausible state. Recent studies show that implicit temporal representations in the frequency domain are also effective in making predictions for a predetermined horizon. Our focus lies on learning spatio-temporal representations autoregressively and hence on generating plausible future developments over both the short and the long term. The proposed model learns high-dimensional embeddings for skeletal joints and how to compose a temporally coherent pose via a decoupled temporal and spatial self-attention mechanism. Our dual attention concept allows the model to access current and past information directly and to capture both the structural and the temporal dependencies explicitly. We show empirically that this effectively learns the underlying motion dynamics and reduces the error accumulation over time observed in auto-regressive models. Our model is able to make accurate short-term predictions and generate plausible motion sequences over long horizons. We make our code publicly available at https://github.com/eth-ait/motion-transformer.
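The decoupled temporal and spatial self-attention described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation (see the linked repository for that). The function names `attend` and `dual_attention` are hypothetical, the query/key/value projections are replaced by the identity, and combining the two attention outputs by summation is an assumption made here for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(vectors, causal=False):
    """Scaled dot-product self-attention over a list of vectors.

    Simplification: queries, keys and values are the inputs themselves
    (identity projections); a trained model would learn these projections.
    """
    d = len(vectors[0])
    out = []
    for i, q in enumerate(vectors):
        # With a causal mask, position i only sees positions 0..i.
        keys = vectors[: i + 1] if causal else vectors
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * k[c] for w, k in zip(weights, keys)) for c in range(d)])
    return out


def dual_attention(x):
    """x[t][n] is the embedding (list of floats) of joint n at frame t."""
    num_frames, num_joints = len(x), len(x[0])
    # Temporal attention: each joint attends to its own current and past frames.
    temporal = [
        attend([x[t][n] for t in range(num_frames)], causal=True)
        for n in range(num_joints)
    ]  # indexed [n][t]
    # Spatial attention: joints within the same frame attend to each other.
    spatial = [attend(x[t]) for t in range(num_frames)]  # indexed [t][n]
    # Assumption: the two streams are combined by element-wise summation.
    return [
        [[a + b for a, b in zip(temporal[n][t], spatial[t][n])]
         for n in range(num_joints)]
        for t in range(num_frames)
    ]


# Toy input: 3 frames, 2 joints, 4-dimensional joint embeddings.
toy = [[[0.1 * (t + n + c) for c in range(4)] for n in range(2)] for t in range(3)]
result = dual_attention(toy)
print(len(result), len(result[0]), len(result[0][0]))
```

Because the temporal stream is causally masked and the spatial stream only looks within a frame, the output for a given frame does not depend on later frames, which is the property an autoregressive predictor needs.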
Pages: 565-574
Page count: 10