STAF: 3D Human Mesh Recovery From Video With Spatio-Temporal Alignment Fusion

Times cited: 0
Authors
Yao, Wei [1]
Zhang, Hongwen [2]
Sun, Yunlian [1]
Tang, Jinhui [1]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[2] Beijing Normal Univ, Sch Artificial Intelligence, Beijing 100875, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
3D human mesh recovery; temporal coherence; feature pyramid; attention model; POSE
DOI
10.1109/TCSVT.2024.3410400
Chinese Library Classification (CLC) codes
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline codes
0808; 0809
Abstract
Recovery of 3D human meshes from monocular images has advanced significantly in recent years. However, existing models usually ignore spatial and temporal information, which can lead to misalignment between the mesh and the image and to temporal discontinuity. To address this, we propose a novel Spatio-Temporal Alignment Fusion (STAF) model. As a video-based model, STAF leverages coherence cues from human motion through an attention-based Temporal Coherence Fusion Module (TCFM). For spatial mesh-alignment evidence, we extract fine-grained local information by projecting the predicted mesh onto the feature maps. Building on these spatial features, we further introduce a multi-stage adjacent Spatial Alignment Fusion Module (SAFM) to enhance the feature representation of the target frame. In addition, we propose an Average Pooling Module (APM) that lets the model attend to the entire input sequence rather than only the target frame; this markedly improves the smoothness of results recovered from video. Extensive experiments on 3DPW, MPII3D, and H36M demonstrate the superiority of STAF, which achieves a state-of-the-art trade-off between precision and smoothness. Our code and more video results are available on the project page https://yw0208.github.io/staf/.
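The abstract names three generic ingredients: features sampled where the predicted mesh projects onto a feature map, attention-based fusion across frames, and pooling over the whole input sequence. The pure-Python sketch below illustrates plain versions of these operations under stated assumptions; it is not the STAF implementation (the authors' code is linked from the project page). All function names, the [channel][row][col] list layout, projected points given in pixel coordinates with border clamping, scaled dot-product attention with the target frame as the query, and the assumption that APM averages per-frame features over time are hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple

# Hypothetical helpers (not the STAF authors' code) illustrating generic
# versions of the operations named in the abstract.

FeatureMap = List[List[List[float]]]  # assumed layout: [channel][row][col]


def bilinear_sample(fmap: FeatureMap, x: float, y: float) -> List[float]:
    """Bilinearly sample every channel at pixel coordinates (x, y).

    Points that project outside the map are clamped to its border.
    """
    height, width = len(fmap[0]), len(fmap[0][0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    dx, dy = x - x0, y - y0
    out = []
    for channel in fmap:
        top = channel[y0][x0] * (1 - dx) + channel[y0][x1] * dx
        bottom = channel[y1][x0] * (1 - dx) + channel[y1][x1] * dx
        out.append(top * (1 - dy) + bottom * dy)
    return out


def mesh_aligned_features(fmap: FeatureMap, points: List[Tuple[float, float]]) -> List[float]:
    """Concatenate features sampled at projected 2D mesh points (assumed in pixels)."""
    feats: List[float] = []
    for x, y in points:
        feats.extend(bilinear_sample(fmap, x, y))
    return feats


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(frames: List[List[float]], target: int) -> List[float]:
    """Weight every frame by scaled dot-product similarity to the target frame."""
    query = frames[target]
    scale = math.sqrt(len(query))
    scores = [sum(q * f for q, f in zip(query, frame)) / scale for frame in frames]
    weights = softmax(scores)
    return [sum(w * frame[k] for w, frame in zip(weights, frames)) for k in range(len(query))]


def average_pool(frames: List[List[float]]) -> List[float]:
    """Temporal average pooling: per-dimension mean over all frames."""
    return [sum(dim) / len(frames) for dim in zip(*frames)]


if __name__ == "__main__":
    fmap = [[[0.0, 1.0], [2.0, 3.0]]]  # one channel, 2x2
    print(bilinear_sample(fmap, 0.5, 0.5))  # [1.5]
    print(average_pool([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

Sampling the one-channel map [[[0.0, 1.0], [2.0, 3.0]]] at (0.5, 0.5) interpolates the four corner values to [1.5]. In the paper, TCFM, SAFM, and APM are learned network modules; in this sketch the fusion weights come only from raw feature similarity, so it shows the data flow rather than trained behavior.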
Pages: 10564-10577
Page count: 14
Related papers (50 in total)
  • [31] Spatio-temporal Human Body Segmentation from Video Stream
    Al Harbi, Nouf
    Gotoh, Yoshihiko
    COMPUTER ANALYSIS OF IMAGES AND PATTERNS, PT I, 2013, 8047 : 78 - 85
  • [32] Spatio-temporal analysis and comparison of 3D videos
    Cammarasana, Simone
    Patane, Giuseppe
    VISUAL COMPUTER, 2023, 39 (04) : 1335 - 1350
  • [33] Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition
    Liu, Jun
    Shahroudy, Amir
    Xu, Dong
    Wang, Gang
    COMPUTER VISION - ECCV 2016, PT III, 2016, 9907 : 816 - 833
  • [34] 3D human action recognition using spatio-temporal motion templates
    Lv, FJ
    Nevatia, R
    Lee, MW
    COMPUTER VISION IN HUMAN-COMPUTER INTERACTION, PROCEEDINGS, 2005, 3766 : 120 - 130
  • [35] Spatio-Temporal Attention Graph for Monocular 3D Human Pose Estimation
    Zhang, Lijun
    Shao, Xiaohu
    Li, Zhenghao
    Zhou, Xiang-Dong
    Shi, Yu
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022 : 1231 - 1235
  • [36] Spatio-temporal attention on manifold space for 3D human action recognition
    Ding, Chongyang
    Liu, Kai
    Cheng, Fei
    Belyaev, Evgeny
    APPLIED INTELLIGENCE, 2021, 51 (01) : 560 - 570
  • [38] Global and Local Spatio-Temporal Encoder for 3D Human Pose Estimation
    Wang, Yong
    Kang, Hongbo
    Wu, Doudou
    Yang, Wenming
    Zhang, Longbin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 4039 - 4049
  • [39] Video2mesh: 3D human pose and shape recovery by a temporal convolutional transformer network
    Chao, Xianjin
    Ge, Zhipeng
    Leung, Howard
    IET COMPUTER VISION, 2023, 17 (04) : 379 - 388
  • [40] Spatio-temporal Matching for Human Detection in Video
    Zhou, Feng
    De la Torre, Fernando
    COMPUTER VISION - ECCV 2014, PT VI, 2014, 8694 : 62 - 77