STAF: 3D Human Mesh Recovery From Video With Spatio-Temporal Alignment Fusion

Cited by: 0
Authors
Yao, Wei [1]
Zhang, Hongwen [2]
Sun, Yunlian [1]
Tang, Jinhui [1]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[2] Beijing Normal Univ, Sch Artificial Intelligence, Beijing 100875, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
3D human mesh recovery; temporal coherence; feature pyramid; attention model; POSE
DOI
10.1109/TCSVT.2024.3410400
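Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract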
The recovery of 3D human meshes from monocular images has advanced significantly in recent years. However, existing models usually ignore spatial and temporal information, which can lead to misalignment between mesh and image and to temporal discontinuity. To address this, we propose a novel Spatio-Temporal Alignment Fusion (STAF) model. As a video-based model, it leverages coherence cues from human motion through an attention-based Temporal Coherence Fusion Module (TCFM). For spatial mesh-alignment evidence, we extract fine-grained local information by projecting the predicted mesh onto the feature maps. Based on the spatial features, we further introduce a multi-stage adjacent Spatial Alignment Fusion Module (SAFM) to enhance the feature representation of the target frame. In addition, we propose an Average Pooling Module (APM) that allows the model to focus on the entire input sequence rather than just the target frame. This method can remarkably improve the smoothness of recovery results from video. Extensive experiments on 3DPW, MPII3D, and H36M demonstrate the superiority of STAF. We achieve a state-of-the-art trade-off between precision and smoothness. Our code and more video results are on the project page https://yw0208.github.io/staf/.
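As a rough illustration of two ideas named in the abstract, the sketch below shows generic attention-weighted fusion of per-frame features toward a target frame, and plain average pooling over the whole sequence. It is not the authors' TCFM or APM: the function names, the scaled dot-product scoring, the feature shapes, and the random inputs are all assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_attention_fusion(features, target_idx):
    """Fuse per-frame features into one vector for the target frame.

    features: array of shape (T, D), one D-dimensional feature per frame.
    Hypothetical helper: uses scaled dot-product attention with the
    target frame's feature as the query (an assumption, not the paper's design).
    """
    d = features.shape[1]
    query = features[target_idx]               # (D,)
    scores = features @ query / np.sqrt(d)     # (T,) similarity to target frame
    weights = softmax(scores)                  # (T,) attention over frames
    fused = weights @ features                 # (D,) weighted sum of frame features
    return fused, weights


def sequence_average_pool(features):
    """Average all frame features so every frame contributes equally."""
    return features.mean(axis=0)               # (D,)


# Toy input: 9 frames with 16-dimensional features (made-up sizes).
rng = np.random.default_rng(0)
feats = rng.normal(size=(9, 16))

fused, w = temporal_attention_fusion(feats, target_idx=4)
pooled = sequence_average_pool(feats)
```

Under these assumptions, the attention weights concentrate on frames that resemble the target frame, while the pooled vector treats all frames uniformly, which is one plausible reading of "focus on the entire input sequence rather than just the target frame".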
Pages: 10564-10577
Page count: 14