STAF: 3D Human Mesh Recovery From Video With Spatio-Temporal Alignment Fusion

Cited by: 0
Authors
Yao, Wei [1]
Zhang, Hongwen [2]
Sun, Yunlian [1]
Tang, Jinhui [1]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[2] Beijing Normal Univ, Sch Artificial Intelligence, Beijing 100875, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
3D human mesh recovery; temporal coherence; feature pyramid; attention model; POSE
DOI
10.1109/TCSVT.2024.3410400
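Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract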
The recovery of 3D human meshes from monocular images has advanced significantly in recent years. However, existing models usually ignore spatial and temporal information, which can lead to misalignment between the mesh and the image and to temporal discontinuity. To address this, we propose a novel Spatio-Temporal Alignment Fusion (STAF) model. As a video-based model, it leverages coherence cues from human motion through an attention-based Temporal Coherence Fusion Module (TCFM). For spatial mesh-alignment evidence, we extract fine-grained local information by projecting the predicted mesh onto the feature maps. Building on these spatial features, we further introduce a multi-stage adjacent Spatial Alignment Fusion Module (SAFM) to enhance the feature representation of the target frame. In addition, we propose an Average Pooling Module (APM) that lets the model attend to the entire input sequence rather than just the target frame, which remarkably improves the smoothness of results recovered from video. Extensive experiments on 3DPW, MPII3D, and H36M demonstrate the superiority of STAF, which achieves a state-of-the-art trade-off between precision and smoothness. Our code and more video results are available on the project page https://yw0208.github.io/staf/.
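The abstract names two ways of using the whole sequence: attention-based temporal fusion and average pooling. The sketch below is only an illustration of those two generic operations on per-frame feature vectors. It is not the STAF implementation: the function names `average_pool` and `attention_fuse`, the dot-product scoring, and the toy features are all assumptions made for this example.

```python
import math

def average_pool(frames):
    """Mean of per-frame feature vectors over the whole sequence."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]

def attention_fuse(frames, target_idx):
    """Softmax-weighted sum of frames, scored against the target frame.

    Scaled dot-product scoring is an assumption for illustration only.
    """
    target = frames[target_idx]
    scale = math.sqrt(len(target))
    scores = [sum(a * b for a, b in zip(f, target)) / scale for f in frames]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(target)
    fused = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return fused, weights

# Toy sequence of three frames with 2-D features (hypothetical values).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled = average_pool(frames)
fused, weights = attention_fuse(frames, target_idx=1)
```

Average pooling weights every frame equally, while the attention variant gives more weight to frames whose features resemble the target frame.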
Pages: 10564 - 10577
Page count: 14
Related papers
50 in total
  • [21] Skeleton-Based Spatio-Temporal U-Network for 3D Human Pose Estimation in Video
    Li, Weiwei
    Du, Rong
    Chen, Shudong
    SENSORS, 2022, 22 (07)
  • [22] Spatio-Temporal Reflectance Fusion Based on 3D Steering Kernel Regression Techniques
    Zhuo G.
    Wu B.
    Zhu X.
    Wuhan Daxue Xuebao (Xinxi Kexue Ban)/Geomatics and Information Science of Wuhan University, 2018, 43 (04) : 563 - 570
  • [23] Action Recognition in Videos with Spatio-Temporal Fusion 3D Convolutional Neural Networks
    Wang, Y.
    Shen, X. J.
    Chen, H. P.
    Sun, J. X.
    PATTERN RECOGNITION AND IMAGE ANALYSIS, 2021, 31 (03) : 580 - 587
  • [24] Action Recognition in Videos with Spatio-Temporal Fusion 3D Convolutional Neural Networks
    Y. Wang
    X. J. Shen
    H. P. Chen
    J. X. Sun
    Pattern Recognition and Image Analysis, 2021, 31 : 580 - 587
  • [25] Video modeling by spatio-temporal resampling and Bayesian fusion
    Zheng, Yunfei
    Li, Xin
    2007 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-7, 2007, : 3201 - 3204
  • [26] An Effective Fusion Scheme of Spatio-Temporal Features for Human Action Recognition in RGB-D Video
    Tran, Quang D.
    Ly, Ngoc Q.
    2013 INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND INFORMATION SCIENCES (ICCAIS), 2013,
  • [27] A Spatio-Temporal 3D Representation of a Historic Dataset
    Papasarantou, Chrissa
    Kalaouzis, Giorgos
    Pentazou, Ioulia
    Bourdakis, Vassilis
    ECAADE 2015: REAL TIME - EXTENDING THE REACH OF COMPUTATION, VOL 1, 2015, : 701 - 708
  • [28] Deep Video Matting via Spatio-Temporal Alignment and Aggregation
    Sun, Yanan
    Wang, Guanzhi
    Gu, Qiao
    Tang, Chi-Keung
    Tai, Yu-Wing
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 6971 - 6980
  • [29] Visual 3D querying of spatio-temporal data
    Sourina, Olga
    2006 INTERNATIONAL CONFERENCE ON CYBERWORLDS, PROCEEDINGS, 2006, : 147 - 153
  • [30] Spatio-temporal analysis and comparison of 3D videos
    Simone Cammarasana
    Giuseppe Patanè
    The Visual Computer, 2023, 39 : 1335 - 1350