MAQT: multi-scale attention and query-optimized transformer for end-to-end pose estimation

Authors
Liang, Hong [1]
Wang, Cuiping [1]
Shao, Mingwen [1]
Zhang, Qian [1]
Affiliations
[1] China Univ Petr East China, Qingdao Inst Software, Coll Comp Sci & Technol, Qingdao 266580, Shandong, Peoples R China
Source
JOURNAL OF SUPERCOMPUTING | 2025 / Volume 81 / Issue 02
Funding
National Natural Science Foundation of China;
Keywords
Human pose estimation; Transformer; Feature fusion; Query selection;
DOI
10.1007/s11227-025-06923-6
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Human pose estimation is attracting rapidly growing attention as a crucial area of computer vision. To address the shortcomings of existing Transformer-based pose estimation methods in handling localized features, this work presents MAQT, an enhanced end-to-end method for precise multi-person pose estimation. To improve the localization of keypoints that are sensitive to scale changes, MAQT introduces an Asym-Fusion block. Additionally, we design a new query strategy, Uncertainty-minimal Query Selection, to optimize the initial selection of queries. Two self-attention mechanisms are used in the decoding phase to capture spatial and semantic relationships between keypoints. MAQT is validated on the MS COCO and CrowdPose datasets, where it obtains favorable experimental results.
Pages: 20
Related papers
50 in total
  • [1] DCMSTRD: End-to-end Dense Captioning via Multi-Scale Transformer Decoding
    Shao, Zhuang
    Han, Jungong
    Debattista, Kurt
    Pang, Yanwei
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 7581 - 7593
  • [2] Multi-scale attention guided network for end-to-end face alignment and recognition
    Shakeel, M. Saad
    Zhang, Yuxuan
    Wang, Xin
    Kang, Wenxiong
    Mahmood, Arif
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2022, 88
  • [3] End-to-end 3D Human Pose Estimation with Transformer
    Zhang, Bowei
    Cui, Peng
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 4529 - 4536
  • [4] MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection
    Kim, Bumsoo
    Mun, Jonghwan
    On, Kyoung-Woon
    Shin, Minchul
    Lee, Junhyun
    Kim, Eun-Sol
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 19556 - 19565
  • [5] AMTT: An End-to-End Anchor-Based Multi-Scale Transformer Tracking Method
    Zheng, Yitao
    Deng, Honggui
    Xu, Qiguo
    Li, Ni
    ELECTRONICS, 2024, 13 (14)
  • [6] End-to-End Multi-Person Pose Estimation with Transformers
    Shi, Dahu
    Wei, Xing
    Li, Liangqi
    Ren, Ye
    Tan, Wenming
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 11059 - 11068
  • [7] MSARN: A Multi-scale Attention Residual Network for End-to-End Environmental Sound Classification
    Fucai Hu
    Peng Song
    Ruhan He
    Zhaoli Yan
    Yongsheng Yu
    Neural Processing Letters, 2023, 55 : 11449 - 11465
  • [8] MSARN: A Multi-scale Attention Residual Network for End-to-End Environmental Sound Classification
    Hu, Fucai
    Song, Peng
    He, Ruhan
    Yan, Zhaoli
    Yu, Yongsheng
    NEURAL PROCESSING LETTERS, 2023, 55 (08) : 11449 - 11465
  • [9] HeadPosr: End-to-end Trainable Head Pose Estimation using Transformer Encoders
    Dhingra, Naina
    2021 16TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2021), 2021,
  • [10] Transforming Scene Text Detection and Recognition: A Multi-Scale End-to-End Approach With Transformer Framework
    Geng, Tianyu
    IEEE ACCESS, 2024, 12 : 40582 - 40596