UViT: Efficient and lightweight U-shaped hybrid vision transformer for human pose estimation

Cited by: 0
Authors
Li B. [1, 2]
Tang S. [1]
Li W. [1, 2]
Affiliations
[1] School of Information and Control Engineering, China University of Mining and Technology, Xuzhou
[2] School of Mechanical and Electronic Engineering, Suzhou University, Suzhou
Keywords
attention mechanism; context enhancement; lightweight network; multi-branch structure; pose estimation
DOI
10.3233/JIFS-231440
Abstract
Pose estimation plays a crucial role in human-centered vision applications and has advanced significantly in recent years. However, prevailing approaches rely on highly complex architectures to achieve high scores on benchmark datasets, which hampers deployment on edge devices. This study investigates efficient and lightweight human pose estimation. The context enhancement module of the U-shaped structure is improved to strengthen multi-scale local modeling. Building on the transformer structure, a lightweight transformer block is designed to enhance both local feature extraction and global modeling. Combining these, a lightweight pose estimation network, the U-shaped Hybrid Vision Transformer (UViT), is developed. The smallest variant, UViT-T, improves the AP score on the COCO validation set by 3.9% over the best-performing V2 version of the MobileNet series while using fewer parameters and less computation. Specifically, with an input size of 384×288, UViT-T achieves an AP score of 70.2 on the COCO test-dev set with only 1.52 M parameters and 2.32 GFLOPs. Its inference speed is approximately twice that of general-purpose networks. This study offers an efficient and lightweight design approach for human pose estimation and provides theoretical support for its deployment on edge devices. © 2024 IOS Press. All rights reserved.
Pages: 8345-8359
Number of pages: 14