UViT: Efficient and lightweight U-shaped hybrid vision transformer for human pose estimation

Authors
Li B. [1, 2]
Tang S. [1]
Li W. [1, 2]
Affiliations
[1] School of Information and Control Engineering, China University of Mining and Technology, Xuzhou
[2] School of Mechanical and Electronic Engineering, Suzhou University, Suzhou
Source
Keywords
attention mechanism; context enhancement; lightweight network; multi-branch structure; pose estimation
DOI
10.3233/JIFS-231440
CLC number
Subject classification code
Abstract
Pose estimation plays a crucial role in human-centered vision applications and has advanced significantly in recent years. However, prevailing approaches rely on highly complex architectures to achieve high scores on benchmark datasets, which hampers deployment on edge devices. This study investigates efficient and lightweight human pose estimation. First, the context enhancement module of the U-shaped structure is improved to strengthen multi-scale local modeling. Second, a lightweight transformer block is designed to enhance both local feature extraction and global modeling. Finally, a lightweight pose estimation network, the U-shaped Hybrid Vision Transformer (UViT), is developed. The smallest network, UViT-T, achieves a 3.9% improvement in AP on the COCO validation set with fewer parameters and lower computational complexity than the best-performing V2 version of the MobileNet series. Specifically, with an input size of 384×288, UViT-T achieves an AP of 70.2 on the COCO test-dev set with only 1.52 M parameters and 2.32 GFLOPs. Its inference speed is approximately twice that of general-purpose networks. This study offers an efficient and lightweight design approach for human pose estimation and provides theoretical support for deployment on edge devices. © 2024 IOS Press. All rights reserved.
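The AP scores quoted in the abstract are COCO keypoint AP values, which are built on Object Keypoint Similarity (OKS) between a predicted and a ground-truth pose. The following is a minimal sketch of that similarity measure, assuming the standard COCO definition; the function name `oks`, the per-keypoint constants, and the sample coordinates are illustrative assumptions and are not taken from the paper.

```python
import math


def oks(pred, gt, visible, area, sigmas):
    """Object Keypoint Similarity for one predicted / ground-truth pose pair.

    pred, gt : lists of (x, y) keypoint coordinates
    visible  : ground-truth visibility flags (0 means unlabeled)
    area     : ground-truth object area in pixels (the squared scale term)
    sigmas   : per-keypoint falloff constants (hypothetical values here)
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visible, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2  # squared pixel distance
        k2 = (2.0 * sigma) ** 2               # squared per-keypoint constant
        total += math.exp(-d2 / (2.0 * area * k2))
        count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    gt = [(10.0, 10.0), (20.0, 20.0)]
    pred = [(11.0, 10.0), (20.0, 20.0)]  # first keypoint is off by one pixel
    print(oks(pred, gt, [2, 2], area=100.0, sigmas=[0.05, 0.05]))
```

An OKS of 1.0 means the predicted keypoints coincide with the labeled ones. COCO AP then averages precision over a range of OKS thresholds, so a single AP number summarizes localization quality at several strictness levels.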
Pages: 8345 - 8359
Number of pages: 14
Related papers
50 in total
  • [21] LMFormer: Lightweight and multi-feature perspective via transformer for human pose estimation
    Li, Biao
    Tang, Shoufeng
    Li, Wenyi
    NEUROCOMPUTING, 2024, 594
  • [22] Vision Transformer-based pilot pose estimation
    Wu, Honglan
    Liu, Hao
    Sun, Youchao
    Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics, 2024, 50 (10) : 3100 - 3110
  • [23] Unsupervised Pose Estimation by Means of an Innovative Vision Transformer
    Brandizzi, Nicolo'
    Fanti, Andrea
    Gallotta, Roberto
    Russo, Samuele
    Iocchi, Luca
    Nardi, Daniele
    Napoli, Christian
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2022, PT II, 2023, 13589 : 3 - 20
  • [24] RockFormer: A U-Shaped Transformer Network for Martian Rock Segmentation
    Liu, Haiqiang
    Yao, Meibao
    Xiao, Xueming
    Xiong, Yonggang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [25] U-shaped stacked structure monolithic transformer for efficiency improvement
    Kang, Sungyoon
    Kim, Minchul
    Kim, Junghyun
    MICROWAVE AND OPTICAL TECHNOLOGY LETTERS, 2018, 60 (09) : 2325 - 2330
  • [26] Aggregation Transformer for Human Pose Estimation
    Dong, Hao
    Wang, Guodong
    Zhang, Xinyue
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 3660 - 3667
  • [27] Collaborative transformer U-shaped network for medical image segmentation
    Gao, Yufei
    Zhang, Shichao
    Shi, Lei
    Zhao, Guohua
    Shi, Yucheng
    APPLIED SOFT COMPUTING, 2025, 173
  • [28] Weakly-Supervised 3D Human Pose Estimation With Cross-View U-Shaped Graph Convolutional Network
    Hua, Guoliang
    Liu, Hong
    Li, Wenhao
    Zhang, Qian
    Ding, Runwei
    Xu, Xin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 1832 - 1843
  • [29] LENet: A Lightweight and Efficient High-Resolution Network for Human Pose Estimation
    Zhang, Ming
    Yu, Xiandong
    Li, Wenqiang
    Shu, Xin
    Pan, Lei
    Shen, Zhongwei
    IEEE ACCESS, 2025, 13 : 31032 - 31044
  • [30] Research on Lightweight and Efficient Bottom-Up Human Pose Estimation Algorithm
    Ma, Sai
    Ge, Haibo
    He, Wenhao
    Cheng, Mengyang
    An, Yu
    Computer Engineering and Applications, 2024, 60 (18) : 217 - 229