UViT: Efficient and lightweight U-shaped hybrid vision transformer for human pose estimation

Citations: 0
Authors
Li B. [1, 2]
Tang S. [1]
Li W. [1, 2]
Affiliations
[1] School of Information and Control Engineering, China University of Mining and Technology, Xuzhou
[2] School of Mechanical and Electronic Engineering, Suzhou University, Suzhou
Keywords
attention mechanism; context enhancement; lightweight network; multi-branch structure; pose estimation
DOI
10.3233/JIFS-231440
Abstract
Pose estimation plays a crucial role in human-centered vision applications and has advanced significantly in recent years. However, prevailing approaches rely on extremely complex structural designs to obtain high scores on benchmark datasets, which hampers their use on edge devices. This study investigates efficient and lightweight human pose estimation. The context enhancement module of the U-shaped structure is enhanced to improve its multi-scale local modeling capability. Building on the transformer structure, a lightweight transformer block is designed to strengthen local feature extraction and global modeling. Finally, a lightweight pose estimation network, the U-shaped Hybrid Vision Transformer (UViT), is developed. The smallest network, UViT-T, achieves a 3.9% improvement in AP score on the COCO validation set, with fewer parameters and lower computational complexity, compared with the best-performing V2 version of the MobileNet series. Specifically, with an input size of 384×288, UViT-T achieves an AP score of 70.2 on the COCO test-dev set with only 1.52 M parameters and 2.32 GFLOPs. The inference speed is approximately twice that of general-purpose networks. This study provides an efficient and lightweight design idea and method for the human pose estimation task and provides theoretical support for its deployment on edge devices. © 2024 IOS Press. All rights reserved.
Pages: 8345-8359
Page count: 14