UViT: Efficient and lightweight U-shaped hybrid vision transformer for human pose estimation

Authors
Li B. [1, 2]
Tang S. [1]
Li W. [1, 2]
Affiliations
[1] School of Information and Control Engineering, China University of Mining and Technology, Xuzhou
[2] School of Mechanical and Electronic Engineering, Suzhou University, Suzhou
Keywords
attention mechanism; context enhancement; lightweight network; multi-branch structure; pose estimation
DOI
10.3233/JIFS-231440
Abstract
Pose estimation plays a crucial role in human-centered vision applications and has advanced significantly in recent years. However, prevailing approaches rely on extremely complex architectures to obtain high scores on benchmark datasets, which hampers deployment on edge devices. This study investigates efficient and lightweight human pose estimation. The context enhancement module of the U-shaped structure is improved to strengthen multi-scale local modeling. Building on the transformer structure, a lightweight transformer block is designed to enhance both local feature extraction and global modeling. Finally, a lightweight pose estimation network, the U-shaped Hybrid Vision Transformer (UViT), is developed. The smallest network, UViT-T, improves the AP score on the COCO validation set by 3.9% over the best-performing V2 version of the MobileNet series while using fewer parameters and less computation. Specifically, with an input size of 384×288, UViT-T achieves an AP score of 70.2 on the COCO test-dev set with only 1.52 M parameters and 2.32 GFLOPs. Its inference speed is approximately twice that of general-purpose networks. This study provides an efficient and lightweight design approach for human pose estimation and theoretical support for its deployment on edge devices. © 2024 IOS Press. All rights reserved.
Pages: 8345-8359
Page count: 14