UViT: Efficient and lightweight U-shaped hybrid vision transformer for human pose estimation

Cited by: 0

Authors:

Li B. [1,2]

Tang S. [1]

Li W. [1,2]

Affiliations:

[1] School of Information and Control Engineering, China University of Mining and Technology, Xuzhou

[2] School of Mechanical and Electronic Engineering, Suzhou University, Suzhou

Source:

Journal of Intelligent and Fuzzy Systems | 2024, Vol. 46, No. 04

Keywords:

attention mechanism; context enhancement; lightweight network; multi-branch structure; pose estimation

DOI:

10.3233/JIFS-231440

Abstract:

Pose estimation plays a crucial role in human-centered vision applications and has advanced significantly in recent years. However, prevailing approaches rely on extremely complex structural designs to obtain high scores on benchmark datasets, which hampers deployment on edge devices. This study investigates efficient and lightweight human pose estimation. The context enhancement module of the U-shaped structure is improved to strengthen multi-scale local modeling. Building on the transformer structure, a lightweight transformer block is designed to enhance both local feature extraction and global modeling. Finally, a lightweight pose estimation network, the U-shaped Hybrid Vision Transformer (UViT), is developed. The smallest variant, UViT-T, achieves a 3.9% improvement in AP on the COCO validation set with fewer parameters and lower computational complexity than the best-performing V2 version of the MobileNet series. Specifically, with an input size of 384×288, UViT-T achieves an AP of 70.2 on the COCO test-dev set with only 1.52 M parameters and 2.32 GFLOPs. Its inference speed is approximately twice that of general-purpose networks. This study provides an efficient and lightweight design idea and method for human pose estimation and theoretical support for its deployment on edge devices. © 2024 IOS Press. All rights reserved.

Pages: 8345-8359

Page count: 14
