WNet: A dual-encoded multi-human parsing network

Cited by: 0

Authors:

Hosen, Md Imran ^{[1,2]}

Aydin, Tarkan ^{[2]}

Islam, Md Baharul ^{[2]}

Affiliations:

[1] Manarat Int Univ, Dept Comp Sci & Engn, Dhaka, Bangladesh

[2] Bahcesehir Univ, Dept Comp Engn, Istanbul, Turkiye

Source:

IET IMAGE PROCESSING | 2024 / Vol. 18 / Issue 12

Keywords:

computer vision; image processing; image segmentation; FRAMEWORK; POSE;

DOI:

10.1049/ipr2.13176

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In recent years, multi-human parsing has become a focal point of research, yet prevailing methods often rely on intermediate stages and lack pixel-level analysis. Moreover, their high computational demands limit real-world efficiency. To address these challenges and enable real-time performance, a low-latency end-to-end network is proposed. This approach combines a vision transformer and a convolutional neural network in a dual-encoded network, featuring a lightweight Transformer-based vision encoder and a convolution encoder based on Darknet. This combination captures long-range dependencies and spatial relationships. A fuse block merges the features from the two encoders, and residual connections in the decoder improve information flow. Experimental validation on the Crowd Instance-level Human Parsing (CIHP) and Look into Person (LIP) datasets demonstrates WNet's effectiveness, achieving multi-human parsing at 26.7 frames per second. Ablation studies further underscore WNet's capabilities, emphasizing its efficiency and accuracy in complex multi-human parsing tasks.

We present WNet, a low-latency end-to-end network for multi-human parsing that integrates a vision transformer and a convolutional neural network in a dual-encoded structure (a vision encoder and a convolution encoder). By capturing long-range dependencies and spatial relationships, WNet achieves real-time parsing at 26.7 frames per second on the CIHP and LIP datasets. A fuse block for feature merging, along with residual connections in the decoder, improves information flow and supports WNet's efficiency and accuracy in complex multi-human parsing tasks.

Pages: 3316-3328

Page count: 13
