WNet: A dual-encoded multi-human parsing network

Cited by: 0

Authors:

Hosen, Md Imran [1,2]

Aydin, Tarkan [2]

Islam, Md Baharul [2]

Affiliations:

[1] Manarat Int Univ, Dept Comp Sci & Engn, Dhaka, Bangladesh

[2] Bahcesehir Univ, Dept Comp Engn, Istanbul, Turkiye

Source:

IET IMAGE PROCESSING | 2024 / Vol. 18 / Issue 12

Keywords:

computer vision; image processing; image segmentation; FRAMEWORK; POSE;

DOI:

10.1049/ipr2.13176

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In recent years, multi-human parsing has become a focal point of research, yet prevailing methods often rely on intermediate stages and lack pixel-level analysis. Moreover, their high computational demands limit real-world efficiency. To address these challenges and enable real-time performance, a low-latency end-to-end network is proposed. This approach combines a vision transformer and a convolutional neural network in a dual-encoded network, featuring a lightweight Transformer-based vision encoder and a convolution encoder based on Darknet. This combination captures both long-range dependencies and spatial relationships. A fuse block merges the features from the two encoders, and residual connections in the decoder improve information flow. Experimental validation on the Crowd Instance-level Human Parsing (CIHP) and Look Into Person (LIP) datasets demonstrates WNet's effectiveness, achieving multi-human parsing at 26.7 frames per second. Ablation studies further support WNet's efficiency and accuracy in complex multi-human parsing tasks.

We present WNet, a low-latency end-to-end network for multi-human parsing that integrates a vision transformer and a convolutional neural network in a dual-encoded structure (a vision encoder and a convolution encoder). By capturing long-range dependencies and spatial relationships, WNet achieves real-time parsing at 26.7 frames per second on the CIHP and LIP datasets. A fuse block for feature merging, together with residual connections in the decoder, improves information flow and supports WNet's efficiency and accuracy in complex multi-human parsing tasks.
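To put the reported throughput in context, the short sketch below converts frames per second into an average per-frame time budget. It is illustrative arithmetic only: the function name `frame_budget_ms` is hypothetical, and the conversion assumes frames are processed one at a time (no batching or pipelining), which the abstract does not state.

```python
def frame_budget_ms(fps: float) -> float:
    """Average time per frame in milliseconds at a given throughput.

    Assumes sequential, one-frame-at-a-time processing (an assumption,
    not something stated in the abstract).
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


reported_fps = 26.7  # throughput quoted in the abstract
print(f"{frame_budget_ms(reported_fps):.1f} ms per frame")  # prints "37.5 ms per frame"
```

Under that assumption, 26.7 frames per second corresponds to roughly 37.5 ms of total processing per image, covering both encoders, the fuse block, and the decoder.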


Pages: 3316-3328

Page count: 13
