WNet: A dual-encoded multi-human parsing network

Times cited: 0
Authors
Hosen, Md Imran [1, 2]
Aydin, Tarkan [2]
Islam, Md Baharul [2]
Affiliations
[1] Manarat Int Univ, Dept Comp Sci & Engn, Dhaka, Bangladesh
[2] Bahcesehir Univ, Dept Comp Engn, Istanbul, Turkiye
Keywords
computer vision; image processing; image segmentation; framework; pose
DOI
10.1049/ipr2.13176
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, multi-human parsing has become a focal point of research, yet prevailing methods often rely on intermediate stages and lack pixel-level analysis. Moreover, their high computational demands limit their efficiency in real-world use. To address these challenges and enable real-time performance, a low-latency end-to-end network, WNet, is proposed. It combines a vision transformer and a convolutional neural network in a dual-encoded design, featuring a lightweight Transformer-based vision encoder and a Darknet-based convolution encoder. This combination captures both long-range dependencies and spatial relationships. A fuse block merges the features from the two encoders, and residual connections in the decoder improve information flow. Experiments on the Crowd Instance-level Human Parsing (CIHP) and Look Into Person (LIP) datasets demonstrate WNet's effectiveness, with multi-human parsing running at 26.7 frames per second. Ablation studies further support WNet's efficiency and accuracy in complex multi-human parsing tasks.

We present WNet, a low-latency end-to-end network for multi-human parsing that integrates a vision transformer and a convolutional neural network in a dual-encoded structure (a vision encoder and a convolution encoder). By capturing both long-range dependencies and spatial relationships, WNet achieves real-time parsing at 26.7 frames per second on the CIHP and LIP datasets. A fuse block merges the encoder features, and residual connections in the decoder improve information flow, supporting WNet's efficiency and accuracy in complex multi-human parsing tasks.
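The abstract names three components (two parallel encoders, a fuse block, and a residual decoder) but does not say how they are wired together. The NumPy sketch below shows one plausible arrangement under loudly stated assumptions: the two encoder outputs share a spatial size, the fuse block concatenates them along the channel axis and applies a 1x1 projection, and each decoder stage adds an identity skip connection. The functions `fuse_block`, `residual_block`, `parse`, and `frame_latency_ms`, the channel counts, and the class count are hypothetical and are not taken from the paper. The sketch also converts the reported 26.7 frames per second into the per-frame latency it implies.

```python
import numpy as np

# All shapes, channel counts and block designs below are illustrative
# assumptions; they are NOT taken from the WNet paper.
NUM_CLASSES = 20  # assumed: 19 part labels + background, as in LIP/CIHP
rng = np.random.default_rng(0)


def conv1x1(x, w):
    """1x1 convolution: apply the (C_out, C_in) matrix w at every pixel of x (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", w, x)


def relu(x):
    return np.maximum(x, 0.0)


def fuse_block(f_vit, f_cnn, w_fuse):
    """Hypothetical fuse block: channel concatenation followed by a 1x1 projection."""
    assert f_vit.shape[1:] == f_cnn.shape[1:], "encoder outputs must share H and W"
    return relu(conv1x1(np.concatenate([f_vit, f_cnn], axis=0), w_fuse))


def residual_block(x, w1, w2):
    """Hypothetical decoder block: two 1x1 convs plus an identity skip connection."""
    return relu(x + conv1x1(relu(conv1x1(x, w1)), w2))


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def parse(f_vit, f_cnn, p):
    """Fuse the two encoder outputs, decode, and return a per-pixel label map."""
    x = fuse_block(f_vit, f_cnn, p["fuse"])
    x = residual_block(x, p["res1"], p["res2"])
    x = upsample2x(x)
    logits = conv1x1(x, p["cls"])  # shape (NUM_CLASSES, H, W)
    return np.argmax(logits, axis=0)


def frame_latency_ms(fps):
    """Per-frame latency (milliseconds) implied by a throughput in frames per second."""
    return 1000.0 / fps


# Random stand-ins for the transformer-branch and Darknet-branch feature maps.
f_vit = rng.standard_normal((64, 16, 16))
f_cnn = rng.standard_normal((128, 16, 16))
params = {
    "fuse": 0.1 * rng.standard_normal((96, 64 + 128)),
    "res1": 0.1 * rng.standard_normal((96, 96)),
    "res2": 0.1 * rng.standard_normal((96, 96)),
    "cls": 0.1 * rng.standard_normal((NUM_CLASSES, 96)),
}

label_map = parse(f_vit, f_cnn, params)
print(label_map.shape)                     # (32, 32)
print(f"{frame_latency_ms(26.7):.1f} ms")  # 37.5 ms
```

Concatenation followed by a 1x1 projection is only one common fusion choice; element-wise addition or attention-based fusion would fit the abstract's description equally well. The latency line simply restates the reported throughput: 26.7 frames per second corresponds to roughly 37.5 ms per frame.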
Pages: 3316-3328
Number of pages: 13