WNet: A dual-encoded multi-human parsing network

Cited by: 0
Authors
Hosen, Md Imran [1, 2]
Aydin, Tarkan [2]
Islam, Md Baharul [2]
Affiliations
[1] Manarat Int Univ, Dept Comp Sci & Engn, Dhaka, Bangladesh
[2] Bahcesehir Univ, Dept Comp Engn, Istanbul, Turkiye
Keywords
computer vision; image processing; image segmentation; FRAMEWORK; POSE;
DOI
10.1049/ipr2.13176
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, multi-human parsing has become a focal point of research, yet prevailing methods often rely on intermediate stages and lack pixel-level analysis. Moreover, their high computational demands limit real-world efficiency. To address these challenges and enable real-time performance, a low-latency end-to-end network, WNet, is proposed. The approach combines a vision transformer and a convolutional neural network in a dual-encoded network, featuring a lightweight Transformer-based vision encoder and a convolution encoder based on Darknet. This combination captures both long-range dependencies and spatial relationships. A fuse block merges the features from the two encoders, and residual connections in the decoder improve information flow. Experimental validation on the Crowd Instance-level Human Parsing and Look into Person datasets demonstrates WNet's effectiveness, achieving multi-human parsing at 26.7 frames per second. Ablation studies further underscore WNet's efficiency and accuracy in complex multi-human parsing tasks.

Summary: WNet is a low-latency end-to-end network for multi-human parsing that integrates a vision transformer and a convolutional neural network in a dual-encoded structure (a vision encoder and a convolution encoder). By capturing long-range dependencies and spatial relationships, WNet reaches 26.7 frames per second on the Crowd Instance-level Human Parsing and Look into Person datasets. A fuse block for feature merging, together with residual connections in the decoder, improves information flow and supports WNet's efficiency and accuracy in complex multi-human parsing tasks.
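The abstract does not specify how the fuse block or the decoder's residual connections are built. As a rough illustration only, the sketch below assumes one common pattern: the two encoders' feature maps are concatenated along the channel axis and mixed by a 1x1 projection, and a decoder step adds its input back to a transformed copy of itself. The function names, shapes, and weights are hypothetical and are not taken from the paper.

```python
import numpy as np


def fuse_features(vit_feat, cnn_feat, weight):
    """Hypothetical fuse block (an assumption, not the paper's design).

    vit_feat, cnn_feat: arrays of shape (C, H, W) from the two encoders.
    weight: array of shape (C_out, 2 * C) acting as a 1x1 convolution.
    """
    # Stack the two feature maps along the channel axis: (2C, H, W).
    stacked = np.concatenate([vit_feat, cnn_feat], axis=0)
    # A 1x1 convolution is a per-pixel linear map over channels.
    return np.einsum("oc,chw->ohw", weight, stacked)


def residual_step(x, weight):
    """Hypothetical decoder step with a residual (skip) connection.

    x: array of shape (C, H, W); weight: array of shape (C, C).
    """
    mixed = np.einsum("oc,chw->ohw", weight, x)
    # ReLU on the transformed branch, then add the input back.
    return x + np.maximum(mixed, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vit = rng.normal(size=(8, 16, 16))   # stand-in for vision-encoder output
    cnn = rng.normal(size=(8, 16, 16))   # stand-in for convolution-encoder output
    fused = fuse_features(vit, cnn, rng.normal(size=(8, 16)))
    decoded = residual_step(fused, rng.normal(size=(8, 8)))
    print(fused.shape, decoded.shape)
```

The skip path in `residual_step` lets the input pass through unchanged when the transformed branch contributes nothing, which is the usual motivation for residual connections.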
Pages: 3316-3328
Page count: 13