Encoder-Decoder Structure Fusing Depth Information for Outdoor Semantic Segmentation

Cited by: 1
Authors
Chen, Songnan [1]
Tang, Mengxia [2]
Dong, Ruifang [2]
Kan, Jiangming [2]
Affiliations
[1] Wuhan Polytech Univ, Sch Math & Comp Sci, 36 Huanhu Middle Rd, Wuhan 430048, Peoples R China
[2] Beijing Forestry Univ, Sch Technol, 35 Qinghua East Rd, Beijing 100083, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 17
Keywords
semantic segmentation; RGB-D image; predicted depth map; fusion structure; feature pyramid; NETWORK;
DOI
10.3390/app13179924
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
The semantic segmentation of outdoor images is the cornerstone of scene understanding and plays a crucial role in the autonomous navigation of robots. Although RGB-D images provide additional depth information that can improve semantic segmentation, current state-of-the-art methods fuse ground truth depth maps directly, which relies on highly developed and expensive depth sensors. To solve this problem, we proposed a self-calibrated RGB-D semantic segmentation neural network model based on an improved residual network that does not rely on depth sensors. The model fuses RGB images with depth maps predicted by depth estimation models, using this multi-modal information to enhance the understanding of a scene. First, we designed a novel convolutional neural network (CNN) with an encoder-decoder structure as our semantic segmentation model. The encoder was constructed using IResNet to extract the semantic features of the RGB image and the predicted depth map and then effectively fuse them with the self-calibration fusion structure. The decoder restored the resolution of the output features with a series of successive upsampling structures. Second, we presented a feature pyramid attention mechanism to extract the fused information at multiple scales and obtain features with rich semantic information. Experimental results on the publicly available Cityscapes dataset and on collected forest scene images show that our model, trained with estimated depth information, achieves accuracy comparable to that obtained with ground truth depth maps and even outperforms some competitive methods.
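The abstract does not spell out the self-calibration fusion structure, so the following is only a rough sketch of the general idea of merging a depth feature map into an RGB feature map. It assumes a simple per-channel gate (global average pooling followed by a sigmoid), which is a hypothetical stand-in and not the authors' method; the names `gated_fusion` and `channel_means` are invented for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_means(feat):
    """Global average pooling over a C x H x W nested list: one mean per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]


def gated_fusion(rgb_feat, depth_feat):
    """Fuse depth features into RGB features with one gate per channel.

    ASSUMPTION: this gate is a hypothetical simplification. Each depth channel
    is scaled by sigmoid(its global mean) and added to the matching RGB channel.
    A trained network would learn the gate instead of computing it this way.
    """
    if len(rgb_feat) != len(depth_feat):
        raise ValueError("RGB and depth features need the same channel count")
    gates = [sigmoid(m) for m in channel_means(depth_feat)]
    return [
        [
            [r + g * d for r, d in zip(r_row, d_row)]
            for r_row, d_row in zip(r_ch, d_ch)
        ]
        for r_ch, d_ch, g in zip(rgb_feat, depth_feat, gates)
    ]


# Toy single-channel 2 x 2 feature maps.
rgb = [[[1.0, 2.0], [3.0, 4.0]]]
depth = [[[1.0, -1.0], [1.0, -1.0]]]
fused = gated_fusion(rgb, depth)
```

In this toy case the depth channel has a mean of zero, so its gate is 0.5 and half of each depth value is added to the corresponding RGB value. In an encoder like the one described, such a fusion step would typically be applied to the features from the two branches before they are passed to the decoder.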
Pages: 17