An RGB-D Fusion Based Semantic Segmentation Algorithm Based on Neighborhood Metric Relations

Cited by: 0
Authors
Zhang J. [1]
Chen Y. [1]
Zhu S. [1]
Li Y. [1]
Affiliations
[1] Zhejiang Lab, Hangzhou
Source
Jiqiren/Robot | 2023 / Vol. 45 / No. 02
Keywords
deep learning; neighborhood metric relation; RGB-D fusion; semantic segmentation
DOI
10.13973/j.cnki.robot.210550
Abstract
To address the low semantic segmentation accuracy caused by complex extraterrestrial environments and limited computing resources in deep space exploration, an RGB-D fusion semantic segmentation algorithm based on neighborhood metric relations is proposed. The algorithm replaces traditional monocular camera data with multi-modal RGB-D information, builds its base network on a mid-level fusion framework, and adds a neighborhood metric relation module to improve performance. Specifically, the mid-level fusion network refines, fuses, and patches the original features at different scales so that cross-modal data and cross-level features complement each other effectively. In addition, the neighborhood metric relations are constructed by combining semantic feature maps with semantic labels, which adds no inference cost, and correlation information between sample categories is mined from both global and local features to improve the segmentation network. Experiments on the indoor dataset NYUDv2 and the Mars simulation site dataset MARSv1 show that multi-modal RGB-D information and neighborhood metric relations significantly improve semantic segmentation accuracy. © 2023 Chinese Academy of Sciences. All rights reserved.
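The abstract does not give the form of the neighborhood metric relation, so the following is only an illustrative sketch and not the authors' module. It assumes a standard pairwise contrastive loss applied to each pixel and its spatial neighbors: features of neighbors sharing a label are pulled together, and features of neighbors with different labels are pushed at least `margin` apart. The function name, the `radius` and `margin` parameters, and the use of a purely local window are all assumptions. Because such a loss needs ground-truth labels, it would only be evaluated during training, which is consistent with the abstract's statement that inference cost does not increase.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighborhood_contrastive_loss(features, labels, radius=1, margin=1.0):
    """Hypothetical training-only loss over local pixel neighborhoods.

    features: H x W grid (nested lists) of feature vectors.
    labels:   H x W grid (nested lists) of integer class labels.
    radius:   half-width of the square neighborhood window (assumed).
    margin:   minimum desired distance between different classes (assumed).
    """
    height, width = len(labels), len(labels[0])
    total, pairs = 0.0, 0
    for r in range(height):
        for c in range(width):
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    nr, nc = r + dr, c + dc
                    if (dr, dc) == (0, 0):
                        continue  # skip the pixel itself
                    if not (0 <= nr < height and 0 <= nc < width):
                        continue  # neighbor falls outside the map
                    d = euclidean(features[r][c], features[nr][nc])
                    if labels[r][c] == labels[nr][nc]:
                        total += d ** 2  # same class: pull together
                    else:
                        total += max(0.0, margin - d) ** 2  # push apart
                    pairs += 1
    # Average over all ordered neighbor pairs; 0.0 if there are none.
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    # Toy 2 x 2 feature map with 1-D features and two classes.
    toy_features = [[[0.0], [0.1]], [[2.0], [2.1]]]
    toy_labels = [[0, 0], [1, 1]]
    print(neighborhood_contrastive_loss(toy_features, toy_labels))
```

In a real network this term would be computed on tensors and added to the usual cross-entropy segmentation loss with a weighting factor; the nested-list version above only shows the pairwise logic.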
Pages: 156-165
Page count: 9
Related papers
39 in total
  • [1] Eigen D, Fergus R., Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture, IEEE International Conference on Computer Vision, pp. 2650-2658, (2015)
  • [2] Li Z, Gan Y K, Liang X D, et al., LSTM-CF: Unifying context modeling and fusion with LSTMs for RGB-D scene labeling, European Conference on Computer Vision, pp. 541-557, (2016)
  • [3] Hazirbas C, Ma L, Domokos C, et al., FuseNet: Incorporating depth into semantic segmentation via fusion-based CNN architecture, Asian Conference on Computer Vision, pp. 213-228, (2016)
  • [4] Lee S, Park S J, Hong K S., RDFNet: RGB-D multi-level residual feature fusion for indoor semantic segmentation, IEEE International Conference on Computer Vision, pp. 4980-4989, (2017)
  • [5] Qi X J, Liao R J, Jia J Y, et al., 3D graph neural networks for RGBD semantic segmentation, IEEE International Conference on Computer Vision, pp. 5199-5208, (2017)
  • [6] Cover T, Hart P., Nearest neighbor pattern classification, IEEE Transactions on Information Theory, 13, 1, pp. 21-27, (1967)
  • [7] Abdi H, Williams L J., Principal component analysis, WIREs Computational Statistics, 2, 4, pp. 433-459, (2010)
  • [8] Roweis S T, Saul L K., Nonlinear dimensionality reduction by locally linear embedding, Science, 290, 5500, pp. 2323-2326, (2000)
  • [9] Chopra S, Hadsell R, LeCun Y., Learning a similarity metric discriminatively, with application to face verification, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 539-546, (2005)
  • [10] Gao Y, Chien S., Review on space robotics: Toward top-level science through space exploration, Science Robotics, 2, 7, (2017)