Multi-View Deep Learning for Consistent Semantic Mapping with RGB-D Cameras

Authors
Ma, Lingni [1]
Stueckler, Joerg [2]
Kerl, Christian [1]
Cremers, Daniel [1]
Affiliations
[1] Tech Univ Munich, Dept Comp Sci, Comp Vis Grp, Munich, Germany
[2] Rhein Westfal TH Aachen, Comp Vis Grp, Visual Comp Inst, Aachen, Germany
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Visual scene understanding is an important capability that enables robots to purposefully act in their environment. In this paper, we propose a novel deep neural network approach to predict semantic segmentation from RGB-D sequences. The key innovation is to train our network to predict multi-view consistent semantics in a self-supervised way. At test time, its semantic predictions can be fused more consistently in semantic keyframe maps than predictions of a network trained on individual views. We base our network architecture on a recent single-view deep learning approach to RGB and depth fusion for semantic object-class segmentation and enhance it with multi-scale loss minimization. We obtain the camera trajectory using RGB-D SLAM and warp the predictions of RGB-D images into ground-truth annotated frames in order to enforce multi-view consistency during training. At test time, predictions from multiple views are fused into keyframes. We propose and analyze several methods for enforcing multi-view consistency during training and testing. We evaluate the benefit of multi-view consistency training and demonstrate that pooling of deep features and fusion over multiple views outperform single-view baselines on the NYUDv2 benchmark for semantic segmentation. Our end-to-end trained network achieves state-of-the-art performance on the NYUDv2 dataset in single-view segmentation as well as multi-view semantic fusion.
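The test-time step described in the abstract (warping per-view predictions into a keyframe using the estimated camera poses, then fusing them) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the pinhole model with intrinsics shared by all views, the nearest-pixel projection without occlusion handling, and the fusion rule (summing log-probabilities, i.e. a product of per-view class probabilities) are all assumptions made for illustration.

```python
import numpy as np


def warp_probs_to_keyframe(probs_src, depth_src, K, T_kf_src, out_shape):
    """Project per-pixel class probabilities of one view into a keyframe.

    probs_src: (H, W, C) class probabilities of the source view.
    depth_src: (H, W) depth values; 0 marks invalid pixels.
    K:         (3, 3) pinhole intrinsics (assumed identical for both views).
    T_kf_src:  (4, 4) rigid transform from source camera to keyframe camera.
    Returns summed log-probabilities (Hk, Wk, C) and hit counts (Hk, Wk).
    """
    H, W, C = probs_src.shape
    v, u = np.mgrid[0:H, 0:W]
    valid = depth_src > 0

    # Back-project valid pixels to homogeneous 3D points in the source frame.
    z = depth_src[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts = np.stack([x, y, z, np.ones_like(z)], axis=0)

    # Move the points into the keyframe camera and keep those in front of it.
    pts_kf = T_kf_src @ pts
    front = pts_kf[2] > 1e-6
    zk = pts_kf[2, front]
    uk = np.round(K[0, 0] * pts_kf[0, front] / zk + K[0, 2]).astype(int)
    vk = np.round(K[1, 1] * pts_kf[1, front] / zk + K[1, 2]).astype(int)

    # Discard projections that fall outside the keyframe image.
    Hk, Wk = out_shape
    inside = (uk >= 0) & (uk < Wk) & (vk >= 0) & (vk < Hk)
    log_p = np.log(np.clip(probs_src[valid][front][inside], 1e-8, 1.0))

    # Accumulate log-probabilities; np.add.at handles repeated target pixels.
    log_sum = np.zeros((Hk, Wk, C))
    count = np.zeros((Hk, Wk), dtype=int)
    np.add.at(log_sum, (vk[inside], uk[inside]), log_p)
    np.add.at(count, (vk[inside], uk[inside]), 1)
    return log_sum, count


def fuse_views(views, K, out_shape):
    """Fuse several views into one keyframe label map.

    views: iterable of (probs, depth, T_kf_src) tuples.
    Returns an (Hk, Wk) integer label map; -1 marks unobserved pixels.
    """
    total = np.zeros(out_shape + (views[0][0].shape[-1],))
    hits = np.zeros(out_shape, dtype=int)
    for probs, depth, T_kf_src in views:
        log_sum, count = warp_probs_to_keyframe(probs, depth, K, T_kf_src, out_shape)
        total += log_sum
        hits += count
    labels = np.argmax(total, axis=-1)
    labels[hits == 0] = -1
    return labels
```

A practical system would additionally need occlusion handling (for example a depth test against the keyframe depth) and would take the poses from an RGB-D SLAM front end, as the abstract states; both are omitted here to keep the sketch short.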
Pages: 598-605
Page count: 8