MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving

Cited by: 38
Authors
Li, Jiale [1]
Dai, Hang [2]
Han, Hao [3]
Ding, Yong [3]
Affiliations
[1] Zhejiang Univ, Coll Informat Sci & Elect Engn, Hangzhou, Peoples R China
[2] Univ Glasgow, Sch Comp Sci, Glasgow, Lanark, Scotland
[3] Zhejiang Univ, Sch Micronano Elect, Hangzhou, Peoples R China
Keywords
REPRESENTATION;
DOI
10.1109/CVPR52729.2023.02078
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
LiDAR and camera are two modalities available for 3D semantic segmentation in autonomous driving. The popular LiDAR-only methods severely suffer from inferior segmentation on small and distant objects due to insufficient laser points, while the robust multi-modal solution is under-explored, where we investigate three crucial inherent difficulties: modality heterogeneity, limited sensor field of view intersection, and multi-modal data augmentation. We propose a multi-modal 3D semantic segmentation model (MSeg3D) with joint intra-modal feature extraction and inter-modal feature fusion to mitigate the modality heterogeneity. The multi-modal fusion in MSeg3D consists of geometry-based feature fusion GF-Phase, cross-modal feature completion, and semantic-based feature fusion SF-Phase on all visible points. The multi-modal data augmentation is reinvigorated by applying asymmetric transformations on LiDAR point cloud and multi-camera images individually, which benefits the model training with diversified augmentation transformations. MSeg3D achieves state-of-the-art results on nuScenes, Waymo, and SemanticKITTI datasets. Under the malfunctioning multi-camera input and the multi-frame point cloud input, MSeg3D still shows robustness and improves over the LiDAR-only baseline. Our code is publicly available at https://github.com/jialeli1/lidarseg3d.
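The "limited sensor field of view intersection" named in the abstract arises because only some LiDAR points project into a camera image. The following is a minimal sketch of that point-to-pixel check under a plain pinhole camera model. It is not taken from the MSeg3D code base: the function name `project_points`, the intrinsics, and the image size are all hypothetical, and the points are assumed to be already expressed in the camera frame with z pointing forward.

```python
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project_points(
    points_cam: List[Point3D],
    fx: float, fy: float, cx: float, cy: float,
    width: int, height: int,
) -> List[Optional[Pixel]]:
    """Project camera-frame 3D points with a pinhole model (illustrative only).

    Returns one entry per point: (u, v) if the point lands inside the image,
    otherwise None (behind the camera or outside the image bounds).
    """
    pixels: List[Optional[Pixel]] = []
    for x, y, z in points_cam:
        if z <= 0.0:
            # Points behind the image plane cannot be seen by this camera.
            pixels.append(None)
            continue
        u = fx * x / z + cx
        v = fy * y / z + cy
        inside = 0.0 <= u < width and 0.0 <= v < height
        pixels.append((u, v) if inside else None)
    return pixels


# Hypothetical intrinsics and image size, chosen only for the example.
points = [(0.0, 0.0, 10.0), (5.0, 0.0, 1.0), (1.0, 0.5, -2.0)]
pixels = project_points(points, fx=1000.0, fy=1000.0, cx=800.0, cy=450.0,
                        width=1600, height=900)
print(pixels)
```

Points that map to `None` have no image feature to pair with. The abstract describes handling such points through cross-modal feature completion, which this sketch does not attempt to reproduce.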
Pages: 21694-21704
Page count: 11