Deep Multi-Modal Object Detection and Semantic Segmentation for Autonomous Driving: Datasets, Methods, and Challenges

Cited by: 705
Authors
Feng, Di [1 ,2 ]
Haase-Schutz, Christian [3 ,4 ]
Rosenbaum, Lars [1 ]
Hertlein, Heinz [3 ]
Glaser, Claudius [1 ]
Timm, Fabian [1 ]
Wiesbeck, Werner [4 ]
Dietmayer, Klaus [2 ]
Affiliations
[1] Robert Bosch GmbH, Corp Res, Driver Assistance Syst & Automated Driving, D-71272 Renningen, Germany
[2] Ulm Univ, Inst Measurement Control & Microtechnol, D-89081 Ulm, Germany
[3] Robert Bosch GmbH, Chassis Syst Control, Engn Cognit Syst, Automated Driving, D-74232 Abstatt, Germany
[4] Karlsruhe Inst Technol, Inst Radio Frequency Engn & Elect, D-76131 Karlsruhe, Germany
Keywords
Multi-modality; object detection; semantic segmentation; deep learning; autonomous driving; NEURAL-NETWORKS; ROAD; FUSION; LIDAR; ENVIRONMENTS; SET;
DOI
10.1109/TITS.2020.2972974
Chinese Library Classification (CLC): TU [Building Science]
Subject Classification Code: 0813
Abstract
Recent advancements in perception for autonomous driving are driven by deep learning. In order to achieve robust and accurate scene understanding, autonomous vehicles are usually equipped with different sensors (e.g. cameras, LiDARs, Radars), and multiple sensing modalities can be fused to exploit their complementary properties. In this context, many methods have been proposed for deep multi-modal perception problems. However, there is no general guideline for network architecture design, and questions of "what to fuse", "when to fuse", and "how to fuse" remain open. This review paper attempts to systematically summarize methodologies and discuss challenges for deep multi-modal object detection and semantic segmentation in autonomous driving. To this end, we first provide an overview of on-board sensors on test vehicles, open datasets, and background information for object detection and semantic segmentation in autonomous driving research. We then summarize the fusion methodologies and discuss challenges and open questions. In the appendix, we provide tables that summarize topics and methods. We also provide an interactive online platform to navigate each reference: https://boschresearch.github.io/multimodalperception/.
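The "when to fuse" question the abstract raises is usually framed as a choice between feature-level ("early/middle") fusion and decision-level ("late") fusion. The sketch below is a minimal illustration of that distinction, assuming PyTorch and spatially aligned camera/LiDAR feature maps; the module names, channel sizes, and equal-weight score averaging are hypothetical choices for illustration and are not taken from the paper.

```python
import torch
import torch.nn as nn

class EarlyFusionHead(nn.Module):
    """Feature-level ("early/middle") fusion: concatenate camera and LiDAR
    feature maps along the channel axis before a shared detection head."""
    def __init__(self, cam_channels=256, lidar_channels=128, num_classes=4):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(cam_channels + lidar_channels, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, num_classes, kernel_size=1),
        )

    def forward(self, cam_feat, lidar_feat):
        # Assumes both feature maps are already spatially aligned (e.g. in a
        # common bird's-eye-view grid); alignment itself is the "what to fuse" problem.
        fused = torch.cat([cam_feat, lidar_feat], dim=1)
        return self.head(fused)


class LateFusion(nn.Module):
    """Decision-level ("late") fusion: each modality keeps its own head and
    only the per-class score maps are combined, here by simple averaging."""
    def __init__(self, cam_head: nn.Module, lidar_head: nn.Module):
        super().__init__()
        self.cam_head = cam_head
        self.lidar_head = lidar_head

    def forward(self, cam_feat, lidar_feat):
        return 0.5 * (self.cam_head(cam_feat) + self.lidar_head(lidar_feat))
```

Early fusion lets the network learn cross-modal interactions but couples the modalities at training and inference time; late fusion keeps the single-modality branches independent (and more robust to one sensor failing) at the cost of weaker interaction, which is the trade-off the survey discusses under "how to fuse".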
Pages: 1341-1360
Page count: 20
Related Papers
50 items in total
  • [31] MultiNet: Multi-Modal Multi-Task Learning for Autonomous Driving
    Chowdhuri, Sauhaarda
    Pankaj, Tushar
    Zipser, Karl
    2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2019, : 1496 - 1504
  • [32] Multi-modal Stance Detection: New Datasets and Model
    Liang, Bin
    Li, Ang
    Zhao, Jingqian
    Gui, Lin
    Yang, Min
    Yu, Yue
    Wong, Kam-Fai
    Xu, Ruifeng
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 12373 - 12387
  • [33] A Multi-Modal System for Road Detection and Segmentation
    Hu, Xiao
    Rodriguez F, Sergio A.
    Gepperth, Alexander
    2014 IEEE INTELLIGENT VEHICLES SYMPOSIUM PROCEEDINGS, 2014, : 1365 - 1370
  • [34] Multi-modal Queried Object Detection in the Wild
    Xu, Yifan
    Zhang, Mengdan
    Fu, Chaoyou
    Chen, Peixian
    Yang, Xiaoshan
    Li, Ke
    Xu, Changsheng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [35] DiPA: Probabilistic Multi-Modal Interactive Prediction for Autonomous Driving
    Knittel, Anthony
    Hawasly, Majd
    Albrecht, Stefano V.
    Redford, John
    Ramamoorthy, Subramanian
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (08) : 4887 - 4894
  • [36] Multi-modal deep feature learning for RGB-D object detection
    Xu, Xiangyang
    Li, Yuncheng
    Wu, Gangshan
    Luo, Jiebo
    PATTERN RECOGNITION, 2017, 72 : 300 - 313
  • [37] Deep learning based object detection from multi-modal sensors: an overview
    Ye Liu
    Shiyang Meng
    Hongzhang Wang
    Jun Liu
    Multimedia Tools and Applications, 2024, 83 : 19841 - 19870
  • [38] Deep learning based object detection from multi-modal sensors: an overview
    Liu, Ye
    Meng, Shiyang
    Wang, Hongzhang
    Liu, Jun
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (07) : 19841 - 19870
  • [39] Towards Autonomous Driving: a Multi-Modal 360° Perception Proposal
    Beltran, Jorge
    Guindel, Carlos
    Cortes, Irene
    Barrera, Alejandro
    Astudillo, Armando
    Urdiales, Jesus
    Alvarez, Mario
    Bekka, Farid
    Milanes, Vicente
    Garcia, Fernando
    2020 IEEE 23RD INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), 2020,
  • [40] Virtual Multi-modal Object Detection and Classification with Deep Convolutional Neural Networks
    Mitsakos, Nikolaos
    Papadakis, Manos
    WAVELETS AND SPARSITY XVIII, 2019, 11138