LidarMultiNet: Towards a Unified Multi-Task Network for LiDAR Perception

Cited by: 0
Authors
Ye, Dongqiangzi [1]
Zhou, Zixiang [1, 2]
Chen, Weijia [1]
Xie, Yufei [1]
Wang, Yu [1]
Wang, Panqu [1]
Foroosh, Hassan [2]
Affiliations
[1] TuSimple, San Diego, CA 92122 USA
[2] University of Central Florida, Orlando, FL USA
Keywords
None listed
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
LiDAR-based 3D object detection, semantic segmentation, and panoptic segmentation are usually implemented in specialized networks with distinctive architectures that are difficult to adapt to each other. This paper presents LidarMultiNet, a LiDAR-based multi-task network that unifies these three major LiDAR perception tasks. Among its many benefits, a multi-task network can reduce the overall cost by sharing weights and computation among multiple tasks. However, it typically underperforms a combination of independently trained single-task models. The proposed LidarMultiNet aims to bridge the performance gap between the multi-task network and multiple single-task networks. At the core of LidarMultiNet is a strong 3D voxel-based encoder-decoder architecture with a Global Context Pooling (GCP) module that extracts global contextual features from a LiDAR frame. Task-specific heads are added on top of the network to perform the three LiDAR perception tasks. More tasks can be supported simply by adding new task-specific heads, at little additional cost. A second stage is also proposed to refine the first-stage segmentation and generate accurate panoptic segmentation results. LidarMultiNet is extensively tested on both the Waymo Open Dataset and the nuScenes dataset, demonstrating for the first time that the major LiDAR perception tasks can be unified in a single strong network that is trained end-to-end and achieves state-of-the-art performance. Notably, LidarMultiNet took first place in the official ranking of the Waymo Open Dataset 3D Semantic Segmentation Challenge 2022, with the highest mIoU and the best accuracy for most of the 22 classes on the test set, using only LiDAR points as input. It also sets a new single-model state of the art on the Waymo 3D object detection benchmark and on three nuScenes benchmarks.
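The shared-trunk, task-specific-head design described in the abstract can be illustrated with a toy sketch. Everything in it is hypothetical: `MultiTaskNet`, `toy_encoder`, the head names, and the thresholds are our own stand-ins, and plain Python lists replace the voxel features of the real network. What it shows is that the shared encoder runs once per frame, so registering another head adds only that head's cost.

```python
from typing import Callable, Dict, List

Features = List[float]


class MultiTaskNet:
    """Hypothetical shared-encoder network with pluggable task heads."""

    def __init__(self, encoder: Callable[[List[float]], Features]) -> None:
        self.encoder = encoder
        self.heads: Dict[str, Callable[[Features], object]] = {}
        self.encoder_calls = 0  # how many times the shared trunk has run

    def add_head(self, name: str, head: Callable[[Features], object]) -> None:
        # Adding a task only registers one more head; the trunk is unchanged.
        self.heads[name] = head

    def forward(self, points: List[float]) -> Dict[str, object]:
        self.encoder_calls += 1
        shared = self.encoder(points)  # computed once, reused by every head
        return {name: head(shared) for name, head in self.heads.items()}


def toy_encoder(points: List[float]) -> Features:
    # Stand-in for the voxel encoder-decoder: every point feature receives a
    # frame-level "global context" term (here simply the mean value).
    context = sum(points) / len(points)
    return [p + context for p in points]


net = MultiTaskNet(toy_encoder)
net.add_head("semantic_seg", lambda f: [1 if x > 2.0 else 0 for x in f])
net.add_head("detection", lambda f: [i for i, x in enumerate(f) if x > 3.0])
out = net.forward([0.5, 1.5, 2.5])
print(out)  # {'semantic_seg': [0, 1, 1], 'detection': [2]}
print(net.encoder_calls)  # 1
```

In the real network the heads are learned modules trained jointly with the encoder; here they are fixed threshold rules so the output is easy to check by hand.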
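The abstract only states that the Global Context Pooling (GCP) module extracts global contextual features from a LiDAR frame. As we understand the paper, it does so by projecting sparse 3D voxel features to a dense bird's-eye-view (BEV) map, processing that map with 2D convolutions, and mapping the result back onto the voxels. The sketch below follows that sparse-to-BEV-and-back pattern under loud simplifications that are our assumptions, not the paper's module: max-pooling over height instead of the paper's projection, one 3x3 mean filter instead of learned multi-scale 2D convolutions, and a hypothetical function name `global_context_pool`.

```python
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]  # (x, y, z) integer voxel coordinates, x and y in [0, size)


def global_context_pool(voxels: Dict[Voxel, float], size: int) -> Dict[Voxel, float]:
    """Toy sparse-3D -> dense-BEV -> sparse-3D context module (hypothetical)."""
    # 1) Sparse 3D to dense 2D BEV: max over the height axis z.
    #    Empty cells stay 0.0, so features are assumed non-negative (e.g. post-ReLU).
    bev: List[List[float]] = [[0.0] * size for _ in range(size)]
    for (x, y, _z), feat in voxels.items():
        bev[x][y] = max(bev[x][y], feat)

    # 2) Dense 2D processing: a 3x3 mean filter standing in for learned
    #    multi-scale 2D convolutions. Border cells average only in-range cells.
    ctx: List[List[float]] = [[0.0] * size for _ in range(size)]
    for x in range(size):
        for y in range(size):
            nbrs = [
                bev[i][j]
                for i in range(x - 1, x + 2)
                for j in range(y - 1, y + 2)
                if 0 <= i < size and 0 <= j < size
            ]
            ctx[x][y] = sum(nbrs) / len(nbrs)

    # 3) Dense 2D back to sparse 3D: each occupied voxel gets the context of
    #    its BEV column added to its own feature.
    return {(x, y, z): feat + ctx[x][y] for (x, y, z), feat in voxels.items()}


voxels = {(0, 0, 0): 1.0, (0, 0, 1): 3.0, (2, 2, 0): 2.0}
refined = global_context_pool(voxels, size=3)
print(refined)  # {(0, 0, 0): 1.75, (0, 0, 1): 3.75, (2, 2, 0): 2.5}
```

A single 3x3 filter only spreads information to neighbouring columns; a real module widens the receptive field by stacking and downsampling 2D convolutions, which is what makes the context frame-level. The 2D detour is attractive because dense 2D convolutions over a BEV map are cheaper than dense 3D convolutions over the full voxel grid.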
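The challenge result above is stated in terms of mIoU (mean intersection over union). For each class c, IoU_c = TP_c / (TP_c + FP_c + FN_c), counted over labelled points, and mIoU averages IoU_c across classes. A minimal per-point sketch, assuming integer class labels; `mean_iou` is a hypothetical helper, and the official Waymo evaluation may differ in details such as how counts are accumulated across frames and which labels are ignored.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], gt: Sequence[int], num_classes: int) -> float:
    """Mean IoU over classes present in the prediction or the ground truth."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must label the same points")
    ious: List[float] = []
    for c in range(num_classes):
        tp = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        fp = sum(1 for p, g in zip(pred, gt) if p == c and g != c)
        fn = sum(1 for p, g in zip(pred, gt) if p != c and g == c)
        if tp + fp + fn > 0:  # skip classes absent from both pred and gt
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious) if ious else 0.0


pred = [0, 0, 1, 1, 2]
gt = [0, 1, 1, 1, 2]
miou = mean_iou(pred, gt, num_classes=3)
print(round(miou, 4))  # 0.7222, i.e. (1/2 + 2/3 + 1) / 3
```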
Pages: 3231-3240
Number of pages: 10