4D-Former: Multimodal 4D Panoptic Segmentation

Cited by: 0
Authors
Athar, Ali [1 ,3 ]
Li, Enxu [1 ,2 ]
Casas, Sergio [1 ,2 ]
Urtasun, Raquel [1 ,2 ]
Affiliations
[1] Waabi, Toronto, ON, Canada
[2] Univ Toronto, Toronto, ON M5S 1A1, Canada
[3] Rhein Westfal TH Aachen, Aachen, Germany
Keywords
Panoptic Segmentation; Sensor Fusion; Temporal Reasoning; Autonomous Driving;
DOI: not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
4D panoptic segmentation is a challenging but practically useful task that requires every point in a LiDAR point-cloud sequence to be assigned a semantic class label, and individual objects to be segmented and tracked over time. Existing approaches utilize only LiDAR inputs which convey limited information in regions with point sparsity. This problem can, however, be mitigated by utilizing RGB camera images which offer appearance-based information that can reinforce the geometry-based LiDAR features. Motivated by this, we propose 4D-Former: a novel method for 4D panoptic segmentation which leverages both LiDAR and image modalities, and predicts semantic masks as well as temporally consistent object masks for the input point-cloud sequence. We encode semantic classes and objects using a set of concise queries which absorb feature information from both data modalities. Additionally, we propose a learned mechanism to associate object tracks over time which reasons over both appearance and spatial location. We apply 4D-Former to the nuScenes and SemanticKITTI datasets where it achieves state-of-the-art results. For more information, visit the project website: https://waabi.ai/4dformer.
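The abstract describes a set of concise queries that absorb feature information from both the LiDAR and camera modalities before producing semantic and object masks. As an illustration only, below is a minimal sketch of that general idea as query-based cross-attention fusion; it is not the authors' implementation, and every name, layer choice, and tensor shape (e.g. MultimodalQueryFusion, attn_lidar, attn_image, 128 queries, 256-dim features) is an assumption made for the example.

```python
# Hypothetical sketch of query-based multimodal fusion (not the 4D-Former code).
# A fixed set of learnable queries cross-attends to LiDAR features and then to
# image features, so each query "absorbs" geometry- and appearance-based cues.
import torch
import torch.nn as nn


class MultimodalQueryFusion(nn.Module):
    def __init__(self, num_queries=128, dim=256, num_heads=8):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)  # object/semantic queries
        self.attn_lidar = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_image = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 4), nn.ReLU(), nn.Linear(dim * 4, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)

    def forward(self, lidar_feats, image_feats):
        # lidar_feats: (B, N_points, dim), image_feats: (B, N_pixels, dim)
        b = lidar_feats.size(0)
        q = self.queries.weight.unsqueeze(0).expand(b, -1, -1)
        q = self.norm1(q + self.attn_lidar(q, lidar_feats, lidar_feats)[0])  # absorb geometry
        q = self.norm2(q + self.attn_image(q, image_feats, image_feats)[0])  # absorb appearance
        q = self.norm3(q + self.ffn(q))
        # Mask logits as dot products between refined queries and per-point features.
        mask_logits = torch.einsum("bqd,bnd->bqn", q, lidar_feats)
        return q, mask_logits


# Usage with random tensors standing in for encoder outputs.
model = MultimodalQueryFusion()
lidar = torch.randn(2, 1024, 256)
image = torch.randn(2, 2048, 256)
queries, masks = model(lidar, image)
print(queries.shape, masks.shape)  # (2, 128, 256) and (2, 128, 1024)
```

In the paper's framing, the refined queries would additionally be decoded into semantic classes and associated across frames for tracking using both appearance and spatial location; this sketch only covers the fusion step and checks tensor shapes.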
Pages: 14
Related Papers (50 records)
  • [1] Aygün, Mehmet; Ošep, Aljoša; Weber, Mark; Maximov, Maxim; Stachniss, Cyrill; Behley, Jens; Leal-Taixé, Laura. 4D Panoptic LiDAR Segmentation. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021: 5523-5533.
  • [2] Zhu, Minghan; Han, Shizhong; Cai, Hong; Borse, Shubhankar; Ghaffari, Maani; Porikli, Fatih. 4D Panoptic Segmentation as Invariant and Equivariant Field Prediction. IEEE/CVF International Conference on Computer Vision (ICCV), 2023: 22431-22441.
  • [3] Hong, Fangzhou; Kong, Lingdong; Zhou, Hui; Zhu, Xinge; Li, Hongsheng; Liu, Ziwei. Unified 3D and 4D Panoptic Segmentation via Dynamic Shifting Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(5): 3480-3495.
  • [4] Yang, Jingkang; Cen, Jun; Peng, Wenxuan; Liu, Shuai; Hong, Fangzhou; Li, Xiangtai; Zhou, Kaiyang; Chen, Qifeng; Liu, Ziwei. 4D Panoptic Scene Graph Generation. Advances in Neural Information Processing Systems 36 (NeurIPS), 2023.
  • [5] Marcuzzi, Rodrigo; Nunes, Lucas; Wiesmann, Louis; Marks, Elias; Behley, Jens; Stachniss, Cyrill. Mask4D: End-to-End Mask-Based 4D Panoptic Segmentation for LiDAR Sequences. IEEE Robotics and Automation Letters, 2023, 8(11): 7487-7494.
  • [6] Marcuzzi, Rodrigo; Nunes, Lucas; Wiesmann, Louis; Vizzo, Ignacio; Behley, Jens; Stachniss, Cyrill. Contrastive Instance Association for 4D Panoptic Segmentation Using Sequences of 3D LiDAR Scans. IEEE Robotics and Automation Letters, 2022, 7(2): 1550-1557.
  • [7] Moghaddam, Anahita Bakhtiari; Runz, Armin; Haering, Peter; Seeber, Steffen; Moeller, Michael; Echner, Gernot. Development of an Anthropomorphic 4D Phantom for Multimodal Imaging, 4D Radiation and SGRT. Radiotherapy and Oncology, 2023, 182: S2083-S2084.
  • [8] Campbell, S. 4D, or Not 4D: That Is the Question. Ultrasound in Obstetrics & Gynecology, 2002, 19(1): 1-4.
  • [9] Montagnat, J.; Delingette, H. 4D Deformable Models with Temporal Constraints: Application to 4D Cardiac Image Segmentation. Medical Image Analysis, 2005, 9(1): 87-100.
  • [10] Patwardhan, Kedar A.; Yu, Yongjian; Gupta, Sandeep; Dentinger, Aaron; Mills, David. 4D Vessel Segmentation and Tracking in Ultrasound. IEEE International Conference on Image Processing (ICIP), 2012: 2317-2320.