Monocular 3D Object Detection With Motion Feature Distillation

Cited by: 2
Authors
Hu, Henan [1, 2]
Li, Muyu [3]
Zhu, Ming [1]
Gao, Wen [4]
Liu, Peiyu [5]
Chan, Kwok-Leung [6]
Affiliations
[1] Chinese Acad Sci, Changchun Inst Opt Fine Mech & Phys, Changchun 130033, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Ctr Intelligent Multidimens Data Anal Ltd, Hong Kong, Peoples R China
[4] BYD Auto Ind Co Ltd, Shenzhen 518119, Peoples R China
[5] Shenyang Aircraft Design & Res Inst, Shenyang 110036, Liaoning, Peoples R China
[6] City Univ Hong Kong, Dept Elect Engn, Hong Kong, Peoples R China
Keywords
Three-dimensional displays; Object detection; Feature extraction; Estimation; Location awareness; Image resolution; Solid modeling; 3D object detection; bird's-eye-view (BEV); monocular depth estimation; motion feature; knowledge distillation; autonomous driving
DOI
10.1109/ACCESS.2023.3300708
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In the context of autonomous driving, perception of the environment over a 360-degree field of view is extremely important. It can be achieved by detecting three-dimensional (3D) objects in the surrounding scene from the inputs of sensors such as LiDAR or RGB cameras. The resulting 3D perception is commonly represented in the bird's-eye view (BEV) of the sensor. RGB cameras have the advantages of low cost and long-range acquisition. However, because RGB images are two-dimensional (2D), a BEV generated from them suffers from low accuracy due to limitations such as a lack of temporal correlation. To address these problems, we propose a monocular 3D object detection method based on long short-term feature fusion and motion feature distillation. Long short-term temporal features are extracted at different feature-map resolutions. The motion features and depth information are combined and encoded by an encoder based on the Transformer cross-correlation module, and are further integrated into the BEV space of the fused long short-term temporal features. A decoder with motion feature distillation then localizes objects in 3D space. By combining BEV feature representations from different time steps, supplemented with embedded motion features and depth information, the proposed method significantly improves the accuracy of monocular 3D object detection, as demonstrated by experimental results on the nuScenes dataset. It outperforms state-of-the-art methods, in particular the previous best method by 6.7% in mAP and 8.3% in mATE.
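The abstract gives no implementation details, so the following is only a rough illustration of two generic ingredients it names, not the authors' method: a cross-attention step in which depth-aware query tokens attend over motion-feature tokens, and a feature-distillation loss that pulls student features toward teacher features. Everything here is an assumption: single-head scaled dot-product attention with no learned projections, mean-squared-error distillation, and plain Python lists as feature tokens. The function names (`cross_attention`, `feature_distillation_loss`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product cross-attention (hypothetical sketch).

    Each query token (assumed here to be a depth-aware feature) attends over
    the key tokens (assumed to be motion features) and returns the
    attention-weighted sum of the corresponding value tokens.
    No learned projection matrices are applied.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim_v = len(values[0])
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(dim_v)])
    return out


def feature_distillation_loss(student: List[Vector],
                              teacher: List[Vector]) -> float:
    """Mean squared error between student and teacher feature tokens."""
    total, n = 0.0, 0
    for s, t in zip(student, teacher):
        for a, b in zip(s, t):
            total += (a - b) ** 2
            n += 1
    if n == 0:
        raise ValueError("empty feature lists")
    return total / n


if __name__ == "__main__":
    depth_tokens = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical query tokens
    motion_tokens = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical key/value tokens
    fused = cross_attention(depth_tokens, motion_tokens, motion_tokens)
    print([round(x, 2) for x in fused[0]])  # → [0.67, 0.33]
    print(feature_distillation_loss(fused, motion_tokens))
```

In a real detector the queries, keys, and values would first pass through learned linear projections (usually with several heads), and the distillation target would come from a teacher network; both are omitted to keep the sketch short.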
Pages: 82933-82945
Page count: 13
Related papers (50 in total)
  • [31] Objects are Different: Flexible Monocular 3D Object Detection
    Zhang, Yunpeng
    Lu, Jiwen
    Zhou, Jie
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 3288 - 3297
  • [32] Monocular 3D object detection for construction scene analysis
    Shen, Jie
    Jiao, Lang
    Zhang, Cong
    Peng, Keran
    COMPUTER-AIDED CIVIL AND INFRASTRUCTURE ENGINEERING, 2024, 39 (09) : 1370 - 1389
  • [33] Delving into Localization Errors for Monocular 3D Object Detection
    Ma, Xinzhu
    Zhang, Yinmin
    Xu, Dan
    Zhou, Dongzhan
    Yi, Shuai
    Li, Haojie
    Ouyang, Wanli
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 4719 - 4728
  • [34] Shape-Aware Monocular 3D Object Detection
    Chen, Wei
    Zhao, Jie
    Zhao, Wan-Lei
    Wu, Song-Yuan
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24 (06) : 6416 - 6424
  • [35] Competition for roadside camera monocular 3D object detection
    Jia, Jinrang
    Shi, Yifeng
    Qu, Yuli
    Wang, Rui
    Xu, Xing
    Zhang, Hai
    NATIONAL SCIENCE REVIEW, 2023, 10 (06) : 34 - 37
  • [36] MonoGRNet: A General Framework for Monocular 3D Object Detection
    Qin, Zengyi
    Wang, Jinglu
    Lu, Yan
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (09) : 5170 - 5184
  • [37] Temporal Feature Fusion for 3D Detection in Monocular Video
    Cheng, Haoran
    Peng, Liang
    Yang, Zheng
    Lin, Binbin
    He, Xiaofei
    Wu, Boxi
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 2665 - 2675
  • [38] M3DGAF: Monocular 3D Object Detection With Geometric Appearance Awareness and Feature Fusion
    Chen, Mu
    Liu, Pengfei
    Zhao, Huaici
    IEEE SENSORS JOURNAL, 2023, 23 (11) : 11232 - 11240
  • [39] Object-Aware Centroid Voting for Monocular 3D Object Detection
    Bao, Wentao
    Yu, Qi
    Kong, Yu
    2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 2197 - 2204
  • [40] FD3D: Exploiting Foreground Depth Map for Feature-Supervised Monocular 3D Object Detection
    Wu, Zizhang
    Gan, Yuanzhu
    Wu, Yunzhe
    Wang, Ruihao
    Wang, Xiaoquan
    Pu, Jian
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6, 2024, : 6189 - 6197