Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding

Authors
Bakr, Eslam Mohamed [1]
Alsaedy, Yasmeen [1]
Elhoseiny, Mohamed [1]
Affiliations
[1] King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The 3D visual grounding task has been explored with visual and language streams that comprehend referential language to identify target objects in 3D scenes. However, most existing methods devote the visual stream to capturing 3D visual clues using off-the-shelf point cloud encoders. The main question we address in this paper is: can we consolidate the 3D visual stream with 2D clues synthesized from point clouds, and efficiently utilize them in training and testing? The main idea is to assist the 3D encoder by incorporating rich 2D object representations without requiring extra 2D inputs. To this end, we leverage 2D clues, synthetically generated from 3D point clouds, and empirically show their aptitude to boost the quality of the learned visual representations. We validate our approach through comprehensive experiments on the Nr3D, Sr3D, and ScanRefer datasets and show consistent performance gains compared to existing methods. Our proposed module, dubbed Look Around and Refer (LAR), significantly outperforms state-of-the-art 3D visual grounding techniques on all three benchmarks. The code is available at https://eslambakr.github.io/LAR.github.io/.
Pages: 13