Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding

Cited by: 0
Authors
Bakr, Eslam Mohamed [1]
Alsaedy, Yasmeen [1]
Elhoseiny, Mohamed [1]
Affiliations
[1] King Abdullah Univ Sci & Technol KAUST, Thuwal, Saudi Arabia
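Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes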
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The 3D visual grounding task has been explored with visual and language streams that comprehend referential language to identify target objects in 3D scenes. However, most existing methods devote the visual stream to capturing 3D visual clues using off-the-shelf point cloud encoders. The main question we address in this paper is: "Can we consolidate the 3D visual stream with 2D clues synthesized from point clouds and efficiently utilize them in training and testing?" The main idea is to assist the 3D encoder by incorporating rich 2D object representations without requiring extra 2D inputs. To this end, we leverage 2D clues, synthetically generated from 3D point clouds, and empirically show their aptitude to boost the quality of the learned visual representations. We validate our approach through comprehensive experiments on the Nr3D, Sr3D, and ScanRefer datasets and show consistent performance gains compared to existing methods. Our proposed module, dubbed Look Around and Refer (LAR), significantly outperforms state-of-the-art 3D visual grounding techniques on all three benchmarks. The code is available at https://eslambakr.github.io/LAR.github.io/.
Pages: 13
Related papers
50 in total
  • [31] A Note on Visual Semantics in SAR 3D Imaging
    Hu Z.
    Journal of Radars, 2022, 11 (01) : 20 - 26
  • [32] 2½D or 3D?
    Roth, S
    Küster, B
    Sura, H
    KUNSTSTOFFE-PLAST EUROPE, 2004, 94 (07) : 65 - 67
  • [33] 2D and 3D on demand
    Philippi, Anne
    F & M; Feinwerktechnik, Mikrotechnik, Messtechnik, 1998, 106 (06) : 412 - 414
  • [34] Accuracy of Synthetic 2D Mammography Compared With Conventional 2D Digital Mammography Obtained With 3D Tomosynthesis
    Simon, Katherine
    Dodelzon, Katerina
    Drotman, Michele
    Levy, Allison
    Arleo, Elizabeth Kagan
    Askin, Gulce
    Katzen, Janine
    AMERICAN JOURNAL OF ROENTGENOLOGY, 2019, 212 (06) : 1406 - 1411
  • [35] From 2D to 3D
    Steven De Feyter
    Nature Chemistry, 2011, 3 (1) : 14 - 15
  • [36] Combining 2D and 3D Visualization with Visual Analytics in the Environmental Domain
    Vuckovic, Milena
    Schmidt, Johanna
    Ortner, Thomas
    Cornel, Daniel
    INFORMATION, 2022, 13 (01)
  • [37] Visual storytelling in 2D and stereoscopic 3D video: effect of blur on visual attention
    Quan Huynh-Thu
    Vienne, Cyril
    Blonde, Laurent
    HUMAN VISION AND ELECTRONIC IMAGING XVIII, 2013, 8651
  • [38] 2D–3D synchronous/asynchronous camera fusion for visual odometry
    Danda Pani Paudel
    Cédric Demonceaux
    Adlane Habed
    Pascal Vasseur
    Autonomous Robots, 2019, 43 : 21 - 35
  • [39] Hybrid 2D and 3D Visual Analytics of Network Simulation Data
    Su, Simon
    Perry, Vincent
    Dasari, Venkat
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 3992 - 3999
  • [40] Similarities and differences in the visual psychology of 3D animation and 2D animation
    Zhang, Yang
    PSYCHOLOGICAL REPORTS, 2024, 127 : 305 - 306