T2TD: Text-3D Generation Model Based on Prior Knowledge Guidance

Cited by: 0
Authors
Nie, Weizhi [1]
Chen, Ruidong [1]
Wang, Weijie [2]
Lepri, Bruno [3]
Sebe, Nicu [2]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300384, Peoples R China
[2] Univ Trento, Dept Informat Engn & Comp Sci, I-38122 Trento, Italy
[3] Fdn Bruno Kessler, I-38122 Trento, Italy
Funding
National Natural Science Foundation of China
Keywords
Three-dimensional displays; Solid modeling; Shape; Data models; Knowledge graphs; Legged locomotion; Natural languages; 3D model generation; causal model inference; cross-modal representation; knowledge graph; natural language
DOI
10.1109/TPAMI.2024.3463753
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, 3D models have been used in many applications, such as autonomous driving, 3D reconstruction, VR, and AR. However, the available 3D model data is too scarce to meet practical demands. Generating high-quality 3D models efficiently from textual descriptions is therefore a promising but challenging way to address this problem. In this paper, inspired by the creative mechanisms of human imagination, which draw on experiential knowledge to fill in a concrete target model from an ambiguous description, we propose a novel text-3D generation model (T2TD). T2TD generates the target model from the textual description with the aid of experiential knowledge, and its creation process simulates the imaginative mechanisms of human beings. First, we introduce a text-3D knowledge graph to preserve the relationships between 3D models and textual semantic information; it provides related shapes that play the role of human experiential information. Second, we propose a causal inference model to select useful feature information from these related shapes, removing unrelated structure information and retaining only the features strongly related to the textual description. Third, we adopt a novel multi-layer transformer structure to progressively fuse this strongly related structure information with the textual information, compensating for the lack of structural information and enhancing the final performance of the 3D generation model. Experimental results demonstrate that our approach significantly improves 3D model generation quality and outperforms state-of-the-art (SOTA) methods on the Text2Shape datasets.
Pages: 172-189
Number of pages: 18
Related papers
50 in total
  • [41] 3D Facial Expression Recognition Based on Multi-View and Prior Knowledge Fusion
    Quang Nhat Vo
    Khanh Tran
    Zhao, Guoying
    2019 IEEE 21ST INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP 2019), 2019,
  • [42] Segmentation of fetal 3D ultrasound based on statistical prior and deformable model
    Anquez, Jeremie
    Angelini, Elsa D.
    Bloch, Isabelle
    2008 IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING: FROM NANO TO MACRO, VOLS 1-4, 2008, : 17 - 20
  • [43] DreamScene: 3D Gaussian-Based Text-to-3D Scene Generation via Formation Pattern Sampling
    Li, Haoran
    Shi, Haolin
    Zhang, Wenli
    Wu, Wenjun
    Liao, Yong
    Wang, Lin
    Lee, Lik-Hang
    Zhou, Peng Yuan
    COMPUTER VISION - ECCV 2024, PT LXXIV, 2025, 15132 : 214 - 230
  • [44] Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction
    Cha, Junuk
    Kim, Jihyeon
    Yoon, Jae Shin
    Baek, Seungryul
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 1577 - 1585
  • [45] A New Deep Wavefront Based Model for Text Localization in 3D Video
    Nandanwar, Lokesh
    Shivakumara, Palaiahnakote
    Ramachandra, Raghavendra
    Lu, Tong
    Pal, Umapada
    Antonacopoulos, Apostolos
    Lu, Yue
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (06) : 3375 - 3389
  • [46] Reconstruction of 3-D geometry using 2-D profiles and a geometric prior model
    Lötjönen, J
    Magnin, IE
    Nenonen, J
    Katila, T
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 1999, 18 (10) : 992 - 1002
  • [47] Multi-modal fusion network guided by prior knowledge for 3D CAD model recognition
    Li, Qiang
    Xu, Zibo
    Bai, Shaojin
    Nie, Weizhi
    Liu, Anan
    NEUROCOMPUTING, 2024, 590
  • [48] An improved 3D location algorithm model research based on directional TD-MUSIC
    Zhang Keyi
    Zhao Ping
    Yao Hongfei
    Liu Jie
    Shi Wenzhe
    Wang Wenhao
    Wang Zhenjin
    Li Bo
    2015 IEEE INTERNATIONAL CONFERENCE ON SMART CITY/SOCIALCOM/SUSTAINCOM (SMARTCITY), 2015, : 581 - 585
  • [49] Debiasing Scores and Prompts of 2D Diffusion for View-consistent Text-to-3D Generation
    Hong, Susung
    Ahn, Donghoon
    Kim, Seungryong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,