T2TD: Text-3D Generation Model Based on Prior Knowledge Guidance

Cited by: 0
Authors
Nie, Weizhi [1 ]
Chen, Ruidong [1 ]
Wang, Weijie [2 ]
Lepri, Bruno [3 ]
Sebe, Nicu [2 ]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300384, Peoples R China
[2] Univ Trento, Dept Informat Engn & Comp Sci, I-38122 Trento, Italy
[3] Fdn Bruno Kessler, I-38122 Trento, Italy
Funding
National Natural Science Foundation of China
Keywords
Three-dimensional displays; Solid modeling; Shape; Data models; Knowledge graphs; Legged locomotion; Natural languages; 3D model generation; causal model inference; cross-modal representation; knowledge graph; natural language;
DOI
10.1109/TPAMI.2024.3463753
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, 3D models have been used in many applications, such as autonomous driving, 3D reconstruction, VR, and AR. However, the available 3D model data fall far short of practical demand. Efficiently generating high-quality 3D models from textual descriptions is therefore a promising but challenging way to address this shortage. In this paper, inspired by the creative mechanism of human imagination, which concretizes a target from an ambiguous description by drawing on experiential knowledge, we propose a novel text-3D generation model (T2TD). T2TD generates the target model from a textual description with the aid of experiential knowledge, and its creation process simulates the imaginative mechanism of human beings. In this process, we first introduce a text-3D knowledge graph that preserves the relationships between 3D models and textual semantic information, providing related shapes analogous to human experiential knowledge. Second, we propose an effective causal inference model that selects useful features from these related shapes, removing unrelated structural information and retaining only the features strongly related to the textual description. Third, we adopt a novel multi-layer transformer structure to progressively fuse this strongly related structural information with the textual information, compensating for the lack of structural detail and enhancing the final performance of the 3D generation model. Experimental results demonstrate that our approach significantly improves 3D model generation quality and outperforms state-of-the-art methods on the Text2Shape datasets.
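The three-step pipeline outlined in the abstract (retrieve related shapes from a text-3D knowledge graph, select the features relevant to the description, progressively fuse them with the text representation) can be caricatured in a minimal sketch. Everything here is an illustrative stand-in, not the paper's actual components: the toy `knowledge_graph` dictionary, the cosine-similarity filter standing in for the causal inference model, and the softmax-weighted fusion loop standing in for the multi-layer transformer.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # toy feature dimension

# Hypothetical text-3D knowledge graph: each keyword node links to
# feature vectors of 3D shapes it describes (random stand-in data).
knowledge_graph = {
    "round": [rng.normal(size=DIM) for _ in range(3)],
    "four legs": [rng.normal(size=DIM) for _ in range(3)],
    "tall back": [rng.normal(size=DIM) for _ in range(2)],
}

def retrieve_related_shapes(keywords):
    """Step 1: collect shape features linked to the description's keywords."""
    feats = []
    for kw in keywords:
        feats.extend(knowledge_graph.get(kw, []))
    return np.stack(feats)

def select_relevant(text_emb, shape_feats, top_k=2):
    """Step 2 (stand-in for causal selection): keep only the shape
    features most aligned with the text embedding (cosine similarity)."""
    sims = shape_feats @ text_emb / (
        np.linalg.norm(shape_feats, axis=1) * np.linalg.norm(text_emb) + 1e-8
    )
    idx = np.argsort(sims)[-top_k:]
    return shape_feats[idx]

def fuse(text_emb, selected, n_layers=3):
    """Step 3: progressively fold the selected structural features into
    the text representation (toy analogue of multi-layer cross-attention)."""
    fused = text_emb.copy()
    for _ in range(n_layers):
        scores = selected @ fused
        attn = np.exp(scores - scores.max())  # stable softmax weights
        attn /= attn.sum()
        fused = 0.5 * fused + 0.5 * (attn @ selected)
    return fused

text_emb = rng.normal(size=DIM)
related = retrieve_related_shapes(["round", "four legs"])
selected = select_relevant(text_emb, related, top_k=2)
fused = fuse(text_emb, selected)
print(fused.shape)  # (16,)
```

In the actual model, a generator would decode `fused` into a 3D shape; the point of the sketch is only the data flow: graph lookup narrows the candidate pool, relevance filtering discards unrelated structure, and iterative fusion injects the surviving structural cues into the text representation.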
Pages: 172-189 (18 pages)