T2TD: Text-3D Generation Model Based on Prior Knowledge Guidance

Cited: 0
Authors
Nie, Weizhi [1 ]
Chen, Ruidong [1 ]
Wang, Weijie [2 ]
Lepri, Bruno [3 ]
Sebe, Nicu [2 ]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300384, Peoples R China
[2] Univ Trento, Dept Informat Engn & Comp Sci, I-38122 Trento, Italy
[3] Fdn Bruno Kessler, I-38122 Trento, Italy
Funding
National Natural Science Foundation of China;
Keywords
Three-dimensional displays; Solid modeling; Shape; Data models; Knowledge graphs; Legged locomotion; Natural languages; 3D model generation; causal model inference; cross-modal representation; knowledge graph; natural language;
DOI
10.1109/TPAMI.2024.3463753
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, 3D models have been used in many applications, such as autonomous driving, 3D reconstruction, VR, and AR. However, the supply of 3D model data falls far short of practical demand, so efficiently generating high-quality 3D models from textual descriptions is a promising but challenging way to address this shortage. In this paper, inspired by the creative mechanism of human imagination, which concretely fleshes out a target from an ambiguous description by drawing on experiential knowledge, we propose a novel text-3D generation model (T2TD). T2TD generates the target model from a textual description with the aid of experiential knowledge, and its creation process simulates the imaginative mechanism of human beings. First, we introduce a text-3D knowledge graph that preserves the relationships between 3D models and textual semantic information and supplies related shapes, analogous to human experiential information. Second, we propose an effective causal inference model that selects useful feature information from these related shapes, removing unrelated structural information and retaining only the features strongly related to the textual description. Third, we adopt a novel multi-layer transformer structure to progressively fuse this strongly related structural information with the textual information, compensating for the structural information missing from the text and enhancing the final performance of the 3D generation model. Experimental results demonstrate that our approach significantly improves 3D model generation quality and outperforms state-of-the-art methods on the Text2Shape datasets.
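The abstract outlines a three-stage pipeline: knowledge-graph retrieval of related shapes, causal selection of text-relevant shape features, and progressive transformer fusion before decoding. The paper itself specifies the exact architecture; the following is only a minimal PyTorch-style sketch of that data flow, in which all module names, dimensions, and the retrieval stub (retrieve_related_shapes, CausalSelector, FusionGenerator) are hypothetical illustrations, not the authors' implementation.

```python
# Minimal sketch of the T2TD data flow described in the abstract.
# All names, dimensions, and the retrieval stub are hypothetical; the actual
# architecture is given in the paper (DOI 10.1109/TPAMI.2024.3463753).
import torch
import torch.nn as nn


def retrieve_related_shapes(text_emb: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Stand-in for the text-3D knowledge-graph lookup. In the paper this
    returns prior shapes linked to the description; here it is a stub."""
    return torch.randn(text_emb.size(0), k, text_emb.size(-1))


class CausalSelector(nn.Module):
    """Hypothetical stand-in for the causal inference step: keep only shape
    features strongly related to the text, damp the rest."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, text_emb, shape_feats):
        # Score each retrieved shape feature against the text and gate it.
        text_rep = text_emb.unsqueeze(1).expand_as(shape_feats)
        gate = torch.sigmoid(self.score(torch.cat([text_rep, shape_feats], -1)))
        return gate * shape_feats  # (B, k, dim), weakly related parts damped


class FusionGenerator(nn.Module):
    """Hypothetical multi-layer transformer that progressively fuses the text
    with the selected prior-shape features, then decodes a 3D representation
    (a 32^3 occupancy grid here, purely for illustration)."""
    def __init__(self, dim: int = 256, layers: int = 4):
        super().__init__()
        enc = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.fuse = nn.TransformerEncoder(enc, num_layers=layers)
        self.decode = nn.Linear(dim, 32 ** 3)

    def forward(self, text_emb, selected_feats):
        tokens = torch.cat([text_emb.unsqueeze(1), selected_feats], dim=1)
        fused = self.fuse(tokens)  # progressive text/structure fusion
        return self.decode(fused[:, 0]).view(-1, 32, 32, 32)


if __name__ == "__main__":
    dim = 256
    text_emb = torch.randn(2, dim)                    # encoded descriptions
    priors = retrieve_related_shapes(text_emb)        # knowledge-graph step
    selected = CausalSelector(dim)(text_emb, priors)  # causal selection step
    voxels = FusionGenerator(dim)(text_emb, selected) # fusion + decoding
    print(voxels.shape)  # torch.Size([2, 32, 32, 32])
```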
Pages: 172-189
Page count: 18
Related Papers
50 records in total
  • [1] DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior
    Huang, Tianyu
    Zeng, Yihan
    Zhang, Zhilu
    Xu, Wan
    Xu, Hang
    Xu, Songcen
    Lau, Rynson W.H.
    Zuo, Wangmeng
    arXiv, 2023,
  • [2] DreamView: Injecting View-Specific Text Guidance Into Text-to-3D Generation
    Yan, Junkai
    Gao, Yipeng
    Yang, Qize
    Wei, Xihan
    Xie, Xuansong
    Wu, Ancong
    Zheng, Wei-Shi
    COMPUTER VISION - ECCV 2024, PT XXV, 2025, 15083 : 358 - 374
  • [3] DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance
    Zhang, Longwen
    Qiu, Qiwei
    Lin, Hongyang
    Zhang, Qixuan
    Shi, Cheng
    Yang, Wei
    Shi, Ye
    Yang, Sibei
    Xu, Lan
    Yu, Jingyi
    ACM TRANSACTIONS ON GRAPHICS, 2023, 42 (04):
  • [4] Weakly-Supervised 3D Hand Reconstruction with Knowledge Prior and Uncertainty Guidance
    Zhang, Yufei
    Kephart, Jeffrey O.
    Ji, Qiang
    COMPUTER VISION - ECCV 2024, PT LXXVIII, 2025, 15136 : 106 - 125
  • [5] High-Fidelity 3D Model Generation with Relightable Appearance from Single Freehand Sketches and Text Guidance
    Chen, Tianrun
    Cao, Runlong
    Lu, Ankang
    Xu, Tao
    Zhang, Xiaoling
    Papa, Mao
    Zhang, Ming
    Sun, Lingyun
    Zhang, Ying
    2024 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS, ICMEW 2024, 2024,
  • [6] Multi-layer Convolutional Neural Network Model Based on Prior Knowledge of Knowledge Graph for Text Classification
    Meng, Yining
    Wang, Guoyin
    Liu, Qun
    2019 IEEE 4TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYSIS (ICCCBDA), 2019, : 618 - 624
  • [7] Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior
    Chen, Cheng
    Yan, Xiaofeng
    Yang, Fan
    Feng, Chengzeng
    Fu, Zhoujie
    Foo, Chuan-Sheng
    Lin, Guosheng
    Liu, Fayao
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 10228 - 10237
  • [8] GenDeck: Towards a HoloDeck with Text-to-3D Model Generation
    Weid, Manuel
    Khezrian, Navid
    Mana, Aparna Pindali
    Farzinnejad, Forouzan
    Grubert, Jens
    2024 IEEE CONFERENCE ON VIRTUAL REALITY AND 3D USER INTERFACES ABSTRACTS AND WORKSHOPS, VRW 2024, 2024, : 1188 - 1189
  • [9] Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior
    Wu, Zike
    Zhou, Pan
    Yi, Xuanyu
    Yuan, Xiaoding
    Zhang, Hanwang
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 9892 - 9902
  • [10] RiFeGAN2: Rich Feature Generation for Text-to-Image Synthesis From Constrained Prior Knowledge
    Cheng, Jun
    Wu, Fuxiang
    Tian, Yanling
    Wang, Lei
    Tao, Dapeng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (08) : 5187 - 5200