Portrait3D: Text-Guided High-Quality 3D Portrait Generation Using Pyramid Representation and GANs Prior

Cited by: 1
Authors
Wu, Yiqian [1]
Xu, Hao [1]
Tang, Xiangjun [1]
Chen, Xien [2]
Tang, Siyu [3]
Zhang, Zhebin [4]
Li, Chen [4]
Jin, Xiaogang [1]
Affiliations
[1] Zhejiang Univ, State Key Lab CAD&CG, Hangzhou, Peoples R China
[2] Yale Univ, New Haven, CT USA
[3] Swiss Fed Inst Technol, Zurich, Switzerland
[4] OPPO US Res Ctr, Menlo Pk, CA USA
Source
ACM TRANSACTIONS ON GRAPHICS | 2024 / Vol. 43 / Issue 04
Funding
National Natural Science Foundation of China;
Keywords
3D portrait generation; 3D-aware GANs; diffusion models;
DOI
10.1145/3658162
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Existing neural rendering-based text-to-3D-portrait generation methods typically make use of human geometry prior and diffusion models to obtain guidance. However, relying solely on geometry information introduces issues such as the Janus problem, over-saturation, and over-smoothing. We present Portrait3D, a novel neural rendering-based framework with a novel joint geometry-appearance prior to achieve text-to-3D-portrait generation that overcomes the aforementioned issues. To accomplish this, we train a 3D portrait generator, 3DPortraitGAN, as a robust prior. This generator is capable of producing 360° canonical 3D portraits, serving as a starting point for the subsequent diffusion-based generation process. To mitigate the "grid-like" artifact caused by the high-frequency information in the feature-map-based 3D representation commonly used by most 3D-aware GANs, we integrate a novel pyramid tri-grid 3D representation into 3DPortraitGAN. To generate 3D portraits from text, we first project a randomly generated image aligned with the given prompt into the pre-trained 3DPortraitGAN's latent space. The resulting latent code is then used to synthesize a pyramid tri-grid. Beginning with the obtained pyramid tri-grid, we use score distillation sampling to distill the diffusion model's knowledge into the pyramid tri-grid. Following that, we utilize the diffusion model to refine the rendered images of the 3D portrait and then use these refined images as training data to further optimize the pyramid tri-grid, effectively eliminating issues with unrealistic color and unnatural artifacts. Our experimental results show that Portrait3D can produce realistic, high-quality, and canonical 3D portraits that align with the prompt.
Pages: 12
Related papers
50 items in total
  • [1] HyperStyle3D: Text-Guided 3D Portrait Stylization via Hypernetworks
    Chen, Zhuo
    Xu, Xudong
    Yan, Yichao
    Pan, Ye
    Zhu, Wenhan
    Wu, Wayne
    Dai, Bo
    Yang, Xiaokang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (10) : 9997 - 10010
  • [2] DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaptation by Combining 3D GANs and Diffusion Priors
    Lei, Biwen
    Yu, Kai
    Feng, Mengyang
    Cui, Miaomiao
    Xie, Xuansong
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 10487 - 10497
  • [3] Towards Implicit Text-Guided 3D Shape Generation
    Liu, Zhengzhe
    Wang, Yi
    Qi, Xiaojuan
    Fu, Chi-Wing
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 17875 - 17885
  • [4] WordRobe: Text-Guided Generation of Textured 3D Garments
    Srivastava, Astitva
    Manu, Pranav
    Raj, Amit
    Jampani, Varun
    Sharma, Avinash
    COMPUTER VISION-ECCV 2024, PT I, 2025, 15059 : 458 - 475
  • [5] EXIM: A Hybrid Explicit-Implicit Representation for Text-Guided 3D Shape Generation
    Liu, Zhengzhe
    Hu, Jingyu
    Hui, Ka-Hei
    Qi, Xiaojuan
    Cohen-Or, Daniel
    Fu, Chi-Wing
    ACM TRANSACTIONS ON GRAPHICS, 2023, 42 (06):
  • [6] TECA: Text-Guided Generation and Editing of Compositional 3D Avatars
    Zhang, Hao
    Feng, Yao
    Kulits, Peter
    Wen, Yandong
    Thies, Justus
    Black, Michael J.
    2024 INTERNATIONAL CONFERENCE IN 3D VISION, 3DV 2024, 2024, : 1520 - 1530
  • [7] DREAMCRAFT: Text-Guided Generation of Functional 3D Environments in Minecraft
    Earle, Sam
    Kokkinos, Filippos
    Nie, Yuhe
    Togelius, Julian
    Raileanu, Roberta
    PROCEEDINGS OF THE 19TH INTERNATIONAL CONFERENCE ON THE FOUNDATIONS OF DIGITAL GAMES, FDG 2024, 2024,
  • [8] Text-guided 3D Human Generation from 2D Collections
    Fu, Tsu-Jui
    Xiong, Wenhan
    Nie, Yixin
    Liu, Jingyu
    Oguz, Barlas
    Wang, William Yang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 4508 - 4520
  • [9] Advances in text-guided 3D editing: a survey
    Lu, Lihua
    Li, Ruyang
    Zhang, Xiaohui
    Wei, Hui
    Du, Guoguang
    Wang, Binqiang
    ARTIFICIAL INTELLIGENCE REVIEW, 2024, 57 (12)
  • [10] TEXTure: Text-Guided Texturing of 3D Shapes
    Richardson, Elad
    Metzer, Gal
    Alaluf, Yuval
    Giryes, Raja
    Cohen-Or, Daniel
    PROCEEDINGS OF SIGGRAPH 2023 CONFERENCE PAPERS, SIGGRAPH 2023, 2023,