3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with 2D Diffusion Models

Cited by: 5
Authors
Yang, Haibo [1]
Chen, Yang [2]
Pan, Yingwei [2]
Yao, Ting [3]
Chen, Zhineng [1]
Mei, Tao [3]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Univ Sci & Technol China, Hefei, Peoples R China
[3] HiDream Ai Inc, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Text-driven 3D Stylization; Diffusion Model; Depth;
DOI
10.1145/3581783.3612363
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
3D content creation via text-driven stylization has posed a fundamental challenge to the multimedia and graphics community. Recent advances in cross-modal foundation models (e.g., CLIP) have made this problem feasible. Those approaches commonly leverage CLIP to align the holistic semantics of the stylized mesh with the given text prompt. Nevertheless, it is not trivial to enable more controllable stylization of fine-grained details in 3D meshes based solely on such semantic-level cross-modal supervision. In this work, we propose a new 3DStyle-Diffusion model that triggers fine-grained stylization of 3D meshes with additional controllable appearance and geometric guidance from 2D Diffusion models. Technically, 3DStyle-Diffusion first parameterizes the texture of a 3D mesh into reflectance properties and scene lighting using implicit MLP networks. Meanwhile, an accurate depth map of each sampled view is obtained, conditioned on the 3D mesh. Then, 3DStyle-Diffusion leverages a pretrained controllable 2D Diffusion model to guide the learning of rendered images, encouraging the synthesized image of each view to be semantically aligned with the text prompt and geometrically consistent with the depth map. This design integrates image rendering via implicit MLP networks and the diffusion process of image synthesis in an end-to-end fashion, enabling high-quality, fine-grained stylization of 3D meshes. We also build a new dataset derived from Objaverse and an evaluation protocol for this task. Through both qualitative and quantitative experiments, we validate the capability of our 3DStyle-Diffusion. Source code and data are available at https://github.com/yanghb22-fdu/3DStyle-Diffusion-Official.
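The first two steps described in the abstract (texture parameterized by implicit MLPs, plus a per-view depth map) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not taken from the paper or its repository: the helper names (`init_mlp`, `mlp`, `render_points`, `depth_along_view`), the layer sizes, the random untrained weights, and the simplified diffuse shading rule (color = albedo × light) are all hypothetical. The diffusion-based guidance step is omitted entirely.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random weights for a small fully connected network (hypothetical sizes)."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """ReLU hidden layers, sigmoid output so every channel lies in (0, 1)."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return 1.0 / (1.0 + np.exp(-x))

def render_points(points, normals, reflectance_net, lighting_net):
    """Assumed diffuse model: per-point albedo times direction-dependent light."""
    albedo = mlp(reflectance_net, points)   # (N, 3) reflectance from 3D position
    light = mlp(lighting_net, normals)      # (N, 3) lighting from surface normal
    return albedo * light

def depth_along_view(points, cam_pos, view_dir):
    """Depth of each surface point measured along the camera viewing direction."""
    view_dir = view_dir / np.linalg.norm(view_dir)
    return (points - cam_pos) @ view_dir

rng = np.random.default_rng(0)
reflectance_net = init_mlp(rng, [3, 16, 3])
lighting_net = init_mlp(rng, [3, 16, 3])

# Four illustrative surface points, all facing the camera.
points = np.array([[0.0, 0.0, 0.0],
                   [0.5, 0.0, 0.0],
                   [0.0, 0.5, 0.1],
                   [-0.3, 0.2, -0.1]], dtype=float)
normals = np.tile(np.array([0.0, 0.0, -1.0]), (4, 1))

colors = render_points(points, normals, reflectance_net, lighting_net)
depth = depth_along_view(points,
                         cam_pos=np.array([0.0, 0.0, -2.0]),
                         view_dir=np.array([0.0, 0.0, 1.0]))
```

In a full pipeline along the lines the abstract describes, the rendered colors would be the quantity being optimized and the depth values would serve as the geometric condition for the pretrained 2D diffusion model; this sketch stops short of that stage.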
Pages: 6860-6868
Number of pages: 9