CLIP-Sculptor: Zero-Shot Generation of High-Fidelity and Diverse Shapes from Natural Language

Cited by: 13
Authors
Sanghi, Aditya [1]
Fu, Rao [2]
Liu, Vivian [3]
Willis, Karl D. D. [1]
Shayani, Hooman [1]
Khasahmadi, Amir H.
Sridhar, Srinath [2]
Ritchie, Daniel [2]
Affiliations
[1] Autodesk Res, San Francisco, CA 94105 USA
[2] Brown Univ, Providence, RI USA
[3] Columbia Univ, New York, NY USA
DOI
10.1109/CVPR52729.2023.01759
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent works have demonstrated that natural language can be used to generate and edit 3D shapes. However, these methods generate shapes with limited fidelity and diversity. We introduce CLIP-Sculptor, a method to address these constraints by producing high-fidelity and diverse 3D shapes without the need for (text, shape) pairs during training. CLIP-Sculptor achieves this in a multi-resolution approach that first generates in a low-dimensional latent space and then upscales to a higher resolution for improved shape fidelity. For improved shape diversity, we use a discrete latent space which is modeled using a transformer conditioned on CLIP's image-text embedding space. We also present a novel variant of classifier-free guidance, which improves the accuracy-diversity trade-off. Finally, we perform extensive experiments demonstrating that CLIP-Sculptor outperforms state-of-the-art baselines.
Pages: 18339-18348
Page count: 10