Zero-Shot Text-Guided Object Generation with Dream Fields

Cited by: 178
Authors
Jain, Ajay [1 ,2 ]
Mildenhall, Ben [2 ]
Barron, Jonathan T. [2 ]
Abbeel, Pieter [1 ]
Poole, Ben [2 ]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Google Res, Mountain View, CA 94043 USA
Keywords
DOI
10.1109/CVPR52688.2022.00094
Chinese Library Classification: TP18 [Theory of Artificial Intelligence];
Subject Classification: 081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We combine neural rendering with multi-modal image and text representations to synthesize diverse 3D objects solely from natural language descriptions. Our method, Dream Fields, can generate the geometry and color of a wide range of objects without 3D supervision. Due to the scarcity of diverse, captioned 3D data, prior methods only generate objects from a handful of categories, such as ShapeNet. Instead, we guide generation with image-text models pre-trained on large datasets of captioned images from the web. Our method optimizes a Neural Radiance Field from many camera views so that rendered images score highly with a target caption according to a pre-trained CLIP model. To improve fidelity and visual quality, we introduce simple geometric priors, including sparsity-inducing transmittance regularization, scene bounds, and new MLP architectures. In experiments, Dream Fields produce realistic, multi-view consistent object geometry and color from a variety of natural language captions.
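The abstract's core loop optimizes a radiance field so CLIP scores its renders highly, with transmittance regularization as a sparsity prior. A minimal NumPy sketch of the underlying volume-rendering step is below; the compositing identity is standard NeRF, while the returned per-sample transmittance is the statistic Dream Fields regularizes. Function and variable names are illustrative, not taken from the paper's code.

```python
import numpy as np

def composite_ray(sigmas, colors, deltas):
    """Volume rendering along one camera ray, as in NeRF.

    sigmas: (N,) non-negative densities at the N samples.
    colors: (N, 3) RGB at the samples.
    deltas: (N,) distances between consecutive samples.
    Returns the composited RGB, per-sample weights, and transmittance
    after each sample (whose mean Dream Fields regularizes for sparsity).
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)                     # opacity of each segment
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))   # T_i: light surviving to sample i
    weights = trans[:-1] * alphas                               # contribution of each sample
    rgb = (weights[:, None] * colors).sum(axis=0)               # alpha-composited color
    return rgb, weights, trans[1:]
```

A useful sanity check on this scheme: the sample weights and the final transmittance always sum to one, so pushing mean transmittance up (the sparsity prior) directly pushes total opacity down.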
Pages: 857 - 866 (10 pages)
Related Papers
(50 total)
  • [1] CLIPCam: A Simple Baseline for Zero-Shot Text-Guided Object and Action Localization
    Hsia, Hsuan-An
    Lin, Che-Hsien
    Kung, Bo-Han
    Chen, Jhao-Ting
    Tan, Daniel Stanley
    Chen, Jun-Cheng
    Hua, Kai-Lung
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4453 - 4457
  • [2] CLIP-Count: Towards Text-Guided Zero-Shot Object Counting
    Jiang, Ruixiang
    Liu, Lingbo
    Chen, Changwen
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4535 - 4545
  • [3] Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer
    Yang, Serin
    Hwang, Hyunmin
    Ye, Jong Chul
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 22816 - 22825
  • [4] Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
    Yang, Shuai
    Zhou, Yifan
    Liu, Ziwei
    Loy, Chen Change
    PROCEEDINGS OF THE SIGGRAPH ASIA 2023 CONFERENCE PAPERS, 2023,
  • [5] Zero-Shot Text-to-Image Generation
    Ramesh, Aditya
    Pavlov, Mikhail
    Goh, Gabriel
    Gray, Scott
    Voss, Chelsea
    Radford, Alec
    Chen, Mark
    Sutskever, Ilya
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [6] LANGUAGE-GUIDED ZERO-SHOT OBJECT COUNTING
    Wang, Mingjie
    Yuan, Song
    Li, Zhuohang
    Zhu, Longlong
    Buys, Eric
    Gong, Minglun
    2024 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS, ICMEW 2024, 2024,
  • [7] Zero-Shot Object Counting
    Xu, Jingyi
    Le, Hieu
    Nguyen, Vu
    Ranjan, Viresh
    Samaras, Dimitris
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 15548 - 15557
  • [8] Zero-Shot Object Detection
    Bansal, Ankan
    Sikka, Karan
    Sharma, Gaurav
    Chellappa, Rama
    Divakaran, Ajay
    COMPUTER VISION - ECCV 2018, PT I, 2018, 11205 : 397 - 414
  • [9] Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields
    Gordon, Ori
    Avrahami, Omri
    Lischinski, Dani
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 2933 - 2943
  • [10] Neural Pipeline for Zero-Shot Data-to-Text Generation
    Kasner, Zdenek
    Dusek, Ondrej
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 3914 - 3932