MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion

Cited by: 0
Authors
Tang, Shitao [1]
Zhang, Fuyang [1]
Chen, Jiacheng [1]
Wang, Peng [2]
Furukawa, Yasutaka [1]
Affiliations
[1] Simon Fraser University, Burnaby, BC, Canada
[2] Bytedance, Beijing, People's Republic of China
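Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract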
This paper introduces MVDiffusion, a simple yet effective method for generating consistent multi-view images from text prompts given pixel-to-pixel correspondences (e.g., perspective crops from a panorama or multi-view images given depth maps and poses). Unlike prior methods that rely on iterative image warping and inpainting, MVDiffusion simultaneously generates all images with a global awareness, effectively addressing the prevalent error accumulation issue. At its core, MVDiffusion processes perspective images in parallel with a pre-trained text-to-image diffusion model, while integrating novel correspondence-aware attention layers to facilitate cross-view interactions. For panorama generation, while only trained with 10k panoramas, MVDiffusion is able to generate high-resolution photorealistic images for arbitrary texts or extrapolate one perspective image to a 360-degree view. For multi-view depth-to-image generation, MVDiffusion demonstrates state-of-the-art performance for texturing a scene mesh. The project page is at https://mvdiffusion.github.io/.
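The "pixel-to-pixel correspondences" between perspective crops of a panorama mentioned in the abstract can be illustrated with a small sketch. It is not taken from the paper or its code release. It assumes pinhole cameras that share one optical centre and differ only by a rotation about the vertical axis (yaw), and the function names, the 512x512 image size, the 90-degree field of view, and the sign convention (increasing yaw turns the camera to the right) are all hypothetical choices made for illustration.

```python
import math


def focal_length(width, fov_deg):
    """Focal length in pixels for a pinhole camera with the given horizontal FOV."""
    return (width / 2.0) / math.tan(math.radians(fov_deg) / 2.0)


def correspond(u, v, yaw_a_deg, yaw_b_deg, width=512, height=512, fov_deg=90.0):
    """Map pixel (u, v) of view A to view B (illustrative assumption, not the paper's code).

    Both views are assumed to share the same camera centre and intrinsics and
    to differ only by yaw. Returns (u_b, v_b) in view B's pixel coordinates,
    or None when the ray points behind view B. The caller decides whether the
    result lies inside the image bounds.
    """
    f = focal_length(width, fov_deg)
    cx, cy = width / 2.0, height / 2.0

    # Back-project the pixel to a ray in view A's camera frame.
    x, y, z = (u - cx) / f, (v - cy) / f, 1.0

    # Rotate the ray from A's frame into B's frame (rotation about the y axis).
    a = math.radians(yaw_a_deg - yaw_b_deg)
    xb = math.cos(a) * x + math.sin(a) * z
    zb = -math.sin(a) * x + math.cos(a) * z

    if zb <= 1e-9:
        return None  # the ray is not in front of view B

    # Project the rotated ray into view B's image plane.
    return (f * xb / zb + cx, f * y / zb + cy)


if __name__ == "__main__":
    # Where does a pixel of the view at yaw 0 land in the view at yaw 45?
    print(correspond(300.0, 200.0, 0.0, 45.0))
```

Under these assumed defaults, the centre pixel of the view at yaw 0 maps to the left border of the view at yaw 45, so neighbouring crops overlap by half an image width. A mapping of this kind is what would let one view look up the matching location in another.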
Pages: 32