Controllable Mind Visual Diffusion Model

Cited by: 0
Authors
Zeng, Bohan [1 ]
Li, Shanglin [1 ]
Liu, Xuhui [1 ]
Gao, Sicheng [1 ]
Jiang, Xiaolong [3 ]
Tang, Xu [3 ]
Hu, Yao [3 ]
Liu, Jianzhuang [4 ]
Zhang, Baochang [1 ,2 ,5 ]
Institutions
[1] Beihang Univ, Hangzhou Res Inst, Inst Artificial Intelligence, Hangzhou, Peoples R China
[2] Nanchang Inst Technol, Nanchang, Jiangxi, Peoples R China
[3] Xiaohongshu Inc, Shanghai, Peoples R China
[4] Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
[5] Zhongguancun Lab, Beijing, Peoples R China
Source
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7 | 2024
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation;
Keywords
BRAIN; RECONSTRUCTION; IMAGES;
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Brain signal visualization has emerged as an active research area, serving as a critical interface between the human visual system and computer vision models. Diffusion-based methods have recently shown promise in analyzing functional magnetic resonance imaging (fMRI) data, including the reconstruction of high-quality images consistent with original visual stimuli. Nonetheless, it remains a critical challenge to effectively harness the semantic and silhouette information extracted from brain signals. In this paper, we propose a novel approach, termed the Controllable Mind Visual Diffusion Model (CMVDM). Specifically, CMVDM first extracts semantic and silhouette information from fMRI data using attribute alignment and assistant networks. Then, a control model is introduced in conjunction with a residual block to fully exploit the extracted information for image synthesis, generating high-quality images that closely resemble the original visual stimuli in both semantic content and silhouette characteristics. Through extensive experimentation, we demonstrate that CMVDM outperforms existing state-of-the-art methods both qualitatively and quantitatively. Our code is available at https://github.com/zengbohan0217/CMVDM.
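The abstract describes a two-branch conditioning scheme: one network aligns fMRI features to a semantic embedding, an assistant network recovers a spatial silhouette, and a control model injects the silhouette through a residual connection into the diffusion backbone. The toy sketch below illustrates only that data flow; every name, dimension, and weight here is an illustrative assumption, not the authors' implementation (see the linked repository for the real code).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: fMRI voxel count, semantic embedding dim, feature-map side.
N_VOXELS, D_SEM, H = 4096, 77, 64

def semantic_encoder(fmri):
    """Stand-in for the attribute-alignment network: maps an fMRI vector
    to a semantic conditioning embedding."""
    W = rng.standard_normal((N_VOXELS, D_SEM)) * 0.01
    return fmri @ W

def silhouette_encoder(fmri):
    """Stand-in for the assistant network: maps the same fMRI vector
    to a coarse spatial silhouette map used as the control signal."""
    W = rng.standard_normal((N_VOXELS, H * H)) * 0.01
    return (fmri @ W).reshape(H, H)

def control_residual(backbone_features, silhouette, scale=0.1):
    """Control model with a residual block: the silhouette-conditioned
    signal is *added to* the backbone features, not substituted for them."""
    return backbone_features + scale * silhouette

fmri = rng.standard_normal(N_VOXELS)
sem = semantic_encoder(fmri)            # would condition the diffusion prompt
feats = rng.standard_normal((H, H))     # placeholder diffusion backbone features
out = control_residual(feats, silhouette_encoder(fmri))
print(sem.shape, out.shape)             # (77,) (64, 64)
```

The residual formulation lets the control signal perturb, rather than overwrite, the pretrained backbone's features, which is the stated motivation for combining the control model with a residual block.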
Pages: 6935-6943
Page count: 9