Autoencoder-Based Collaborative Attention GAN for Multi-Modal Image Synthesis

Cited by: 9
Authors
Cao, Bing [1 ,2 ]
Cao, Haifang [1 ,3 ]
Liu, Jiaxu [1 ,3 ]
Zhu, Pengfei [1 ,3 ]
Zhang, Changqing [1 ,3 ]
Hu, Qinghua [1 ,3 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin 300403, Peoples R China
[2] Xidian Univ, State Key Lab Integrated Serv Networks, Xian 710000, Peoples R China
[3] Tianjin Univ, Haihe Lab Informat Technol Applicat Innovat, Tianjin 300403, Peoples R China
Keywords
Image synthesis; Collaboration; Task analysis; Generative adversarial networks; Feature extraction; Data models; Image reconstruction; Multi-modal image synthesis; collaborative attention; single-modal attention; multi-modal attention; TRANSLATION; NETWORK
DOI
10.1109/TMM.2023.3274990
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Multi-modal images are required in a wide range of practical scenarios, from clinical diagnosis to public security. However, certain modalities may be incomplete or unavailable because of restricted imaging conditions, which commonly leads to decision bias in many real-world applications. Despite the significant advancement of existing image synthesis techniques, learning complementary information from multi-modal inputs remains challenging. To address this problem, we propose an autoencoder-based collaborative attention generative adversarial network (ACA-GAN) that uses the available multi-modal images to generate the missing ones. The collaborative attention mechanism deploys a single-modal attention module and a multi-modal attention module to effectively extract complementary information from the available modalities. Considering the significant modal gap, we further develop an autoencoder network to extract the self-representation of the target modality, guiding the generative model to fuse target-specific information from multiple modalities. This considerably improves cross-modal consistency with the desired modality, thereby greatly enhancing image synthesis performance. Quantitative and qualitative comparisons on various multi-modal image synthesis tasks highlight the superiority of our approach over several prior methods, demonstrating more precise and realistic results.
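The abstract names three cooperating components: a single-modal attention module applied per input modality, a multi-modal attention module that fuses the available modalities, and an autoencoder whose latent self-representation of the target modality guides the generator. The record gives no implementation details, so the PyTorch sketch below is a minimal illustrative reading of that design, not the authors' code: the SE-style channel attention, the softmax-weighted per-pixel fusion, the channel widths, and all module names (SingleModalAttention, MultiModalAttention, TargetAutoencoder, Generator) are assumptions.

```python
# Minimal illustrative sketch of the ACA-GAN components named in the abstract.
# Architectural details (SE-style channel gating, softmax fusion weights,
# channel sizes) are assumptions, not the authors' published design.
import torch
import torch.nn as nn


class SingleModalAttention(nn.Module):
    """Re-weights one modality's feature channels (assumed SE-style gate)."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # (B, C, 1, 1)
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)


class MultiModalAttention(nn.Module):
    """Fuses per-modality features with softmax weights at each location."""

    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        stacked = torch.stack(feats, dim=1)                          # (B, M, C, H, W)
        scores = torch.stack([self.score(f) for f in feats], dim=1)  # (B, M, 1, H, W)
        weights = torch.softmax(scores, dim=1)                       # normalize over modalities
        return (weights * stacked).sum(dim=1)                        # (B, C, H, W)


class TargetAutoencoder(nn.Module):
    """Its bottleneck code acts as the target modality's self-representation."""

    def __init__(self, channels: int = 32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(channels, 1, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        code = self.enc(x)
        return self.dec(code), code


class Generator(nn.Module):
    """Encodes each available modality, fuses them, and decodes the target,
    conditioned on the autoencoder's self-representation code."""

    def __init__(self, num_modalities: int = 2, channels: int = 32):
        super().__init__()
        self.encoders = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(1, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                SingleModalAttention(channels),
            )
            for _ in range(num_modalities)
        ])
        self.fuse = MultiModalAttention(channels)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(2 * channels, channels, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(channels, 1, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, inputs: list[torch.Tensor], target_code: torch.Tensor) -> torch.Tensor:
        feats = [enc(x) for enc, x in zip(self.encoders, inputs)]
        fused = self.fuse(feats)
        return self.dec(torch.cat([fused, target_code], dim=1))


if __name__ == "__main__":
    t1, t2 = torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64)  # available modalities
    target = torch.randn(1, 1, 64, 64)       # ground-truth target (training time only)
    ae, gen = TargetAutoencoder(), Generator(num_modalities=2)
    recon, code = ae(target)                 # self-representation guides the generator
    fake = gen([t1, t2], code)
    print(fake.shape)                        # torch.Size([1, 1, 64, 64])
```

At training time the adversarial loss and the cross-modal consistency terms mentioned in the abstract would be added on top of this skeleton; at test time, when the target modality is missing, the self-representation would have to be predicted from the fused inputs rather than taken from ground truth, a detail the abstract does not specify.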
Pages: 995 - 1010
Page count: 16
Related Papers
50 records in total
  • [31] Based on Multi-Feature Information Attention Fusion for Multi-Modal Remote Sensing Image Semantic Segmentation
    Zhang, Chongyu
    2021 IEEE INTERNATIONAL CONFERENCE ON MECHATRONICS AND AUTOMATION (IEEE ICMA 2021), 2021, : 71 - 76
  • [32] Attention-based multi-modal fusion sarcasm detection
    Liu, Jing
    Tian, Shengwei
    Yu, Long
    Long, Jun
    Zhou, Tiejun
    Wang, Bo
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 44 (02) : 2097 - 2108
  • [33] Multi-modal Perception Fusion Method Based on Cross Attention
    Zhang B.-L.
    Pan Z.-H.
    Jiang J.-Z.
    Zhang C.-B.
    Wang Y.-X.
    Yang C.-L.
Zhongguo Gonglu Xuebao/China Journal of Highway and Transport, 2024, 37 (03): 181 - 193
  • [34] Multi-Modal Sentiment Analysis Based on Interactive Attention Mechanism
    Wu, Jun
    Zhu, Tianliang
    Zheng, Xinli
    Wang, Chunzhi
    APPLIED SCIENCES-BASEL, 2022, 12 (16):
  • [35] A brain-computer interface based on multi-modal attention
    Zhang, Dan
    Wang, Yijun
    Maye, Alexander
    Engel, Andreas K.
    Gao, Xiaorong
    Hong, Bo
    Gao, Shangkai
    2007 3RD INTERNATIONAL IEEE/EMBS CONFERENCE ON NEURAL ENGINEERING, VOLS 1 AND 2, 2007, : 414 - +
  • [36] Thermal-visible stereo matching at night based on Multi-Modal Autoencoder
    Zhang, Quan
    Li, Yiran
    Yang, Le
    Zhang, Yi
    Li, Zechao
    Chen, Xiaoyu
    Han, Jing
    INFRARED PHYSICS & TECHNOLOGY, 2024, 136
  • [37] Unified Multi-Modal Image Synthesis for Missing Modality Imputation
    Zhang, Yue
    Peng, Chengtao
    Wang, Qiuli
    Song, Dan
    Li, Kaiyan
    Zhou, S. Kevin
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2025, 44 (01) : 4 - 18
  • [38] PhotoScout: Synthesis-Powered Multi-Modal Image Search
    Barnaby, Celeste
    Chen, Qiaochu
    Wang, Chenglong
    Dillig, Isil
PROCEEDINGS OF THE 2024 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2024, 2024,
  • [39] Multi-Modal Memory Enhancement Attention Network for Image-Text Matching
    Ji, Zhong
    Lin, Zhigang
    Wang, Haoran
    He, Yuqing
    IEEE ACCESS, 2020, 8 : 38438 - 38447
  • [40] Multi-Modal Retinal Image Classification With Modality-Specific Attention Network
    He, Xingxin
    Deng, Ying
    Fang, Leyuan
    Peng, Qinghua
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2021, 40 (06) : 1591 - 1602