Building Multimodal Knowledge Bases With Multimodal Computational Sequences and Generative Adversarial Networks

Cited by: 3
Authors
Chen, Donghua [1 ]
Zhang, Runtong [1 ]
Affiliations
[1] Univ Int Business & Econ, Sch Informat Technol & Management, Dept Artificial Intelligence, Beijing 100029, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cognition; Generative adversarial networks; Data models; Visualization; Feature extraction; Databases; Computational modeling; Decision support systems; deep learning; generative adversarial networks; knowledge representation; multimodal data; INFORMATION FUSION;
DOI
10.1109/TMM.2023.3291503
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Conventional knowledge graphs (KGs) are composed solely of entities, attributes, and relationships, which poses challenges for enhancing multimodal knowledge representation and reasoning. To address this issue, this article proposes a multimodal deep learning-based approach to build a multimodal knowledge base (MMKB) for better multimodal feature (MMF) utilization. First, we construct a multimodal computational sequence (MCS) model for structured multimodal data storage. Then, we propose multimodal node, relationship, and dictionary models to enhance multimodal knowledge representation. Various feature extractors are used to extract MMFs from text, audio, image, and video data. Finally, we leverage generative adversarial networks (GANs) to facilitate MMF representation and update the MMKB dynamically. We examine the performance of the proposed method on three multimodal datasets. The BOW-, LBP-, Volume-, and VGGish-based feature extractors outperform the other methods, reducing the time cost by at least 1.13%, 22.14%, 39.87%, and 5.65%, respectively. The average time cost of creating multimodal indexes improves by approximately 55.07%, with a 68.60% exact matching rate, compared with the baseline method. The deep learning-based autoencoder method reduces the search time cost by 98.90% after using the trained model, outperforming the state-of-the-art methods. In terms of multimodal data representation, the GAN-CNN models achieve an average correct rate of 82.70%. Our open-source work highlights the importance of flexible MMF utilization in multimodal KGs, leading to more powerful and diverse applications that can leverage different types of data.
Pages: 2027-2040 (14 pages)