Building Multimodal Knowledge Bases With Multimodal Computational Sequences and Generative Adversarial Networks

Cited: 3
Authors
Chen, Donghua [1 ]
Zhang, Runtong [1 ]
Affiliations
[1] Univ Int Business & Econ, Sch Informat Technol & Management, Dept Artificial Intelligence, Beijing 100029, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cognition; Generative adversarial networks; Data models; Visualization; Feature extraction; Databases; Computational modeling; Decision support systems; deep learning; generative adversarial networks; knowledge representation; multimodal data; INFORMATION FUSION;
DOI
10.1109/TMM.2023.3291503
Chinese Library Classification
TP [Automation technology; computer technology];
Discipline code
0812 ;
Abstract
Conventional knowledge graphs (KGs) are composed solely of entities, attributes, and relationships, which poses challenges for enhancing multimodal knowledge representation and reasoning. To address this issue, this article proposes a multimodal deep learning-based approach to build a multimodal knowledge base (MMKB) for better multimodal feature (MMF) utilization. First, we construct a multimodal computation sequence (MCS) model for structured multimodal data storage. Then, we propose multimodal node, relationship, and dictionary models to enhance multimodal knowledge representation. Various feature extractors are used to extract MMFs from text, audio, image, and video data. Finally, we leverage generative adversarial networks (GANs) to facilitate MMF representation and update the MMKB dynamically. We examine the performance of the proposed method on three multimodal datasets. The BOW-, LBP-, Volume-, and VGGish-based feature extractors outperform the other methods, reducing the time cost by at least 1.13%, 22.14%, 39.87%, and 5.65%, respectively. Compared with the baseline method, creating multimodal indexes improves the average time cost by approximately 55.07% and the exact matching rate by 68.60%. The deep learning-based autoencoder method reduces the search time cost by 98.90% after using the trained model, outperforming the state-of-the-art methods. In terms of multimodal data representation, the GAN-CNN models achieve an average correct rate of 82.70%. Our open-source work highlights the importance of flexible MMF utilization in multimodal KGs, leading to more powerful and diverse applications that can leverage different types of data.
Pages: 2027 - 2040
Page count: 14
Related Papers
50 records
  • [1] Multimodal Image Fusion Based on Generative Adversarial Networks
    Yang Xiaoli
    Lin Suzhen
    Lu Xiaofei
    Wang Lifang
    Li Dawei
    Wang Bin
    LASER & OPTOELECTRONICS PROGRESS, 2019, 56 (16)
  • [2] Generative Adversarial Networks for Multimodal Representation Learning in Video Hyperlinking
    Vukotic, Vedran
    Raymond, Christian
    Gravier, Guillaume
    PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR'17), 2017, : 421 - 424
  • [3] Multimodal MRI synthesis using unified generative adversarial networks
    Dai, Xianjin
    Lei, Yang
    Fu, Yabo
    Curran, Walter J.
    Liu, Tian
    Mao, Hui
    Yang, Xiaofeng
    MEDICAL PHYSICS, 2020, 47 (12) : 6343 - 6354
  • [4] Generative Adversarial Networks Under CutMix Transformations for Multimodal Change Detection
    Radoi, Anamaria
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [5] Multimodal attention for lip synthesis using conditional generative adversarial networks
    Vidal, Andrea
    Busso, Carlos
    SPEECH COMMUNICATION, 2023, 153
  • [6] A student performance prediction model based on multimodal generative adversarial networks
    Liu, Junjie
    Yang, Yong
    INTERNATIONAL JOURNAL OF SENSOR NETWORKS, 2025, 47 (03) : 186 - 198
  • [7] MEGAN: Mixture of Experts of Generative Adversarial Networks for Multimodal Image Generation
    Park, David Keetae
    Yoo, Seungjoo
    Bahng, Hyojin
    Choo, Jaegul
    Park, Noseong
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 878 - 884
  • [8] Boundary-Focused Generative Adversarial Networks for Imbalanced and Multimodal Time Series
    Lee, Han Kyu
    Lee, Jiyoon
    Kim, Seoung Bum
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (09) : 4102 - 4118
  • [9] Robust Multimodal Depth Estimation using Transformer based Generative Adversarial Networks
    Khan, Md Fahim Faysal
    Devulapally, Anusha
    Advani, Siddharth
    Narayanan, Vijaykrishnan
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3559 - 3568
  • [10] Multimodal Storytelling via Generative Adversarial Imitation Learning
    Chen, Zhiqian
    Zhang, Xuchao
    Boedihardjo, Arnold P.
    Dai, Jing
    Lu, Chang-Tien
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 3967 - 3973