Building Multimodal Knowledge Bases With Multimodal Computational Sequences and Generative Adversarial Networks

Cited by: 3
Authors
Chen, Donghua [1 ]
Zhang, Runtong [1 ]
Affiliations
[1] Univ Int Business & Econ, Sch Informat Technol & Management, Dept Artificial Intelligence, Beijing 100029, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cognition; Generative adversarial networks; Data models; Visualization; Feature extraction; Databases; Computational modeling; Decision support systems; deep learning; generative adversarial networks; knowledge representation; multimodal data; INFORMATION FUSION;
DOI
10.1109/TMM.2023.3291503
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Conventional knowledge graphs (KGs) are composed solely of entities, attributes, and relationships, which poses challenges for enhancing multimodal knowledge representation and reasoning. To address this issue, this article proposes a multimodal deep learning-based approach to building a multimodal knowledge base (MMKB) for better multimodal feature (MMF) utilization. First, we construct a multimodal computation sequence (MCS) model for structured multimodal data storage. Then, we propose multimodal node, relationship, and dictionary models to enhance multimodal knowledge representation. Various feature extractors are used to extract MMFs from text, audio, image, and video data. Finally, we leverage generative adversarial networks (GANs) to facilitate MMF representation and update the MMKB dynamically. We evaluate the proposed method on three multimodal datasets. The BOW-, LBP-, Volume-, and VGGish-based feature extractors outperform the other methods, reducing time costs by at least 1.13%, 22.14%, 39.87%, and 5.65%, respectively. Compared with the baseline method, creating multimodal indexes improves average time costs by approximately 55.07% and exact matching rates by 68.60%. The deep learning-based autoencoder method reduces search time cost by 98.90% once the trained model is applied, outperforming state-of-the-art methods. In terms of multimodal data representation, the GAN-CNN models achieve an average correct rate of 82.70%. Our open-source work highlights the importance of flexible MMF utilization in multimodal KGs, enabling more powerful and diverse applications that leverage different types of data.
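The abstract describes per-modality feature extractors feeding a structured multimodal computation sequence (MCS) store. A minimal sketch of that idea follows; all names (`MCSEntry`, `extract_bow`, `build_mcs`) are illustrative assumptions, not identifiers from the paper, and only a bag-of-words text extractor is stubbed in.

```python
# Hypothetical sketch: per-modality feature extractors populate a
# structured store of multimodal computation sequence (MCS) entries.
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class MCSEntry:
    """One structured record in the multimodal knowledge base."""
    modality: str                          # "text", "image", "audio", or "video"
    features: dict = field(default_factory=dict)

def extract_bow(text: str) -> dict:
    """Bag-of-words (BOW) extractor for the text modality."""
    return dict(Counter(text.lower().split()))

def build_mcs(samples):
    """Map each (modality, payload) pair to an MCS entry via its extractor."""
    extractors = {"text": extract_bow}     # LBP/VGGish extractors would plug in here
    entries = []
    for modality, payload in samples:
        extract = extractors.get(modality, lambda _: {})
        entries.append(MCSEntry(modality, extract(payload)))
    return entries

entries = build_mcs([("text", "knowledge graphs store multimodal knowledge")])
print(entries[0].features["knowledge"])  # -> 2
```

In the paper's full pipeline, GAN-based models would then refine these stored features and update the knowledge base dynamically; that training loop is omitted here.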
Pages: 2027 - 2040
Page count: 14
Related Papers
50 records in total
  • [21] Research on Knowledge Distillation of Generative Adversarial Networks
    Wang, Wei
    Zhang, Baohua
    Cui, Tao
    Chai, Yimeng
    Li, Yue
    2021 DATA COMPRESSION CONFERENCE (DCC 2021), 2021, : 376 - 376
  • [22] KDGAN: Knowledge Distillation with Generative Adversarial Networks
    Wang, Xiaojie
    Zhang, Rui
    Sun, Yu
    Qi, Jianzhong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [23] Application of Knowledge Distillation in Generative Adversarial Networks
    Zhang, Xu
    2023 3RD ASIA-PACIFIC CONFERENCE ON COMMUNICATIONS TECHNOLOGY AND COMPUTER SCIENCE, ACCTCS, 2023, : 65 - 71
  • [24] Contextual information in terminological knowledge bases: A multimodal approach
    Reimerink, Arianne
    Garcia de Quesada, Mercedes
    Montero-Martinez, Silvia
    JOURNAL OF PRAGMATICS, 2010, 42 (07) : 1928 - 1950
  • [25] Multimodal Vigilance Estimation with Adversarial Domain Adaptation Networks
    Li, He
    Zheng, Wei-Long
    Lu, Bao-Liang
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018,
  • [26] Learning Joint Multimodal Representation with Adversarial Attention Networks
    Huang, Feiran
    Zhang, Xiaoming
    Li, Zhoujun
    PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 1874 - 1882
  • [27] Attention-based generative adversarial networks improve prognostic outcome prediction of cancer from multimodal data
    Shi, Mingguang
    Li, Xuefeng
    Li, Mingna
    Si, Yichong
    BRIEFINGS IN BIOINFORMATICS, 2023, 24 (06)
  • [28] TRANSFER-GAN: MULTIMODAL CT IMAGE SUPER-RESOLUTION VIA TRANSFER GENERATIVE ADVERSARIAL NETWORKS
    Xiao, Yao
    Peters, Keith R.
    Fox, W. Christopher
    Rees, John H.
    Rajderkar, Dhanashree A.
    Arreola, Manuel M.
    Barreto, Izabella
    Bolch, Wesley E.
    Fang, Ruogu
    2020 IEEE 17TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI 2020), 2020, : 195 - 198
  • [29] Leveraging Dual Variational Autoencoders and Generative Adversarial Networks for Enhanced Multimodal Interaction in Zero-Shot Learning
    Li, Ning
    Chen, Jie
    Fu, Nanxin
    Xiao, Wenzhuo
    Ye, Tianrun
    Gao, Chunming
    Zhang, Ping
    ELECTRONICS, 2024, 13 (03)
  • [30] Multimodal Ophthalmic Image Registration: A Generalizable Framework Based on Image Synthesis using Cycle Generative Adversarial Networks
    Bollepalli, Sandeep
    Gadari, Adarsh
    Arasikere, Raveena
    Darandale, Aditi
    Suthaharan, Shan
    Dansingani, Kunal
    Sahel, Jose
    Chhablani, Jay
    Vupparaboina, Kiran
    INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2023, 64 (08)