Building Multimodal Knowledge Bases With Multimodal Computational Sequences and Generative Adversarial Networks

Cited by: 3
Authors
Chen, Donghua [1 ]
Zhang, Runtong [1 ]
Affiliations
[1] Univ Int Business & Econ, Sch Informat Technol & Management, Dept Artificial Intelligence, Beijing 100029, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cognition; Generative adversarial networks; Data models; Visualization; Feature extraction; Databases; Computational modeling; Decision support systems; deep learning; generative adversarial networks; knowledge representation; multimodal data; INFORMATION FUSION;
DOI
10.1109/TMM.2023.3291503
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Conventional knowledge graphs (KGs) are composed solely of entities, attributes, and relationships, which poses challenges for enhancing multimodal knowledge representation and reasoning. To address this issue, this article proposes a multimodal deep learning-based approach to building a multimodal knowledge base (MMKB) for better multimodal feature (MMF) utilization. First, we construct a multimodal computation sequence (MCS) model for structured multimodal data storage. Then, we propose multimodal node, relationship, and dictionary models to enhance multimodal knowledge representation. Various feature extractors are used to extract MMFs from text, audio, image, and video data. Finally, we leverage generative adversarial networks (GANs) to facilitate MMF representation and update the MMKB dynamically. We evaluate the proposed method on three multimodal datasets. The BOW-, LBP-, Volume-, and VGGish-based feature extractors outperform the other methods, reducing time costs by at least 1.13%, 22.14%, 39.87%, and 5.65%, respectively. Compared with the baseline method, creating multimodal indexes improves average time costs by approximately 55.07% and exact matching rates by 68.60%. The deep learning-based autoencoder method reduces search time cost by 98.90% once the trained model is applied, outperforming state-of-the-art methods. In terms of multimodal data representation, the GAN-CNN models achieve an average correct rate of 82.70%. Our open-source work highlights the importance of flexible MMF utilization in multimodal KGs, enabling more powerful and diverse applications that leverage different types of data.
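The abstract describes per-modality feature extractors feeding a structured multimodal computation sequence (MCS) store. A minimal sketch of that idea follows; all names (`MCSEntry`, `extract_bow`, `build_mcs`) are illustrative assumptions, not identifiers from the paper, and only a bag-of-words text extractor is stubbed in.

```python
# Hypothetical sketch: per-modality feature extractors populate a
# structured store of multimodal computation sequence (MCS) entries.
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class MCSEntry:
    """One structured record in the multimodal knowledge base."""
    modality: str                          # "text", "image", "audio", or "video"
    features: dict = field(default_factory=dict)

def extract_bow(text: str) -> dict:
    """Bag-of-words (BOW) extractor for the text modality."""
    return dict(Counter(text.lower().split()))

def build_mcs(samples):
    """Map each (modality, payload) pair to an MCS entry via its extractor."""
    extractors = {"text": extract_bow}     # LBP/VGGish extractors would plug in here
    entries = []
    for modality, payload in samples:
        extract = extractors.get(modality, lambda _: {})
        entries.append(MCSEntry(modality, extract(payload)))
    return entries

entries = build_mcs([("text", "knowledge graphs store multimodal knowledge")])
print(entries[0].features["knowledge"])  # -> 2
```

In the paper's full pipeline, GAN-based models would then refine these stored features and update the knowledge base dynamically; that training loop is omitted here.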
Pages: 2027 - 2040
Page count: 14
Related Papers
50 records in total
  • [21] Research on Knowledge Distillation of Generative Adversarial Networks
    Wang, Wei
    Zhang, Baohua
    Cui, Tao
    Chai, Yimeng
    Li, Yue
    2021 DATA COMPRESSION CONFERENCE (DCC 2021), 2021, : 376 - 376
  • [22] KDGAN: Knowledge Distillation with Generative Adversarial Networks
    Wang, Xiaojie
    Zhang, Rui
    Sun, Yu
    Qi, Jianzhong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [23] Application of Knowledge Distillation in Generative Adversarial Networks
    Zhang, Xu
    2023 3RD ASIA-PACIFIC CONFERENCE ON COMMUNICATIONS TECHNOLOGY AND COMPUTER SCIENCE, ACCTCS, 2023, : 65 - 71
  • [24] Contextual information in terminological knowledge bases: A multimodal approach
    Reimerink, Arianne
    Garcia de Quesada, Mercedes
    Montero-Martinez, Silvia
    JOURNAL OF PRAGMATICS, 2010, 42 (07) : 1928 - 1950
  • [25] Multimodal Vigilance Estimation with Adversarial Domain Adaptation Networks
    Li, He
    Zheng, Wei-Long
    Lu, Bao-Liang
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018,
  • [26] Learning Joint Multimodal Representation with Adversarial Attention Networks
    Huang, Feiran
    Zhang, Xiaoming
    Li, Zhoujun
    PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 1874 - 1882
  • [27] Attention-based generative adversarial networks improve prognostic outcome prediction of cancer from multimodal data
    Shi, Mingguang
    Li, Xuefeng
    Li, Mingna
    Si, Yichong
    BRIEFINGS IN BIOINFORMATICS, 2023, 24 (06)
  • [28] TRANSFER-GAN: MULTIMODAL CT IMAGE SUPER-RESOLUTION VIA TRANSFER GENERATIVE ADVERSARIAL NETWORKS
    Xiao, Yao
    Peters, Keith R.
    Fox, W. Christopher
    Rees, John H.
    Rajderkar, Dhanashree A.
    Arreola, Manuel M.
    Barreto, Izabella
    Bolch, Wesley E.
    Fang, Ruogu
    2020 IEEE 17TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI 2020), 2020, : 195 - 198
  • [29] Leveraging Dual Variational Autoencoders and Generative Adversarial Networks for Enhanced Multimodal Interaction in Zero-Shot Learning
    Li, Ning
    Chen, Jie
    Fu, Nanxin
    Xiao, Wenzhuo
    Ye, Tianrun
    Gao, Chunming
    Zhang, Ping
    ELECTRONICS, 2024, 13 (03)
  • [30] Multimodal Ophthalmic Image Registration: A Generalizable Framework Based on Image Synthesis using Cycle Generative Adversarial Networks
    Bollepalli, Sandeep
    Gadari, Adarsh
    Arasikere, Raveena
    Darandale, Aditi
    Suthaharan, Shan
    Dansingani, Kunal
    Sahel, Jose
    Chhablani, Jay
    Vupparaboina, Kiran
    INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2023, 64 (08)