Deep Multimodal Data Fusion

Cited by: 22
Authors
Zhao, Fei [1 ]
Zhang, Chengcui [2 ]
Geng, Baocheng [3 ]
Affiliations
[1] Univ Alabama Birmingham, Univ Hall 4105,1402 10th Ave S, Birmingham, AL 35294 USA
[2] Univ Alabama Birmingham, Univ Hall 4143,1402 10th Ave S, Birmingham, AL 35294 USA
[3] Univ Alabama Birmingham, Univ Hall 4147,1402 10th Ave S, Birmingham, AL 35294 USA
Keywords
Data fusion; neural networks; multimodal deep learning; person re-identification; attention network; neural networks; urban dataset; image
DOI
10.1145/3649447
Chinese Library Classification (CLC)
TP301 [theory and methods]
Subject classification code
081202
Abstract
Multimodal Artificial Intelligence (Multimodal AI), in general, involves various types of data (e.g., images, texts, or data collected from different sensors), feature engineering (e.g., extraction, combination/fusion), and decision-making (e.g., majority vote). As architectures become increasingly sophisticated, multimodal neural networks can integrate feature extraction, feature fusion, and decision-making into a single model, and the boundaries between these processes are increasingly blurred. The conventional multimodal data fusion taxonomy (e.g., early/late fusion), which classifies methods by the stage at which fusion occurs, is no longer suitable for the modern deep learning era. Therefore, based on the mainstream techniques used, we propose a new fine-grained taxonomy that groups state-of-the-art (SOTA) models into five classes: Encoder-Decoder methods, Attention Mechanism methods, Graph Neural Network methods, Generative Neural Network methods, and other Constraint-based methods. Most existing surveys on multimodal data fusion focus on only one specific task with a combination of two specific modalities. Unlike those, this survey covers a broader combination of modalities, including Vision + Language (e.g., videos, texts), Vision + Sensors (e.g., images, LiDAR), and so on, together with their corresponding tasks (e.g., video captioning, object detection). Moreover, a comparison among these methods is provided, as well as challenges and future directions in this area.
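To make the abstract's argument concrete, the following is a minimal, illustrative PyTorch sketch (not taken from the paper; all module names, dimensions, and toy inputs are assumptions) contrasting the conventional early/late-fusion scheme with a simple cross-modal attention block of the kind the proposed taxonomy would group under Attention Mechanism methods.

# Minimal sketch, assuming PyTorch; module names, feature dimensions, and the
# toy inputs below are hypothetical and only illustrate the taxonomy contrast.
import torch
import torch.nn as nn


class EarlyFusion(nn.Module):
    """Early fusion: concatenate per-modality features, then one shared head."""
    def __init__(self, dim_a, dim_b, num_classes):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(dim_a + dim_b, 128), nn.ReLU(), nn.Linear(128, num_classes)
        )

    def forward(self, feat_a, feat_b):
        return self.classifier(torch.cat([feat_a, feat_b], dim=-1))


class LateFusion(nn.Module):
    """Late fusion: one head per modality, decisions averaged at the end."""
    def __init__(self, dim_a, dim_b, num_classes):
        super().__init__()
        self.head_a = nn.Linear(dim_a, num_classes)
        self.head_b = nn.Linear(dim_b, num_classes)

    def forward(self, feat_a, feat_b):
        return 0.5 * (self.head_a(feat_a) + self.head_b(feat_b))


class AttentionFusion(nn.Module):
    """Cross-modal attention: modality-A tokens attend to modality-B tokens."""
    def __init__(self, dim, num_classes, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, tokens_a, tokens_b):
        fused, _ = self.attn(tokens_a, tokens_b, tokens_b)  # (B, T_a, dim)
        return self.classifier(fused.mean(dim=1))            # pool over tokens


if __name__ == "__main__":
    B, dim = 2, 64
    img_feat, txt_feat = torch.randn(B, dim), torch.randn(B, dim)
    img_tokens, txt_tokens = torch.randn(B, 10, dim), torch.randn(B, 12, dim)
    print(EarlyFusion(dim, dim, 5)(img_feat, txt_feat).shape)     # torch.Size([2, 5])
    print(LateFusion(dim, dim, 5)(img_feat, txt_feat).shape)      # torch.Size([2, 5])
    print(AttentionFusion(dim, 5)(img_tokens, txt_tokens).shape)  # torch.Size([2, 5])

The contrast illustrates the abstract's point: in the attention-based block, fusion is no longer confined to a single identifiable stage, which is why a stage-based (early/late) taxonomy becomes hard to apply and a technique-based taxonomy is proposed instead.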
Pages: 36