Deep Multimodal Data Fusion

Cited by: 22
Authors
Zhao, Fei [1]
Zhang, Chengcui [2]
Geng, Baocheng [3]
Affiliations
[1] Univ Alabama Birmingham, Univ Hall 4105,1402 10th Ave S, Birmingham, AL 35294 USA
[2] Univ Alabama Birmingham, Univ Hall 4143,1402 10th Ave S, Birmingham, AL 35294 USA
[3] Univ Alabama Birmingham, Univ Hall 4147,1402 10th Ave S, Birmingham, AL 35294 USA
Keywords
Data fusion; neural networks; multimodal deep learning; person reidentification; attention network; urban dataset; image
DOI
10.1145/3649447
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Multimodal Artificial Intelligence (Multimodal AI) generally involves various types of data (e.g., images, texts, or data collected from different sensors), feature engineering (e.g., extraction, combination/fusion), and decision-making (e.g., majority vote). As architectures become more sophisticated, multimodal neural networks can integrate feature extraction, feature fusion, and decision-making into a single model, and the boundaries between those processes are increasingly blurred. The conventional multimodal data fusion taxonomy (e.g., early/late fusion), which is based on the stage at which fusion occurs, is no longer suitable for the modern deep learning era. Therefore, based on the mainstream techniques used, we propose a new fine-grained taxonomy that groups the state-of-the-art (SOTA) models into five classes: Encoder-Decoder methods, Attention Mechanism methods, Graph Neural Network methods, Generative Neural Network methods, and other Constraint-based methods. Most existing surveys on multimodal data fusion focus on only one specific task with a combination of two specific modalities. Unlike those, this survey covers a broader combination of modalities, including Vision + Language (e.g., videos, texts), Vision + Sensors (e.g., images, LiDAR), and so on, together with their corresponding tasks (e.g., video captioning, object detection). Moreover, a comparison among these methods is provided, along with challenges and future directions in this area.
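For readers unfamiliar with the conventional early/late distinction mentioned in the abstract, the following is a minimal illustrative sketch and is not taken from the paper. It assumes early fusion means concatenating per-modality feature vectors before a single model, and late fusion means combining per-modality decisions by majority vote. The function names `early_fusion` and `late_fusion_majority_vote` and the toy inputs are hypothetical.

```python
from collections import Counter


def early_fusion(image_features, text_features):
    """Early (feature-level) fusion: join per-modality feature vectors
    into one vector that a single downstream model would consume."""
    return list(image_features) + list(text_features)


def late_fusion_majority_vote(predictions):
    """Late (decision-level) fusion: each modality's model has already
    produced a label; return the label that occurs most often."""
    counts = Counter(predictions)
    # most_common(1) returns the (label, count) pair with the highest count.
    label, _ = counts.most_common(1)[0]
    return label


# Toy inputs (hypothetical values, for illustration only).
fused_vector = early_fusion([0.2, 0.7], [1.0, 0.0, 0.5])
final_label = late_fusion_majority_vote(["car", "car", "truck"])
print(fused_vector)
print(final_label)
```

In this sketch the fusion point is explicit and separable from feature extraction and decision-making, which is the assumption the abstract argues no longer holds for end-to-end multimodal networks.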
Pages: 36