Deep Multimodal Data Fusion

Cited by: 22
Authors
Zhao, Fei [1]
Zhang, Chengcui [2]
Geng, Baocheng [3]
Affiliations
[1] Univ Alabama Birmingham, Univ Hall 4105,1402 10th Ave S, Birmingham, AL 35294 USA
[2] Univ Alabama Birmingham, Univ Hall 4143,1402 10th Ave S, Birmingham, AL 35294 USA
[3] Univ Alabama Birmingham, Univ Hall 4147,1402 10th Ave S, Birmingham, AL 35294 USA
Keywords
Data fusion; neural networks; multimodal deep learning; person reidentification; attention network; urban dataset; image
DOI
10.1145/3649447
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Multimodal Artificial Intelligence (Multimodal AI) generally involves various types of data (e.g., images, texts, or data collected from different sensors), feature engineering (e.g., extraction, combination/fusion), and decision-making (e.g., majority vote). As architectures become more sophisticated, multimodal neural networks can integrate feature extraction, feature fusion, and decision-making into a single model, and the boundaries between those processes are increasingly blurred. The conventional multimodal data fusion taxonomy (e.g., early/late fusion), which is based on the stage at which fusion occurs, is no longer suitable for the modern deep learning era. Therefore, based on the mainstream techniques used, we propose a new fine-grained taxonomy that groups state-of-the-art (SOTA) models into five classes: Encoder-Decoder methods, Attention Mechanism methods, Graph Neural Network methods, Generative Neural Network methods, and other Constraint-based methods. Most existing surveys on multimodal data fusion focus on only one specific task with a combination of two specific modalities. In contrast, this survey covers a broader combination of modalities, including Vision + Language (e.g., videos, texts), Vision + Sensors (e.g., images, LiDAR), and so on, and their corresponding tasks (e.g., video captioning, object detection). Moreover, a comparison among these methods is provided, along with challenges and future directions in this area.
Pages: 36
Related papers
50 in total
  • [41] A Deep-Learning-Based Multimodal Data Fusion Framework for Urban Region Function Recognition
    Yu, Mingyang
    Xu, Haiqing
    Zhou, Fangliang
    Xu, Shuai
    Yin, Hongling
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2023, 12 (12)
  • [42] Deep Multimodal Fusion Network for Semantic Segmentation Using Remote Sensing Image and LiDAR Data
    Sun, Yangjie
    Fu, Zhongliang
    Sun, Chuanxia
    Hu, Yinglei
    Zhang, Shengyuan
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [43] Deep multimodal fusion of image and non-image data in disease diagnosis and prognosis: a review
    Cui, Can
    Yang, Haichun
    Wang, Yaohong
    Zhao, Shilin
    Asad, Zuhayr
    Coburn, Lori A.
    Wilson, Keith T.
    Landman, Bennett A.
    Huo, Yuankai
    PROGRESS IN BIOMEDICAL ENGINEERING, 2023, 5 (02)
  • [44] Deep Learning Based Optimal Multimodal Fusion Framework for Intrusion Detection Systems for Healthcare Data
    Phong Thanh Nguyen
    Vy Dang Bich Huynh
    Khoa Dang Vo
    Phuong Thanh Phan
    Elhoseny, Mohamed
    Dac-Nhuong Le
    CMC-COMPUTERS MATERIALS & CONTINUA, 2021, 66 (03) : 2555 - 2571
  • [45] Utilisation of Deep Learning with Multimodal Data Fusion for Determination of Pineapple Quality Using Thermal Imaging
    Mohd Ali, Maimunah
    Hashim, Norhashila
    Abd Aziz, Samsuzana
    Lasekan, Ola
    AGRONOMY-BASEL, 2023, 13 (02)
  • [46] Sensor Data Acquisition and Multimodal Sensor Fusion for Human Activity Recognition Using Deep Learning
    Chung, Seungeun
    Lim, Jiyoun
    Noh, Kyoung Ju
    Kim, Gague
    Jeong, Hyuntae
    SENSORS, 2019, 19 (07)
  • [47] Deep multimodal fusion for semantic image segmentation: A survey
    Zhang, Yifei
    Sidibe, Desire
    Morel, Olivier
    Meriaudeau, Fabrice
    IMAGE AND VISION COMPUTING, 2021, 105
  • [48] A Multimodal Deep Fusion Network for Mobile Traffic Classification
    Ding, Shuai
    Xu, Yifei
    Xu, Hao
    Deng, Haojiang
    Ge, Jingguo
    WIRELESS ALGORITHMS, SYSTEMS, AND APPLICATIONS (WASA 2022), PT II, 2022, 13472 : 384 - 392
  • [49] Multimodal Biometric Fusion Model Based on Deep Learning
    Li, Zhuorong
    Tang, Yunqi
    Computer Engineering and Applications, 2023, 59 (07) : 180 - 189
  • [50] Deep Relationship Analysis in Video with Multimodal Feature Fusion
    Yu, Fan
    Wang, DanDan
    Zhang, Beibei
    Ren, Tongwei
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020 : 4640 - 4644