Multimodal Machine Learning: A Survey and Taxonomy

Cited by: 2031
Authors
Baltrusaitis, Tadas [1]
Ahuja, Chaitanya [2]
Morency, Louis-Philippe [2]
Affiliations
[1] Microsoft Corp, Cambridge CB1 2FB, England
[2] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
Funding
National Science Foundation (USA)
Keywords
Multimodal; machine learning; introductory; survey; EMOTION RECOGNITION; NEURAL-NETWORKS; SPEECH; TEXT; FUSION; VIDEO; LANGUAGE; MODELS; GENERATION; ALIGNMENT;
DOI
10.1109/TPAMI.2018.2798607
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Our experience of the world is multimodal - we see objects, hear sounds, feel texture, smell odors, and taste flavors. Modality refers to the way in which something happens or is experienced and a research problem is characterized as multimodal when it includes multiple such modalities. In order for Artificial Intelligence to make progress in understanding the world around us, it needs to be able to interpret such multimodal signals together. Multimodal machine learning aims to build models that can process and relate information from multiple modalities. It is a vibrant multi-disciplinary field of increasing importance and with extraordinary potential. Instead of focusing on specific multimodal applications, this paper surveys the recent advances in multimodal machine learning itself and presents them in a common taxonomy. We go beyond the typical early and late fusion categorization and identify broader challenges that are faced by multimodal machine learning, namely: representation, translation, alignment, fusion, and co-learning. This new taxonomy will enable researchers to better understand the state of the field and identify directions for future research.
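The abstract contrasts its taxonomy with the typical early/late fusion categorization. As a minimal illustration of that distinction (not code from the paper; the feature dimensions and linear scorers below are hypothetical placeholders), early fusion combines modality features before a single model, while late fusion combines per-modality decisions:

```python
import numpy as np

# Hypothetical feature vectors for two modalities (e.g., audio and video).
rng = np.random.default_rng(0)
audio = rng.normal(size=8)
video = rng.normal(size=8)

# Early fusion: concatenate raw modality features, then apply one model.
early_input = np.concatenate([audio, video])   # shape (16,)
w_early = rng.normal(size=16)                  # placeholder linear scorer
early_score = float(early_input @ w_early)

# Late fusion: score each modality independently, then average the decisions.
w_audio = rng.normal(size=8)
w_video = rng.normal(size=8)
late_score = 0.5 * float(audio @ w_audio) + 0.5 * float(video @ w_video)

print(early_input.shape, early_score, late_score)
```

The survey's point is that this two-way split is too coarse: fusion is only one of five challenges (representation, translation, alignment, fusion, co-learning) it identifies.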
Pages: 423-443 (21 pages)
Related Papers
50 records in total
  • [41] Xu, Keyang; Lam, Mike; Pang, Jingzhi; Gao, Xin; Band, Charlotte; Mathur, Piyush; Papay, Frank; Khanna, Ashish K.; Cywinski, Jacek B.; Maheshwari, Kamal; Xie, Pengtao; Xing, Eric P. Multimodal Machine Learning for Automated ICD Coding. Proceedings of Machine Learning Research, 2019, 106: 197-215.
  • [42] Titung, Rajesh. Interactive Machine Learning for Multimodal Affective Computing. 2022 10th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), 2022.
  • [43] Krones, Felix; Marikkar, Umar; Parsons, Guy; Szmul, Adam; Mahdi, Adam. Review of multimodal machine learning approaches in healthcare. Information Fusion, 2025, 114.
  • [44] Capogrosso, Luigi; Cunico, Federico; Cheng, Dong Seon; Fummi, Franco; Cristani, Marco. A Machine Learning-Oriented Survey on Tiny Machine Learning. IEEE Access, 2024, 12: 23406-23426.
  • [45] Moeller, Sebastian; Engelbrecht, Klaus-Peter; Kuehnel, Christine; Wechsung, Ina; Weiss, Benjamin. A Taxonomy of Quality of Service and Quality of Experience of Multimodal Human-Machine Interaction. QoMEX: 2009 International Workshop on Quality of Multimedia Experience, 2009: 7-12.
  • [46] Gao, Jing; Li, Peng; Chen, Zhikui; Zhang, Jianing. A Survey on Deep Learning for Multimodal Data Fusion. Neural Computation, 2020, 32(05): 829-864.
  • [47] Pan, Mengzhu; Li, Qianmu; Qiu, Tian. Survey of Research on Deep Multimodal Representation Learning. Computer Engineering and Applications, 2024, 59(02): 48-64.
  • [48] Lin, Yi-Ming; Gao, Yuan; Gong, Mao-Guo; Zhang, Si-Jia; Zhang, Yuan-Qiao; Li, Zhi-Yuan. Federated Learning on Multimodal Data: A Comprehensive Survey. Machine Intelligence Research, 2023, 20: 539-553.
  • [49] Yuan, Yuan; Li, Zhaojian; Zhao, Bin. A Survey of Multimodal Learning: Methods, Applications, and Future. ACM Computing Surveys, 2025, 57(07).
  • [50] Kour, Herleen; Gondhi, Naveen. Machine Learning Techniques: A Survey. Innovative Data Communication Technologies and Application, 2020, 46: 266-275.