A unified multimodal classification framework based on deep metric learning

Authors
Peng, Liwen [1, 2]
Jian, Songlei [2]
Li, Minne [1]
Kan, Zhigang [1]
Qiao, Linbo [2]
Li, Dongsheng [2]
Affiliations
[1] Intelligent Game & Decis Lab, Beijing 100080, Peoples R China
[2] Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multimodal classification; Deep metric learning; Multimodal learning; Fake news detection; Sentiment analysis; Fusion
DOI
10.1016/j.neunet.2024.106747
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multimodal classification algorithms play an essential role in multimodal machine learning: they categorize data points by analyzing characteristics drawn from multiple modalities. Extensive research has been conducted on distilling multimodal attributes and devising specialized fusion strategies for targeted classification tasks. Nevertheless, current algorithms mainly concentrate on a single classification task and process data only from the modalities that task involves. To address these limitations, we propose a unified multimodal classification framework (UMCF) that handles diverse multimodal classification tasks and processes data from disparate modalities. UMCF is task-independent, and its unimodal feature extraction module can be substituted to accommodate data from different modalities. Moreover, we construct the multimodal learning scheme on deep metric learning to mine latent characteristics within multimodal data. Specifically, we design metric-based triplet learning to extract the intra-modal relationships within each modality and contrastive pairwise learning to capture the inter-modal relationships across modalities. Extensive experiments on two multimodal classification tasks, fake news detection and sentiment analysis, demonstrate that UMCF can extract multimodal data features and achieve classification performance superior to that of task-specific benchmarks. UMCF outperforms the best fake news detection baselines by 2.3% on average in F1 score.
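The abstract names two metric-learning objectives without stating their form. As a rough illustration only, the sketch below uses the standard textbook triplet loss and margin-based contrastive loss; it is an assumption that UMCF's losses resemble these, and the function names, the Euclidean distance, and the margin value are all hypothetical choices made for the example rather than details taken from the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss (assumed form, not taken from the paper).

    Intra-modal use: all three embeddings come from the same modality;
    the positive shares the anchor's class and the negative does not.
    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def contrastive_loss(emb_a, emb_b, matched, margin=1.0):
    """Standard pairwise contrastive loss (assumed form).

    Inter-modal use: emb_a and emb_b come from different modalities,
    e.g. the text and the image of a post. Matched pairs are pulled
    together; unmatched pairs are pushed at least `margin` apart.
    """
    d = euclidean(emb_a, emb_b)
    if matched:
        return d ** 2
    return max(0.0, margin - d) ** 2


if __name__ == "__main__":
    # Toy 2-D embeddings, chosen only to exercise both branches.
    print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))
    print(contrastive_loss([0.0, 0.0], [0.0, 0.5], matched=False))
```

In a full model these terms would typically be added to the classification loss and computed on learned embeddings; how UMCF weights or combines them is not stated in the abstract.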
Pages: 12