An effective multimodal representation and fusion method for multimodal intent recognition

Cited by: 12
Authors
Huang, Xuejian [1, 2]
Ma, Tinghuai [1]
Jia, Li [1]
Zhang, Yuanjian [3]
Rong, Huan [1]
Alnabhan, Najla [4]
Affiliations
[1] Nanjing Univ Informat Sci Technol, Sch Comp, Nanjing 210044, Jiangsu, Peoples R China
[2] Jiangxi Univ Finance & Econ, Sch VR Modern Ind, Nanchang 330013, Jiangxi, Peoples R China
[3] China UnionPay Co Ltd, Shanghai 201201, Peoples R China
[4] King Saud Univ, Sch Comp & Informat Sci, Riyadh, Saudi Arabia
Funding
National Natural Science Foundation of China
Keywords
Multimodal intent recognition; Multimodal representation; Multimodal fusion; Attention mechanism; Gated neural network; CLASSIFICATION; TRANSFORMER; KNOWLEDGE;
DOI
10.1016/j.neucom.2023.126373
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Intent recognition is a crucial task in natural language understanding. Current research mainly focuses on task-specific unimodal intent recognition. However, in real-world scenarios human intentions are complex and must be judged by integrating information such as speech, tone, expression, and action. This paper therefore proposes an effective multimodal representation and fusion method (EMRFM) for intent recognition in real-world multimodal scenarios. First, text, audio, and vision features are extracted with pre-trained BERT, Wav2vec 2.0, and Faster R-CNN, respectively. Then, to capture both the complementarity and the consistency among the modalities, modality-shared and modality-specific encoders are constructed to learn shared and specific feature representations of each modality. Finally, an adaptive multimodal fusion method based on an attention-based gated neural network is designed to eliminate noisy features. Comprehensive experiments are conducted on MIntRec, a benchmark dataset for multimodal intent recognition. The proposed model achieves higher accuracy, precision, recall, and F1-score than state-of-the-art multimodal learning methods. We also conduct multimodal sentiment recognition experiments on the CMU-MOSI dataset, where our model again outperforms state-of-the-art methods. In addition, the experiments demonstrate that the model's multimodal representation effectively learns the shared and specific features of the modalities, and that the model's fusion module achieves adaptive fusion and effectively reduces potential noise interference. © 2023 Elsevier B.V. All rights reserved.
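The abstract names the building blocks but gives no equations. The pure-Python sketch below shows one plausible way to wire them together: a single encoder whose weights are shared by all modalities, one private encoder per modality, and a fusion step that combines softmax attention weights over modalities with a per-modality sigmoid gate. This is not the authors' EMRFM. The function names (`encode_and_fuse`, `linear`), the weight names (`w_shared`, `w_specific`, `w_att`, `w_gate`), the tanh activations, and the exact gating formula are hypothetical choices made for illustration. Random weights stand in for learned parameters, and toy vectors stand in for the BERT, Wav2vec 2.0, and Faster R-CNN features.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]

MODALITIES = ("text", "audio", "vision")


def linear(w: Matrix, x: Vector) -> Vector:
    """Matrix-vector product y = W x (bias omitted for brevity)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def tanh_vec(x: Vector) -> Vector:
    return [math.tanh(v) for v in x]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def encode_and_fuse(features: Dict[str, Vector], hidden: int, seed: int = 0):
    """Hypothetical shared/specific encoding followed by attention-gated fusion.

    features: one pre-extracted vector per modality, all of equal length.
    Returns (fused vector, attention weight per modality, gate per modality).
    """
    rng = random.Random(seed)
    dim = len(next(iter(features.values())))
    # One encoder whose weights are shared by every modality ...
    w_shared = random_matrix(hidden, dim, rng)
    # ... and one private (modality-specific) encoder per modality.
    w_specific = {m: random_matrix(hidden, dim, rng) for m in features}
    # Scoring vectors for attention and gating over [shared; specific].
    w_att = [rng.uniform(-0.5, 0.5) for _ in range(2 * hidden)]
    w_gate = [rng.uniform(-0.5, 0.5) for _ in range(2 * hidden)]

    reps = {}
    for m, x in features.items():
        shared = tanh_vec(linear(w_shared, x))
        specific = tanh_vec(linear(w_specific[m], x))
        reps[m] = shared + specific  # list concatenation: length 2 * hidden

    names = list(reps)
    # Attention weights across modalities (they sum to 1).
    alphas = softmax([sum(a * h for a, h in zip(w_att, reps[m])) for m in names])
    # Scalar gate per modality in (0, 1); a small gate suppresses a noisy modality.
    gates = [sigmoid(sum(g * h for g, h in zip(w_gate, reps[m]))) for m in names]

    fused = [0.0] * (2 * hidden)
    for alpha, gate, m in zip(alphas, gates, names):
        for i, h in enumerate(reps[m]):
            fused[i] += alpha * gate * h
    return fused, dict(zip(names, alphas)), dict(zip(names, gates))


if __name__ == "__main__":
    rng = random.Random(42)
    feats = {m: [rng.gauss(0, 1) for _ in range(8)] for m in MODALITIES}
    fused, alphas, gates = encode_and_fuse(feats, hidden=4)
    print(len(fused))  # 8, i.e. 2 * hidden
    print({m: round(a, 3) for m, a in alphas.items()})
```

The gate multiplies each modality's attention weight, so a modality can be down-weighted even when its attention score is moderate. This is one simple reading of "attention-based gated" fusion. In a real model, these weights would be trained jointly with the intent classifier.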
Pages: 15