Cross-modal dynamic sentiment annotation for speech sentiment analysis

Cited: 0
Authors
Chen, Jincai [1 ]
Sun, Chao [1 ]
Zhang, Sheng [1 ]
Zeng, Jiangfeng [2 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Wuhan Natl Lab Optoelect, Wuhan 430074, Peoples R China
[2] Cent China Normal Univ, Sch Informat Management, Wuhan 430079, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Speech sentiment analysis; Multi-modal video; Sentiment profiles; Cross-modal annotation;
DOI
10.1016/j.compeleceng.2023.108598
CLC number
TP3 (Computing technology, computer technology)
Discipline code
0812
Abstract
Traditionally, a single hard label determines the sentiment of an entire utterance in speech sentiment analysis, which ignores the inherent dynamics and ambiguity of speech sentiment. Moreover, segment-level ground-truth labels are scarce in most existing sentiment corpora because of label ambiguity and annotation cost. In this work, to capture segment-level sentiment fluctuations within an utterance, we propose sentiment profiles (SPs) to express segment-level soft labels. We also introduce massive multi-modal in-the-wild video data to alleviate the data shortage, and facial expression knowledge guides audio segments in generating soft labels through the Cross-modal Sentiment Annotation Module. We then design a Speech Encoder Module to encode audio segments into SPs and further exploit a sentiment profile purifier (SPP) to iteratively improve the accuracy of the SPs. Extensive experiments show that our model achieves state-of-the-art performance on the CH-SIMS and IEMOCAP datasets with unlabeled data.
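As a rough illustration of the idea of segment-level sentiment profiles and their iterative purification, the minimal Python sketch below turns per-segment scores into soft labels and smooths them across neighbouring segments. The class set, the purify smoothing scheme, and all names are illustrative assumptions for exposition; they are not the modules described in the paper.

    import numpy as np

    # Hypothetical sentiment classes; the abstract does not enumerate them.
    CLASSES = ["negative", "neutral", "positive"]

    def sentiment_profile(segment_logits: np.ndarray) -> np.ndarray:
        """Softmax per audio segment: one soft label (distribution) per segment."""
        exp = np.exp(segment_logits - segment_logits.max(axis=-1, keepdims=True))
        return exp / exp.sum(axis=-1, keepdims=True)

    def purify(profiles: np.ndarray, momentum: float = 0.7, steps: int = 3) -> np.ndarray:
        """Illustrative refinement: repeatedly pull each segment's soft label
        toward its temporal neighbours (circular for simplicity), loosely
        mimicking an iterative purification of noisy segment labels."""
        refined = profiles.copy()
        for _ in range(steps):
            neighbour_mean = (np.roll(refined, 1, axis=0) + np.roll(refined, -1, axis=0)) / 2
            refined = momentum * refined + (1 - momentum) * neighbour_mean
            refined /= refined.sum(axis=-1, keepdims=True)  # keep each row a distribution
        return refined

    # Example: 4 audio segments of one utterance, 3 sentiment classes.
    logits = np.array([[2.0, 0.5, 0.1],
                       [0.3, 1.2, 0.4],
                       [0.1, 0.8, 1.5],
                       [0.2, 0.4, 2.1]])
    sp = sentiment_profile(logits)   # the segment-level soft labels ("SPs")
    print(purify(sp).round(3))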
Pages: 14