Cross-modal dynamic sentiment annotation for speech sentiment analysis

Cited: 0
Authors
Chen, Jincai [1 ]
Sun, Chao [1 ]
Zhang, Sheng [1 ]
Zeng, Jiangfeng [2 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Wuhan Natl Lab Optoelect, Wuhan 430074, Peoples R China
[2] Cent China Normal Univ, Sch Informat Management, Wuhan 430079, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Speech sentiment analysis; Multi-modal video; Sentiment profiles; Cross-modal annotation;
DOI
10.1016/j.compeleceng.2023.108598
CLC number
TP3 (Computing technology, computer technology)
Discipline code
0812
Abstract
Traditionally, a single hard label determines the sentiment of an entire utterance in speech sentiment analysis, which ignores the inherent dynamics and ambiguity of speech sentiment. Moreover, segment-level ground-truth labels are scarce in most existing sentiment corpora because of label ambiguity and annotation cost. In this work, to capture segment-level sentiment fluctuations within an utterance, we propose sentiment profiles (SPs) to express segment-level soft labels. We also introduce massive multi-modal in-the-wild video data to alleviate the data shortage, and facial expression knowledge guides audio segments in generating soft labels through the Cross-modal Sentiment Annotation Module. We then design a Speech Encoder Module to encode audio segments into SPs and further exploit a sentiment profile purifier (SPP) to iteratively improve the accuracy of the SPs. Extensive experiments show that our model achieves state-of-the-art performance on the CH-SIMS and IEMOCAP datasets with unlabeled data.
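As a rough illustration of the idea of segment-level sentiment profiles and their iterative purification, the minimal Python sketch below turns per-segment scores into soft labels and smooths them across neighbouring segments. The class set, the purify smoothing scheme, and all names are illustrative assumptions for exposition; they are not the modules described in the paper.

    import numpy as np

    # Hypothetical sentiment classes; the abstract does not enumerate them.
    CLASSES = ["negative", "neutral", "positive"]

    def sentiment_profile(segment_logits: np.ndarray) -> np.ndarray:
        """Softmax per audio segment: one soft label (distribution) per segment."""
        exp = np.exp(segment_logits - segment_logits.max(axis=-1, keepdims=True))
        return exp / exp.sum(axis=-1, keepdims=True)

    def purify(profiles: np.ndarray, momentum: float = 0.7, steps: int = 3) -> np.ndarray:
        """Illustrative refinement: repeatedly pull each segment's soft label
        toward its temporal neighbours (circular for simplicity), loosely
        mimicking an iterative purification of noisy segment labels."""
        refined = profiles.copy()
        for _ in range(steps):
            neighbour_mean = (np.roll(refined, 1, axis=0) + np.roll(refined, -1, axis=0)) / 2
            refined = momentum * refined + (1 - momentum) * neighbour_mean
            refined /= refined.sum(axis=-1, keepdims=True)  # keep each row a distribution
        return refined

    # Example: 4 audio segments of one utterance, 3 sentiment classes.
    logits = np.array([[2.0, 0.5, 0.1],
                       [0.3, 1.2, 0.4],
                       [0.1, 0.8, 1.5],
                       [0.2, 0.4, 2.1]])
    sp = sentiment_profile(logits)   # the segment-level soft labels ("SPs")
    print(purify(sp).round(3))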
Pages: 14