A Cross-Domain Exploration of Audio and Textual Data for Multi-Modal Emotion Detection

Cited by: 0
Authors
Haque, Mohd Ariful [1 ]
George, Roy [1 ]
Rifat, Rakib Hossain [2 ]
Uddin, Md Shihab [3 ]
Kamal, Marufa [3 ]
Gupta, Kishor Datta [1 ]
Affiliations
[1] Clark Atlanta Univ, Atlanta, GA 30314 USA
[2] BRAC Univ, Dhaka, Bangladesh
[3] Comilla Univ, Cumilla, Bangladesh
Keywords
Emotion Detection; Bi-LSTM; distilroberta base; Ensemble Methods; Multi-Modal Emotion Detection;
DOI
10.1145/3652037.3663943
Chinese Library Classification (CLC)
TP3 [computing technology, computer technology];
Discipline code
0812;
Abstract
Sentiment and emotion analysis is a challenging problem that has received considerable research attention. The difficulty of emotion and sentiment recognition stems from variability in expression, cultural and individual differences, context dependency, and related factors. This work takes an exploratory approach to the problem, performing an extensive classification of emotion using machine learning (ML) applied to textual and auditory data sources. We create a pipeline that facilitates the joint examination of textual and auditory inputs, yielding more reliable emotion classification. The study uses multiple audio and textual datasets to predict four distinct emotions. A four-layer Bi-LSTM model achieved 95% accuracy in emotion analysis of audio clips. Its training set contained 2391 samples, distributed as Angry (20%), Fearful (18%), Happy (38%), and Neutral (24%); the validation set of 713 samples and the test set of 312 samples had comparable distributions. For textual analysis, we merged four datasets and used the "emotion english distilroberta base" model [5], achieving 90% accuracy on the test data. Its training-set emotions were distributed as Angry (25%), Fearful (23%), Happy (23%), and Neutral (29%); the validation set of 305 samples and the test set of 712 samples followed similar distributions. We develop an application that combines both classifiers to obtain a robust classification of arbitrary audio tracks.
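The abstract outlines a two-branch pipeline: a four-layer Bi-LSTM over audio features, a pretrained DistilRoBERTa emotion classifier over text, and an application that fuses the two. The paper itself publishes no code, so the sketch below is only one plausible reading of that design. The MFCC front end, the LSTM widths, the label mapping, the fusion weight, and the Hugging Face checkpoint id j-hartmann/emotion-english-distilroberta-base (an assumption about which model reference [5] denotes) are all illustrative choices, not the authors' configuration.

```python
# Sketch of the two-branch emotion pipeline described in the abstract.
# Hyperparameters (MFCC count, LSTM widths, fusion weight) and the exact
# pretrained checkpoint are illustrative assumptions.
import numpy as np
import librosa
from tensorflow.keras import layers, models
from transformers import pipeline as hf_pipeline

EMOTIONS = ["angry", "fearful", "happy", "neutral"]  # the paper's 4 classes

def audio_features(path, n_mfcc=40, max_frames=300):
    """Load a clip and return a fixed-size (max_frames, n_mfcc) MFCC matrix."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)
    mfcc = mfcc[:max_frames]
    return np.pad(mfcc, ((0, max_frames - mfcc.shape[0]), (0, 0)))

def build_audio_model(max_frames=300, n_mfcc=40):
    """Four stacked Bi-LSTM layers, per the abstract; widths are assumed."""
    model = models.Sequential([
        layers.Input(shape=(max_frames, n_mfcc)),
        layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(64)),
        layers.Dense(len(EMOTIONS), activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Text branch: an off-the-shelf emotion classifier; the checkpoint id is an
# assumption about which model reference [5] denotes.
text_clf = hf_pipeline("text-classification",
                       model="j-hartmann/emotion-english-distilroberta-base",
                       top_k=None)

def text_probs(transcript):
    """Collapse the checkpoint's 7 emotion labels onto the 4 target classes."""
    scores = {d["label"]: d["score"] for d in text_clf([transcript])[0]}
    label_map = {"angry": "anger", "fearful": "fear",
                 "happy": "joy", "neutral": "neutral"}
    p = np.array([scores.get(label_map[e], 0.0) for e in EMOTIONS])
    return p / p.sum()

def fused_prediction(audio_model, clip_path, transcript, w_audio=0.5):
    """Late fusion: weighted average of the two branches' class probabilities."""
    p_audio = audio_model.predict(audio_features(clip_path)[None, ...],
                                  verbose=0)[0]
    p = w_audio * p_audio + (1.0 - w_audio) * text_probs(transcript)
    return EMOTIONS[int(np.argmax(p))], p
```

A weighted average is only one plausible way to combine the two classifiers; the abstract does not specify the fusion rule, and an ensemble vote or a meta-classifier over the concatenated probabilities would fit its description equally well.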
Pages: 375-381
Page count: 7
Related papers
50 in total
  • [31] Multi-Modal Self-Supervised Learning for Cross-Domain One-Shot Bearing Fault Diagnosis
    Chen, Xiaohan
    Xue, Yihao
    Huang, Mengjie
    Yang, Rui
    IFAC PAPERSONLINE, 2024, 58 (04): 746-751
  • [32] Cross-Domain Rumor Detection based on Dual-Modal Domain Alignment
    Liu, Danni
    Liu, Bo
    Chen, Yida
    Wu, Wanmeng
    Cao, Jiuxin
    Hou, Yiwen
    2024 9TH INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING, ICSIP, 2024: 544-548
  • [33] Low-level fusion of audio and video feature for multi-modal emotion recognition
    Wimmer, Matthias
    Schuller, Bjoern
    Arsic, Dejan
    Rigoll, Gerhard
    Radig, Bernd
    VISAPP 2008: PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON COMPUTER VISION THEORY AND APPLICATIONS, VOL 2, 2008: 145+
  • [34] Multi-modal depression detection based on emotional audio and evaluation text
    Ye, Jiayu
    Yu, Yanhong
    Wang, Qingxiang
    Li, Wentao
    Liang, Hu
    Zheng, Yunshao
    Fu, Gang
    JOURNAL OF AFFECTIVE DISORDERS, 2021, 295: 904-913
  • [35] InSpectr: Multi-Modal Exploration, Visualization, and Analysis of Spectral Data
    Amirkhanov, Artem
    Froehler, Bernhard
    Kastner, Johann
    Groeller, Eduard
    Heinzl, Christoph
    COMPUTER GRAPHICS FORUM, 2014, 33 (03): 91-100
  • [36] Multi-modal Multi-label Emotion Detection with Modality and Label Dependence
    Zhang, Dong
    Ju, Xincheng
    Li, Junhui
    Li, Shoushan
    Zhu, Qiaoming
    Zhou, Guodong
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020: 3584-3593
  • [37] Multi-modal authentication system based on audio-visual data
    Debnath, Saswati
    Roy, Pinki
    PROCEEDINGS OF THE 2019 IEEE REGION 10 CONFERENCE (TENCON 2019): TECHNOLOGY, KNOWLEDGE, AND SOCIETY, 2019: 2507-2512
  • [38] Improving Gender Identification in Movie Audio using Cross-Domain Data
    Hebbar, Rajat
    Somandepalli, Krishna
    Narayanan, Shrikanth
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 282-286
  • [39] Fake news detection based on multi-modal domain adaptation
    Wang, Xiaopei
    Meng, Jiana
    Zhao, Di
    Meng, Xuan
    Sun, Hewen
    NEURAL COMPUTING AND APPLICATIONS, 2025, 37 (07): 5781-5793
  • [40] A novel transformer autoencoder for multi-modal emotion recognition with incomplete data
    Cheng, Cheng
    Liu, Wenzhe
    Fan, Zhaoxin
    Feng, Lin
    Jia, Ziyu
    NEURAL NETWORKS, 2024, 172