A Cross-Domain Exploration of Audio and Textual Data for Multi-Modal Emotion Detection

Cited by: 0
Authors
Haque, Mohd Ariful [1 ]
George, Roy [1 ]
Rifat, Rakib Hossain [2 ]
Uddin, Md Shihab [3 ]
Kamal, Marufa [3 ]
Gupta, Kishor Datta [1 ]
Affiliations
[1] Clark Atlanta Univ, Atlanta, GA 30314 USA
[2] BRAC Univ, Dhaka, Bangladesh
[3] Comilla Univ, Cumilla, Bangladesh
Keywords
Emotion Detection; Bi-LSTM; distilroberta base; Ensemble Methods; Multi-Modal Emotion Detection;
DOI
10.1145/3652037.3663943
Chinese Library Classification (CLC)
TP3 [computing technology, computer technology];
Discipline code
0812;
Abstract
Sentiment and emotion analysis is a challenging problem that has received considerable research attention. The difficulty of emotion and sentiment recognition stems from variability in expression, cultural and individual differences, context dependency, and related factors. This work takes an exploratory approach to the problem, performing an extensive classification of emotion using machine learning (ML) applied to textual and auditory data sources. We create a pipeline that facilitates the joint examination of textual and auditory inputs, yielding more reliable emotion classification. The study uses multiple audio and textual datasets to predict four distinct emotions. A four-layer Bi-LSTM model achieved 95% accuracy in emotion analysis of audio clips. Its training set contained 2391 samples, distributed as Angry (20%), Fearful (18%), Happy (38%), and Neutral (24%); the validation set of 713 samples and the test set of 312 samples had comparable distributions. For textual analysis, we merged four datasets and used the "emotion english distilroberta base" model [5], achieving 90% accuracy on the test data. Its training-set emotions were distributed as Angry (25%), Fearful (23%), Happy (23%), and Neutral (29%); the validation set of 305 samples and the test set of 712 samples followed similar distributions. We develop an application that combines both classifiers to obtain a robust classification of arbitrary audio tracks.
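The abstract outlines a two-branch pipeline: a four-layer Bi-LSTM over audio features, a pretrained DistilRoBERTa emotion classifier over text, and an application that fuses the two. The paper itself publishes no code, so the sketch below is only one plausible reading of that design. The MFCC front end, the LSTM widths, the label mapping, the fusion weight, and the Hugging Face checkpoint id j-hartmann/emotion-english-distilroberta-base (an assumption about which model reference [5] denotes) are all illustrative choices, not the authors' configuration.

```python
# Sketch of the two-branch emotion pipeline described in the abstract.
# Hyperparameters (MFCC count, LSTM widths, fusion weight) and the exact
# pretrained checkpoint are illustrative assumptions.
import numpy as np
import librosa
from tensorflow.keras import layers, models
from transformers import pipeline as hf_pipeline

EMOTIONS = ["angry", "fearful", "happy", "neutral"]  # the paper's 4 classes

def audio_features(path, n_mfcc=40, max_frames=300):
    """Load a clip and return a fixed-size (max_frames, n_mfcc) MFCC matrix."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)
    mfcc = mfcc[:max_frames]
    return np.pad(mfcc, ((0, max_frames - mfcc.shape[0]), (0, 0)))

def build_audio_model(max_frames=300, n_mfcc=40):
    """Four stacked Bi-LSTM layers, per the abstract; widths are assumed."""
    model = models.Sequential([
        layers.Input(shape=(max_frames, n_mfcc)),
        layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(64)),
        layers.Dense(len(EMOTIONS), activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Text branch: an off-the-shelf emotion classifier; the checkpoint id is an
# assumption about which model reference [5] denotes.
text_clf = hf_pipeline("text-classification",
                       model="j-hartmann/emotion-english-distilroberta-base",
                       top_k=None)

def text_probs(transcript):
    """Collapse the checkpoint's 7 emotion labels onto the 4 target classes."""
    scores = {d["label"]: d["score"] for d in text_clf([transcript])[0]}
    label_map = {"angry": "anger", "fearful": "fear",
                 "happy": "joy", "neutral": "neutral"}
    p = np.array([scores.get(label_map[e], 0.0) for e in EMOTIONS])
    return p / p.sum()

def fused_prediction(audio_model, clip_path, transcript, w_audio=0.5):
    """Late fusion: weighted average of the two branches' class probabilities."""
    p_audio = audio_model.predict(audio_features(clip_path)[None, ...],
                                  verbose=0)[0]
    p = w_audio * p_audio + (1.0 - w_audio) * text_probs(transcript)
    return EMOTIONS[int(np.argmax(p))], p
```

A weighted average is only one plausible way to combine the two classifiers; the abstract does not specify the fusion rule, and an ensemble vote or a meta-classifier over the concatenated probabilities would fit its description equally well.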
Pages: 375-381
Page count: 7
Related papers
50 in total
  • [31] Multi-Modal Self-Supervised Learning for Cross-Domain One-Shot Bearing Fault Diagnosis
    Chen, Xiaohan
    Xue, Yihao
    Huang, Mengjie
    Yang, Rui
    IFAC PAPERSONLINE, 2024, 58 (04): 746-751
  • [32] Cross-Domain Rumor Detection based on Dual-Modal Domain Alignment
    Liu, Danni
    Liu, Bo
    Chen, Yida
    Wu, Wanmeng
    Cao, Jiuxin
    Hou, Yiwen
    2024 9TH INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING, ICSIP, 2024: 544-548
  • [33] Low-level fusion of audio and video feature for multi-modal emotion recognition
    Wimmer, Matthias
    Schuller, Bjoern
    Arsic, Dejan
    Rigoll, Gerhard
    Radig, Bernd
    VISAPP 2008: PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON COMPUTER VISION THEORY AND APPLICATIONS, VOL 2, 2008: 145+
  • [34] Multi-modal depression detection based on emotional audio and evaluation text
    Ye, Jiayu
    Yu, Yanhong
    Wang, Qingxiang
    Li, Wentao
    Liang, Hu
    Zheng, Yunshao
    Fu, Gang
    JOURNAL OF AFFECTIVE DISORDERS, 2021, 295: 904-913
  • [35] InSpectr: Multi-Modal Exploration, Visualization, and Analysis of Spectral Data
    Amirkhanov, Artem
    Froehler, Bernhard
    Kastner, Johann
    Groeller, Eduard
    Heinzl, Christoph
    COMPUTER GRAPHICS FORUM, 2014, 33 (03): 91-100
  • [36] Multi-modal Multi-label Emotion Detection with Modality and Label Dependence
    Zhang, Dong
    Ju, Xincheng
    Li, Junhui
    Li, Shoushan
    Zhu, Qiaoming
    Zhou, Guodong
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020: 3584-3593
  • [37] Multi-modal authentication system based on audio-visual data
    Debnath, Saswati
    Roy, Pinki
    PROCEEDINGS OF THE 2019 IEEE REGION 10 CONFERENCE (TENCON 2019): TECHNOLOGY, KNOWLEDGE, AND SOCIETY, 2019: 2507-2512
  • [38] Improving Gender Identification in Movie Audio using Cross-Domain Data
    Hebbar, Rajat
    Somandepalli, Krishna
    Narayanan, Shrikanth
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 282-286
  • [39] Fake news detection based on multi-modal domain adaptation
    Wang, Xiaopei
    Meng, Jiana
    Zhao, Di
    Meng, Xuan
    Sun, Hewen
    NEURAL COMPUTING AND APPLICATIONS, 2025, 37 (07): 5781-5793
  • [40] A novel transformer autoencoder for multi-modal emotion recognition with incomplete data
    Cheng, Cheng
    Liu, Wenzhe
    Fan, Zhaoxin
    Feng, Lin
    Jia, Ziyu
    NEURAL NETWORKS, 2024, 172