A Cross-Domain Exploration of Audio and Textual Data for Multi-Modal Emotion Detection

Cited by: 0
Authors
Haque, Mohd Ariful [1 ]
George, Roy [1 ]
Rifat, Rakib Hossain [2 ]
Uddin, Md Shihab [3 ]
Kamal, Marufa [3 ]
Gupta, Kishor Datta [1 ]
Affiliations
[1] Clark Atlanta Univ, Atlanta, GA 30314 USA
[2] BRAC Univ, Dhaka, Bangladesh
[3] Comilla Univ, Cumilla, Bangladesh
Keywords
Emotion Detection; Bi-LSTM; DistilRoBERTa-base; Ensemble Methods; Multi-Modal Emotion Detection
DOI
10.1145/3652037.3663943
CLC Number
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Sentiment and emotion analysis is a challenging problem that has attracted substantial research attention. The difficulty of emotion and sentiment recognition stems from variability in expression, cultural and individual differences, context dependency, and related factors. This work takes an exploratory approach to the problem, performing an extensive classification of emotion using machine learning (ML) applied to textual and auditory data sources. We create a pipeline that facilitates the joint examination of textual and auditory inputs, resulting in more reliable emotion classification. The study uses multiple audio and textual datasets to predict four distinct emotions. A four-layer Bi-LSTM model achieved 95% accuracy in emotion analysis of audio clips. Its training set contained 2,391 samples, distributed as Angry (20%), Fearful (18%), Happy (38%), and Neutral (24%); the validation set (713 samples) and the test set (312 samples) had comparable emotion distributions. For textual analysis, we merged four datasets and used the "emotion-english-distilroberta-base" model [5], achieving 90% accuracy on the test data. Here the training set was distributed as Angry (25%), Fearful (23%), Happy (23%), and Neutral (29%); the validation set (305 samples) and the test set (712 samples) followed similar distributions. Finally, we develop an application that combines both classifiers to obtain a robust classification of arbitrary audio tracks.
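The abstract specifies the audio branch only as a four-layer Bi-LSTM with four output classes. The following is a minimal sketch of such a model in Keras, assuming MFCC-style sequence features; the frame count, feature dimension, layer widths, and dropout rate are illustrative assumptions, not values reported in the paper.

    # Hypothetical sketch of the four-layer Bi-LSTM audio classifier.
    # Feature extraction (e.g., MFCCs via librosa) and all sizes below
    # are assumptions; the paper reports only the depth, the four
    # classes, and 95% test accuracy.
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Input, Bidirectional, LSTM, Dense, Dropout

    NUM_CLASSES = 4                   # Angry, Fearful, Happy, Neutral
    TIMESTEPS, N_FEATURES = 120, 40   # e.g., 120 frames x 40 MFCCs (assumed)

    model = Sequential([
        Input(shape=(TIMESTEPS, N_FEATURES)),
        Bidirectional(LSTM(128, return_sequences=True)),
        Bidirectional(LSTM(128, return_sequences=True)),
        Bidirectional(LSTM(64, return_sequences=True)),
        Bidirectional(LSTM(64)),      # final layer emits one vector per clip
        Dropout(0.3),
        Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",  # integer labels 0..3
                  metrics=["accuracy"])

Training would then call model.fit on batches of fixed-length feature sequences paired with integer emotion labels.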
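The abstract does not describe how the two classifiers are combined in the final application. The sketch below pairs the public Hugging Face checkpoint j-hartmann/emotion-english-distilroberta-base, a plausible match for the model quoted in [5], with a simple weighted-average late fusion; the fusion weight, label mapping, and example scores are assumptions for illustration.

    # Hypothetical text branch plus a simple late-fusion rule; the fusion
    # scheme and weight are assumed, not taken from the paper.
    from transformers import pipeline

    text_clf = pipeline("text-classification",
                        model="j-hartmann/emotion-english-distilroberta-base",
                        top_k=None)  # return scores for all emotion labels

    # Map the checkpoint's lowercase labels onto the paper's four classes
    # (the checkpoint also emits disgust/sadness/surprise, which we drop).
    LABEL_MAP = {"anger": "Angry", "fear": "Fearful",
                 "joy": "Happy", "neutral": "Neutral"}

    def fuse(audio_probs, text_scores, w_audio=0.5):
        """Weighted average of audio softmax and text scores per class."""
        text_probs = {LABEL_MAP[d["label"]]: d["score"]
                      for d in text_scores if d["label"] in LABEL_MAP}
        return max(audio_probs,
                   key=lambda c: w_audio * audio_probs[c]
                                 + (1.0 - w_audio) * text_probs.get(c, 0.0))

    audio_probs = {"Angry": 0.55, "Fearful": 0.25,   # e.g., the Bi-LSTM's
                   "Happy": 0.10, "Neutral": 0.10}   # softmax output (assumed)
    text_scores = text_clf(["I cannot believe you did this!"])[0]
    print(fuse(audio_probs, text_scores))            # fused label, e.g. "Angry"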
Pages: 375-381
Number of pages: 7
Related Papers (50 total)
  • [21] Multi-Modal Learning over User-Contributed Content from Cross-Domain Social Media
    Lee, Wen-Yu
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 4301 - 4302
  • [22] Multi-modal cross-domain self-supervised pre-training for fMRI and EEG fusion
    Wei, Xinxu
    Zhao, Kanhao
    Jiao, Yong
    Carlisle, Nancy B.
    Xie, Hua
    Fonzo, Gregory A.
    Zhang, Yu
    NEURAL NETWORKS, 2025, 184
  • [23] Multi-Modal Audio, Video and Physiological Sensor Learning for Continuous Emotion Prediction
    Brady, Kevin
    Gwon, Youngjune
    Khorrami, Pooya
    Godoy, Elizabeth
    Campbell, William
    Dagli, Charlie
    Huang, Thomas S.
    PROCEEDINGS OF THE 6TH INTERNATIONAL WORKSHOP ON AUDIO/VISUAL EMOTION CHALLENGE (AVEC'16), 2016, : 97 - 104
  • [24] Multi-Modal Emotion Recognition Based on Deep Learning of EEG and Audio Signals
    Li, Zhongjie
    Zhang, Gaoyan
    Dang, Jianwu
    Wang, Longbiao
    Wei, Jianguo
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [25] Cross-cultural and multi-modal investigation of emotion expression
    Institute of Linguistics, CASS, Beijing 100732, China
    QINGHUA DAXUE XUEBAO, SUPPL. 1: 1393-1401
  • [26] Multi-Modal Anomaly Detection by Using Audio and Visual Cues
    Rehman, Ata-Ur
    Ullah, Hafiz Sami
    Farooq, Haroon
    Khan, Muhammad Salman
    Mahmood, Tayyeb
    Khan, Hafiz Owais Ahmed
    IEEE ACCESS, 2021, 9 : 30587 - 30603
  • [27] Multi-Modal Residual Perceptron Network for Audio-Video Emotion Recognition
    Chang, Xin
    Skarbek, Wladyslaw
    SENSORS, 2021, 21 (16)
  • [28] Audio-Visual Emotion Recognition System Using Multi-Modal Features
    Handa, Anand
    Agarwal, Rashi
    Kohli, Narendra
    INTERNATIONAL JOURNAL OF COGNITIVE INFORMATICS AND NATURAL INTELLIGENCE, 2021, 15 (04)
  • [29] Influence of Multi-Modal Interactive Formats on Subjective Audio Quality and Exploration Behavior
    Robotham, Thomas
    Singla, Ashutosh
    Raake, Alexander
    Rummukainen, Olli S.
    Habets, Emanuel A. P.
    PROCEEDINGS OF THE 2023 ACM INTERNATIONAL CONFERENCE ON INTERACTIVE MEDIA EXPERIENCES, IMX 2023, 2023, : 115 - 128
  • [30] CFDA-CSF: A Multi-Modal Domain Adaptation Method for Cross-Subject Emotion Recognition
    Jimenez-Guarneros, Magdiel
    Fuentes-Pineda, Gibran
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2024, 15 (03) : 1502 - 1513