A Cross-Domain Exploration of Audio and Textual Data for Multi-Modal Emotion Detection

Cited by: 0
Authors
Haque, Mohd Ariful [1 ]
George, Roy [1 ]
Rifat, Rakib Hossain [2 ]
Uddin, Md Shihab [3 ]
Kamal, Marufa [3 ]
Gupta, Kishor Datta [1 ]
Affiliations
[1] Clark Atlanta Univ, Atlanta, GA 30314 USA
[2] BRAC Univ, Dhaka, Bangladesh
[3] Comilla Univ, Cumilla, Bangladesh
Keywords
Emotion Detection; Bi-LSTM; distilroberta base; Ensemble Methods; Multi-Modal Emotion Detection
DOI
10.1145/3652037.3663943
CLC number
TP3 [Computing technology; computer technology]
Subject classification code
0812
Abstract
Sentiment and emotion analysis is a challenging problem that has attracted sustained research attention. The difficulty of emotion and sentiment recognition stems from variability in expression, cultural and individual differences, and context dependency. This work takes an exploratory approach to the problem, performing extensive emotion classification using machine learning (ML) applied to textual and auditory data sources. We create a pipeline that examines textual and auditory inputs together, resulting in more reliable emotion classification. The study uses multiple audio and textual datasets to predict four distinct emotions. A four-layer Bi-LSTM model achieved 95% accuracy in emotion analysis of audio clips; its training set contained 2,391 samples, distributed as Angry (20%), Fearful (18%), Happy (38%), and Neutral (24%), with similar distributions in the 713-sample validation set and the 312-sample test set. For textual analysis, we merged four datasets and used the "emotion english distilroberta base" model [5], achieving 90% accuracy on the test data; its training set was distributed as Angry (25%), Fearful (23%), Happy (23%), and Neutral (29%), with similar distributions in the 305-sample validation set and the 712-sample test set. We develop an application that combines both classifiers to obtain a robust classification of arbitrary audio tracks.
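
To make the audio pathway concrete, the following is a minimal sketch of a four-layer Bi-LSTM classifier over MFCC-style features in Keras. The input shape, layer widths, dropout, and optimizer are illustrative assumptions, not the authors' reported configuration.

```python
# Minimal sketch of a four-layer Bi-LSTM audio-emotion classifier.
# Input shape, layer widths, and training settings are assumptions for
# illustration; they are not the paper's published hyperparameters.
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 4                    # Angry, Fearful, Happy, Neutral
TIME_STEPS, N_FEATURES = 128, 40   # e.g., 40 MFCCs over 128 frames (assumed)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(TIME_STEPS, N_FEATURES)),
    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(64)),  # fourth Bi-LSTM layer pools to a vector
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```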
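
The text pathway can be exercised with an off-the-shelf checkpoint. The sketch below assumes the Hugging Face model j-hartmann/emotion-english-distilroberta-base (a plausible referent of [5]) and maps its seven labels down to the paper's four classes; both the model id and the mapping are assumptions.

```python
# Sketch of text emotion scoring with a distilroberta checkpoint.
# The model id and the seven-to-four label mapping are assumptions;
# the paper evaluates on its own merged four-class dataset.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
    top_k=None,                     # return a score for every label
)

scores = classifier(["I can't believe you did that!"])[0]

# Keep only the four emotions studied in the paper.
mapping = {"anger": "Angry", "fear": "Fearful", "joy": "Happy", "neutral": "Neutral"}
four_class = {mapping[s["label"]]: s["score"] for s in scores if s["label"] in mapping}
print(max(four_class, key=four_class.get), four_class)
```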
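
Finally, the combined application needs a fusion rule. The abstract describes combining the two classifications without specifying the scheme, so the sketch below uses a simple weighted late fusion of per-class probabilities as one plausible choice; the weights are assumptions.

```python
# Sketch of weighted late fusion over the two modalities' class probabilities.
# The averaging scheme is an assumption; the paper's ensemble may differ.
import numpy as np

LABELS = ["Angry", "Fearful", "Happy", "Neutral"]

def fuse(audio_probs, text_probs, w_audio=0.5):
    """Return (label, fused probabilities) from a weighted average."""
    p = w_audio * np.asarray(audio_probs) + (1.0 - w_audio) * np.asarray(text_probs)
    return LABELS[int(np.argmax(p))], p

label, probs = fuse([0.10, 0.20, 0.60, 0.10], [0.05, 0.15, 0.70, 0.10])
print(label, probs)   # -> Happy, since both modalities favor it
```

Equal weighting is the simplest starting point; the weight could be tuned on a validation split.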
Pages: 375–381 (7 pages)