A Combined Rule-Based & Machine Learning Audio-Visual Emotion Recognition Approach

Cited by: 52
Authors
Seng, Kah Phooi [1 ]
Ang, Li-Minn [1 ]
Ooi, Chien Shing [2 ]
Affiliations
[1] Charles Sturt Univ, Sch Comp & Math, Bathurst, NSW 2678, Australia
[2] Sunway Univ, Dept Comp Sci & Networked Syst, Subang Jaya 47500, Malaysia
Keywords
Emotion recognition; audio-visual processing; rule-based; machine learning; multimodal system; LINEAR DISCRIMINANT-ANALYSIS; EFFICIENT APPROACH; FACE; FRAMEWORK; FUSION; AUDIO; LDA;
DOI
10.1109/TAFFC.2016.2588488
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes an audio-visual emotion recognition system that uses a mixture of rule-based and machine learning techniques to improve recognition efficacy in the audio and visual paths. The visual path is designed using Bi-directional Principal Component Analysis (BDPCA) and Least-Square Linear Discriminant Analysis (LSLDA) for dimensionality reduction and discrimination. The extracted visual features are passed into a newly designed Optimized Kernel-Laplacian Radial Basis Function (OKL-RBF) neural classifier. The audio path is designed using a combination of prosodic features (pitch, log-energy, zero-crossing rate, and the Teager energy operator) and spectral features (Mel-scale frequency cepstral coefficients). The extracted audio features are passed into an audio feature-level fusion module that uses a set of rules to determine the most likely emotion contained in the audio signal. An audio-visual fusion module then fuses the outputs from both paths. The performance of the proposed audio path, visual path, and the final system is evaluated on standard databases. Experimental results and comparisons demonstrate the good performance of the proposed system.
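The abstract names the audio features concretely, so the following sketch illustrates one plausible way to extract them (pitch, log-energy, zero-crossing rate, Teager energy operator, and MFCCs) per frame with librosa and NumPy. This is an assumption for illustration only: the paper does not publish code or specify a toolkit, the frame settings and pitch range are placeholders, and the rule-based audio fusion that follows feature extraction is not reproduced here.

# Illustrative sketch only (not the authors' implementation): per-frame
# extraction of the audio features named in the abstract, using librosa.
import numpy as np
import librosa


def teager_energy(frame: np.ndarray) -> float:
    """Mean Teager energy psi[n] = x[n]^2 - x[n-1]*x[n+1] over one frame."""
    psi = frame[1:-1] ** 2 - frame[:-2] * frame[2:]
    return float(np.mean(psi))


def extract_audio_features(path: str, sr: int = 16000,
                           frame_length: int = 400, hop_length: int = 160):
    """Return prosodic and spectral features for one utterance.

    Frame counts of the individual feature tracks may differ slightly
    because of librosa's padding/centering conventions.
    """
    y, sr = librosa.load(path, sr=sr)

    # Prosodic features: pitch (YIN), zero-crossing rate, log-energy, Teager energy
    pitch = librosa.yin(y, fmin=60, fmax=400, sr=sr,
                        frame_length=frame_length, hop_length=hop_length)
    zcr = librosa.feature.zero_crossing_rate(
        y, frame_length=frame_length, hop_length=hop_length)[0]
    frames = librosa.util.frame(y, frame_length=frame_length,
                                hop_length=hop_length)
    log_energy = np.log(np.sum(frames ** 2, axis=0) + 1e-10)
    teo = np.array([teager_energy(frames[:, i])
                    for i in range(frames.shape[1])])

    # Spectral features: 13 Mel-frequency cepstral coefficients per frame
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_fft=frame_length, hop_length=hop_length)

    return {"pitch": pitch, "zcr": zcr, "log_energy": log_energy,
            "teager": teo, "mfcc": mfcc}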
Pages: 3-13
Page count: 11
Related Papers
50 records in total
  • [41] Fully automatic face recognition system using a combined audio-visual approach
    Albiol, A
    Torres, L
    Delp, EJ
    IEE PROCEEDINGS-VISION IMAGE AND SIGNAL PROCESSING, 2005, 152 (03): 318 - 326
  • [42] Construction of Japanese Audio-Visual Emotion Database and Its Application in Emotion Recognition
    Lubis, Nurul
    Gomez, Randy
    Sakti, Sakriani
    Nakamura, Keisuke
    Yoshino, Koichiro
    Nakamura, Satoshi
    Nakadai, Kazuhiro
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016, : 2180 - 2184
  • [43] Deep learning based multimodal emotion recognition using model-level fusion of audio-visual modalities
    Middya, Asif Iqbal
    Nag, Baibhav
    Roy, Sarbani
    KNOWLEDGE-BASED SYSTEMS, 2022, 244
  • [45] Audio-Visual Emotion Recognition with Capsule-like Feature Representation and Model-Based Reinforcement Learning
    Ouyang, Xi
    Nagisetty, Srikanth
    Goh, Ester Gue Hua
    Shen, Shengmei
    Ding, Wan
    Ming, Huaiping
    Huang, Dong-Yan
    2018 FIRST ASIAN CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII ASIA), 2018
  • [46] Deep Learning for Audio Visual Emotion Recognition
    Hussain, T.
    Wang, W.
    Bouaynaya, N.
    Fathallah-Shaykh, H.
    Mihaylova, L.
    2022 25TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION 2022), 2022
  • [47] Audio-visual speech recognition based on joint training with audio-visual speech enhancement for robust speech recognition
    Hwang, Jung-Wook
    Park, Jeongkyun
    Park, Rae-Hong
    Park, Hyung-Min
    APPLIED ACOUSTICS, 2023, 211
  • [48] Open-Domain Audio-Visual Speech Recognition: A Deep Learning Approach
    Miao, Yajie
    Metze, Florian
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3414 - 3418
  • [49] To Join or Not to Join: A Study on the Impact of Joint or Unimodal Representation Learning on Audio-Visual Emotion Recognition
    Hajavi, Amirhossein
    Singh, Harmanpreet
    Fashandi, Homa
    2024 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN 2024, 2024
  • [50] Interactive Co-Learning with Cross-Modal Transformer for Audio-Visual Emotion Recognition
    Takashima, Akihiko
    Masumura, Ryo
    Ando, Atsushi
    Yamazaki, Yoshihiro
    Uchida, Mihiro
    Orihashi, Shota
    INTERSPEECH 2022, 2022, : 4740 - 4744