A Multi-Scale Multi-Task Learning Model for Continuous Dimensional Emotion Recognition from Audio

Cited by: 4
|
Authors
Li, Xia [1 ,2 ]
Lu, Guanming [1 ]
Yan, Jingjie [1 ]
Zhang, Zhengyan [1 ,3 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing 210003, Peoples R China
[2] Anhui Univ Technol, Sch Math & Phys, Maanshan 243000, Peoples R China
[3] Jiangsu Univ Sci & Technol, Sch Elect & Informat, Zhenjiang 212003, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
continuous dimensional emotion recognition; multi-task learning; deep belief network;
DOI
10.3390/electronics11030417
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Owing to the many advantages of the dimensional emotion model, continuous dimensional emotion recognition from audio has attracted increasing attention in recent years. Features and dimensional emotion labels on different time scales have different characteristics and carry different information. To fully exploit features and emotion representations from multiple time scales, a novel multi-scale multi-task (MSMT) learning model is proposed in this paper. The MSMT model is built on a deep belief network (DBN) with only one hidden layer. All features share the same hidden-layer and linear-layer parameters, and multiple temporal pooling operations are inserted between the hidden layer and the linear layer to extract information at multiple time scales. The mean squared errors (MSE) of the main task and the secondary task are combined to form the final objective function. Extensive experiments were conducted on the RECOLA and SEMAINE datasets to demonstrate the effectiveness of the model. The results on both datasets show that even adding a secondary scale to the scale with the best single-scale, single-task performance yields significant performance improvements.
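The abstract's architecture can be sketched in a few lines: a shared sigmoid hidden layer and a shared linear output layer, with different temporal pooling windows between them producing predictions at two time scales, whose MSE losses are then combined. The sketch below is a minimal illustration under assumed dimensions and an assumed pooling scheme (non-overlapping mean pooling) and task weight `alpha`; none of these specifics come from the paper itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 12 frames, 8-dim features, 4 hidden units.
T, D, H = 12, 8, 4
W_h = rng.normal(size=(D, H))  # shared hidden-layer weights (DBN with one hidden layer)
b_h = np.zeros(H)
w_o = rng.normal(size=H)       # shared linear output layer (one emotion dimension)
b_o = 0.0

def hidden(x):
    """Shared sigmoid hidden layer applied frame by frame."""
    return 1.0 / (1.0 + np.exp(-(x @ W_h + b_h)))

def temporal_pool(h, win):
    """Mean-pool hidden activations over non-overlapping windows of `win` frames."""
    n = h.shape[0] // win
    return h[: n * win].reshape(n, win, -1).mean(axis=1)

x = rng.normal(size=(T, D))  # audio feature sequence
h = hidden(x)

# The two time scales share the hidden and linear parameters;
# only the pooling window between the two layers differs.
pred_main = temporal_pool(h, 2) @ w_o + b_o  # fine scale (main task): 6 predictions
pred_sec = temporal_pool(h, 4) @ w_o + b_o   # coarse scale (secondary task): 3 predictions

y_main = rng.normal(size=pred_main.shape)    # dummy labels at each scale
y_sec = rng.normal(size=pred_sec.shape)

mse = lambda p, y: float(np.mean((p - y) ** 2))
alpha = 0.5                                  # hypothetical secondary-task weight
loss = mse(pred_main, y_main) + alpha * mse(pred_sec, y_sec)
```

With 12 input frames, pooling windows of 2 and 4 frames yield 6 fine-scale and 3 coarse-scale predictions from a single forward pass through the shared layers, which is the parameter-sharing idea the abstract describes.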
Pages: 16
Related Papers
50 records
  • [1] Speech Emotion Recognition with Multi-task Learning
    Cai, Xingyu
    Yuan, Jiahong
    Zheng, Renjie
    Huang, Liang
    Church, Kenneth
    INTERSPEECH 2021, 2021, : 4508 - 4512
  • [2] Multi-Task Emotion Recognition Based on Dimensional Model and Category Label
    Huo, Yi
    Ge, Yun
    IEEE ACCESS, 2024, 12 : 75169 - 75179
  • [3] Multi-task Learning for Speech Emotion and Emotion Intensity Recognition
    Yue, Pengcheng
    Qu, Leyuan
    Zheng, Shukai
    Li, Taihao
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 1232 - 1237
  • [4] Meta Multi-task Learning for Speech Emotion Recognition
    Cai, Ruichu
    Guo, Kaibin
    Xu, Boyan
    Yang, Xiaoyan
    Zhang, Zhenjie
    INTERSPEECH 2020, 2020, : 3336 - 3340
  • [5] Emotion Recognition With Sequential Multi-task Learning Technique
    Phan Tran Dac Thinh
    Hoang Manh Hung
    Yang, Hyung-Jeong
    Kim, Soo-Hyung
    Lee, Guee-Sang
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 3586 - 3589
  • [6] Speech Emotion Recognition based on Multi-Task Learning
    Zhao, Huijuan
    Han Zhijie
    Wang, Ruchuan
    2019 IEEE 5TH INTL CONFERENCE ON BIG DATA SECURITY ON CLOUD (BIGDATASECURITY) / IEEE INTL CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING (HPSC) / IEEE INTL CONFERENCE ON INTELLIGENT DATA AND SECURITY (IDS), 2019, : 186 - 188
  • [7] EmoComicNet: A multi-task model for comic emotion recognition
    Dutta, Arpita
    Biswas, Samit
    Das, Amit Kumar
    PATTERN RECOGNITION, 2024, 150
  • [8] Multi-Task Learning Model Based on Multi-Scale CNN and LSTM for Sentiment Classification
    Jin, Ning
    Wu, Jiaxian
    Ma, Xiang
    Yan, Ke
    Mo, Yuchang
    IEEE ACCESS, 2020, 8 : 77060 - 77072
  • [9] Visual-audio emotion recognition based on multi-task and ensemble learning with multiple features
    Hao M.
    Cao W.-H.
    Liu Z.-T.
    Wu M.
    Xiao P.
    NEUROCOMPUTING, 2020, 391 : 42 - 51
  • [10] MATTE: Multi-task multi-scale attention
    Strezoski, Gjorgji
    van Noord, Nanne
    Worring, Marcel
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2023, 228