Audio Features for Music Emotion Recognition: A Survey

Cited by: 41
|
Authors
Panda, Renato [1 ,2 ]
Malheiro, Ricardo [1 ,3 ]
Paiva, Rui Pedro [1 ]
Affiliations
[1] Univ Coimbra, Ctr Informat & Syst, Dept Informat Engn, P-3030290 Coimbra, Portugal
[2] Polytech Inst Tomar, Ci2, P-2300313 Tomar, Portugal
[3] Miguel Torga Higher Inst, P-3000132 Coimbra, Portugal
Keywords
Rhythm; Feature extraction; Emotion recognition; Psychology; Indexes; Machine learning; Affective computing; music emotion recognition; audio feature design; music information retrieval; PERCEPTION; EXPRESSION; PITCH; EXTRACTION; SPEECH; TIMBRE; REPRESENTATIONS; CLASSIFICATION; REGRESSION; RESPONSES
DOI
10.1109/TAFFC.2020.3032373
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The design of meaningful audio features is a key requirement for advancing the state of the art in music emotion recognition (MER). This article presents a survey of existing emotionally-relevant computational audio features, supported by the music psychology literature on the relations between eight musical dimensions (melody, harmony, rhythm, dynamics, tone color, expressivity, texture and form) and specific emotions. Based on this review, current gaps and needs are identified and strategies for future research on feature engineering for MER are proposed, namely ideas for computational audio features that capture elements of musical form, texture and expressivity, which warrant further research. Previous MER surveys offered broad reviews, covering topics such as emotion paradigms, approaches to collecting ground-truth data, types of MER problems, and overviews of different MER systems. In contrast, our approach is to offer a deep and specific review of one key MER problem: the design of emotionally-relevant audio features.
Pages: 68-88
Page count: 21
Related Papers
50 entries total
  • [31] Perceptual audio features for emotion detection
    Sezgin, Mehmet Cenk
    Gunsel, Bilge
    Kurt, Gunes Karabulut
    EURASIP JOURNAL ON AUDIO, SPEECH, AND MUSIC PROCESSING, 2012
  • [33] Learning Affective Features With a Hybrid Deep Model for Audio-Visual Emotion Recognition
    Zhang, Shiqing
    Zhang, Shiliang
    Huang, Tiejun
    Gao, Wen
    Tian, Qi
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2018, 28 (10) : 3030 - 3043
  • [34] Multi-Features Audio Extraction for Speech Emotion Recognition Based on Deep Learning
    Gondohanindijo, Jutono
    Muljono
    Noersasongko, Edi
    Pujiono
    Setiadi, De Rosal Moses
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (06) : 198 - 206
  • [35] Visual-audio emotion recognition based on multi-task and ensemble learning with multiple features
    Hao, Man
    Cao, Wei-Hua
    Liu, Zhen-Tao
    Wu, Min
    Xiao, Peng
    NEUROCOMPUTING, 2020, 391 : 42 - 51
  • [36] Audio Visual Emotion Recognition Using Cross Correlation and Wavelet Packet Domain Features
    Noor, Shamman
    Dhrubo, Ehsan Ahmed
    Minhaz, Ahmed Tahseen
    Shahnaz, Celia
    Fattah, Shaikh Anowarul
    2017 IEEE INTERNATIONAL WIE CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (IEEE WIECON-ECE 2017), 2017, : 233 - 236
  • [37] Human emotion recognition from videos using spatio-temporal and audio features
    Munaf Rashid
    S. A. R. Abu-Bakar
    Musa Mokji
    THE VISUAL COMPUTER, 2013, 29 (12): 1269 - 1275
  • [38] Audio technology - A bridge Between Music and Emotion
    Suteu, Ligia-Claudia
    Dragulin, Stela
    INFORMATION AND COMMUNICATION TECHNOLOGY IN MUSICAL FIELD, 2018, 9 (02): 25 - 30
  • [39] Audio-Visual Emotion Recognition System Using Multi-Modal Features
    Handa, Anand
    Agarwal, Rashi
    Kohli, Narendra
    INTERNATIONAL JOURNAL OF COGNITIVE INFORMATICS AND NATURAL INTELLIGENCE, 2021, 15 (04)