RETRIEVING SPEECH SAMPLES WITH SIMILAR EMOTIONAL CONTENT USING A TRIPLET LOSS FUNCTION

Cited by: 0
Authors
Harvill, John [1 ]
AbdelWahab, Mohammed [1 ]
Lotfian, Reza [1 ]
Busso, Carlos [1 ]
Affiliations
[1] Univ Texas Dallas, Multimodal Signal Proc MSP Lab, Dept Elect & Comp Engn, Richardson, TX 75080 USA
Keywords
emotion retrieval; triplet loss; ranking; perception; preference learning; recognition
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The ability to identify speech with similar emotional content is valuable to many applications, including speech retrieval, surveillance, and emotional speech synthesis. Current formulations of speech emotion recognition based on classification or regression are not appropriate for this task, whereas solutions based on preference learning offer appealing alternatives. This paper aims to find speech samples that are emotionally similar to an anchor speech sample provided as a query. This novel formulation opens interesting research questions. How well can a machine complete this task? How does the accuracy of automatic algorithms compare to the performance of a human performing this task? This study addresses these questions by training a deep learning model using a triplet loss function, mapping the acoustic features into an embedding that is discriminative for this task. The network receives an anchor speech sample and two competing speech samples, and the task is to determine which of the two candidate speech samples conveys emotional content closest to that of the anchor. By comparing the results from our model with human perceptual evaluations, this study demonstrates that the proposed approach performs very close to human level in retrieving samples with similar emotional content.
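The anchor-versus-two-candidates setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the loss formula, the distance measure, or the network, so the code below assumes the common margin-based triplet loss over squared Euclidean distances, applied to embedding vectors that are taken as already computed. The function names `triplet_loss` and `closer_candidate` and the `margin` value are hypothetical.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss (assumed form, not taken from the paper).

    Each argument is an array of embeddings with shape (batch, dim).
    The loss is zero once the positive sample is closer to the anchor
    than the negative sample by at least `margin` (squared distances).
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0)


def closer_candidate(anchor, cand_a, cand_b):
    """Return 0 if cand_a is at least as close to the anchor as cand_b, else 1."""
    d_a = np.linalg.norm(anchor - cand_a)
    d_b = np.linalg.norm(anchor - cand_b)
    return 0 if d_a <= d_b else 1


# Toy 2-D embeddings (illustrative values only, not data from the paper).
anchor = np.array([[0.0, 0.0]])
near = np.array([[1.0, 0.0]])
far = np.array([[3.0, 0.0]])

loss_correct = triplet_loss(anchor, near, far)   # near sample used as positive
loss_swapped = triplet_loss(anchor, far, near)   # far sample used as positive
choice = closer_candidate(anchor[0], near[0], far[0])
```

At retrieval time only the distance comparison in `closer_candidate` is needed; the loss is used during training to shape the embedding so that this comparison agrees with the perceived emotional similarity.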
Pages: 7400 - 7404
Number of pages: 5
Related papers
25 in total
  • [1] Similar Finger Gesture Recognition using Triplet-loss Networks
    Benitez-Garcia, Gibran
    Haris, Muhammad
    Tsuda, Yoshiyuki
    Ukita, Norimichi
    PROCEEDINGS OF MVA 2019 16TH INTERNATIONAL CONFERENCE ON MACHINE VISION APPLICATIONS (MVA), 2019,
  • [2] Speech Emotion Recognition from Variable-Length Inputs with Triplet Loss Function
    Huang, Jian
    Li, Ya
    Tao, Jianhua
    Lian, Zheng
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3673 - 3677
  • [3] Facial Emotion Classification Using Deep Embedding with Triplet Loss Function
    Bircanoglu, Cenk
    Arica, Nafiz
    2017 25TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2017,
  • [4] Speaker Identification using Triplet Loss Function Combined with Clustering Techniques
    Shalaby, Mohamed
    Hassan, Mohamed
    Omar, Yasser M. K.
    2021 62ND INTERNATIONAL SCIENTIFIC CONFERENCE ON INFORMATION TECHNOLOGY AND MANAGEMENT SCIENCE OF RIGA TECHNICAL UNIVERSITY (ITMS), 2021,
  • [5] Time-Frequency Emotional Assessment of Speech using the Wigner Function
    Materdey, Thomas
    Materdey, Albert
    Materdey, Alexander
    Truong, Alice
    Materdey, Tomas
    2018 INTERNATIONAL CONFERENCE ON COMPUTING, ELECTRONICS & COMMUNICATIONS ENGINEERING (ICCECE), 2018, : 128 - 133
  • [6] Using Perceptual Quality Features in the Design of the Loss Function for Speech Enhancement
    Eng, Nicholas
    Hioka, Yusuke
    Watson, Catherine I.
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 1904 - 1909
  • [7] Facial Expression Recognition Robust to Occlusion using Spatial Transformer Network with Triplet Loss Function
    Kim, Jieun
    Lee, Eung-Joo
    Lee, Deokwoo
    PATTERN RECOGNITION AND TRACKING XXXIII, 2022, 12101
  • [9] Detection of coronary artery disease using a triplet network and hybrid loss function on heart sound signal
    Liu, Xu
    Lv, Chengcong
    Cao, Linchun
    Guo, Xingming
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, 104
  • [10] Empirical Bayes estimation of the truncation parameter with asymmetric loss function using NA samples
    Shi Y.
    Shi X.
    Gao S.
    Journal of Applied Mathematics and Computing, 2004, 14 (1-2) : 305 - 317