Exploring Speaker Age Estimation on Different Self-Supervised Learning Models

Cited by: 0
Authors
Truong, Duc-Tuan [1]
Anh, Tran The [1]
Siong, Chng Eng [1]
Affiliations
[1] Nanyang Technological University, School of Computer Science and Engineering, Singapore, Singapore
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Self-supervised learning (SSL) has played an important role in a range of speech and audio processing tasks. However, there is limited research on adapting SSL models to predict a speaker's age and gender from speech signals. In this paper, we investigate seven SSL models (PASE+, NPC, wav2vec 2.0, XLSR, HuBERT, WavLM, and data2vec) on the joint age estimation and gender classification task using the TIMIT corpus. We also study how the choice of hidden encoder layer within these models affects the age estimation result. Furthermore, we evaluate how the age prediction performance of the different SSL models varies under simulated noisy conditions. The noisy speech is created by mixing clean utterances from the TIMIT test set with random noise from the Music and Noise categories of the MUSAN corpus at multiple signal-to-noise ratio (SNR) levels. Our findings confirm that a recent SSL model, WavLM, obtains better and more robust speech representations than the wav2vec 2.0 SSL model used in the current state-of-the-art (SOTA) approach, achieving 3.6% and 11.32% reductions in mean absolute error (MAE) on the clean and 5 dB SNR TIMIT test sets, respectively.
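The noise-mixing step and the MAE metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `power`, `mix_at_snr`, and `mean_absolute_error` are hypothetical, the signals are synthetic stand-ins for TIMIT and MUSAN audio, and the noise is assumed to be non-silent (non-zero power).

```python
import math
import random


def power(signal):
    """Average power of a signal: the mean of its squared samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Add noise to a clean signal so the mixture has the requested SNR in dB.

    Assumption: the noise has non-zero power. If the noise is shorter than
    the clean signal it is repeated, then trimmed to the same length.
    """
    reps = math.ceil(len(clean) / len(noise))
    noise = (list(noise) * reps)[: len(clean)]
    # Pick the gain so that power(clean) / power(gain * noise) == 10^(snr_db / 10).
    gain = math.sqrt(power(clean) / (power(noise) * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error between reference and predicted ages (in years)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins: a 440 Hz tone as "speech", Gaussian samples as "noise".
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]

    noisy = mix_at_snr(clean, noise, snr_db=5)
    residual = [y - c for y, c in zip(noisy, clean)]
    # The measured SNR should be close to the requested 5 dB.
    print(10 * math.log10(power(clean) / power(residual)))

    # Toy age predictions, purely illustrative.
    print(mean_absolute_error([20, 30, 40], [22, 27, 40]))
```

A relative MAE reduction such as the figures quoted in the abstract would then be computed as (baseline MAE - new MAE) / baseline MAE.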
Pages: 1950-1955
Page count: 6