Exploring Speaker Age Estimation on Different Self-Supervised Learning Models

Authors
Truong, Duc-Tuan [1]
Anh, Tran The [1]
Siong, Chng Eng [1]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Self-supervised learning (SSL) has played an important role in many speech and audio processing tasks. However, there is limited research on adapting SSL models to predict a speaker's age and gender from speech signals. In this paper, we investigate seven SSL models, namely PASE+, NPC, wav2vec 2.0, XLSR, HuBERT, WavLM, and data2vec, on the joint age estimation and gender classification task using the TIMIT corpus. We also study how the choice of hidden encoder layer within these models affects the age estimation result. Furthermore, we evaluate how the performance of the different SSL models varies when predicting the speaker's age under simulated noisy conditions. The simulated noisy speech is created by mixing clean utterances from the TIMIT test set with random noises from the Music and Noise categories of the MUSAN corpus at multiple signal-to-noise ratio (SNR) levels. Our findings confirm that a recent SSL model, namely WavLM, can obtain better and more robust speech representations than the wav2vec 2.0 SSL model used in the current state-of-the-art (SOTA) approach, achieving 3.6% and 11.32% reductions in mean absolute error (MAE) on the clean and 5 dB SNR TIMIT test sets, respectively.
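The noise mixing and the MAE metric mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not describe the exact mixing procedure, so looping the noise to cover the utterance and the names `mean_power`, `mix_at_snr`, and `mean_absolute_error` are assumptions made only for illustration.

```python
import math


def mean_power(signal):
    """Average power of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean` so that the result has the requested SNR in dB.

    Assumption (not from the paper): the noise clip is looped or cropped
    to the length of the clean utterance. The noise must not be silent,
    otherwise its power is zero and the gain is undefined.
    """
    looped = [noise[i % len(noise)] for i in range(len(clean))]
    # SNR_dB = 10 * log10(P_clean / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(mean_power(clean) / (mean_power(looped) * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, looped)]


def mean_absolute_error(true_ages, predicted_ages):
    """Mean absolute error between reference and predicted ages, in years."""
    pairs = list(zip(true_ages, predicted_ages))
    return sum(abs(t - p) for t, p in pairs) / len(pairs)
```

A relative MAE reduction such as the ones reported above would then be `(mae_baseline - mae_new) / mae_baseline`, where both values come from the same test set.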
Pages: 1950-1955
Page count: 6