Exploring Speaker Age Estimation on Different Self-Supervised Learning Models

Cited by: 0
Authors
Truong, Duc-Tuan [1]
Anh, Tran The [1]
Siong, Chng Eng [1]
Affiliation
[1] Nanyang Technological University, School of Computer Science and Engineering, Singapore, Singapore
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Self-supervised learning (SSL) has played an important role in many speech and audio processing tasks. However, there is limited research on adapting SSL models to predict a speaker's age and gender from speech signals. In this paper, we investigate seven SSL models, namely PASE+, NPC, wav2vec 2.0, XLSR, HuBERT, WavLM, and data2vec, on the joint age estimation and gender classification task using the TIMIT corpus. We also study how the choice of hidden encoder layer within these models affects the age estimation result. Furthermore, we evaluate how the age prediction performance of the different SSL models varies under simulated noisy conditions. The simulated noisy speech is created by mixing clean utterances from the TIMIT test set with random noise from the Music and Noise categories of the MUSAN corpus at multiple signal-to-noise ratio (SNR) levels. Our findings confirm that a recent SSL model, WavLM, obtains better and more robust speech representations than the wav2vec 2.0 model used in the current state-of-the-art (SOTA) approach, achieving mean absolute error (MAE) reductions of 3.6% and 11.32% on the clean and 5 dB SNR TIMIT test sets, respectively.
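The abstract does not spell out how the noisy test set is built or how the MAE reductions are computed, so the following is only a minimal sketch of one common way to do both, assuming power-based SNR scaling and a relative (percentage) reduction. The function names `mix_at_snr`, `mean_absolute_error`, and `relative_reduction` are hypothetical, and the signals in the usage note are synthetic stand-ins rather than TIMIT or MUSAN audio.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean` so that the mixture has the requested SNR in dB.

    Assumption: SNR is defined as the ratio of average signal power to
    average noise power over the whole utterance.
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat the noise if it is shorter than the utterance, then crop it.
    if len(noise) < len(clean):
        repeats = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, repeats)
    noise = noise[: len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        raise ValueError("noise segment is silent; cannot scale to a target SNR")

    # Solve clean_power / (scale^2 * noise_power) = 10^(snr_db / 10) for scale.
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise


def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages."""
    true_ages = np.asarray(true_ages, dtype=np.float64)
    predicted_ages = np.asarray(predicted_ages, dtype=np.float64)
    return float(np.mean(np.abs(true_ages - predicted_ages)))


def relative_reduction(baseline_mae, new_mae):
    """Percentage by which `new_mae` improves on `baseline_mae`."""
    return 100.0 * (baseline_mae - new_mae) / baseline_mae
```

As a usage illustration, `mix_at_snr(utterance, noise_clip, 5)` would return a 5 dB SNR mixture of two one-dimensional sample arrays, and `relative_reduction(baseline_mae, new_mae)` expresses an improvement in the same percentage form as the figures quoted in the abstract.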
Pages: 1950-1955
Number of pages: 6