Exploring Speaker Age Estimation on Different Self-Supervised Learning Models

Cited by: 0

Authors:

Truong, Duc-Tuan [1]

Anh, Tran The [1]

Siong, Chng Eng [1]

Affiliation:

[1] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore

Source:

PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2022

DOI: not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Self-supervised learning (SSL) has played an important role in a wide range of speech and audio processing tasks. However, little research has adapted these SSL models to predict a speaker's age and gender from speech signals. In this paper, we investigate seven SSL models, namely PASE+, NPC, wav2vec 2.0, XLSR, HuBERT, WavLM, and data2vec, on the joint age estimation and gender classification task using the TIMIT corpus. We also study how the choice of hidden encoder layer within these models affects the age estimation results. Furthermore, we evaluate how the age prediction performance of the different SSL models varies under simulated noisy conditions. The simulated noisy speech is created by mixing clean utterances from the TIMIT test set with random noise from the Music and Noise categories of the MUSAN corpus at multiple signal-to-noise ratio (SNR) levels. Our findings confirm that a recent SSL model, WavLM, obtains better and more robust speech representations than the wav2vec 2.0 model used in the current state-of-the-art (SOTA) approach, achieving 3.6% and 11.32% reductions in mean absolute error (MAE) on the clean and 5 dB SNR TIMIT test sets, respectively.
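The noisy test condition described above mixes a clean utterance with a noise recording at a target SNR. The sketch below shows one common way to do this, assuming SNR is defined from mean signal power, so the noise is scaled by a gain g = sqrt(P_clean / (P_noise * 10^(SNR/10))), and assuming the noise clip is tiled and randomly cropped to the utterance length. The record does not state the paper's exact procedure (for example, how MUSAN clips are selected or cropped), so the helpers `mix_at_snr`, `fit_noise_length`, `snr_db`, `mean_absolute_error`, and `relative_reduction` are hypothetical names for illustration only. The relative-reduction helper further assumes that the reported 3.6% and 11.32% figures are relative, not absolute, MAE reductions.

```python
import math
import random
from typing import List, Optional, Sequence


def signal_power(x: Sequence[float]) -> float:
    """Mean power (mean of squared samples) of a waveform."""
    if len(x) == 0:
        raise ValueError("empty waveform")
    return sum(s * s for s in x) / len(x)


def fit_noise_length(noise: Sequence[float], n: int, rng: random.Random) -> List[float]:
    """Hypothetical helper: tile the noise if it is too short, then crop a random n-sample window."""
    if len(noise) == 0:
        raise ValueError("empty noise clip")
    reps = math.ceil(n / len(noise))
    tiled = list(noise) * reps
    start = rng.randint(0, len(tiled) - n)
    return tiled[start:start + n]


def mix_at_snr(clean: Sequence[float], noise: Sequence[float], snr_db: float,
               rng: Optional[random.Random] = None) -> List[float]:
    """Add noise to clean speech so that the power ratio equals snr_db (an assumed definition)."""
    rng = rng or random.Random()
    noise = fit_noise_length(noise, len(clean), rng)
    p_clean = signal_power(clean)
    p_noise = signal_power(noise)
    if p_noise == 0.0:
        raise ValueError("noise segment is silent; SNR is undefined")
    # Pick gain g so that 10 * log10(p_clean / (g**2 * p_noise)) == snr_db.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def snr_db(clean: Sequence[float], mixed: Sequence[float]) -> float:
    """Measure the SNR of a mixture, treating (mixed - clean) as the added noise."""
    residual = [m - c for c, m in zip(clean, mixed)]
    return 10 * math.log10(signal_power(clean) / signal_power(residual))


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE between true and predicted ages (e.g. in years)."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline_mae: float, new_mae: float) -> float:
    """Percentage reduction of new_mae relative to baseline_mae."""
    return 100.0 * (baseline_mae - new_mae) / baseline_mae
```

For example, `mix_at_snr(clean, noise, 5.0)` would produce a 5 dB condition. Waveforms are plain Python lists of floats to keep the sketch dependency-free; a real pipeline would more likely operate on NumPy arrays or tensors loaded from the TIMIT and MUSAN audio files.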

Pages: 1950-1955

Number of pages: 6