Efficient Integrated Features Based on Pre-trained Models for Speaker Verification

Citations: 0
Authors
Li, Yishuang [1 ,2 ]
Guan, Wenhao [3 ]
Huang, Hukai [3 ]
Miao, Shiyu [2 ]
Su, Qi [2 ]
Li, Lin [1 ,2 ]
Hong, Qingyang [3 ]
Affiliations
[1] Xiamen Univ, Inst Artificial Intelligence, Xiamen, Peoples R China
[2] Xiamen Univ, Sch Elect Sci & Engn, Xiamen, Peoples R China
[3] Xiamen Univ, Sch Informat, Xiamen, Peoples R China
Source
INTERSPEECH 2024
Funding
National Natural Science Foundation of China;
Keywords
speaker verification; pre-trained models; feature integration; t-SNE; speech
DOI
10.21437/Interspeech.2024-1889
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Previous work has explored the application of pre-trained models (PTMs) to speaker verification (SV). Most studies directly replace handcrafted features with the universal representations of PTMs and jointly fine-tune the PTMs with the downstream SV networks, which discards important spectral information contained in the handcrafted features and increases the training cost. In this paper, we propose an efficient feature integration method that uses a Fine-grained Fusion Module to adaptively fuse the multi-layer representations of a PTM. The fused representations are then integrated with handcrafted features to obtain integrated features, which are fed into the SV network. Experimental results demonstrate that the integrated features effectively enhance the performance of SV systems and yield competitive results without fine-tuning the PTMs, while full-parameter fine-tuning achieves the best results.
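To make the idea concrete, here is a minimal sketch of one common way to fuse multi-layer PTM representations and integrate them with handcrafted features. It uses a learnable softmax-weighted sum over layers and simple concatenation with FBank features; the class names, dimensions, projection layer, and assumption that PTM frames are already time-aligned with the FBank frames are all illustrative assumptions, not the paper's exact Fine-grained Fusion Module.

```python
import torch
import torch.nn as nn


class LayerFusion(nn.Module):
    """Learnable softmax-weighted sum over the hidden states of a PTM.

    A widely used fusion scheme; the paper's Fine-grained Fusion Module
    may differ in detail (this is a sketch, not the authors' code)."""

    def __init__(self, num_layers: int):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (num_layers, batch, time, dim)
        w = torch.softmax(self.weights, dim=0)
        return (w.view(-1, 1, 1, 1) * hidden_states).sum(dim=0)


class FeatureIntegration(nn.Module):
    """Project the fused PTM representation and concatenate it with a
    handcrafted feature (e.g., 80-dim FBank) along the feature axis."""

    def __init__(self, num_layers: int, ptm_dim: int, fbank_dim: int = 80):
        super().__init__()
        self.fusion = LayerFusion(num_layers)
        self.proj = nn.Linear(ptm_dim, fbank_dim)

    def forward(self, hidden_states: torch.Tensor,
                fbank: torch.Tensor) -> torch.Tensor:
        # Assumes PTM frames and FBank frames share the same time axis.
        fused = self.proj(self.fusion(hidden_states))  # (batch, time, fbank_dim)
        return torch.cat([fused, fbank], dim=-1)       # integrated features


# Usage with hypothetical sizes: 13 hidden states from a WavLM-Base-like
# PTM (768-dim), batch of 4, 200 frames, 80-dim FBank.
integrate = FeatureIntegration(num_layers=13, ptm_dim=768)
hs = torch.randn(13, 4, 200, 768)   # stacked PTM layer outputs
fbank = torch.randn(4, 200, 80)     # handcrafted features
feats = integrate(hs, fbank)        # (4, 200, 160) -> input to the SV network
```

Because only the fusion weights and a small projection are trained, this kind of integration can be used with a frozen PTM, which is consistent with the abstract's claim that decent results are possible without fine-tuning.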
Pages: 2140-2144 (5 pages)
Related Papers (50 total)
  • [1] Semi-supervised speaker verification system based on pre-trained models
    Li, Yishuang
    Chen, Zhicong
    Miao, Shiyu
    Su, Qi
    Li, Lin
    Hong, Qingyang
Qinghua Daxue Xuebao/Journal of Tsinghua University, 2024, 64 (11): 1936-1943
  • [2] PRISM: Pre-trained Indeterminate Speaker Representation Model for Speaker Diarization and Speaker Verification
    Zheng, Siqi
    Suo, Hongbin
    Chen, Qian
INTERSPEECH 2022, 2022: 1431-1435
  • [3] An iVector Extractor Using Pre-trained Neural Networks for Speaker Verification
    Zhang, Shanshan
    Zheng, Rong
    Xu, Bo
2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014: 73-77
  • [4] Speaker Anonymization: Disentangling Speaker Features from Pre-Trained Speech Embeddings for Voice Conversion
    Matassoni, Marco
    Fong, Seraphina
    Brutti, Alessio
APPLIED SCIENCES-BASEL, 2024, 14 (9)
  • [5] EFFICIENT TEXT ANALYSIS WITH PRE-TRAINED NEURAL NETWORK MODELS
    Cui, Jia
    Lu, Heng
    Wang, Wenjie
    Kang, Shiyin
    He, Liqiang
    Li, Guangzhi
    Yu, Dong
2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2022: 671-676
  • [6] Structured Pruning for Efficient Generative Pre-trained Language Models
    Tao, Chaofan
    Hou, Lu
    Bai, Haoli
    Wei, Jiansheng
    Jiang, Xin
    Liu, Qun
    Lu, Ping
    Wong, Ngai
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023: 10880-10895
  • [7] MediSwift: Efficient Sparse Pre-trained Biomedical Language Models
    Thangarasa, Vithursan
    Salem, Mahmoud
    Saxena, Shreyas
    Leong, Kevin
    Hestness, Joel
    Lie, Sean
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024: 214-230
  • [8] Text clustering based on pre-trained models and autoencoders
    Xu, Qiang
    Gu, Hao
    Ji, ShengWei
    FRONTIERS IN COMPUTATIONAL NEUROSCIENCE, 2024, 17
  • [9] A Comparative Study on Pre-Trained Models Based on BERT
    Zhang, Minghua
2024 6TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING (ICNLP 2024), 2024: 326-330
  • [10] Efficient Key-Based Adversarial Defense for ImageNet by Using Pre-Trained Models
    Maungmaung, Aprilpyone
    Echizen, Isao
    Kiya, Hitoshi
IEEE OPEN JOURNAL OF SIGNAL PROCESSING, 2024, 5: 902-913