ZERO-SHOT PERSONALIZED SPEECH ENHANCEMENT THROUGH SPEAKER-INFORMED MODEL SELECTION

Cited by: 4
Authors
Sivaraman, Aswin [1]
Kim, Minje [1]
Affiliations
[1] Indiana Univ, Dept Intelligent Syst Engn, Bloomington, IN 47405 USA
Funding
National Science Foundation (USA)
Keywords
Speech enhancement; deep learning; adaptive mixture of local experts; model compression by selection; NEURAL-NETWORKS; ADAPTATION
DOI
10.1109/WASPAA52581.2021.9632752
Chinese Library Classification (CLC)
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper presents a novel zero-shot learning approach towards personalized speech enhancement through the use of a sparsely active ensemble model. Optimizing speech denoising systems towards a particular test-time speaker can improve performance and reduce run-time complexity. However, test-time model adaptation may be challenging if collecting data from the test-time speaker is not possible. To this end, we propose using an ensemble model wherein each specialist module denoises noisy utterances from a distinct partition of training set speakers. The gating module inexpensively estimates test-time speaker characteristics in the form of an embedding vector and selects the most appropriate specialist module for denoising the test signal. Grouping the training set speakers into non-overlapping semantically similar groups is non-trivial and ill-defined. To do this, we first train a Siamese network using noisy speech pairs to maximize or minimize the similarity of its output vectors depending on whether the utterances derive from the same speaker or not. Next, we perform k-means clustering on the latent space formed by the averaged embedding vectors per training set speaker. In this way, we designate speaker groups and train specialist modules optimized around partitions of the complete training set. Our experiments show that ensemble models made up of low-capacity specialists can out-perform high-capacity generalist models with greater efficiency and improved adaptation towards unseen test-time speakers.
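The grouping and selection steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the Siamese network is not modeled, and its output embeddings are replaced by hand-made 2-D vectors. The speaker names, the helper functions (`mean_vector`, `kmeans`, `nearest`), the farthest-first initialization, and the nearest-centroid gating rule are all assumptions made for the example.

```python
def mean_vector(vectors):
    """Element-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, k, max_iters=100):
    """Plain Lloyd's k-means with a deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Add the point whose distance to its closest existing centroid is largest.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iters):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_vector(members)
    return centroids, labels


# Hypothetical per-utterance embeddings (stand-ins for Siamese network outputs).
utterance_embeddings = {
    "spk_a": [[0.0, 0.2], [0.2, 0.0]],
    "spk_b": [[1.0, 0.4], [1.2, 0.6]],
    "spk_c": [[0.4, 1.0], [0.6, 1.2]],
    "spk_d": [[10.0, 9.8], [9.8, 10.0]],
    "spk_e": [[9.0, 10.4], [9.2, 10.6]],
    "spk_f": [[10.4, 9.0], [10.6, 9.2]],
}

# Step 1: average the embedding vectors per training-set speaker.
names = list(utterance_embeddings)
speaker_means = [mean_vector(utterance_embeddings[s]) for s in names]

# Step 2: cluster the averaged embeddings into k speaker groups.
centroids, labels = kmeans(speaker_means, k=2)
groups = dict(zip(names, labels))

# Step 3 (assumed gating rule): route an unseen speaker's embedding to the
# specialist whose group centroid is closest.
test_embedding = [9.5, 9.5]
chosen = nearest(test_embedding, centroids)
print(groups, chosen)
```

In a full system, each group index would correspond to one specialist denoiser trained only on that group's speakers, and only the chosen specialist would run at test time, which is where the efficiency gain comes from.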
Pages: 171-175
Page count: 5