SPEAKER-AWARE TARGET SPEAKER ENHANCEMENT BY JOINTLY LEARNING WITH SPEAKER EMBEDDING EXTRACTION

Cited by: 0
Authors
Ji, Xuan [1]
Yu, Meng [2]
Zhang, Chunlei [2]
Su, Dan [1]
Yu, Tao [3]
Liu, Xiaoyu [4]
Yu, Dong [2]
Affiliations
[1] Tencent AI Lab, Shenzhen, People's Republic of China
[2] Tencent AI Lab, Bellevue, WA, USA
[3] Tencent IEG, Bellevue, WA, USA
[4] Tencent IEG, Shenzhen, People's Republic of China
Keywords
speaker-aware; target speech enhancement; speaker embedding; joint learning
DOI
10.1109/icassp40776.2020.9054311
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Deep learning based speech separation approaches have received great interest, among which the recent speaker-aware speech enhancement methods are promising for solving difficulties such as arbitrary source permutation and unknown number of sources. In this paper, we propose a novel training framework which jointly learns the speaker-conditioned target speaker extraction model and its associated speaker embedding model. The resulting unified model directly learns the appropriate speaker embedding for improved target speech enhancement. We demonstrate, on our large simulated noisy and far-field evaluation sets of overlapped speech signals, that our proposed approach significantly improves the speech enhancement performance compared to the baseline speaker-aware speech enhancement models.
Pages: 7294-7298
Page count: 5