SPEAKER-AWARE TARGET SPEAKER ENHANCEMENT BY JOINTLY LEARNING WITH SPEAKER EMBEDDING EXTRACTION

Citations: 0
Authors
Ji, Xuan [1]
Yu, Meng [2]
Zhang, Chunlei [2]
Su, Dan [1]
Yu, Tao [3]
Liu, Xiaoyu [4]
Yu, Dong [2]
Affiliations
[1] Tencent AI Lab, Shenzhen, People's Republic of China
[2] Tencent AI Lab, Bellevue, WA, USA
[3] Tencent IEG, Bellevue, WA, USA
[4] Tencent IEG, Shenzhen, People's Republic of China
Keywords
speaker-aware; target speech enhancement; speaker embedding; joint learning
DOI
10.1109/icassp40776.2020.9054311
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Deep learning based speech separation approaches have received great interest, among which the recent speaker-aware speech enhancement methods are promising for solving difficulties such as arbitrary source permutation and unknown number of sources. In this paper, we propose a novel training framework which jointly learns the speaker-conditioned target speaker extraction model and its associated speaker embedding model. The resulting unified model directly learns the appropriate speaker embedding for improved target speech enhancement. We demonstrate, on our large simulated noisy and far-field evaluation sets of overlapped speech signals, that our proposed approach significantly improves the speech enhancement performance compared to the baseline speaker-aware speech enhancement models.
Pages: 7294-7298
Number of pages: 5