LEARNING SPEAKER REPRESENTATION FOR NEURAL NETWORK BASED MULTICHANNEL SPEAKER EXTRACTION

Cited by: 0
Authors
Zmolikova, Katerina [1 ,2 ]
Delcroix, Marc [1 ]
Kinoshita, Keisuke [1 ]
Higuchi, Takuya [1 ]
Ogawa, Atsunori [1 ]
Nakatani, Tomohiro [1 ]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Kyoto, Japan
[2] Brno Univ Technol, Speech FIT, Brno, Czech Republic
Keywords
speaker extraction; speaker adaptive neural network; multi-speaker speech recognition; speaker representation learning; beamforming; SOURCE SEPARATION; SPEECH;
DOI: not available
Chinese Library Classification: TP18 [Artificial intelligence theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Recently, schemes employing deep neural networks (DNNs) for extracting speech from noisy observations have demonstrated great potential for noise-robust automatic speech recognition. However, these schemes are not well suited when the interfering noise is another speaker. To enable extracting a target speaker from a mixture of speakers, we have recently proposed informing the neural network with speaker information extracted from an adaptation utterance of the same speaker. In our previous work, we explored ways to inform the network about the speaker and found a speaker adaptive layer approach to be well suited for this task. In those experiments, we used speaker features designed for speaker recognition tasks as the additional speaker information, which may not be optimal for the speaker extraction task. In this paper, we propose using a sequence summarizing scheme that enables learning the speaker representation jointly with the network. Furthermore, we extend the previous experiments to demonstrate the potential of our proposed method as a front-end for speech recognition and explore the effect of additional noise on the performance of the method.
Pages: 8-15
Number of pages: 8
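
To make the idea in the abstract concrete, below is a minimal sketch of the approach it describes: an auxiliary sequence-summarizing network turns an adaptation utterance into a fixed-size speaker vector, which then re-weights one hidden layer of the mask-estimation network (the speaker adaptive layer), so the speaker representation is learned jointly with the extraction network. This is not the authors' exact architecture; all layer sizes, nonlinearities, and the use of PyTorch are illustrative assumptions.

```python
# Illustrative sketch only: a sequence-summarizing speaker embedding driving a
# speaker-adaptive layer in a mask-estimation network. Dimensions are assumptions.
import torch
import torch.nn as nn


class SequenceSummaryNet(nn.Module):
    """Maps a variable-length adaptation utterance to a fixed-size speaker vector
    by averaging frame-wise nonlinear projections over time."""
    def __init__(self, feat_dim=40, embed_dim=30):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(feat_dim, 100), nn.ReLU(),
                                  nn.Linear(100, embed_dim))

    def forward(self, adapt_feats):                # (batch, time, feat_dim)
        return self.proj(adapt_feats).mean(dim=1)  # (batch, embed_dim)


class SpeakerAdaptiveMaskNet(nn.Module):
    """Mask-estimation network whose first hidden layer is element-wise scaled by
    the speaker embedding; joint training therefore learns a speaker
    representation tailored to the extraction task."""
    def __init__(self, feat_dim=40, hidden_dim=30):
        super().__init__()
        self.summary = SequenceSummaryNet(feat_dim, hidden_dim)
        self.layer1 = nn.Linear(feat_dim, hidden_dim)
        self.blstm = nn.LSTM(hidden_dim, 256, batch_first=True, bidirectional=True)
        self.mask_out = nn.Linear(512, feat_dim)

    def forward(self, mixture_feats, adapt_feats):
        spk = self.summary(adapt_feats)              # (batch, hidden_dim)
        h = torch.relu(self.layer1(mixture_feats))   # (batch, time, hidden_dim)
        h = h * spk.unsqueeze(1)                     # speaker-adaptive scaling
        h, _ = self.blstm(h)
        return torch.sigmoid(self.mask_out(h))       # time-frequency mask in [0, 1]


# Usage with random tensors standing in for spectral features of the mixture and
# of the target speaker's adaptation utterance.
mix = torch.randn(2, 200, 40)
adapt = torch.randn(2, 150, 40)
mask = SpeakerAdaptiveMaskNet()(mix, adapt)
print(mask.shape)  # torch.Size([2, 200, 40])
```

In a multichannel setup such as the one in the title, a mask of this kind would typically be used to estimate spatial statistics for a beamformer rather than applied directly, but that stage is omitted here.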