LEARNING SPEAKER REPRESENTATION FOR NEURAL NETWORK BASED MULTICHANNEL SPEAKER EXTRACTION

Cited by: 0
Authors
Zmolikova, Katerina [1, 2]
Delcroix, Marc [1]
Kinoshita, Keisuke [1]
Higuchi, Takuya [1]
Ogawa, Atsunori [1]
Nakatani, Tomohiro [1]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Kyoto, Japan
[2] Brno Univ Technol, Speech FIT, Brno, Czech Republic
Keywords
speaker extraction; speaker adaptive neural network; multi-speaker speech recognition; speaker representation learning; beamforming; SOURCE SEPARATION; SPEECH;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Recently, schemes employing deep neural networks (DNNs) for extracting speech from noisy observations have demonstrated great potential for noise-robust automatic speech recognition. However, these schemes are not well suited to cases where the interfering noise is another speaker. To enable the extraction of a target speaker from a mixture of speakers, we have recently proposed informing the neural network with speaker information extracted from an adaptation utterance from the same speaker. In our previous work, we explored ways to inform the network about the speaker and found a speaker adaptive layer approach to be suitable for this task. In those experiments, we used speaker features designed for speaker recognition tasks as the additional speaker information, which may not be optimal for the speaker extraction task. In this paper, we propose using a sequence summarizing scheme that enables the speaker representation to be learned jointly with the network. Furthermore, we extend the previous experiments to demonstrate the potential of the proposed method as a front-end for speech recognition and explore the effect of additional noise on the performance of the method.
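The abstract does not spell out the architecture, so the following is only a rough sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that the sequence summary is a time average of per-frame outputs of a small auxiliary layer applied to the adaptation utterance, that the speaker adaptive layer is a weighted sum of several sub-layers, and that the weights are obtained with a softmax. All names, dimensions, and the random parameters are hypothetical.

```python
import math
import random

random.seed(0)  # fixed seed so the illustrative run is repeatable


def linear(x, W, b):
    """Affine map y = W x + b for a single frame (plain lists of floats)."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def softmax(v):
    """Numerically stable softmax (an assumption; used to normalise weights)."""
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def summarize_sequence(frames, W_aux, b_aux):
    """Average the per-frame auxiliary outputs over time into one vector."""
    outs = [linear(f, W_aux, b_aux) for f in frames]
    return [sum(col) / len(outs) for col in zip(*outs)]


def adaptive_layer(x, sublayers, speaker_weights):
    """Combine sub-layer outputs with speaker-dependent weights, then ReLU."""
    outs = [linear(x, W, b) for W, b in sublayers]
    mixed = [sum(a * o[i] for a, o in zip(speaker_weights, outs))
             for i in range(len(outs[0]))]
    return [max(0.0, m) for m in mixed]


def rand_matrix(rows, cols):
    """Hypothetical stand-in for learned parameters."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


FEAT, HID, K = 4, 5, 3  # feature size, hidden size, number of sub-layers
W_aux, b_aux = rand_matrix(K, FEAT), [0.0] * K
sublayers = [(rand_matrix(HID, FEAT), [0.0] * HID) for _ in range(K)]

# Toy inputs: 10 frames of an adaptation utterance and one mixture frame.
adaptation = [[random.random() for _ in range(FEAT)] for _ in range(10)]
mixture_frame = [random.random() for _ in range(FEAT)]

weights = softmax(summarize_sequence(adaptation, W_aux, b_aux))
hidden = adaptive_layer(mixture_frame, sublayers, weights)
```

In a trained system the auxiliary parameters and the sub-layers would be optimised together, which is what learning the speaker representation jointly with the network refers to; here they are random and serve only to show the data flow.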
Pages: 8-15
Number of pages: 8