LEARNING SPEAKER REPRESENTATION FOR NEURAL NETWORK BASED MULTICHANNEL SPEAKER EXTRACTION

Cited by: 0

Authors:

Zmolikova, Katerina [1,2]

Delcroix, Marc [1]

Kinoshita, Keisuke [1]

Higuchi, Takuya [1]

Ogawa, Atsunori [1]

Nakatani, Tomohiro [1]

Affiliations:

[1] NTT Corp, NTT Commun Sci Labs, Kyoto, Japan

[2] Brno Univ Technol, Speech FIT, Brno, Czech Republic

Source:

2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) | 2017

Keywords:

speaker extraction; speaker adaptive neural network; multi-speaker speech recognition; speaker representation learning; beamforming; source separation; speech

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Recently, schemes that employ deep neural networks (DNNs) to extract speech from noisy observations have demonstrated great potential for noise-robust automatic speech recognition. However, these schemes are not well suited to cases where the interfering noise is another speaker. To enable the extraction of a target speaker from a mixture of speakers, we recently proposed informing the neural network with speaker information extracted from an adaptation utterance spoken by the same speaker. In our previous work, we explored ways to inform the network about the speaker and found a speaker adaptive layer approach to be suitable for this task. In those experiments, we used speaker features designed for speaker recognition tasks as the additional speaker information, which may not be optimal for the speaker extraction task. In this paper, we propose using a sequence summarizing scheme that enables the speaker representation to be learned jointly with the network. Furthermore, we extend the previous experiments to demonstrate the potential of the proposed method as a front-end for speech recognition and to explore the effect of additional noise on its performance.
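
The idea in the abstract, a speaker representation computed from an adaptation utterance and fed to a speaker adaptive layer, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the single affine summary layer, the averaging over time, the softmax normalization, and the toy dimensions are all hypothetical. The point is only that the speaker weights are a differentiable function of the adaptation frames, so the summary parameters could in principle be trained together with the rest of the network.

```python
import math
import random


def affine(W, b, x):
    """Return W @ x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def summarize_sequence(frames, W, b):
    """Hypothetical sequence summary: transform each adaptation frame,
    average over time, and normalize to one weight per sub-layer."""
    per_frame = [affine(W, b, f) for f in frames]
    mean = [sum(col) / len(per_frame) for col in zip(*per_frame)]
    return softmax(mean)


def adaptive_layer(x, sublayers, weights):
    """Hypothetical speaker adaptive layer: a weighted sum of the
    outputs of several sub-layers, weighted per speaker."""
    outs = [affine(W, b, x) for W, b in sublayers]
    return [sum(a * o[i] for a, o in zip(weights, outs))
            for i in range(len(outs[0]))]


def random_matrix(rng, rows, cols):
    """Toy parameter initialization (stands in for trained weights)."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Toy sizes, chosen only for the example.
FEAT, HID, K = 4, 3, 2
rng = random.Random(0)
summary_W, summary_b = random_matrix(rng, K, FEAT), [0.0] * K
sublayers = [(random_matrix(rng, HID, FEAT), [0.0] * HID) for _ in range(K)]

# Ten frames of a made-up adaptation utterance and one mixture frame.
adaptation = [[rng.uniform(-1, 1) for _ in range(FEAT)] for _ in range(10)]
mixture_frame = [rng.uniform(-1, 1) for _ in range(FEAT)]

weights = summarize_sequence(adaptation, summary_W, summary_b)
hidden = adaptive_layer(mixture_frame, sublayers, weights)
```

In this sketch, a different adaptation utterance yields different `weights` and therefore a different transformation of the same mixture frame, which is the mechanism by which a network of this kind could be steered toward one target speaker.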

Pages: 8 - 15

Page count: 8
