3-D CNN MODELS FOR FAR-FIELD MULTI-CHANNEL SPEECH RECOGNITION

Cited by: 0

Authors:

Ganapathy, Sriram [1]

Peddinti, Vijayaditya [2]

Affiliations:

[1] Indian Inst Sci, Bangalore, Karnataka, India

[2] Google Inc, Mountain View, CA USA

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

Far-field speech recognition; 3D CNN modeling; Multi-party conversational speech; NEURAL-NETWORKS; CORPUS;

DOI:

Not available

Chinese Library Classification (CLC):

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Automatic speech recognition (ASR) in far-field reverberant environments, especially under natural conversational multi-party speech conditions, is challenging even with state-of-the-art recognition methodologies. The two main issues are artifacts in the signal due to reverberation and the presence of multiple speakers. In this paper, we propose a three-dimensional (3-D) convolutional neural network (CNN) architecture for multi-channel far-field ASR. This architecture processes the time, frequency and channel dimensions of the input spectrogram to learn representations using convolutional layers. Experiments are performed on the REVERB challenge LVCSR task and the augmented multi-party (AMI) LVCSR task using the array microphone recordings. The proposed method shows improvements over the baseline system, which applies beamforming to the multi-channel audio and feeds the result to a conventional 2-D CNN framework (an absolute improvement of 1.1% over the beamformed baseline system on the AMI dataset).
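
To make the idea in the abstract concrete, the following is a minimal NumPy sketch of a single 3-D convolution applied jointly over the channel, time and frequency axes of a multi-channel spectrogram. It is an illustration only and not the authors' architecture: the function name `conv3d_valid`, the input size (8 microphones, 21 frames, 40 frequency bands), the 3 x 5 x 5 kernel, the random weights and the ReLU nonlinearity are all assumptions chosen for the example.

```python
import numpy as np


def conv3d_valid(x, kernel):
    """Valid-mode 3-D cross-correlation of x with a single kernel.

    x      : array of shape (channels, time, freq), a multi-channel spectrogram
    kernel : array of shape (kc, kt, kf)
    Returns an array of shape (C - kc + 1, T - kt + 1, F - kf + 1).
    """
    C, T, F = x.shape
    kc, kt, kf = kernel.shape
    oc, ot, of = C - kc + 1, T - kt + 1, F - kf + 1
    out = np.zeros((oc, ot, of))
    # Accumulate one shifted, weighted copy of the input per kernel tap, so
    # out[c, t, f] = sum_{i,j,k} kernel[i, j, k] * x[c + i, t + j, f + k].
    for i in range(kc):
        for j in range(kt):
            for k in range(kf):
                out += kernel[i, j, k] * x[i:i + oc, j:j + ot, k:k + of]
    return out


# Hypothetical input: 8 microphones, 21 frames, 40 frequency bands.
rng = np.random.default_rng(0)
spectrogram = rng.standard_normal((8, 21, 40))

# Hypothetical kernel spanning 3 channels, 5 frames and 5 frequency bands.
kernel = rng.standard_normal((3, 5, 5))

# One feature map followed by a ReLU (assumed nonlinearity).
features = np.maximum(conv3d_valid(spectrogram, kernel), 0.0)
print(features.shape)
```

Unlike a beamforming front end followed by a 2-D CNN, the kernel here spans several microphone channels at once, so the cross-channel combination is carried by learnable weights. A real model would stack many such kernels and train them, which this sketch does not attempt.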


Pages: 5499-5503

Page count: 5

50 items in total

[31] SPEAKER ADAPTED BEAMFORMING FOR MULTI-CHANNEL AUTOMATIC SPEECH RECOGNITION
Menne, Tobias
Schlueter, Ralf
Ney, Hermann
2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 535 - 541
[32] END-TO-END MULTI-CHANNEL TRANSFORMER FOR SPEECH RECOGNITION
Chang, Feng-Ju
Radfar, Martin
Mouchtaris, Athanasios
King, Brian
Kunzmann, Siegfried
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5884 - 5888
[33] A 3-D Folded Dipole Antenna Array for Far-Field Electromagnetic Energy Transfer
Almoneef, Thamer S.
Sun, Hu
Ramahi, Omar M.
IEEE ANTENNAS AND WIRELESS PROPAGATION LETTERS, 2016, 15 : 1406 - 1409
[34] Multi-channel Attention for End-to-End Speech Recognition
Braun, Stefan
Neil, Daniel
Anumula, Jithendar
Ceolini, Enea
Liu, Shih-Chii
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 17 - 21
[35] Audio-visual Multi-channel Recognition of Overlapped Speech
Yu, Jianwei
Wu, Bo
Gu, Rongzhi
Zhang, Shi-Xiong
Chen, Lianwu
Xu, Yong
Yu, Meng
Su, Dan
Yu, Dong
Liu, Xunying
Meng, Helen
INTERSPEECH 2020, 2020, : 3496 - 3500
[36] The segmentation of multi-channel meeting recordings for automatic speech recognition
Dines, John
Vepa, Jithendra
Hain, Thomas
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1213 - +
[37] Quaternion Neural Networks for Multi-channel Distant Speech Recognition
Qiu, Xinchi
Parcollet, Titouan
Ravanelli, Mirco
Lane, Nicholas D.
Morchid, Mohamed
INTERSPEECH 2020, 2020, : 329 - 333
[38] Multi-Channel sEMG Signal Gesture Recognition Based on Improved CNN-LSTM Hybrid Models
Bai, Dianchun
Liu, Tie
Han, Xinghua
Chen, Guo
Jiang, Yinlai
Hiroshi, Yokoi
2021 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENCE AND SAFETY FOR ROBOTICS (ISR), 2021, : 111 - 116
[39] MULTI-CHANNEL OVERLAPPED SPEECH RECOGNITION WITH LOCATION GUIDED SPEECH EXTRACTION NETWORK
Chen, Zhuo
Xiao, Xiong
Yoshioka, Takuya
Erdogan, Hakan
Li, Jinyu
Gong, Yifan
2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 558 - 565
[40] SPATIAL ATTENTION FOR FAR-FIELD SPEECH RECOGNITION WITH DEEP BEAMFORMING NEURAL NETWORKS
He, Weipeng
Lu, Lu
Zhang, Biqiao
Mahadeokar, Jay
Kalgaonkar, Kaustubh
Fuegen, Christian
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7499 - 7503
