3-D CNN MODELS FOR FAR-FIELD MULTI-CHANNEL SPEECH RECOGNITION

Cited by: 0
Authors
Ganapathy, Sriram [1]
Peddinti, Vijayaditya [2]
Affiliations
[1] Indian Inst Sci, Bangalore, Karnataka, India
[2] Google Inc, Mountain View, CA USA
Keywords
Far-field speech recognition; 3D CNN modeling; Multi-party conversational speech; NEURAL-NETWORKS; CORPUS;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Automatic speech recognition (ASR) in far-field reverberant environments, especially with natural conversational multi-party speech, is challenging even with state-of-the-art recognition methods. The two main issues are artifacts in the signal due to reverberation and the presence of multiple speakers. In this paper, we propose a three-dimensional (3-D) convolutional neural network (CNN) architecture for multi-channel far-field ASR. This architecture processes the time, frequency, and channel dimensions of the input spectrogram to learn representations using convolutional layers. Experiments are performed on the REVERB challenge LVCSR task and the Augmented Multi-party Interaction (AMI) LVCSR task using the array microphone recordings. The proposed method improves on the baseline system, which applies beamforming to the multi-channel audio and feeds the result to a conventional 2-D CNN (an absolute improvement of 1.1% over the beamformed baseline on the AMI dataset).
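The abstract describes convolving jointly over the time, frequency, and channel dimensions, but gives no layer sizes. The following is a minimal sketch of that single idea in dependency-free Python, not the authors' architecture: one "valid" 3-D convolution followed by a ReLU. The function names (`conv3d_valid`, `relu`), the input shape (4 microphones, 6 frames, 5 frequency bands), and the 2x3x3 averaging kernel are all hypothetical choices made for illustration.

```python
# Sketch only: one "valid" 3-D convolution (cross-correlation, as in most CNN
# toolkits) over a multi-channel spectrogram indexed [channel][time][freq].
# All shapes and kernel values are illustrative assumptions, not the paper's.


def conv3d_valid(x, kernel):
    """Slide `kernel` over `x` along the channel, time, and frequency axes."""
    num_c, num_t, num_f = len(x), len(x[0]), len(x[0][0])
    kc, kt, kf = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for c in range(num_c - kc + 1):
        plane = []
        for t in range(num_t - kt + 1):
            row = []
            for f in range(num_f - kf + 1):
                acc = 0.0
                # Weighted sum over the local channel x time x frequency patch.
                for dc in range(kc):
                    for dt in range(kt):
                        for df in range(kf):
                            acc += x[c + dc][t + dt][f + df] * kernel[dc][dt][df]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def relu(volume):
    """Element-wise rectifier applied to a nested [c][t][f] list."""
    return [[[max(0.0, v) for v in row] for row in plane] for plane in volume]


# Hypothetical input: 4 microphone channels, 6 frames, 5 frequency bands.
num_mics, num_frames, num_bands = 4, 6, 5
spectrogram = [
    [[float(c + t + f) for f in range(num_bands)] for t in range(num_frames)]
    for c in range(num_mics)
]

# Hypothetical averaging kernel spanning 2 channels, 3 frames, 3 bands.
kernel = [[[1.0 / 18.0] * 3 for _ in range(3)] for _ in range(2)]

feature_map = relu(conv3d_valid(spectrogram, kernel))
out_shape = (len(feature_map), len(feature_map[0]), len(feature_map[0][0]))
print(out_shape)  # (3, 4, 3)
```

Because the kernel spans two adjacent microphones, each output value mixes information across channels as well as across time and frequency. A 2-D CNN applied to a beamformed signal, as in the baseline, would instead see the channels only after they had been combined into one.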
Pages: 5499-5503
Number of pages: 5
Related papers
50 in total
  • [21] Multi-Channel Feature Adaptation for Robust Speech Recognition
    Zhang, Zhaofeng
    Xiao, Xiong
    Wang, Longbiao
    Dang, Jianwu
    Iwahashi, Masahiro
    Chng, Eng Siong
    Li, Haizhou
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [22] DEEP BEAMFORMING NETWORKS FOR MULTI-CHANNEL SPEECH RECOGNITION
    Xiao, Xiong
    Watanabe, Shinji
    Erdogan, Hakan
    Lu, Liang
    Hershey, John
    Seltzer, Michael L.
    Chen, Guoguo
    Zhang, Yu
    Mandel, Michael
    Yu, Dong
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5745 - 5749
  • [23] The far-field expansion of the Green's function in a 3-D optical waveguide
    Alexandrov, Oleg
    ASYMPTOTIC ANALYSIS, 2007, 52 (1-2) : 157 - 171
  • [24] Feature mapping using far-field microphones for distant speech recognition
    Himawan, Ivan
    Motlicek, Petr
    Imseng, David
    Sridharan, Sridha
    SPEECH COMMUNICATION, 2016, 83 : 1 - 9
  • [25] A Study on Improving Acoustic Model for Robust and Far-Field Speech Recognition
    Xue, Shaofei
    Yan, Zhijie
    Yu, Tao
    Liu, Zhang
    2018 IEEE 23RD INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2018,
  • [26] Precipitation Forecast Based on Multi-channel ConvLSTM and 3D-CNN
    Niu, Dan
    Diao, Li
    Xu, Liujia
    Zang, Zengliang
    Chen, Xisong
    Liang, ShaSha
    2020 INTERNATIONAL CONFERENCE ON UNMANNED AIRCRAFT SYSTEMS (ICUAS'20), 2020, : 367 - 371
  • [27] 3-D Mixed Far-Field and Near-Field Sources Localization With Cross Array
    Wu, Xiaohuan
    Yan, Jun
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2020, 69 (06) : 6833 - 6837
  • [28] Self-Attention Channel Combinator Frontend for End-to-End Multichannel Far-field Speech Recognition
    Gong, Rong
    Quillen, Carl
    Sharma, Dushyant
    Goderre, Andrew
    Lainez, Jose
    Milanovic, Ljubomir
    INTERSPEECH 2021, 2021, : 3840 - 3844
  • [29] Robust Multi-Channel Far-Field Speaker Verification Under Different In-Domain Data Availability Scenarios
    Qin, Xiaoyi
    Cai, Danwei
    Li, Ming
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 71 - 85
  • [30] Multi-Channel Far-Field Speaker Verification with Large-Scale Ad-hoc Microphone Arrays
    Liang, Chengdong
    Chen, Yijiang
    Yao, Jiadi
    Zhang, Xiao-Lei
    INTERSPEECH 2022, 2022, : 3679 - 3683