3-D CNN MODELS FOR FAR-FIELD MULTI-CHANNEL SPEECH RECOGNITION

Authors
Ganapathy, Sriram [1]
Peddinti, Vijayaditya [2]
Affiliations
[1] Indian Inst Sci, Bangalore, Karnataka, India
[2] Google Inc, Mountain View, CA USA
Keywords
Far-field speech recognition; 3D CNN modeling; Multi-party conversational speech; NEURAL-NETWORKS; CORPUS;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Automatic speech recognition (ASR) in far-field reverberant environments, especially with natural, conversational multi-party speech, remains challenging even for state-of-the-art recognition methods. The two main issues are signal artifacts caused by reverberation and the presence of multiple speakers. In this paper, we propose a three-dimensional (3-D) convolutional neural network (CNN) architecture for multi-channel far-field ASR. The architecture uses convolutional layers over the time, frequency, and channel dimensions of the input spectrogram to learn representations. Experiments are performed on the REVERB challenge LVCSR task and the Augmented Multi-party Interaction (AMI) LVCSR task using array microphone recordings. The proposed method improves over a baseline system that beamforms the multi-channel audio and then applies a conventional 2-D CNN, with an absolute improvement of 1.1% over the beamformed baseline on the AMI dataset.
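The central idea of the abstract, convolving over channel, time, and frequency at once instead of first beamforming the array down to one channel, can be illustrated with a minimal NumPy sketch of a single 3-D convolutional layer. This is not the authors' implementation: the function name `conv3d_valid`, the 16 filters, the 3x5x9 kernel size, and the 8-channel, 11-frame, 40-bin input are hypothetical choices for illustration, and the layer assumes no padding and stride 1.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3d_valid(x, w, b):
    """Bank of 3-D cross-correlations with 'valid' padding and stride 1.

    x: input of shape (C, T, F) -- microphone channels, time frames, frequency bins
    w: filters of shape (K, kc, kt, kf) -- K filters spanning kc channels,
       kt frames and kf frequency bins (hypothetical sizes, not from the paper)
    b: biases of shape (K,)
    Returns feature maps of shape (K, C-kc+1, T-kt+1, F-kf+1).
    """
    kc, kt, kf = w.shape[1:]
    # windows[c, t, f] is the (kc, kt, kf) patch whose first corner is at (c, t, f)
    windows = sliding_window_view(x, (kc, kt, kf))
    # Multiply every patch by every filter and sum over the three patch axes
    out = np.einsum("ctfijk,nijk->nctf", windows, w)
    return out + b[:, None, None, None]


def relu(z):
    return np.maximum(z, 0.0)


# Hypothetical sizes: 8 array channels, 11 context frames, 40 log-mel bins
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 11, 40))
w = 0.1 * rng.standard_normal((16, 3, 5, 9))  # 16 filters of size 3x5x9
b = np.zeros(16)

h = relu(conv3d_valid(x, w, b))
print(h.shape)  # (16, 6, 7, 32)
```

Setting the kernel's channel extent `kc` equal to the number of microphones collapses the channel axis to size 1, which reduces to an ordinary 2-D convolution that treats the microphones as stacked input feature maps; a smaller `kc` lets each filter slide along the array, which is what distinguishes a 3-D layer. In a framework such as PyTorch, a comparable layer would be `torch.nn.Conv3d` with the microphone axis mapped to the depth dimension.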
Pages: 5499-5503
Number of pages: 5