3-D CNN MODELS FOR FAR-FIELD MULTI-CHANNEL SPEECH RECOGNITION

Cited by: 0
Authors
Ganapathy, Sriram [1]
Peddinti, Vijayaditya [2]
Affiliations
[1] Indian Institute of Science, Bangalore, Karnataka, India
[2] Google Inc, Mountain View, CA, USA
Keywords
Far-field speech recognition; 3D CNN modeling; Multi-party conversational speech; NEURAL-NETWORKS; CORPUS
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Automatic speech recognition (ASR) in far-field reverberant environments, especially with natural conversational multi-party speech, remains challenging even for state-of-the-art recognition methods. The two main issues are artifacts in the signal due to reverberation and the presence of multiple speakers. In this paper, we propose a three-dimensional (3-D) convolutional neural network (CNN) architecture for multi-channel far-field ASR. This architecture processes the time, frequency, and channel dimensions of the input spectrogram to learn representations using convolutional layers. Experiments are performed on the REVERB challenge LVCSR task and the Augmented Multi-party Interaction (AMI) LVCSR task using the array microphone recordings. The proposed method improves over a baseline system that applies beamforming to the multi-channel audio followed by a conventional 2-D CNN (an absolute improvement of 1.1% over the beamformed baseline on the AMI dataset).
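To make the idea of convolving jointly over time, frequency, and channel concrete, the following is a minimal sketch in plain Python. It is not the authors' architecture: the function name `conv3d_valid`, the toy input size (8 channels, 5 frames, 6 frequency bins), and the 2x3x3 averaging kernel are all assumptions chosen for illustration. It only shows how one 3-D filter slides across a multi-channel spectrogram, in contrast to a 2-D filter that would see a single (for example, beamformed) channel.

```python
# Illustrative sketch only: one 3-D convolution (valid cross-correlation)
# over a multi-channel spectrogram stored as nested lists [channel][time][freq].
# All names and sizes here are hypothetical, not taken from the paper.

def conv3d_valid(x, kernel):
    """Slide `kernel` ([kc][kt][kf]) over `x` ([C][T][F]) without padding."""
    C, T, F = len(x), len(x[0]), len(x[0][0])
    kc, kt, kf = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for c in range(C - kc + 1):          # positions along the channel axis
        plane = []
        for t in range(T - kt + 1):      # positions along the time axis
            row = []
            for f in range(F - kf + 1):  # positions along the frequency axis
                acc = 0.0
                for dc in range(kc):
                    for dt in range(kt):
                        for df in range(kf):
                            acc += kernel[dc][dt][df] * x[c + dc][t + dt][f + df]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out

# Toy input (assumed sizes): 8 microphone channels, 5 frames, 6 frequency bins.
spec = [[[float(c + t + f) for f in range(6)] for t in range(5)] for c in range(8)]

# Hypothetical 2x3x3 averaging kernel spanning 2 channels, 3 frames, 3 bins.
kernel = [[[1.0 / 18.0 for _ in range(3)] for _ in range(3)] for _ in range(2)]

features = conv3d_valid(spec, kernel)
shape = (len(features), len(features[0]), len(features[0][0]))
print(shape)
```

In a trained network the kernel weights would be learned and many such filters would be stacked with nonlinearities; the sketch fixes a single averaging kernel so the arithmetic is easy to follow.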
Pages: 5499-5503
Page count: 5