Whispered Speech Recognition Based on Audio Data Augmentation and Inverse Filtering

Cited by: 0
Authors
Galic, Jovan [1]
Markovic, Branko [2]
Grozdic, Dorde [3, 4]
Popovic, Branislav [5]
Sajic, Slavko [1]
Affiliations
[1] Univ Banja Luka, Fac Elect Engn, Dept Telecommun, Banja Luka 78000, Bosnia & Herceg
[2] Univ Kragujevac, Fac Tech Sci, Dept Comp & Software Engn, Cacak 32000, Serbia
[3] Grid Dynamics, Belgrade 11000, Serbia
[4] Univ Belgrade, Sch Elect Engn, Belgrade 11000, Serbia
[5] Univ Novi Sad, Fac Tech Sci, Novi Sad 21000, Serbia
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 18
Keywords
artificial neural networks; audio databases; automatic speech recognition; convolutional neural network; hidden Markov models; inverse filtering; whispered speech;
DOI
10.3390/app14188223
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Modern Automatic Speech Recognition (ASR) systems are primarily designed to recognize normal speech. Due to a considerable acoustic mismatch between normal speech and whisper, ASR systems suffer a significant loss of performance in whisper recognition. Creating large databases of whispered speech is expensive and time-consuming, so researchers explore synthetic data generation from pre-existing normal or whispered speech databases. This study examines the impact of standard audio data augmentation techniques on the accuracy of isolated-word recognizers based on Hidden Markov Models (HMM) and Convolutional Neural Networks (CNN). It also explores the potential of inverse filtering as an augmentation strategy for producing pseudo-whisper speech. The Whi-Spe speech database, containing recordings in normal and whisper phonation, is used for data augmentation, while an internally recorded speech database, developed specifically for this study, is used for testing. Experimental results demonstrate a statistically significant improvement in performance when data augmentation strategies and inverse filtering are employed.
Pages: 20
Related papers
50 in total
  • [31] Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations
    Oneata, Dan
    Cucu, Horia
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, : 4578 - 4587
  • [32] Exploitation of Phase-Based Features for Whispered Speech Emotion Recognition
    Deng, Jun
    Xu, Xinzhou
    Zhang, Zixing
    Fruehholz, Sascha
    Schuller, Bjoern
    IEEE ACCESS, 2016, 4 : 4299 - 4309
  • [33] APPLICATION OF NEURAL NETWORKS IN WHISPERED SPEECH RECOGNITION
    Grozdic, Dorde T.
    Markovic, Branko
    Galic, Jovan
    Jovicic, Slobodan T.
    2012 20TH TELECOMMUNICATIONS FORUM (TELFOR), 2012, : 728 - 731
  • [34] WORD TONE RECOGNITION IN VIETNAMESE WHISPERED SPEECH
    MILLER, JD
    WORD-JOURNAL OF THE INTERNATIONAL LINGUISTIC ASSOCIATION, 1961, 17 (01): : 11 - 15
  • [35] GENERATING SYNTHETIC AUDIO DATA FOR ATTENTION-BASED SPEECH RECOGNITION SYSTEMS
    Rossenbach, Nick
    Zeyer, Albert
    Schlueter, Ralf
    Ney, Hermann
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7069 - 7073
  • [36] Audio-visual speech recognition based on joint training with audio-visual speech enhancement for robust speech recognition
    Hwang, Jung-Wook
    Park, Jeongkyun
    Park, Rae-Hong
    Park, Hyung-Min
    APPLIED ACOUSTICS, 2023, 211
  • [37] Enhanced Speech Emotion Recognition Using DCGAN-Based Data Augmentation
    Baek, Ji-Young
    Lee, Seok-Pil
    Tsihrintzis, George A.
    ELECTRONICS, 2023, 12 (18)
  • [38] GENERATIVE ADVERSARIAL NETWORKS BASED DATA AUGMENTATION FOR NOISE ROBUST SPEECH RECOGNITION
    Hu, Hu
    Tan, Tian
    Qian, Yanmin
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5044 - 5048
  • [39] Reinforcement Learning based Data Augmentation for Noise Robust Speech Emotion Recognition
    Ranjan, Sumit
    Chakraborty, Rupayan
    Kopparapu, Sunil Kumar
    INTERSPEECH 2024, 2024, : 1040 - 1044
  • [40] Lattice-based Data Augmentation for Code-switching Speech Recognition
    Hartanto, Roland
    Uto, Kuniaki
    Shinoda, Koichi
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 1667 - 1672