Whispered Speech Recognition Based on Audio Data Augmentation and Inverse Filtering

Authors
Galic, Jovan [1]
Markovic, Branko [2]
Grozdic, Dorde [3, 4]
Popovic, Branislav [5]
Sajic, Slavko [1]
Affiliations
[1] Univ Banja Luka, Fac Elect Engn, Dept Telecommun, Banja Luka 78000, Bosnia & Herceg
[2] Univ Kragujevac, Fac Tech Sci, Dept Comp & Software Engn, Cacak 32000, Serbia
[3] Grid Dynamics, Belgrade 11000, Serbia
[4] Univ Belgrade, Sch Elect Engn, Belgrade 11000, Serbia
[5] Univ Novi Sad, Fac Tech Sci, Novi Sad 21000, Serbia
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 18
Keywords
artificial neural networks; audio databases; automatic speech recognition; convolutional neural network; hidden Markov models; inverse filtering; whispered speech
DOI
10.3390/app14188223
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Modern Automatic Speech Recognition (ASR) systems are primarily designed to recognize normal speech. Due to a considerable acoustic mismatch between normal speech and whisper, ASR systems suffer from a significant loss of performance in whisper recognition. Creating large databases of whispered speech is expensive and time-consuming, so research studies explore the synthetic generation using pre-existing normal or whispered speech databases. The impact of standard audio data augmentation techniques on the accuracy of isolated-word recognizers based on Hidden Markov Models (HMM) and Convolutional Neural Networks (CNN) is examined in this research study. Furthermore, the study explores the potential of inverse filtering as an augmentation strategy for producing pseudo-whisper speech. The Whi-Spe speech database, containing recordings in normal and whisper phonation, is utilized for data augmentation, while the internally recorded speech database, developed specifically for this study, is employed for testing purposes. Experimental results demonstrate statistically significant improvement in performance when employing data augmentation strategies and inverse filtering.
Pages: 20
Related papers
50 in total
  • [41] DATA AUGMENTATION BASED ON VOWEL STRETCH FOR IMPROVING CHILDREN'S SPEECH RECOGNITION
    Nagano, Tohru
    Fukuda, Takashi
    Suzuki, Masayuki
    Kurata, Gakuto
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 502 - 508
  • [43] Adaptive data augmentation for mandarin automatic speech recognition
    Ding, Kai
    Li, Ruixuan
    Xu, Yuelin
    Du, Xingyue
    Deng, Bin
    APPLIED INTELLIGENCE, 2024, 54 (07) : 5674 - 5687
  • [44] Adversarial Data Augmentation Network for Speech Emotion Recognition
    Yi, Lu
    Mak, Man-Wai
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 529 - 534
  • [45] Investigation of Data Augmentation Techniques for Disordered Speech Recognition
    Geng, Mengzhe
    Xie, Xurong
    Liu, Shansong
    Yu, Jianwei
    Hu, Shoukang
    Liu, Xunying
    Meng, Helen
    INTERSPEECH 2020, 2020, : 696 - 700
  • [46] Data Augmentation using GANs for Speech Emotion Recognition
    Chatziagapi, Aggelina
    Paraskevopoulos, Georgios
    Sgouropoulos, Dimitris
    Pantazopoulos, Georgios
    Nikandrou, Malvina
    Giannakopoulos, Theodoros
    Katsamanis, Athanasios
    Potamianos, Alexandros
    Narayanan, Shrikanth
    INTERSPEECH 2019, 2019, : 171 - 175
  • [47] Data Augmentation Improves Recognition of Foreign Accented Speech
    Fukuda, Takashi
    Fernandez, Raul
    Rosenberg, Andrew
    Thomas, Samuel
    Ramabhadran, Bhuvana
    Sorin, Alexander
    Kurata, Gakuto
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2409 - 2413
  • [48] Performance Improvement of Mandarin Digital Whispered Speech Recognition Based on Multistage Classification
    Chen Xueqin
    Sha Jun
    Yu Yibiao
    Zhao Heming
    2016 SIXTH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY (ICIST), 2016, : 544 - 547
  • [49] Enhancing Automatic Speech Recognition: Effects of Semantic Audio Filtering on Models Performance
    Perezhohin, Yuriy
    Santos, Tiago
    Costa, Victor
    Peres, Fernando
    Castelli, Mauro
    IEEE ACCESS, 2024, 12 : 155136 - 155150
  • [50] Whispered speech recognition using deep denoising autoencoder
    Grozdic, Dorde T.
    Jovicic, Slobodan T.
    Subotic, Misko
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2017, 59 : 15 - 22