Whispered Speech Recognition Based on Audio Data Augmentation and Inverse Filtering

Cited by: 0
Authors
Galic, Jovan [1]
Markovic, Branko [2]
Grozdic, Dorde [3, 4]
Popovic, Branislav [5]
Sajic, Slavko [1]
Affiliations
[1] Univ Banja Luka, Fac Elect Engn, Dept Telecommun, Banja Luka 78000, Bosnia & Herceg
[2] Univ Kragujevac, Fac Tech Sci, Dept Comp & Software Engn, Cacak 32000, Serbia
[3] Grid Dynamics, Belgrade 11000, Serbia
[4] Univ Belgrade, Sch Elect Engn, Belgrade 11000, Serbia
[5] Univ Novi Sad, Fac Tech Sci, Novi Sad 21000, Serbia
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 18
Keywords
artificial neural networks; audio databases; automatic speech recognition; convolutional neural network; hidden Markov models; inverse filtering; whispered speech
DOI
10.3390/app14188223
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Modern Automatic Speech Recognition (ASR) systems are primarily designed to recognize normal speech. Because of the considerable acoustic mismatch between normal speech and whisper, ASR systems lose significant performance in whisper recognition. Creating large databases of whispered speech is expensive and time-consuming, so research studies explore synthetic generation of whispered speech from pre-existing normal or whispered speech databases. This study examines the impact of standard audio data augmentation techniques on the accuracy of isolated-word recognizers based on Hidden Markov Models (HMM) and Convolutional Neural Networks (CNN). It also explores the potential of inverse filtering as an augmentation strategy for producing pseudo-whisper speech. The Whi-Spe speech database, containing recordings in normal and whisper phonation, is used for data augmentation, while an internally recorded speech database, developed specifically for this study, is used for testing. Experimental results demonstrate a statistically significant improvement in performance when data augmentation strategies and inverse filtering are employed.
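
The abstract does not say which "standard audio data augmentation techniques" were used, so the following is only an illustrative sketch of two widely used ones: additive noise at a chosen signal-to-noise ratio and speed perturbation by resampling. The function names (`power`, `add_noise`, `change_speed`), the parameter values, and the toy sine-wave input are all assumptions made for this example, not details taken from the paper.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def add_noise(signal, snr_db, seed=0):
    """Return signal plus white Gaussian noise scaled to the requested SNR (dB)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Scale the noise so that power(signal) / power(scaled noise) = 10^(snr_db / 10).
    target_noise_power = power(signal) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / power(noise))
    return [s + scale * n for s, n in zip(signal, noise)]


def change_speed(signal, factor):
    """Resample by linear interpolation; factor > 1 shortens (speeds up) the signal."""
    n_out = int(len(signal) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor                      # fractional position in the input
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * signal[lo] + frac * signal[hi])
    return out


# Toy input (assumption): 0.1 s of a 200 Hz tone at 16 kHz, standing in for a recorded word.
fs = 16000
tone = [math.sin(2 * math.pi * 200 * t / fs) for t in range(1600)]
noisy = add_noise(tone, snr_db=10.0)   # same length, 10 dB SNR
faster = change_speed(tone, 1.1)       # about 10% shorter
```

Each augmented copy would be added to the training set alongside the original recording, which is the general idea behind enlarging a small whispered-speech corpus without new recordings.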
Pages: 20
Related papers
50 in total
  • [1] Whispered Speech Recognition Using Deep Denoising Autoencoder and Inverse Filtering
    Grozdic, Dorde T.
    Jovicic, Slobodan T.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (12) : 2313 - 2322
  • [2] Audio Codec Simulation based Data Augmentation for Telephony Speech Recognition
    Thi-Ly Vu
    Zeng, Zhiping
    Xu, Haihua
    Chng, Eng-Siong
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 198 - 203
  • [3] Audio Augmentation for Speech Recognition
    Ko, Tom
    Peddinti, Vijayaditya
    Povey, Daniel
    Khudanpur, Sanjeev
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3586 - 3589
  • [4] AUDIO-VISUAL ISOLATED DIGIT RECOGNITION FOR WHISPERED SPEECH
    Fan, Xing
    Busso, Carlos
    Hansen, John H. L.
    19TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2011), 2011, : 1500 - 1503
  • [5] HTK-Based Recognition of Whispered Speech
    Galic, Jovan
    Jovicic, Slobodan T.
    Grozdic, Dorde
    Markovic, Branko
    SPEECH AND COMPUTER, 2014, 8773 : 251 - 258
  • [6] Improving Automatic Speech Recognition Utilizing Audio-codecs for Data Augmentation
    Hailu, Nirayo
    Siegert, Ingo
    Nurnberger, Andreas
    2020 IEEE 22ND INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2020,
  • [7] Analysis and recognition of whispered speech
    Ito, T
    Takeda, K
    Itakura, F
    SPEECH COMMUNICATION, 2005, 45 (02) : 139 - 152
  • [8] Group Delay based Methods for Detection and Recognition of Whispered Speech
    Vedvyasan, Kishore
    Nathwani, Karan
    Hegde, Rajesh M.
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 499 - 505
  • [9] Acoustic analysis and recognition of whispered speech
    Itoh, T
    Takeda, K
    Itakura, F
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 389 - 392
  • [10] Study on the Emotion Recognition of Whispered Speech
    Jin, Yun
    Zhao, Yan
    Huang, Chengwei
    Zhao, Li
    PROCEEDINGS OF THE 2009 WRI GLOBAL CONGRESS ON INTELLIGENT SYSTEMS, VOL III, 2009, : 242 - 246