Whispered Speech Recognition Based on Audio Data Augmentation and Inverse Filtering

被引:0
|
作者
Galic, Jovan [1 ]
Markovic, Branko [2 ]
Grozdic, Dorde [3 ,4 ]
Popovic, Branislav [5 ]
Sajic, Slavko [1 ]
机构
[1] Univ Banja Luka, Fac Elect Engn, Dept Telecommun, Banja Luka 78000, Bosnia & Herceg
[2] Univ Kragujevac, Fac Tech Sci, Dept Comp & Software Engn, Cacak 32000, Serbia
[3] Grid Dynamics, Belgrade 11000, Serbia
[4] Univ Belgrade, Sch Elect Engn, Belgrade 11000, Serbia
[5] Univ Novi Sad, Fac Tech Sci, Novi Sad 21000, Serbia
来源
APPLIED SCIENCES-BASEL | 2024年 / 14卷 / 18期
关键词
artificial neural networks; audio databases; automatic speech recognition; convolutional neural network; hidden Markov models; inverse filtering; whispered speech;
D O I
10.3390/app14188223
中图分类号
O6 [化学];
学科分类号
0703 ;
摘要
Modern Automatic Speech Recognition (ASR) systems are primarily designed to recognize normal speech. Due to a considerable acoustic mismatch between normal speech and whisper, ASR systems suffer from a significant loss of performance in whisper recognition. Creating large databases of whispered speech is expensive and time-consuming, so research studies explore the synthetic generation using pre-existing normal or whispered speech databases. The impact of standard audio data augmentation techniques on the accuracy of isolated-word recognizers based on Hidden Markov Models (HMM) and Convolutional Neural Networks (CNN) is examined in this research study. Furthermore, the study explores the potential of inverse filtering as an augmentation strategy for producing pseudo-whisper speech. The Whi-Spe speech database, containing recordings in normal and whisper phonation, is utilized for data augmentation, while the internally recorded speech database, developed specifically for this study, is employed for testing purposes. Experimental results demonstrate statistically significant improvement in performance when employing data augmentation strategies and inverse filtering.
引用
收藏
页数:20
相关论文
共 50 条
  • [21] Speech emotion recognition using data augmentation
    Praseetha, V. M.
    Joby, P. P.
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 25 (4) : 783 - 792
  • [22] Strong Generalized Speech Emotion Recognition Based on Effective Data Augmentation
    Tao, Huawei
    Shan, Shuai
    Hu, Ziyi
    Zhu, Chunhua
    Ge, Hongyi
    ENTROPY, 2023, 25 (01)
  • [23] Adversarial Data Augmentation for Disordered Speech Recognition
    Jin, Zengrui
    Geng, Mengzhe
    Xie, Xurong
    Yu, Jianwei
    Liu, Shansong
    Liu, Xunying
    Meng, Helen
    INTERSPEECH 2021, 2021, : 4803 - 4807
  • [24] SNR-Selection-Based-Data Augmentation for Dysarthric Speech Recognition
    Nawroly, Sarkhell Sirwan
    Popescu, Decebal Gheorghe
    Antony, Mariya Celin Thekekara
    Philominal, Actlin Jeeva Muthu
    STUDIES IN INFORMATICS AND CONTROL, 2023, 32 (04): : 129 - 140
  • [25] Data Augmentation Based on Frequency Warping for Recognition of Cleft Palate Speech
    Fujiwara, Kento
    Takashima, Ryoichi
    Sugiyama, Chihiro
    Tanaka, Nobukazu
    Nohara, Kanji
    Nozaki, Kazunori
    Takiguchi, Tetsuya
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 471 - 476
  • [26] Performance Analysis of Mandarin Whispered Speech Recognition Based on Normal Speech Training Model
    Chen Xueqin
    Zhao Heming
    Fan Xiaohe
    2016 SIXTH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY (ICIST), 2016, : 548 - 551
  • [27] Hypo and Hyperarticulated Speech Data Augmentation for Spontaneous Speech Recognition
    Lee, Sung Joo
    Kang, Byung-Ok
    Chung, Hoon
    Park, Jeon Gue
    Lee, Yun Keun
    2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2018, : 2080 - 2084
  • [28] A STUDY ON DATA AUGMENTATION OF REVERBERANT SPEECH FOR ROBUST SPEECH RECOGNITION
    Ko, Tom
    Peddinti, Vijayaditya
    Povey, Daniel
    Seltzer, Michael L.
    Khudanpur, Sanjeev
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5220 - 5224
  • [29] Data Augmentation using Healthy Speech for Dysarthric Speech Recognition
    Vachhani, Bhavik
    Bhat, Chitralekha
    Kopparapu, Sunil Kumar
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 471 - 475
  • [30] Mandarin Connected Digits Recognition for Whispered Speech
    Ru Tingting
    Xie Xiang
    Yin Hui
    Kuang Jingming
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1141 - 1144