A Performance Comparison of Commercial Speech Recognition APIs in Noisy Environments

Cited by: 0
Authors
Lee G. [2]
Lee S. [2]
Ji S. [3]
Kim A. [1, 3]
Im H. [1, 3]
Affiliations
[1] Dept. of Computer Science and Engineering, Dept. of Convergence Security, Interdisciplinary Graduate Program in Medical Bigdata Convergence, Kangwon National University
[2] Dept. of Convergence Security, Kangwon National University
[3] Interdisciplinary Graduate Program in Medical Bigdata Convergence, Kangwon National University
Funding
National Research Foundation of Singapore
Keywords
Character error rate; Noisy environment; Speech recognition; Word error rate
DOI
10.5370/KIEE.2022.71.9.1266
Abstract
This paper compares the performance of five commercial speech recognition APIs in noisy environments: those provided by Amazon AWS, Microsoft Azure, Google, Kakao, and Naver. To this end, we used an open dataset, available on AI Hub, that was built for the development and evaluation of multi-channel noise processing technology. We tested each API with respect to the speaker's gender, the speaker's location, and the speech content, and measured the error rate using both word error rate (WER) and character error rate (CER). For every API except AWS, the error rate was higher for female speakers than for male speakers, and higher for speech recorded from the side than for speech recorded from the front. The error rate was also relatively high when the test sentences contained proper nouns such as personal names and place names, and it increased as the sentences became shorter. Moreover, the Google API outperformed all the others in terms of both WER and CER, with error rates of 53% and 18%, respectively. © 2022 Korean Institute of Electrical Engineers. All rights reserved.
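The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the common definitions: WER is the word-level edit distance divided by the number of reference words, and CER is the same ratio computed over characters. The abstract does not describe the authors' scoring script or text normalization, so the function names `edit_distance`, `wer`, and `cer`, and the choice to ignore spaces when computing CER, are assumptions made for illustration only.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, ignoring spaces (an assumption of this sketch)."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

Under these definitions a single wrong character makes its whole word count as an error, so WER is typically higher than CER for the same transcript, which matches the direction of the gap between the 53% WER and 18% CER reported for the Google API.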
Pages: 1266-1273
Page count: 7