A Performance Comparison of Commercial Speech Recognition APIs in Noisy Environments

Authors
Lee G. [2]
Lee S. [2]
Ji S. [3]
Kim A. [1, 3]
Im H. [1, 3]
Affiliations
[1] Dept. of Computer Science and Engineering, Dept. of Convergence Security, Interdisciplinary Graduate Program in Medical Bigdata Convergence, Kangwon National University
[2] Dept. of Convergence Security, Kangwon National University
[3] Interdisciplinary Graduate Program in Medical Bigdata Convergence, Kangwon National University
Funding
National Research Foundation, Singapore;
Keywords
Character error rate; Noisy environment; Speech recognition; Word error rate;
DOI
10.5370/KIEE.2022.71.9.1266
Abstract
This paper compares the performance of five commercial speech recognition APIs in noisy environments: those provided by Amazon AWS, Microsoft Azure, Google, Kakao, and Naver. To this end, we used an open dataset, provided on AI Hub, for the development and evaluation of multi-channel noise processing technology. We tested each API with respect to the speaker's gender and location and the speech content, and measured the error rate using both word error rate (WER) and character error rate (CER). Except for the AWS API, the error rate was higher for female speakers' data than for male speakers' data, and higher for data recorded from the side than from the front. The error rate was also relatively high when the test sentences contained proper nouns such as personal names and place names, and the shorter the sentence, the higher the error rate. Moreover, the Google API outperformed all the others in terms of both WER and CER, with error rates of 53% and 18%, respectively. © 2022 Korean Institute of Electrical Engineers. All rights reserved.
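The abstract does not say how WER and CER were computed, so the following is only a minimal sketch of the conventional definitions: the edit distance between the reference and the recognized text, divided by the reference length, counted in words for WER and in characters for CER. The function names, the whitespace tokenization, and the choice to drop spaces before computing CER are assumptions made for illustration, not details taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum number of substitutions,
    deletions, and insertions needed to turn ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens (assumed tokenization)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces are removed first (an assumption)."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print(f"WER = {wer(ref, hyp):.3f}, CER = {cer(ref, hyp):.3f}")
```

Under these definitions the denominator is the reference length, so a single misrecognized word contributes more to the rate of a short sentence than to that of a long one. A word that differs from the reference by one character also counts as a full word error but only one character error, which is one reason a CER can be much lower than the corresponding WER.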
Pages: 1266 - 1273
Number of pages: 7