Improving a Conversational Speech Recognition System Using Phonetic and Neural Transcript Correction

Cited by: 0
Authors
Campos-Soberanis, Mario [1]
Campos-Sobrino, Diego [1]
Viana-Camara, Rafael [1]
Affiliations
[1] SoldAI Res, Merida, Yucatan, Mexico
Keywords
Automatic speech recognition; Phonetic correction; Neural networks; Named entity recognition
DOI
10.1007/978-3-030-89820-5_4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This article describes the successful implementation of a conversational speech recognition system applied to telephone sales performed by an autonomous agent. Our implementation uses a post-processing corrector based on phonetic representations of text, followed by a neural network classifier. The classifier assesses the relevance of each proposed correction in order to reduce the errors in the transcript sent to a downstream Natural Language Understanding engine. The experiments were carried out on transcripts of real audio recordings of orders placed by customers of a large bottling company. We measured the Word Error Rate of the corrected transcripts against human-annotated ground truth to verify the improvement produced by the system. To evaluate the impact of the corrections on the entities detected by the Natural Language Understanding engine, we used Jaccard distance, Precision, Recall, and F1. Results show that the implemented system and architecture improve the relative Word Error Rate of the transcripts by 39% and the Jaccard distance by 13% compared with the Automatic Speech Recognition baseline, making them suitable for implementation in real-time telephone sales systems.
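The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the whitespace tokenization, the set-based treatment of entities, and the Spanish example strings are all assumptions made for illustration, and the abstract does not say exactly how the relative improvement was computed.

```python
from typing import Set, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline: float, corrected: float) -> float:
    """Fraction of the baseline WER removed by the correction (assumed definition)."""
    if baseline == 0:
        raise ValueError("baseline WER must be non-zero")
    return (baseline - corrected) / baseline


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A intersect B| / |A union B|; two empty sets are treated as identical."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 of predicted entities against gold entities."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: the ASR output confuses "cajas" with "casas".
reference = "quiero dos cajas de refresco"
asr_output = "quiero dos casas de refresco"
baseline_wer = word_error_rate(reference, asr_output)

# Hypothetical entity sets detected from the ground truth and from the ASR output.
gold_entities = {"cajas", "refresco"}
asr_entities = {"refresco"}
entity_distance = jaccard_distance(gold_entities, asr_entities)
entity_scores = precision_recall_f1(asr_entities, gold_entities)
```

Under this assumed definition, a relative reduction of 0.39 would correspond to the 39% figure reported in the abstract.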
Pages: 46-58
Page count: 13