Improving a Conversational Speech Recognition System Using Phonetic and Neural Transcript Correction

Cited by: 0
Authors
Campos-Soberanis, Mario [1]
Campos-Sobrino, Diego [1]
Viana-Camara, Rafael [1]
Affiliations
[1] SoldAI Res, Merida, Yucatan, Mexico
Keywords
Automatic speech recognition; Phonetic correction; Neural networks; Named entity recognition;
DOI
10.1007/978-3-030-89820-5_4
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This article describes the successful implementation of a conversational speech recognition system applied to telephonic sales performed by an autonomous agent. Our implementation uses a post-processing corrector based on phonetic representations of text, followed by a neural network classifier. The classifier assesses the relevance of each proposed correction in order to reduce the errors in the transcript sent to a downstream Natural Language Understanding engine. The experiments were carried out on transcripts of real audio recordings of orders placed by customers of a large bottling company. We measured the Word Error Rate of the corrected transcripts against human-annotated ground truth to verify the improvement produced by the system. To evaluate the impact of the corrections on the entities detected by the Natural Language Understanding engine, we used Jaccard distance, Precision, Recall, and F1 score. Results show that the implemented system and architecture improve the transcripts' relative Word Error Rate by 39% and the Jaccard distance by 13% in comparison to the Automatic Speech Recognition baseline, making them suitable for implementation in real-time telephonic sales systems.
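The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the whitespace tokenization, and the example strings are assumptions made here for illustration only. Word Error Rate is computed as the word-level edit distance divided by the number of reference words, and Jaccard distance as one minus the ratio of shared to combined entities.

```python
from typing import Iterable


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumes plain whitespace tokenization (an illustrative choice).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def jaccard_distance(expected: Iterable[str], detected: Iterable[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B| over two collections of entity labels."""
    a, b = set(expected), set(detected)
    if not a and not b:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(a | b)


def relative_wer_reduction(baseline_wer: float, corrected_wer: float) -> float:
    """Fraction of the baseline WER removed by the correction step."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - corrected_wer) / baseline_wer


# Hypothetical example: one substituted word out of four reference words.
wer = word_error_rate("two boxes of cola", "two box of cola")
dist = jaccard_distance({"cola", "two"}, {"cola"})
```

Under these definitions, a relative reduction of 0.39 would correspond to the 39% figure reported in the abstract, although the exact computation used in the paper is not given in this record.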
Pages: 46-58
Page count: 13
Related papers
50 in total
  • [31] The IBM 2015 English Conversational Telephone Speech Recognition System
    Saon, George
    Kuo, Hong-Kwang J.
    Rennie, Steven
    Picheny, Michael
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3140 - 3144
  • [32] Implementation of Phonetic Level Speech Recognition in Kannada using HTK
    Priya, Jeeva K.
    Sree, S. Sowmya
    Navya, T. V. S.
    Gupta, Deepa
    PROCEEDINGS OF THE 2018 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATION AND SIGNAL PROCESSING (ICCSP), 2018, : 82 - 85
  • [33] Lithuanian Speech Recognition Using Purely Phonetic Deep Learning
    Pipiras, Laurynas
    Maskeliunas, Rytis
    Damasevicius, Robertas
    COMPUTERS, 2019, 8 (04)
  • [34] Implementation of Tamil speech recognition system using neural networks
    Saraswathi, S
    Geetha, TV
    APPLIED COMPUTING, PROCEEDINGS, 2004, 3285 : 169 - 176
  • [35] Speech Recognition System Based On Phonemes Using Neural Networks
    Maheswari, N. Uma
    Kabilan, A. P.
    Venkatesh, R.
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2009, 9 (07): : 148 - 153
  • [36] An Analysis of Deep Neural Networks in Broad Phonetic Classes for Noisy Speech Recognition
    de-la-Calle-Silos, F.
    Gallardo-Antolin, A.
    Pelaez-Moreno, C.
    ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2016, 2016, 10077 : 87 - 96
  • [37] A robust speech disorders correction system for Arabic language using visual speech recognition
    Farag, Ahmed
    El Adawy, Mohamed
    Ismail, Ahmed
    BIOMEDICAL RESEARCH-INDIA, 2013, 24 (02): : 185 - 192
  • [38] RECOGNITION OF PHONETIC LABELS OF THE TIMIT SPEECH CORPUS BY MEANS OF AN ARTIFICIAL NEURAL NETWORK
    WU, JX
    CHAN, C
    PATTERN RECOGNITION, 1991, 24 (11) : 1085 - 1091
  • [40] PHRASE RECOGNITION IN CONVERSATIONAL SPEECH USING PROSODIC AND PHONEMIC INFORMATION
    OKAWA, S
    ENDO, T
    KOBAYASHI, T
    SHIRAI, K
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 1993, E76D (01) : 44 - 50