Improving a Conversational Speech Recognition System Using Phonetic and Neural Transcript Correction

Cited by: 0
Authors
Campos-Soberanis, Mario [1]
Campos-Sobrino, Diego [1]
Viana-Camara, Rafael [1]
Affiliations
[1] SoldAI Res, Merida, Yucatan, Mexico
Keywords
Automatic speech recognition; Phonetic correction; Neural networks; Named entity recognition
DOI
10.1007/978-3-030-89820-5_4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This article describes the successful implementation of a conversational speech recognition system applied to telephone sales performed by an autonomous agent. Our implementation uses a post-processing corrector based on phonetic representations of text, followed by a neural network classifier. The classifier assesses the relevance of each proposed correction in order to reduce the errors in the transcript sent to a downstream Natural Language Understanding engine. The experiments were carried out by correcting transcripts of real audio recordings of orders placed by customers of a large bottling company. We measured the Word Error Rate of the corrected transcripts against human-annotated ground truth to verify the improvement produced by the system. To evaluate the impact of the corrections on the entities detected by the Natural Language Understanding engine, we used Jaccard distance, Precision, Recall, and F-1. Results show that the implemented system and architecture improve the transcripts' Word Error Rate by a relative 39% and the Jaccard distance by 13% compared with the Automatic Speech Recognition baseline, making them suitable for implementation in real-time telephone sales systems.
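The two headline metrics in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy Spanish order strings, and the entity sets are invented for illustration, and the abstract does not say exactly how the relative Word Error Rate figure was computed, so the usual (baseline - corrected) / baseline definition is assumed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline: float, corrected: float) -> float:
    """Assumed definition: fraction of the baseline WER removed by correction."""
    return (baseline - corrected) / baseline


def jaccard_distance(expected: set, detected: set) -> float:
    """1 - |intersection| / |union|; two empty sets are treated as identical."""
    if not expected and not detected:
        return 0.0
    return 1.0 - len(expected & detected) / len(expected | detected)


# Toy data (hypothetical, not taken from the paper's dataset).
truth = "dos cajas de coca cola"
asr_output = "dos casas de coca cola"
wer = word_error_rate(truth, asr_output)  # one substitution out of five words
entity_gap = jaccard_distance({"coca cola", "dos"}, {"coca cola"})
```

In this toy case the single misrecognized word gives a WER of 1/5, and missing one of two expected entities gives a Jaccard distance of 1/2. Precision, Recall, and F-1 over the same entity sets would follow from the intersection and set sizes in the usual way.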
Pages: 46-58
Page count: 13
Related papers
50 entries in total
  • [21] Improving RNN Transducer Acoustic Models for English Conversational Speech Recognition
    Cui, Xiaodong
    Saon, George
    Kingsbury, Brian
    INTERSPEECH 2023, 2023, : 1299 - 1303
  • [22] USING A CONNECTIONIST NETWORK TO ELIMINATE REDUNDANCY FROM A PHONETIC LATTICE IN A SPEECH RECOGNITION SYSTEM
    CAHAREL, MH
    MICLET, L
    FIRST IEE INTERNATIONAL CONFERENCE ON ARTIFICIAL NEURAL NETWORKS, 1989, : 301 - 305
  • [23] Conversational speech recognition using acoustic and articulatory input
    Kirchhoff, K
    Fink, GA
    Sagerer, G
    2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1435 - 1438
  • [24] Development of a phonetic system for large vocabulary Arabic speech recognition
    Gales, M. J. F.
    Diehl, F.
    Raut, C. K.
    Tomalin, M.
    Woodland, P. C.
    Yu, K.
    2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, : 24 - 29
  • [25] Implementation of Phonetic Level Speech Recognition System for Punjabi Language
    Mittal, Shama
    Kaur, Rupinderdeep
    2016 1ST INDIA INTERNATIONAL CONFERENCE ON INFORMATION PROCESSING (IICIP), 2016,
  • [26] Cross-sentence Neural Language Models for Conversational Speech Recognition
    Chiu, Shih-Hsuan
    Lo, Tien-Hong
    Chen, Berlin
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [27] The Influence of Errors in Phonetic Annotations on Performance of Speech Recognition System
    Safarik, Radek
    Mateju, Lukas
    Weingartova, Lenka
    TEXT, SPEECH, AND DIALOGUE (TSD 2018), 2018, 11107 : 419 - 427
  • [28] Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition
    Li, Ke
    Xu, Hainan
    Wang, Yiming
    Povey, Daniel
    Khudanpur, Sanjeev
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3373 - 3377
  • [29] Using LSTM neural networks for cross-lingual phonetic speech segmentation with an iterative correction procedure
    Hanzlicek, Zdenek
    Matousek, Jindrich
    Vit, Jakub
    COMPUTATIONAL INTELLIGENCE, 2024, 40 (02)
  • [30] The IBM 2016 English Conversational Telephone Speech Recognition System
    Saon, George
    Sercu, Tom
    Rennie, Steven
    Kuo, Hong-Kwang J.
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 7 - 11