Improving a Conversational Speech Recognition System Using Phonetic and Neural Transcript Correction

Cited by: 0
Authors
Campos-Soberanis, Mario [1]
Campos-Sobrino, Diego [1]
Viana-Cámara, Rafael [1]
Affiliations
[1] SoldAI Res, Merida, Yucatan, Mexico
Keywords
Automatic speech recognition; Phonetic correction; Neural networks; Named entity recognition
DOI
10.1007/978-3-030-89820-5_4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This article describes the successful implementation of a conversational speech recognition system applied to telephonic sales performed by an autonomous agent. Our implementation uses a post-processing corrector based on phonetic representations of text, followed by a neural network classifier. The classifier assesses the relevance of each proposed correction in order to reduce the errors in the transcript sent to a downstream Natural Language Understanding engine. The experiments were carried out on transcripts of real audio recordings of orders placed by customers of a large bottling company. We measured the Word Error Rate of the corrected transcripts against human-annotated ground truth to verify the improvement produced by the system. To evaluate the impact of the corrections on the entities detected by the Natural Language Understanding engine, we used Jaccard distance, Precision, Recall, and F1. Results show that the implemented system and architecture improve the transcripts' relative Word Error Rate by 39% and the Jaccard distance by 13% compared with the Automatic Speech Recognition baseline, making them suitable for implementation in real-time telephonic sales systems.
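The evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the whitespace tokenization, and the example sentences are assumptions made for illustration only. Word Error Rate is computed here as the word-level edit distance divided by the reference length, the relative improvement as the fractional drop from the baseline, and Jaccard distance as one minus the overlap ratio of two entity sets.

```python
from typing import Set


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, corrected_wer: float) -> float:
    """Fraction by which the corrected WER is lower than the baseline WER."""
    return (baseline_wer - corrected_wer) / baseline_wer


def jaccard_distance(expected: Set[str], detected: Set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    if not expected and not detected:
        return 0.0
    return 1.0 - len(expected & detected) / len(expected | detected)


# Hypothetical example: one substituted word out of five reference words.
wer = word_error_rate("dos cajas de coca cola", "dos casas de coca cola")
```

Under these definitions a lower Jaccard distance means the entities detected in the corrected transcript overlap more with the expected entities, so an improvement corresponds to a decrease in both metrics.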
Pages: 46-58
Page count: 13
Related papers
  • [1] Improving a Conversational Speech Recognition System Using Phonetic and Neural Transcript Correction
    Campos-Soberanis, Mario
    Campos-Sobrino, Diego
    Viana-Cámara, Rafael
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2021, 13068 LNAI: 46-58
  • [2] Improving English Conversational Telephone Speech Recognition
    Medennikov, Ivan
    Prudnikov, Alexey
    Zatvornitskiy, Alexander
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016: 2-6
  • [3] Two Protocols Comparing Human and Machine Phonetic Recognition Performance in Conversational Speech
    Shen, Wade
    Olive, Joseph
    Jones, Douglas
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008: 1630-+
  • [4] Random-Forests-based phonetic decision trees for conversational speech recognition
    Xue, Jian
    Zhao, Yunxin
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008: 4169-4172
  • [5] Random forests of phonetic decision trees for acoustic modeling in conversational speech recognition
    Xue, Jian
    Zhao, Yunxin
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2008, 16(3): 519-528
  • [6] A BAYESIAN APPROACH FOR PHONETIC DECISION TREE STATE TYING IN CONVERSATIONAL SPEECH RECOGNITION
    Hu, Rusheng
    Zhao, Yunxin
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007: 661-+
  • [7] LANGUAGE MODEL BOOTSTRAPPING USING NEURAL MACHINE TRANSLATION FOR CONVERSATIONAL SPEECH RECOGNITION
    Punjabi, Surabhi
    Arsikere, Harish
    Garimella, Sri
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019: 487-493
  • [8] Recent advances in conversational speech recognition using conventional and recurrent neural networks
    Saon, G.
    Picheny, M.
    IBM JOURNAL OF RESEARCH AND DEVELOPMENT, 2017, 61 (4-5)
  • [9] THE MICROSOFT 2016 CONVERSATIONAL SPEECH RECOGNITION SYSTEM
    Xiong, W.
    Droppo, J.
    Huang, X.
    Seide, F.
    Seltzer, M.
    Stolcke, A.
    Yu, D.
    Zweig, G.
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017: 5255-5259
  • [10] LINGUISTIC PROCESSOR IN A CONVERSATIONAL SPEECH RECOGNITION SYSTEM
    SHIKANO, K
    KOHDA, M
    REVIEW OF THE ELECTRICAL COMMUNICATIONS LABORATORIES, 1978, 26(11-1): 1505-1520