Transformer-Based Joint Learning Approach for Text Normalization in Vietnamese Automatic Speech Recognition Systems

Citations: 0
Authors
Viet The Bui [1 ]
Tho Chi Luong [2 ]
Oanh Thi Tran [3 ]
Institutions
[1] Singapore Management Univ, Sch Comp & Informat Syst, Singapore, Singapore
[2] FPT Univ, FPT Technol Res Inst, Hanoi, Vietnam
[3] Vietnam Natl Univ Hanoi, Int Sch, Hanoi, Vietnam
Keywords
ASR; named entity recognition; post-processing; punctuator; text normalization; transformer-based joint learning models
DOI
10.1080/01969722.2022.2145654
Chinese Library Classification
TP3 [Computing technology and computer technology]
Subject Classification Code
0812
Abstract
In this article, we investigate the task of normalizing transcribed texts in Vietnamese Automatic Speech Recognition (ASR) systems in order to improve user readability and the performance of downstream tasks. This task usually consists of two main sub-tasks: predicting and inserting punctuation (i.e., period, comma); and detecting and standardizing named entities (i.e., numbers, person names) from spoken forms to their appropriate written forms. To achieve these goals, we introduce a complete corpus consisting of 87,700 sentences and investigate conditional joint learning approaches that globally optimize the two sub-tasks simultaneously. The experimental results are quite promising. Overall, the proposed architecture outperformed the conventional architecture that trains individual models on the two sub-tasks separately. The joint models are further improved when integrated with surrounding contexts (SCs). Specifically, we obtained F1 scores of 81.13% for the first sub-task and 94.41% for the second sub-task using the best model.
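The joint architecture described in the abstract can be illustrated with a minimal sketch, not the authors' code: a shared transformer encoder feeds two token-classification heads, one for punctuation restoration and one for spoken-to-written named-entity tagging, and their losses are summed so both sub-tasks are optimized jointly. The encoder checkpoint (vinai/phobert-base), label sets, and class names below are illustrative assumptions.

    # Minimal sketch (assumed implementation, not the authors' code) of a
    # transformer-based joint model: one shared encoder, two token-level heads,
    # and a summed cross-entropy loss over both sub-tasks.
    import torch.nn as nn
    from transformers import AutoModel

    class JointNormalizer(nn.Module):
        def __init__(self, encoder_name="vinai/phobert-base",
                     num_punct_labels=3, num_entity_labels=9):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(encoder_name)
            hidden = self.encoder.config.hidden_size
            self.punct_head = nn.Linear(hidden, num_punct_labels)    # e.g. O, PERIOD, COMMA
            self.entity_head = nn.Linear(hidden, num_entity_labels)  # e.g. BIO tags for NUMBER, PERSON, ...
            self.loss_fn = nn.CrossEntropyLoss(ignore_index=-100)

        def forward(self, input_ids, attention_mask,
                    punct_labels=None, entity_labels=None):
            # Shared contextual representations for both sub-tasks
            states = self.encoder(input_ids=input_ids,
                                  attention_mask=attention_mask).last_hidden_state
            punct_logits = self.punct_head(states)
            entity_logits = self.entity_head(states)
            loss = None
            if punct_labels is not None and entity_labels is not None:
                # Joint objective: both sub-tasks are optimized simultaneously
                loss = (self.loss_fn(punct_logits.view(-1, punct_logits.size(-1)),
                                     punct_labels.view(-1))
                        + self.loss_fn(entity_logits.view(-1, entity_logits.size(-1)),
                                       entity_labels.view(-1)))
            return {"loss": loss,
                    "punct_logits": punct_logits,
                    "entity_logits": entity_logits}

Surrounding-context integration, which the abstract reports as further improving the joint models, could be realized by concatenating neighboring sentences into the encoder input; that detail is not specified here.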
Pages: 1614-1630
Page count: 17
Related Papers (50 in total)
  • [31] Automatic text summarization using transformer-based language models
    Rao, Ritika
    Sharma, Sourabh
    Malik, Nitin
    INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2024, 15 (06) : 2599 - 2605
  • [32] Transformer-based Question Text Generation in the Learning System
    Li, Jiajun
    Song, Huazhu
    Li, Jun
    6TH INTERNATIONAL CONFERENCE ON INNOVATION IN ARTIFICIAL INTELLIGENCE, ICIAI2022, 2022, : 50 - 56
  • [33] Recovering Capitalization for Automatic Speech Recognition of Vietnamese using Transformer and Chunk Merging
    Hien Nguyen Thi Thu
    Binh Nguyen Thai
    Hung Nguyen Vu Bao
    Truong Do Quoc
    Mai Luong Chi
    Huyen Nguyen Thi Minh
    PROCEEDINGS OF 2019 11TH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SYSTEMS ENGINEERING (KSE 2019), 2019, : 430 - 434
  • [34] A Transformer-Based Contrastive Semi-Supervised Learning Framework for Automatic Modulation Recognition
    Kong, Weisi
    Jiao, Xun
    Xu, Yuhua
    Zhang, Bolin
    Yang, Qinghai
    IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, 2023, 9 (04) : 950 - 962
  • [35] Transformer-based end-to-end scene text recognition
    Zhu, Xinghao
    Zhang, Zhi
    PROCEEDINGS OF THE 2021 IEEE 16TH CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA 2021), 2021, : 1691 - 1695
  • [36] A transformer-based approach to Nigerian Pidgin text generation
    Garba, Kabir
    Kolajo, Taiwo
    Agbogun, Joshua B.
INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2024, 27 (04) : 1027 - 1037
  • [37] Transformer-based transfer learning and multi-task learning for improving the performance of speech emotion recognition
    Park, Sunchan
    Kim, Hyung Soon
JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2021, 40 (05) : 515 - 522
  • [38] UNTIED POSITIONAL ENCODINGS FOR EFFICIENT TRANSFORMER-BASED SPEECH RECOGNITION
    Samarakoon, Lahiru
    Fung, Ivan
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 108 - 114
  • [39] Multimodal Integration of Mel Spectrograms and Text Transcripts for Enhanced Automatic Speech Recognition: Leveraging Extractive Transformer-Based Approaches and Late Fusion Strategies
    Mehra, Sunakshi
    Ranga, Virender
    Agarwal, Ritu
    COMPUTATIONAL INTELLIGENCE, 2024, 40 (06)
  • [40] Intra-ensemble: A New Method for Combining Intermediate Outputs in Transformer-based Automatic Speech Recognition
    Kim, DoHee
    Choi, Jieun
    Chang, Joon-Hyuk
    INTERSPEECH 2023, 2023, : 2203 - 2207