Transformer-Based Joint Learning Approach for Text Normalization in Vietnamese Automatic Speech Recognition Systems

Cited by: 0
Authors
Viet The Bui [1 ]
Tho Chi Luong [2 ]
Oanh Thi Tran [3 ]
Affiliations
[1] Singapore Management Univ, Sch Comp & Informat Syst, Singapore, Singapore
[2] FPT Univ, FPT Technol Res Inst, Hanoi, Vietnam
[3] Vietnam Natl Univ Hanoi, Int Sch, Hanoi, Vietnam
Keywords
ASR; named entity recognition; post-processing; punctuator; text normalization; transformer-based joint learning models;
DOI
10.1080/01969722.2022.2145654
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
In this article, we investigate the task of normalizing transcribed texts in Vietnamese Automatic Speech Recognition (ASR) systems in order to improve readability and the performance of downstream tasks. This task usually consists of two main sub-tasks: predicting and inserting punctuation (e.g., periods, commas); and detecting named entities (e.g., numbers, person names) and standardizing them from spoken forms to their appropriate written forms. To achieve these goals, we introduce a complete corpus of 87,700 sentences and investigate conditional joint learning approaches that globally optimize the two sub-tasks simultaneously. The experimental results are quite promising. Overall, the proposed architecture outperforms the conventional architecture, which trains individual models on the two sub-tasks separately. The joint models are further improved when integrated with surrounding contexts (SCs). Specifically, the best model obtained F1 scores of 81.13% on the first sub-task and 94.41% on the second sub-task.
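The abstract does not give the model details, but the joint setup it describes can be illustrated with a minimal sketch: a shared Transformer encoder feeding two token-level classification heads, one for punctuation restoration and one for named-entity tagging, trained by summing the two losses. The PyTorch layers, label sets, and dimensions below are illustrative assumptions, not the authors' implementation.

# Minimal sketch (assumed architecture, not the paper's code): shared encoder,
# two token-level heads, jointly optimized with a summed cross-entropy loss.
import torch
import torch.nn as nn

class JointNormalizer(nn.Module):
    def __init__(self, vocab_size=30000, d_model=256, n_heads=4, n_layers=4,
                 n_punct_labels=3,    # assumed labels: O, PERIOD, COMMA
                 n_entity_labels=5):  # assumed BIO tags for NUMBER / PERSON
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.punct_head = nn.Linear(d_model, n_punct_labels)    # punctuation per token
        self.entity_head = nn.Linear(d_model, n_entity_labels)  # entity tag per token

    def forward(self, token_ids):
        hidden = self.encoder(self.embed(token_ids))
        return self.punct_head(hidden), self.entity_head(hidden)

model = JointNormalizer()
loss_fn = nn.CrossEntropyLoss()
tokens = torch.randint(0, 30000, (8, 64))       # dummy batch: 8 sentences, 64 tokens
punct_gold = torch.randint(0, 3, (8, 64))       # dummy punctuation labels
entity_gold = torch.randint(0, 5, (8, 64))      # dummy entity labels
punct_logits, entity_logits = model(tokens)
# Joint objective: both sub-tasks are optimized simultaneously.
loss = (loss_fn(punct_logits.reshape(-1, 3), punct_gold.reshape(-1))
        + loss_fn(entity_logits.reshape(-1, 5), entity_gold.reshape(-1)))
loss.backward()

In practice the shared encoder would likely be a pretrained Vietnamese language model and the surrounding-context (SC) feature mentioned in the abstract would widen the input window, but those details are not specified here.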
Pages: 1614-1630
Number of pages: 17
Related Papers
50 records in total
  • [41] A Complementary Joint Training Approach Using Unpaired Speech and Text for Low-Resource Automatic Speech Recognition
    Du, Yeqian
    Zhang, Jie
    Zhu, Qiu-shi
    Dai, Lirong
    Wu, MingHui
    Fang, Xin
    Yang, ZhouWang
    INTERSPEECH 2022, 2022, : 2613 - 2617
  • [42] Transformer-based approach for symptom recognition and multilingual linking
    Vassileva, Sylvia
    Grazhdanski, Georgi
    Koychev, Ivan
    Boytcheva, Svetla
    DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION, 2024, 2024
  • [43] A Transformer-Based Approach for Better Hand Gesture Recognition
    Besrour, Sinda
    Surapaneni, Yogesh
    Mubibya, Gael S.
    Ashkar, Fahim
    Almhana, Jalal
    20TH INTERNATIONAL WIRELESS COMMUNICATIONS & MOBILE COMPUTING CONFERENCE, IWCMC 2024, 2024, : 1135 - 1140
  • [44] A window attention based Transformer for Automatic Speech Recognition
    Feng, Zhao
    Li, Yongming
    2024 5TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND APPLICATION, ICCEA 2024, 2024, : 449 - 454
  • [45] Simulating reading mistakes for child speech Transformer-based phone recognition
    Gelin, Lucile
    Pellegrini, Thomas
    Pinquier, Julien
    Daniel, Morgane
    INTERSPEECH 2021, 2021, : 3860 - 3864
  • [46] Transfer Learning for Automatic Speech Recognition Systems
    Asefisaray, Behnam
    Haznedaroglu, Ali
    Erden, Mustafa
    Arslan, Levent M.
    2018 26TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2018,
  • [47] End to end transformer-based contextual speech recognition based on pointer network
    Lin, Binghuai
    Wang, Liyuan
    INTERSPEECH 2021, 2021, : 2087 - 2091
  • [48] Fast offline transformer-based end-to-end automatic speech recognition for real-world applications
    Oh, Yoo Rhee
    Park, Kiyoung
    Park, Jeon Gue
    ETRI JOURNAL, 2022, 44 (03) : 476 - 490
  • [49] Vietnamese Voice2Text: A Web Application for Whisper Implementation in Vietnamese Automatic Speech Recognition Tasks: Vietnamese Voice2Text
    Nguyen, Quangphuoc
    Nguyen, Ngocminh
    Dang, Thanhluan
    Tran, Vanha
    ACM International Conference Proceeding Series, 2023, : 312 - 318
  • [50] STREAMING, FAST AND ACCURATE ON-DEVICE INVERSE TEXT NORMALIZATION FOR AUTOMATIC SPEECH RECOGNITION
    Gaur, Yashesh
    Kibre, Nick
    Xue, Jian
    Shu, Kangyuan
    Wang, Yuhui
    Alphanso, Issac
    Li, Jinyu
    Gong, Yifan
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 237 - 244