Transformer-Based Joint Learning Approach for Text Normalization in Vietnamese Automatic Speech Recognition Systems

Cited by: 0
Authors
Viet The Bui [1]
Tho Chi Luong [2]
Oanh Thi Tran [3]
Affiliations
[1] Singapore Management Univ, Sch Comp & Informat Syst, Singapore, Singapore
[2] FPT Univ, FPT Technol Res Inst, Hanoi, Vietnam
[3] Vietnam Natl Univ Hanoi, Int Sch, Hanoi, Vietnam
Keywords
ASR; named entity recognition; post-processing; punctuator; text normalization; transformer-based joint learning models
DOI
10.1080/01969722.2022.2145654
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this article, we investigate the task of normalizing transcribed texts in Vietnamese Automatic Speech Recognition (ASR) systems in order to improve readability for users and the performance of downstream tasks. This task usually consists of two main sub-tasks: predicting and inserting punctuation (i.e., period, comma); and detecting named entities (i.e., numbers, person names) and converting them from their spoken forms to the appropriate written forms. To achieve these goals, we introduce a complete corpus comprising 87,700 sentences and investigate conditional joint learning approaches that globally optimize the two sub-tasks simultaneously. In our experiments, the proposed architecture outperformed the conventional architecture, which trains individual models on the two sub-tasks separately. The joint models are further improved when integrated with the surrounding contexts (SCs). Specifically, the best model obtained F1 scores of 81.13% on the first sub-task and 94.41% on the second sub-task.
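To make the two sub-tasks concrete, the following is a minimal sketch of how per-token punctuation labels and named-entity rewrites could be applied to a spoken-form transcript. It is not the authors' model: the label names (`O`, `COMMA`, `PERIOD`), the `apply_labels` helper, and the English example sentence are all assumptions made for illustration, and in a real system the labels and spans would come from the trained joint model.

```python
from typing import List, Tuple

# Hypothetical punctuation label inventory, chosen only for this illustration.
PUNCT_MARKS = {"O": "", "COMMA": ",", "PERIOD": "."}


def apply_labels(
    tokens: List[str],
    punct_labels: List[str],
    entity_spans: List[Tuple[int, int, str]],
) -> str:
    """Rebuild written-form text from spoken-form tokens.

    punct_labels[i] names the punctuation mark that follows token i.
    entity_spans holds non-overlapping (start, end, written_form) triples,
    with `end` exclusive; each span is replaced by its written form.
    """
    starts = {start: (end, written) for start, end, written in entity_spans}
    out = []
    i = 0
    while i < len(tokens):
        if i in starts:
            # Replace the whole spoken-form span with its written form.
            end, word = starts[i]
            last = end - 1
            i = end
        else:
            word = tokens[i]
            last = i
            i += 1
        # Attach the punctuation predicted after the last token consumed.
        out.append(word + PUNCT_MARKS[punct_labels[last]])
    return " ".join(out)


# Toy English input standing in for a lowercase, unpunctuated ASR transcript.
tokens = ["hello", "john", "smith", "you", "owe", "twenty", "five", "dollars"]
punct_labels = ["COMMA", "O", "COMMA", "O", "O", "O", "O", "PERIOD"]
entity_spans = [(1, 3, "John Smith"), (5, 8, "$25")]

normalized = apply_labels(tokens, punct_labels, entity_spans)
print(normalized)
```

In this sketch the two kinds of prediction are simply supplied by hand; the abstract's point is that a single jointly trained model produces both, rather than two separately trained models.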
Pages: 1614-1630
Page count: 17