Transformer-Based Joint Learning Approach for Text Normalization in Vietnamese Automatic Speech Recognition Systems

Cited by: 0
Authors
Viet The Bui [1 ]
Tho Chi Luong [2 ]
Oanh Thi Tran [3 ]
Affiliations
[1] Singapore Management Univ, Sch Comp & Informat Syst, Singapore, Singapore
[2] FPT Univ, FPT Technol Res Inst, Hanoi, Vietnam
[3] Vietnam Natl Univ Hanoi, Int Sch, Hanoi, Vietnam
Keywords
ASR; named entity recognition; post-processing; punctuator; text normalization; transformer-based joint learning models;
DOI
10.1080/01969722.2022.2145654
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
In this article, we investigate the task of normalizing transcribed texts in Vietnamese Automatic Speech Recognition (ASR) systems in order to improve readability and the performance of downstream tasks. This task usually consists of two main sub-tasks: predicting and inserting punctuation (e.g., periods, commas); and detecting named entities (e.g., numbers, person names) and standardizing them from spoken forms to their appropriate written forms. To achieve these goals, we introduce a complete corpus of 87,700 sentences and investigate conditional joint learning approaches that globally optimize the two sub-tasks simultaneously. The experimental results are quite promising. Overall, the proposed architecture outperforms the conventional architecture, which trains individual models on the two sub-tasks separately. The joint models are further improved when integrated with surrounding contexts (SCs). Specifically, the best model obtained F1 scores of 81.13% on the first sub-task and 94.41% on the second sub-task.
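The abstract does not give the model details, but the joint setup it describes can be illustrated with a minimal sketch: a shared Transformer encoder feeding two token-level classification heads, one for punctuation restoration and one for named-entity tagging, trained by summing the two losses. The PyTorch layers, label sets, and dimensions below are illustrative assumptions, not the authors' implementation.

# Minimal sketch (assumed architecture, not the paper's code): shared encoder,
# two token-level heads, jointly optimized with a summed cross-entropy loss.
import torch
import torch.nn as nn

class JointNormalizer(nn.Module):
    def __init__(self, vocab_size=30000, d_model=256, n_heads=4, n_layers=4,
                 n_punct_labels=3,    # assumed labels: O, PERIOD, COMMA
                 n_entity_labels=5):  # assumed BIO tags for NUMBER / PERSON
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.punct_head = nn.Linear(d_model, n_punct_labels)    # punctuation per token
        self.entity_head = nn.Linear(d_model, n_entity_labels)  # entity tag per token

    def forward(self, token_ids):
        hidden = self.encoder(self.embed(token_ids))
        return self.punct_head(hidden), self.entity_head(hidden)

model = JointNormalizer()
loss_fn = nn.CrossEntropyLoss()
tokens = torch.randint(0, 30000, (8, 64))       # dummy batch: 8 sentences, 64 tokens
punct_gold = torch.randint(0, 3, (8, 64))       # dummy punctuation labels
entity_gold = torch.randint(0, 5, (8, 64))      # dummy entity labels
punct_logits, entity_logits = model(tokens)
# Joint objective: both sub-tasks are optimized simultaneously.
loss = (loss_fn(punct_logits.reshape(-1, 3), punct_gold.reshape(-1))
        + loss_fn(entity_logits.reshape(-1, 5), entity_gold.reshape(-1)))
loss.backward()

In practice the shared encoder would likely be a pretrained Vietnamese language model and the surrounding-context (SC) feature mentioned in the abstract would widen the input window, but those details are not specified here.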
Pages: 1614-1630
Number of pages: 17
Related Papers
50 records in total
  • [41] A Complementary Joint Training Approach Using Unpaired Speech and Text for Low-Resource Automatic Speech Recognition
    Du, Yeqian
    Zhang, Jie
    Zhu, Qiu-shi
    Dai, Lirong
    Wu, MingHui
    Fang, Xin
    Yang, ZhouWang
    INTERSPEECH 2022, 2022, : 2613 - 2617
  • [42] Transformer-based approach for symptom recognition and multilingual linking
    Vassileva, Sylvia
    Grazhdanski, Georgi
    Koychev, Ivan
    Boytcheva, Svetla
    DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION, 2024, 2024
  • [43] A Transformer-Based Approach for Better Hand Gesture Recognition
    Besrour, Sinda
    Surapaneni, Yogesh
    Mubibya, Gael S.
    Ashkar, Fahim
    Almhana, Jalal
    20TH INTERNATIONAL WIRELESS COMMUNICATIONS & MOBILE COMPUTING CONFERENCE, IWCMC 2024, 2024, : 1135 - 1140
  • [44] A window attention based Transformer for Automatic Speech Recognition
    Feng, Zhao
    Li, Yongming
    2024 5TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND APPLICATION, ICCEA 2024, 2024, : 449 - 454
  • [45] Simulating reading mistakes for child speech Transformer-based phone recognition
    Gelin, Lucile
    Pellegrini, Thomas
    Pinquier, Julien
    Daniel, Morgane
    INTERSPEECH 2021, 2021, : 3860 - 3864
  • [46] Transfer Learning for Automatic Speech Recognition Systems
    Asefisaray, Behnam
    Haznedaroglu, Ali
    Erden, Mustafa
    Arslan, Levent M.
    2018 26TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2018,
  • [47] End to end transformer-based contextual speech recognition based on pointer network
    Lin, Binghuai
    Wang, Liyuan
    INTERSPEECH 2021, 2021, : 2087 - 2091
  • [48] Fast offline transformer-based end-to-end automatic speech recognition for real-world applications
    Oh, Yoo Rhee
    Park, Kiyoung
    Park, Jeon Gue
    ETRI JOURNAL, 2022, 44 (03) : 476 - 490
  • [49] Vietnamese Voice2Text: A Web Application for Whisper Implementation in Vietnamese Automatic Speech Recognition Tasks: Vietnamese Voice2Text
    Nguyen, Quangphuoc
    Nguyen, Ngocminh
    Dang, Thanhluan
    Tran, Vanha
    ACM International Conference Proceeding Series, 2023, : 312 - 318
  • [50] STREAMING, FAST AND ACCURATE ON-DEVICE INVERSE TEXT NORMALIZATION FOR AUTOMATIC SPEECH RECOGNITION
    Gaur, Yashesh
    Kibre, Nick
    Xue, Jian
    Shu, Kangyuan
    Wang, Yuhui
    Alphanso, Issac
    Li, Jinyu
    Gong, Yifan
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 237 - 244