Vietnamese Automatic Speech Recognition for Financial Conversation Data

Times cited: 0
Authors
Doan, Tung Tran Nguyen [1, 2, 3]
Huynh, Son Thanh [1, 2, 3]
Nguyen, An Trong [1, 2, 3]
Le, An Tran-Hoai [1, 2, 3]
Thuy, An Phan Thi [1, 2, 3]
Huynh, Dang T. [3, 4]
Nguyen, Binh T. [1, 2, 3]
Affiliations
[1] University of Science, Ho Chi Minh City, Vietnam
[2] Vietnam National University, Ho Chi Minh City, Vietnam
[3] AISIA Research Lab, Ho Chi Minh City, Vietnam
[4] Fulbright University Vietnam, Ho Chi Minh City, Vietnam
Keywords
Vietnamese Automatic Speech Recognition; Low-resource; Transformers; Practical application; Neural networks
DOI
10.1007/978-981-97-4985-0_29
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The need for accurate speech recognition systems has grown in recent years with the demand for speech-based interfaces in applications such as mobile devices and smart speakers. However, current speech recognition solutions for Vietnamese remain limited in accuracy and practicality. To address these limitations, we propose a novel framework for Vietnamese automatic speech recognition that builds on Whisper, a transformer-based model, together with a dataset we collected ourselves. Although Whisper achieves state-of-the-art performance on languages with large training datasets, its performance on other languages, such as Vietnamese, still leaves much to be desired. We therefore collected a Vietnamese dataset for fine-tuning Whisper before incorporating it into our framework. The collection process is not tied to any particular domain, but our current dataset covers finance, the domain of the applications we are developing. Implementing and evaluating the framework demonstrated the feasibility of using Whisper for Vietnamese speech recognition, with improved accuracy compared to existing solutions. Our findings also point to room for further improvement. Finally, the framework was deployed as a Streamlit app, demonstrating its potential for practical use in real-world settings and contributing to the advancement of speech recognition technology.
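The abstract reports improved accuracy without naming the metric in this record. Word error rate (WER) is the standard accuracy measure for ASR, so the sketch below assumes it; it is an illustration, not the authors' evaluation code. Because Vietnamese orthography separates syllables with spaces, whitespace tokenization makes this effectively a syllable-level error rate. The function names `normalize` and `word_error_rate`, and the normalization choices (Unicode NFC plus lowercasing), are hypothetical assumptions; real evaluations may also strip punctuation or normalize numbers.

```python
import unicodedata


def normalize(text: str) -> list[str]:
    """Hypothetical normalization: apply NFC so precomposed and decomposed
    diacritics compare equal, lowercase, and split on whitespace
    (one token per Vietnamese syllable)."""
    return unicodedata.normalize("NFC", text).lower().split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference tokens,
    computed with a word-level Levenshtein distance."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript must contain at least one token")
    # prev[j] holds the edit distance between the first i-1 reference tokens
    # and the first j hypothesis tokens; one DP row is kept at a time.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # substitute (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "lãi suất ngân hàng tăng"
    hyp = "lãi xuất ngân hàng tăng"
    print(word_error_rate(ref, hyp))  # 0.2
```

In the usage example, a single substituted syllable (*suất* recognized as *xuất*) in a five-syllable finance phrase gives a WER of 1/5 = 0.2. The NFC step matters for Vietnamese because the same diacritic-bearing syllable can be encoded in composed or decomposed form, which would otherwise count as a spurious substitution.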
Pages: 372-383
Page count: 12