Vietnamese Automatic Speech Recognition for Financial Conversation Data

Cited by: 0
Authors
Doan, Tung Tran Nguyen [1, 2, 3]
Huynh, Son Thanh [1, 2, 3]
Nguyen, An Trong [1, 2, 3]
Le, An Tran-Hoai [1, 2, 3]
Thuy, An Phan Thi [1, 2, 3]
Huynh, Dang T. [3, 4]
Nguyen, Binh T. [1, 2, 3]
Affiliations
[1] University of Science, Ho Chi Minh City, Vietnam
[2] Vietnam National University, Ho Chi Minh City, Vietnam
[3] AISIA Research Lab, Ho Chi Minh City, Vietnam
[4] Fulbright University Vietnam, Ho Chi Minh City, Vietnam
Keywords
Vietnamese Automatic Speech Recognition; Low-resource; Transformers; Practical application; Neural networks
DOI
10.1007/978-981-97-4985-0_29
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The need for accurate speech recognition systems has grown in recent years with the demand for speech-based interfaces in applications such as mobile devices and smart speakers. However, existing Vietnamese speech recognition solutions remain limited in accuracy and practicality. To address these limitations, we propose a novel framework for Vietnamese automatic speech recognition that builds on Whisper, a transformer-based model, and on a dataset we collected ourselves. Although Whisper achieves state-of-the-art performance on languages with abundant training data, its performance on other languages, such as Vietnamese, leaves much to be desired. We therefore collected a Vietnamese dataset and used it to fine-tune Whisper before incorporating the model into our framework. The collection process is not tied to any particular domain, but the current dataset covers finance because our target applications are in that domain. Implementing and evaluating the framework demonstrates the feasibility of using Whisper for Vietnamese speech recognition, with improved accuracy compared to existing solutions. Our findings also point to room for further improvement. Finally, we deployed the framework as a Streamlit app, demonstrating its potential for practical, real-world use and contributing to the advancement of speech recognition technology.
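This record reports improved accuracy without naming the metric. Word error rate (WER), the word-level edit distance between a reference transcript and the model output divided by the number of reference words, is the customary ASR metric, so the following is a minimal sketch of how such a score could be computed. It is not the authors' evaluation code. It assumes whitespace tokenization, which for Vietnamese yields syllables rather than multi-syllable words, and it applies NFC normalization and lowercasing (both our assumptions) so that differently encoded diacritics are not counted as errors. The function name `word_error_rate` is hypothetical.

```python
import unicodedata


def _tokens(text: str) -> list[str]:
    # Assumption: NFC-normalize and lowercase, then split on whitespace.
    # For Vietnamese this yields syllables, not multi-syllable words.
    return unicodedata.normalize("NFC", text).lower().split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: (substitutions + deletions + insertions) / len(reference)."""
    ref = _tokens(reference)
    hyp = _tokens(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("lãi suất ngân hàng tăng", "lãi xuất ngân hàng tăng")` returns 0.2: one substituted syllable out of five. Because the score is normalized by the reference length, it can exceed 1.0 when the hypothesis contains many insertions.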
Pages: 372-383
Page count: 12