Vietnamese Automatic Speech Recognition for Financial Conversation Data

Cited by: 0
Authors
Doan, Tung Tran Nguyen [1, 2, 3]
Huynh, Son Thanh [1, 2, 3]
Nguyen, An Trong [1, 2, 3]
Le, An Tran-Hoai [1, 2, 3]
Thuy, An Phan Thi [1, 2, 3]
Huynh, Dang T. [3, 4]
Nguyen, Binh T. [1, 2, 3]
Affiliations
[1] Univ Sci, Ho Chi Minh City, Vietnam
[2] Vietnam Natl Univ, Ho Chi Minh City, Vietnam
[3] AISIA Res Lab, Ho Chi Minh City, Vietnam
[4] Fulbright Univ Vietnam, Ho Chi Minh City, Vietnam
Keywords
Vietnamese Automatic Speech Recognition; Low-resource; Transformers; Practical application; Neural networks
DOI
10.1007/978-981-97-4985-0_29
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Demand for accurate speech recognition has grown in recent years with the spread of speech-based interfaces in applications such as mobile devices and smart speakers. However, current solutions for Vietnamese speech recognition are limited in accuracy and practicality. To address these limitations, we propose a novel framework for Vietnamese automatic speech recognition that combines Whisper, a transformer-based model, with a dataset we collected ourselves. Although Whisper achieves state-of-the-art performance on languages with large training datasets, it leaves much to be desired for others, such as Vietnamese. We therefore collected a Vietnamese dataset to fine-tune Whisper before incorporating it into our framework. The dataset need not be domain-specific, but our current dataset covers finance because our applications are in that domain. By implementing and evaluating the framework, we demonstrate the feasibility of using Whisper for Vietnamese speech recognition, with improved accuracy compared to existing solutions. The framework was also deployed as a Streamlit app, which highlights its potential for practical use in real-world settings and leaves room for further improvement.
Pages: 372-383
Page count: 12