A review on speech recognition approaches and challenges for Portuguese: exploring the feasibility of fine-tuning large-scale end-to-end models

Cited: 0
Authors
Li, Yan [1 ]
Wang, Yapeng [1 ]
Hoi, Lap Man [1 ]
Yang, Dingcheng [3 ]
Im, Sio-Kei [2 ]
Affiliations
[1] Macao Polytech Univ, Fac Appl Sci, Macau, Peoples R China
[2] Macao Polytech Univ, Macau, Peoples R China
[3] Nanchang Univ, Sch Informat Engn, Nanchang, Peoples R China
Keywords
Portuguese speech recognition; Review; End-to-end models;
DOI
10.1186/s13636-024-00388-w
CLC number
O42 [Acoustics];
Discipline codes
070206; 082403;
Abstract
At present, automatic speech recognition (ASR) has become an important bridge for human-computer interaction and is widely applied across many fields. The Portuguese speech recognition task is gradually receiving attention due to the language's unique status. However, relatively scarce data resources have constrained the development and application of Portuguese speech recognition systems, and the neglect of accent issues further hinders the adoption of such systems. This study focuses on the research progress of end-to-end techniques for Portuguese speech recognition. It discusses relevant work in two directions, Brazilian Portuguese recognition and European Portuguese recognition, and organizes the available corpus resources for prospective researchers. Then, taking European Portuguese speech recognition as an example, it uses Fairseq-S2T and Whisper as benchmarks, evaluated on a 500-h European Portuguese dataset, to assess the performance of large-scale pre-trained models and fine-tuning techniques. Whisper obtained a word error rate (WER) of 5.11%, which indicates that multilingual joint training can enhance generalization ability. Finally, in view of the remaining problems in Portuguese speech recognition, the study explores future research directions, providing new ideas for the next stage of research and system construction.
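As background for the WER figure reported in the abstract, the metric is the word-level edit distance between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. A minimal sketch follows; the function name and the Portuguese example strings are illustrative, not taken from the paper:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

# One substituted word out of three -> WER of 1/3
print(wer("o gato preto", "o gato branco"))
```

In practice, libraries such as jiwer apply the same dynamic-programming computation, usually after text normalization (lowercasing, punctuation removal), which can noticeably change the reported score.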
Pages: 13