Three-stage training and orthogonality regularization for spoken language recognition

Cited by: 1
Authors
Li, Zimu [1, 2]
Xu, Yanyan [1, 2]
Ke, Dengfeng [3]
Su, Kaile [4]
Affiliations
[1] Beijing Forestry Univ, Sch Informat Sci & Technol, 35 Qing Hua East Rd, Beijing 100083, Peoples R China
[2] Beijing Forestry Univ, Engn Res Ctr Forestry Oriented Intelligent Informa, Grassland Adm, 35 Qing Hua East Rd, Beijing 100083, Peoples R China
[3] Beijing Language & Culture Univ, Sch Informat Sci, 15 Xueyuan Rd, Beijing 100083, Peoples R China
[4] Griffith Univ, Inst Integrated & Intelligent Syst, Nathan, Qld 4111, Australia
Keywords
Spoken language recognition; Automatic speech recognition; Three-stage training; Orthogonality regularization; Multi-task learning; Identification; Speech; Features
DOI
10.1186/s13636-023-00281-y
CLC number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Spoken language recognition has made significant progress in recent years, with automatic speech recognition (ASR) used as a parallel branch to extract phonetic features. However, an effective training strategy for such two-branch architectures is still lacking. In this paper, we analyze the most commonly used two-stage training strategies and reveal a trade-off between recognition accuracy and generalization ability. Based on this analysis, we propose a three-stage training strategy and an orthogonality regularization method. The former adds a multi-task learning stage to the traditional two-stage training strategy to extract hybrid-level, noiseless features, which improves recognition accuracy while maintaining generalization ability; the latter constrains the base vectors to be orthogonal and introduces prior knowledge to improve recognition accuracy. Experiments on the Oriental Language Recognition (OLR) dataset show that both proposed methods improve language recognition accuracy and generalization ability, especially in challenging tasks such as cross-channel or noisy conditions. Moreover, our model combining the two methods outperforms the top three teams in the OLR20 challenge.
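The abstract does not state the exact form of the orthogonality constraint. Purely as an illustration, a widely used way to push a set of base vectors toward orthogonality is the soft penalty ||W W^T - I||_F^2, where each row of W is one base vector, added to the task loss with a weight lambda. The sketch below computes that penalty in plain Python; the helper names `orthogonality_penalty` and `regularized_loss`, the row-vector layout, and the weight `lam` are assumptions for illustration, not the paper's formulation.

```python
from typing import List

# A matrix represented as a list of rows (each row is one base vector).
Matrix = List[List[float]]


def orthogonality_penalty(w: Matrix) -> float:
    """Soft orthogonality penalty ||W W^T - I||_F^2 (hypothetical helper).

    Each row of ``w`` is treated as one base vector. The penalty is zero
    exactly when the rows are mutually orthogonal and have unit norm.
    """
    k = len(w)
    total = 0.0
    for i in range(k):
        for j in range(k):
            # Entry (i, j) of the Gram matrix W W^T is the dot product <w_i, w_j>.
            dot = sum(a * b for a, b in zip(w[i], w[j]))
            # Compare against the identity matrix: 1 on the diagonal, 0 elsewhere.
            target = 1.0 if i == j else 0.0
            total += (dot - target) ** 2
    return total


def regularized_loss(task_loss: float, w: Matrix, lam: float = 0.01) -> float:
    """Task loss plus the orthogonality term weighted by ``lam`` (assumed value)."""
    return task_loss + lam * orthogonality_penalty(w)


if __name__ == "__main__":
    orthonormal = [[1.0, 0.0], [0.0, 1.0]]
    skewed = [[1.0, 0.0], [1.0, 1.0]]
    print(orthogonality_penalty(orthonormal))  # 0.0
    print(orthogonality_penalty(skewed))  # 3.0
```

In an actual model the penalty would be computed with the training framework's tensor operations so that gradients flow through it; the explicit loops here only make the arithmetic visible.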
Pages: 14
Related papers
  • [1] Three-stage training and orthogonality regularization for spoken language recognition
    Li, Zimu
    Xu, Yanyan
    Ke, Dengfeng
    Su, Kaile
    EURASIP JOURNAL ON AUDIO, SPEECH, AND MUSIC PROCESSING, 2023
  • [2] A three-stage approach to the automated scoring of spontaneous spoken responses
    Higgins, Derrick
    Xi, Xiaoming
    Zechner, Klaus
    Williamson, David
    COMPUTER SPEECH AND LANGUAGE, 2011, 25 (02): 282 - 306
  • [3] An Improved Three-Stage Classifier for Activity Recognition
    Garcia-Ceja, Enrique
    Brena, Ramon F.
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2018, 32 (01)
  • [4] Optimizing the Performance of Spoken Language Recognition With Discriminative Training
    Zhu, Donglai
    Li, Haizhou
    Ma, Bin
    Lee, Chin-Hui
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2008, 16 (08): 1642 - 1653
  • [5] An efficient three-stage classifier for handwritten digit recognition
    Gorgevik, D
    Cakmakov, D
    PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 4, 2004: 507 - 510
  • [6] Detection of glaucoma using three-stage training with EfficientNet
    de Zarza, I.
    de Curto, J.
    Calafate, Carlos T.
    INTELLIGENT SYSTEMS WITH APPLICATIONS, 2022, 16
  • [7] Three-Stage Framework for Unsupervised Acoustic Modeling Using Untranscribed Spoken Content
    Zgank, Andrej
    ETRI JOURNAL, 2010, 32 (05): 810 - 818
  • [8] Three-stage transfer learning for motor imagery EEG recognition
    Li, Junhao
    She, Qingshan
    Meng, Ming
    Du, Shengzhi
    Zhang, Yingchun
    MEDICAL & BIOLOGICAL ENGINEERING & COMPUTING, 2024, 62 (06): 1689 - 1701
  • [9] Spoken word recognition: A stage-processing approach to language differences
    Kolinsky, R
    EUROPEAN JOURNAL OF COGNITIVE PSYCHOLOGY, 1998, 10 (01): 1 - 40