Three-stage training and orthogonality regularization for spoken language recognition

Cited: 1
Authors
Li, Zimu [1 ,2 ]
Xu, Yanyan [1 ,2 ]
Ke, Dengfeng [3 ]
Su, Kaile [4 ]
Affiliations
[1] Beijing Forestry Univ, Sch Informat Sci & Technol, 35 Qing Hua East Rd, Beijing 100083, Peoples R China
[2] Beijing Forestry Univ, Engn Res Ctr Forestry Oriented Intelligent Informa, Grassland Adm, 35 Qing Hua East Rd, Beijing 100083, Peoples R China
[3] Beijing Language & Culture Univ, Sch Informat Sci, 15 Xueyuan Rd, Beijing 100083, Peoples R China
[4] Griffith Univ, Inst Integrated & Intelligent Syst, Nathan, Qld 4111, Australia
Keywords
Spoken language recognition; Automatic speech recognition; Three-stage training; Orthogonality regularization; Multi-task learning; Identification; Speech; Features
DOI
10.1186/s13636-023-00281-y
Chinese Library Classification
O42 [Acoustics]
Discipline Classification Codes
070206; 082403
Abstract
Spoken language recognition has made significant progress in recent years, with automatic speech recognition commonly used as a parallel branch to extract phonetic features. However, a good training strategy for such two-branch architectures is still lacking. In this paper, we analyze the most commonly used two-stage training strategies and reveal a trade-off between recognition accuracy and generalization ability. Based on this analysis, we propose a three-stage training strategy and an orthogonality regularization method. The former adds a multi-task learning stage to the traditional two-stage training strategy to extract hybrid-level, noise-free features, improving recognition accuracy while maintaining generalization ability; the latter constrains the orthogonality of base vectors and introduces prior knowledge to improve recognition accuracy. Experiments on the Oriental Language Recognition (OLR) dataset indicate that both proposed methods improve language recognition accuracy and generalization ability, especially in complex challenge tasks such as cross-channel or noisy conditions. Moreover, our model, which combines the two proposed methods, outperforms the top three teams in the OLR 2020 challenge.
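The abstract does not spell out the exact form of the orthogonality regularizer, but a common formulation of such a constraint penalizes the Frobenius-norm distance between the Gram matrix of the base vectors and the identity, driving the rows of a weight matrix toward orthonormality. The sketch below (function name and setup are illustrative assumptions, not the paper's implementation) shows this standard penalty in NumPy:

```python
import numpy as np

def orthogonality_penalty(W):
    """Soft orthogonality penalty ||W W^T - I||_F^2.

    W is an (n, d) matrix whose n rows are the base vectors.
    The penalty is zero exactly when the rows are orthonormal,
    and grows as they become correlated or badly scaled.
    """
    gram = W @ W.T                      # (n, n) Gram matrix of the rows
    eye = np.eye(W.shape[0])
    return float(np.sum((gram - eye) ** 2))

# Orthonormal rows incur no penalty:
print(orthogonality_penalty(np.eye(3)))        # → 0.0

# Correlated, unnormalized rows are penalized:
W = np.array([[1.0, 1.0],
              [0.0, 1.0]])
print(orthogonality_penalty(W))
```

In training, a term like this would typically be added to the task loss with a small weight, so the network trades a little task accuracy for less redundant base vectors.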
Pages: 14