Three-stage training and orthogonality regularization for spoken language recognition

Cited by: 1
Authors
Li, Zimu [1, 2]
Xu, Yanyan [1, 2]
Ke, Dengfeng [3]
Su, Kaile [4]
Affiliations
[1] Beijing Forestry Univ, Sch Informat Sci & Technol, 35 Qing Hua East Rd, Beijing 100083, Peoples R China
[2] Beijing Forestry Univ, Engn Res Ctr Forestry Oriented Intelligent Informa, Grassland Adm, 35 Qing Hua East Rd, Beijing 100083, Peoples R China
[3] Beijing Language & Culture Univ, Sch Informat Sci, 15 Xueyuan Rd, Beijing 100083, Peoples R China
[4] Griffith Univ, Inst Integrated & Intelligent Syst, Nathan, Qld 4111, Australia
Keywords
Spoken language recognition; Automatic speech recognition; Three-stage training; Orthogonality regularization; Multi-task learning; IDENTIFICATION; SPEECH; FEATURES
DOI
10.1186/s13636-023-00281-y
CLC number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Spoken language recognition has made significant progress in recent years, with automatic speech recognition used as a parallel branch to extract phonetic features. However, an effective training strategy for such two-branch architectures is still lacking. In this paper, we analyze the most commonly used two-stage training strategies and reveal a trade-off between recognition accuracy and generalization ability. Based on this analysis, we propose a three-stage training strategy and an orthogonality regularization method. The former adds a multi-task learning stage to the traditional two-stage training strategy to extract hybrid-level, noiseless features, improving recognition accuracy while maintaining generalization ability; the latter constrains the base vectors to be orthogonal and introduces prior knowledge to improve recognition accuracy. Experiments on the Oriental Language Recognition (OLR) dataset show that both proposed methods improve language recognition accuracy and generalization ability, especially on challenging tasks such as cross-channel and noisy conditions. Moreover, our model, which combines the two methods, outperforms the top three teams in the OLR20 challenge.
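The abstract does not give the exact form of the orthogonality regularizer, so the following is only a minimal sketch of one common, generic way to encourage orthogonal base vectors: a soft penalty ||W W^T - I||_F^2 added to the task loss, where each row of a matrix W is one base vector. The helper names (`gram`, `orthogonality_penalty`, `total_loss`) and the weight `lam` are hypothetical; the paper's prior-knowledge component is not modeled, and plain Python lists stand in for a tensor library to keep the sketch dependency-free.

```python
from typing import List

# Sketch only: a generic soft orthogonality penalty, not necessarily the
# formulation used in the paper. All names here are hypothetical.
Matrix = List[List[float]]


def gram(w: Matrix) -> Matrix:
    """Return W @ W^T: the pairwise dot products between the rows of W."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in w] for r1 in w]


def orthogonality_penalty(w: Matrix) -> float:
    """Squared Frobenius norm ||W W^T - I||_F^2.

    It is zero exactly when the rows of W are orthonormal, and it grows as
    rows become correlated (off-diagonal terms) or drift from unit length
    (diagonal terms).
    """
    g = gram(w)
    n = len(w)
    return sum(
        (g[i][j] - (1.0 if i == j else 0.0)) ** 2
        for i in range(n)
        for j in range(n)
    )


def total_loss(task_loss: float, w: Matrix, lam: float = 0.1) -> float:
    """Task loss plus the orthogonality term; lam is a hypothetical weight."""
    return task_loss + lam * orthogonality_penalty(w)


if __name__ == "__main__":
    print(orthogonality_penalty([[1.0, 0.0], [0.0, 1.0]]))  # 0.0 (orthonormal)
    print(orthogonality_penalty([[1.0, 0.0], [1.0, 0.0]]))  # 2.0 (duplicated row)
```

With two identical base vectors, both off-diagonal Gram entries equal 1, so the penalty is 2.0, whereas an orthonormal pair contributes nothing. In a real model the penalty would be computed on the relevant weight tensor inside an autodiff framework, so that its gradient pushes the base vectors apart during training.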
Pages: 14
Related papers (50 in total)
  • [21] 3sG: Three-stage guidance for indoor human action recognition
    Nan, Hai
    Ye, Qilang
    Yu, Zitong
    An, Kang
    IET IMAGE PROCESSING, 2024, 18 (08) : 2000 - 2010
  • [22] A robust three-stage approach to large-scale urban scene recognition
    Wang, Jinglu
    Lu, Yonghua
    Liu, Jingbo
    Quan, Long
    Science China (Information Sciences), 2017, 60 (10) : 239 - 251
  • [25] Spoken Language Recognition With Prosodic Features
    Ng, Raymond W. M.
    Lee, Tan
    Leung, Cheung-Chi
    Ma, Bin
    Li, Haizhou
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (09) : 1841 - 1853
  • [26] Spoken language recognition with relevance feedback
    Tong, Rong
    Li, Haizhou
    Ma, Bin
    Chng, Eng Siong
    Cho, Siu-Yeung
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 861 - +
  • [27] Discriminative vector for spoken language recognition
    Ma, Bin
    Tong, Rong
    Li, Haizhou
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 1001 - +
  • [28] Software Mention Recognition with a Three-Stage Framework Based on BERTology Models at SOMD 2024
    Thi, Thuy Nguyen
    Viet, Anh Nguyen
    Van, Thin Dang
    Nguyen, Ngan Luu-Thuy
    NATURAL SCIENTIFIC LANGUAGE PROCESSING AND RESEARCH KNOWLEDGE GRAPHS, NSLP 2024, 2024, 14770 : 257 - 266
  • [29] Vehicle recognition by silhouettes - a three-stage machine learning method in computer vision systems
    Duke, Vyacheslav A.
    Malygin, Igor G.
    Pritsker, Vladimir I.
    MARINE INTELLECTUAL TECHNOLOGIES, 2022, (02) : 162 - 167
  • [30] A three-stage color model - Comment
    deValois, RL
    deValois, KK
    VISION RESEARCH, 1996, 36 (06) : 833 - 836