HYBRID DNN-LATENT STRUCTURED SVM ACOUSTIC MODELS FOR CONTINUOUS SPEECH RECOGNITION

Cited by: 0
Authors
Ravuri, Suman [1, 2]
Affiliations
[1] Int Comp Sci Inst, Berkeley, CA 94704 USA
[2] Univ Calif Berkeley, Berkeley, CA 94720 USA
Keywords
Structured SVM; Deep Learning; Sequence-Discriminative Training; Large Margin; Acoustic Modeling;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this work, we propose Deep Neural Network (DNN)-Latent Structured Support Vector Machine (LSSVM) acoustic models as a replacement for the more standard sequence-discriminatively trained DNN-HMM hybrid acoustic models. Compared to existing methods, approaches based on margin maximization, as considered in this work, enjoy better theoretical justification. In addition to adopting a max-margin-based criterion, we extend the Structured SVM model to include latent variables that account for uncertainty in state alignments. Introducing latent structure allows for better sample complexity, often requiring 33% to 66% fewer utterances to converge compared to alternative criteria. On an 8-hour independent test set of conversational speech, the proposed method decreases word error rate by 9% relative to a cross-entropy trained hybrid system, while the best existing system decreases the word error rate by 6.5% relative.
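For orientation, the generic latent structured SVM training objective from the general literature can be written as below. This is a sketch of the standard formulation only; the abstract does not state the exact objective used in this paper, and all symbols here are assumptions introduced for illustration: w is the weight vector, x_i an utterance, y_i its reference label sequence, h a latent state alignment, \Phi a joint feature map (which in a DNN hybrid could be derived from network outputs), \Delta a task loss, and C a regularization trade-off.

```latex
\min_{w}\; \frac{1}{2}\lVert w \rVert^{2}
  + C \sum_{i=1}^{N} \Big[
      \max_{(\hat{y},\hat{h})} \big( w^{\top}\Phi(x_i,\hat{y},\hat{h}) + \Delta(y_i,\hat{y},\hat{h}) \big)
      - \max_{h}\; w^{\top}\Phi(x_i,y_i,h)
    \Big]
```

The first inner maximization is loss-augmented decoding over competing hypotheses and alignments. The second selects the best-scoring alignment consistent with the reference, which is where uncertainty in state alignments enters the criterion.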
Pages: 37-44
Number of pages: 8