Integrating DNN-HMM Technique with Hierarchical Multi-layer Acoustic Model for Text-Dependent Speaker Verification

被引:9
|
作者
Laskar, Mohammad Azharuddin [1 ]
Laskar, Rabul Hussain [1 ]
机构
[1] Natl Inst Technol Silchar, Dept Elect & Commun Engn, Silchar 788010, Assam, India
关键词
Text-dependent speaker verification; DNN; HiLAM; DNN-HMM; NEURAL-NETWORKS; RECOGNITION;
D O I
10.1007/s00034-019-01103-3
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
Subspace techniques, such as i-vector/probabilistic linear discriminant analysis and joint factor analysis, have been the most commonly used techniques in the field of text-dependent speaker verification. These techniques, however, do not model the temporal structure of the pass-phrase which otherwise is an important cue in the context of text-dependent speaker verification. The hierarchical multi-layer acoustic model (HiLAM) uses Gaussian mixture model (GMM)-hidden Markov model (HMM) technique, which also accounts for the temporal information of the pass-phrase. Owing to its contextual information modeling, HiLAM has been found to outperform the subspace techniques. In this paper, we propose integrating DNN-HMM technique with HiLAM to further improve the system performance. Firstly, an attempt has been made to define a speaker-text unit/class that could characterize the speaker idiosyncrasies, which are known to be associated with shorter and more fundamental units of speech text. To this end, HiLAM is used to propose a new class definition, and the training data is aligned with respect to this class definition. The labeled data is then used to discriminatively train a deep neural network (DNN). The new method of alignment enables the neural network to learn the actual context of the pass-phrase components. This is not the case with DNN trained in automatic speech recognition fashion. Besides, the network also models the speaker idiosyncrasies associated with specific and finer text units. The use of DNN posteriors to replace the GMM likelihood probabilities of HiLAM has led to significant improvement in performance over the baseline HiLAM system. Relative EER reduction of up to 36.58% has been observed on Part 1 of RSR2015 database.
引用
收藏
页码:3548 / 3572
页数:25
相关论文
共 21 条
  • [1] Integrating DNN–HMM Technique with Hierarchical Multi-layer Acoustic Model for Text-Dependent Speaker Verification
    Mohammad Azharuddin Laskar
    Rabul Hussain Laskar
    Circuits, Systems, and Signal Processing, 2019, 38 : 3548 - 3572
  • [2] Unsupervised Learning of HMM Topology for Text-dependent Speaker Verification
    Liu, Ming
    Huang, Thomas
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 921 - 924
  • [3] DNN BASED SPEAKER EMBEDDING USING CONTENT INFORMATION FOR TEXT-DEPENDENT SPEAKER VERIFICATION
    Dey, Subhadeep
    Koshinaka, Takafumi
    Motlicek, Petr
    Madikeri, Srikanth
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5344 - 5348
  • [4] An alternative normalization scheme in HMM-based text-dependent speaker verification
    Charlet, D
    Jouvet, D
    Collin, O
    SPEECH COMMUNICATION, 2000, 31 (2-3) : 113 - 120
  • [5] Multi-Task Learning for Text-dependent Speaker Verification
    Chen, Nanxin
    Qian, Yanmin
    Yu, Kai
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 185 - 189
  • [6] Joint Training of Expanded End-to-end DNN for Text-dependent Speaker Verification
    Heo, Hee-soo
    Jung, Jee-weon
    Yang, Il-ho
    Yoon, Sung-hyun
    Yu, Ha-jin
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1532 - 1536
  • [7] i-vector/HMM Based Text-dependent Speaker Verification System for RedDots Challenge
    Zeinali, Hossein
    Sameti, Hossein
    Burget, Lukas
    Cernock, Jan
    Maghsoodi, Nooshin
    Matejka, Pavel
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 440 - 444
  • [8] Spectral subtraction and RASTA-filtering in text-dependent HMM-based speaker verification
    Hardt, D
    Fellbaum, K
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 867 - 870
  • [9] Model selection and score normalization for text-dependent single utterance speaker verification
    Buyuk, Osman
    Arslan, Mustafa Levent
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2012, 20 : 1277 - 1295
  • [10] Integrating Online i-vector into GMM-UBM for Text-dependent Speaker Verification
    Jiang, Xiaowei
    Wang, Shuai
    Xiang, Xu
    Qian, Yanmin
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 1628 - 1632