Multi channel far field speaker verification using teacher student deep neural networks

Cited by: 0
Authors
Jung, Jee-weon [1]
Heo, Hee-Soo [1]
Shim, Hye-jin [1]
Yu, Ha-Jin [1]
Affiliations
[1] Univ Seoul, Coll Engn, Sch Computer Sci, 163 Siripdae Ro, Seoul 02504, South Korea
Source
Keywords
Teacher student learning; Deep neural networks; Far-distance speaker verification; Multi channel speaker verification;
DOI
10.7776/ASK.2018.37.6.483
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Far-field input utterances are one of the major causes of performance degradation in speaker verification systems. In this study, we use a teacher-student learning framework to compensate for the degradation caused by far-field utterances. Teacher-student learning refers to training a student deep neural network under a condition that may degrade performance, guided by a teacher deep neural network trained without that condition. Here, a teacher network trained on near-distance utterances is used to train the student network on far-distance utterances. Experiments showed, however, that this approach degrades performance on near-distance utterances. To avoid this, we propose two techniques: initializing the student network with the trained teacher network, and training the student network on both near- and far-field utterances. Experiments were conducted using deep neural networks that take as input the raw waveforms of 4-channel utterances recorded at both near and far distances. The equal error rates for near-field / far-field utterances were 2.55 % / 2.8 % without teacher-student learning, 9.75 % / 1.8 % with conventional teacher-student learning, and 2.5 % / 2.7 % with the proposed techniques.
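The teacher-student objective described in the abstract can be illustrated with a minimal sketch. The abstract does not state the loss function, so the Kullback-Leibler divergence between output distributions used here is an assumption (one common choice for teacher-student learning), and the function names and the optional near-field term are hypothetical illustrations of "training the student network using both near and far field utterances", not the authors' implementation.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def teacher_student_loss(teacher_logits_near, student_logits_far,
                         student_logits_near=None):
    """Hypothetical teacher-student loss for one utterance pair.

    The teacher sees the near-field recording; the student sees the
    far-field recording of the same utterance and is pushed toward the
    teacher's output. If the student's near-field output is also given,
    a second term ties the student to the teacher on near-field input
    (an assumed reading of training on both conditions).
    """
    target = softmax(teacher_logits_near)
    loss = kl_divergence(target, softmax(student_logits_far))
    if student_logits_near is not None:
        loss += kl_divergence(target, softmax(student_logits_near))
    return loss
```

In this sketch, initializing the student from the teacher would correspond to copying the teacher's weights into the student before training, so the loss starts near zero on near-field input and only the far-field mismatch has to be learned.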
Pages: 483-488
Number of pages: 6