Multi channel far field speaker verification using teacher student deep neural networks

Cited: 0
|
Authors
Jung, Jee-weon [1 ]
Heo, Hee-Soo [1 ]
Shim, Hye-jin [1 ]
Yu, Ha-Jin [1 ]
Affiliation
[1] Univ Seoul, Coll Engn, Sch Comp Sci, 163 Siripdae Ro, Seoul 02504, South Korea
Keywords
Teacher student learning; Deep neural networks; Far-distance speaker verification; Multi channel speaker verification;
DOI
10.7776/ASK.2018.37.6.483
CLC Classification
O42 [Acoustics];
Discipline Codes
070206 ; 082403 ;
Abstract
Far-field input utterances are a major cause of performance degradation in speaker verification systems. In this study, we use the teacher-student learning framework to compensate for the degradation caused by far-field utterances. Teacher-student learning refers to training a student deep neural network under a performance-degrading condition using a teacher deep neural network trained without that condition. Here, we use a teacher network trained on near-field utterances to train a student network on far-field utterances. However, experiments showed that this degraded performance on near-field utterances. To avoid this phenomenon, we propose two techniques: using the trained teacher network to initialize the student network, and training the student network on both near- and far-field utterances. Experiments were conducted with deep neural networks that take as input raw waveforms of 4-channel utterances recorded at both near and far distances. The equal error rates on near-field / far-field utterances were, respectively, 2.55 % / 2.8 % without teacher-student learning, 9.75 % / 1.8 % with conventional teacher-student learning, and 2.5 % / 2.7 % with the proposed techniques.
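The core of the teacher-student framework described above can be sketched as a toy example: a frozen "teacher" scores near-field input, and the student, initialized from the teacher's weights (the proposed trick), is trained on the paired far-field input to match the teacher's outputs. Everything here (the single-layer stand-in network, the synthetic data, the loss) is a hypothetical illustration, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(w, x):
    """Stand-in for a deep speaker network: one linear layer + softmax posteriors."""
    logits = x @ w
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical paired data: the same "utterances" captured near and far.
near = rng.normal(size=(8, 16))
far = near + rng.normal(scale=0.5, size=(8, 16))   # simulated far-field distortion

teacher_w = rng.normal(size=(16, 4))   # teacher: trained on near-field, then frozen
student_w = teacher_w.copy()           # proposed technique: init student from teacher

def ts_loss(sw):
    # Student sees far-field input but is trained to match the teacher's
    # output on the paired near-field input (teacher-student learning).
    return float(np.mean((forward(teacher_w, near) - forward(sw, far)) ** 2))

# One gradient-descent step via finite differences (illustration only).
eps, lr = 1e-5, 0.1
grad = np.zeros_like(student_w)
for i in range(student_w.shape[0]):
    for j in range(student_w.shape[1]):
        d = np.zeros_like(student_w)
        d[i, j] = eps
        grad[i, j] = (ts_loss(student_w + d) - ts_loss(student_w - d)) / (2 * eps)

before = ts_loss(student_w)
student_w -= lr * grad
after = ts_loss(student_w)
```

Initializing the student from the teacher means the only remaining mismatch at the start of training comes from the input condition itself, which is one intuition for why the paper's initialization trick preserves near-field performance.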
Pages: 483-488 (6 pages)
Related Papers (50 total)
  • [41] Deep neural networks based binary classification for single channel speaker independent multi-talker speech separation
    Saleem, Nasir
    Khattak, Muhammad Irfan
    APPLIED ACOUSTICS, 2020, 167
  • [42] Speaker verification with fake intonation based on Neural Networks
    Natalia Vasquez, Angie
    Maria Ballesteros, Dora
    Renza, Diego
    2019 7TH INTERNATIONAL WORKSHOP ON BIOMETRICS AND FORENSICS (IWBF), 2019,
  • [43] Text-Independent Speaker Verification Based on Deep Neural Networks and Segmental Dynamic Time Warping
    Adel, Mohamed
    Afify, Mohamed
    Gaballah, Akram
    Fayek, Magda
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 1001 - 1006
  • [44] Segment unit shuffling layer in deep neural networks for text-independent speaker verification
    Heo, Jungwoo
    Shim, Hye-jin
    Kim, Ju-ho
    Yu, Ha-Jin
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2021, 40 (02): : 148 - 154
  • [45] Jointing Multi-task Learning and Gradient Reversal Layer for Far-Field Speaker Verification
    Xu, Wei
    Wang, Xinghao
    Wan, Hao
    Guo, Xin
    Zhao, Junhong
    Deng, Feiqi
    Kang, Wenxiong
    BIOMETRIC RECOGNITION (CCBR 2021), 2021, 12878 : 449 - 457
  • [46] SPEAKER ADAPTATION OF DEEP NEURAL NETWORKS USING A HIERARCHY OF OUTPUT LAYERS
    Price, Ryan
    Iso, Ken-ichi
    Shinoda, Koichi
    2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014, 2014, : 153 - 158
  • [47] Speaker-dependent Multipitch Tracking Using Deep Neural Networks
    Liu, Yuzhou
    Wang, DeLiang
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3279 - 3283
  • [48] Speech Enhancement for Speaker Recognition Using Deep Recurrent Neural Networks
    Tkachenko, Maxim
    Yamshinin, Alexander
    Lyubimov, Nikolay
    Kotov, Mikhail
    Nastasenko, Marina
    SPEECH AND COMPUTER, SPECOM 2017, 2017, 10458 : 690 - 699
  • [49] Speaker-dependent multipitch tracking using deep neural networks
    Liu, Yuzhou
    Wang, DeLiang
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2017, 141 (02): : 710 - 721
  • [50] IMPROVED SPEAKER INDEPENDENT LIP READING USING SPEAKER ADAPTIVE TRAINING AND DEEP NEURAL NETWORKS
    Almajai, Ibrahim
    Cox, Stephen
    Harvey, Richard
    Lan, Yuxuan
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 2722 - 2726