ROBUST PITCH TRACKING IN NOISY SPEECH USING SPEAKER-DEPENDENT DEEP NEURAL NETWORKS

Cited: 0
Authors
Liu, Yuzhou [1]
Wang, DeLiang [1,2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
Pitch estimation; deep neural network; hidden Markov model; speaker-dependent modeling; ADAPTATION; DATABASE;
DOI
Not available
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
A reliable estimate of pitch in noisy speech is crucial for many speech applications. In this paper, we propose to use speaker-dependent (SD) deep neural networks (DNNs) to model the harmonic patterns of each speaker. Specifically, SD-DNNs take spectral features as input and estimate probabilistic pitch states at each time frame. We investigate two methods for SD-DNN training. The first is direct training, when sufficient speaker-dependent data is available. The second is speaker adaptation of a speaker-independent (SI) DNN with limited data. The Viterbi algorithm is then used to track pitch through time. Experiments show that SD-DNNs trained with either method outperform an SI-DNN based system as well as a state-of-the-art pitch tracking algorithm in all SNR conditions.
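The tracking step described in the abstract can be illustrated with a generic Viterbi decoder over per-frame pitch-state posteriors. This is a minimal sketch, not the authors' implementation: the function name, the state layout, and the hand-specified transition matrix are all assumptions; in the paper the posteriors would come from the SD-DNN and the transition model would be estimated from data.

```python
import math

def viterbi_pitch_track(posteriors, transition):
    """Find the most likely pitch-state sequence through time.

    posteriors: posteriors[t][s] = P(state s | frame t), e.g. per-frame
                DNN outputs (assumed already normalized per frame)
    transition: transition[i][j] = P(state j at t+1 | state i at t)
    Returns the most probable state index sequence.
    """
    n_states = len(posteriors[0])

    # Work in the log domain to avoid numerical underflow on long utterances.
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # Initialize with the first frame's posteriors (uniform prior assumed).
    delta = [log(p) for p in posteriors[0]]
    backptr = []

    for t in range(1, len(posteriors)):
        new_delta, ptrs = [], []
        for j in range(n_states):
            # Best predecessor state for arriving at state j.
            best_i = max(range(n_states),
                         key=lambda i: delta[i] + log(transition[i][j]))
            new_delta.append(delta[best_i] + log(transition[best_i][j])
                             + log(posteriors[t][j]))
            ptrs.append(best_i)
        delta = new_delta
        backptr.append(ptrs)

    # Backtrace from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptrs in reversed(backptr):
        state = ptrs[state]
        path.append(state)
    return path[::-1]

# Toy example with two states: a sticky transition matrix smooths over a
# momentarily ambiguous frame but still follows a confident state change.
post = [[0.9, 0.1], [0.8, 0.2], [0.05, 0.95]]
trans = [[0.9, 0.1], [0.1, 0.9]]
print(viterbi_pitch_track(post, trans))  # [0, 0, 1]
```

In a real pitch tracker the state space would typically cover quantized F0 candidates plus an unvoiced state, and the self-transition weights control how strongly the track resists frame-to-frame jumps.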
Pages: 5255-5259
Page count: 5
Related Papers
50 total
  • [31] Speaker-dependent model interpolation for statistical emotional speech synthesis
    Hsu, Chih-Yu
    Chen, Chia-Ping
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2012, : 1 - 10
  • [32] Evaluating robust features on Deep Neural Networks for speech recognition in noisy and channel mismatched conditions
    Mitra, Vikramjit
    Wang, Wen
    Franco, Horacio
    Lei, Yun
    Bartels, Chris
    Graciarena, Martin
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 895 - 899
  • [33] Regularized sparse features for noisy speech enhancement using deep neural networks
    Khattak, Muhammad Irfan
    Saleem, Nasir
    Gao, Jiechao
    Verdu, Elena
    Fuente, Javier Parra
    COMPUTERS & ELECTRICAL ENGINEERING, 2022, 100
  • [34] Comparing Speaker-Dependent and Speaker-Adaptive Acoustic Models for Recognizing Dysarthric Speech
    Rudzicz, Frank
    ASSETS'07: PROCEEDINGS OF THE NINTH INTERNATIONAL ACM SIGACCESS CONFERENCE ON COMPUTERS AND ACCESSIBILITY, 2007, : 255 - 256
  • [35] FFTNET: A REAL-TIME SPEAKER-DEPENDENT NEURAL VOCODER
    Jin, Zeyu
    Finkelstein, Adam
    Mysore, Gautham J.
    Lu, Jingwan
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 2251 - 2255
  • [36] Robust speaker detection using Neural Networks
    Shell, John R.
    PROCEEDINGS OF THE EIGHTH IASTED INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING, 2006, : 414 - 419
  • [37] The Analysis of Influence Factors and Identification of Speaker-Dependent Primi Speech Recognition
    Guo, Lin
    Bai, Yang
    Su, Jie
    Pan, Wen-lin
    Zhang, Tian-jun
    PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE: TECHNOLOGIES AND APPLICATIONS, 2016, 127
  • [38] Deep neural networks for speaker verification with short speech utterances
    Yang, Il-Ho
    Heo, Hee-Soo
    Yoon, Sung-Hyun
    Yu, Ha-Jin
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2016, 35 (06): : 501 - 509
  • [39] SPEAKER ADAPTATION OF CONTEXT DEPENDENT DEEP NEURAL NETWORKS
    Liao, Hank
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7947 - 7951
  • [40] Speech Separation of A Target Speaker Based on Deep Neural Networks
    Du Jun
    Tu Yanhui
    Xu Yong
    Dai Lirong
    Chin-Hui, Lee
    2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2014, : 473 - 477