ROBUST PITCH TRACKING IN NOISY SPEECH USING SPEAKER-DEPENDENT DEEP NEURAL NETWORKS

Authors
Liu, Yuzhou [1]
Wang, DeLiang [1, 2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
Pitch estimation; deep neural network; hidden Markov model; speaker-dependent modeling; ADAPTATION; DATABASE
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
A reliable estimate of pitch in noisy speech is crucial for many speech applications. In this paper, we propose to use speaker-dependent (SD) deep neural networks (DNNs) to model the harmonic patterns of each speaker. Specifically, SD-DNNs take spectral features as input and estimate probabilistic pitch states at each time frame. We investigate two methods for SD-DNN training. The first is direct training, used when sufficient speaker-dependent data is available. The second is speaker adaptation of a speaker-independent (SI) DNN, used when data is limited. The Viterbi algorithm is then used to track pitch through time. Experiments show that SD-DNNs trained with either method outperform an SI-DNN based system as well as a state-of-the-art pitch tracking algorithm in all SNR conditions.
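The tracking step described in the abstract, per-frame pitch-state probabilities followed by Viterbi decoding, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `viterbi_pitch_track`, the toy three-state setup, and the hand-written transition matrix are all assumptions made for illustration, and the abstract does not specify how the pitch states or the transition probabilities are defined.

```python
import numpy as np


def viterbi_pitch_track(posteriors, transition, prior=None):
    """Return the most likely pitch-state sequence.

    posteriors: (T, S) array of per-frame state probabilities, standing in
                for the output of a DNN (hypothetical values in this sketch).
    transition: (S, S) array with transition[i, j] = P(state j | state i).
    prior:      optional (S,) array of initial state probabilities;
                a uniform prior is assumed when omitted.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    transition = np.asarray(transition, dtype=float)
    num_frames, num_states = posteriors.shape
    eps = 1e-12  # avoids log(0)

    log_post = np.log(posteriors + eps)
    log_trans = np.log(transition + eps)
    if prior is None:
        log_prior = np.zeros(num_states)
    else:
        log_prior = np.log(np.asarray(prior, dtype=float) + eps)

    # score[j] holds the best log-score of any path ending in state j.
    score = log_prior + log_post[0]
    backptr = np.zeros((num_frames, num_states), dtype=int)

    for t in range(1, num_frames):
        # cand[i, j]: best path ending in state i, then moving to state j.
        cand = score[:, None] + log_trans
        backptr[t] = np.argmax(cand, axis=0)
        score = cand[backptr[t], np.arange(num_states)] + log_post[t]

    # Trace the best path backwards from the final frame.
    path = np.zeros(num_frames, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(num_frames - 1, 0, -1):
        path[t - 1] = backptr[t, path[t]]
    return path


# Toy example: frame 1 is locally ambiguous, but a transition matrix that
# favours staying in the same state smooths the track.
toy_posteriors = [[0.8, 0.1, 0.1],
                  [0.4, 0.5, 0.1],
                  [0.8, 0.1, 0.1]]
sticky_transition = [[0.90, 0.05, 0.05],
                     [0.05, 0.90, 0.05],
                     [0.05, 0.05, 0.90]]
toy_path = viterbi_pitch_track(toy_posteriors, sticky_transition)
```

In the toy example, picking the most probable state independently in each frame would jump to state 1 in the middle frame. Decoding with the transition matrix keeps the track in state 0, which is the kind of temporal continuity that Viterbi tracking provides.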
Pages: 5255-5259
Page count: 5