ROBUST PITCH TRACKING IN NOISY SPEECH USING SPEAKER-DEPENDENT DEEP NEURAL NETWORKS

Times cited: 0
Authors
Liu, Yuzhou [1]
Wang, DeLiang [1,2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
Pitch estimation; deep neural network; hidden Markov model; speaker-dependent modeling; ADAPTATION; DATABASE;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
A reliable estimate of pitch in noisy speech is crucial for many speech applications. In this paper, we propose using speaker-dependent (SD) deep neural networks (DNNs) to model the harmonic patterns of each speaker. Specifically, an SD-DNN takes spectral features as input and estimates the probabilities of pitch states at each time frame. We investigate two methods for training SD-DNNs. The first is direct training, used when sufficient speaker-dependent data are available. The second is speaker adaptation of a speaker-independent (SI) DNN when only limited data are available. The Viterbi algorithm is then used to track pitch through time. Experiments show that SD-DNNs trained with either method outperform an SI-DNN-based system as well as a state-of-the-art pitch tracking algorithm in all SNR conditions.
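The abstract describes a two-stage pipeline: a DNN outputs a probability for each pitch state in every frame, and the Viterbi algorithm then picks the most likely state sequence over time. The sketch below illustrates only that second stage, under several assumptions not taken from the paper: state 0 stands for "unvoiced" and states 1..S-1 are quantized pitch bins; the transition matrix is a hypothetical Gaussian-shaped one (parameter `sigma`) that penalizes large jumps between adjacent frames; and the DNN outputs are used directly as emission scores. The function names `make_transition_matrix` and `viterbi` are illustrative, not the authors' code.

```python
import numpy as np


def make_transition_matrix(n_states: int, sigma: float = 1.0,
                           p_stay_unvoiced: float = 0.9) -> np.ndarray:
    """Hypothetical transition matrix (an assumption, not the paper's).

    State 0 is treated as 'unvoiced'; states 1..n_states-1 are pitch bins.
    Voiced-to-voiced weights decay with bin distance, so large pitch jumps
    between consecutive frames are unlikely.
    """
    trans = np.zeros((n_states, n_states))
    bins = np.arange(1, n_states)
    dist = bins[:, None] - bins[None, :]
    # Voiced -> voiced: Gaussian-shaped preference for small jumps.
    trans[1:, 1:] = np.exp(-0.5 * (dist / sigma) ** 2)
    # Voiced -> unvoiced: a small weight relative to the voiced row mass.
    trans[1:, 0] = 0.05 * trans[1:, 1:].sum(axis=1)
    # Unvoiced -> unvoiced stays likely; unvoiced -> any bin is uniform.
    trans[0, 0] = p_stay_unvoiced
    trans[0, 1:] = (1.0 - p_stay_unvoiced) / (n_states - 1)
    # Normalize so each row is a probability distribution.
    trans /= trans.sum(axis=1, keepdims=True)
    return trans


def viterbi(probs: np.ndarray, trans: np.ndarray,
            eps: float = 1e-12) -> np.ndarray:
    """Most likely state path given per-frame state probabilities.

    probs: (T, S) array, one row of DNN outputs per frame.
    trans: (S, S) array, trans[i, j] = P(state j at t | state i at t-1).
    """
    n_frames, n_states = probs.shape
    log_emit = np.log(probs + eps)
    log_trans = np.log(trans + eps)
    # Uniform initial distribution over states (assumption).
    score = np.log(np.full(n_states, 1.0 / n_states)) + log_emit[0]
    backptr = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        # cand[i, j]: best score ending in state i at t-1, then moving to j.
        cand = score[:, None] + log_trans
        backptr[t] = np.argmax(cand, axis=0)
        score = cand[backptr[t], np.arange(n_states)] + log_emit[t]
    # Trace the best path backwards from the best final state.
    path = np.zeros(n_frames, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = backptr[t, path[t]]
    return path


if __name__ == "__main__":
    # Toy DNN outputs: unvoiced + 5 pitch bins; frame 2 is a noisy outlier
    # whose largest probability jumps to bin 5.
    probs = np.array([
        [0.02, 0.03, 0.85, 0.05, 0.03, 0.02],
        [0.02, 0.03, 0.85, 0.05, 0.03, 0.02],
        [0.02, 0.02, 0.40, 0.05, 0.03, 0.48],
        [0.02, 0.03, 0.85, 0.05, 0.03, 0.02],
        [0.02, 0.03, 0.85, 0.05, 0.03, 0.02],
    ])
    trans = make_transition_matrix(probs.shape[1], sigma=1.0)
    print(probs.argmax(axis=1).tolist())   # frame-wise argmax: [2, 2, 5, 2, 2]
    print(viterbi(probs, trans).tolist())  # smoothed path: [2, 2, 2, 2, 2]
```

In the toy run, picking the most probable state frame by frame produces a spurious jump at frame 2, whereas Viterbi decoding keeps the track continuous because the assumed transition model makes a three-bin jump (and the jump back) far less probable than staying in place. This is the kind of temporal smoothing the Viterbi stage provides; the paper's actual state definitions and transition probabilities may differ.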
Pages: 5255-5259
Page count: 5