F0 Estimation and Voicing Detection With Cascade Architecture in Noisy Speech

Cited by: 2
Authors
Zhang, Yixuan [1 ]
Wang, Heming [1 ]
Wang, Deliang [2 ,3 ]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[3] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
Estimation; Noise measurement; Multitasking; Speech enhancement; Convolution; Training; Speech processing; Complex domain processing; densely-connected convolutional recurrent neural network; multi-task learning; neural cascade architecture; pitch tracking; voicing detection; MULTIPITCH TRACKING; PITCH; ALGORITHM; MASKING; ROBUST
DOI
10.1109/TASLP.2023.3313427
CLC Number (Chinese Library Classification)
O42 [Acoustics]
Discipline Classification Codes
070206; 082403
Abstract
As a fundamental problem in speech processing, pitch tracking has been studied for decades. While strong performance has been achieved on clean speech, pitch tracking in noisy speech is still challenging. Severe non-stationary noises not only corrupt the harmonic structure in voiced intervals but also make it difficult to determine the existence of voiced speech. Given the importance of voicing detection for pitch tracking, this study proposes a neural cascade architecture that jointly performs pitch estimation and voicing detection. The cascade architecture optimizes a speech enhancement module and a pitch tracking module, and is trained in a speaker-independent and noise-independent way. It is observed that incorporating the enhancement module improves both pitch estimation and voicing detection accuracy, especially in low signal-to-noise ratio (SNR) conditions. In addition, compared with frameworks that combine corresponding single-task models, the proposed multi-task framework achieves better performance and is more efficient. Experimental results show that the proposed method is robust to different noise conditions and substantially outperforms other competitive pitch tracking methods.
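The abstract describes a two-stage cascade in which a speech enhancement module feeds a pitch tracking module that jointly predicts quantized F0 and a per-frame voicing decision. The PyTorch sketch below only illustrates how such a cascade and its multi-task loss could be wired; the GRU layers, feature dimensions, 64-bin pitch quantization, and unweighted loss sum are assumptions made for brevity and do not reproduce the authors' densely-connected convolutional recurrent network.

```python
# Illustrative sketch of a two-stage cascade for joint F0 estimation and
# voicing detection. NOT the authors' exact model: layer types, sizes, and
# the 64-bin pitch quantization are assumptions.
import torch
import torch.nn as nn

class EnhancementModule(nn.Module):
    """Maps noisy spectral frames to an enhanced spectral estimate."""
    def __init__(self, n_freq=161, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_freq, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, n_freq)

    def forward(self, noisy_spec):               # (batch, frames, n_freq)
        h, _ = self.rnn(noisy_spec)
        return self.out(h)                       # enhanced spectrum estimate

class PitchVoicingModule(nn.Module):
    """Predicts per-frame pitch-class logits and a voicing logit."""
    def __init__(self, n_freq=161, hidden=256, n_pitch_bins=64):
        super().__init__()
        # Cascade input: noisy features concatenated with the enhanced estimate.
        self.rnn = nn.GRU(2 * n_freq, hidden, num_layers=2, batch_first=True)
        self.pitch_head = nn.Linear(hidden, n_pitch_bins)   # quantized F0 classes
        self.voicing_head = nn.Linear(hidden, 1)             # voiced/unvoiced

    def forward(self, noisy_spec, enhanced_spec):
        x = torch.cat([noisy_spec, enhanced_spec], dim=-1)
        h, _ = self.rnn(x)
        return self.pitch_head(h), self.voicing_head(h)

class CascadeF0Tracker(nn.Module):
    """End-to-end cascade trained jointly on both tasks."""
    def __init__(self):
        super().__init__()
        self.enhancer = EnhancementModule()
        self.tracker = PitchVoicingModule()

    def forward(self, noisy_spec):
        enhanced = self.enhancer(noisy_spec)
        pitch_logits, voicing_logits = self.tracker(noisy_spec, enhanced)
        return enhanced, pitch_logits, voicing_logits

# Multi-task loss: enhancement MSE + pitch cross-entropy + voicing BCE.
# An unweighted sum is used here; a real system would tune the weights.
model = CascadeF0Tracker()
noisy = torch.randn(4, 100, 161)                  # (batch, frames, freq bins)
clean = torch.randn(4, 100, 161)
pitch_target = torch.randint(0, 64, (4, 100))
voicing_target = torch.randint(0, 2, (4, 100, 1)).float()

enhanced, pitch_logits, voicing_logits = model(noisy)
loss = (nn.functional.mse_loss(enhanced, clean)
        + nn.functional.cross_entropy(pitch_logits.transpose(1, 2), pitch_target)
        + nn.functional.binary_cross_entropy_with_logits(voicing_logits, voicing_target))
loss.backward()
```

In this layout the pitch module sees both the noisy input and the enhanced estimate, one common way to realize a cascade in which the enhancement output acts as an additional cue rather than a replacement for the raw features.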
Pages: 3760-3770
Number of pages: 11