Speech and music pitch trajectory classification using recurrent neural networks for monaural speech segregation

Cited by: 0
Authors
Han-Gyu Kim
Gil-Jin Jang
Yung-Hwan Oh
Ho-Jin Choi
Affiliations
[1] Naver Corp., Clova Speech
[2] KAIST, School of Computing
[3] Kyungpook National University, School of Electronics Engineering
Source
The Journal of Supercomputing, 2020, 76(10): 8193-8213
Keywords
Speech segregation; Speech pitch estimation; Pitch classification; Recurrent neural network; Long short-term memory; Bidirectional long short-term memory
DOI
Not available
Abstract
In this paper, we propose a speech/music pitch classification method based on recurrent neural networks (RNNs) for monaural speech segregation from music interference. The speech segregation methods in this paper exploit sub-band masking to construct segregation masks modulated by the estimated speech pitch. However, for speech signals mixed with music, speech pitch estimation becomes unreliable because speech and music have similar harmonic structures. To remove the music interference effectively, we propose an RNN-based speech/music pitch classification method that models the temporal trajectories of speech and music pitch values and determines whether an unknown continuous pitch sequence belongs to speech or music. Among various types of RNNs, we chose the simple recurrent network, long short-term memory (LSTM), and bidirectional LSTM (BLSTM) for pitch classification. The experimental results show that our proposed method significantly outperforms the baseline methods on speech–music mixtures without loss of segregation performance on speech–noise mixtures.
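As a rough illustration of the pitch-trajectory classification step described in the abstract, the sketch below builds a small bidirectional LSTM that labels a continuous pitch contour as speech or music. This is only a minimal PyTorch sketch under assumed settings (one log-pitch value per frame, a 64-unit hidden layer, mean-pooling over time, a hypothetical class order of speech then music); the paper's actual features, network sizes, and training setup may differ.

```python
# Minimal sketch (assumption, not the paper's implementation): a BLSTM
# that classifies a whole pitch trajectory as speech (class 0) or music (class 1).
import torch
import torch.nn as nn

class PitchTrajectoryClassifier(nn.Module):
    def __init__(self, hidden_size: int = 64):
        super().__init__()
        # One pitch value (e.g. log-F0) per frame is assumed as input.
        self.blstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                             num_layers=1, batch_first=True,
                             bidirectional=True)
        # Forward and backward states are concatenated, hence 2 * hidden_size.
        self.out = nn.Linear(2 * hidden_size, 2)

    def forward(self, pitch: torch.Tensor) -> torch.Tensor:
        # pitch: (batch, frames) -> (batch, frames, 1)
        h, _ = self.blstm(pitch.unsqueeze(-1))
        # Mean-pool the per-frame states, then score the two classes.
        return self.out(h.mean(dim=1))      # (batch, 2) logits

# Toy usage: classify one random 50-frame pitch contour.
model = PitchTrajectoryClassifier()
contour = torch.randn(1, 50)                # stand-in for an estimated pitch track
probs = model(contour).softmax(dim=-1)
print(probs)                                # [P(speech), P(music)]
```

Mean-pooling over frames is just one way to turn the per-frame BLSTM states into a single sequence-level decision; taking the final hidden state or a per-frame vote would also fit the description in the abstract.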
Pages: 8193-8213
Number of pages: 20
Related papers
  • [1] Speech and music pitch trajectory classification using recurrent neural networks for monaural speech segregation
    Kim, Han-Gyu
    Jang, Gil-Jin
    Oh, Yung-Hwan
    Choi, Ho-Jin
    JOURNAL OF SUPERCOMPUTING, 2020, 76(10): 8193-8213
  • [2] Speech Segregation based on Pitch Track Correction and Music-Speech Classification
    Kim, Han-Gyu
    Jang, Gil-Jin
    Park, Jeong-Sik
    Kim, Ji-Hwan
    Oh, Yung-Hwan
    ADVANCES IN ELECTRICAL AND COMPUTER ENGINEERING, 2012, 12(2): 15-20
  • [3] Pitch-based monaural segregation of reverberant speech
    Roman, Nicoleta
    Wang, DeLiang
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2006, 120(1): 458-469
  • [4] DISCRIMINATIVE DEEP RECURRENT NEURAL NETWORKS FOR MONAURAL SPEECH SEPARATION
    Wang, Guan-Xiang
    Hsu, Chung-Chien
    Chien, Jen-Tzung
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016: 2544-2548
  • [5] Monaural speech segregation using synthetic speech signals
    Brungart, DS
    Iyer, N
    Simpson, BD
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2006, 119(4): 2327-2333
  • [6] Monaural speech segregation based on pitch tracking and amplitude modulation
    Hu, GN
    Wang, DL
    IEEE TRANSACTIONS ON NEURAL NETWORKS, 2004, 15(5): 1135-1150
  • [7] Monaural speech segregation based on pitch tracking and amplitude modulation
    Hu, GN
    Wang, DL
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002: 553-556
  • [8] Monaural Voiced Speech Segregation Based on Pitch and Comb Filter
    Zhang, Xueliang
    Liu, Wenju
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011: 1752+
  • [9] Binaural Classification for Reverberant Speech Segregation Using Deep Neural Networks
    Jiang, Yi
    Wang, DeLiang
    Liu, RunSheng
    Feng, ZhenMing
    IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 2014, 22(12): 2112-2121