Speech and music pitch trajectory classification using recurrent neural networks for monaural speech segregation

Cited by: 0
Authors
Han-Gyu Kim
Gil-Jin Jang
Yung-Hwan Oh
Ho-Jin Choi
Affiliations
[1] Clova Speech, Naver Corp.
[2] School of Computing, KAIST
[3] School of Electronics Engineering, Kyungpook National University
Keywords
Speech segregation; Speech pitch estimation; Pitch classification; Recurrent neural network; Long short-term memory; Bidirectional long short-term memory
Abstract
In this paper, we propose a recurrent neural network (RNN)-based speech/music pitch classification method for monaural speech segregation from music interference. The speech segregation methods in this paper exploit sub-band masking to construct segregation masks modulated by the estimated speech pitch. However, for speech signals mixed with music, speech pitch estimation becomes unreliable because speech and music have similar harmonic structures. To remove the music interference effectively, we propose RNN-based speech/music pitch classification. Our method models the temporal trajectories of speech and music pitch values and determines whether an unknown continuous pitch sequence belongs to speech or music. Among the various types of RNNs, we chose the simple recurrent network, long short-term memory (LSTM), and bidirectional LSTM for pitch classification. Experimental results show that the proposed method significantly outperforms the baseline methods on speech–music mixtures without loss of segregation performance on speech–noise mixtures.
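The classification step described in the abstract can be pictured with a minimal sketch (not the authors' implementation): assuming a PyTorch setup, a bidirectional LSTM reads a pitch trajectory, one pitch value per time frame, and outputs a speech/music decision. The hidden size, the mean pooling over frames, and the input framing are illustrative assumptions rather than the configuration used in the paper.

```python
# Minimal sketch (illustrative only): a bidirectional LSTM that labels a
# continuous pitch trajectory as speech or music.
import torch
import torch.nn as nn

class PitchTrajectoryClassifier(nn.Module):
    def __init__(self, hidden_size=64, num_classes=2):
        super().__init__()
        # Input: one pitch value (e.g., Hz or log-Hz) per time frame.
        self.rnn = nn.LSTM(input_size=1, hidden_size=hidden_size,
                           batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, pitch_seq):
        # pitch_seq: (batch, frames, 1) tensor of pitch values.
        outputs, _ = self.rnn(pitch_seq)   # (batch, frames, 2 * hidden_size)
        pooled = outputs.mean(dim=1)       # average over time frames
        return self.fc(pooled)             # (batch, 2) speech/music logits

if __name__ == "__main__":
    model = PitchTrajectoryClassifier()
    dummy = torch.randn(4, 100, 1)         # 4 trajectories, 100 frames each
    print(model(dummy).shape)              # torch.Size([4, 2])
```

A unidirectional LSTM or a simple recurrent network can be substituted by changing the recurrent layer; the bidirectional variant is shown here because it is one of the architectures compared in the paper.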
Pages: 8193–8213
Page count: 20