Speech and music pitch trajectory classification using recurrent neural networks for monaural speech segregation

Cited by: 0
Authors
Han-Gyu Kim
Gil-Jin Jang
Yung-Hwan Oh
Ho-Jin Choi
Affiliations
[1] Naver Corp., Clova Speech
[2] KAIST, School of Computing
[3] Kyungpook National University, School of Electronics Engineering
Keywords
Speech segregation; Speech pitch estimation; Pitch classification; Recurrent neural network; Long short-term memory; Bidirectional long short-term memory
DOI: Not available
Abstract
In this paper, we propose speech/music pitch classification based on recurrent neural networks (RNNs) for monaural speech segregation from music interference. The speech segregation methods in this paper exploit sub-band masking to construct segregation masks modulated by the estimated speech pitch. However, for speech signals mixed with music, speech pitch estimation becomes unreliable because speech and music have similar harmonic structures. To remove the music interference effectively, we propose an RNN-based speech/music pitch classification. Our proposed method models the temporal trajectories of speech and music pitch values and determines whether an unknown continuous pitch sequence belongs to speech or music. Among various types of RNNs, we chose the simple recurrent network, long short-term memory (LSTM), and bidirectional LSTM (BLSTM) for pitch classification. The experimental results show that our proposed method significantly outperforms the baseline methods on speech–music mixtures without loss of segregation performance on speech–noise mixtures.
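The abstract gives no implementation details, so the following is only a minimal sketch of the core idea it describes: a bidirectional LSTM that takes a framewise pitch trajectory and labels the whole sequence as speech or music. PyTorch, the layer sizes, the input representation (one pitch value per frame), and the class-index convention are all assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): BLSTM pitch-trajectory classifier.
import torch
import torch.nn as nn

class PitchTrajectoryClassifier(nn.Module):
    def __init__(self, hidden_size=32):
        super().__init__()
        # Input: one estimated pitch value (e.g., F0 in Hz or log-Hz) per frame.
        self.blstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                             batch_first=True, bidirectional=True)
        # Two output classes: speech pitch vs. music pitch.
        self.fc = nn.Linear(2 * hidden_size, 2)

    def forward(self, pitch_seq):
        # pitch_seq: (batch, frames, 1) tensor of pitch trajectory values
        outputs, _ = self.blstm(pitch_seq)
        # Summarize the trajectory with the last frame's hidden state.
        return self.fc(outputs[:, -1, :])

# Usage: classify a batch of four 100-frame pitch trajectories.
model = PitchTrajectoryClassifier()
logits = model(torch.randn(4, 100, 1))   # (4, 2) class scores
labels = logits.argmax(dim=1)            # 0 = speech, 1 = music (assumed convention)
```

In a segregation pipeline such as the one the abstract outlines, the predicted label would decide whether an estimated pitch contour is used to modulate the sub-band segregation mask or discarded as music interference.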
Pages: 8193-8213
Page count: 20
Related papers
50 records in total
  • [41] A Supervised Learning Approach to Monaural Segregation of Reverberant Speech
    Jin, Zhaozhang
    Wang, DeLiang
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (04): 625-638
  • [42] A supervised learning approach to monaural segregation of reverberant speech
    Jin, Zhaozhang
    Wang, DeLiang
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007: 921+
  • [43] A convolutional recurrent neural network with attention framework for speech separation in monaural recordings
    Chao Sun
    Min Zhang
    Ruijuan Wu
    Junhong Lu
    Guo Xian
    Qin Yu
    Xiaofeng Gong
    Ruisen Luo
    Scientific Reports, 11
  • [44] A convolutional recurrent neural network with attention framework for speech separation in monaural recordings
    Sun, Chao
    Zhang, Min
    Wu, Ruijuan
    Lu, Junhong
    Xian, Guo
    Yu, Qin
    Gong, Xiaofeng
    Luo, Ruisen
    SCIENTIFIC REPORTS, 2021, 11 (01)
  • [45] Low-Power Convolutional Recurrent Neural Network For Monaural Speech Enhancement
    Gao, Fei
    Guan, Haixin
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021: 559-563
  • [46] Emotional speech classification with prosodic parameters by using neural networks
    Sato, H
    Mitsukura, Y
    Fukumi, M
    Akamatsu, N
    ANZIIS 2001: PROCEEDINGS OF THE SEVENTH AUSTRALIAN AND NEW ZEALAND INTELLIGENT INFORMATION SYSTEMS CONFERENCE, 2001: 395-398
  • [47] EEG Classification of Covert Speech Using Regularized Neural Networks
    Sereshkeh, Alborz Rezazadeh
    Trott, Robert
    Bricout, Aurelien
    Chau, Tom
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (12): 2292-2300
  • [48] Visual speech recognition by recurrent neural networks
    Rabi, G
    Lu, SW
    JOURNAL OF ELECTRONIC IMAGING, 1998, 7 (01): 61-69
  • [49] Visual speech recognition by recurrent neural networks
    Rabi, G
    Lu, SW
    1997 CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, CONFERENCE PROCEEDINGS, VOLS I AND II: ENGINEERING INNOVATION: VOYAGE OF DISCOVERY, 1997: 55-58
  • [50] SPEECH RECOGNITION WITH DEEP RECURRENT NEURAL NETWORKS
    Graves, Alex
    Mohamed, Abdel-rahman
    Hinton, Geoffrey
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013: 6645-6649