Speech and music pitch trajectory classification using recurrent neural networks for monaural speech segregation

Cited by: 0
Authors
Han-Gyu Kim
Gil-Jin Jang
Yung-Hwan Oh
Ho-Jin Choi
Affiliations
[1] Naver Corp., Clova Speech
[2] KAIST, School of Computing
[3] Kyungpook National University, School of Electronics Engineering
Keywords
Speech segregation; Speech pitch estimation; Pitch classification; Recurrent neural network; Long short-term memory; Bidirectional long short-term memory
DOI: not available
Abstract
In this paper, we propose speech/music pitch classification based on recurrent neural networks (RNNs) for monaural speech segregation from music interference. The speech segregation methods in this paper exploit sub-band masking to construct segregation masks modulated by the estimated speech pitch. However, for speech signals mixed with music, speech pitch estimation becomes unreliable because speech and music have similar harmonic structures. To remove the music interference effectively, we propose an RNN-based speech/music pitch classifier. The proposed method models the temporal trajectories of speech and music pitch values and determines whether an unknown continuous pitch sequence belongs to speech or music. Among various types of RNNs, we chose the simple recurrent network, long short-term memory (LSTM), and bidirectional LSTM for pitch classification. The experimental results show that the proposed method significantly outperforms the baseline methods on speech–music mixtures without loss of segregation performance on speech–noise mixtures.
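The abstract describes classifying whole pitch trajectories as either speech or music with recurrent networks (simple RNN, LSTM, bidirectional LSTM). The following is a minimal sketch of such a classifier in PyTorch; the class name, the input format (one pitch value per frame), the hidden size, and the mean-pooling decision rule are illustrative assumptions and not the authors' reported configuration.

# Minimal sketch of a pitch-trajectory classifier in the spirit of the paper:
# a (bidirectional) LSTM reads a sequence of per-frame pitch values and emits
# a speech-vs-music decision. All hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class PitchTrajectoryClassifier(nn.Module):
    def __init__(self, hidden_size=64, bidirectional=True):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                            batch_first=True, bidirectional=bidirectional)
        out_dim = hidden_size * (2 if bidirectional else 1)
        self.out = nn.Linear(out_dim, 2)  # two classes: speech pitch vs. music pitch

    def forward(self, pitch_seq):
        # pitch_seq: (batch, frames, 1) per-frame pitch estimates (e.g., log-Hz)
        h, _ = self.lstm(pitch_seq)
        # Pool over time so trajectories of different lengths yield one decision.
        pooled = h.mean(dim=1)
        return self.out(pooled)  # logits over {speech, music}

# Example: classify a 100-frame pitch contour (random placeholder values).
model = PitchTrajectoryClassifier()
contour = torch.randn(1, 100, 1)
logits = model(contour)
is_speech = logits.argmax(dim=-1).item() == 0

Mean-pooling over frames is only one simple way to map a variable-length trajectory to a single speech/music decision; the original work may use a different sequence-level decision rule, and its classifier output would in turn modulate the sub-band segregation masks described in the abstract.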
Pages: 8193–8213
Number of pages: 20
Related papers
50 records in total
  • [21] Monaural Segregation of Voiced Speech using Discriminative Random Fields
    Prabhavalkar, Rohit
    Jin, Zhaozhang
    Fosler-Lussier, Eric
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 864 - 867
  • [22] Arabic speech recognition using recurrent neural networks
    El Choubassi, MM
    El Khoury, HE
    Alagha, CEJ
    Skaf, JA
    Al-Alaoui, MA
    PROCEEDINGS OF THE 3RD IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY, 2003, : 543 - 547
  • [23] Separation and deconvolution of speech using recurrent neural networks
    Li, Y
    Powers, D
    Wen, P
    IC-AI'2001: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOLS I-III, 2001, : 1303 - 1309
  • [24] PERCEPTUAL IMPROVEMENT OF DEEP NEURAL NETWORKS FOR MONAURAL SPEECH ENHANCEMENT
    Han, Wei
    Zhang, Xiongwei
    Sun, Meng
    Shi, Wenhua
    Chen, Xushan
    Hu, Yonggang
    2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2016,
  • [25] A SUPERVISED LEARNING APPROACH FOR MONAURAL SPEECH SEGREGATION
    2010 IEEE 10TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS (ICSP2010), VOLS I-III, 2010, : 1323 - 1326
  • [26] RECURRENT NEURAL NETWORKS FOR SPEECH RECOGNITION
    VERDEJO, JED
    HERREROS, AP
    LUNA, JCS
    ORTUZAR, MCB
    AYUSO, AR
    LECTURE NOTES IN COMPUTER SCIENCE, 1991, 540 : 361 - 369
  • [27] Efficient classification of noisy speech using neural networks
    Shao, C
    Bouchard, M
    SEVENTH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, VOL 1, PROCEEDINGS, 2003, : 357 - 360
  • [28] Improvement of joint optimization of masks and deep recurrent neural networks for monaural speech separation using optimized activation functions
    MASOOD Asim
    YE Zhongfu
    Chinese Journal of Acoustics, 2020, 39 (03) : 420 - 432
  • [29] Monaural Speech Dereverberation Using Deformable Convolutional Networks
    Kothapally, Vinay
    Hansen, John H. L.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 1712 - 1723
  • [30] Using Recurrent Neural Networks for Part-of-Speech Tagging and Subject and Predicate Classification in a Sentence
    Munoz-Valero, David
    Rodriguez-Benitez, Luis
    Jimenez-Linares, Luis
    Moreno-Garcia, Juan
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2020, 13 (01) : 706 - 716