Speech and music pitch trajectory classification using recurrent neural networks for monaural speech segregation

Cited by: 0
Authors
Han-Gyu Kim
Gil-Jin Jang
Yung-Hwan Oh
Ho-Jin Choi
Affiliations
[1] Naver Corp., Clova Speech
[2] KAIST, School of Computing
[3] Kyungpook National University, School of Electronics Engineering
Keywords
Speech segregation; Speech pitch estimation; Pitch classification; Recurrent neural network; Long short-term memory; Bidirectional long short-term memory
DOI: Not available
Abstract
In this paper, we propose speech/music pitch classification based on recurrent neural networks (RNNs) for monaural speech segregation from music interference. The speech segregation methods in this paper exploit sub-band masking to construct segregation masks modulated by the estimated speech pitch. However, for speech signals mixed with music, speech pitch estimation becomes unreliable because speech and music have similar harmonic structures. To remove the music interference effectively, we propose an RNN-based speech/music pitch classification. Our proposed method models the temporal trajectories of speech and music pitch values and determines whether an unknown continuous pitch sequence belongs to speech or music. Among various types of RNNs, we chose the simple recurrent network, long short-term memory (LSTM), and bidirectional LSTM (BLSTM) for pitch classification. The experimental results show that our proposed method significantly outperforms the baseline methods on speech–music mixtures without loss of segregation performance on speech–noise mixtures.
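The abstract gives no implementation details, so the following is only a minimal sketch of the core idea it describes: a bidirectional LSTM that takes a framewise pitch trajectory and labels the whole sequence as speech or music. PyTorch, the layer sizes, the input representation (one pitch value per frame), and the class-index convention are all assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): BLSTM pitch-trajectory classifier.
import torch
import torch.nn as nn

class PitchTrajectoryClassifier(nn.Module):
    def __init__(self, hidden_size=32):
        super().__init__()
        # Input: one estimated pitch value (e.g., F0 in Hz or log-Hz) per frame.
        self.blstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                             batch_first=True, bidirectional=True)
        # Two output classes: speech pitch vs. music pitch.
        self.fc = nn.Linear(2 * hidden_size, 2)

    def forward(self, pitch_seq):
        # pitch_seq: (batch, frames, 1) tensor of pitch trajectory values
        outputs, _ = self.blstm(pitch_seq)
        # Summarize the trajectory with the last frame's hidden state.
        return self.fc(outputs[:, -1, :])

# Usage: classify a batch of four 100-frame pitch trajectories.
model = PitchTrajectoryClassifier()
logits = model(torch.randn(4, 100, 1))   # (4, 2) class scores
labels = logits.argmax(dim=1)            # 0 = speech, 1 = music (assumed convention)
```

In a segregation pipeline such as the one the abstract outlines, the predicted label would decide whether an estimated pitch contour is used to modulate the sub-band segregation mask or discarded as music interference.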
Pages: 8193-8213
Page count: 20
Related papers
50 records in total
  • [41] A Supervised Learning Approach to Monaural Segregation of Reverberant Speech
    Jin, Zhaozhang
    Wang, DeLiang
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (04): 625-638
  • [42] A supervised learning approach to monaural segregation of reverberant speech
    Jin, Zhaozhang
    Wang, DeLiang
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007: 921+
  • [43] A convolutional recurrent neural network with attention framework for speech separation in monaural recordings
    Chao Sun
    Min Zhang
    Ruijuan Wu
    Junhong Lu
    Guo Xian
    Qin Yu
    Xiaofeng Gong
    Ruisen Luo
    Scientific Reports, 11
  • [44] A convolutional recurrent neural network with attention framework for speech separation in monaural recordings
    Sun, Chao
    Zhang, Min
    Wu, Ruijuan
    Lu, Junhong
    Xian, Guo
    Yu, Qin
    Gong, Xiaofeng
    Luo, Ruisen
    SCIENTIFIC REPORTS, 2021, 11 (01)
  • [45] Low-Power Convolutional Recurrent Neural Network For Monaural Speech Enhancement
    Gao, Fei
    Guan, Haixin
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021: 559-563
  • [46] Emotional speech classification with prosodic parameters by using neural networks
    Sato, H
    Mitsukura, Y
    Fukumi, M
    Akamatsu, N
    ANZIIS 2001: PROCEEDINGS OF THE SEVENTH AUSTRALIAN AND NEW ZEALAND INTELLIGENT INFORMATION SYSTEMS CONFERENCE, 2001: 395-398
  • [47] EEG Classification of Covert Speech Using Regularized Neural Networks
    Sereshkeh, Alborz Rezazadeh
    Trott, Robert
    Bricout, Aurelien
    Chau, Tom
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (12): 2292-2300
  • [48] Visual speech recognition by recurrent neural networks
    Rabi, G
    Lu, SW
    JOURNAL OF ELECTRONIC IMAGING, 1998, 7 (01): 61-69
  • [49] Visual speech recognition by recurrent neural networks
    Rabi, G
    Lu, SW
    1997 CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, CONFERENCE PROCEEDINGS, VOLS I AND II: ENGINEERING INNOVATION: VOYAGE OF DISCOVERY, 1997: 55-58
  • [50] SPEECH RECOGNITION WITH DEEP RECURRENT NEURAL NETWORKS
    Graves, Alex
    Mohamed, Abdel-rahman
    Hinton, Geoffrey
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013: 6645-6649