Joint Multi-Pitch Detection Using Harmonic Envelope Estimation for Polyphonic Music Transcription

Cited by: 26

Authors:

Benetos, Emmanouil [1]

Dixon, Simon [1]

Affiliations:

[1] Queen Mary Univ London, Sch Elect Engn & Comp Sci, Ctr Digital Mus, London E1 4NS, England

Source:

IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING | 2011 / Vol. 5 / No. 06

Keywords:

Automatic music transcription; harmonic envelope estimation; conditional random fields (CRFs); resonator time-frequency image; SEPARATION;

DOI:

10.1109/JSTSP.2011.2162394

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

In this paper, a method for automatic transcription of music signals based on joint multiple-F0 estimation is proposed. As a time-frequency representation, the constant-Q resonator time-frequency image is employed, while a novel noise suppression technique based on pink noise assumption is applied in a preprocessing step. In the multiple-F0 estimation stage, the optimal tuning and inharmonicity parameters are computed and a salience function is proposed in order to select pitch candidates. For each pitch candidate combination, an overlapping partial treatment procedure is used, which is based on a novel spectral envelope estimation procedure for the log-frequency domain, in order to compute the harmonic envelope of candidate pitches. In order to select the optimal pitch combination for each time frame, a score function is proposed which combines spectral and temporal characteristics of the candidate pitches and also aims to suppress harmonic errors. For postprocessing, hidden Markov models (HMMs) and conditional random fields (CRFs) trained on MIDI data are employed, in order to boost transcription accuracy. The system was trained on isolated piano sounds from the MAPS database and was tested on classic and jazz recordings from the RWC database, as well as on recordings from a Disklavier piano. A comparison with several state-of-the-art systems is provided using a variety of error metrics, where encouraging results are indicated.

Pages: 1111-1123

Number of pages: 13

50 entries in total

[31] RNN-BLSTM Based Multi-Pitch Estimation
Zhang, Jianshu
Tang, Jian
Dai, Li-Rong
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1785 - 1789
[32] POLYPHONIC MUSIC TRANSCRIPTION USING NOTE ONSET AND OFFSET DETECTION
Benetos, Emmanouil
Dixon, Simon
2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 37 - 40
[33] Multi-Layer Combined Frequency and Periodicity Representations for Multi-Pitch Estimation of Multi-Instrument Music
Matsunaga, Tomoki
Saito, Hiroaki
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 3171 - 3184
[34] The multi-pitch estimation problem: Some new solutions
Christensen, Mads Graesboll
Stoica, Petre
Jakobsson, Andreas
Jensen, Soren Holdt
2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL III, PTS 1-3, PROCEEDINGS, 2007, : 1221 - +
[35] MULTI-PITCH ESTIMATION OF AUDIO RECORDINGS USING A CODEBOOK-BASED APPROACH
Hansen, Martin Weiss
Jensen, Jesper Rindom
Christensen, Mads Graesboll
2016 24TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2016, : 983 - 987
[36] Multi-Pitch Estimation using NHF with Multi-Dictionary Distinguishing Attack and Reverberation of Sounds
Fujisawa, Takanori
Harada, Sora
Ikehara, Masaaki
CONFERENCE RECORD OF THE 2019 FIFTY-THIRD ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS & COMPUTERS, 2019, : 1836 - 1841
[37] RMVPE: A Robust Model for Vocal Pitch Estimation in Polyphonic Music
Wei, Haojie
Cao, Xueke
Dan, Tangpeng
Chen, Yueguo
INTERSPEECH 2023, 2023, : 5421 - 5425
[38] MULTI-PITCH ESTIMATION VIA FAST GROUP SPARSE LEARNING
Kronvall, Ted
Elvander, Filip
Adalbjornsson, Stefan Ingi
Jakobsson, Andreas
2016 24TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2016, : 1093 - 1097
[39] Automatic Transcription of Polyphonic Piano Music Using Genetic Algorithms, Adaptive Spectral Envelope Modeling, and Dynamic Noise Level Estimation
Reis, Gustavo
Fernandez de Vega, Francisco
Ferreira, Anibal
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (08): : 2313 - 2328
[40] Deep Neural Network for Multi-Pitch Estimation Using Weighted Cross Entropy Loss
Stone, Samuel
Spector, Evan
2021 IEEE WESTERN NEW YORK IMAGE AND SIGNAL PROCESSING WORKSHOP (WNYISPW), 2021,