FUSING TRANSCRIPTION RESULTS FROM POLYPHONIC AND MONOPHONIC AUDIO FOR SINGING MELODY TRANSCRIPTION IN POLYPHONIC MUSIC

Cited by: 0

Authors:

Zhu, Bilei [1]

Wu, Fuzhang [1]

Li, Ke [1]

Wu, Yongjian [1]

Huang, Feiyue [1]

Wu, Yunsheng [1]

Affiliations:

[1] Tencent Youtu AI Lab, Shenzhen, People's Republic of China

Source:

2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2017

Keywords:

Singing melody transcription; polyphonic audio; monophonic singing recordings; deep neural network (DNN); pitch sequence selection; SPEECH;

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This paper presents a new system for singing melody transcription from polyphonic songs. Whereas most existing systems operate solely on the polyphonic audio of the song to be processed, our system additionally takes as input multiple monophonic recordings of people singing the song. To transcribe the singing melody, the system first tracks the singing pitch in the polyphonic audio using a deep neural network (DNN)-based method, and then uses the estimated pitch series as a reference to select among the pitch sequences extracted from the monophonic singing recordings. The selected monophonic pitch sequences and the DNN pitch series from the polyphonic audio are then transcribed separately, and their transcription results are fused to form the final note sequence. Experimental results show that introducing monophonic singing recordings into transcription significantly improves the performance of singing melody transcription from polyphonic songs.
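The select-then-fuse pipeline described in the abstract can be illustrated with a toy sketch. The abstract does not give the selection criterion, the transcription method, or the fusion rule, so everything below is an assumption made for illustration: pitch is represented as one MIDI semitone value per frame, selection uses a mean absolute distance with a hypothetical threshold `max_distance`, transcription is simple rounding to the nearest semitone, and fusion is a frame-wise majority vote. The DNN pitch tracker is replaced by a hard-coded pitch series, and all function names are hypothetical.

```python
from collections import Counter

UNVOICED = None  # marker for frames without singing pitch (assumed convention)


def pitch_distance(reference, candidate):
    """Mean absolute difference in semitones over frames voiced in both series."""
    pairs = [(r, c) for r, c in zip(reference, candidate)
             if r is not UNVOICED and c is not UNVOICED]
    if not pairs:
        return float("inf")
    return sum(abs(r - c) for r, c in pairs) / len(pairs)


def select_sequences(reference, candidates, max_distance=1.0):
    """Keep monophonic sequences close to the reference (threshold is an assumption)."""
    return [c for c in candidates if pitch_distance(reference, c) <= max_distance]


def transcribe(pitch_sequence):
    """Toy transcription: round each voiced frame to the nearest semitone."""
    return [UNVOICED if p is UNVOICED else round(p) for p in pitch_sequence]


def fuse(transcriptions):
    """Frame-wise majority vote; ties go to the earliest-listed transcription."""
    fused = []
    for frame in zip(*transcriptions):
        counts = Counter(frame)
        best = max(counts.values())
        fused.append(next(note for note in frame if counts[note] == best))
    return fused


# Stand-in for the pitch series tracked from the polyphonic audio;
# the third frame carries a deliberate tracking error.
polyphonic_pitch = [60.2, 60.1, 64.0, 62.1, UNVOICED]

# Stand-ins for pitch sequences extracted from monophonic singing recordings.
monophonic_pitches = [
    [60.0, 59.9, 62.0, 62.0, UNVOICED],
    [60.1, 60.0, 61.9, 62.2, UNVOICED],
    [67.0, 67.2, 70.0, 69.0, UNVOICED],  # far from the reference, so rejected
]

selected = select_sequences(polyphonic_pitch, monophonic_pitches)
transcriptions = [transcribe(polyphonic_pitch)] + [transcribe(s) for s in selected]
fused_notes = fuse(transcriptions)
print(fused_notes)
```

In this toy input, the two selected monophonic sequences outvote the erroneous third frame of the polyphonic pitch series, which is the kind of benefit the abstract attributes to adding monophonic recordings. A real system would work with note onsets and durations rather than independent frames.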


Pages: 296-300

Page count: 5

