FUSING TRANSCRIPTION RESULTS FROM POLYPHONIC AND MONOPHONIC AUDIO FOR SINGING MELODY TRANSCRIPTION IN POLYPHONIC MUSIC

Cited by: 0
Authors
Zhu, Bilei [1]
Wu, Fuzhang [1]
Li, Ke [1]
Wu, Yongjian [1]
Huang, Feiyue [1]
Wu, Yunsheng [1]
Affiliation
[1] Tencent Youtu AI Lab, Shenzhen, People's Republic of China
Keywords
Singing melody transcription; polyphonic audio; monophonic singing recordings; deep neural network (DNN); pitch sequence selection; SPEECH
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper presents a new system for singing melody transcription from polyphonic songs. Instead of operating solely on the polyphonic audio of each song to be processed (as most existing systems do), our system additionally takes as input multiple monophonic recordings of people singing the song. To transcribe the singing melody in a song, our system first tracks the singing pitch from the polyphonic audio of the song using a deep neural network (DNN)-based method, and then uses the estimated pitch series as a reference to select among the pitch sequences extracted from the multiple monophonic singing recordings. The selected monophonic pitch sequences, as well as the DNN pitch series from the polyphonic audio, are then transcribed separately, and their transcription results are fused to form the final note sequence. Experimental results show that, by introducing monophonic singing recordings into transcription, the performance of singing melody transcription from polyphonic songs can be significantly improved.
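The selection and fusion steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state the selection criterion or the fusion rule, so everything below is an assumption made for illustration: the function names, the mean semitone distance, the `max_semitones` threshold, and the frame-wise majority vote are all hypothetical and are not the authors' method. The sketch also assumes that all pitch sequences are already time-aligned frame by frame and that unvoiced frames are encoded as 0.0 Hz.

```python
import math
from collections import Counter


def hz_to_midi(f_hz):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return 69.0 + 12.0 * math.log2(f_hz / 440.0)


def pitch_distance(reference, candidate):
    """Mean absolute semitone difference over frames voiced in both sequences.

    Assumption of this sketch: unvoiced frames are encoded as 0.0 Hz.
    Returns math.inf when the two sequences share no voiced frames.
    """
    diffs = [
        abs(hz_to_midi(r) - hz_to_midi(c))
        for r, c in zip(reference, candidate)
        if r > 0 and c > 0
    ]
    return sum(diffs) / len(diffs) if diffs else math.inf


def select_sequences(reference, candidates, max_semitones=1.0):
    """Return indices of candidates that stay close to the reference pitch.

    The threshold is a hypothetical parameter, not taken from the paper.
    """
    return [
        i for i, cand in enumerate(candidates)
        if pitch_distance(reference, cand) <= max_semitones
    ]


def fuse_notes(note_tracks):
    """Fuse frame-level note labels by majority vote (0 means no note).

    Each track is a list of integer MIDI note numbers, one per frame.
    """
    return [Counter(frame).most_common(1)[0][0] for frame in zip(*note_tracks)]
```

Here `reference` plays the role of the DNN pitch series from the polyphonic audio, and `candidates` are the pitch sequences from the monophonic recordings. A practical system would also have to handle time alignment and singers performing in a different key or octave, which this sketch ignores.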
Pages: 296-300
Number of pages: 5