Balancing bias and performance in polyphonic piano transcription systems

Cited: 0
Authors
Martak, Lukas Samuel [1 ,2 ]
Kelz, Rainer [1 ]
Widmer, Gerhard [1 ,2 ]
Affiliations
[1] Johannes Kepler Univ Linz, Inst Computat Percept, Linz, Austria
[2] Johannes Kepler Univ Linz, Linz Inst Technol, Artificial Intelligence Lab, Linz, Austria
Source
Frontiers in Signal Processing
Funding
European Research Council;
Keywords
differentiable dictionary search; non-negative matrix factorization; deep learning; normalizing flows; density models; piano music; source separation; automatic music transcription;
DOI
10.3389/frsip.2022.975932
CLC Number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Code
0808; 0809;
Abstract
Current state-of-the-art methods for polyphonic piano transcription tend to use high-capacity neural networks. Most models are trained "end-to-end" and learn a mapping from audio input to pitch labels. They require large training corpora consisting of many audio recordings of different piano models with temporally aligned pitch labels. Previous work has shown that neural network-based systems struggle to generalize to unseen note combinations, as they tend to learn note combinations by heart. Semi-supervised linear matrix decomposition is a frequently used alternative approach to piano transcription, one that does not have this particular drawback. The disadvantages of linear methods start to show when they encounter recordings of pieces played on unseen pianos, a scenario in which neural networks seem relatively untroubled. A recently proposed approach called "Differentiable Dictionary Search" (DDS) combines the modeling capacity of deep density models with the linear mixing model of matrix decomposition in order to balance the mutual advantages and disadvantages of the standalone approaches. This makes it better suited to model unseen sources, while generalization to unseen note combinations should be unaffected, because the mixing model is not learned and thus cannot acquire a corpus bias. In its initially proposed form, however, DDS uses computational resources too inefficiently to be applied to piano music transcription. To reduce computational demands and memory requirements, we propose a number of modifications. These adjustments finally enable a fair comparison of our modified DDS variant with a semi-supervised matrix decomposition baseline, as well as with a state-of-the-art deep neural network system trained end-to-end.
In systematic experiments with both musical and "unmusical" piano recordings (real musical pieces and unusual chords), we provide quantitative and qualitative analyses at the frame level, characterizing the behavior of the modified approach, along with a comparison to several related methods. Overall, the results show the fundamental promise of the model and, in particular, demonstrate improvements in situations where a corpus bias, incurred by learning from musical material of a specific genre, would be problematic.
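The semi-supervised matrix decomposition baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it is the standard multiplicative-update NMF scheme with a fixed dictionary, where only the function and variable names below (`transcribe_frames`, `V`, `W`, `H`) are assumptions for illustration:

```python
import numpy as np

def transcribe_frames(V, W, n_iter=300, eps=1e-9):
    """Frame-level transcription via semi-supervised NMF.

    The dictionary W is held fixed (e.g. per-pitch spectral templates
    obtained beforehand from isolated piano notes); only the activations
    H are estimated, using the standard multiplicative update rule for
    the KL divergence objective with W constant.

    V : (n_bins, n_frames) non-negative magnitude spectrogram
    W : (n_bins, n_notes) fixed per-pitch spectral templates
    Returns H : (n_notes, n_frames) non-negative note activations.
    """
    H = np.full((W.shape[1], V.shape[1]), 0.1)   # positive init
    w_col_sums = W.sum(axis=0)[:, None]          # normalizer W^T * 1
    for _ in range(n_iter):
        WH = W @ H + eps                         # current reconstruction
        H *= (W.T @ (V / WH)) / (w_col_sums + eps)
    return H
```

A binary frame-level piano roll then follows by thresholding `H`. Note that nothing in this scheme is learned from polyphonic training material, which is why, as the abstract argues, this family of methods cannot acquire a corpus bias over note combinations.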
Pages: 17