Auxiliary self-supervision to metric learning for music similarity-based retrieval and auto-tagging

Cited: 0
Authors
Akama, Taketo [1 ]
Kitano, Hiroaki [1 ]
Takematsu, Katsuhiro [2 ]
Miyajima, Yasushi [2 ]
Polouliakh, Natalia [1 ]
Affiliations
[1] Sony Comp Sci Labs Inc, Tokyo, Japan
[2] Koozyt Inc, Tokyo, Japan
Source
PLOS ONE | 2023 / Vol. 18 / No. 11
Keywords
DOI
10.1371/journal.pone.0294643
CLC Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Theory of Natural Sciences];
Subject Classification
07; 0710; 09;
Abstract
In the realm of music information retrieval, similarity-based retrieval and auto-tagging serve as essential components. Similarity-based retrieval involves automatically analyzing a music track and fetching analogous tracks from a database. Auto-tagging, on the other hand, assesses a music track to deduce associated tags, such as genre and mood. Given the limitations and non-scalability of human supervision signals, it becomes crucial for models to learn from alternative sources to enhance their performance. Contrastive learning-based self-supervised learning, which exclusively relies on learning signals derived from music audio data, has demonstrated its efficacy in the context of auto-tagging. In this work, we propose a model that builds on the self-supervised learning approach to address the similarity-based retrieval challenge by introducing our method of metric learning with a self-supervised auxiliary loss. Furthermore, diverging from conventional self-supervised learning methodologies, we discovered the advantages of concurrently training the model with both self-supervision and supervision signals, without freezing pre-trained models. We also found that refraining from employing augmentation during the fine-tuning phase yields better results. Our experimental results confirm that the proposed methodology enhances retrieval and tagging performance metrics in two distinct scenarios: one where human-annotated tags are consistently available for all music tracks, and another where such tags are accessible only for a subset of music tracks.
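The abstract describes a joint objective: a supervised metric-learning loss trained concurrently with a self-supervised contrastive auxiliary loss. The record does not reproduce the paper's exact formulation, so the following is only a hypothetical sketch, assuming a triplet loss for the supervised metric term and an NT-Xent (SimCLR-style) contrastive term over two views of each track, combined with a weight `lam`; all function names and the weighting scheme are illustrative assumptions, not the authors' code.

```python
import math

def cosine(u, v):
    # cosine similarity between two embedding vectors (assumed non-zero)
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def triplet_loss(anchor, pos, neg, margin=0.2):
    # supervised metric-learning term: pull the tag-similar track (pos)
    # closer to the anchor than the dissimilar track (neg), by a margin
    return max(0.0, cosine(anchor, neg) - cosine(anchor, pos) + margin)

def ntxent_loss(views_a, views_b, temperature=0.5):
    # self-supervised auxiliary term (NT-Xent): views_a[i] and views_b[i]
    # are two "views" (e.g. augmented clips) of the same track
    z = views_a + views_b
    n = len(views_a)
    total = 0.0
    for i in range(len(z)):
        j = (i + n) % len(z)  # index of the positive view of sample i
        denom = sum(math.exp(cosine(z[i], z[k]) / temperature)
                    for k in range(len(z)) if k != i)
        pos = math.exp(cosine(z[i], z[j]) / temperature)
        total += -math.log(pos / denom)
    return total / len(z)

def joint_loss(anchor, pos, neg, views_a, views_b, lam=0.5):
    # concurrent training signal: supervised metric loss plus a
    # lam-weighted self-supervised auxiliary loss (no frozen pre-training)
    return triplet_loss(anchor, pos, neg) + lam * ntxent_loss(views_a, views_b)
```

In a real training loop both terms would be backpropagated through the same encoder on each batch, matching the abstract's point that the model is trained with both signals simultaneously rather than fine-tuned on top of a frozen self-supervised backbone.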
Pages: 20
Related Papers
50 records in total
  • [1] Music Auto-tagging Algorithm Based on Deep Analysis on Labels
    Wang, Z.
    Zhang, R.
    Gao, Y.
    Xiao, Y.
    Huanan Ligong Daxue Xuebao / Journal of South China University of Technology (Natural Science), 2019, 47 (08): 71-76
  • [2] AUDIO-BASED AUTO-TAGGING WITH CONTEXTUAL TAGS FOR MUSIC
    Ibrahim, Karim M.
    Royo-Letelier, Jimena
    Epure, Elena V.
    Peeters, Geoffroy
    Richard, Gael
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 16-20
  • [3] Code Tagging and Similarity-based Retrieval with myCBR
    Roth-Berghofer, Thomas R.
    Bahls, Daniel
    RESEARCH AND DEVELOPMENT IN INTELLIGENT SYSTEMS XXV, 2009: 19+
  • [4] Music auto-tagging based on the unified latent semantic modeling
    Shao, Xi
    Cheng, Zhiyong
    Kankanhalli, Mohan S.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (01): 161-176
  • [5] Tag Propagation and Cost-Sensitive Learning for Music Auto-Tagging
    Lin, Yi-Hsun
    Chen, Homer H.
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23: 1605-1616
  • [6] Similarity-based Distant Supervision for Definition Retrieval
    Jiang, Jiepu
    Allan, James
    CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017: 527-536
  • [7] Playlist-Based Tag Propagation for Improving Music Auto-Tagging
    Lin, Yi-Hsun
    Chung, Chia-Hao
    Chen, Homer H.
    2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2018: 2270-2274
  • [8] IMPROVING MUSIC AUTO-TAGGING WITH TRIGGER-BASED CONTEXT MODEL
    Yan, Qin
    Ding, Cong
    Yin, Jingjing
    Lv, Yong
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015: 434-438