CoverHunter: Cover Song Identification with Refined Attention and Alignments

Authors
Liu, Feng [1]
Tuo, Deyi [1]
Xu, Yinan [1]
Han, Xintong [1]
Affiliations
[1] Huya Inc, Intelligent Media Technol Dept, Guangzhou, Peoples R China
Keywords
Cover Song Identification; Contrastive Learning; Chunk Alignment; Conformer; Coarse-to-Fine Training
DOI
10.1109/ICME55011.2023.00189
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cover Song Identification (CSI) aims to find, among a set of reference tracks, different versions of the same piece of music given a query track. In this paper, we propose a novel system named CoverHunter that overcomes the shortcomings of existing detection schemes by exploring richer features with refined attention and alignments. CoverHunter contains three key modules: 1) a convolution-augmented transformer (i.e., Conformer) structure that captures both local and global feature interactions, in contrast to previous methods that rely mainly on convolutional neural networks; 2) an attention-based time pooling module that further exploits attention in the time dimension; 3) a novel coarse-to-fine training scheme that first trains a network to roughly align the song chunks and then refines the network by training on the aligned chunks. We also summarize some important training tricks used in our system to achieve better results. Experiments on several standard CSI datasets show that our method significantly improves over state-of-the-art methods with an embedding size of 128 (2.3% on SHS100K-TEST and 17.7% on Da-TACOS).
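To make the second module in the abstract more concrete, the following is a minimal sketch of what attention-based pooling over the time dimension can look like in general. It is not taken from the paper: the function name `attention_time_pooling`, the single scoring vector `w`, and the toy inputs are all assumptions made for illustration, and the actual CoverHunter module may differ.

```python
import math


def attention_time_pooling(frames, w):
    """Collapse a sequence of frame features into one vector.

    frames: list of T feature vectors, each of length D.
    w: hypothetical scoring vector of length D (learned in a real model).
    Returns the pooled vector (length D) and the per-frame weights.
    """
    # One scalar relevance score per time step.
    scores = [sum(x * wi for x, wi in zip(frame, w)) for frame in frames]
    # Softmax over time, shifted by the maximum for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the frames along the time axis.
    dim = len(frames[0])
    pooled = [
        sum(weights[t] * frames[t][d] for t in range(len(frames)))
        for d in range(dim)
    ]
    return pooled, weights


# Toy input: three time steps with two features each (illustrative values).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
uniform_pooled, uniform_weights = attention_time_pooling(frames, [0.0, 0.0])
focused_pooled, focused_weights = attention_time_pooling(frames, [10.0, 0.0])
```

With an all-zero scoring vector every frame gets the same weight, so the result reduces to plain mean pooling over time. A non-zero scoring vector shifts the weight toward the frames that score highest, which is the behavior that distinguishes attention pooling from averaging.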
Pages: 1080-1085
Page count: 6