CoverHunter: Cover Song Identification with Refined Attention and Alignments

Cited by: 0
Authors
Liu, Feng [1]
Tuo, Deyi [1]
Xu, Yinan [1]
Han, Xintong [1]
Affiliations
[1] Huya Inc., Intelligent Media Technology Department, Guangzhou, People's Republic of China
Source
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME | 2023
Keywords
Cover Song Identification; Contrastive Learning; Chunk Alignment; Conformer; Coarse-to-Fine Training;
DOI
10.1109/ICME55011.2023.00189
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cover Song Identification (CSI) focuses on finding different versions of the same music among reference anchors, given a query track. In this paper, we propose a novel system named CoverHunter that overcomes the shortcomings of existing detection schemes by exploring richer features with refined attention and alignments. CoverHunter contains three key modules: 1) A convolution-augmented transformer (i.e., Conformer) structure that captures both local and global feature interactions, in contrast to previous methods that mainly rely on convolutional neural networks; 2) An attention-based time pooling module that further exploits attention in the time dimension; 3) A novel coarse-to-fine training scheme that first trains a network to roughly align the song chunks and then refines the network by training on the aligned chunks. We also summarize some important training tricks used in our system to achieve better results. Experiments on several standard CSI datasets show that our method significantly improves over state-of-the-art methods with an embedding size of 128 (2.3% on SHS100K-TEST and 17.7% on Da-TACOS).
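The attention-based time pooling named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `attention_time_pooling`, the single linear scoring vector `w`, and the bias `b` are assumptions made here for illustration. The sketch only shows the general idea of scoring each time frame, normalizing the scores with a softmax over time, and taking the weighted sum of frames as a fixed-size embedding.

```python
import math


def attention_time_pooling(frames, w, b=0.0):
    """Pool T frame vectors of size D into one D-dimensional vector.

    frames: list of T lists, each of length D (frame-level features).
    w, b:   hypothetical scoring parameters (assumed, not from the paper).
    Returns (pooled, weights), where weights sum to 1 over time.
    """
    # One scalar relevance score per time frame.
    scores = [sum(x * wi for x, wi in zip(frame, w)) + b for frame in frames]

    # Softmax over the time axis, shifted by the max for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the frames along time.
    dim = len(frames[0])
    pooled = [
        sum(wt * frame[d] for wt, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return pooled, weights


if __name__ == "__main__":
    toy_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pooled, weights = attention_time_pooling(toy_frames, w=[10.0, 0.0])
    print(pooled, weights)
```

With an all-zero scoring vector every frame receives the same weight, so the sketch reduces to plain average pooling over time; a learned scoring vector lets the more informative frames dominate the embedding.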
Pages: 1080-1085
Page count: 6