CoverHunter: Cover Song Identification with Refined Attention and Alignments

Cited by: 0

Authors:

Liu, Feng [1]

Tuo, Deyi [1]

Xu, Yinan [1]

Han, Xintong [1]

Affiliations:

[1] Huya Inc., Intelligent Media Technology Department, Guangzhou, People's Republic of China

Source:

2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME | 2023

Keywords:

Cover Song Identification; Contrastive Learning; Chunk Alignment; Conformer; Coarse-to-Fine Training;

DOI:

10.1109/ICME55011.2023.00189

Chinese Library Classification:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Cover Song Identification (CSI) focuses on finding different versions of the same music among reference anchors, given a query track. In this paper, we propose a novel system named CoverHunter that overcomes the shortcomings of existing detection schemes by exploring richer features with refined attention and alignments. CoverHunter contains three key modules: 1) a convolution-augmented transformer (i.e., Conformer) structure that captures both local and global feature interactions, in contrast to previous methods that rely mainly on convolutional neural networks; 2) an attention-based time pooling module that further exploits attention in the time dimension; 3) a novel coarse-to-fine training scheme that first trains a network to roughly align the song chunks and then refines the network by training on the aligned chunks. We also summarize some important training tricks used in our system to achieve better results. Experiments on several standard CSI datasets show that our method significantly improves over state-of-the-art methods with an embedding size of 128 (2.3% on SHS100K-TEST and 17.7% on DaTacos).
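To make the second module more concrete, the following is a minimal sketch of attention-based pooling over the time dimension in general, not the CoverHunter implementation. It assumes frame-level features arrive as a list of equal-length vectors and that each frame is scored by a single linear scorer; the names `attention_time_pooling`, `frames`, and `score_weights` are hypothetical and do not come from the paper.

```python
import math


def attention_time_pooling(frames, score_weights):
    """Collapse a (T x D) sequence of frame features into one D-dim vector.

    Illustrative assumption: each frame gets a scalar score from a linear
    scorer (dot product with score_weights); the paper's scorer may differ.
    """
    # One scalar relevance score per time step.
    scores = [sum(w * x for w, x in zip(score_weights, frame)) for frame in frames]

    # Numerically stable softmax over the time axis.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention-weighted average of the frames, one value per feature dimension.
    dim = len(frames[0])
    pooled = [
        sum(wt * frame[d] for wt, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return pooled, weights


if __name__ == "__main__":
    toy_frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    toy_scorer = [2.0, 0.0]  # hypothetical scorer that favors the first feature
    pooled, weights = attention_time_pooling(toy_frames, toy_scorer)
    print("attention weights:", weights)
    print("pooled embedding:", pooled)
```

Compared with plain mean pooling, the softmax weights let frames with higher scores contribute more to the fixed-size embedding; with an all-zero scorer the function reduces to a simple average.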

Pages: 1080-1085

Page count: 6