CoverHunter: Cover Song Identification with Refined Attention and Alignments

Cited by: 0
Authors
Liu, Feng [1]
Tuo, Deyi [1]
Xu, Yinan [1]
Han, Xintong [1]
Affiliations
[1] Huya Inc, Intelligent Media Technol Dept, Guangzhou, Peoples R China
Keywords
Cover Song Identification; Contrastive Learning; Chunk Alignment; Conformer; Coarse-to-Fine Training
DOI
10.1109/ICME55011.2023.00189
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Cover Song Identification (CSI) aims to find different versions of the same music among reference anchors, given a query track. In this paper, we propose a novel system named CoverHunter that overcomes the shortcomings of existing detection schemes by exploring richer features with refined attention and alignments. CoverHunter contains three key modules: 1) a convolution-augmented transformer (i.e., Conformer) structure that captures both local and global feature interactions, in contrast to previous methods that rely mainly on convolutional neural networks; 2) an attention-based time pooling module that further exploits attention in the time dimension; 3) a novel coarse-to-fine training scheme that first trains a network to roughly align the song chunks and then refines the network by training on the aligned chunks. We also summarize some important training tricks used in our system to achieve better results. Experiments on several standard CSI datasets show that our method significantly improves over state-of-the-art methods with an embedding size of 128 (2.3% on SHS100K-TEST and 17.7% on DaTacos).
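As an illustration of the second module only, attention-based pooling over the time dimension can be sketched as below. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function name `attention_time_pooling`, the single scoring vector `w`, and the random input are all hypothetical, and the abstract does not specify how the attention scores are actually computed.

```python
import numpy as np


def attention_time_pooling(frames, w):
    """Pool a (T, D) sequence of frame features into one (D,) vector.

    frames: array of shape (T, D), one feature vector per time step.
    w:      array of shape (D,), a hypothetical learned scoring vector
            (an assumption made for this sketch).
    Returns the pooled vector and the (T,) attention weights.
    """
    scores = frames @ w                 # one scalar score per frame, shape (T,)
    scores = scores - scores.max()      # shift for a numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum()   # softmax over time, sums to 1
    pooled = weights @ frames           # attention-weighted average over time
    return pooled, weights


# Hypothetical input: 50 frames of 128-dimensional features.
rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 128))
w = rng.standard_normal(128)

embedding, weights = attention_time_pooling(frames, w)
```

With an all-zero scoring vector the weights become uniform and the function reduces to plain mean pooling over time, which is the baseline that attention pooling generalizes.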
Pages: 1080 - 1085
Number of pages: 6
Related papers
50 in total
  • [21] Audio cover song identification based on tonal sequence alignment
    Serra, Joan
    Gomez, Emilia
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 61 - 64
  • [22] DisCover: Disentangled Music Representation Learning for Cover Song Identification
    Xun, Jiahao
    Zhang, Shengyu
    Yang, Yanting
    Zhu, Jieming
    Deng, Liqun
    Zhao, Zhou
    Dong, Zhenhua
    Li, Ruiqi
    Zhang, Lichao
    Wu, Fei
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 453 - 463
  • [23] FALCON: FAst Lucene-based Cover sOng identificatioN
    Department of Information Engineering, University of Padova, Padova, Italy
    MM - Proc. ACM Multimedia Int. Conf., : 1477 - 1480
  • [24] MUSIC FINGERPRINT EXTRACTION FOR CLASSICAL MUSIC COVER SONG IDENTIFICATION
    Kim, Samuel
    Unal, Erdem
    Narayanan, Shrikanth
    2008 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-4, 2008, : 1261 - 1264
  • [25] On Accuracy and Time Processing Evaluation of Cover Song Identification Systems
    Ferreira, Martha Dais
    Correa, Debora Cristina
    Grivet, Marcos Antonio
    dos Santos, Geovan Tavares
    de Mello, Rodrigo Fernandes
    Nonato, Luis Gustavo
    JOURNAL OF NEW MUSIC RESEARCH, 2016, 45 (04) : 333 - 342
  • [26] Dynamic chroma feature vectors with applications to cover song identification
    Kim, Samuel
    Narayanan, Shrikanth
    2008 IEEE 10TH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, VOLS 1 AND 2, 2008, : 988 - 991
  • [27] A code-based chromagram similarity for cover song identification
    Seo, Jin Soo
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2019, 38 (03) : 314 - 319
  • [28] Two-layer similarity fusion model for cover song identification
    Chen, Ning
    Li, Mingyu
    Xiao, Haidong
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2017,
  • [29] Content-Based Cover Song Identification in Music Digital Libraries
    Miotto, Riccardo
    Montecchio, Nicola
    Orio, Nicola
    DIGITAL LIBRARIES, 2010, 91 : 195 - 204
  • [30] Two-layer similarity fusion model for cover song identification
    Ning Chen
    Mingyu Li
    Haidong Xiao
    EURASIP Journal on Audio, Speech, and Music Processing, 2017