Music genre classification based on fusing audio and lyric information

Authors
You Li
Zhihai Zhang
Han Ding
Liang Chang
Affiliations
[1] Guilin University of Electronic Technology, Guangxi Key Laboratory of Trusted Software
[2] Guilin University of Electronic Technology, School of Electronic Engineering and Automation
Keywords
Music genre classification; Audio information; Lyric information; Information fusion
Abstract
Music genre classification (MGC) has a wide range of applications. Traditional MGC methods consider either audio information or lyric information alone, which limits recognition performance. In this paper, we propose a multimodal music genre classification framework that integrates audio and lyric information. Exploiting the complementarity of the two modalities allows music genres to be represented more comprehensively. First, the framework computes the mel-spectrogram of the audio and uses a convolutional neural network to extract audio features. In parallel, BERT is used to obtain a distributed representation of the lyrics. The two modalities are then fused using different strategies, including feature-level and decision-level fusion. To address the large mismatch in convergence speed between the audio channel and the lyric channel, we start training the two channels asynchronously and assign them different learning rates. A series of experiments verifies the effectiveness of the proposed model. The proposed model achieves an F1 score of 0.87 for music genre classification, approximately 4% higher than the best baseline in the experiments.
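The two fusion strategies and the asynchronous training schedule named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the fusion weight, the learning-rate values, the start epoch, and the choice of which channel starts later are all hypothetical, since the abstract does not specify them. The feature vectors and class probabilities stand in for the outputs of the CNN and BERT channels.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def feature_level_fusion(audio_features, lyric_features):
    """Feature-level fusion: concatenate the two feature vectors.

    A joint classifier (not shown) would take the concatenated vector as input.
    """
    return list(audio_features) + list(lyric_features)


def decision_level_fusion(audio_probs, lyric_probs, audio_weight=0.5):
    """Decision-level fusion: weighted average of per-class probabilities.

    audio_weight is a hypothetical parameter, not a value from the paper.
    """
    if len(audio_probs) != len(lyric_probs):
        raise ValueError("both channels must score the same set of genres")
    w = audio_weight
    return [w * a + (1.0 - w) * l for a, l in zip(audio_probs, lyric_probs)]


def channel_learning_rates(epoch, audio_lr=1e-3, lyric_lr=2e-5, lyric_start_epoch=3):
    """Asynchronous start with per-channel learning rates.

    Assumption: the lyric channel stays frozen (learning rate 0) until
    lyric_start_epoch. The abstract does not say which channel starts later
    or which rates are used; all values here are placeholders.
    """
    lyric = lyric_lr if epoch >= lyric_start_epoch else 0.0
    return {"audio": audio_lr, "lyric": lyric}


if __name__ == "__main__":
    audio_probs = softmax([2.0, 0.5, 0.1])   # stand-in for the CNN channel output
    lyric_probs = softmax([0.3, 1.8, 0.2])   # stand-in for the BERT channel output
    print(decision_level_fusion(audio_probs, lyric_probs))
    print(feature_level_fusion([0.1, 0.2], [0.3, 0.4, 0.5]))
    print([channel_learning_rates(e) for e in range(5)])
```

In a real training loop, the dictionary returned by `channel_learning_rates` would be applied to separate optimizer parameter groups, one per channel, at the start of each epoch.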
Pages: 20157-20176
Page count: 19