Audio-Based Music Classification with DenseNet and Data Augmentation

Cited by: 14
Authors
Bian, Wenhao [1, 2]
Wang, Jie [2]
Zhuang, Bojin [2]
Yang, Jiankui [1]
Wang, Shaojun [2]
Xiao, Jing [2]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
[2] Ping An Technol Shenzhen Co Ltd, Shenzhen, Peoples R China
Keywords
Music classification; Spectrogram; CNN; ResNet; DenseNet; Deep learning
DOI
10.1007/978-3-030-29894-4_5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, deep learning techniques have received intense attention owing to their great success in image recognition, and they are increasingly being adopted across information processing fields, including music information retrieval (MIR). In this paper, we conduct a comprehensive study of music audio classification with improved convolutional neural networks (CNNs). To the best of our knowledge, this is the first work to apply Densely Connected Convolutional Networks (DenseNet) to music audio tagging, and DenseNet is demonstrated to perform better than the Residual Neural Network (ResNet). Additionally, we propose two data augmentation approaches, time overlapping and pitch shifting, to address the shortage of labelled data in MIR. Moreover, an SVM-based stacking ensemble is employed. We believe that the proposed combination of DenseNet's strong representational power and data augmentation can be adapted to other audio processing tasks.
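The abstract does not spell out how the time-overlapping augmentation works. The sketch below is a minimal illustration under a stated assumption: that it means cutting each labelled clip (or its spectrogram, viewed as a sequence of time frames) into fixed-length windows whose start points are `hop` frames apart, so neighbouring windows overlap and each window inherits the clip's label. The names `overlapping_segments` and `augment_clip` and the parameters `seg_len` and `hop` are hypothetical, not taken from the paper; pitch shifting, the DenseNet model, and the SVM stacking stage are not shown.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def overlapping_segments(frames: Sequence[T], seg_len: int, hop: int) -> List[List[T]]:
    """Cut a time-ordered sequence (e.g. spectrogram frames) into windows of
    seg_len frames whose starts are hop frames apart (hypothetical helper).

    With hop < seg_len, consecutive windows share seg_len - hop frames.
    Trailing frames that do not fill a complete window are dropped.
    """
    if seg_len <= 0 or hop <= 0:
        raise ValueError("seg_len and hop must be positive")
    if hop > seg_len:
        raise ValueError("hop > seg_len would leave gaps instead of overlaps")
    return [
        list(frames[start:start + seg_len])
        for start in range(0, len(frames) - seg_len + 1, hop)
    ]


def augment_clip(frames: Sequence[T], label: str, seg_len: int, hop: int) -> List[Tuple[List[T], str]]:
    """Assumption: every overlapping window keeps the label of its source clip."""
    return [(seg, label) for seg in overlapping_segments(frames, seg_len, hop)]


if __name__ == "__main__":
    clip = list(range(10))  # stand-in for 10 spectrogram frames
    # hop=2 yields 4 windows; non-overlapping cutting (hop=4) yields only 2.
    for seg, lbl in augment_clip(clip, "rock", seg_len=4, hop=2):
        print(lbl, seg)
```

Halving the hop roughly doubles the number of training windows drawn from the same labelled audio, which is one plausible way such overlapping could ease the shortage of labelled data the abstract mentions.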
Pages: 56-65
Number of pages: 10
Related papers (50 in total)
  • [1] A Survey of Audio-Based Music Classification and Annotation
    Fu, Zhouyu
    Lu, Guojun
    Ting, Kai Ming
    Zhang, Dengsheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2011, 13 (02) : 303 - 319
  • [2] AUDIO-BASED CLASSIFICATION OF SPEAKER CHARACTERISTICS
    Dutta, Promiti
    Haubold, Alexander
    ICME: 2009 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-3, 2009, : 422 - 425
  • [3] Visualization in audio-based music information retrieval
    Cooper, Matthew
    Foote, Jonathan
    Pampalk, Elias
    Tzanetakis, George
    COMPUTER MUSIC JOURNAL, 2006, 30 (02) : 42 - 62
  • [4] Audio-based Deep Music Emotion Recognition
    Liu, Tong
    Han, Li
    Ma, Liangkai
    Guo, Dongwei
    6TH INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN, MANUFACTURING, MODELING AND SIMULATION (CDMMS 2018), 2018, 1967
  • [5] AUDIO-BASED DETECTION OF EXPLICIT CONTENT IN MUSIC
    Vaglio, Andrea
    Hennequin, Romain
    Moussallam, Manuel
    Richard, Gael
    d'Alche-Buc, Florence
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 526 - 530
  • [6] Robust Audio-based Classification of Video Genre
    Rouvier, Mickael
    Linares, Georges
    Matrouf, Driss
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1155 - 1158
  • [7] Music Popularity: Metrics, Characteristics, and Audio-Based Prediction
    Lee, Junghyuk
    Lee, Jong-Seok
    IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (11) : 3173 - 3182
  • [8] Visual music transcription of clarinet video recordings trained with audio-based labelled data
    Zinemanas, Pablo
    Arias, Pablo
    Haro, Gloria
    Gomez, Emilia
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2017), 2017, : 463 - 470
  • [9] Automated Data Augmentation for Audio Classification
    Sun, Yanjie
    Xu, Kele
    Liu, Chaorun
    Dou, Yong
    Wang, Huaimin
    Ding, Bo
    Pan, Qinghua
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2716 - 2728
  • [10] Audio-Based Semantic Concept Classification for Consumer Video
    Lee, Keansub
    Ellis, Daniel P. W.
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (06) : 1406 - 1416