Audio-Based Music Classification with DenseNet and Data Augmentation

Cited by: 14
Authors
Bian, Wenhao [1 ,2 ]
Wang, Jie [2 ]
Zhuang, Bojin [2 ]
Yang, Jiankui [1 ]
Wang, Shaojun [2 ]
Xiao, Jing [2 ]
Affiliations
[1] Beijing Univ Posts & Telecommn, Beijing, Peoples R China
[2] Ping An Technol Shenzhen Co Ltd, Shenzhen, Peoples R China
Keywords
Music classification; Spectrogram; CNN; ResNet; DenseNet; Deep learning
DOI
10.1007/978-3-030-29894-4_5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, deep learning has received intense attention owing to its great success in image recognition. Its adoption has since spread to various information processing fields, including music information retrieval (MIR). In this paper, we conduct a comprehensive study on music audio classification with improved convolutional neural networks (CNNs). To the best of our knowledge, this is the first work to apply Densely Connected Convolutional Networks (DenseNet), which have been demonstrated to perform better than the Residual Neural Network (ResNet), to music audio tagging. Additionally, two data augmentation approaches, time overlapping and pitch shifting, are proposed to address the shortage of labelled data in MIR. Moreover, stacking-based ensemble learning with an SVM is employed. We believe that the proposed combination of DenseNet's strong representation and data augmentation can be adapted to other audio processing tasks.
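As an illustration of the time-overlapping augmentation named in the abstract, the sketch below cuts one signal into fixed-length segments whose start points are closer together than the segment length, so each clip yields more training examples. This is only one plausible reading of "time overlapping"; the abstract does not give the authors' procedure. The function name `overlapping_segments` and the parameters `segment_len` and `hop_len` are hypothetical, chosen for this example.

```python
import numpy as np


def overlapping_segments(signal, segment_len, hop_len):
    """Cut a 1-D signal into fixed-length segments that overlap in time.

    Hypothetical helper for illustration; not the paper's implementation.
    A hop_len smaller than segment_len makes consecutive segments overlap.
    """
    signal = np.asarray(signal)
    if segment_len <= 0 or hop_len <= 0:
        raise ValueError("segment_len and hop_len must be positive")
    n = len(signal)
    if n < segment_len:
        # Signal too short to yield even one full segment.
        return np.empty((0, segment_len), dtype=signal.dtype)
    # Start index of every full-length segment.
    starts = range(0, n - segment_len + 1, hop_len)
    return np.stack([signal[s:s + segment_len] for s in starts])


# Toy signal of 10 samples, segments of 4 samples with 50% overlap.
signal = np.arange(10, dtype=np.float32)
segments = overlapping_segments(signal, segment_len=4, hop_len=2)
print(segments.shape)
```

With a hop of half the segment length, roughly twice as many segments are produced as with non-overlapping windows, at the cost of correlated examples.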
Pages: 56-65
Number of pages: 10
Related papers
50 items in total
  • [41] Multi-view Neural Networks for Raw Audio-based Music Emotion Recognition
    He, Na
    Ferguson, Sam
    2020 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM 2020), 2020, : 168 - 172
  • [42] Audio-Based Video Genre Identification
    Rouvier, Mickael
    Oger, Stanislas
    Linares, Georges
    Matrouf, Driss
    Merialdo, Bernard
    Li, Yingbo
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (06) : 1031 - 1041
  • [43] AUDIO-BASED NONLINEAR VIDEO DIFFUSION
    Casanovas, Anna Llagostera
    Vandergheynst, Pierre
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 2486 - 2489
  • [44] Classification of music genre using data augmentation in neural network based on Sports universities data
    Sejong
    REVISTA DE PSICOLOGIA DEL DEPORTE, 2022, 31 (01) : 107 - 116
  • [45] AUDIO-BASED IDENTIFICATION OF BEEHIVE STATES
    Nolasco, Ines
    Terenzi, Alessandro
    Cecchi, Stefania
    Orcioni, Simone
    Bear, Helen L.
    Benetos, Emmanouil
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 8256 - 8260
  • [46] An audio-based personal memory aid
    Vemuri, S
    Schmandt, C
    Bender, W
    Tellex, S
    Lassey, B
    UBICOMP 2004: UBIQUITOUS COMPUTING, PROCEEDINGS, 2004, 3205 : 400 - 417
  • [47] Audio-based description and structuring of videos
    Harb H.
    Chen L.
    International Journal on Digital Libraries, 2006, 6 (1) : 70 - 81
  • [48] Combining audio-based and video-based shot classification systems for news videos segmentation
    De Santo, M
    Percannella, G
    Sansone, C
    Vento, M
    MULTIPLE CLASSIFIER SYSTEMS, 2005, 3541 : 397 - 406
  • [49] Audio-Based Hate Speech Classification from Online Short-Form Videos
    Ibanez, Michael
    Sapinit, Ranz
    Reyes, Lloyd Antonie
    Hussien, Mohammed
    Imperial, Joseph Marvin
    Rodriguez, Ramon
    2021 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2021, : 72 - 77
  • [50] Audio-based shot classification for audiovisual indexing using PCA, MGD and fuzzy algorithm
    Nitanda, Naoki
    Haseyama, Miki
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2007, E90A (08) : 1542 - 1548