Audio-Based Music Classification with DenseNet and Data Augmentation

Cited by: 14
Authors
Bian, Wenhao [1, 2]
Wang, Jie [2]
Zhuang, Bojin [2]
Yang, Jiankui [1]
Wang, Shaojun [2]
Xiao, Jing [2]
Affiliations
[1] Beijing University of Posts and Telecommunications, Beijing, People's Republic of China
[2] Ping An Technology (Shenzhen) Co., Ltd., Shenzhen, People's Republic of China
Keywords
Music classification; Spectrogram; CNN; ResNet; DenseNet; Deep learning;
DOI
10.1007/978-3-030-29894-4_5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, deep learning techniques have received intense attention owing to their great success in image recognition. Deep learning is increasingly being adopted across information processing fields, including music information retrieval (MIR). In this paper, we conduct a comprehensive study on music audio classification with improved convolutional neural networks (CNNs). To the best of our knowledge, this is the first work to apply Densely Connected Convolutional Networks (DenseNet), which have been demonstrated to perform better than Residual Neural Networks (ResNet), to music audio tagging. Additionally, two data augmentation approaches, time overlapping and pitch shifting, are proposed to address the shortage of labelled data in MIR. Moreover, stacking-based ensemble learning with an SVM is employed. We believe that the proposed combination of DenseNet's strong representation and data augmentation can be adapted to other audio processing tasks.
Pages: 56-65
Number of pages: 10