AtResNet: Residual Atrous CNN with Multi-scale Feature Representation for Low Complexity Acoustic Scene Classification

Cited by: 0
Authors
Madhu, Aswathy [1, 3]
Suresh, K. [2, 3]
Affiliations
[1] Coll Engn, Dept Elect & Commun, Thiruvananthapuram 695016, Kerala, India
[2] Govt Engn Coll, Dept Elect & Commun, Wayanad 670644, Kerala, India
[3] APJ Abdul Kalam Technol Univ, Thiruvananthapuram, Kerala, India
Keywords
Low complexity ASC; Wavelet transform; Atrous CNN; Residual CNN; DCASE; Convolutional neural networks; Data augmentation
DOI
10.1007/s00034-022-02107-2
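Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract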
Acoustic Scene Classification (ASC) assigns a real-world audio recording to one of a set of predetermined classes that identify the environment in which it was recorded. State-of-the-art ASC algorithms achieve high accuracy, largely owing to deep learning. In particular, Convolutional Neural Networks (CNNs) have set a new benchmark in ASC. Interest in ASC continues to grow, with the focus shifting from improving accuracy to reducing model complexity. In this work, we introduce AtResNet, a residual atrous CNN for low-complexity acoustic scene classification. AtResNet uses dilated convolutions and residual connections to reduce the number of model parameters. To further enhance the performance of AtResNet, we introduce a multi-scale feature representation called the multi-scale mel spectrogram (ms2), computed by evaluating the mel spectrogram on the wavelet subbands of the signal. We assessed AtResNet with ms2 on three benchmark ASC datasets. The results suggest that our method significantly outperforms CNN-based techniques as well as a baseline system that uses the log mel spectrum for signal representation. AtResNet offers a 28.73% reduction in model parameters relative to a baseline CNN. Furthermore, AtResNet has a model size of 81 KB after post-training quantization of the network weights, which makes it suitable for deployment in context-aware devices.
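The abstract's two architectural ingredients, dilated (atrous) convolution and a residual connection, can be illustrated with a minimal pure-Python sketch. This is not the AtResNet implementation: the 1-D setting, the kernel values, the dilation rate, and the function names (`dilated_conv1d`, `residual_block`, `receptive_field`) are all assumptions chosen for illustration. It shows how a dilated kernel covers a wider input span without adding weights, which is the general reason dilation helps keep parameter counts low.

```python
# Illustrative only: a 1-D dilated ("atrous") convolution with a residual
# connection. Kernel values, dilation rate, and input are hypothetical and
# are not taken from the AtResNet paper.

def dilated_conv1d(x, kernel, dilation):
    """Zero-padded, same-length 1-D cross-correlation with `dilation` spacing between taps."""
    k = len(kernel)
    span = (k - 1) * dilation              # distance between first and last tap
    pad = span // 2
    padded = [0.0] * pad + list(x) + [0.0] * (span - pad)
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(x))
    ]

def residual_block(x, kernel, dilation):
    """Add the input back onto the convolution output (skip connection)."""
    y = dilated_conv1d(x, kernel, dilation)
    return [a + b for a, b in zip(x, y)]

def receptive_field(kernel_size, dilation):
    """Number of input samples spanned by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1

signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 0.0, -1.0]                  # 3 weights regardless of dilation
out = residual_block(signal, kernel, dilation=2)
```

With a dilation of 2, the three-tap kernel spans five input samples while still holding only three weights; an undilated kernel would need five weights to cover the same span.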
Pages: 7035-7056
Page count: 22