AtResNet: Residual Atrous CNN with Multi-scale Feature Representation for Low Complexity Acoustic Scene Classification

Cited by: 0
Authors
Madhu, Aswathy [1, 3]
Suresh, K. [2, 3]
Affiliations
[1] Coll Engn, Dept Elect & Commun, Thiruvananthapuram 695016, Kerala, India
[2] Govt Engn Coll, Dept Elect & Commun, Wayanad 670644, Kerala, India
[3] APJ Abdul Kalam Technol Univ, Thiruvananthapuram, Kerala, India
Keywords
Low complexity ASC; Wavelet transform; Atrous CNN; Residual CNN; DCASE; Convolutional neural networks; Data augmentation
DOI
10.1007/s00034-022-02107-2
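Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract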
Acoustic Scene Classification (ASC) aims to categorize real-world audio into one of a set of predetermined classes that identify the recording environment. State-of-the-art ASC algorithms achieve excellent accuracy owing to deep learning. In particular, Convolutional Neural Networks (CNNs) have set a new benchmark in ASC. Interest in ASC continues to grow, with the focus shifting from improving accuracy to reducing model complexity. In this work, we introduce AtResNet, a residual atrous CNN for low-complexity acoustic scene classification. AtResNet uses dilated convolutions and residual connections to reduce the number of model parameters. To further enhance its performance, we introduce a multi-scale feature representation called the multi-scale mel spectrogram (ms2). To compute the ms2, we evaluate the mel spectrogram on the wavelet subbands of the signal. We assessed AtResNet with ms2 on three benchmark ASC datasets. The results suggest that our method significantly outperformed CNN-based techniques as well as a baseline system that uses the log mel spectrum for signal representation. AtResNet offers a 28.73% reduction in model parameters relative to a baseline CNN. Furthermore, with post-training quantization of the network weights, AtResNet has a model size of 81 KB, which makes it suitable for deployment in context-aware devices.
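The abstract attributes the parameter savings to dilated (atrous) convolutions combined with residual connections. The following is a minimal sketch of that general idea, not the authors' architecture: it is a 1-D toy in pure Python, whereas the network described in the paper operates on spectrogram features, and the function names (`dilated_conv1d`, `residual_atrous_block`, `receptive_field`) are hypothetical. It illustrates that increasing the dilation rate widens the receptive field while the number of kernel weights stays the same.

```python
def receptive_field(kernel_size, dilation):
    """Number of input samples spanned by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def dilated_conv1d(x, kernel, dilation=1):
    """Zero-padded 1-D dilated convolution (cross-correlation form).

    Assumes an odd kernel length so the output has the same length as x.
    """
    k = len(kernel)
    pad = receptive_field(k, dilation) // 2  # zeros added on each side
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(x))
    ]


def residual_atrous_block(x, kernel, dilation):
    """Dilated convolution, then ReLU, then a skip connection back to the input."""
    y = dilated_conv1d(x, kernel, dilation)
    y = [max(0.0, v) for v in y]          # ReLU non-linearity
    return [a + b for a, b in zip(x, y)]  # residual (identity) connection


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, 5.0]
    kernel = [1.0, 1.0, 1.0]  # three weights regardless of dilation
    for d in (1, 2, 4):
        print(d, len(kernel), receptive_field(len(kernel), d))
    print(residual_atrous_block(signal, kernel, dilation=2))
```

With a 3-tap kernel, dilation rates of 1, 2, and 4 span 3, 5, and 9 input samples respectively, always with three weights. A standard convolution would need a longer kernel, and therefore more parameters, to cover the same context.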
Pages: 7035 - 7056
Page count: 22
Related papers
50 in total
  • [21] MCANet: multi-scale contextual feature fusion network based on Atrous convolution
    Li, Ke
    Liu, ZhanDong
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (22) : 34679 - 34702
  • [22] Feature ensemble network for medical image segmentation with multi-scale atrous transformer
    Gai, Di
    Geng, Yuhan
    Huang, Xia
    Huang, Zheng
    Xiong, Xin
    Zhou, Ruihua
    Wang, Qi
    IET IMAGE PROCESSING, 2024, 18 (11) : 3082 - 3092
  • [23] Short-time acoustic scene recognition method using multi-scale feature fusion
    Wang, Meng
    Zhang, Pengyuan
    Shengxue Xuebao/Acta Acustica, 2022, 47 (06) : 717 - 726
  • [24] MULTI-SCALE RESIDUAL NETWORK FOR IMAGE CLASSIFICATION
    Zhong, Xian
    Gong, Oubo
    Huang, Wenxin
    Yuan, Jingling
    Ma, Bo
    Li, Ryan Wen
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 2023 - 2027
  • [25] Integrating Multi-Scale Feature Boundary Module and Feature Fusion With CNN for Accurate Skin Cancer Segmentation and Classification
    Malaiarasan, S.
    Ravi, R.
    INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2024, 34 (05)
  • [26] Facial Expression Image Classification Based on Multi-scale Feature Fusion Residual Network
    Zhao, Yuxi
    Wang, Chunzhi
    Zhou, Xianjing
    Liu, Hu
    Communications in Computer and Information Science, 2023, 1811 CCIS : 105 - 118
  • [27] Spectral Segmentation Multi-Scale Feature Extraction Residual Networks for Hyperspectral Image Classification
    Wang, Jiamei
    Ren, Jiansi
    Peng, Yinbin
    Shi, Meilin
    REMOTE SENSING, 2023, 15 (17)
  • [28] Multi-scale pulmonary nodule classification with deep feature fusion via residual network
    Zhang, G.
    Zhu, D.
    Liu, X.
    Chen, M.
    Itti, L.
    Luo, Y.
    Lu, J.
    Journal of Ambient Intelligence and Humanized Computing, 2023, 14 (11) : 14829 - 14840
  • [29] Acoustic scene classification using deep CNN with fine-resolution feature
    Zhang, Tao
    Liang, Jinhua
    Ding, Biyun
    EXPERT SYSTEMS WITH APPLICATIONS, 2020, 143
  • [30] A CNN-Based Feature Pyramid Segmentation Strategy for Acoustic Scene Classification
    Xi, Ji
    Xie, Yue
    Jiang, Pengxu
    Jiang, Wei
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2024, E107D (08) : 1093 - 1096