AtResNet: Residual Atrous CNN with Multi-scale Feature Representation for Low Complexity Acoustic Scene Classification

Cited by: 0
Authors
Madhu, Aswathy [1, 3]
Suresh, K. [2, 3]
Affiliations
[1] Coll Engn, Dept Elect & Commun, Thiruvananthapuram 695016, Kerala, India
[2] Govt Engn Coll, Dept Elect & Commun, Wayanad 670644, Kerala, India
[3] APJ Abdul Kalam Technol Univ, Thiruvananthapuram, Kerala, India
Keywords
Low complexity ASC; Wavelet transform; Atrous CNN; Residual CNN; DCASE; CONVOLUTIONAL NEURAL-NETWORKS; DATA AUGMENTATION;
DOI
10.1007/s00034-022-02107-2
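Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline code
0808; 0809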
Abstract
Acoustic Scene Classification (ASC) aims to assign a real-world audio recording to one of a set of predefined classes that identify the environment in which it was recorded. Owing to deep learning, state-of-the-art ASC algorithms achieve excellent accuracy, and Convolutional Neural Networks (CNNs) in particular have set a new benchmark in ASC. Interest in ASC continues to grow, but the focus is shifting from improving accuracy to reducing model complexity. In this work, we introduce AtResNet, a residual atrous CNN for low-complexity acoustic scene classification. AtResNet uses dilated convolutions and residual connections to reduce the number of model parameters. To further improve its performance, we introduce a multi-scale feature representation called the multi-scale mel spectrogram (ms2), which is computed by evaluating the mel spectrogram on the wavelet subbands of the signal. We evaluated AtResNet with ms2 on three benchmark ASC datasets. The results suggest that our method significantly outperforms other CNN-based techniques as well as a baseline system that uses the log-mel spectrogram as the signal representation. AtResNet has 28.73% fewer parameters than a baseline CNN. Furthermore, with post-training quantization of the network weights, AtResNet has a model size of 81 KB, which makes it suitable for deployment on context-aware devices.
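The abstract describes ms2 only as "the mel spectrogram evaluated on the wavelet subbands of the signal". The sketch below illustrates that idea under loudly stated assumptions: it uses a Haar wavelet, two decomposition levels, a Hann-windowed STFT (n_fft=512, hop=256) and 40 mel bands, and it treats each decimated subband as a signal at its own reduced sampling rate. None of these settings come from the paper, and the function names (`haar_dwt`, `multiscale_mel`, etc.) are hypothetical. It uses only numpy so it stays self-contained; PyWavelets (`pywt.wavedec`) and librosa (`librosa.feature.melspectrogram`) offer library versions of the two building blocks.

```python
import numpy as np


def haar_dwt(x, levels):
    """Multi-level Haar DWT (assumed wavelet). Returns [cA_L, cD_L, ..., cD_1]."""
    details = []
    approx = np.asarray(x, dtype=np.float64)
    for _ in range(levels):
        if len(approx) % 2:  # pad odd-length input by repeating the last sample
            approx = np.append(approx, approx[-1])
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # high-pass (detail) band
        approx = (even + odd) / np.sqrt(2.0)         # low-pass (approximation) band
    return [approx] + details[::-1]


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters spanning 0 Hz to the Nyquist frequency."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def stft_power(x, n_fft, hop):
    """Power spectrogram with a Hann window; shape (frames, n_fft // 2 + 1)."""
    if len(x) < n_fft:
        x = np.pad(x, (0, n_fft - len(x)))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def log_mel(x, sr, n_fft, hop, n_mels):
    """Log-mel spectrogram; shape (n_mels, frames)."""
    mel = stft_power(x, n_fft, hop) @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10).T


def multiscale_mel(x, sr, levels=2, n_fft=512, hop=256, n_mels=40):
    """Hypothetical ms2 sketch: one log-mel spectrogram per wavelet subband."""
    subbands = haar_dwt(x, levels)
    # Each decimated subband is treated as a signal at sr / 2**level (assumption).
    rates = [sr / 2 ** levels] + [sr / 2 ** j for j in range(levels, 0, -1)]
    return [log_mel(b, r, n_fft, hop, n_mels) for b, r in zip(subbands, rates)]
```

Returning a list keeps subbands with different frame counts separate. Combining them into a single CNN input (for example, by resizing or choosing per-subband hop lengths before stacking them as channels) would require a further design decision that the abstract does not describe.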
Pages: 7035-7056
Number of pages: 22