AtResNet: Residual Atrous CNN with Multi-scale Feature Representation for Low Complexity Acoustic Scene Classification

Cited by: 0

Authors:

Madhu, Aswathy [1,3]

Suresh, K. [2,3]

Affiliations:

[1] Coll Engn, Dept Elect & Commun, Thiruvananthapuram 695016, Kerala, India

[2] Govt Engn Coll, Dept Elect & Commun, Wayanad 670644, Kerala, India

[3] APJ Abdul Kalam Technol Univ, Thiruvananthapuram, Kerala, India

Source:

CIRCUITS SYSTEMS AND SIGNAL PROCESSING | 2022 / Vol. 41 / No. 12

Keywords:

Low complexity ASC; Wavelet transform; Atrous CNN; Residual CNN; DCASE; CONVOLUTIONAL NEURAL-NETWORKS; DATA AUGMENTATION;

DOI:

10.1007/s00034-022-02107-2

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Acoustic Scene Classification (ASC) aims to categorize real-world audio into one of the predetermined classes that identifies the recording environment of the audio. State-of-the-art ASC algorithms have excellent performance in terms of accuracy due to the emergence of deep learning algorithms. In particular, Convolutional Neural Networks (CNN) have set a new benchmark in ASC due to their promising performance. Despite the emergence of new frameworks, the interest in ASC is growing progressively with a shift of focus from enhancing accuracy to reducing model complexity. In this work, we introduce the AtResNet, a residual atrous CNN for low complexity acoustic scene classification. The AtResNet utilizes dilated convolutions and residual connections to reduce the number of model parameters. To further enhance the performance of AtResNet, we introduce a multi-scale feature representation method called multi-scale mel spectrogram (ms2). To compute the ms2, we evaluate the mel spectrogram on the wavelet subbands of the signal. We assessed AtResNet with ms2 on three benchmark datasets in ASC. The results suggest that our method significantly outperformed the CNN-based techniques in addition to a baseline system based on log mel spectrum for signal representation. AtResNet offers a 28.73% reduction in the model parameters against a baseline CNN. Furthermore, the AtResNet has a model size of 81 KB with post-training quantization of network weights. It makes AtResNet suitable for deployment in context-aware devices.
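The two building blocks named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names, the 1-D setting, the kernel values, and the choice of a one-level Haar wavelet are all assumptions made for illustration; the abstract does not state the wavelet family, the layer shapes, or the mel settings. The sketch shows that a dilated (atrous) kernel covers a wider receptive field with the same number of weights, that a residual block adds the input back to the layer output, and that a wavelet transform splits a signal into subbands on which a mel spectrogram could then be computed.

```python
import math


def dilated_conv1d(x, kernel, dilation=1):
    """Zero-padded, length-preserving 1-D dilated (atrous) cross-correlation."""
    k = len(kernel)
    span = dilation * (k - 1)      # distance between first and last tap
    left = span // 2               # padding added on the left side
    padded = [0.0] * left + list(x) + [0.0] * (span - left)
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(x))
    ]


def residual_block(x, kernel, dilation=1):
    """y = x + relu(dilated_conv(x)): identity shortcut around one atrous layer."""
    conv = dilated_conv1d(x, kernel, dilation)
    return [xi + max(0.0, ci) for xi, ci in zip(x, conv)]


def receptive_field(kernel_size, dilation):
    """Input samples spanned by one dilated kernel; weight count stays kernel_size."""
    return dilation * (kernel_size - 1) + 1


def haar_subbands(x):
    """One-level Haar DWT (assumed wavelet): returns (approximation, detail)."""
    s = math.sqrt(2.0)
    pairs = [(x[i], x[i + 1]) for i in range(0, len(x) - 1, 2)]
    approx = [(a + b) / s for a, b in pairs]   # low-frequency subband
    detail = [(a - b) / s for a, b in pairs]   # high-frequency subband
    return approx, detail


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5]
    kernel = [1, 0, -1]            # hypothetical 3-tap kernel
    print(dilated_conv1d(signal, kernel, dilation=2))
    print(residual_block(signal, kernel, dilation=2))
    print([receptive_field(3, d) for d in (1, 2, 4)])
    print(haar_subbands([1, 1, 2, 2]))
```

With a 3-tap kernel, dilations 1, 2, and 4 span 3, 5, and 9 input samples respectively while the layer still holds three weights, which is the general reason dilation is attractive when the parameter budget is tight. In a multi-scale representation of the kind the abstract describes, each subband returned by a wavelet decomposition would be passed to a mel spectrogram routine; that step is omitted here because its parameters are not given in this record.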

Pages: 7035-7056

Page count: 22
