AtResNet: Residual Atrous CNN with Multi-scale Feature Representation for Low Complexity Acoustic Scene Classification

Cited by: 0

Authors:

Madhu, Aswathy [1,3]

Suresh, K. [2,3]

Affiliations:

[1] Coll Engn, Dept Elect & Commun, Thiruvananthapuram 695016, Kerala, India

[2] Govt Engn Coll, Dept Elect & Commun, Wayanad 670644, Kerala, India

[3] APJ Abdul Kalam Technol Univ, Thiruvananthapuram, Kerala, India

Source:

CIRCUITS SYSTEMS AND SIGNAL PROCESSING | 2022 / Vol. 41 / No. 12

Keywords:

Low complexity ASC; Wavelet transform; Atrous CNN; Residual CNN; DCASE; CONVOLUTIONAL NEURAL-NETWORKS; DATA AUGMENTATION;

DOI:

10.1007/s00034-022-02107-2

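Chinese Library Classification:

TM [Electrical technology]; TN [Electronic technology, communication technology];

Discipline classification codes: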

0808 ; 0809 ;

Abstract:

Acoustic Scene Classification (ASC) aims to categorize real-world audio into one of a set of predetermined classes that identify the recording environment of the audio. State-of-the-art ASC algorithms achieve excellent accuracy owing to the emergence of deep learning algorithms. In particular, Convolutional Neural Networks (CNN) have set a new benchmark in ASC due to their promising performance. Even as new frameworks emerge, interest in ASC continues to grow, with the focus shifting from enhancing accuracy to reducing model complexity. In this work, we introduce the AtResNet, a residual atrous CNN for low complexity acoustic scene classification. The AtResNet utilizes dilated convolutions and residual connections to reduce the number of model parameters. To further enhance the performance of AtResNet, we introduce a multi-scale feature representation method called multi-scale mel spectrogram (ms2). To compute the ms2, we evaluate the mel spectrogram on the wavelet subbands of the signal. We assessed AtResNet with ms2 on three benchmark datasets in ASC. The results suggest that our method significantly outperformed CNN-based techniques as well as a baseline system that uses the log mel spectrum for signal representation. AtResNet reduces the number of model parameters by 28.73% relative to a baseline CNN. Furthermore, the AtResNet has a model size of 81 KB with post-training quantization of network weights. This makes AtResNet suitable for deployment in context-aware devices.
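The abstract's point that dilated (atrous) convolutions and residual connections help keep the parameter count low can be illustrated with a minimal sketch. This is not the authors' AtResNet architecture: it is a single-channel, 1-D toy in pure Python, and the function names (`effective_kernel_size`, `dilated_conv1d`, `residual_atrous_block`) are hypothetical, introduced only for illustration.

```python
from typing import List


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in input samples) covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(x: List[float], weights: List[float], dilation: int) -> List[float]:
    """Zero-padded, same-length 1-D dilated convolution (cross-correlation form)."""
    span = effective_kernel_size(len(weights), dilation)
    pad = (span - 1) // 2  # left padding; the remainder goes on the right
    padded = [0.0] * pad + list(x) + [0.0] * (span - 1 - pad)
    # Each output sample combines inputs spaced `dilation` samples apart.
    return [
        sum(w * padded[i + j * dilation] for j, w in enumerate(weights))
        for i in range(len(x))
    ]


def residual_atrous_block(x: List[float], weights: List[float], dilation: int) -> List[float]:
    """Toy residual block (assumed form): y = x + ReLU(dilated_conv(x))."""
    conv = dilated_conv1d(x, weights, dilation)
    return [xi + max(0.0, ci) for xi, ci in zip(x, conv)]


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, 5.0]
    kernel = [1.0, 0.0, -1.0]  # 3 weights, arbitrary illustrative values
    print(effective_kernel_size(3, 2))
    print(residual_atrous_block(signal, kernel, dilation=2))
```

With a dilation of 2, the three-weight kernel spans five input samples, so the receptive field grows without adding weights. In two dimensions the same idea lets a 3x3 kernel (9 weights) cover the area of a dense 5x5 kernel (25 weights). The identity skip connection adds no parameters of its own.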

Pages: 7035-7056

Page count: 22
