TFECN: Time-Frequency Enhanced ConvNet for Audio Classification

Cited by: 1

Authors:

Wang, Mengwei [1,2]

Yang, Zhe [1,2]

Affiliations:

[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Peoples R China

[2] Soochow Univ, Prov Key Lab Comp Informat Proc Technol, Suzhou, Peoples R China

Source:

INTERSPEECH 2023 | 2023

Funding:

National Natural Science Foundation of China

Keywords:

audio classification; large kernel ConvNet; transfer learning

DOI:

10.21437/Interspeech.2023-734

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Recently, transformer-based models have shown leading performance in audio classification, gradually replacing the previously dominant ConvNets. However, some research has shown that certain characteristics and designs of transformers can be applied to other architectures, allowing them to achieve performance similar to that of transformers. In this paper, we introduce TFECN, a pure ConvNet that incorporates design elements from transformers and uses time-frequency enhanced convolution with large kernels. It provides a global receptive field along the frequency dimension and avoids the influence of convolution's shift-equivariance on the recognition of patterns that are not shift-invariant along the frequency axis. Furthermore, to use ImageNet-pretrained weights, we propose a method for transferring weights between kernels of different sizes. On the commonly used datasets AudioSet, FSD50K, and ESC50, our TFECN outperforms the models trained in the same


Pages: 281-285

Number of pages: 5
