TFECN: Time-Frequency Enhanced ConvNet for Audio Classification

Times cited: 1
Authors
Wang, Mengwei [1, 2]
Yang, Zhe [1, 2]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Peoples R China
[2] Soochow Univ, Prov Key Lab Comp Informat Proc Technol, Suzhou, Peoples R China
Source
Funding
National Natural Science Foundation of China
Keywords
audio classification; large kernel ConvNet; transfer learning
DOI
10.21437/Interspeech.2023-734
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Recently, transformer-based models have shown leading performance in audio classification, gradually replacing the previously dominant ConvNets. However, some research has shown that certain characteristics and designs of transformers can be applied to other architectures, allowing them to achieve performance similar to that of transformers. In this paper, we introduce TFECN, a pure ConvNet that incorporates design elements from transformers and uses time-frequency enhanced convolution with large kernels. It provides a global receptive field along the frequency dimension and avoids the influence of convolution's shift-equivariance on the recognition of patterns that are not shift-invariant along the frequency axis. Furthermore, to use ImageNet-pretrained weights, we propose a method for transferring weights between kernels of different sizes. On the commonly used datasets AudioSet, FSD50K, and ESC50, our TFECN outperforms the models trained in the same
Pages: 281-285
Page count: 5
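
The abstract mentions transferring pretrained weights between kernels of different sizes but does not say how this is done. As an illustration only, and not the authors' method, one generic way to reuse a small pretrained kernel in a larger one is to place it at the center of a zero-filled larger kernel, which leaves the layer's output unchanged at initialization. The sketch below assumes odd kernel sizes and a (frequency, time) input layout; the 7x7 and 21x7 shapes and the function names are hypothetical.

```python
import numpy as np


def embed_kernel(small, large_shape):
    """Center a small kernel inside a zero-filled larger kernel.

    Hypothetical helper: a generic zero-padding transfer, not the
    method proposed in the paper.
    """
    kh, kw = small.shape
    big_h, big_w = large_shape
    if big_h < kh or big_w < kw or (big_h - kh) % 2 or (big_w - kw) % 2:
        raise ValueError("large kernel must exceed small one by an even margin")
    large = np.zeros(large_shape, dtype=small.dtype)
    top, left = (big_h - kh) // 2, (big_w - kw) // 2
    large[top:top + kh, left:left + kw] = small
    return large


def correlate_same(x, k):
    """Same-size 2D cross-correlation with zero padding (odd kernel sizes)."""
    kh, kw = k.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * k)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    small = rng.standard_normal((7, 7))        # stand-in for a pretrained kernel
    large = embed_kernel(small, (21, 7))       # enlarged along the frequency axis
    spec = rng.standard_normal((32, 16))       # toy (frequency, time) input
    same = np.allclose(correlate_same(spec, small), correlate_same(spec, large))
    print("outputs match:", same)
```

Because the added taps start at zero, the enlarged kernel initially behaves exactly like the pretrained one, and training can then grow the receptive field. Other transfer schemes, such as interpolating the kernel to the new size, trade this exact equivalence for non-zero initial weights across the whole kernel.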