TFECN: Time-Frequency Enhanced ConvNet for Audio Classification

Times cited: 1
Authors
Wang, Mengwei [1, 2]
Yang, Zhe [1, 2]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Peoples R China
[2] Soochow Univ, Prov Key Lab Comp Informat Proc Technol, Suzhou, Peoples R China
Source
INTERSPEECH 2023
Funding
National Natural Science Foundation of China;
Keywords
audio classification; large kernel ConvNet; transfer learning;
DOI
10.21437/Interspeech.2023-734
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Recently, transformer-based models have shown leading performance in audio classification, gradually replacing ConvNets, which previously dominated the field. However, some studies have shown that certain characteristics and designs of transformers can be applied to other architectures, enabling them to achieve performance similar to that of transformers. In this paper, we introduce TFECN, a pure ConvNet that incorporates transformer design elements and uses time-frequency enhanced convolution with large kernels. This design provides a global receptive field along the frequency dimension and avoids the adverse effect of convolution's shift equivariance on recognizing patterns that are not shift-invariant along the frequency axis. Furthermore, to use ImageNet-pretrained weights, we propose a method for transferring weights between kernels of different sizes. On the commonly used datasets AudioSet, FSD50K, and ESC50, our TFECN outperforms the models trained in the same …
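The abstract's point about shift equivariance along the frequency axis can be illustrated with a toy NumPy sketch. It does not reproduce TFECN's time-frequency enhanced convolution: the helpers `small_kernel_conv` and `full_frequency_kernel` are hypothetical, and modeling a kernel that spans the whole frequency axis as a per-frame weighted sum over all bins is an assumption made only for illustration. The sketch places the same 3-bin pattern at two frequency positions and compares the responses.

```python
import numpy as np


def small_kernel_conv(x, k):
    """Hypothetical helper: 'valid' cross-correlation along frequency.

    x: spectrogram-like array of shape (freq, time).
    k: 1-D kernel of length K, much shorter than the frequency axis.
    Returns an array of shape (freq - K + 1, time).
    """
    K = len(k)
    F = x.shape[0]
    # Slide the same kernel over every frequency offset (shared weights),
    # which is what makes the operation shift-equivariant.
    return np.stack([(x[i:i + K] * k[:, None]).sum(axis=0) for i in range(F - K + 1)])


def full_frequency_kernel(x, w):
    """Hypothetical helper: a kernel covering all frequency bins, no padding.

    x: array of shape (freq, time); w: weights of shape (channels, freq).
    Each weight w[c, f] is tied to one specific bin f, so the response
    depends on where a pattern sits on the frequency axis.
    Returns an array of shape (channels, time).
    """
    return w @ x


rng = np.random.default_rng(0)
F, T = 16, 4
pattern = rng.standard_normal((3, T))   # a 3-bin spectral pattern
x_low = np.zeros((F, T))
x_low[2:5] = pattern                    # pattern at bins 2..4
x_high = np.zeros((F, T))
x_high[8:11] = pattern                  # same pattern, 6 bins higher
k = rng.standard_normal(3)              # small shared kernel
w = rng.standard_normal((4, F))         # frequency-spanning weights

y_low, y_high = small_kernel_conv(x_low, k), small_kernel_conv(x_high, k)
z_low, z_high = full_frequency_kernel(x_low, w), full_frequency_kernel(x_high, w)

print(np.allclose(y_high[6:], y_low[:-6]))  # True: output is just shifted
print(np.allclose(z_low, z_high))           # False: response depends on position
```

Shifting the pattern by six bins only shifts the small-kernel output by six rows, so that layer alone cannot tell where on the frequency axis the pattern occurs. The frequency-spanning weights respond differently to the two positions, which is the kind of position sensitivity the abstract argues matters for patterns that are not shift-invariant along frequency.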
Pages: 281-285
Number of pages: 5
Related papers (50 in total)
  • [21] Audio watermarking using time-frequency characteristics
    Esmaili, S
    Krishnan, S
    Raahemifar, K
    CANADIAN JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING-REVUE CANADIENNE DE GENIE ELECTRIQUE ET INFORMATIQUE, 2003, 28 (02): 57-61
  • [22] Audio denoising by time-frequency block thresholding
    Yu, Guoshen
    Mallat, Stephane
    Bacry, Emmanuel
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2008, 56 (05): 1830-1839
  • [23] Evaluation of audio compandors in the time-frequency domain
    Skritek, P
    Hlawatsch, F
    JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 1986, 34 (05): 386
  • [24] Time-frequency algorithm of audio signal compression
    Rabinovich, E. V.
    Shekhirev, A. V.
    APEIE-2006 8TH INTERNATIONAL CONFERENCE ON ACTUAL PROBLEMS OF ELECTRONIC INSTRUMENT ENGINEERING PROCEEDINGS, VOL 1, 2006: 147 ff.
  • [25] Persistent Time-Frequency Shrinkage for Audio Denoising
    Siedenburg, Kai
    Doerfler, Monika
    JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 2013, 61 (1-2): 29-38
  • [26] Time-frequency domain fast audio transcoding
    Ju, Fu-Shing
    Fang, Ce-Min
    ISM 2006: EIGHTH IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA, PROCEEDINGS, 2006: 750-753
  • [27] Audio Signal Processing Using Time-Frequency Approaches: Coding, Classification, Fingerprinting, and Watermarking
    Umapathy, K
    Ghoraani, B
    Krishnan, S
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2010
  • [29] Dual Stage Learning Based Dynamic Time-Frequency Mask Generation For Audio Event Classification
    Kim, Donghyeon
    Park, Jaihyun
    Han, David K.
    Ko, Hanseok
    INTERSPEECH 2020, 2020: 836-840
  • [30] Time-frequency filters for target classification
    Chevret, P
    Gache, N
    Zimpfer, V
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1999, 106 (04): 1829-1837