Environmental Sound Classification via Time-Frequency Attention and Framewise Self-Attention-Based Deep Neural Networks

Cited by: 19
Authors
Wu, Bo [1]
Zhang, Xiao-Ping [1]
Affiliation
[1] Ryerson Univ, Dept Elect Comp & Biomed Engn, Toronto, ON M5B 2K3, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Time-frequency analysis; Spectrogram; Clutter; Internet of Things; Deep learning; Feature extraction; Acoustics; Deep neural networks (DNNs); discriminative feature fusion; environmental sound; framewise self-attention; time-frequency attention;
DOI
10.1109/JIOT.2021.3098464
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Environmental sound classification (ESC) is crucial to understanding the surroundings in Internet of Things (IoT) applications. State-of-the-art deep learning approaches perform poorly on ESC in the presence of various kinds of clutter interference, which is common in IoT scenarios. In this article, we present a novel deep neural network framework based on time-frequency attention and framewise self-attention (TFFS-DNN). It consists of two major novel architectures: 1) a gradient and latent feature-based DNN that generates our time-frequency attention, which can accurately locate the relevant time-frequency (i.e., spectral) features, and 2) a self-attention normalization DNN that generates our framewise self-attentions, which properly indicate the relevance of frames. By conjoining these two distinct and complementary kinds of attention with spectrograms, TFFS-DNN can identify the importance or relevance of the sounds in terms of time, frequency, and frame, which helps in distinguishing clutter such as background sounds and, to some extent, in interpreting the model. Thus, the proposed TFFS-DNN can classify environmental sounds with clutter. An evaluation on four real-world environmental sound data sets demonstrates the superior performance of the proposed framework over several state-of-the-art schemes. Notably, we achieve 79.23% classification accuracy on the UrbanSound data set, a raw environmental sound data set that is full of clutter. An ablation study demonstrates a relative 3%-9% improvement in classification accuracy by the proposed framework over the baseline deep model.
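The idea of conjoining the two attentions with a spectrogram can be illustrated with a minimal NumPy sketch. This is not the authors' TFFS-DNN: in the paper both attentions are produced by trained DNNs, whereas here the raw scores are random placeholders, and the function names (`softmax`, `apply_attentions`), the sigmoid squashing of the time-frequency scores, the softmax normalization over frames, and the elementwise product are all assumptions made only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def apply_attentions(spectrogram, tf_scores, frame_scores):
    """Weight a (freq_bins, frames) spectrogram by two hypothetical attentions.

    tf_scores:    (freq_bins, frames) raw scores, squashed into (0, 1) by a sigmoid.
    frame_scores: (frames,) raw scores, normalized to sum to 1 by a softmax.
    Both normalizations are illustrative assumptions, not taken from the paper.
    """
    tf_attention = 1.0 / (1.0 + np.exp(-tf_scores))
    frame_attention = softmax(frame_scores)
    # Broadcast the per-frame weights across all frequency bins.
    weighted = spectrogram * tf_attention * frame_attention[np.newaxis, :]
    return weighted, tf_attention, frame_attention


# Placeholder inputs: a random magnitude "spectrogram" and random scores.
rng = np.random.default_rng(0)
spec = np.abs(rng.normal(size=(64, 100)))
tf_scores = rng.normal(size=(64, 100))
frame_scores = rng.normal(size=100)

weighted, tf_att, frame_att = apply_attentions(spec, tf_scores, frame_scores)
print(weighted.shape)
```

In this sketch, time-frequency bins and frames with low scores are attenuated, which is the intuition behind suppressing clutter; how the scores are actually learned is described only in the paper itself.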
Pages: 3416-3428
Number of pages: 13
Related papers
50 in total
  • [21] hERG-Att: Self-attention-based deep neural network for predicting hERG blockers
    Kim, Hyunho
    Nam, Hojung
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2020, 87
  • [22] Self-Attention-Based Deep Learning Network for Regional Influenza Forecasting
    Jung, Seungwon
    Moon, Jaeuk
    Park, Sungwoo
    Hwang, Eenjun
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2022, 26 (02) : 922 - 933
  • [23] A Self-attention-based Ensemble Convolution Neural Network Approach for Sleep Stage Classification with Merged Spectrogram
    Kuo, Chih-En
    Liao, Po-Yu
    Lin, Yu-Syuan
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 1262 - 1268
  • [24] Self-Attention-Based Conditional Variational Auto-Encoder Generative Adversarial Networks for Hyperspectral Classification
    Chen, Zhitao
    Tong, Lei
    Qian, Bin
    Yu, Jing
    Xiao, Chuangbai
    REMOTE SENSING, 2021, 13 (16)
  • [25] Robust Sound Event Classification with Local Time-Frequency Information and Convolutional Neural Networks
    Yao, Yanli
    Yu, Qiang
    Wang, Longbiao
    Dang, Jianwu
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: TEXT AND TIME SERIES, PT IV, 2019, 11730 : 351 - 361
  • [26] Text Simplification with Self-Attention-Based Pointer-Generator Networks
    Li, Tianyu
    Li, Yun
    Qiang, Jipeng
    Yuan, Yun-Hao
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT V, 2018, 11305 : 537 - 545
  • [27] Spatiotemporal Self-Attention-Based LSTNet for Multivariate Time Series Prediction
    Wang, Dezheng
    Chen, Congyan
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2023, 2023
  • [28] Prediction of Short-Term Photovoltaic Power Via Self-Attention-Based Deep Learning Approach
    Li, Jie
    Niu, Huimeng
    Meng, Fanxi
    Li, Runran
    JOURNAL OF ENERGY RESOURCES TECHNOLOGY-TRANSACTIONS OF THE ASME, 2022, 144 (10)
  • [29] A Self-Attention-Based Deep Reinforcement Learning Approach for AGV Dispatching Systems
    Wei, Qinglai
    Yan, Yutian
    Zhang, Jie
    Xiao, Jun
    Wang, Cong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (06) : 7911 - 7922
  • [30] Hierarchical multimodal self-attention-based graph neural network for DTI prediction
    Bian, Jilong
    Lu, Hao
    Dong, Guanghui
    Wang, Guohua
    BRIEFINGS IN BIOINFORMATICS, 2024, 25 (04)