Hierarchical Temporal Attention and Competent Teacher Network for Sound Event Detection

Authors
Zhang, Yihang [1]
Liang, Yun [1, 2]
Weng, Shitong [1, 3]
Lin, Hai [1]
Chen, Liping [1]
Zheng, Shenlong [1, 4]
Affiliations
[1] South China Agricultural University, Guangzhou, People's Republic of China
[2] Guangzhou Key Laboratory of Intelligent Agriculture, Guangzhou, People's Republic of China
[3] Zhejiang University, Hangzhou, Zhejiang, People's Republic of China
[4] Jinan University, Guangzhou, Guangdong, People's Republic of China
Keywords
sound event detection; hierarchical temporal attention; competent-teacher framework
DOI
10.1109/ICME57554.2024.10688275
Abstract
Sound event detection (SED) identifies occurrences of specific sounds in an audio signal, recognizing each event's class and localizing it in time. Although the Convolutional Recurrent Neural Network (CRNN) with a mean-teacher framework achieves strong SED performance, its small receptive field limits its use of global temporal information and leads to imprecise event boundary localization. Moreover, existing detectors overlook the interplay between temporal and frequency information, which compromises detection accuracy. They also neglect the interaction between the student and teacher models, so the teacher can convey inaccurate knowledge to the student. To address these challenges, this paper proposes a robust detector named HTA-CTD, which incorporates Hierarchical Temporal Attention (HTA) and a Competent Teacher Network (CTN). HTA introduces an adaptive temporal-frequency feature extraction method, while CTN minimizes reliance on strong labels. Experiments on challenging benchmarks show that HTA-CTD outperforms the state-of-the-art detector.
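As background on the mean-teacher framework mentioned in the abstract: in its standard formulation, the teacher's weights are an exponential moving average (EMA) of the student's weights, and a consistency loss pulls the student's predictions toward the teacher's. The sketch below illustrates only that generic baseline, not the paper's CTN, whose details the abstract does not give. The function names, the flat weight lists, and the `alpha` value are assumptions made for illustration.

```python
# Generic mean-teacher sketch (background only, NOT the paper's CTN).
# Weights are plain lists of floats; all names here are hypothetical.

def ema_update(teacher, student, alpha=0.999):
    """Return new teacher weights as an EMA of the student weights."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared error between student and teacher predictions."""
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n


# One illustrative step: the teacher drifts slowly toward the student.
teacher_weights = [0.0, 1.0]
student_weights = [1.0, 0.0]
teacher_weights = ema_update(teacher_weights, student_weights, alpha=0.9)

# Identical predictions give zero consistency loss.
loss_same = consistency_loss([0.5, 0.5], [0.5, 0.5])
loss_diff = consistency_loss([1.0, 0.0], [0.0, 0.0])
```

A larger `alpha` makes the teacher change more slowly. Because the teacher is only an average of past students, it can still pass on wrong predictions, which is the student-teacher interaction problem the abstract raises.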
Pages: 6