Hierarchical Temporal Attention and Competent Teacher Network for Sound Event Detection

Cited by: 0
Authors
Zhang, Yihang [1]
Liang, Yun [1, 2]
Weng, Shitong [1, 3]
Lin, Hai [1]
Chen, Liping [1]
Zheng, Shenlong [1, 4]
Affiliations
[1] South China Agr Univ, Guangzhou, Peoples R China
[2] Guangzhou Key Lab Intelligent Agr, Guangzhou, Peoples R China
[3] Zhejiang Univ, Hangzhou, Zhejiang, Peoples R China
[4] Jinan Univ, Guangzhou, Guangdong, Peoples R China
Keywords
sound event detection; hierarchical temporal attention; competent-teacher framework
DOI
10.1109/ICME57554.2024.10688275
Abstract
Sound event detection (SED) identifies occurrences of specific sounds in an audio signal, recognizing both the class of each sound event and its temporal location. Although the Convolutional Recurrent Neural Network (CRNN) with a mean-teacher framework achieves strong SED performance, its small receptive field limits its use of global temporal information and leads to imprecise event boundary localization. Moreover, existing detectors overlook the interplay between temporal and frequency information, which compromises detection accuracy. They also neglect the interaction between the student and teacher models, so the teacher can convey inaccurate knowledge to the student. To address these challenges, this paper proposes a robust detector named HTA-CTD, which incorporates Hierarchical Temporal Attention (HTA) and a Competent Teacher Network (CTN). HTA introduces an adaptive temporal-frequency feature extraction method, while CTN minimizes reliance on strong labels. Experiments on challenging benchmarks show that HTA-CTD outperforms the state-of-the-art detector.
Pages: 6
Related papers
50 in total
  • [21] Document Embedding Enhanced Event Detection with Hierarchical and Supervised Attention
    Zhao, Yue
    Jin, Xiaolong
    Wang, Yuanzhuo
    Cheng, Xueqi
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2, 2018, : 414 - 419
  • [22] Event Detection using Hierarchical Multi-Aspect Attention
    Mehta, Sneha
    Islam, Mohammad Raihanul
    Rangwala, Huzefa
    Ramakrishnan, Naren
    WEB CONFERENCE 2019: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2019), 2019, : 3079 - 3085
  • [23] Document-Improved Hierarchical Modular Attention for Event Detection
    Ni, Yiwei
    Du, Qingfeng
    Xu, Jincheng
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT (KSEM 2020), PT II, 2020, 12275 : 325 - 335
  • [24] FA3-Net: feature aggregation and augmentation with attention network for sound event localization and detection
    Wang, Chuan
    Huang, Qinghua
    APPLIED INTELLIGENCE, 2025, 55 (06)
  • [25] A hybrid attention hierarchical network-based extreme event detection method for structural health monitoring
    Pan, Qiuyue
    Bao, Yuequan
    STRUCTURAL HEALTH MONITORING-AN INTERNATIONAL JOURNAL, 2025,
  • [26] Hierarchical graph attention network for temporal knowledge graph reasoning
    Shao, Pengpeng
    He, Jiayi
    Li, Guanjun
    Zhang, Dawei
    Tao, Jianhua
    NEUROCOMPUTING, 2023, 550
  • [27] ConvTransformer Attention Network for temporal action detection
    Cui, Di
    Xin, Chang
    Wu, Lifang
    Wang, Xiangdong
    KNOWLEDGE-BASED SYSTEMS, 2024, 300
  • [28] CONNECTIONIST TEMPORAL LOCALIZATION FOR SOUND EVENT DETECTION WITH SEQUENTIAL LABELING
    Wang, Yun
    Metze, Florian
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 745 - 749
  • [29] Enhance Temporal Relations in Audio Captioning with Sound Event Detection
    Xie, Zeyu
    Xu, Xuenan
    Wu, Mengyue
    Yu, Kai
    INTERSPEECH 2023, 2023, : 4179 - 4183
  • [30] ABNORMAL SOUND EVENT DETECTION USING TEMPORAL TRAJECTORIES MIXTURES
    Chakrabarty, Debmalya
    Elhilali, Mounya
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 216 - 220