Hierarchical Temporal Attention and Competent Teacher Network for Sound Event Detection

Cited by: 0
Authors
Zhang, Yihang [1]
Liang, Yun [1,2]
Weng, Shitong [1,3]
Lin, Hai [1]
Chen, Liping [1]
Zheng, Shenlong [1,4]
Affiliations
[1] South China Agricultural University, Guangzhou, People's Republic of China
[2] Guangzhou Key Laboratory of Intelligent Agriculture, Guangzhou, People's Republic of China
[3] Zhejiang University, Hangzhou, Zhejiang, People's Republic of China
[4] Jinan University, Guangzhou, Guangdong, People's Republic of China
Keywords
sound event detection; hierarchical temporal attention; competent-teacher framework
DOI
10.1109/ICME57554.2024.10688275
Abstract
Sound event detection (SED) identifies occurrences of specific auditory signals, recognizing both the class of each sound event and its temporal location. While the Convolutional Recurrent Neural Network (CRNN) with a mean-teacher framework shows impressive SED performance, its effectiveness is hindered by its small receptive field, which leads to inadequate consideration of global temporal information and imprecise event boundary localization. Moreover, existing detectors overlook the intricate interplay between temporal and frequency information, compromising detection accuracy. In addition, interactions between the student and teacher models are overlooked, so the teacher can convey inaccurate knowledge to the student. To address these challenges, this paper proposes a novel robust detector named HTA-CTD, which incorporates Hierarchical Temporal Attention (HTA) and a Competent Teacher Network (CTN). HTA introduces an adaptive temporal-frequency feature extraction method, while CTN minimizes reliance on strong labels. Experiments on challenging benchmarks show that HTA-CTD outperforms state-of-the-art detectors.
Pages: 6
Related papers
50 in total (first 10 shown)
  • [1] Hierarchical Attention Neural Network for Event Types to Improve Event Detection
    Jin, Yanliang
    Ye, Jinjin
    Shen, Liquan
    Xiong, Yong
    Fan, Lele
    Zang, Qingfu
    SENSORS, 2022, 22 (11)
  • [2] Abnormal event detection by a weakly supervised temporal attention network
    Zheng, Xiangtao
    Zhang, Yichao
    Zheng, Yunpeng
    Luo, Fulin
    Lu, Xiaoqiang
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2022, 7 (03) : 419 - 431
  • [3] Self-Consistency Training with Hierarchical Temporal Aggregation for Sound Event Detection
    Li, Yunlong
    Zhu, Xiujuan
    Wang, Mingyu
    Hu, Ying
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 27 - 32
  • [4] Polyphonic Sound Event Detection Using Temporal-Frequency Attention and Feature Space Attention
    Jin, Ye
    Wang, Mei
    Luo, Liyan
    Zhao, Dinghao
    Liu, Zhanqi
    SENSORS, 2022, 22 (18)
  • [5] Event Specific Attention for Polyphonic Sound Event Detection
    Sundar, Harshavardhan
    Sun, Ming
    Wang, Chao
    INTERSPEECH 2021, 2021, : 566 - 570
  • [6] CNN-Transformer with Self-Attention Network for Sound Event Detection
    Wakayama, Keigo
    Saito, Shoichiro
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 806 - 810
  • [7] A capsule network with pixel-based attention and BGRU for sound event detection
    Meng, Jiaxiang
    Wang, Xingmei
    Wang, Jinli
    Teng, Xuyang
    Xu, Yuezhu
    DIGITAL SIGNAL PROCESSING, 2022, 123
  • [8] Learning How to Listen: A Temporal-Frequential Attention Model for Sound Event Detection
    Shen, Yu-Han
    He, Ke-Xin
    Zhang, Wei-Qiang
    INTERSPEECH 2019, 2019, : 2563 - 2567
  • [9] Robust and Interpretable Temporal Convolution Network for Event Detection in Lung Sound Recordings
    Fernando, Tharindu
    Sridharan, Sridha
    Denman, Simon
    Ghaemmaghami, Houman
    Fookes, Clinton
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2022, 26 (07) : 2898 - 2908
  • [10] Attention mechanism combined with residual recurrent neural network for sound event detection and localization
    Lan, Chaofeng
    Zhang, Lei
    Zhang, Yuanyuan
    Fu, Lirong
    Sun, Chao
    Han, Yulan
    Zhang, Meng
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2022, 2022 (01)