Complementary Cues from Audio Help Combat Noise in Weakly-Supervised Object Detection

Cited by: 2
Authors
Gungor, Cagri [1]
Kovashka, Adriana [1, 2]
Affiliations
[1] Univ Pittsburgh, Intelligent Syst Program, Pittsburgh, PA 15260 USA
[2] Univ Pittsburgh, Dept Comp Sci, Pittsburgh, PA USA
Funding
National Science Foundation (USA)
DOI
10.1109/WACV56688.2023.00222
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We tackle the problem of learning object detectors in a noisy environment, which is one of the significant challenges for weakly-supervised learning. We use multimodal learning to help localize objects of interest, but unlike other methods, we treat audio as an auxiliary modality that helps tackle noise in detection from visual regions. First, we use the audio-visual model to generate new "ground-truth" labels for the training set to remove noise between the visual features and noisy supervision. Second, we propose an "indirect path" between audio and class predictions, which combines the link between visual and audio regions with the link between visual features and predictions. Third, we propose a sound-based "attention path" which uses complementary audio cues to identify important visual regions. We use contrastive learning to perform region-based audio-visual instance discrimination, which serves as an intermediate task and benefits from the complementary cues from audio to boost object classification and detection performance. We show that our methods, which update noisy ground truth and provide indirect and attention paths, greatly boost performance on the AudioSet and VGGSound datasets compared to single-modality predictions, even ones that use contrastive learning. Our method outperforms previous weakly-supervised detectors for the task of object detection, reaching the state of the art on AudioSet, and our sound localization module performs better than several state-of-the-art methods on AudioSet and MUSIC.
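To make the "attention path" and the region-based audio-visual instance discrimination more concrete, the sketch below shows one generic way such components are often built. It is not the authors' implementation: the function names, the cosine-similarity attention, the InfoNCE-style loss, the temperature value, and the random toy features are all assumptions chosen for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def audio_attention_pool(regions, audio):
    """Hypothetical sound-based attention over visual regions.

    regions: (B, R, D) region features, audio: (B, D) clip-level audio features.
    Returns the attention-pooled visual feature (B, D) and the weights (B, R).
    """
    regions = l2_normalize(regions)
    audio = l2_normalize(audio)
    # Cosine similarity between each region and the audio of the same sample.
    scores = np.einsum("brd,bd->br", regions, audio)
    # Numerically stable softmax over the regions of each sample.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    pooled = np.einsum("br,brd->bd", weights, regions)
    return pooled, weights


def info_nce(visual, audio, temperature=0.1):
    """InfoNCE-style loss: the matching audio is the positive for each visual
    feature, and the other samples in the batch act as negatives."""
    visual = l2_normalize(visual)
    audio = l2_normalize(audio)
    logits = visual @ audio.T / temperature  # (B, B) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


# Toy data standing in for real region and audio embeddings (assumption).
rng = np.random.default_rng(0)
regions = rng.normal(size=(4, 6, 16))  # 4 samples, 6 regions, 16-dim features
audio = rng.normal(size=(4, 16))

pooled, weights = audio_attention_pool(regions, audio)
loss = info_nce(pooled, audio)
```

In this sketch the attention weights indicate which regions agree most with the sound, and the contrastive loss pulls each pooled visual feature toward its own audio clip and away from the other clips in the batch. How the paper actually scores regions and selects negatives is not stated in the abstract.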
Pages: 2184-2193
Page count: 10