Complementary Cues from Audio Help Combat Noise in Weakly-Supervised Object Detection

Cited by: 2
Authors
Gungor, Cagri [1]
Kovashka, Adriana [1, 2]
Affiliations
[1] Univ Pittsburgh, Intelligent Syst Program, Pittsburgh, PA 15260 USA
[2] Univ Pittsburgh, Dept Comp Sci, Pittsburgh, PA USA
Funding
U.S. National Science Foundation
DOI
10.1109/WACV56688.2023.00222
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We tackle the problem of learning object detectors in a noisy environment, which is one of the significant challenges for weakly-supervised learning. We use multimodal learning to help localize objects of interest, but unlike other methods, we treat audio as an auxiliary modality that helps tackle noise in detection from visual regions. First, we use the audio-visual model to generate new "ground-truth" labels for the training set to remove noise between the visual features and noisy supervision. Second, we propose an "indirect path" between audio and class predictions, which combines the link between visual and audio regions, and the link between visual features and predictions. Third, we propose a sound-based "attention path" which uses the benefit of complementary audio cues to identify important visual regions. We use contrastive learning to perform region-based audio-visual instance discrimination, which serves as an intermediate task and benefits from the complementary cues from audio to boost object classification and detection performance. We show that our methods, which update noisy ground truth and provide indirect and attention paths, greatly boost performance on the AudioSet and VGGSound datasets compared to single-modality predictions, even ones that use contrastive learning. Our method outperforms previous weakly-supervised detectors for the task of object detection by reaching the state of the art on AudioSet, and our sound localization module performs better than several state-of-the-art methods on AudioSet and MUSIC.
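As a rough illustration of the ideas named in the abstract, the sketch below pairs a sound-based attention over visual regions with a contrastive (InfoNCE-style) audio-visual instance discrimination loss. It is not the authors' implementation: the function names, the dot-product attention, the temperature value, and the tensor shapes are all assumptions chosen for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def audio_attention_pool(regions, audio):
    """Pool region features with weights derived from the audio embedding.

    Hypothetical helper (assumption, not from the paper).
    regions: (B, N, D) visual region features, audio: (B, D) audio embeddings.
    Returns the pooled visual feature (B, D) and attention weights (B, N).
    """
    # Dot-product score between each region and its clip's audio embedding.
    scores = np.einsum("bnd,bd->bn", regions, audio)
    # Numerically stable softmax over the N regions.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    pooled = np.einsum("bn,bnd->bd", weights, regions)
    return pooled, weights


def info_nce(visual, audio, temperature=0.1):
    """InfoNCE-style loss: the i-th visual feature should match the i-th audio.

    Hypothetical helper (assumption); other samples in the batch act as negatives.
    """
    v = l2_normalize(visual)
    a = l2_normalize(audio)
    logits = v @ a.T / temperature  # (B, B) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Average negative log-likelihood of the matching (diagonal) pairs.
    return float(-np.mean(np.diag(log_prob)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    regions = rng.normal(size=(4, 6, 16))  # 4 clips, 6 regions, 16-dim features
    audio = rng.normal(size=(4, 16))
    pooled, weights = audio_attention_pool(regions, audio)
    print("attention weights shape:", weights.shape)
    print("contrastive loss:", info_nce(pooled, audio))
```

In this sketch the attention weights play the role of a per-region importance score, and the loss decreases as each clip's pooled visual feature aligns with its own audio embedding rather than with audio from other clips.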
Pages: 2184-2193
Page count: 10
Related papers
50 in total
  • [1] Weakly-Supervised Action Detection Guided by Audio Narration
    Ye, Keren
    Kovashka, Adriana
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, : 1527 - 1537
  • [2] Efficient Weakly-Supervised Object Detection with Pseudo Annotations
    Yuan, Qingsheng
    Sun, Gang
    Liang, Jianming
    Leng, Biao
    IEEE Access, 2021, 9 : 104356 - 104366
  • [3] ALWOD: Active Learning for Weakly-Supervised Object Detection
    Wang, Yuting
    Ilic, Velibor
    Li, Jiatong
    Kisacanin, Branislav
    Pavlovic, Vladimir
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 6436 - 6446
  • [4] Efficient Weakly-Supervised Object Detection With Pseudo Annotations
    Yuan, Qingsheng
    Sun, Gang
    Liang, Jianming
    Leng, Biao
    IEEE ACCESS, 2021, 9 : 104356 - 104366
  • [5] Weakly-Supervised Salient Object Detection on Light Fields
    Liang, Zijian
    Wang, Pengjie
    Xu, Ke
    Zhang, Pingping
    Lau, Rynson W. H.
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 6295 - 6305
  • [6] Active Learning Strategies for Weakly-Supervised Object Detection
    Vo, Huy V.
    Simeoni, Oriane
    Gidaris, Spyros
    Bursuc, Andrei
    Perez, Patrick
    Ponce, Jean
    COMPUTER VISION - ECCV 2022, PT XXX, 2022, 13690 : 211 - 230
  • [7] Weakly-supervised Human-object Interaction Detection
    Sugimoto, Masaki
    Furuta, Ryosuke
    Taniguchi, Yukinobu
    VISAPP: PROCEEDINGS OF THE 16TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS - VOL. 5: VISAPP, 2021, : 293 - 300
  • [8] Weakly-Supervised Camouflaged Object Detection with Scribble Annotations
    He, Ruozhen
    Dong, Qihua
    Lin, Jiaying
    Lau, Rynson W. H.
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023, : 781 - 789
  • [9] Weakly-Supervised Learning With Complementary Heatmap for Retinal Disease Detection
    Meng, Qier
    Liao, Liang
    Satoh, Shin'ichi
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2022, 41 (08) : 2067 - 2078
  • [10] Weakly-Supervised Saliency Detection via Salient Object Subitizing
    Zheng, Xiaoyang
    Tan, Xin
    Zhou, Jie
    Ma, Lizhuang
    Lau, Rynson W. H.
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (11) : 4370 - 4380