Complementary Cues from Audio Help Combat Noise in Weakly-Supervised Object Detection

Cited by: 2
Authors
Gungor, Cagri [1]
Kovashka, Adriana [1,2]
Affiliations
[1] Univ Pittsburgh, Intelligent Syst Program, Pittsburgh, PA 15260 USA
[2] Univ Pittsburgh, Dept Comp Sci, Pittsburgh, PA USA
Funding
U.S. National Science Foundation
DOI
10.1109/WACV56688.2023.00222
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We tackle the problem of learning object detectors in a noisy environment, one of the significant challenges for weakly-supervised learning. We use multimodal learning to help localize objects of interest, but unlike other methods, we treat audio as an auxiliary modality that helps tackle noise in detection from visual regions. First, we use the audio-visual model to generate new "ground-truth" labels for the training set, removing noise between the visual features and the noisy supervision. Second, we propose an "indirect path" between audio and class predictions, which combines the link between visual and audio regions with the link between visual features and predictions. Third, we propose a sound-based "attention path" that uses complementary audio cues to identify important visual regions. We use contrastive learning to perform region-based audio-visual instance discrimination, which serves as an intermediate task and benefits from the complementary cues from audio to boost object classification and detection performance. We show that our methods, which update noisy ground truth and provide indirect and attention paths, greatly boost performance on the AudioSet and VGGSound datasets compared to single-modality predictions, even ones that use contrastive learning. Our method outperforms previous weakly-supervised detectors for the task of object detection by reaching the state of the art on AudioSet, and our sound localization module performs better than several state-of-the-art methods on AudioSet and MUSIC.
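The abstract does not give implementation details, so the following is only a rough sketch of what region-based audio-visual instance discrimination and a sound-based attention over regions might look like. Everything here is an assumption made for illustration: the function names, the max-over-regions pooling, the cosine similarity, and the temperature value are hypothetical and are not taken from the paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    # L2-normalize so that dot products become cosine similarities.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def region_attention(regions, audio):
    """Assumed 'attention path': weight each visual region by its
    cosine similarity to the clip's audio embedding."""
    a = normalize(audio)
    sims = [dot(normalize(r), a) for r in regions]
    return softmax(sims)

def clip_similarity(regions, audio):
    """Hypothetical pooling: score a (regions, audio) pair by its
    best-matching region."""
    a = normalize(audio)
    return max(dot(normalize(r), a) for r in regions)

def audio_visual_nce(batch_regions, batch_audio, temperature=0.1):
    """InfoNCE-style loss: the audio of clip i is the positive for the
    regions of clip i; audio from other clips in the batch are negatives."""
    losses = []
    for i, regions in enumerate(batch_regions):
        logits = [clip_similarity(regions, a) / temperature for a in batch_audio]
        probs = softmax(logits)
        losses.append(-math.log(probs[i]))
    return sum(losses) / len(losses)

# Toy batch: two clips, two 2-D region embeddings each, one audio embedding each.
batch_regions = [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, 0.0], [0.0, -1.0]]]
batch_audio = [[1.0, 0.0], [-1.0, 0.0]]

aligned_loss = audio_visual_nce(batch_regions, batch_audio)
shuffled_loss = audio_visual_nce(batch_regions, batch_audio[::-1])
weights = region_attention(batch_regions[0], batch_audio[0])
```

In this toy batch the correctly paired audio yields a much lower loss than the shuffled pairing, and the attention weights favor the region that points in the same direction as the audio embedding. A real system would learn the embeddings with neural encoders rather than use fixed vectors.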
Pages: 2184-2193
Page count: 10
Related papers
50 in total
  • [31] Few-shot Weakly-Supervised Object Detection via Directional Statistics
    Shaban, Amirreza
    Rahimi, Amir
    Ajanthan, Thalaiyasingam
    Boots, Byron
    Hartley, Richard
    2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022, : 1040 - 1049
  • [32] Weakly-Supervised Semantic Feature Refinement Network for MMW Concealed Object Detection
    Gou, Shuiping
    Wang, Xinlin
    Mao, Shasha
    Jiao, Licheng
    Liu, Zhen
    Zhao, Yinghai
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (03) : 1363 - 1373
  • [33] Weakly-Supervised Violence Detection in Movies with Audio and Video Based Co-training
    Lin, Jian
    Wang, Weiqiang
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2009, 2009, 5879 : 930 - 935
  • [34] Weakly-Supervised Contrastive Learning for Unsupervised Object Discovery
    Lv, Yunqiu
    Zhang, Jing
    Barnes, Nick
    Dai, Yuchao
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 2689 - 2702
  • [35] Discovering an inference recipe for weakly-supervised object localization
    Lee, Sanghuk
    Mun, Cheolhyun
    Uh, Youngjung
    Choe, Junsuk
    Byun, Hyeran
    PATTERN RECOGNITION, 2024, 156
  • [37] Weakly-supervised object localization in unlabeled image collection
    Qu, Yanyun
    Liu, Han
    Yang, Xiaoqing
    Fang, Suwen
    Wang, Hanzi
    MULTIMEDIA SYSTEMS, 2013, 19 (01) : 51 - 63
  • [38] Representative Discovery of Structure Cues for Weakly-Supervised Image Segmentation
    Zhang, Luming
    Gao, Yue
    Xia, Yingjie
    Lu, Ke
    Shen, Jialie
    Ji, Rongrong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2014, 16 (02) : 470 - 479
  • [39] A WEAKLY-SUPERVISED DISCRIMINATIVE MODEL FOR AUDIO-TO-SCORE ALIGNMENT
    Lajugie, Remi
    Bojanowski, Piotr
    Cuvillier, Philippe
    Arlot, Sylvain
    Bach, Francis
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 2484 - 2488
  • [40] Complementary characteristics fusion network for weakly supervised salient object detection
    Liu, Yan
    Zhang, Yunzhou
    Wang, Zhenyu
    Yang, Fei
    Qin, Cao
    Qiu, Feng
    Coleman, Sonya
    Kerr, Dermot
    IMAGE AND VISION COMPUTING, 2022, 126