Complementary Cues from Audio Help Combat Noise in Weakly-Supervised Object Detection

Cited by: 2

Authors:

Gungor, Cagri [1]

Kovashka, Adriana [1,2]

Affiliations:

[1] Univ Pittsburgh, Intelligent Syst Program, Pittsburgh, PA 15260 USA

[2] Univ Pittsburgh, Dept Comp Sci, Pittsburgh, PA USA

Source:

2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) | 2023

Funding:

U.S. National Science Foundation;

Keywords:

DOI:

10.1109/WACV56688.2023.00222

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We tackle the problem of learning object detectors in a noisy environment, which is one of the significant challenges for weakly-supervised learning. We use multimodal learning to help localize objects of interest, but unlike other methods, we treat audio as an auxiliary modality that helps tackle noise in detection from visual regions. First, we use the audio-visual model to generate new "ground-truth" labels for the training set to remove noise between the visual features and noisy supervision. Second, we propose an "indirect path" between audio and class predictions, which combines the link between visual and audio regions, and the link between visual features and predictions. Third, we propose a sound-based "attention path" which uses the benefit of complementary audio cues to identify important visual regions. We use contrastive learning to perform region-based audio-visual instance discrimination, which serves as an intermediate task and benefits from the complementary cues from audio to boost object classification and detection performance. We show that our methods, which update noisy ground truth and provide indirect and attention paths, greatly boost performance on the AudioSet and VGGSound datasets compared to single-modality predictions, even ones that use contrastive learning. Our method outperforms previous weakly-supervised detectors for the task of object detection by reaching the state of the art on AudioSet, and our sound localization module performs better than several state-of-the-art methods on AudioSet and MUSIC.

Citation

Pages: 2184 / 2193

Page count: 10

50 entries in total

[31] Few-shot Weakly-Supervised Object Detection via Directional Statistics
Shaban, Amirreza
Rahimi, Amir
Ajanthan, Thalaiyasingam
Boots, Byron
Hartley, Richard
2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022, : 1040 - 1049
[32] Weakly-Supervised Semantic Feature Refinement Network for MMW Concealed Object Detection
Gou, Shuiping
Wang, Xinlin
Mao, Shasha
Jiao, Licheng
Liu, Zhen
Zhao, Yinghai
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (03) : 1363 - 1373
[33] Weakly-Supervised Violence Detection in Movies with Audio and Video Based Co-training
Lin, Jian
Wang, Weiqiang
ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2009, 2009, 5879 : 930 - 935
[34] Weakly-Supervised Contrastive Learning for Unsupervised Object Discovery
Lv, Yunqiu
Zhang, Jing
Barnes, Nick
Dai, Yuchao
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 2689 - 2702
[35] Discovering an inference recipe for weakly-supervised object localization
Lee, Sanghuk
Mun, Cheolhyun
Uh, Youngjung
Choe, Junsuk
Byun, Hyeran
PATTERN RECOGNITION, 2024, 156
[36] Weakly-supervised object localization in unlabeled image collection
Yanyun Qu
Han Liu
Xiaoqing Yang
Suwen Fang
Hanzi Wang
Multimedia Systems, 2013, 19 : 51 - 63
[37] Weakly-supervised object localization in unlabeled image collection
Qu, Yanyun
Liu, Han
Yang, Xiaoqing
Fang, Suwen
Wang, Hanzi
MULTIMEDIA SYSTEMS, 2013, 19 (01) : 51 - 63
[38] Representative Discovery of Structure Cues for Weakly-Supervised Image Segmentation
Zhang, Luming
Gao, Yue
Xia, Yingjie
Lu, Ke
Shen, Jialie
Ji, Rongrong
IEEE TRANSACTIONS ON MULTIMEDIA, 2014, 16 (02) : 470 - 479
[39] A WEAKLY-SUPERVISED DISCRIMINATIVE MODEL FOR AUDIO-TO-SCORE ALIGNMENT
Lajugie, Remi
Bojanowski, Piotr
Cuvillier, Philippe
Arlot, Sylvain
Bach, Francis
2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 2484 - 2488
[40] Complementary characteristics fusion network for weakly supervised salient object detection
Liu, Yan
Zhang, Yunzhou
Wang, Zhenyu
Yang, Fei
Qin, Cao
Qiu, Feng
Coleman, Sonya
Kerr, Dermot
IMAGE AND VISION COMPUTING, 2022, 126
