Complementary Cues from Audio Help Combat Noise in Weakly-Supervised Object Detection

Cited by: 2

Authors:

Gungor, Cagri ^{[1]}

Kovashka, Adriana ^{[1,2]}

Affiliations:

[1] Univ Pittsburgh, Intelligent Syst Program, Pittsburgh, PA 15260 USA

[2] Univ Pittsburgh, Dept Comp Sci, Pittsburgh, PA USA

Source:

2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV) | 2023

Funding:

National Science Foundation (USA);

Keywords:

DOI:

10.1109/WACV56688.2023.00222

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We tackle the problem of learning object detectors in a noisy environment, one of the significant challenges for weakly-supervised learning. We use multimodal learning to help localize objects of interest, but unlike other methods, we treat audio as an auxiliary modality that helps tackle noise in detection from visual regions. First, we use the audio-visual model to generate new "ground-truth" labels for the training set, removing noise between the visual features and the noisy supervision. Second, we propose an "indirect path" between audio and class predictions, which combines the link between visual and audio regions with the link between visual features and predictions. Third, we propose a sound-based "attention path" that uses complementary audio cues to identify important visual regions. We use contrastive learning to perform region-based audio-visual instance discrimination, which serves as an intermediate task and benefits from the complementary cues from audio to boost object classification and detection performance. We show that our methods, which update noisy ground truth and provide indirect and attention paths, greatly boost performance on the AudioSet and VGGSound datasets compared to single-modality predictions, even ones that use contrastive learning. Our method outperforms previous weakly-supervised detectors on the task of object detection, reaching the state of the art on AudioSet, and our sound localization module performs better than several state-of-the-art methods on AudioSet and MUSIC.


Pages: 2184-2193

Page count: 10
