Sound and Visual Representation Learning with Multiple Pretraining Tasks

Cited by: 3
Authors
Vasudevan, Arun Balajee [1]
Dai, Dengxin [2]
Van Gool, Luc [1,3]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] MPI Informat, Saarbrucken, Germany
[3] Katholieke Univ Leuven, Leuven, Belgium
DOI
10.1109/CVPR52688.2022.01421
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Different self-supervised learning (SSL) tasks reveal different features from the data, and the learned feature representations can exhibit different performance on each downstream task. In this light, this work aims to combine multiple SSL tasks (Multi-SSL) into a representation that generalizes well across all downstream tasks. For this study, we investigate binaural sounds and image data. For binaural sounds, we propose three SSL tasks: spatial alignment, temporal synchronization of foreground objects and binaural sounds, and temporal gap prediction. We investigate several approaches to Multi-SSL and give insights into downstream task performance on video retrieval, spatial sound super-resolution, and semantic prediction using the OmniAudio dataset. Our experiments on binaural sound representations demonstrate that Multi-SSL via incremental learning (IL) of SSL tasks outperforms single-SSL-task models and fully supervised models in downstream task performance. As a check of applicability to other modalities, we also formulate our Multi-SSL models for image representation learning, using the recently proposed SSL tasks MoCov2 and DenseCL. Here, Multi-SSL surpasses recent methods such as MoCov2, DenseCL and DetCo by 2.06%, 3.27% and 1.19% on VOC07 classification and by +2.83, +1.56 and +1.61 AP on COCO detection.
Pages: 14596 - 14606
Page count: 11
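
The abstract describes Multi-SSL via incremental learning: an encoder is first trained on one SSL task and then continued on the next, rather than optimizing all objectives jointly. Below is a minimal PyTorch sketch of that idea only, assuming two stand-in SSL losses and a feature-distillation anchor to the previous stage's encoder to limit forgetting; the toy losses, random "views", and the distillation term are illustrative assumptions, not the authors' exact recipe or code.

# Minimal sketch (not the authors' implementation) of incremental learning
# over multiple SSL tasks. The real work uses spatial alignment, temporal
# synchronization, and temporal gap prediction for binaural sound, or
# MoCov2/DenseCL for images; here two toy objectives stand in for them.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))

def ssl_loss_task_a(z1, z2):
    # alignment-style objective: pull paired embeddings together
    return F.mse_loss(z1, z2)

def ssl_loss_task_b(z1, z2):
    # contrastive-style objective: match each sample to its own pair in the batch
    logits = z1 @ z2.t() / 0.07
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)

def train_stage(encoder, loss_fn, steps, prev_encoder=None, distill_w=1.0):
    opt = torch.optim.SGD(encoder.parameters(), lr=1e-2)
    for _ in range(steps):
        # stand-in paired "views"; real data would be binaural audio or image crops
        x1, x2 = torch.randn(32, 128), torch.randn(32, 128)
        z1, z2 = encoder(x1), encoder(x2)
        loss = loss_fn(z1, z2)
        if prev_encoder is not None:
            # keep features close to the previous stage's encoder so the new
            # SSL task does not erase what earlier tasks learned (one common
            # IL strategy; the paper's exact mechanism may differ)
            with torch.no_grad():
                z_old = prev_encoder(x1)
            loss = loss + distill_w * F.mse_loss(z1, z_old)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return encoder

# Stage 1: first SSL task; Stage 2: second task anchored to a frozen copy.
encoder = train_stage(encoder, ssl_loss_task_a, steps=10)
frozen = copy.deepcopy(encoder).eval()
encoder = train_stage(encoder, ssl_loss_task_b, steps=10, prev_encoder=frozen)

After the final stage, the encoder would be frozen and evaluated on the downstream tasks (retrieval, super-resolution, semantic prediction, or VOC/COCO transfer) to compare against single-task SSL baselines.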