Token Boosting for Robust Self-Supervised Visual Transformer Pre-training

Cited by: 2
Authors
Li, Tianjiao [1 ]
Foo, Lin Geng [1 ]
Hu, Ping [2 ]
Shang, Xindi [3 ]
Rahmani, Hossein [4 ]
Yuan, Zehuan [3 ]
Liu, Jun [1 ]
Affiliations
[1] Singapore Univ Technol & Design, Singapore, Singapore
[2] Boston Univ, Boston, MA 02215 USA
[3] ByteDance, Beijing, Peoples R China
[4] Univ Lancaster, Lancaster, England
Funding
National Research Foundation, Singapore; EU Horizon 2020;
Keywords
OBJECT;
DOI
10.1109/CVPR52729.2023.02301
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Learning with large-scale unlabeled data has become a powerful tool for pre-training Visual Transformers (VTs). However, prior works tend to overlook that, in real-world scenarios, the input data may be corrupted and unreliable. Pre-training VTs on such corrupted data can be challenging, especially under the masked autoencoding approach, where both the inputs and the masked "ground truth" targets can be unreliable. To address this limitation, we introduce the Token Boosting Module (TBM), a plug-and-play component that enables VTs to learn to extract clean and robust features during masked autoencoding pre-training. We provide theoretical analysis showing how TBM improves pre-training with more robust and generalizable representations, thereby benefiting downstream tasks. We conduct extensive experiments to analyze TBM's effectiveness, and results on four corrupted datasets demonstrate that TBM consistently improves performance on downstream tasks.
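The abstract describes TBM as a plug-and-play component inserted into a VT during masked autoencoding pre-training. As a rough illustration of what "plug-and-play" can mean here, the PyTorch sketch below wires a hypothetical residual-MLP boosting block into a toy ViT-style encoder. The class names (TokenBoostingModule, EncoderWithTBM), the MLP design, and the insertion point are assumptions made for illustration only and are not the paper's actual TBM architecture.

```python
import torch
import torch.nn as nn


class TokenBoostingModule(nn.Module):
    """Hypothetical token-boosting block (illustrative only).

    Assumed design: a residual MLP that maps possibly corrupted token
    embeddings to refined ("boosted") ones. The paper's actual TBM
    architecture may differ.
    """

    def __init__(self, dim: int, hidden_mult: int = 4):
        super().__init__()
        self.refine = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim * hidden_mult),
            nn.GELU(),
            nn.Linear(dim * hidden_mult, dim),
        )

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Residual refinement: boosted tokens = original tokens + learned correction.
        return tokens + self.refine(tokens)


class EncoderWithTBM(nn.Module):
    """Toy ViT-style encoder with the boosting module plugged in mid-stack."""

    def __init__(self, dim: int = 192, depth: int = 4, heads: int = 3):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(
                d_model=dim, nhead=heads, dim_feedforward=dim * 4, batch_first=True
            )
            for _ in range(depth)
        )
        self.tbm = TokenBoostingModule(dim)
        self.insert_at = depth // 2  # illustrative insertion point

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        for i, block in enumerate(self.blocks):
            tokens = block(tokens)
            if i == self.insert_at:
                tokens = self.tbm(tokens)  # boost tokens once, mid-encoder
        return tokens


if __name__ == "__main__":
    # 8 images, 49 visible patch tokens, embedding dim 192 (all values illustrative).
    visible_tokens = torch.randn(8, 49, 192)
    encoder = EncoderWithTBM()
    print(encoder(visible_tokens).shape)  # torch.Size([8, 49, 192])
```

In an actual masked-autoencoder pipeline, the boosted tokens would feed the decoder that reconstructs the masked patches, so such a module would be trained end-to-end with the usual reconstruction loss.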
Pages: 24027-24038
Page count: 12
Related Papers
50 records in total
  • [41] Guided Contrastive Self-Supervised Pre-training for Automatic Speech Recognition
    Khare, Aparna
    Wu, Minhua
    Bhati, Saurabhchand
    Droppo, Jasha
    Maas, Roland
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 174 - 181
  • [42] Class incremental learning with self-supervised pre-training and prototype learning
    Liu, Wenzhuo
    Wu, Xin-Jian
    Zhu, Fei
    Yu, Ming-Ming
    Wang, Chuang
    Liu, Cheng-Lin
    PATTERN RECOGNITION, 2025, 157
  • [43] Masked Autoencoder for Self-Supervised Pre-training on Lidar Point Clouds
    Hess, Georg
    Jaxing, Johan
    Svensson, Elias
    Hagerman, David
    Petersson, Christoffer
    Svensson, Lennart
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS (WACVW), 2023, : 350 - 359
  • [44] Feature-Suppressed Contrast for Self-Supervised Food Pre-training
    Liu, Xinda
    Zhu, Yaohui
    Liu, Linhu
    Tian, Jiang
    Wang, Lili
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4359 - 4367
  • [45] Multi-Task Self-Supervised Pre-training for Music Classification
    Wu, Ho-Hsiang
    Kao, Chieh-Chi
    Tang, Qingming
    Sun, Ming
    McFee, Brian
    Bello, Juan Pablo
    Wang, Chao
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 556 - 560
  • [46] PreTraM: Self-supervised Pre-training via Connecting Trajectory and Map
    Xu, Chenfeng
    Li, Tian
    Tang, Chen
    Sun, Lingfeng
    Keutzer, Kurt
    Tomizuka, Masayoshi
    Fathi, Alireza
    Zhan, Wei
    COMPUTER VISION, ECCV 2022, PT XXXIX, 2022, 13699 : 34 - 50
  • [47] Self-supervised Pre-training with Acoustic Configurations for Replay Spoofing Detection
    Shim, Hye-jin
    Heo, Hee-Soo
    Jung, Jee-weon
    Yu, Ha-Jin
    INTERSPEECH 2020, 2020, : 1091 - 1095
  • [48] Self-supervised Pre-training and Semi-supervised Learning for Extractive Dialog Summarization
    Zhuang, Yingying
    Song, Jiecheng
    Sadagopan, Narayanan
    Beniwal, Anurag
    COMPANION OF THE WORLD WIDE WEB CONFERENCE, WWW 2023, 2023, : 1069 - 1076
  • [49] Self-Supervised Underwater Image Generation for Underwater Domain Pre-Training
    Wu, Zhiheng
    Wu, Zhengxing
    Chen, Xingyu
    Lu, Yue
    Yu, Junzhi
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2024, 73 : 1 - 14
  • [50] Comparison of Self-Supervised Speech Pre-training Methods on Flemish Dutch
    Poncelet, Jakob
    Hamme, Hugo Van
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 169 - 176