Scaling Backwards: Minimal Synthetic Pre-Training?

Cited: 0
Authors
Nakamura, Ryo [1 ]
Tadokoro, Ryu [2 ]
Yamada, Ryosuke [1 ]
Asano, Yuki M. [3 ]
Laina, Iro [4 ]
Rupprecht, Christian [4 ]
Inoue, Nakamasa [5 ]
Yokota, Rio [5 ]
Kataoka, Hirokatsu [1 ]
Affiliations
[1] Natl Inst Adv Ind Sci & Technol, Tokyo, Japan
[2] Tohoku Univ, Sendai, Miyagi, Japan
[3] Univ Amsterdam, Amsterdam, Netherlands
[4] Univ Oxford, Oxford, England
[5] Tokyo Inst Technol, Meguro, Japan
Keywords
Synthetic pre-training; Limited data; Vision transformers
DOI
10.1007/978-3-031-72633-0_9
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Pre-training and transfer learning are important building blocks of current computer vision systems. While pre-training is usually performed on large real-world image datasets, in this paper we ask whether this is truly necessary. To this end, we search for a minimal, purely synthetic pre-training dataset that allows us to achieve performance similar to the 1 million images of ImageNet-1k. We construct such a dataset from a single fractal with perturbations. With this, we contribute three main findings. (i) We show that pre-training is effective even with minimal synthetic images, with performance on par with large-scale pre-training datasets such as ImageNet-1k under full fine-tuning. (ii) We investigate the single parameter with which we construct artificial categories for our dataset. We find that while the shape differences can be indistinguishable to humans, they are crucial for obtaining strong performance. (iii) We investigate the minimal requirements for successful pre-training. Surprisingly, we find that a substantial reduction of synthetic images from 1k to 1 can even lead to an increase in pre-training performance, motivating further investigation of "scaling backwards". Finally, we extend our method from synthetic images to real images to see whether a single real image can show a similar pre-training effect through shape augmentation. We find that the use of grayscale images and affine transformations allows even real images to "scale backwards". The code is available at https://github.com/SUPERTADORY/1p-frac.
Pages: 153-171
Page count: 19
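
The abstract describes two technical ingredients: (a) a pre-training dataset built by perturbing the parameters of a single fractal, and (b) a grayscale-plus-affine "shape augmentation" that lets even a single real image serve as pre-training data. As an illustration of idea (a), below is a minimal Python sketch (not the authors' released 1p-frac code) that renders a fractal from an iterated function system (IFS) via the chaos game and derives artificial categories by perturbing the IFS parameters; the base IFS, the perturbation range `delta`, and all function names are illustrative assumptions.

```python
# Hypothetical sketch of fractal pre-training data generation; not the
# authors' implementation. Categories are formed by perturbing the
# parameters of ONE base IFS, echoing the abstract's "single fractal
# with perturbations" idea.
import numpy as np
from PIL import Image

def render_ifs(params, n_points=100_000, size=256, seed=0):
    """Render a binary fractal image from an iterated function system.

    params: array of shape (k, 6); each row (a, b, c, d, e, f) defines
    the affine map (x, y) -> (a*x + b*y + e, c*x + d*y + f).
    """
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0
    pts = []
    for i in range(n_points):
        # Chaos game: apply a randomly chosen affine map at each step.
        a, b, c, d, e, f = params[rng.integers(len(params))]
        x, y = a * x + b * y + e, c * x + d * y + f
        if i > 20:          # discard burn-in iterations
            pts.append((x, y))
    pts = np.asarray(pts)
    # Normalize the point cloud into the image grid and rasterize it.
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    ij = ((pts - lo) / np.maximum(hi - lo, 1e-8) * (size - 1)).astype(int)
    canvas = np.zeros((size, size), dtype=np.uint8)
    canvas[ij[:, 1], ij[:, 0]] = 255   # row = y, column = x
    return Image.fromarray(canvas)

# One base fractal (a Sierpinski-style 3-map IFS, chosen only for illustration).
base = np.array([
    [0.5, 0.0, 0.0, 0.5, 0.00, 0.0],
    [0.5, 0.0, 0.0, 0.5, 0.50, 0.0],
    [0.5, 0.0, 0.0, 0.5, 0.25, 0.5],
])

def make_categories(base, n_categories=1000, delta=0.1, seed=0):
    """Form artificial categories by perturbing the base IFS parameters.

    delta controls the perturbation magnitude; per the abstract, even
    shape differences too small for humans to distinguish can yield
    useful classes. A small delta also keeps the maps contractive.
    """
    rng = np.random.default_rng(seed)
    return [base + rng.uniform(-delta, delta, base.shape)
            for _ in range(n_categories)]

for c, params in enumerate(make_categories(base, n_categories=3)):
    render_ifs(params).save(f"category_{c:04d}.png")
```

For idea (b), a plausible reading of "grayscale images and affine transformations" is a standard torchvision augmentation pipeline like the one below; the specific degree, translation, scale, and shear ranges are assumptions, not values taken from the paper.

```python
# Hypothetical grayscale + affine "shape augmentation" pipeline; the
# hyperparameters are illustrative, not taken from the paper.
import torchvision.transforms as T

single_image_aug = T.Compose([
    T.Grayscale(num_output_channels=3),           # drop color, keep shape
    T.RandomAffine(degrees=45, translate=(0.2, 0.2),
                   scale=(0.5, 1.5), shear=30),   # vary shape via affine maps
    T.RandomResizedCrop(224),
    T.ToTensor(),
])
```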