Scaling Backwards: Minimal Synthetic Pre-Training?

Cited: 0
Authors
Nakamura, Ryo [1 ]
Tadokoro, Ryu [2 ]
Yamada, Ryosuke [1 ]
Asano, Yuki M. [3 ]
Laina, Iro [4 ]
Rupprecht, Christian [4 ]
Inoue, Nakamasa [5 ]
Yokota, Rio [5 ]
Kataoka, Hirokatsu [1 ]
Affiliations
[1] Natl Inst Adv Ind Sci & Technol, Tokyo, Japan
[2] Tohoku Univ, Sendai, Miyagi, Japan
[3] Univ Amsterdam, Amsterdam, Netherlands
[4] Univ Oxford, Oxford, England
[5] Tokyo Inst Technol, Meguro, Japan
Keywords
Synthetic pre-training; Limited data; Vision transformers
DOI
10.1007/978-3-031-72633-0_9
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Pre-training and transfer learning are important building blocks of current computer vision systems. While pre-training is usually performed on large real-world image datasets, in this paper we ask whether this is truly necessary. To this end, we search for a minimal, purely synthetic pre-training dataset that allows us to achieve performance similar to the 1 million images of ImageNet-1k. We construct such a dataset from a single fractal with perturbations. With this, we contribute three main findings. (i) We show that pre-training is effective even with minimal synthetic images, with performance on par with large-scale pre-training datasets such as ImageNet-1k under full fine-tuning. (ii) We investigate the single parameter with which we construct artificial categories for our dataset. We find that while the shape differences can be indistinguishable to humans, they are crucial for obtaining strong performance. (iii) We investigate the minimal requirements for successful pre-training. Surprisingly, we find that a substantial reduction of synthetic images from 1k to 1 can even lead to an increase in pre-training performance, motivating further investigation of "scaling backwards". Finally, we extend our method from synthetic images to real images to see whether a single real image can show a similar pre-training effect through shape augmentation. We find that the use of grayscale images and affine transformations allows even real images to "scale backwards". The code is available at https://github.com/SUPERTADORY/1p-frac.
Pages: 153-171
Page count: 19
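
The abstract describes two technical ingredients: (a) a pre-training dataset built by perturbing the parameters of a single fractal, and (b) a grayscale-plus-affine "shape augmentation" that lets even a single real image serve as pre-training data. As an illustration of idea (a), below is a minimal Python sketch (not the authors' released 1p-frac code) that renders a fractal from an iterated function system (IFS) via the chaos game and derives artificial categories by perturbing the IFS parameters; the base IFS, the perturbation range `delta`, and all function names are illustrative assumptions.

```python
# Hypothetical sketch of fractal pre-training data generation; not the
# authors' implementation. Categories are formed by perturbing the
# parameters of ONE base IFS, echoing the abstract's "single fractal
# with perturbations" idea.
import numpy as np
from PIL import Image

def render_ifs(params, n_points=100_000, size=256, seed=0):
    """Render a binary fractal image from an iterated function system.

    params: array of shape (k, 6); each row (a, b, c, d, e, f) defines
    the affine map (x, y) -> (a*x + b*y + e, c*x + d*y + f).
    """
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0
    pts = []
    for i in range(n_points):
        # Chaos game: apply a randomly chosen affine map at each step.
        a, b, c, d, e, f = params[rng.integers(len(params))]
        x, y = a * x + b * y + e, c * x + d * y + f
        if i > 20:          # discard burn-in iterations
            pts.append((x, y))
    pts = np.asarray(pts)
    # Normalize the point cloud into the image grid and rasterize it.
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    ij = ((pts - lo) / np.maximum(hi - lo, 1e-8) * (size - 1)).astype(int)
    canvas = np.zeros((size, size), dtype=np.uint8)
    canvas[ij[:, 1], ij[:, 0]] = 255   # row = y, column = x
    return Image.fromarray(canvas)

# One base fractal (a Sierpinski-style 3-map IFS, chosen only for illustration).
base = np.array([
    [0.5, 0.0, 0.0, 0.5, 0.00, 0.0],
    [0.5, 0.0, 0.0, 0.5, 0.50, 0.0],
    [0.5, 0.0, 0.0, 0.5, 0.25, 0.5],
])

def make_categories(base, n_categories=1000, delta=0.1, seed=0):
    """Form artificial categories by perturbing the base IFS parameters.

    delta controls the perturbation magnitude; per the abstract, even
    shape differences too small for humans to distinguish can yield
    useful classes. A small delta also keeps the maps contractive.
    """
    rng = np.random.default_rng(seed)
    return [base + rng.uniform(-delta, delta, base.shape)
            for _ in range(n_categories)]

for c, params in enumerate(make_categories(base, n_categories=3)):
    render_ifs(params).save(f"category_{c:04d}.png")
```

For idea (b), a plausible reading of "grayscale images and affine transformations" is a standard torchvision augmentation pipeline like the one below; the specific degree, translation, scale, and shear ranges are assumptions, not values taken from the paper.

```python
# Hypothetical grayscale + affine "shape augmentation" pipeline; the
# hyperparameters are illustrative, not taken from the paper.
import torchvision.transforms as T

single_image_aug = T.Compose([
    T.Grayscale(num_output_channels=3),           # drop color, keep shape
    T.RandomAffine(degrees=45, translate=(0.2, 0.2),
                   scale=(0.5, 1.5), shear=30),   # vary shape via affine maps
    T.RandomResizedCrop(224),
    T.ToTensor(),
])
```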