On Training Sample Memorization: Lessons from Benchmarking Generative Modeling with a Large-scale Competition

Cited by: 16
Authors
Bai, Ching-Yuan [1]
Lin, Hsuan-Tien [1]
Raffel, Colin [2]
Kan, Wendy Chih-wen [2]
Affiliations
[1] Natl Taiwan Univ, Comp Sci & Informat Engn, Taipei, Taiwan
[2] Google, Mountain View, CA 94043 USA
Keywords
benchmark; competition; neural networks; generative models; memorization; datasets; computer vision
DOI
10.1145/3447548.3467198
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many recent developments in generative models for natural images have relied on heuristically motivated metrics that can be easily gamed by memorizing a small sample from the true distribution or by training a model directly to improve the metric. In this work, we critically evaluate the gameability of these metrics by designing and deploying a generative modeling competition. Our competition received over 11,000 submitted models. The competition among participants allowed us to investigate both intentional and unintentional memorization in generative modeling. To detect intentional memorization, we propose the "Memorization-Informed Fréchet Inception Distance" (MiFID) as a new memorization-aware metric and design benchmark procedures to ensure that winning submissions made genuine improvements in perceptual quality. Furthermore, we manually inspect the code for the 1,000 top-performing models to understand and label different forms of memorization. Our analysis reveals that unintentional memorization is a serious and common issue in popular generative models. The generated images and our memorization labels for those models, as well as code to compute MiFID, are released to facilitate future studies on benchmarking generative models.
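This record does not define MiFID, so the following is only a rough sketch of how a memorization-aware score of this kind could work, not the authors' implementation. It assumes the penalty is driven by the average cosine distance from each generated sample's feature vector to its nearest training sample, and that the base FID is inflated when that distance falls below a threshold. The function names, the threshold `tau`, and the constant `eps` are all hypothetical; the released MiFID code is the authoritative definition.

```python
import math


def cosine_distance(u, v):
    """Return 1 - |cos(u, v)| for two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - abs(dot) / (norm_u * norm_v)


def memorization_distance(generated, training):
    """Average, over generated samples, of the distance to the nearest training sample."""
    nearest = [min(cosine_distance(g, t) for t in training) for g in generated]
    return sum(nearest) / len(nearest)


def memorization_informed_score(fid, distance, tau=0.1, eps=1e-6):
    """Scale a precomputed FID by a penalty when samples sit too close to training data.

    tau and eps are illustrative values chosen for this sketch, not values from the paper.
    """
    penalty = 1.0 / (distance + eps) if distance < tau else 1.0
    return fid * penalty
```

Under these assumptions, a model that copies training images gets a distance near zero and a heavily inflated score, while a model whose samples stay far from every training image keeps its plain FID.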
Pages: 2534-2542
Page count: 9