Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization

Cited by: 0
Authors
Haghifam, Mahdi [1 ,2 ]
Rodriguez-Galvez, Borja [3 ]
Thobaben, Ragnar [3 ]
Skoglund, Mikael [3 ]
Roy, Daniel M. [1 ,2 ]
Dziugaite, Gintare Karolina [4 ,5 ,6 ]
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
[2] Vector Inst, Toronto, ON, Canada
[3] KTH Royal Inst Technol, Stockholm, Sweden
[4] Google Res, Toronto, ON, Canada
[5] Mila, Montreal, PQ, Canada
[6] McGill Univ, Montreal, PQ, Canada
Funding
Swedish Research Council; Natural Sciences and Engineering Research Council of Canada;
Keywords
STABILITY;
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
To date, no "information-theoretic" frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization. In this work, we consider the prospect of establishing such rates via several existing information-theoretic frameworks: input-output mutual information bounds, conditional mutual information bounds and variants, PAC-Bayes bounds, and recent conditional variants thereof. We prove that none of these bounds are able to establish minimax rates. We then consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy "surrogate" algorithm. We prove that minimax rates cannot be established via the analysis of such surrogates. Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques.
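As background on the frameworks named in the abstract, a canonical input-output mutual information bound, due to Xu and Raginsky (2017), takes the following form; the symbols \sigma, \xi, \sigma_{\mathrm{noise}}, and I_d below are our notation for this sketch, not taken from the record. If the loss \ell(w, Z) is \sigma-subgaussian under Z \sim \mathcal{D} for every w, then an algorithm returning W on an i.i.d. sample S of n points satisfies

\left| \mathbb{E}\bigl[ L_{\mathcal{D}}(W) - L_{S}(W) \bigr] \right| \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W; S)}.

The noisy "surrogate" tactic mentioned in the abstract perturbs the final gradient-descent iterate W_T,

\widetilde{W} \;=\; W_T + \xi, \qquad \xi \sim \mathcal{N}\bigl(0, \sigma_{\mathrm{noise}}^{2} I_d\bigr),

which makes I(\widetilde{W}; S) finite so that bounds of the above form can be applied. Since the minimax excess population risk in stochastic convex optimization scales as \Theta(1/\sqrt{n}), the paper's negative results say that such bounds cannot certify this rate for gradient descent, whether applied directly or through a noisy surrogate.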
Pages: 663-706
Number of pages: 44