Sequence Length Independent Norm-Based Generalization Bounds for Transformers

Cited by: 0
Authors
Trauger, Jacob [1]
Tewari, Ambuj [1]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
Source
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238 | 2024 / Vol. 238
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper provides norm-based generalization bounds for the Transformer architecture that do not depend on the input sequence length. We employ a covering number based approach to prove our bounds. We use three novel covering number bounds for the function class of bounded linear mappings to upper bound the Rademacher complexity of the Transformer. Furthermore, we show this generalization bound applies to the common Transformer training technique of masking and then predicting the masked word. We also run a simulated study on a sparse majority data set that empirically validates our theoretical findings.
Pages: 18
Related papers
50 items in total
  • [31] A NORM-BASED REMEDIAL MODEL FOR UNDERINCLUSIVE STATUTES
    CAMINKER, EH
    YALE LAW JOURNAL, 1986, 95 (06): : 1185 - 1209
  • [32] Norm-based Enterprise Agent Intelligence Design
    Gao, Caihua
    Zhao, Jun
    ISBDAI '18: PROCEEDINGS OF THE INTERNATIONAL SYMPOSIUM ON BIG DATA AND ARTIFICIAL INTELLIGENCE, 2018, : 1 - 4
  • [33] Adaptive norm-based coding of facial identity
    Rhodes, Gillian
    Jeffery, Linda
    VISION RESEARCH, 2006, 46 (18) : 2977 - 2987
  • [34] Length Generalization of Causal Transformers without Position Encoding
    Wang, Jie
    Ji, Tao
    Wu, Yuanbin
    Yan, Hang
    Gui, Tao
    Zhang, Qi
    Huang, Xuanjing
    Wang, Xiaoling
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 14024 - 14040
  • [35] Randomized Positional Encodings Boost Length Generalization of Transformers
    Ruoss, Anian
    Deletang, Gregoire
    Genewein, Tim
    Grau-Moya, Jordi
    Csordas, Robert
    Bennani, Mehdi
    Legg, Shane
    Veness, Joel
    61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023, : 1889 - 1903
  • [36] The Role of Familiarity for Representations in Norm-Based Face Space
    Faerber, Stella J.
    Kaufmann, Jurgen M.
    Leder, Helmut
    Martin, Eva Maria
    Schweinberger, Stefan R.
    PLOS ONE, 2016, 11 (05):
  • [37] Firm Competition and Cooperation with Norm-Based Preferences for Sustainability
    Inderst, Roman
    Sartzetakis, Eftichios S.
    Xepapadeas, Anastasios
    JOURNAL OF INDUSTRIAL ECONOMICS, 2023, 71 (04): : 1038 - 1071
  • [38] Evidence for norm-based coding of human movement speed
    Mather, George
    Sharman, Rebecca
    Parsons, Todd
    PERCEPTION, 2016, 45 : 369 - 369
  • [39] Characterization of Norm-Based Robust Solutions in Vector Optimization
    Rahimi, Morteza
    Soleimani-damaneh, Majid
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2020, 185 : 554 - 573
  • [40] Confounding of norm-based and adaptation effects in brain responses
    Kahn, David Alexander
    Aguirre, Geoffrey Karl
    NEUROIMAGE, 2012, 60 (04) : 2294 - 2299