Sequence Length Independent Norm-Based Generalization Bounds for Transformers

Authors
Trauger, Jacob [1]
Tewari, Ambuj [1]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
Source
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238 | 2024 / Vol. 238
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper provides norm-based generalization bounds for the Transformer architecture that do not depend on the input sequence length. We employ a covering number based approach to prove our bounds. We use three novel covering number bounds for the function class of bounded linear mappings to upper bound the Rademacher complexity of the Transformer. Furthermore, we show this generalization bound applies to the common Transformer training technique of masking and then predicting the masked word. We also run a simulated study on a sparse majority data set that empirically validates our theoretical findings.
Pages: 18