Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers

Citations: 0
Authors
Wu, Liwei [1]
Li, Shuqing [2]
Hsieh, Cho-Jui [3]
Sharpnack, James [1]
Affiliations
[1] Univ Calif Davis, Dept Stat, Davis, CA 95616 USA
[2] Univ Calif Davis, Dept Comp Sci, Davis, CA 95616 USA
[3] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
Keywords
NEURAL-NETWORKS; REGRESSION;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In deep neural nets, lower-level embedding layers account for a large portion of the total number of parameters. Tikhonov regularization, graph-based regularization, and hard parameter sharing are approaches that introduce explicit biases into training in the hope of reducing statistical complexity. As an alternative, we propose stochastic shared embeddings (SSE), a data-driven approach to regularizing embedding layers that stochastically transitions between embeddings during stochastic gradient descent (SGD). Because SSE integrates seamlessly with existing SGD algorithms, it can be used with only minor modifications when training large-scale neural networks. We develop two versions of SSE: SSE-Graph, which uses knowledge graphs of embeddings, and SSE-SE, which uses no prior information. We provide theoretical guarantees for our method and show its empirical effectiveness on 6 distinct tasks, ranging from simple neural networks with one hidden layer in recommender systems to the Transformer and BERT in natural language processing. We find that, when used alongside widely used regularization methods such as weight decay and dropout, SSE can further reduce overfitting, which often leads to better generalization.
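The stochastic transition between embeddings described in the abstract can be sketched for the prior-free case (SSE-SE) as an index swap applied to each mini-batch before the embedding lookup. This is a minimal illustration only, not the authors' implementation: the function name `sse_se_swap`, the swap probability `p`, and the choice to draw the replacement uniformly over all indices are assumptions made for this example.

```python
import random


def sse_se_swap(indices, num_embeddings, p, rng):
    """Return a copy of `indices` in which each entry is, with probability p,
    replaced by an index drawn uniformly from range(num_embeddings).

    Hypothetical helper: the name, signature, and uniform replacement rule
    are assumptions for illustration, not taken from the paper's code.
    """
    swapped = []
    for idx in indices:
        if rng.random() < p:
            # Transition to a randomly chosen embedding (may equal idx).
            swapped.append(rng.randrange(num_embeddings))
        else:
            # Keep the original embedding index.
            swapped.append(idx)
    return swapped


# Example: perturb the embedding indices of one mini-batch.
rng = random.Random(0)
batch = [3, 7, 7, 1]
noisy_batch = sse_se_swap(batch, num_embeddings=10, p=0.1, rng=rng)
print(noisy_batch)
```

In a training loop, `noisy_batch` would replace `batch` in the embedding lookup for that SGD step only, while evaluation would use the original indices. The rest of the optimizer is unchanged, which is in line with the abstract's remark that only minor modifications are needed.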
Pages: 11
Related papers
50 in total
  • [1] Stochastic Data-Driven Predictive Control: Regularization, Estimation, and Constraint Tightening
    Yin, Mingzhou
    Iannelli, Andrea
    Smith, Roy S.
    IFAC PAPERSONLINE, 2024, 58 (15): : 79 - 84
  • [2] Data-driven priors for hyperparameters in regularization
    Keren, D
    Werman, M
    MAXIMUM ENTROPY AND BAYESIAN METHODS, 1996, 79 : 77 - 84
  • [3] On Regularization Schemes for Data-Driven Optimization
    Ni, Wei
    Jiang, Zhong-Ping
    PROCEEDINGS OF THE 2019 31ST CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2019), 2019, : 3016 - 3023
  • [4] Data-driven Privacy With Domain Regularization
    Wang, Chong Xiao
    Tay, Wee Peng
    2020 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2020,
  • [5] Manifold embedding data-driven mechanics
    Bahmani, Bahador
    Sun, WaiChing
    Journal of the Mechanics and Physics of Solids, 2022, 166
  • [6] Manifold embedding data-driven mechanics
    Bahmani, Bahador
    Sun, WaiChing
    JOURNAL OF THE MECHANICS AND PHYSICS OF SOLIDS, 2022, 166
  • [7] Data-Driven Stochastic Averaging
    Li, Junyin
    Huang, Zhanchao
    Wang, Yong
    Huang, Zhilong
    Zhu, Weiqiu
    JOURNAL OF APPLIED MECHANICS-TRANSACTIONS OF THE ASME, 2024, 91 (01):
  • [8] SReachTools Kernel Module: Data-Driven Stochastic Reachability Using Hilbert Space Embeddings of Distributions
    Thorpe, Adam J.
    Ortiz, Kendric R.
    Oishi, Meeko M. K.
    2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021, : 5073 - 5079
  • [9] On the impact of regularization in data-driven predictive control
    Breschi, Valentina
    Chiuso, Alessandro
    Fabris, Marco
    Formentin, Simone
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 3061 - 3066
  • [10] Data-Driven Morozov Regularization of Inverse Problems
    Haltmeier, Markus
    Kowar, Richard
    Tiefenthaler, Markus
    NUMERICAL FUNCTIONAL ANALYSIS AND OPTIMIZATION, 2024, 45 (15) : 759 - 777