Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers

Cited by: 0
Authors
Wu, Liwei [1]
Li, Shuqing [2]
Hsieh, Cho-Jui [3]
Sharpnack, James [1]
Affiliations
[1] Univ Calif Davis, Dept Stat, Davis, CA 95616 USA
[2] Univ Calif Davis, Dept Comp Sci, Davis, CA 95616 USA
[3] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
Keywords
NEURAL-NETWORKS; REGRESSION;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In deep neural networks, lower-level embedding layers account for a large portion of the total number of parameters. Tikhonov regularization, graph-based regularization, and hard parameter sharing are approaches that introduce explicit biases into training in the hope of reducing statistical complexity. As an alternative, we propose stochastic shared embeddings (SSE), a data-driven approach to regularizing embedding layers that stochastically transitions between embeddings during stochastic gradient descent (SGD). Because SSE integrates seamlessly with existing SGD algorithms, it can be used with only minor modifications when training large-scale neural networks. We develop two versions of SSE: SSE-Graph, which uses knowledge graphs of embeddings, and SSE-SE, which uses no prior information. We provide theoretical guarantees for our method and show its empirical effectiveness on six distinct tasks, ranging from simple neural networks with one hidden layer in recommender systems to the Transformer and BERT in natural language processing. We find that, when used along with widely used regularization methods such as weight decay and dropout, SSE can further reduce overfitting, which often leads to more favorable generalization results.
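The abstract describes SSE as stochastically transitioning between embeddings during SGD, with SSE-SE using no prior information. A minimal sketch of that idea follows, under stated assumptions: the abstract does not give the transition rule, so this sketch assumes each embedding index in a mini-batch is swapped, with a small probability, for a different index drawn uniformly at random. The function name `sse_se_indices` and the parameter `swap_prob` are hypothetical names chosen for illustration, not identifiers from the paper.

```python
import numpy as np


def sse_se_indices(indices, num_embeddings, swap_prob, rng):
    """Randomly swap embedding indices before the embedding lookup.

    Assumption (not stated in the abstract): each index is replaced,
    with probability `swap_prob`, by a different index chosen uniformly.
    """
    if num_embeddings < 2:
        raise ValueError("need at least two embeddings to swap between")
    indices = np.asarray(indices)
    # Decide independently for each position whether to swap it.
    swap_mask = rng.random(indices.shape) < swap_prob
    # An offset in [1, num_embeddings - 1] guarantees the replacement
    # differs from the original index after the modulo wrap-around.
    offsets = rng.integers(1, num_embeddings, size=indices.shape)
    replacements = (indices + offsets) % num_embeddings
    return np.where(swap_mask, replacements, indices)


# Usage: perturb a mini-batch of indices, then look up embeddings as usual.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(10, 4))  # 10 embeddings of dimension 4
batch = np.array([3, 7, 7, 1])
noisy_batch = sse_se_indices(batch, num_embeddings=10, swap_prob=0.1, rng=rng)
batch_vectors = embedding_table[noisy_batch]  # shape (4, 4)
```

In this sketch the swap is applied only to the indices fed to the lookup during training, so the rest of an SGD training loop would stay unchanged; at evaluation time one would pass the original indices through untouched.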
Pages: 11