Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers

Cited by: 0

Authors:

Wu, Liwei [1]

Li, Shuqing [2]

Hsieh, Cho-Jui [3]

Sharpnack, James [1]

Affiliations:

[1] Univ Calif Davis, Dept Stat, Davis, CA 95616 USA

[2] Univ Calif Davis, Dept Comp Sci, Davis, CA 95616 USA

[3] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019) | 2019 / Volume 32

Keywords:

NEURAL-NETWORKS; REGRESSION

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In deep neural nets, lower-level embedding layers account for a large portion of the total number of parameters. Tikhonov regularization, graph-based regularization, and hard parameter sharing are approaches that introduce explicit biases into training in the hope of reducing statistical complexity. Alternatively, we propose stochastic shared embeddings (SSE), a data-driven approach to regularizing embedding layers, which stochastically transitions between embeddings during stochastic gradient descent (SGD). Because SSE integrates seamlessly with existing SGD algorithms, it can be used with only minor modifications when training large-scale neural networks. We develop two versions of SSE: SSE-Graph, which uses knowledge graphs of embeddings, and SSE-SE, which uses no prior information. We provide theoretical guarantees for our method and show its empirical effectiveness on 6 distinct tasks, from simple neural networks with one hidden layer in recommender systems to the transformer and BERT in natural language processing. We find that when used along with widely used regularization methods such as weight decay and dropout, our proposed SSE can further reduce overfitting, which often leads to more favorable generalization results.
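
The abstract describes SSE as stochastically transitioning between embeddings during SGD, and SSE-SE as the variant that uses no prior information. A minimal sketch of that idea follows. It assumes a simple rule that the abstract itself does not spell out: each embedding index in a training batch is replaced, with probability p, by a different index drawn uniformly at random. The function name sse_se_swap and its parameters are hypothetical and chosen only for illustration.

```python
import random


def sse_se_swap(indices, num_embeddings, p, rng):
    """Return a copy of `indices` in which each entry is, with probability p,
    replaced by a different index drawn uniformly at random.

    Assumption (not stated in the abstract): the transition is uniform over
    the other num_embeddings - 1 indices.
    """
    swapped = []
    for idx in indices:
        if num_embeddings > 1 and rng.random() < p:
            # Sample from the num_embeddings - 1 indices other than idx:
            # draw from a range one shorter, then shift past idx.
            new_idx = rng.randrange(num_embeddings - 1)
            if new_idx >= idx:
                new_idx += 1
            swapped.append(new_idx)
        else:
            swapped.append(idx)
    return swapped


rng = random.Random(0)
batch = [3, 7, 7, 1]

# Training step: look up embeddings with the (possibly) swapped indices.
train_indices = sse_se_swap(batch, num_embeddings=10, p=0.1, rng=rng)

# Evaluation: p = 0 leaves the indices untouched.
eval_indices = sse_se_swap(batch, num_embeddings=10, p=0.0, rng=rng)
```

In this sketch the swap touches only the indices fed to the embedding lookup, so the rest of an existing SGD training loop stays unchanged, which is consistent with the abstract's remark that SSE needs only minor modifications.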

Pages: 11
