R-Drop: Regularized Dropout for Neural Networks

Cited by: 0
Authors
Liang, Xiaobo [1 ]
Wu, Lijun [2 ]
Li, Juntao [1 ]
Wang, Yue [1 ]
Meng, Qi [2 ]
Qin, Tao [2 ]
Chen, Wei [2 ]
Zhang, Min [1 ]
Liu, Tie-Yan [2 ]
Affiliations
[1] Soochow Univ, Suzhou, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021, Vol. 34
Funding
National Science Foundation (US);
Keywords
DOI
None available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Dropout is a powerful and widely used technique for regularizing the training of deep neural networks. Though effective, the randomness introduced by dropout causes a non-negligible inconsistency between training and inference. In this paper, we introduce a simple consistency training strategy to regularize dropout, namely R-Drop, which forces the output distributions of different sub-models generated by dropout to be consistent with each other. Specifically, for each training sample, R-Drop minimizes the bidirectional KL-divergence between the output distributions of two sub-models sampled by dropout. Theoretical analysis reveals that R-Drop reduces the above inconsistency. Experiments on 5 widely used deep learning tasks (18 datasets in total), including neural machine translation, abstractive summarization, language understanding, language modeling, and image classification, show that R-Drop is universally effective. In particular, it yields substantial improvements when applied to fine-tune large-scale pre-trained models, e.g., ViT, RoBERTa-large, and BART, and achieves state-of-the-art (SOTA) performance with the vanilla Transformer model on WMT14 English -> German translation (30.91 BLEU) and WMT14 English -> French translation (43.95 BLEU), even surpassing models trained with extra large-scale data and expert-designed advanced variants of Transformer models. Our code is available at GitHub.
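As a concrete illustration of the procedure described in the abstract, the following is a minimal PyTorch sketch of an R-Drop-style training loss. It assumes a classifier `model` whose forward pass contains dropout, a labelled batch `(x, y)`, and a consistency weight `alpha`; the function name and arguments are illustrative and are not taken from the authors' released code.

```python
# Minimal sketch of an R-Drop-style training loss in PyTorch.
# Assumptions (not taken from the authors' code): a classifier `model`
# whose forward pass uses dropout, a labelled batch (x, y), and a
# consistency weight `alpha`.
import torch
import torch.nn.functional as F

def r_drop_loss(model, x, y, alpha=1.0):
    # Two forward passes on the same input: dropout samples two
    # different sub-models, so logits1 and logits2 generally differ.
    logits1 = model(x)
    logits2 = model(x)

    # Standard task loss (cross-entropy), averaged over the two passes.
    ce = 0.5 * (F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y))

    # Bidirectional KL divergence between the two output distributions.
    logp1 = F.log_softmax(logits1, dim=-1)
    logp2 = F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (
        F.kl_div(logp1, logp2, log_target=True, reduction="batchmean")
        + F.kl_div(logp2, logp1, log_target=True, reduction="batchmean")
    )
    return ce + alpha * kl
```

In this sketch the two passes share parameters; dropout alone makes their output distributions differ, and the symmetric KL term penalizes that difference, which is the training-inference inconsistency the abstract refers to.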
Pages: 16
Related Papers
50 items in total
  • [41] LDMNet: Low Dimensional Manifold Regularized Neural Networks
    Zhu, Wei
    Qiu, Qiang
    Huang, Jiaji
    Calderbank, Robert
    Sapiro, Guillermo
    Daubechies, Ingrid
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 2743 - 2751
  • [42] Periodic Signal Recovery with Regularized Sine Neural Networks
    Robin, David A. R.
    Scaman, Kevin
    Lelarge, Marc
    NEURIPS WORKSHOP ON SYMMETRY AND GEOMETRY IN NEURAL REPRESENTATIONS, VOL 197, 2022, 197 : 98 - 110
  • [43] REGULARIZED NEURAL NETWORKS - SOME CONVERGENCE RATE RESULTS
    CORRADI, V
    WHITE, H
    NEURAL COMPUTATION, 1995, 7 (06) : 1225 - 1244
  • [44] Linear Regularized Compression of Deep Convolutional Neural Networks
    Ceruti, Claudio
    Campadelli, Paola
    Casiraghi, Elena
    IMAGE ANALYSIS AND PROCESSING (ICIAP 2017), PT I, 2017, 10484 : 244 - 253
  • [45] Asymptotic Convergence Rate of Dropout on Shallow Linear Neural Networks
    Senen-Cerda, Albert
    Sanders, Jaron
    PROCEEDINGS OF THE ACM ON MEASUREMENT AND ANALYSIS OF COMPUTING SYSTEMS, 2022, 6 (02)
  • [46] Depth Dropout: Efficient Training of Residual Convolutional Neural Networks
    Guo, Jian
    Gould, Stephen
    2016 INTERNATIONAL CONFERENCE ON DIGITAL IMAGE COMPUTING: TECHNIQUES AND APPLICATIONS (DICTA), 2016, : 343 - 349
  • [47] Contextual Soft Dropout Method in Training of Artificial Neural Networks
    Tu Nga Ly
    Kern, Rafal
    Pathak, Khanindra
    Wolk, Krzysztof
    Burnell, Erik Dawid
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2021, 2021, 12672 : 692 - 703
  • [49] Quality of randomness and node dropout regularization for fitting neural networks
    Koivu, Aki
    Kakko, Joona-Pekko
    Maentyniemi, Santeri
    Sairanen, Mikko
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 207
  • [50] Correlation-based structural dropout for convolutional neural networks
    Zeng, Yuyuan
    Dai, Tao
    Chen, Bin
    Xia, Shu-Tao
    Lu, Jian
    PATTERN RECOGNITION, 2021, 120