R-Drop: Regularized Dropout for Neural Networks

Cited: 0
Authors
Liang, Xiaobo [1 ]
Wu, Lijun [2 ]
Li, Juntao [1 ]
Wang, Yue [1 ]
Meng, Qi [2 ]
Qin, Tao [2 ]
Chen, Wei [2 ]
Zhang, Min [1 ]
Liu, Tie-Yan [2 ]
Affiliations
[1] Soochow Univ, Suzhou, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Dropout is a powerful and widely used technique to regularize the training of deep neural networks. Though effective and well-performing, the randomness introduced by dropout causes a non-negligible inconsistency between training and inference. In this paper, we introduce a simple consistency training strategy to regularize dropout, namely R-Drop, which forces the output distributions of different sub-models generated by dropout to be consistent with each other. Specifically, for each training sample, R-Drop minimizes the bidirectional KL-divergence between the output distributions of two sub-models sampled by dropout. Theoretical analysis reveals that R-Drop reduces the above inconsistency. Experiments on 5 widely used deep learning tasks (18 datasets in total), including neural machine translation, abstractive summarization, language understanding, language modeling, and image classification, show that R-Drop is universally effective. In particular, it yields substantial improvements when applied to fine-tune large-scale pre-trained models, e.g., ViT, RoBERTa-large, and BART, and achieves state-of-the-art (SOTA) performance with the vanilla Transformer model on WMT14 English -> German translation (30.91 BLEU) and WMT14 English -> French translation (43.95 BLEU), even surpassing models trained with extra large-scale data and expert-designed advanced variants of Transformer models. Our code is available on GitHub.
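The abstract fully specifies the training objective, so a minimal sketch may help make it concrete. The PyTorch snippet below is a hypothetical illustration, not the authors' code: the helper name r_drop_loss and the weight alpha are our own, and it assumes model(x) returns classification logits with the model in training mode so that dropout is active. It performs two stochastic forward passes on the same batch and combines the averaged cross-entropy with the bidirectional KL term.

import torch.nn.functional as F

def r_drop_loss(model, x, labels, alpha=1.0):
    # Sketch of an R-Drop-style objective (illustrative, not the authors' code).
    # Two forward passes over the same batch: dropout samples two
    # different sub-models, so the logits differ between passes.
    logits1 = model(x)
    logits2 = model(x)

    # Task loss: cross-entropy averaged over the two passes.
    ce = 0.5 * (F.cross_entropy(logits1, labels)
                + F.cross_entropy(logits2, labels))

    # Bidirectional KL-divergence between the two output distributions.
    log_p = F.log_softmax(logits1, dim=-1)
    log_q = F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (F.kl_div(log_p, log_q, reduction="batchmean", log_target=True)
                + F.kl_div(log_q, log_p, reduction="batchmean", log_target=True))

    # alpha weights the consistency term; the paper tunes it per task.
    return ce + alpha * kl

Note that R-Drop adds training-time computation (the second pass) but no extra parameters, and inference is unchanged; in practice the two passes can be fused by concatenating the batch with itself and running a single forward pass.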
Pages: 16
Related Papers
50 records in total
  • [1] A Malware Detection Model Incorporating R-Drop
    Lin, Maoyuan
    MICROCOMPUTER APPLICATIONS, 2025, 41 (01) : 74 - 77
  • [2] An Intrusion Detection Model Based on Deep Learning and R-Drop Regularization
    Li, Wei
    Cheng, Xiangxin
    COMPUTER & DIGITAL ENGINEERING, 2024, 52 (04) : 1142 - 1148
  • [3] A Chinese-Kazakh Translation Method That Combines Data Augmentation and R-Drop Regularization
    Liu, Canglan
    Silamu, Wushouer
    Li, Yanbing
    APPLIED SCIENCES-BASEL, 2023, 13 (19)
  • [4] Geological Named Entity Recognition Based on MacBERT and R-Drop
    Liu, Xin
    Xu, Hongzhen
    Liu, Aihua
    Deng, Dejun
    JOURNAL OF ZHENGZHOU UNIVERSITY (ENGINEERING SCIENCE), 2024, 45 (03) : 89 - 95
  • [5] Universal Approximation in Dropout Neural Networks
    Manita, Oxana A.
    Peletier, Mark A.
    Portegies, Jacobus W.
    Sanders, Jaron
    Senen-Cerda, Albert
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [6] Selective Dropout for Deep Neural Networks
    Barrow, Erik
    Eastwood, Mark
    Jayne, Chrisina
    NEURAL INFORMATION PROCESSING, ICONIP 2016, PT III, 2016, 9949 : 519 - 528
  • [7] Adversarial Dropout for Recurrent Neural Networks
    Park, Sungrae
    Song, Kyungwoo
    Ji, Mingi
    Lee, Wonsung
    Moon, Il-Chul
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 4699 - 4706
  • [8] Understanding Dropout for Graph Neural Networks
    Shu, Juan
    Xi, Bowei
    Li, Yu
    Wu, Fan
    Kamhoua, Charles
    Ma, Jianzhu
    COMPANION PROCEEDINGS OF THE WEB CONFERENCE 2022, WWW 2022 COMPANION, 2022, : 1128 - 1138
  • [9] Dropout Algorithms for Recurrent Neural Networks
    Watt, Nathan
    du Plessis, Mathys C.
    PROCEEDINGS OF THE ANNUAL CONFERENCE OF THE SOUTH AFRICAN INSTITUTE OF COMPUTER SCIENTISTS AND INFORMATION TECHNOLOGISTS (SAICSIT 2018), 2018, : 72 - 78
  • [10] Statistical guarantees for regularized neural networks
    Taheri, Mahsa
    Xie, Fang
    Lederer, Johannes
    NEURAL NETWORKS, 2021, 142 : 148 - 161