R-Drop: Regularized Dropout for Neural Networks

Cited: 0
Authors
Liang, Xiaobo [1 ]
Wu, Lijun [2 ]
Li, Juntao [1 ]
Wang, Yue [1 ]
Meng, Qi [2 ]
Qin, Tao [2 ]
Chen, Wei [2 ]
Zhang, Min [1 ]
Liu, Tie-Yan [2 ]
Affiliations
[1] Soochow Univ, Suzhou, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Dropout is a powerful and widely used technique to regularize the training of deep neural networks. Though effective and well-performing, the randomness introduced by dropout causes a non-negligible inconsistency between training and inference. In this paper, we introduce a simple consistency training strategy to regularize dropout, namely R-Drop, which forces the output distributions of different sub-models generated by dropout to be consistent with each other. Specifically, for each training sample, R-Drop minimizes the bidirectional KL-divergence between the output distributions of two sub-models sampled by dropout. Theoretical analysis reveals that R-Drop reduces the above inconsistency. Experiments on 5 widely used deep learning tasks (18 datasets in total), including neural machine translation, abstractive summarization, language understanding, language modeling, and image classification, show that R-Drop is universally effective. In particular, it yields substantial improvements when applied to fine-tune large-scale pre-trained models, e.g., ViT, RoBERTa-large, and BART, and achieves state-of-the-art (SOTA) performance with the vanilla Transformer model on WMT14 English -> German translation (30.91 BLEU) and WMT14 English -> French translation (43.95 BLEU), even surpassing models trained with extra large-scale data and expert-designed advanced variants of Transformer models. Our code is available on GitHub.
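The abstract fully specifies the training objective, so a minimal sketch may help make it concrete. The PyTorch snippet below is a hypothetical illustration, not the authors' code: the helper name r_drop_loss and the weight alpha are our own, and it assumes model(x) returns classification logits with the model in training mode so that dropout is active. It performs two stochastic forward passes on the same batch and combines the averaged cross-entropy with the bidirectional KL term.

import torch.nn.functional as F

def r_drop_loss(model, x, labels, alpha=1.0):
    # Sketch of an R-Drop-style objective (illustrative, not the authors' code).
    # Two forward passes over the same batch: dropout samples two
    # different sub-models, so the logits differ between passes.
    logits1 = model(x)
    logits2 = model(x)

    # Task loss: cross-entropy averaged over the two passes.
    ce = 0.5 * (F.cross_entropy(logits1, labels)
                + F.cross_entropy(logits2, labels))

    # Bidirectional KL-divergence between the two output distributions.
    log_p = F.log_softmax(logits1, dim=-1)
    log_q = F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (F.kl_div(log_p, log_q, reduction="batchmean", log_target=True)
                + F.kl_div(log_q, log_p, reduction="batchmean", log_target=True))

    # alpha weights the consistency term; the paper tunes it per task.
    return ce + alpha * kl

Note that R-Drop adds training-time computation (the second pass) but no extra parameters, and inference is unchanged; in practice the two passes can be fused by concatenating the batch with itself and running a single forward pass.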
Pages: 16
Related Papers
50 records in total
  • [1] A Malware Detection Model Incorporating R-Drop
    Lin, Maoyuan
    MICROCOMPUTER APPLICATIONS, 2025, 41 (01) : 74 - 77
  • [2] An Intrusion Detection Model Based on Deep Learning and R-Drop Regularization
    Li, Wei
    Cheng, Xiangxin
    COMPUTER & DIGITAL ENGINEERING, 2024, 52 (04) : 1142 - 1148
  • [3] A Chinese-Kazakh Translation Method That Combines Data Augmentation and R-Drop Regularization
    Liu, Canglan
    Silamu, Wushouer
    Li, Yanbing
    APPLIED SCIENCES-BASEL, 2023, 13 (19)
  • [4] Geological Named Entity Recognition Based on MacBERT and R-Drop
    Liu, Xin
    Xu, Hongzhen
    Liu, Aihua
    Deng, Dejun
    JOURNAL OF ZHENGZHOU UNIVERSITY (ENGINEERING SCIENCE), 2024, 45 (03) : 89 - 95
  • [5] Universal Approximation in Dropout Neural Networks
    Manita, Oxana A.
    Peletier, Mark A.
    Portegies, Jacobus W.
    Sanders, Jaron
    Senen-Cerda, Albert
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [6] Selective Dropout for Deep Neural Networks
    Barrow, Erik
    Eastwood, Mark
    Jayne, Chrisina
    NEURAL INFORMATION PROCESSING, ICONIP 2016, PT III, 2016, 9949 : 519 - 528
  • [7] Adversarial Dropout for Recurrent Neural Networks
    Park, Sungrae
    Song, Kyungwoo
    Ji, Mingi
    Lee, Wonsung
    Moon, Il-Chul
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 4699 - 4706
  • [8] Understanding Dropout for Graph Neural Networks
    Shu, Juan
    Xi, Bowei
    Li, Yu
    Wu, Fan
    Kamhoua, Charles
    Ma, Jianzhu
    COMPANION PROCEEDINGS OF THE WEB CONFERENCE 2022, WWW 2022 COMPANION, 2022, : 1128 - 1138
  • [9] Dropout Algorithms for Recurrent Neural Networks
    Watt, Nathan
    du Plessis, Mathys C.
    PROCEEDINGS OF THE ANNUAL CONFERENCE OF THE SOUTH AFRICAN INSTITUTE OF COMPUTER SCIENTISTS AND INFORMATION TECHNOLOGISTS (SAICSIT 2018), 2018, : 72 - 78
  • [10] Statistical guarantees for regularized neural networks
    Taheri, Mahsa
    Xie, Fang
    Lederer, Johannes
    NEURAL NETWORKS, 2021, 142 : 148 - 161