Learning to Augment for Data-Scarce Domain BERT Knowledge Distillation

Cited: 0
Authors
Feng, Lingyun [1 ]
Qiu, Minghui [2 ]
Li, Yaliang [2 ]
Zheng, Hai-Tao [1 ]
Shen, Ying [3 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Alibaba Grp, Hangzhou, Peoples R China
[3] Sun Yat Sen Univ, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Although pre-trained language models such as BERT have achieved impressive performance on a wide range of natural language processing tasks, they are computationally expensive to deploy in real-time applications. A typical remedy is knowledge distillation, which compresses these large pre-trained models (teacher models) into small student models. However, for a target domain with scarce training data, the teacher can hardly pass useful knowledge to the student, which degrades the student models' performance. To tackle this problem, we propose a method that learns to augment data for data-scarce domain BERT knowledge distillation, by learning a cross-domain manipulation scheme that automatically augments the target domain with the help of resource-rich source domains. Specifically, the proposed method generates samples drawn from a stationary distribution near the target data and adopts a reinforced selector to automatically refine the augmentation strategy according to the student's performance. Extensive experiments demonstrate that the proposed method significantly outperforms state-of-the-art baselines on four different tasks, and for the data-scarce domains, the compressed student models even outperform the original large teacher model with far fewer parameters (only about 13.3%) when only a few labeled examples are available.
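To make the idea in the abstract concrete, below is a minimal, hypothetical sketch of the core loop: a student is distilled from a teacher on augmented samples while a REINFORCE-style selector learns which augmented samples to keep, using the student's performance on the scarce target data as the reward. The tiny linear models, the toy data, and helper names such as student_dev_accuracy are illustrative assumptions standing in for the BERT teacher/student and the paper's actual cross-domain augmentation scheme, not the authors' implementation.

```python
# Minimal sketch of distillation with a reinforced data selector.
# All models, data, and hyperparameters here are toy assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
DIM, NUM_CLASSES = 32, 2

teacher = nn.Linear(DIM, NUM_CLASSES)   # stand-in for the large teacher model
student = nn.Linear(DIM, NUM_CLASSES)   # stand-in for the small student model
selector = nn.Linear(DIM, 1)            # reinforced selector over augmented samples

opt_student = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_selector = torch.optim.Adam(selector.parameters(), lr=1e-3)

# Toy data: a few labeled target examples plus augmented samples drawn near them.
target_x = torch.randn(8, DIM)
target_y = torch.randint(0, NUM_CLASSES, (8,))
augmented_x = target_x.repeat(4, 1) + 0.1 * torch.randn(32, DIM)

def student_dev_accuracy():
    """Reward signal: student accuracy on the (toy) target set."""
    with torch.no_grad():
        return (student(target_x).argmax(-1) == target_y).float().mean().item()

for step in range(100):
    # 1) Selector samples which augmented examples to keep (Bernoulli policy).
    keep_prob = torch.sigmoid(selector(augmented_x)).squeeze(-1)
    keep = torch.bernoulli(keep_prob)
    chosen = augmented_x[keep.bool()]
    if chosen.shape[0] == 0:
        continue

    # 2) Distill the teacher's predictions into the student on the kept samples.
    with torch.no_grad():
        teacher_logits = teacher(chosen)
    student_logits = student(chosen)
    kd_loss = F.kl_div(F.log_softmax(student_logits, -1),
                       F.softmax(teacher_logits, -1), reduction="batchmean")
    opt_student.zero_grad()
    kd_loss.backward()
    opt_student.step()

    # 3) REINFORCE update: reward the selection policy by the student's
    #    performance on the target data (no baseline, for simplicity).
    reward = student_dev_accuracy()
    log_prob = (keep * torch.log(keep_prob + 1e-8)
                + (1 - keep) * torch.log(1 - keep_prob + 1e-8)).sum()
    selector_loss = -reward * log_prob
    opt_selector.zero_grad()
    selector_loss.backward()
    opt_selector.step()
```

In the full method, the augmented samples would be produced by cross-domain manipulation of resource-rich source data near the target distribution rather than by Gaussian perturbation, and the reward would be measured on a held-out target development set; this sketch only illustrates how the selector's policy can be refined by the student's feedback.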
Pages: 7422-7430
Number of pages: 9
Related Papers
50 records in total
  • [21] Al-Najjar, Husam A. H.; Pradhan, Biswajeet; He, Xuzhen; Sheng, Daichao; Alamri, Abdullah; Gite, Shilpa; Park, Hyuck-Jin. Integrating Physical and Machine Learning Models for Enhanced Landslide Prediction in Data-Scarce Environments. EARTH SYSTEMS AND ENVIRONMENT, 2024.
  • [22] Khodkar, Kasra; Mirchi, Ali; Nourani, Vahid; Kaghazchi, Afsaneh; Sadler, Jeffrey M.; Mansaray, Abubakarr; Wagner, Kevin; Alderman, Phillip D.; Taghvaeian, Saleh; Bailey, Ryan T. Stream salinity prediction in data-scarce regions: Application of transfer learning and uncertainty quantification. JOURNAL OF CONTAMINANT HYDROLOGY, 2024, 266.
  • [23] Loh, Charlotte; Christensen, Thomas; Dangovski, Rumen; Kim, Samuel; Soljacic, Marin. Surrogate- and invariance-boosted contrastive learning for data-scarce applications in science. NATURE COMMUNICATIONS, 2022, 13 (01).
  • [24] Akter, Aysha; Toukir, Md Redwoan; Dayem, Ahammed. Suitability Assessment of Fish Habitat in a Data-Scarce River. HYDROLOGY, 2022, 9 (10).
  • [25] Caballero, Pablo; Gonzalez-Abril, Luis; Ortega, Juan A.; Simon-Soro, Aurea. Data Mining Techniques for Endometriosis Detection in a Data-Scarce Medical Dataset. ALGORITHMS, 2024, 17 (03).
  • [26] Manfreda, Salvatore. On the derivation of flow rating curves in data-scarce environments. JOURNAL OF HYDROLOGY, 2018, 562: 151-154.
  • [27] Min, Qi; Luo, Fei; Dong, Wenbo; Gu, Chunhua; Ding, Weichao. Bidirectional domain transfer knowledge distillation for catastrophic forgetting in federated learning with heterogeneous data. KNOWLEDGE-BASED SYSTEMS, 2025, 311.
  • [28] Sameen, Maher Ibrahim; Pradhan, Biswajeet; Lee, Saro. Self-Learning Random Forests Model for Mapping Groundwater Yield in Data-Scarce Areas. NATURAL RESOURCES RESEARCH, 2019, 28: 757-775.
  • [29] Guo, Jinyang; Liu, Jiaheng; Wang, Zining; Ma, Yuqing; Gong, Ruihao; Xu, Ke; Liu, Xianglong. Adaptive Contrastive Knowledge Distillation for BERT Compression. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023: 8941-8953.
  • [30] Sun, Siqi; Cheng, Yu; Gan, Zhe; Liu, Jingjing. Patient Knowledge Distillation for BERT Model Compression. 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019: 4323-4332.