Learning to Augment for Data-Scarce Domain BERT Knowledge Distillation

Cited by: 0
Authors
Feng, Lingyun [1 ]
Qiu, Minghui [2 ]
Li, Yaliang [2 ]
Zheng, Hai-Tao [1 ]
Shen, Ying [3 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Alibaba Grp, Hangzhou, Peoples R China
[3] Sun Yat Sen Univ, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Although pre-trained language models such as BERT achieve appealing performance across a wide range of natural language processing tasks, they are too computationally expensive to deploy in real-time applications. A typical remedy is knowledge distillation, which compresses a large pre-trained model (the teacher) into a small student model. However, for a target domain with scarce training data, the teacher can hardly pass useful knowledge to the student, which degrades the student's performance. To tackle this problem, we propose a method that learns to augment data for data-scarce domain BERT knowledge distillation: it learns a cross-domain manipulation scheme that automatically augments the target domain with the help of resource-rich source domains. Specifically, the method generates samples drawn from a stationary distribution near the target data and adopts a reinforced selector that automatically refines the augmentation strategy according to the student's performance. Extensive experiments demonstrate that the proposed method significantly outperforms state-of-the-art baselines on four different tasks. For data-scarce domains, the compressed student models even outperform the original large teacher model with far fewer parameters (only ~13.3%) when only a few labeled examples are available.
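To make the recipe described in the abstract concrete, below is a minimal sketch (not the authors' implementation) of distillation combined with a REINFORCE-trained selector that keeps only the augmented samples that help the student. The augmentation step itself (sampling near the target distribution) is assumed to have already produced the candidate `inputs` and per-sample features `feats`; the model interfaces, the reward definition, and all hyperparameters are illustrative assumptions.

```python
# Hedged sketch: soft-label knowledge distillation plus a reinforced selector.
# Everything here (module shapes, reward, hyperparameters) is assumed for
# illustration; it is not the paper's released code.
import torch
import torch.nn.functional as F
from torch import nn


def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hinton-style soft-label KD mixed with hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


class Selector(nn.Module):
    """Scores each candidate augmented sample with a keep-probability."""

    def __init__(self, feat_dim):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, feats):  # feats: (batch, feat_dim)
        return torch.sigmoid(self.score(feats)).squeeze(-1)


def train_step(teacher, student, selector, student_opt, selector_opt,
               feats, inputs, labels, dev_metric):
    # 1) Sample a keep/drop decision per augmented candidate.
    probs = selector(feats)
    keep = torch.bernoulli(probs.detach())
    if keep.sum() == 0:  # guard against an empty batch
        keep[0] = 1.0
    idx = keep.bool()

    # 2) Distill the teacher into the student on the kept samples only.
    with torch.no_grad():
        t_logits = teacher(inputs[idx])
    s_logits = student(inputs[idx])
    kd_loss = distillation_loss(s_logits, t_logits, labels[idx])
    student_opt.zero_grad()
    kd_loss.backward()
    student_opt.step()

    # 3) Reward the selector with the student's post-update performance
    #    (dev_metric returns a plain float, e.g. dev-set accuracy).
    reward = dev_metric(student)
    log_prob = (keep * probs.clamp_min(1e-8).log()
                + (1 - keep) * (1 - probs).clamp_min(1e-8).log()).sum()
    selector_loss = -reward * log_prob  # REINFORCE gradient estimator
    selector_opt.zero_grad()
    selector_loss.backward()
    selector_opt.step()
    return kd_loss.item(), reward
```

In practice, a moving-average baseline would typically be subtracted from the reward to reduce the variance of the REINFORCE estimator, and the selector would be applied to a pool of source-domain augmentations rather than a single batch.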
Pages: 7422-7430
Number of pages: 9
Related Papers
50 records in total
  • [1] Knowledge distillation for BERT unsupervised domain adaptation
    Ryu, Minho
    Lee, Geonseok
    Lee, Kichun
    KNOWLEDGE AND INFORMATION SYSTEMS, 2022, 64(11): 3113-3128
  • [2] BERT Learns to Teach: Knowledge Distillation with Meta Learning
    Zhou, Wangchunshu
    Xu, Canwen
    McAuley, Julian
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1 (LONG PAPERS), 2022: 7037-7049
  • [3] Harnessing large language models for data-scarce learning of polymer properties
    Liu, Ning
    Jafarzadeh, Siavash
    Lattimer, Brian Y.
    Ni, Shuna
    Lua, Jim
    Yu, Yue
    NATURE COMPUTATIONAL SCIENCE, 2025: 245-254
  • [4] A drought monitoring framework for data-scarce regions
    Real-Rangel, Roberto A.
    Pedrozo-Acuna, Adrian
    Agustin Brena-Naranjo, J.
    Alcocer-Yamanaka, Victor H.
    JOURNAL OF HYDROINFORMATICS, 2020, 22(1): 170-185
  • [5] Towards catchment classification in data-scarce regions
    Auerbach, Daniel A.
    Buchanan, Brian P.
    Alexiades, Alexander V.
    Anderson, Elizabeth P.
    Encalada, Andrea C.
    Larson, Erin I.
    McManamay, Ryan A.
    Poe, Gregory L.
    Walter, M. Todd
    Flecker, Alexander S.
    ECOHYDROLOGY, 2016, 9(7): 1235-1247
  • [6] Optimizing deep reinforcement learning in data-scarce domains: a cross-domain evaluation of double DQN and dueling DQN
    Din, Nusrat Mohi Ud
    Assad, Assif
    Ul Sabha, Saqib
    Rasool, Muzafar
    INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2024
  • [7] Potential of rainfall data hybridization in a data-scarce region
    Wambura, Frank Joseph
    SCIENTIFIC AFRICAN, 2020, 8
  • [8] DEM Generation Incorporating River Channels in Data-Scarce Contexts: The "Fluvial Domain Method"
    Villanueva, Jairo R. Escobar
    Perez-Montiel, Jhonny I.
    Nardini, Andrea Gianni Cristoforo
    HYDROLOGY, 2025, 12(2)
  • [9] An Improved Anticipated Learning Machine for Daily Runoff Prediction in Data-Scarce Regions
    Hu, Wei
    Qian, Longxia
    Hong, Mei
    Zhao, Yong
    Fan, Linlin
    MATHEMATICAL GEOSCIENCES, 2025, 57(1): 49-88