Learning to Augment for Data-Scarce Domain BERT Knowledge Distillation

Cited by: 0
Authors
Feng, Lingyun [1 ]
Qiu, Minghui [2 ]
Li, Yaliang [2 ]
Zheng, Hai-Tao [1 ]
Shen, Ying [3 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Alibaba Grp, Hangzhou, Peoples R China
[3] Sun Yat Sen Univ, Guangzhou, Peoples R China
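Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes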
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Although pre-trained language models such as BERT have achieved appealing performance on a wide range of natural language processing tasks, they are computationally expensive to deploy in real-time applications. A typical remedy is to use knowledge distillation to compress these large pre-trained models (teacher models) into small student models. However, for a target domain with scarce training data, the teacher can hardly pass useful knowledge to the student, which degrades the performance of the student models. To tackle this problem, we propose a method that learns to augment data for BERT knowledge distillation in data-scarce domains, by learning a cross-domain manipulation scheme that automatically augments the target domain with the help of resource-rich source domains. Specifically, the proposed method generates samples drawn from a stationary distribution near the target data and adopts a reinforced selector to automatically refine the augmentation strategy according to the performance of the student. Extensive experiments demonstrate that the proposed method significantly outperforms state-of-the-art baselines on four different tasks, and for the data-scarce domains, the compressed student models even perform better than the original large teacher model, with far fewer parameters (only about 13.3%), when only a few labeled examples are available.
Pages: 7422 - 7430
Page count: 9
Related papers
50 in total
  • [41] Identification of groundwater potential zones in data-scarce mountainous region using explainable machine learning
    Dahal, Kshitij
    Sharma, Sandesh
    Shakya, Amin
    Talchabhadel, Rocky
    Adhikari, Sanot
    Pokharel, Anju
    Sheng, Zhuping
    Pradhan, Ananta Man Singh
    Kumar, Saurav
    JOURNAL OF HYDROLOGY, 2023, 627
  • [42] Self-Learning Random Forests Model for Mapping Groundwater Yield in Data-Scarce Areas
    Sameen, Maher Ibrahim
    Pradhan, Biswajeet
    Lee, Saro
    NATURAL RESOURCES RESEARCH, 2019, 28 (03) : 757 - 775
  • [43] Estimation of spatially distributed groundwater recharge in data-scarce regions
    Belay, Ashebir Sewale
    Yenehun, Alemu
    Nigate, Fenta
    Tilahun, Seifu A.
    Dessie, Mekete
    Moges, Michael M.
    Chen, Margaret
    Fentie, Derbew
    Adgo, Enyew
    Nyssen, Jan
    Walraevens, Kristine
    JOURNAL OF HYDROLOGY-REGIONAL STUDIES, 2024, 56
  • [44] Survey of Methods for Data-Scarce Processing Based on Mechanism Model
    Wang, Guobo
    Ma, Minglu
    Xu, Liansheng
    5TH ANNUAL INTERNATIONAL CONFERENCE ON INFORMATION SYSTEM AND ARTIFICIAL INTELLIGENCE (ISAI2020), 2020, 1575
  • [45] Accelerating Multi-Exit BERT Inference via Curriculum Learning and Knowledge Distillation
    Gu, Shengwei
    Luo, Xiangfeng
    Wang, Xinzhi
    Guo, Yike
    INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2023, 33 (03) : 395 - 413
  • [46] Prediction of Runoff in Watersheds Located within Data-Scarce Regions
    Ghanim, Abdulnoor A. J.
    Beddu, Salmia
    Abd Manan, Teh Sabariah Binti
    Al Yami, Saleh H.
    Irfan, Muhammad
    Mursal, Salim Nasar Faraj
    Kamal, Nur Liyana Mohd
    Mohamad, Daud
    Machmudah, Affiani
    Yavari, Saba
    Mohtar, Wan Hanna Melini Wan
    Ahmad, Amirrudin
    Rasdi, Nadiah Wan
    Khan, Taimur
    SUSTAINABILITY, 2022, 14 (13)
  • [47] Optimising the allocation of groundwater carrying capacity in a data-scarce region
    Li, Xun-Gui
    Wei, Xia
    Lu, Yu-Dong
    WATER SA, 2010, 36 (04) : 451 - 460
  • [48] Forecasting fierce floods with transferable AI in data-scarce regions
    Wang, Hui-Min
    Peng, Xiao
    He, Xiaogang
    INNOVATION, 2024, 5 (04)
  • [49] Landslide susceptibility analysis in data-scarce regions: the case of Kyrgyzstan
    Saponaro, Annamaria
    Pilz, Marco
    Wieland, Marc
    Bindi, Dino
    Moldobekov, Bolot
    Parolai, Stefano
    BULLETIN OF ENGINEERING GEOLOGY AND THE ENVIRONMENT, 2015, 74 : 1117 - 1136
  • [50] Tracer hydrology of the data-scarce and heterogeneous Central American Isthmus
    Sanchez-Murillo, Ricardo
    Esquivel-Hernandez, Germain
    Corrales-Salazar, Jose L.
    Castro-Chacon, Laura
    Duran-Quesada, Ana M.
    Guerrero-Hernandez, Manuel
    Delgado, Valeria
    Barberena, Javier
    Montenegro-Rayo, Katia
    Calderon, Heyddy
    Chevez, Carlos
    Pena-Paz, Tania
    Garcia-Santos, Saul
    Ortiz-Roque, Pedro
    Alvarado-Callejas, Yaneth
    Benegas, Laura
    Hernandez-Antonio, Antonio
    Matamoros-Ortega, Marcela
    Ortega, Lucia
    Terzer-Wassmuth, Stefan
    HYDROLOGICAL PROCESSES, 2020, 34 (11) : 2660 - 2675