Cost-effective Distillation of Large Language Models

Cited by: 0
Authors
Dasgupta, Sayantan [1 ]
Cohn, Trevor [1 ,2 ]
Baldwin, Timothy [1 ]
Affiliations
[1] Univ Melbourne, Sch Comp & Informat Syst, Melbourne, Vic, Australia
[2] Google DeepMind, Seattle, WA USA
Keywords: (none listed)
DOI: Not available
Abstract
Knowledge distillation (KD) involves training a small "student" model to replicate the strong performance of a high-capacity "teacher" model, enabling efficient deployment in resource-constrained settings. Top-performing methods tend to be task- or architecture-specific and lack generalizability. Several existing approaches require pretraining the teacher on task-specific datasets, which can be costly for large datasets and unstable for small ones. Here we propose an approach for improving KD through a novel distillation loss that is agnostic to both the task and the model architecture. We successfully apply our method to the distillation of BERT-base and achieve highly competitive results from the distilled student across a range of GLUE tasks, especially for tasks with smaller datasets.
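The record does not describe the paper's proposed loss, so the following is only a minimal sketch of the generic soft-label KD setup the abstract builds on: a temperature-scaled KL-divergence term between teacher and student logits blended with the usual cross-entropy on ground-truth labels. The temperature and alpha parameters are illustrative assumptions, not values from the paper.

    # Minimal sketch of a standard soft-label distillation loss (PyTorch).
    # NOT the paper's proposed loss; temperature and alpha are assumed values.
    import torch
    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
        """Blend a soft-label KL term (teacher -> student) with hard-label cross-entropy."""
        # Soft targets from the teacher, softened by the temperature.
        soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
        # KL divergence, scaled by T^2 so gradient magnitudes stay comparable across temperatures.
        soft_loss = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
        # Standard supervised loss on the ground-truth labels.
        hard_loss = F.cross_entropy(student_logits, labels)
        return alpha * soft_loss + (1 - alpha) * hard_loss

In practice the loss would be computed per batch during student training, with the teacher's logits obtained under torch.no_grad(); task- or architecture-agnostic methods such as the one described in the abstract typically replace or augment the soft-label term rather than the hard-label term.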
Pages: 7346-7354
Number of pages: 9