Cost-effective Distillation of Large Language Models

Cited by: 0
Authors
Dasgupta, Sayantan [1 ]
Cohn, Trevor [1 ,2 ]
Baldwin, Timothy [1 ]
Affiliations
[1] Univ Melbourne, Sch Comp & Informat Syst, Melbourne, Vic, Australia
[2] Google DeepMind, Seattle, WA USA
Keywords: (none listed)
DOI: Not available
Abstract
Knowledge distillation (KD) involves training a small "student" model to replicate the strong performance of a high-capacity "teacher" model, enabling efficient deployment in resource-constrained settings. Top-performing methods tend to be task- or architecture-specific and lack generalizability. Several existing approaches require pretraining the teacher on task-specific datasets, which can be costly for large datasets and unstable for small ones. Here we propose an approach for improving KD through a novel distillation loss that is agnostic to both the task and the model architecture. We successfully apply our method to the distillation of BERT-base and achieve highly competitive results from the distilled student across a range of GLUE tasks, especially for tasks with smaller datasets.
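The record does not describe the paper's proposed loss, so the following is only a minimal sketch of the generic soft-label KD setup the abstract builds on: a temperature-scaled KL-divergence term between teacher and student logits blended with the usual cross-entropy on ground-truth labels. The temperature and alpha parameters are illustrative assumptions, not values from the paper.

    # Minimal sketch of a standard soft-label distillation loss (PyTorch).
    # NOT the paper's proposed loss; temperature and alpha are assumed values.
    import torch
    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
        """Blend a soft-label KL term (teacher -> student) with hard-label cross-entropy."""
        # Soft targets from the teacher, softened by the temperature.
        soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
        # KL divergence, scaled by T^2 so gradient magnitudes stay comparable across temperatures.
        soft_loss = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
        # Standard supervised loss on the ground-truth labels.
        hard_loss = F.cross_entropy(student_logits, labels)
        return alpha * soft_loss + (1 - alpha) * hard_loss

In practice the loss would be computed per batch during student training, with the teacher's logits obtained under torch.no_grad(); task- or architecture-agnostic methods such as the one described in the abstract typically replace or augment the soft-label term rather than the hard-label term.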
Pages: 7346-7354
Number of pages: 9