Cost-effective Distillation of Large Language Models

Authors
Dasgupta, Sayantan [1]
Cohn, Trevor [1, 2]
Baldwin, Timothy [1]
Affiliations
[1] Univ Melbourne, Sch Comp & Informat Syst, Melbourne, Vic, Australia
[2] Google DeepMind, Seattle, WA USA
Abstract
Knowledge distillation (KD) involves training a small "student" model to replicate the strong performance of a high-capacity "teacher" model, enabling efficient deployment in resource-constrained settings. Top-performing methods tend to be task- or architecture-specific and lack generalizability. Several existing approaches require pretraining of the teacher on task-specific datasets, which can be costly for large datasets and unstable for small ones. Here we propose an approach for improving KD through a novel distillation loss that is agnostic to the task and model architecture. We successfully apply our method to the distillation of BERT-base and achieve highly competitive results from the distilled student across a range of GLUE tasks, especially for tasks with smaller datasets.
Pages: 7346-7354
Page count: 9