Farewell to Aimless Large-scale Pretraining: Influential Subset Selection for Language Model

Cited by: 0
Authors
Wang, Xiao [1 ]
Zhou, Weikang [1 ]
Zhang, Qi [1 ]
Zhou, Jie [1 ]
Gao, Songyang [1 ]
Wang, Junzhe [1 ]
Zhang, Menghan [2 ]
Gao, Xiang [3 ]
Chen, Yunwen [3 ]
Gui, Tao [2 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[2] Fudan Univ, Inst Modern Languages & Linguist, Shanghai, Peoples R China
[3] DataGrand Informat Technol Shanghai Co Ltd, Shanghai, Peoples R China
Funding
Natural Science Foundation of Shanghai; National Natural Science Foundation of China
Keywords
DOI
N/A
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Pretrained language models have achieved remarkable success in various natural language processing tasks. However, pretraining has recently shifted toward larger models and larger datasets, resulting in significant computational and energy costs. In this paper, we propose Influential Subset Selection (ISS) for language models, which explicitly exploits end-task knowledge to select a tiny subset of the pretraining corpus. Specifically, ISS selects the samples that will provide the most positive influence on end-task performance. Furthermore, we design a gradient-matching-based influence estimation method, which drastically reduces the computation time of influence. With only 0.45% of the data and a computational cost three orders of magnitude lower, ISS outperformed pretrained models (e.g., RoBERTa) on eight datasets covering four domains.
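The gradient-matching idea summarized above can be made concrete with a short sketch: score each pretraining sample by how well its loss gradient aligns with the end-task gradient, then keep only the highest-scoring fraction. The PyTorch snippet below is a minimal illustration under assumed interfaces; the dot-product scoring rule, the flat_grad helper, and the toy batch format are hypothetical stand-ins, not the authors' ISS implementation.

import torch

def flat_grad(model, loss):
    # Flatten the gradient of `loss` w.r.t. all trainable parameters.
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def influence_scores(model, loss_fn, pretrain_batches, task_grad):
    # Gradient matching: alignment between a sample's gradient and the
    # end-task gradient serves as a proxy for its influence on the task.
    scores = []
    for x, y in pretrain_batches:
        g = flat_grad(model, loss_fn(model(x), y))
        scores.append(torch.dot(g, task_grad).item())
    return scores

def select_top_fraction(scores, fraction=0.0045):
    # Keep roughly the top 0.45% of samples, mirroring the data budget
    # reported in the abstract.
    k = max(1, int(len(scores) * fraction))
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]

if __name__ == "__main__":
    # Toy demonstration with a linear model and random data.
    torch.manual_seed(0)
    model = torch.nn.Linear(16, 2)
    loss_fn = torch.nn.functional.cross_entropy
    task_x, task_y = torch.randn(8, 16), torch.randint(0, 2, (8,))
    task_grad = flat_grad(model, loss_fn(model(task_x), task_y))
    pretrain = [(torch.randn(4, 16), torch.randint(0, 2, (4,))) for _ in range(200)]
    scores = influence_scores(model, loss_fn, pretrain, task_grad)
    print(select_top_fraction(scores, fraction=0.05))

In the paper's setting the per-sample gradients would come from the language-modeling loss on pretraining text rather than a toy classifier, but the selection logic is the same: rank by estimated influence, keep the top slice.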
Pages: 555-568
Page count: 14
Related Papers
50 records in total
  • [41] Identifying Influential Nodes in Large-Scale Directed Networks: The Role of Clustering
    Chen, Duan-Bing
    Gao, Hui
    Lu, Linyuan
    Zhou, Tao
    PLOS ONE, 2013, 8 (10)
  • [42] Identifying Influential Factors of CDN Performance with Large-scale Data Analysis
    Wang, Dawei
    Zhang, Shuzhuang
    Xue, Yibo
    Dong, Yingfei
    2018 INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKING AND COMMUNICATIONS (ICNC), 2018 : 873 - 877
  • [43] On a large-scale model of the Universe
    Grigoryan, SS
    DOKLADY PHYSICS, 2002, 47 (10) : 731 - 734
  • [44] On large-scale model of universe
    Grigorian, S.S.
    Doklady Akademii Nauk, 2002, 386 (04) : 471 - 475
  • [45] On a large-scale model of the universe
    S. S. Grigoryan
    Doklady Physics, 2002, 47 : 731 - 734
  • [46] Identifying influential spreaders in large-scale networks based on evidence theory
    Liu, Dong
    Nie, Hao
    Zhao, Jing
    Wang, Qingchen
    NEUROCOMPUTING, 2019, 359 : 466 - 475
  • [47] Evaluating large-language-model chatbots to engage communities in large-scale design projects
    Dortheimer, Jonathan
    Martelaro, Nik
    Sprecher, Aaron
    Schubert, Gerhard
    AI EDAM-ARTIFICIAL INTELLIGENCE FOR ENGINEERING DESIGN ANALYSIS AND MANUFACTURING, 2024, 38
  • [48] Large-scale photonic natural language processing
    Valensise, Carlo M.
    Grecco, Ivana
    Pierangeli, Davide
    Conti, Claudio
    PHOTONICS RESEARCH, 2022, 10 (12) : 2846 - 2853
  • [49] Large-Scale Network Involvement in Language Processing
    Wylie, Korey P.
    Regner, Michael F.
    JOURNAL OF NEUROSCIENCE, 2014, 34 (47) : 15505 - 15507
  • [50] Language requirements for large-scale generic libraries
    Siek, J
    Lumsdaine, A
    GENERATIVE PROGRAMMING AND COMPONENT ENGINEERING, PROCEEDINGS, 2005, 3676 : 405 - 421