DynImpt: A Dynamic Data Selection Method for Improving Model Training Efficiency

Cited by: 0
Authors
Huang, Wei [1, 2]
Zhang, Yunxiao [3]
Guo, Shangmin [4]
Shang, Yu-Ming [2, 5]
Fu, Xiangling [1, 2]
Affiliations
[1] Beijing Univ Posts & Telecommunicat, Sch Comp Sci, Nat Pilot Software Engn Sch, Beijing 100876, Peoples R China
[2] Minist Educ, Key Lab Trustworthy Distributed Comp & Serv BUPT, Beijing 100876, Peoples R China
[3] Acad Mil Sci, Beijing 100850, Peoples R China
[4] Univ Edinburgh, Sch Informat, Edinburgh EH8 9AB, Scotland
[5] Beijing Univ Posts & Telecommun, Sch Cyberspace Secur, Beijing 100876, Peoples R China
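Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation; National Key Research and Development Program of China
Keywords
Data selection; scoring criteria; model scalability; low computational cost; deep learning
DOI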
10.1109/TKDE.2024.3482466
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Selecting key data subsets for model training is an effective way to improve training efficiency. Existing methods generally use a well-trained model to evaluate samples and select crucial subsets, ignoring that sample importance changes dynamically during training. As a result, the selected subset is critical only at a specific training epoch rather than across a changing training phase. To address this issue, we evaluate the significant changes in sample importance during dynamic training and propose a novel data selection method to improve model training efficiency. Specifically, the temporal changes in sample importance are considered from three perspectives: (i) loss, the difference between the predicted labels and the true labels of samples in the current training epoch; (ii) instability, the dispersion of sample importance in the recent training phase; and (iii) inconsistency, the trend in the importance of an individual sample compared with the trend in the average importance of all samples in the recent training phase. Extensive experiments demonstrate that dynamic data selection can reduce computational costs and improve model training efficiency. Additionally, we find that the difficulty level of the training task influences the data selection strategy.
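The three perspectives named in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract gives no formulas, so everything concrete here is an assumption made for illustration. Per-sample loss stands in for sample importance, instability is taken as the standard deviation over a recent window, inconsistency is taken as the absolute gap between a sample's change and the average change over that window, and the three scores are min-max normalized and summed. The names `dynamic_scores`, `select_subset`, `window`, and `keep_ratio` are hypothetical.

```python
import numpy as np


def dynamic_scores(loss_history, window=3):
    """Return (loss, instability, inconsistency) per sample.

    loss_history: array of shape (epochs, samples) holding per-sample
    losses, used here as an assumed proxy for sample importance.
    """
    recent = np.asarray(loss_history, dtype=float)[-window:]
    # (i) loss in the current (latest) epoch
    loss = recent[-1]
    # (ii) instability: dispersion over the recent window (assumed: std)
    instability = recent.std(axis=0)
    # (iii) inconsistency: sample trend vs. average trend (assumed form)
    sample_trend = recent[-1] - recent[0]
    epoch_means = recent.mean(axis=1)
    mean_trend = epoch_means[-1] - epoch_means[0]
    inconsistency = np.abs(sample_trend - mean_trend)
    return loss, instability, inconsistency


def _minmax(x):
    """Scale a score vector to [0, 1]; constant vectors map to zeros."""
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def select_subset(loss_history, keep_ratio=0.5, window=3):
    """Pick the top-scoring samples by an assumed equal-weight sum."""
    parts = dynamic_scores(loss_history, window)
    total = sum(_minmax(p) for p in parts)
    k = max(1, int(round(keep_ratio * total.size)))
    return np.sort(np.argsort(-total, kind="stable")[:k])


# Toy history: 3 epochs, 4 samples (values invented for illustration).
history = np.array([
    [1.0, 1.0, 1.0, 1.0],
    [0.8, 1.0, 0.5, 0.9],
    [0.6, 1.2, 0.1, 0.8],
])
selected = select_subset(history, keep_ratio=0.5)
```

In a training loop, such a selection would be recomputed every few epochs so that the retained subset follows the changing scores; how the paper weights or schedules the three criteria is not stated in the abstract.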
Pages: 239-252
Page count: 14