DynImpt: A Dynamic Data Selection Method for Improving Model Training Efficiency

Times cited: 0

Authors:

Huang, Wei [1,2]

Zhang, Yunxiao [3]

Guo, Shangmin [4]

Shang, Yu-Ming [2,5]

Fu, Xiangling [1,2]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Sch Comp Sci, Nat Pilot Software Engn Sch, Beijing 100876, Peoples R China

[2] Minist Educ, Key Lab Trustworthy Distributed Comp & Serv BUPT, Beijing 100876, Peoples R China

[3] Acad Mil Sci, Beijing 100850, Peoples R China

[4] Univ Edinburgh, Sch Informat, Edinburgh EH8 9AB, Scotland

[5] Beijing Univ Posts & Telecommun, Sch Cyberspace Secur, Beijing 100876, Peoples R China

Source:

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2025 / Vol. 37 / No. 01

Funding:

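National Natural Science Foundation of China; Beijing Natural Science Foundation; National Key Research and Development Program of China;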

Keywords:

Data selection; scoring criteria; model scalability; low computational cost; deep learning;

DOI:

10.1109/TKDE.2024.3482466

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Subject classification codes:

081104; 0812; 0835; 1405;

Abstract:

Selecting key data subsets for model training is an effective way to improve training efficiency. Existing methods generally use a well-trained model to score samples and select a crucial subset, ignoring the fact that sample importance changes dynamically during training; as a result, the selected subset is critical only at a specific training epoch rather than across a changing training phase. To address this issue, we evaluate how sample importance changes during training and propose a novel data selection method to improve model training efficiency. Specifically, the temporal changes in sample importance are considered from three perspectives: (i) loss, the difference between the predicted labels and the true labels of samples in the current training epoch; (ii) instability, the dispersion of a sample's importance over the recent training phase; and (iii) inconsistency, how the trend in an individual sample's importance compares with the average importance of all samples over the recent training phase. Extensive experiments demonstrate that dynamic data selection can reduce computational cost and improve model training efficiency. We also find that the difficulty of the training task influences the choice of data selection strategy.
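The abstract names the three signals but does not give their formulas. The following is a minimal Python sketch, assuming that per-sample loss serves as the importance measure, that instability is the population standard deviation of that loss over a sliding window of recent epochs, that a sample's trend is its newest loss minus its oldest loss in the window, and that inconsistency is the absolute gap between that trend and the mean trend of all samples. The class name `DynamicImportanceTracker`, the window length, the equal weights and the top-fraction selection rule are all hypothetical illustrations, not the authors' formulation.

```python
from collections import deque
from statistics import mean, pstdev


class DynamicImportanceTracker:
    """Hypothetical helper: scores samples from a sliding window of per-sample losses.

    The three signals follow the abstract's wording, but every formula here is an
    assumption for illustration, not the paper's definition.
    """

    def __init__(self, num_samples, window=3, weights=(1.0, 1.0, 1.0)):
        # One bounded history per sample; the oldest epoch drops out automatically.
        self.history = [deque(maxlen=window) for _ in range(num_samples)]
        self.weights = weights

    def update(self, losses):
        """Record one epoch of per-sample losses (one float per sample)."""
        if len(losses) != len(self.history):
            raise ValueError("expected one loss per sample")
        for hist, loss in zip(self.history, losses):
            hist.append(float(loss))

    def scores(self):
        """Combine loss, instability and inconsistency; call update() first."""
        # (i) loss: the sample's loss in the most recent epoch.
        loss = [hist[-1] for hist in self.history]
        # (ii) instability: population std of the sample's loss over the window.
        instability = [pstdev(list(hist)) for hist in self.history]
        # Assumed trend measure: change in loss from the oldest to the newest epoch.
        trend = [hist[-1] - hist[0] for hist in self.history]
        avg_trend = mean(trend)
        # (iii) inconsistency: how far each sample's trend departs from the mean trend.
        inconsistency = [abs(t - avg_trend) for t in trend]
        w_loss, w_instab, w_incons = self.weights
        # Assumed combination rule: a plain weighted sum of the three signals.
        return [
            w_loss * a + w_instab * b + w_incons * c
            for a, b, c in zip(loss, instability, inconsistency)
        ]

    def select(self, fraction):
        """Return sorted indices of the top `fraction` of samples by score."""
        s = self.scores()
        k = max(1, round(fraction * len(s)))
        top = sorted(range(len(s)), key=lambda i: s[i], reverse=True)[:k]
        return sorted(top)


# Usage with made-up losses for 4 samples over 3 epochs.
tracker = DynamicImportanceTracker(num_samples=4, window=3)
for epoch_losses in ([2.0, 1.0, 0.5, 3.0], [1.5, 1.0, 0.4, 1.0], [1.0, 1.0, 0.3, 2.5]):
    tracker.update(epoch_losses)
print(tracker.select(0.5))  # -> [0, 3]
```

Because the subset is recomputed from a moving window rather than from one well-trained model, the selection can change from epoch to epoch, which is the point the abstract makes against static selection. The weighted sum mixes quantities on different scales, so a practical version would likely normalize each signal first; and since the abstract reports that task difficulty influences the selection strategy, the weights (or whether high- or low-scoring samples are kept) may need to be tuned per task.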


Pages: 239-252

Page count: 14
