FastTuning: Enabling Fast and Efficient Hyper-Parameter Tuning With Partitioning and Parallelism of Search Space

Cited: 0
Authors
Li, Xiaqing [1 ,2 ]
Guo, Qi [1 ,2 ]
Zhang, Guangyan [1 ,2 ]
Ye, Siwei [1 ,2 ]
He, Guanhua [1 ,2 ]
Yao, Yiheng [1 ,2 ]
Zhang, Rui [1 ,2 ]
Hao, Yifan [1 ,2 ]
Du, Zidong [1 ,2 ]
Zheng, Weimin [1 ,2 ]
Affiliations
[1] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100045, China
[2] Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Keywords
Deep learning; distributed hyper-parameter tuning (HPT) system; parallel computing
DOI
10.1109/TPDS.2024.3386939
CLC Number (Chinese Library Classification)
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
Hyper-parameter tuning (HPT) for deep learning (DL) models is prohibitively expensive. Sequential model-based optimization (SMBO) has emerged as the state-of-the-art (SOTA) approach to automatically optimizing HPT performance due to its heuristic advantages. Unfortunately, existing SMBO-based approaches focus on algorithmic optimization rather than large-scale parallel HPT systems, and thus cannot effectively remove their strongly sequential nature, which leads to two performance problems: (1) extremely low tuning speed and (2) sub-optimal model quality. In this paper, we propose FastTuning, a fast, scalable, and generic system that accelerates SMBO-based HPT for large DL/ML models in parallel. The key idea is to partition the highly complex search space into multiple smaller sub-spaces, each of which is assigned to and optimized by a different tuning worker in parallel. However, determining the right level of resource allocation to strike a balance between quality and cost remains a challenge. To address this, we further propose NIMBLE, a dynamic scheduling strategy designed specifically for FastTuning, comprising (1) a Dynamic Elimination Algorithm, (2) Sub-space Re-division, and (3) Posterior Information Sharing. Finally, we incorporate 6 SOTA methods (i.e., 3 tuning algorithms and 3 parallel tuning tools) into FastTuning. Experimental results on ResNet18, VGG19, ResNet50, and ResNet152 show that FastTuning consistently offers much faster tuning (up to 80x) with better accuracy (up to 4.7% improvement), thereby enabling the application of automatic HPT to real-life DL models.
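To make the approach sketched in the abstract concrete, the following minimal Python example illustrates the general idea of search-space partitioning with parallel tuning workers and periodic elimination of poorly performing sub-spaces. It is an illustrative assumption, not the authors' FastTuning/NIMBLE implementation: the function names (partition_space, tune_subspace, objective), the random-search inner loop standing in for a per-worker SMBO tuner, and the halving schedule standing in for the Dynamic Elimination Algorithm are all hypothetical.

```python
# Hypothetical sketch: partition a hyper-parameter search space into
# sub-spaces, tune each sub-space with an independent worker in parallel,
# and periodically drop the worst-performing sub-spaces.
import random
from concurrent.futures import ProcessPoolExecutor


def partition_space(lr_range, bs_choices, num_parts):
    """Split the learning-rate axis into disjoint intervals; each interval,
    paired with all batch-size choices, forms one sub-space."""
    lo, hi = lr_range
    step = (hi - lo) / num_parts
    return [((lo + i * step, lo + (i + 1) * step), bs_choices)
            for i in range(num_parts)]


def objective(lr, batch_size):
    """Placeholder for real model training; returns a mock validation score."""
    return -((lr - 0.01) ** 2) - 0.001 * abs(batch_size - 64)


def tune_subspace(subspace, budget):
    """One tuning worker: random search inside its sub-space.
    A real system would run an SMBO method (e.g., TPE or GP-based BO) here."""
    (lr_lo, lr_hi), bs_choices = subspace
    best_score, best_cfg = float("-inf"), None
    for _ in range(budget):
        lr = random.uniform(lr_lo, lr_hi)
        bs = random.choice(bs_choices)
        score = objective(lr, bs)
        if score > best_score:
            best_score, best_cfg = score, {"lr": lr, "batch_size": bs}
    return best_score, best_cfg


if __name__ == "__main__":
    subspaces = partition_space((1e-4, 1e-1), [32, 64, 128], num_parts=8)
    budget = 20
    while len(subspaces) > 1:
        with ProcessPoolExecutor() as pool:
            results = list(pool.map(tune_subspace, subspaces,
                                    [budget] * len(subspaces)))
        # Keep the better half of the sub-spaces; give survivors more budget.
        ranked = sorted(zip(results, subspaces),
                        key=lambda r: r[0][0], reverse=True)
        subspaces = [s for _, s in ranked[: max(1, len(ranked) // 2)]]
        budget *= 2
    print("best config:", ranked[0][0][1])
```

In this sketch, each worker explores only its own sub-space, so the sequential bottleneck of a single global SMBO loop is avoided; the elimination step is a simplified stand-in for the resource-reallocation decisions that the paper's NIMBLE strategy makes dynamically.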
Pages: 1174-1188
Page count: 15