Runtime Data Layout Scheduling for Machine Learning Dataset

Cited by: 5
Authors
You, Yang [1 ]
Demmel, James [1 ]
Affiliations
[1] Univ Calif Berkeley, Div Comp Sci, Berkeley, CA 94720 USA
Keywords
parallel auto-tuning; machine learning;
DOI
10.1109/ICPP.2017.54
CLC Classification Number
TP3 [Computing Technology, Computer Technology];
Discipline Classification Code
0812 ;
Abstract
Machine Learning (ML) approaches are widely used classification/regression methods for data mining applications. However, the time-consuming training process greatly limits their efficiency. In this paper we use SVM (a traditional ML algorithm) and DNN (a state-of-the-art ML algorithm) as examples to illustrate the idea. For SVM, a major performance bottleneck of current tools is that they use a single, fixed data storage format, even though the data format can have a significant influence on storage and computation complexity, memory bandwidth, and the efficiency of parallel processing. To address this problem, we study the factors influencing the algorithm's performance and apply auto-tuning to speed up SVM training. DNN training is even slower than SVM training. For example, training the AlexNet model on the CIFAR-10 dataset with an 8-core CPU takes 8.2 hours. CIFAR-10 is only 170 MB, which is too small to be processed efficiently in a distributed setting. Moreover, due to algorithmic limitations, only a small batch of data can be processed at each iteration. We focus on finding the right algorithmic parameters and using auto-tuning techniques to make the algorithm run faster. For SVM training, our implementation achieves a 1.7-16.3x speedup (6.8x on average) over the non-adaptive case (using the worst data format) across various datasets. For DNN training on the CIFAR-10 dataset, we reduce the training time from 8.2 hours to roughly 1 minute. We use dollars per speedup as a benchmark to help users select the right deep learning hardware.
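The abstract describes selecting the data storage format at runtime because the format strongly affects SVM training cost. As a rough illustration only (not the authors' implementation), the Python sketch below benchmarks a representative Gram-matrix workload in dense and CSR sparse layouts and keeps whichever is faster; the helper pick_layout and the sampling size are hypothetical choices made for this sketch.

    import time
    import numpy as np
    import scipy.sparse as sp

    def pick_layout(X_dense, sample_rows=256):
        """Return ('dense' or 'csr', data) for whichever layout runs a probe workload faster."""
        sample = np.asarray(X_dense)[:sample_rows]
        candidates = {
            "dense": np.asarray(X_dense),
            "csr": sp.csr_matrix(X_dense),
        }
        timings = {}
        for name, X in candidates.items():
            # Probe workload: a Gram-matrix block (X_sample @ X^T), which
            # dominates the cost of kernel-SVM training.
            probe = sample if name == "dense" else sp.csr_matrix(sample)
            start = time.perf_counter()
            _ = probe @ X.T
            timings[name] = time.perf_counter() - start
        best = min(timings, key=timings.get)
        return best, candidates[best]

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.random((2000, 500))
        X[X < 0.9] = 0.0  # make the feature matrix roughly 90% sparse
        layout, _data = pick_layout(X)
        print("chosen layout:", layout)

In practice such a probe would be rerun per dataset (and per machine), since the crossover point between dense and sparse layouts depends on the sparsity pattern, memory bandwidth, and the degree of parallelism available.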
Pages: 452-461
Page count: 10