Runtime Data Layout Scheduling for Machine Learning Dataset

Cited by: 5
Authors
You, Yang [1]
Demmel, James [1]
Affiliations
[1] Univ Calif Berkeley, Div Comp Sci, Berkeley, CA 94720 USA
Keywords
parallel auto-tuning; machine learning
DOI
10.1109/ICPP.2017.54
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Machine learning (ML) approaches are widely used classification and regression methods for data mining applications. However, the time-consuming training process greatly limits their efficiency. We use SVM (a traditional ML algorithm) and DNN (a state-of-the-art ML algorithm) as examples to illustrate the idea in this paper. For SVM, a major performance bottleneck of current tools is that they use a single, unified data storage format, even though the data format can significantly influence the complexity of storage and computation, the memory bandwidth, and the efficiency of parallel processing. To address this problem, we study the factors that influence the algorithm's performance and conduct auto-tuning to speed up SVM training. DNN training is even slower than SVM training. For example, training the AlexNet model on the CIFAR-10 dataset with an 8-core CPU takes 8.2 hours. CIFAR-10 is only 170 MB, a size too small for distributed processing to be efficient. Moreover, because of a limitation of the algorithm, only a small batch of data can be processed at each iteration. We focus on finding the right algorithmic parameters and on using auto-tuning techniques to make the algorithm run faster. For SVM training, our implementation achieves a 1.7-16.3x speedup (6.8x on average) over the non-adaptive case (using the worst data format) on various datasets. For DNN training on the CIFAR-10 dataset, we reduce the time from 8.2 hours to roughly 1 minute. We use a dollars-per-speedup benchmark to help users select the right deep learning hardware.
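The idea of choosing a data format at run time can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the candidate formats or the tuning criteria, so the two layouts here (dense lists and per-row dictionaries), the timing probe, and every function name are assumptions made purely for illustration. The sketch times one row of a linear-kernel computation on a small sample in each layout and keeps the faster one.

```python
import random
import time


def dense_dot(row_a, row_b):
    """Dot product of two dense rows stored as equal-length lists."""
    return sum(x * y for x, y in zip(row_a, row_b))


def sparse_dot(row_a, row_b):
    """Dot product of two sparse rows stored as {column: value} dicts."""
    if len(row_a) > len(row_b):
        row_a, row_b = row_b, row_a  # iterate over the shorter row
    return sum(v * row_b[j] for j, v in row_a.items() if j in row_b)


def to_sparse(dense_rows):
    """Convert dense rows to the hypothetical dict-based sparse layout."""
    return [{j: v for j, v in enumerate(row) if v != 0.0} for row in dense_rows]


def time_kernel_row(rows, dot, trials=3):
    """Best-of-`trials` time to compute one linear-kernel row (row 0 vs. all rows)."""
    best = float("inf")
    for _ in range(trials):
        start = time.perf_counter()
        for r in rows:
            dot(rows[0], r)
        best = min(best, time.perf_counter() - start)
    return best


def choose_layout(dense_rows, sample_size=64):
    """Probe each candidate layout on a sample and return (fastest name, timings)."""
    sample = dense_rows[:sample_size]
    candidates = {
        "dense": (sample, dense_dot),
        "sparse": (to_sparse(sample), sparse_dot),
    }
    timings = {
        name: time_kernel_row(rows, dot) for name, (rows, dot) in candidates.items()
    }
    return min(timings, key=timings.get), timings


def make_rows(n_rows, n_cols, density, seed=0):
    """Synthetic data: each entry is nonzero with probability `density`."""
    rng = random.Random(seed)
    return [
        [rng.random() if rng.random() < density else 0.0 for _ in range(n_cols)]
        for _ in range(n_rows)
    ]


if __name__ == "__main__":
    for density in (0.01, 0.9):
        layout, timings = choose_layout(make_rows(64, 2000, density))
        print(f"density={density}: chose {layout}, timings={timings}")
```

Because the choice rests on measured time rather than a fixed rule, the selected layout can differ between datasets and machines, which is the point of tuning at run time instead of fixing one format in advance.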
Pages: 452-461
Number of pages: 10