Optimizing Machine Learning on Apache Spark in HPC Environments

Cited by: 0
Authors
Li, Zhenyu [1 ]
Davis, James [1 ]
Jarvis, Stephen A. [1 ]
Affiliation
[1] Univ Warwick, Dept Comp Sci, Coventry, W Midlands, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Machine Learning; High Performance Computing; Apache Spark; All-Reduce; Asynchronous Stochastic Gradient Descent; MAPREDUCE;
DOI
10.1109/MLHPC.2018.00006
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Machine learning has established itself as a powerful tool for the construction of decision-making models and algorithms through the use of statistical techniques on training data. However, a significant impediment to its progress is the time spent training and improving the accuracy of these models; this is a data- and compute-intensive process, which can often take days, weeks or even months to complete. A common approach to accelerating this process is to employ multiple machines simultaneously, a trait shared with the field of High Performance Computing (HPC) and its clusters. However, existing distributed frameworks for data analytics and machine learning are designed for commodity servers, which do not realize the full potential of an HPC cluster, and thus deny the effective use of a readily available and potentially useful resource. In this work we adapt Apache Spark, a distributed data-flow framework, to support machine learning in HPC environments. There are inherent challenges to using Spark in this context: memory management, communication costs and synchronization overheads all limit its efficiency. To this end we introduce: (i) the application of MapRDD, a fine-grained distributed data representation; (ii) a task-based all-reduce implementation; and (iii) a new asynchronous Stochastic Gradient Descent (SGD) algorithm using non-blocking all-reduce. We demonstrate up to a 2.6x overall speedup (or an 11.2x theoretical speedup with an Nvidia K80 graphics card), an 82-91% compute ratio, and an 80% reduction in memory usage when training the GoogLeNet model to classify 10% of the ImageNet dataset on a 32-node cluster. We also demonstrate that the new asynchronous SGD achieves a convergence rate comparable to the synchronous method. With the increasing use of accelerator cards, larger clusters and deeper neural network models, we predict that a further 2x speedup (i.e. a 22.4x accumulated speedup) is obtainable with the new asynchronous SGD algorithm on heterogeneous clusters.
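To illustrate the asynchronous SGD idea described in the abstract, the following sketch overlaps the all-reduce of one step's gradients with the gradient computation of the next step, using MPI non-blocking collectives via mpi4py. This is a minimal sketch, not the authors' Spark-based implementation (which builds on their MapRDD representation and task-based all-reduce); the parameter count, learning rate and the local_gradient function are hypothetical placeholders standing in for a real mini-batch forward/backward pass.

```python
# Illustrative sketch only: asynchronous SGD in which the all-reduce of the
# previous step's gradients overlaps with the computation of the current
# step's gradients, using MPI non-blocking collectives (mpi4py).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

dim = 1000          # hypothetical number of model parameters
lr = 0.01           # hypothetical learning rate
steps = 100         # number of SGD steps

rng = np.random.default_rng(rank)
weights = np.zeros(dim)         # all workers start from identical parameters


def local_gradient(w):
    # Placeholder for the gradient of the loss over a local mini-batch;
    # a toy quadratic loss stands in for a real forward/backward pass.
    target = rng.standard_normal(dim)
    return 2.0 * (w - target)


pending = None                  # outstanding non-blocking all-reduce, if any
send_buf = None                 # gradient buffer owned by that request
summed = np.empty(dim)          # receives the sum of all workers' gradients

for step in range(steps):
    grad = local_gradient(weights)

    if pending is not None:
        # Complete the reduction started in the previous step and apply it.
        # The update uses gradients that are one step stale, which is what
        # makes the method asynchronous rather than bulk-synchronous.
        pending.Wait()
        weights -= lr * (summed / size)

    # Launch the reduction of this step's gradients without blocking, so the
    # next iteration's gradient computation overlaps with communication.
    send_buf = grad
    pending = comm.Iallreduce(send_buf, summed, op=MPI.SUM)

# Drain the final outstanding reduction.
pending.Wait()
weights -= lr * (summed / size)

if rank == 0:
    print("completed", steps, "asynchronous SGD steps on", size, "workers")
```

In the paper's Spark setting the communication primitive is the authors' task-based all-reduce between executors rather than MPI; the sketch only mirrors the overlap of communication with computation that the non-blocking variant enables.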
Pages: 95-105
Number of pages: 11