Software Abstractions for Large-Scale Deep Learning Models in Big Data Analytics

Times cited: 0
Authors
Khan, Ayaz H. [1]
Qamar, Ali Mustafa [2]
Yusuf, Aneeq [1]
Khan, Rehanullah [2]
Affiliations
[1] Karachi Inst Econ & Technol, Coll Comp & Informat Sci, Karachi, Pakistan
[2] Qassim Univ, Coll Comp, Mulaidah, Saudi Arabia
Keywords
Big data; deep learning; deep auto-encoders; Restricted Boltzmann Machines (RBM)
DOI
Not available
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
The goal of big data analytics is to analyze datasets of high volume, velocity, and variety for large-scale business intelligence problems. These workloads are normally distributed across massively parallel analytical systems for processing. Deep learning is part of a broader family of machine learning methods based on learning representations of data. It plays a significant role in information analysis by adding value to massive amounts of unlabeled data. A core area of research is the development of deep learning algorithms that automatically extract complex data representations at higher levels of abstraction from massive volumes of data. In this paper, we present the latest research trends in the development of parallel algorithms, optimization techniques, tools, and libraries related to big data analytics and deep learning on various parallel architectures. The basic building blocks of deep learning, such as Restricted Boltzmann Machines (RBMs) and Deep Belief Networks (DBNs), are identified and analyzed for the parallelization of deep learning models. We propose a parallel software API based on PyTorch, the Hadoop Distributed File System (HDFS), Apache Hadoop MapReduce, and MapReduce Job (MRJob) for developing large-scale deep learning models. We obtained a reduction of about 5-30% in the execution time of the deep auto-encoder model, even on a single-node Hadoop cluster. Furthermore, the API significantly reduces the complexity of the code needed to build multi-layer deep learning models.
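The abstract identifies RBMs as a basic building block to be parallelized over MapReduce. As a minimal sketch, assuming a binary RBM trained with one-step contrastive divergence (CD-1) and a data-parallel split in which each "map" computes summed gradient statistics on one data shard and a "reduce" adds them before a single averaged update, the idea can be written in plain Python. This is not the authors' PyTorch/MRJob API: `RBM`, `map_shard`, `add_stats`, and `train_step` are hypothetical names, and no Hadoop or MRJob calls are used.

```python
import math
import random
from functools import reduce

# Hypothetical illustration, not the paper's API: CD-1 statistics are computed
# per shard ("map"), summed ("reduce"), then applied as one averaged update.


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """Binary-binary RBM: W[i][j] links visible unit i to hidden unit j."""

    def __init__(self, n_visible, n_hidden, seed=0):
        init_rng = random.Random(seed)
        self.nv, self.nh = n_visible, n_hidden
        self.W = [[init_rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        # p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W_ij)
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(self.nv)))
                for j in range(self.nh)]

    def visible_probs(self, h):
        # p(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij * h_j)
        return [sigmoid(self.b[i] + sum(self.W[i][j] * h[j] for j in range(self.nh)))
                for i in range(self.nv)]


def map_shard(rbm, shard, rng):
    """'Map' step: summed CD-1 statistics (dW, db, dc, count) for one shard."""
    dW = [[0.0] * rbm.nh for _ in range(rbm.nv)]
    db = [0.0] * rbm.nv
    dc = [0.0] * rbm.nh
    for v0 in shard:
        ph0 = rbm.hidden_probs(v0)
        h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]  # sample hidden states
        v1 = rbm.visible_probs(h0)   # one-step reconstruction (probabilities)
        ph1 = rbm.hidden_probs(v1)
        for i in range(rbm.nv):
            for j in range(rbm.nh):
                # positive-phase minus negative-phase correlation
                dW[i][j] += v0[i] * ph0[j] - v1[i] * ph1[j]
            db[i] += v0[i] - v1[i]
        for j in range(rbm.nh):
            dc[j] += ph0[j] - ph1[j]
    return dW, db, dc, len(shard)


def add_stats(a, b):
    """'Reduce' step: element-wise sum of two shards' statistics."""
    (Wa, ba, ca, na), (Wb, bb, cb, nb) = a, b
    W = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(Wa, Wb)]
    return (W,
            [x + y for x, y in zip(ba, bb)],
            [x + y for x, y in zip(ca, cb)],
            na + nb)


def train_step(rbm, shards, lr, rng):
    """One data-parallel step: map over shards, reduce, apply the averaged update."""
    dW, db, dc, n = reduce(add_stats, (map_shard(rbm, s, rng) for s in shards))
    for i in range(rbm.nv):
        for j in range(rbm.nh):
            rbm.W[i][j] += lr * dW[i][j] / n
        rbm.b[i] += lr * db[i] / n
    for j in range(rbm.nh):
        rbm.c[j] += lr * dc[j] / n


if __name__ == "__main__":
    rng = random.Random(42)
    rbm = RBM(n_visible=4, n_hidden=2)
    # Two toy shards, standing in for data blocks read by separate mappers.
    shards = [[[1, 1, 0, 0], [0, 0, 1, 1]], [[1, 1, 0, 0], [0, 0, 1, 1]]]
    for _ in range(100):
        train_step(rbm, shards, lr=0.1, rng=rng)
    print(rbm.hidden_probs([1, 1, 0, 0]))
```

Because the gradient statistics are plain sums, the reduce step can combine shards in any grouping, and the parameters are only updated after all shards are summed, so every mapper sees the same model. In a real Hadoop job each mapper would use its own random generator and read its shard from HDFS; here the shards share one generator so the result can be compared directly with a single-machine pass.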
Pages: 557-566
Page count: 10