Software Abstractions for Large-Scale Deep Learning Models in Big Data Analytics

Citations: 0
Authors
Khan, Ayaz H. [1 ]
Qamar, Ali Mustafa [2 ]
Yusuf, Aneeq [1 ]
Khan, Rehanullah [2 ]
Affiliations
[1] Karachi Inst Econ & Technol, Coll Comp & Informat Sci, Karachi, Pakistan
[2] Qassim Univ, Coll Comp, Mulaidah, Saudi Arabia
Keywords
Big data; deep learning; deep auto-encoders; Restricted Boltzmann Machines (RBM)
DOI
Not available
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
The goal of big data analytics is to analyze datasets of high volume, velocity, and variety for large-scale business intelligence problems. These workloads are normally processed by distributing them across massively parallel analytical systems. Deep learning is part of a broader family of machine learning methods based on learning representations of data. It plays a significant role in information analysis by adding value to massive amounts of unsupervised data. A core research area is the development of deep learning algorithms that automatically extract complex data formats at a higher level of abstraction from massive volumes of data. In this paper, we present the latest research trends in the development of parallel algorithms, optimization techniques, tools, and libraries related to big data analytics and deep learning on various parallel architectures. The basic building blocks of deep learning, such as Restricted Boltzmann Machines (RBM) and Deep Belief Networks (DBN), are identified and analyzed for the parallelization of deep learning models. We propose a parallel software API based on PyTorch, the Hadoop Distributed File System (HDFS), Apache Hadoop MapReduce, and MapReduce Job (MRJob) for developing large-scale deep learning models. We obtained a reduction of about 5-30% in the execution time of the deep auto-encoder model, even on a single-node Hadoop cluster. Furthermore, the code complexity of creating multi-layer deep learning models is significantly reduced.
Pages: 557-566
Page count: 10
Related papers
50 in total
  • [21] Big Data Analytics - an Influence of Deep Learning
    Chandralekha, C.
    Divya, S.
    Aiswarya, N.
    BIOSCIENCE BIOTECHNOLOGY RESEARCH COMMUNICATIONS, 2020, 13 (06): : 220 - 223
  • [22] Editorial: Deep Learning for Big Data Analytics
    Wu, Yulei
    Hao, Fei
    Bakshi, Sambit
    Huang, Haojun
    MOBILE NETWORKS & APPLICATIONS, 2021, 26 (06): : 2315 - 2317
  • [23] Significance of deep learning on big data analytics
    Mao, Jilei
    Mao, Zijun
    CIVIL, ARCHITECTURE AND ENVIRONMENTAL ENGINEERING, VOLS 1 AND 2, 2017, : 1597 - 1600
  • [24] Explore Deep Neural Network and Reinforcement Learning to Large-scale Tasks Processing in Big Data
    Wu, Chunyi
    Xu, Gaochao
    Ding, Yan
    Zhao, Jia
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2019, 33 (13)
  • [25] Deep learning for the large-scale cancer data analysis
    Tsuji, Shingo
    Aburatani, Hiroyuki
    CANCER RESEARCH, 2015, 75 (22)
  • [26] Effective ensemble learning approach for large-scale medical data analytics
    Namamula, Lakshmana Rao
    Chaytor, Daniel
    INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2024, 15 (01) : 13 - 20
  • [27] Effective ensemble learning approach for large-scale medical data analytics
    Lakshmana Rao Namamula
    Daniel Chaytor
    International Journal of System Assurance Engineering and Management, 2024, 15 : 13 - 20
  • [28] Application of Big Data Analytics and Machine Learning to Large-Scale Synchrophasor Datasets: Evaluation of Dataset 'Machine Learning-Readiness'
    Hart, Philip
    He, Lijun
    Wang, Tianyi
    Kumar, Vijay S.
    Aggour, Kareem
    Subramanian, Arun
    Yan, Weizhong
    IEEE OPEN ACCESS JOURNAL OF POWER AND ENERGY, 2022, 9 : 386 - 397
  • [29] Edge Enhanced Deep Learning System for Large-scale Video Stream Analytics
    Ali, M.
    Anjum, A.
    Yaseen, M. U.
    Zamani, A. R.
    Balouek-Thomert, D.
    Rana, O.
    Parashar, M.
    2018 IEEE 2ND INTERNATIONAL CONFERENCE ON FOG AND EDGE COMPUTING (ICFEC), 2018,
  • [30] A machine learning software for large-scale molecular and clinical data
    Pan, L.
    Mikolajczyk, K.
    Dimitrakopoulou-Strauss, A.
    Burger, C.
    Strauss, L.
    EUROPEAN JOURNAL OF NUCLEAR MEDICINE AND MOLECULAR IMAGING, 2007, 34 : S343 - S343