Large-scale Neural Modeling in MapReduce and Giraph

Cited by: 0
Authors
Yang, Shuo [1]
Spielman, Nicholas D. [2]
Jackson, Jadin C. [3]
Rubin, Brad S. [1]
Affiliations
[1] St Thomas Univ, Grad Programs Software, St Paul, MN 55455 USA
[2] Neurosci Program Univ St Thomas, Minneapolis, MN USA
[3] Univ St Thomas, Dept Biol, Minneapolis, MN USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
One of the most crucial challenges in scientific computing is scalability. Hadoop, an open-source implementation of the MapReduce parallel programming model developed by Google, has emerged as a powerful platform for performing large-scale scientific computing at very low cost. In this paper, we explore the use of Hadoop to model large-scale neural networks. A neural network is most naturally modeled by a graph structure with iterative processing. We first present an improved graph algorithm design pattern in MapReduce called Mapper-side Schimmy. Experiments show that applying our design pattern, combined with current best practices, can reduce the running time of a simulation of a neural network with 100,000 neurons and 2.3 billion edges by 64%. MapReduce, however, is inherently inefficient for iterative graph processing. To address this limitation, we then explore the use of Giraph, an open-source large-scale graph processing framework that sits on top of Hadoop, to implement graph algorithms with a vertex-centric approach. We show that our Giraph implementation improved performance by 91% compared to a basic MapReduce implementation and by 60% compared to our improved Mapper-side Schimmy algorithm.
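The vertex-centric, iterative style of graph processing that the abstract mentions can be illustrated with a small sketch. This is not the authors' implementation and does not use the Giraph API. It is a toy, single-process Python model under stated assumptions: a simple integrate-and-fire rule, and hypothetical names (`run_supersteps`, `edges`, `potentials`, `threshold`) chosen only for illustration. Each pass of the outer loop plays the role of one superstep, in which every vertex reads the messages sent to it in the previous step, updates its state, and sends messages along its outgoing edges.

```python
from collections import defaultdict


def run_supersteps(edges, potentials, threshold=1.0, steps=3):
    """Toy vertex-centric simulation (illustrative assumption, not Giraph).

    edges:      dict mapping neuron -> list of (target, weight) pairs
    potentials: dict mapping neuron -> initial potential
    Returns a list with the set of neurons that fired in each superstep.
    """
    potentials = dict(potentials)   # copy so the caller's dict is untouched
    inbox = defaultdict(float)      # messages delivered at the current step
    history = []

    for _ in range(steps):
        outbox = defaultdict(float)  # messages for the next superstep
        fired = set()
        for v in potentials:
            # Per-vertex compute: integrate the incoming weighted input.
            potentials[v] += inbox.get(v, 0.0)
            if potentials[v] >= threshold:
                fired.add(v)
                potentials[v] = 0.0  # reset after firing
                # Send a message along every outgoing edge.
                for target, weight in edges.get(v, []):
                    outbox[target] += weight
        history.append(fired)
        inbox = outbox               # barrier: messages arrive next step
    return history


if __name__ == "__main__":
    toy_edges = {"a": [("b", 1.0)], "b": [("c", 0.5)], "c": []}
    toy_potentials = {"a": 1.0, "b": 0.0, "c": 0.6}
    print(run_supersteps(toy_edges, toy_potentials))
```

In this sketch only messages cross from one superstep to the next, while the graph structure stays in place. A plain iterative MapReduce formulation would typically have to pass the graph structure through each job as well, which is the kind of overhead the abstract's design pattern and the move to Giraph are aimed at.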
Pages: 556-561
Page count: 6
Related papers
50 in total
  • [1] Giraph-Based Distributed Algorithms for Coloring Large-Scale Graphs
    Brighen, Assia
    Chouikh, Asma
    Ikhlef, Hamida
    Slimani, Hachem
    Rezgui, Abdelmounaam
    Kheddouci, Hamamache
    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2025, 53 (01)
  • [2] Large-scale incremental processing with MapReduce
    Lee, Daewoo
    Kim, Jin-Soo
    Maeng, Seungryoul
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2014, 36 : 66 - 79
  • [3] MapReduce for Large-scale Monitor Data Analyses
    Ding, Jianwei
    Liu, Yingbo
    Zhang, Li
    Wang, Jianmin
    2014 IEEE 13TH INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS (TRUSTCOM), 2014, : 747 - 754
  • [4] Large-Scale Deep Belief Nets With MapReduce
    Zhang, Kunlei
    Chen, Xue-Wen
    IEEE ACCESS, 2014, 2 : 395 - 403
  • [5] MapReduce in MPI for Large-scale graph algorithms
    Plimpton, Steven J.
    Devine, Karen D.
    PARALLEL COMPUTING, 2011, 37 (09) : 610 - 632
  • [6] Large-Scale Frequent Subgraph Mining in MapReduce
    Lin, Wenqing
    Xiao, Xiaokui
    Ghinita, Gabriel
    2014 IEEE 30TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2014, : 844 - 855
  • [7] Large-scale data modeling in Hive and distributed query processing using Mapreduce and Tez
    Adamov, Abzetdin
    DIVAI 2018: 12TH INTERNATIONAL SCIENTIFIC CONFERENCE ON DISTANCE LEARNING IN APPLIED INFORMATICS, 2018, : 389 - 404
  • [8] Mining large-scale repetitive sequences in a MapReduce setting
    Cao, Hongfei
    Phinney, Michael
    Petersohn, Devin
    Merideth, Benjamin
    Shyu, Chi-Ren
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2016, 14 (03) : 210 - 228
  • [9] Review of large-scale RDF data processing in mapreduce
    Hou, Ke
    Zhang, Ming
    Fang, Xing
    Journal of Software Engineering, 2015, 9 (01): : 195 - 202
  • [10] Efficient Large-scale Trace Checking Using MapReduce
    Bersani, Marcello M.
    Bianculli, Domenico
    Ghezzi, Carlo
    Krstic, Srdan
    San Pietro, Pierluigi
    2016 IEEE/ACM 38TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE), 2016, : 888 - 898