On the role of message broker middleware for many-task computing on a big-data platform

Cited by: 4
Authors
Cao Ngoc Nguyen
Jaehwan Lee
Soonwook Hwang
Jik-Soo Kim
Affiliations
[1] University of Science & Technology, Korea Institute of Science and Technology Information
[2] Korea Aerospace University, School of Electronics and Information Engineering
[3] Myongji University, Department of Computer Engineering
Source
Cluster Computing | 2019 / Vol. 22
Keywords
Many-task computing; Message broker middleware; Hadoop; YARN; ActiveMQ; Kafka; MOHA; Load balancing
DOI
Not available
Abstract
We have designed and implemented a new data processing framework called “Many-task computing On HAdoop” (MOHA), which aims to effectively support fine-grained many-task applications, another type of data-intensive workload, on the YARN-based Hadoop 2.0 platform. MOHA is developed as a Hadoop YARN application so that it can transparently co-host existing many-task computing (MTC) applications with other data processing workflows such as MapReduce in a single Hadoop cluster. In this paper, we investigate the main characteristics of two well-known open-source message broker middleware systems (Apache ActiveMQ and Kafka) and their implications for the many-task management scheme in our MOHA framework. Through extensive experiments with a real MTC application, we demonstrate and discuss the trade-offs between parallelism and load balancing of data access patterns in message broker middleware systems for many-task computing on Hadoop.
Pages: 2527 - 2540
Page count: 13