On the role of message broker middleware for many-task computing on a big-data platform

Cited by: 4

Authors:

Cao Ngoc Nguyen

Jaehwan Lee

Soonwook Hwang

Jik-Soo Kim

Affiliations:

[1] University of Science & Technology, Korea Institute of Science and Technology Information

[2] Korea Aerospace University, School of Electronics and Information Engineering

[3] Myongji University, Department of Computer Engineering

Source:

Cluster Computing | 2019 / Vol. 22

Keywords:

Many-task computing; Message broker middleware; Hadoop; YARN; ActiveMQ; Kafka; MOHA; Load balancing

DOI:

Not available

Abstract:

We have designed and implemented a new data processing framework called “Many-task computing On HAdoop” (MOHA), which aims to effectively support fine-grained many-task applications, which represent another type of data-intensive workload, on the YARN-based Hadoop 2.0 platform. MOHA is developed as a Hadoop YARN application so that it can transparently co-host existing many-task computing (MTC) applications with other data processing workflows such as MapReduce in a single Hadoop cluster. In this paper, we investigate the main characteristics of two well-known open-source message broker middleware systems (Apache ActiveMQ and Kafka) and their implications for the many-task management scheme in our MOHA framework. Through extensive experiments with a real MTC application, we demonstrate and discuss the trade-offs between parallelism and load balancing of data access patterns in message broker middleware systems for many-task computing on Hadoop.

Pages: 2527-2540

Page count: 13

