Multivariate modeling and two-level scheduling of analytic queries

Cited by: 2
Authors
Liu, Zhuo [1]
Nath, Amit Kumar [2]
Ding, Xiaoning [3]
Fu, Huansong [2]
Khan, Md Muhib [2]
Yu, Weikuan [2]
Affiliations
[1] Auburn Univ, Dept Comp Sci & Software Engn, Auburn, AL 36849 USA
[2] Florida State Univ, Dept Comp Sci, Tallahassee, FL 32306 USA
[3] New Jersey Inst Technol, Dept Comp Sci, Newark, NJ 07102 USA
Funding
National Science Foundation (USA)
Keywords
MapReduce; Multivariate modeling; Query scheduling; MANAGEMENT;
DOI
10.1016/j.parco.2019.01.006
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Analytic queries are typically compiled into execution plans in the form of directed acyclic graphs (DAGs) of MapReduce jobs. Jobs in the DAGs are dispatched to the MapReduce processing engine as soon as their dependencies are satisfied. MapReduce adopts a job-level scheduling policy to strive for a balanced distribution of tasks and effective utilization of resources. However, such a simplistic policy is unable to reconcile the dynamics of different jobs in complex analytic queries, resulting in unfair treatment of different queries, low utilization of system resources, prolonged execution time, and low query throughput. Therefore, we introduce a scheduling framework to address these problems systematically. Our framework includes two techniques: multivariate DAG modeling and two-level query scheduling. Cross-layer semantics percolation allows query semantics and job dependencies in the DAG to flow to the MapReduce scheduler. With richer semantics information, we build a multivariate model that can accurately predict the execution time of individual MapReduce jobs and gauge the changing size of analytics datasets through selectivity approximation. Furthermore, we introduce two-level query scheduling that can maximize the intra-query job-level concurrency and, at the same time, speed up the query-level completion time based on the accurate prediction and queuing of queries. At the job level, we focus on detecting query semantics and predicting the query completion time through an online multivariate linear regression model, thereby increasing job-level parallelism and maximizing data sharing across jobs. At the task level, we focus on balanced data distribution, maximal slot utilization, and optimal data locality of task scheduling. Our experimental results on a set of complex query benchmarks demonstrate that our scheduling framework can significantly improve both fairness and throughput of Hive queries. It can improve query response time by up to 43.9% and 72.8% on average, compared to the Hadoop Fair Scheduler and the Hadoop Capacity Scheduler, respectively. In addition, our two-level scheduler can achieve query fairness that is, on average, 59.8% better than that of the Hadoop Fair Scheduler. © 2019 Elsevier B.V. All rights reserved.
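The abstract mentions an online multivariate linear regression model for predicting job execution time. As a rough illustration only, the sketch below shows one common way to fit such a model incrementally, using recursive least squares. This is not taken from the paper: the class name, the two input features (job input size and estimated intermediate data size), and the `delta` prior are all assumptions made for the example.

```python
class OnlineLinearRegression:
    """Recursive least squares for y ~ w . [1, x], updated one sample at a time.

    Hypothetical sketch: the feature choice and the prior strength `delta`
    are illustrative assumptions, not the model described in the paper.
    """

    def __init__(self, n_features, delta=1000.0):
        self.n = n_features + 1          # +1 for the bias term
        self.w = [0.0] * self.n          # current weight estimate
        # P tracks the inverse of the (regularized) feature Gram matrix.
        self.P = [[delta if i == j else 0.0 for j in range(self.n)]
                  for i in range(self.n)]

    def _augment(self, x):
        # Prepend a constant 1.0 so the first weight acts as the intercept.
        return [1.0] + [float(v) for v in x]

    def predict(self, x):
        xa = self._augment(x)
        return sum(wi * xi for wi, xi in zip(self.w, xa))

    def update(self, x, y):
        """Fold one observed (features, execution time) pair into the model."""
        xa = self._augment(x)
        n = self.n
        # Px = P @ xa
        Px = [sum(self.P[i][j] * xa[j] for j in range(n)) for i in range(n)]
        denom = 1.0 + sum(xa[i] * Px[i] for i in range(n))
        gain = [v / denom for v in Px]
        err = y - self.predict(x)
        self.w = [wi + g * err for wi, g in zip(self.w, gain)]
        # P <- P - gain * (xa^T P); P is symmetric, so xa^T P equals Px^T.
        self.P = [[self.P[i][j] - gain[i] * Px[j] for j in range(n)]
                  for i in range(n)]


if __name__ == "__main__":
    # Synthetic, made-up observations: (input size, intermediate size) -> seconds.
    model = OnlineLinearRegression(n_features=2)
    for i in range(50):
        input_size = float(i % 10 + 1)
        intermediate_size = float((2 * i) % 5)
        seconds = 2.0 + 3.0 * input_size + 0.5 * intermediate_size
        model.update([input_size, intermediate_size], seconds)
    print(model.predict([4.0, 2.0]))
```

Updating per sample avoids refitting from scratch each time a job finishes, which is the property an online model would need if predictions are to feed a scheduler while queries are running.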
Pages: 66-78
Number of pages: 13