Multivariate modeling and two-level scheduling of analytic queries

Cited by: 2
Authors
Liu, Zhuo [1]
Nath, Amit Kumar [2]
Ding, Xiaoning [3]
Fu, Huansong [2]
Khan, Md Muhib [2]
Yu, Weikuan [2]
Affiliations
[1] Auburn Univ, Dept Comp Sci & Software Engn, Auburn, AL 36849 USA
[2] Florida State Univ, Dept Comp Sci, Tallahassee, FL 32306 USA
[3] New Jersey Inst Technol, Dept Comp Sci, Newark, NJ 07102 USA
Funding
U.S. National Science Foundation
Keywords
MapReduce; Multivariate modeling; Query scheduling; Management
DOI
10.1016/j.parco.2019.01.006
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Analytic queries are typically compiled into execution plans in the form of directed acyclic graphs (DAGs) of MapReduce jobs. Jobs in a DAG are dispatched to the MapReduce processing engine as soon as their dependencies are satisfied. MapReduce adopts a job-level scheduling policy that strives for a balanced distribution of tasks and effective utilization of resources. However, such a simplistic policy cannot reconcile the dynamics of different jobs in complex analytic queries, which results in unfair treatment of different queries, low utilization of system resources, prolonged execution time, and low query throughput. To address these problems systematically, we introduce a scheduling framework that includes two techniques: multivariate DAG modeling and two-level query scheduling. Cross-layer semantics percolation allows query semantics and the job dependencies in the DAG to flow to the MapReduce scheduler. With this richer semantic information, we build a multivariate model that accurately predicts the execution time of individual MapReduce jobs and gauges the changing size of analytic datasets through selectivity approximation. Furthermore, two-level query scheduling maximizes intra-query job-level concurrency and, at the same time, reduces query-level completion time based on accurate prediction and queuing of queries. At the job level, we detect query semantics and predict query completion time with an online multivariate linear regression model, thereby increasing job-level parallelism and maximizing data sharing across jobs. At the task level, we focus on balanced data distribution, maximal slot utilization, and optimal data locality in task scheduling. Experimental results on a set of complex query benchmarks demonstrate that our scheduling framework significantly improves both the fairness and the throughput of Hive queries.
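The job execution-time prediction described above can be illustrated with a small online regression. The following Python sketch is not the authors' implementation. It assumes a hypothetical feature vector (a bias term, the job's input size, and the input size scaled by an estimated selectivity), estimates selectivity as a running average of observed output-to-input size ratios per operator type, and updates the model with recursive least squares (RLS), one standard way to fit a multivariate linear regression online; the abstract does not state which update rule the paper uses. The names `SelectivityEstimator`, `OnlineLinearRegression`, and `job_features` are hypothetical.

```python
from __future__ import annotations


class SelectivityEstimator:
    """Running average of output/input size ratios, keyed by a hypothetical operator type."""

    def __init__(self, default: float = 1.0) -> None:
        self.default = default  # used when an operator type has never been observed
        self.stats: dict[str, tuple[float, int]] = {}

    def observe(self, op: str, input_gb: float, output_gb: float) -> None:
        total, count = self.stats.get(op, (0.0, 0))
        self.stats[op] = (total + output_gb / input_gb, count + 1)

    def estimate(self, op: str) -> float:
        total, count = self.stats.get(op, (0.0, 0))
        return total / count if count else self.default


class OnlineLinearRegression:
    """Multivariate linear regression updated one job at a time via recursive least squares."""

    def __init__(self, n_features: int, forgetting: float = 1.0, init_var: float = 1e4) -> None:
        self.d = n_features
        self.lam = forgetting  # 1.0 weighs all past jobs equally
        self.theta = [0.0] * n_features
        # P tracks the inverse of the (lightly regularized) feature Gram matrix.
        self.P = [[init_var if i == j else 0.0 for j in range(n_features)]
                  for i in range(n_features)]

    def predict(self, x: list[float]) -> float:
        return sum(t * xi for t, xi in zip(self.theta, x))

    def update(self, x: list[float], y: float) -> None:
        px = [sum(self.P[i][j] * x[j] for j in range(self.d)) for i in range(self.d)]
        denom = self.lam + sum(x[i] * px[i] for i in range(self.d))
        gain = [v / denom for v in px]
        error = y - self.predict(x)  # prediction error before this update
        self.theta = [t + g * error for t, g in zip(self.theta, gain)]
        # P <- (P - gain * (P x)^T) / lam, which relies on P being symmetric.
        self.P = [[(self.P[i][j] - gain[i] * px[j]) / self.lam for j in range(self.d)]
                  for i in range(self.d)]


def job_features(input_gb: float, selectivity: float) -> list[float]:
    """Hypothetical features: bias, input size, and estimated output size."""
    return [1.0, input_gb, input_gb * selectivity]


if __name__ == "__main__":
    sel = SelectivityEstimator()
    sel.observe("filter", 10.0, 2.0)
    sel.observe("filter", 10.0, 4.0)
    model = OnlineLinearRegression(n_features=3)
    # Synthetic, made-up runtimes: 5 + 2 * input + 3 * output seconds.
    for i in range(1, 21):
        inp, s = 0.5 * i, 0.2 + 0.05 * (i % 5)
        model.update(job_features(inp, s), 5 + 2 * inp + 3 * inp * s)
    print(model.predict(job_features(4.0, sel.estimate("filter"))))
```

Each finished job refines the weights through `update` without refitting on the full history, which is what makes such a model usable online. With the synthetic ground truth above, the printed prediction lands close to 16.6; the feature set is purely illustrative.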
Compared with the Hadoop Fair Scheduler and the Hadoop Capacity Scheduler, the framework improves query response time by up to 43.9% and 72.8% on average, respectively. In addition, our two-level scheduler achieves query fairness that is, on average, 59.8% better than that of the Hadoop Fair Scheduler. © 2019 Elsevier B.V. All rights reserved.
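The two-level query scheduling summarized in the abstract can be sketched as a single dispatch round. This Python sketch is a hypothetical simplification, not the paper's scheduler. At the query level, it visits queries in order of smallest predicted remaining work (the sum of predicted times of unfinished jobs, a crude estimate that ignores overlap). At the job level, it dispatches every job whose DAG dependencies are satisfied before moving on, so jobs from the same query run concurrently. It assumes each job needs exactly one slot, whereas real MapReduce jobs occupy many task slots, and it omits the task-level concerns (data balance, locality). `Job`, `Query`, and `schedule_round` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    predicted_time: float  # assumed to come from a prediction model
    deps: set[str] = field(default_factory=set)
    done: bool = False


@dataclass
class Query:
    qid: str
    jobs: dict[str, Job]

    def ready_jobs(self, running: set[tuple[str, str]]) -> list[Job]:
        """Unfinished, not-yet-running jobs whose dependencies have all completed."""
        return [
            j for j in self.jobs.values()
            if not j.done
            and (self.qid, j.name) not in running
            and all(self.jobs[d].done for d in j.deps)
        ]

    def predicted_remaining(self) -> float:
        """Crude estimate: total predicted time of unfinished jobs (ignores overlap)."""
        return sum(j.predicted_time for j in self.jobs.values() if not j.done)


def schedule_round(queries: list[Query], free_slots: int,
                   running: set[tuple[str, str]]) -> list[tuple[str, str]]:
    """Hypothetical dispatcher; treats each job as needing one slot."""
    dispatched: list[tuple[str, str]] = []
    # Query level: shortest predicted remaining work first.
    for q in sorted(queries, key=lambda q: q.predicted_remaining()):
        # Job level: launch every ready job of this query to raise intra-query concurrency.
        for job in sorted(q.ready_jobs(running), key=lambda j: j.predicted_time):
            if free_slots == 0:
                return dispatched
            dispatched.append((q.qid, job.name))
            running.add((q.qid, job.name))
            free_slots -= 1
    return dispatched


if __name__ == "__main__":
    q1 = Query("q1", {"a": Job("a", 5.0), "b": Job("b", 3.0, {"a"})})
    q2 = Query("q2", {"x": Job("x", 2.0), "y": Job("y", 2.0),
                      "z": Job("z", 1.0, {"x", "y"})})
    print(schedule_round([q1, q2], free_slots=3, running=set()))
    # [('q2', 'x'), ('q2', 'y'), ('q1', 'a')]
```

In the usage example, `q2` has less predicted remaining work, so both of its independent jobs start first and the leftover slot goes to `q1`; job `z` waits until `x` and `y` finish.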
Pages: 66-78
Number of pages: 13