Learning-Based Sample Tuning for Approximate Query Processing in Interactive Data Exploration

Times cited: 0

Authors:

Zhang, Hanbing [1]
Jing, Yinan [1]
He, Zhenying [1]
Zhang, Kai [1]
Wang, X. Sean [1]

Affiliation:

[1] Fudan University, School of Computer Science, Shanghai 200437, People's Republic of China

Source:

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2024 / Vol. 36 / No. 11

Funding:

National Natural Science Foundation of China

Keywords:

Measurement; Adaptation models; Costs; Tuners; Accuracy; Q-learning; Query processing; Optimization; Synthetic data; Approximate query processing; interactive data exploration; data analysis

DOI:

10.1109/TKDE.2023.3341451

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

For interactive data exploration, approximate query processing (AQP) is a useful approach: it typically answers queries from samples, trading some accuracy for timely responses. Existing AQP systems often materialize samples in memory and reuse them to speed up query processing. How to tune the samples to the workload is one of the key problems in AQP. However, data exploration workloads are too complex to predict accurately, so existing sample tuning approaches do not adapt well to workload changes. To address this problem, this paper proposes RL-STuner, a sample tuner based on deep reinforcement learning. When tuning samples, RL-STuner considers workload changes from a global perspective and uses a Deep Q-Network (DQN) model to select the sample set with the maximum utility for the current workload. In addition, this paper proposes a set of optimization mechanisms that reduce the sample tuning cost. Experimental results on both real-world and synthetic datasets show that RL-STuner outperforms existing sample tuning approaches, improving query accuracy by 1.6x to 5.2x at a low tuning cost.
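
The abstract describes RL-STuner only at a high level. As a rough illustration of the general idea it names (Q-learning over candidate sample sets, with workload utility as the reward), here is a minimal Python sketch. It is not the authors' method: it swaps the DQN for a tabular Q-table, and the candidate samples, the memory budget, the toggle actions, and the coverage-based `utility` function are all hypothetical assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical candidate sample: (name, columns it covers, memory cost in arbitrary units).
Sample = tuple[str, frozenset[str], int]


def utility(state: frozenset[int], samples: list[Sample],
            workload: list[frozenset[str]]) -> float:
    """Hypothetical utility: fraction of workload queries whose columns are
    all covered by at least one materialized sample."""
    if not workload:
        return 0.0
    covered = sum(1 for query in workload
                  if any(query <= samples[i][1] for i in state))
    return covered / len(workload)


def tune(samples: list[Sample], workload: list[frozenset[str]], budget: int,
         episodes: int = 500, steps: int = 6, alpha: float = 0.5,
         gamma: float = 0.9, epsilon: float = 0.2, seed: int = 0):
    """Tabular Q-learning over sample sets (a simplified stand-in for a DQN).

    State: frozenset of materialized sample indices.
    Action i: toggle sample i (materialize or drop it) if the result fits the budget.
    Reward: change in workload utility caused by the action.
    """
    rng = random.Random(seed)
    q: defaultdict = defaultdict(float)  # (state, action) -> estimated return

    def cost(state: frozenset[int]) -> int:
        return sum(samples[i][2] for i in state)

    def legal_actions(state: frozenset[int]) -> list[int]:
        return [i for i in range(len(samples)) if cost(state ^ {i}) <= budget]

    def greedy(state: frozenset[int], acts: list[int]) -> int:
        return max(acts, key=lambda a: q[(state, a)])

    # Training: epsilon-greedy episodes, each starting from an empty sample set.
    for _ in range(episodes):
        state = frozenset()
        for _ in range(steps):
            acts = legal_actions(state)
            if not acts:
                break
            action = rng.choice(acts) if rng.random() < epsilon else greedy(state, acts)
            nxt = state ^ {action}
            reward = utility(nxt, samples, workload) - utility(state, samples, workload)
            future = max((q[(nxt, b)] for b in legal_actions(nxt)), default=0.0)
            # Q-learning update toward reward + discounted best next-state value.
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = nxt

    # Inference: greedy rollout with the learned Q-values, keeping the best set seen.
    state = frozenset()
    best, best_u = state, utility(state, samples, workload)
    for _ in range(steps):
        acts = legal_actions(state)
        if not acts:
            break
        state = state ^ {greedy(state, acts)}
        u = utility(state, samples, workload)
        if u > best_u:
            best, best_u = state, u
    return best, best_u
```

For example, with three hypothetical samples covering `{"region"}`, `{"region", "year"}`, and `{"product"}` and a budget of 9, `tune(samples, workload, budget=9)` returns a budget-feasible set of sample indices and its utility. Rewarding the change in utility means the undiscounted return of an episode telescopes to the utility of the final sample set; a DQN would replace the Q-table with a neural network when the space of sample sets is too large to enumerate.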

Pages: 6532-6546

Page count: 15
