Answering ad hoc aggregate queries from data streams using prefix aggregate trees

Cited by: 0
Authors
Moonjung Cho
Jian Pei
Ke Wang
Affiliations
[1] State University of New York at Buffalo, Department of Computer Science and Engineering
[2] Simon Fraser University, School of Computing Science
[3] 8888 University Drive
Keywords
Data warehousing; Data cube; Data stream; Online analytic processing (OLAP); Aggregate query
DOI
Not available
Abstract
In some business applications, such as trading management in financial institutions, ad hoc aggregate queries over data streams must be answered accurately. Materializing and incrementally maintaining a full data cube over a data stream is often computationally prohibitive, and the same holds even for a compressed or approximate cube. Previous studies proposed approximate methods for continuous aggregate queries, but these methods cannot provide accurate answers. In this paper, we develop a novel prefix aggregate tree (PAT) structure for online warehousing of data streams and for answering ad hoc aggregate queries. A data stream can often be partitioned into two parts. The historical segment is stored in a traditional data warehouse. The transient segment can be stored in a PAT, which is then used to answer ad hoc aggregate queries. The size of a PAT is linear in the size of the transient segment, and only one scan of the data stream is needed to create and incrementally maintain a PAT. Answering queries with a PAT costs more than answering them with a fully materialized data cube, but the query answering time still remains linear in the size of the transient segment. Our extensive experimental results on both synthetic and real data sets illustrate the efficiency and scalability of our design.
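The abstract does not describe the internals of a PAT. The following Python sketch is a rough illustration only, built on an assumed simple design. Each tuple's dimension values are inserted into a prefix tree in a fixed dimension order. Every node stores the SUM and COUNT of the measure for all tuples that share that prefix. The class names `PATNode` and `PrefixAggregateTree`, the use of `None` as a wildcard, and the sample trade tuples are all hypothetical. They are not the authors' structure or API.

The sketch demonstrates the two properties the abstract claims:
- Each inserted tuple creates or updates at most d + 1 nodes (d = number of dimensions). The tree therefore stays linear in the size of the transient segment and can be built in one scan.
- A query with any mix of bound and wildcard dimensions visits each node at most once. Answering time is therefore linear in the tree size.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical convention for this sketch: None in a query pattern means
# "any value" (the * of a data cube) for that dimension.
WILDCARD = None


class PATNode:
    """A node of this illustrative prefix aggregate tree (not the authors' exact PAT)."""

    __slots__ = ("children", "total", "count")

    def __init__(self) -> None:
        self.children: Dict[str, "PATNode"] = {}
        self.total = 0.0  # SUM of measures of all tuples whose prefix reaches this node
        self.count = 0    # COUNT of those tuples


class PrefixAggregateTree:
    """Hypothetical sketch: one path per distinct tuple prefix, in a fixed dimension order."""

    def __init__(self, num_dims: int) -> None:
        self.num_dims = num_dims
        self.root = PATNode()  # the root aggregates the whole transient segment

    def insert(self, dims: Sequence[str], measure: float) -> None:
        """Add one stream tuple in a single pass; updates at most num_dims + 1 nodes."""
        if len(dims) != self.num_dims:
            raise ValueError("tuple has the wrong number of dimensions")
        node = self.root
        node.total += measure
        node.count += 1
        for value in dims:
            node = node.children.setdefault(value, PATNode())
            node.total += measure
            node.count += 1

    def query(self, pattern: Sequence[Optional[str]]) -> Tuple[float, int]:
        """Return (SUM, COUNT) over tuples matching pattern (WILDCARD = any value)."""
        if len(pattern) != self.num_dims:
            raise ValueError("pattern has the wrong number of dimensions")
        # Past the last bound dimension every remaining dimension is a wildcard,
        # so a node's stored aggregate answers its whole subtree without descending.
        last_bound = max(
            (i for i, v in enumerate(pattern) if v is not WILDCARD), default=-1
        )
        total, count = 0.0, 0
        stack = [(self.root, 0)]  # (node, depth); depth = length of the node's prefix
        while stack:
            node, depth = stack.pop()
            if depth > last_bound:
                total += node.total
                count += node.count
                continue
            value = pattern[depth]
            if value is WILDCARD:
                # Wildcard dimension before a bound one: explore every child.
                stack.extend((child, depth + 1) for child in node.children.values())
            else:
                # Bound dimension: follow only the matching child, if it exists.
                child = node.children.get(value)
                if child is not None:
                    stack.append((child, depth + 1))
        return total, count


if __name__ == "__main__":
    # Made-up trades: (exchange, symbol, side) -> amount.
    pat = PrefixAggregateTree(num_dims=3)
    for dims, amount in [
        (("NYSE", "IBM", "buy"), 100.0),
        (("NYSE", "IBM", "sell"), 40.0),
        (("NASDAQ", "MSFT", "buy"), 70.0),
        (("NYSE", "GE", "buy"), 25.0),
    ]:
        pat.insert(dims, amount)

    print(pat.query(("NYSE", None, None)))  # (165.0, 3)
    print(pat.query((None, None, "buy")))   # (195.0, 3)
    print(pat.query((None, "IBM", None)))   # (140.0, 2)
```

SUM and COUNT are distributive. Under the abstract's historical/transient split, an answer over the whole stream could therefore be formed by adding two parts: the warehouse's aggregate for the historical segment and the tree's aggregate for the transient segment.

The sketch also shows the trade-off the abstract mentions. A query whose first dimension is a wildcard may touch every node. A fully materialized cube, by contrast, would look the answer up directly.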
Pages: 301-329
Number of pages: 28
Related papers
50 in total
  • [31] Aggregate Query Answering on Possibilistic Data with Cardinality Constraints
    Cormode, Graham
    Srivastava, Divesh
    Shen, Entong
    Yu, Ting
    2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2012, : 258 - 269
  • [32] Efficient aggregate computation over data streams
    Nagaraj, Kanthi
    Naidu, K. V. M.
    Rastogi, Rajeev
    Satkin, Scott
    2008 IEEE 24TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2008, : 1382 - +
  • [33] CubiST++: Evaluating ad-hoc CUBE queries using statistics trees
    Hammer, J
    Fu, LX
    DISTRIBUTED AND PARALLEL DATABASES, 2003, 14 (03) : 221 - 254
  • [34] Certificateless aggregate deniable authentication protocol for ad hoc networks
    Jin, Chunhua
    Zhao, Jianyang
    INTERNATIONAL JOURNAL OF ELECTRONIC SECURITY AND DIGITAL FORENSICS, 2018, 10 (02) : 168 - 187
  • [35] Data structures for range-aggregate extent queries
    Gupta, Prosenjit
    Janardan, Ravi
    Kumar, Yokesh
    Smid, Michiel
    COMPUTATIONAL GEOMETRY-THEORY AND APPLICATIONS, 2014, 47 (02) : 329 - 347
  • [36] Improvising Range Aggregate Queries in Big Data Environment
    Arbad, Ganesh R.
    Kulkarni, P. V.
    PROCEEDINGS OF THE 2018 SECOND INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL SYSTEMS (ICICCS), 2018, : 1896 - 1901
  • [37] RFID-Data Compression for Supporting Aggregate Queries
    Fazzinga, Bettina
    Flesca, Sergio
    Furfaro, Filippo
    Masciari, Elio
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 2013, 38 (02) : 1 - 45
  • [38] Queries with aggregate functions over fuzzy RDF data
    Ma, Zongmin
    Zhang, Xiaowen
    Zhao, Yuhan
    The Journal of Supercomputing, 2023, 79 : 14780 - 14807
  • [39] Performing Range Aggregate Queries in Stream Data Warehouse
    Gorawski, Marcin
    Malczok, Rafal
    MAN-MACHINE INTERACTIONS, 2009, 59 : 615 - 622
  • [40] Improving estimation accuracy of aggregate queries on data cubes
    Pourabbas, E.
    Shoshani, A.
    DATA & KNOWLEDGE ENGINEERING, 2010, 69 (01) : 50 - 72