Optimization of sub-query processing in distributed data integration systems

Cited by: 10
Authors
Chen, Gang [1]
Wu, Yongwei [1]
Liu, Jia [1]
Yang, Guangwen [1]
Zheng, Weimin [1]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Tsinghua Natl Lab Informat Sci & Technol, Beijing 100084, Peoples R China
Keywords
Cloud computing; Grid computing; Data integration; Query; Data flow;
DOI
10.1016/j.jnca.2010.06.007
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Data integration systems (DISs) are becoming essential as Cloud/Grid applications need to integrate and analyze data from geographically distributed data sources. A DIS gathers data from multiple remote sources, then integrates and analyzes the data to obtain a query result. Because Clouds/Grids are distributed over wide-area networks, communication cost usually dominates overall query response time, so query performance can be expected to improve when communication cost is minimized. In our method, the DIS uses a data-flow-style query execution model. Each query plan is mapped to a group of μEngines, each of which is a program corresponding to a particular operator. Thus, multiple sub-queries from concurrent queries are able to share μEngines. We reconstruct these sub-queries to exploit overlapping data among them. As a result, all the sub-queries still obtain their results while overall communication overhead is reduced. Experimental results show that our reconstruction algorithm reduces the average query completion time by 32-48% when the DIS runs a group of parameterized queries, and by 25-35% when it runs a group of non-parameterized queries. © 2010 Elsevier Ltd. All rights reserved.
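The idea of reconstructing sub-queries to exploit overlapping data can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes, purely for illustration, that each sub-query is a simple key-range request, and the names `SubQuery`, `merge_overlapping`, and `run` are hypothetical. Overlapping ranges are merged into one remote fetch, and each sub-query is then answered locally from the fetched rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubQuery:
    """Hypothetical range sub-query: rows with lo <= key <= hi."""
    name: str
    lo: int
    hi: int


def merge_overlapping(subqueries):
    """Group sub-queries whose key ranges overlap.

    Returns a list of (lo, hi, members) tuples, one per remote fetch.
    """
    merged = []
    for q in sorted(subqueries, key=lambda q: q.lo):
        if merged and q.lo <= merged[-1][1]:
            # Overlaps the previous merged range: extend it.
            lo, hi, members = merged[-1]
            merged[-1] = (lo, max(hi, q.hi), members + [q])
        else:
            merged.append((q.lo, q.hi, [q]))
    return merged


def run(subqueries, remote_rows):
    """Fetch each merged range once, then answer every sub-query locally."""
    results = {}
    rows_transferred = 0
    for lo, hi, members in merge_overlapping(subqueries):
        # This list comprehension stands in for a single remote call.
        fetched = [r for r in remote_rows if lo <= r <= hi]
        rows_transferred += len(fetched)
        for q in members:
            results[q.name] = [r for r in fetched if q.lo <= r <= q.hi]
    return results, rows_transferred


# Toy data: keys 0, 10, ..., 290 held at a (simulated) remote source.
rows = list(range(0, 300, 10))
queries = [SubQuery("q1", 0, 60), SubQuery("q2", 40, 100), SubQuery("q3", 200, 220)]

results, transferred = run(queries, rows)
# Rows transferred if every sub-query were sent separately.
naive = sum(len([r for r in rows if q.lo <= r <= q.hi]) for q in queries)
print(transferred, naive)
```

In this toy setting the merged plan transfers 14 rows instead of 17, because the rows shared by `q1` and `q2` cross the network only once. Real sub-queries involve richer predicates and operators, so this only conveys the intuition behind reducing communication overhead.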
Pages: 1035-1042
Number of pages: 8
Related papers
50 in total
  • [21] MERCI: Efficient Embedding Reduction on Commodity Hardware via Sub-query Memoization
    Lee, Yejin
    Seo, Seong Hoon
    Choi, Hyunji
    Sul, Hyoung Uk
    Kim, Soosung
    Lee, Jae W.
    Ham, Tae Jun
    ASPLOS XXVI: TWENTY-SIXTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, 2021, : 302 - 313
  • [22] Active Integration of Databases in Grids for Scalable Distributed Query Processing
    Woehrer, Alexander
    Brezany, Peter
    ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2008, PART I, 2008, 5331 : 762 - 774
  • [23] QUERY-PROCESSING AND DATA ALLOCATION IN DISTRIBUTED DATABASE-SYSTEMS - APERS,PMG
    JONES, S
    COMPUTER JOURNAL, 1984, 27 (01): : 93 - 93
  • [24] Efficient OLAP query processing in distributed data warehouses
    Akinde, M
    Böhlen, M
    Johnson, T
    Lakshmanan, LVS
    Srivastava, D
    18TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2002, : 262 - 262
  • [25] Efficient OLAP query processing in distributed data warehouses
    Akinde, MO
    Böhlen, MH
    Johnson, T
    Lakshmanan, LVS
    Srivastava, D
    ADVANCES IN DATABASE TECHNOLOGY - EDBT 2002, 2002, 2287 : 336 - 353
  • [26] QUERY-PROCESSING IN DISTRIBUTED DATABASES WITH NONDISJOINT DATA
    GOYAL, P
    NARAYANAN, TS
    SADRI, F
    INFORMATION SYSTEMS, 1993, 18 (07) : 419 - 427
  • [27] Efficient OLAP query processing in distributed data warehouses
    Akinde, MO
    Böhlen, MH
    Johnson, T
    Lakshmanan, LVS
    Srivastava, D
    INFORMATION SYSTEMS, 2003, 28 (1-2) : 111 - 135
  • [28] Distributed Join Query Processing for Big RDF Data
    Elzein, Nahla Mohammed
    Majid, Mazlina Abdul
    Fakherldin, Mohammed
    Hashem, Ibrahim Abaker Targio
    ADVANCED SCIENCE LETTERS, 2018, 24 (10) : 7758 - 7761
  • [29] Distributed data summaries for approximate query processing in PDMS
    Hose, Katja
    Klan, Daniel
    Sattler, Kai-Uwe
    10TH INTERNATIONAL DATABASE ENGINEERING AND APPLICATIONS SYMPOSIUM, PROCEEDINGS, 2006, : 37 - 44
  • [30] An Efficient Nested Query Processing for Distributed Database Systems
    Kang, Yu-Jin
    Choi, Chi-Hawn
    Yang, Kyung-En
    Kim, Hun-Gi
    Choi, Wan-Sup
    CONVERGENCE AND HYBRID INFORMATION TECHNOLOGY, 2011, 206 : 669 - +