An Effective Framework for Enhancing Query Answering in a Heterogeneous Data Lake

Cited by: 3
Authors
Yuan, Qin [1]
Yuan, Ye [1]
Wen, Zhenyu [2]
Wang, He [1]
Tang, Shiyuan [1]
Affiliations
[1] Beijing Institute of Technology, Beijing, People's Republic of China
[2] Zhejiang University of Technology, Hangzhou, People's Republic of China
Funding
National Key Research and Development Program of China;
Keywords
heterogeneous data lake; relational schema; query answering; similarity search;
DOI
10.1145/3539618.3591637
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Interest in cross-source searching as a way to gain rich knowledge has grown in recent years. A data lake collects massive raw, heterogeneous data with different data schemas and query interfaces. Many real-life applications, such as e-commerce, bioinformatics, and healthcare, require query answering over a heterogeneous data lake. In this paper, we propose LakeAns, which semantically integrates the heterogeneous data schemas of the lake to enhance the semantics of query answers. To this end, we propose a novel framework that performs cross-source searching efficiently and effectively. The framework uses a reinforcement learning method to semantically integrate the data schemas and create a global relational schema for the heterogeneous data. It then runs a query answering algorithm over the global schema to find answers across multiple data sources. Extensive experimental evaluations on real-life data verify that our approach outperforms existing solutions in both effectiveness and efficiency.
Pages: 770-780
Number of pages: 11