An Effective Framework for Enhancing Query Answering in a Heterogeneous Data Lake

Cited by: 3
Authors
Yuan, Qin [1]
Yuan, Ye [1]
Wen, Zhenyu [2]
Wang, He [1]
Tang, Shiyuan [1]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] Zhejiang Univ Technol, Hangzhou, Peoples R China
Funding
National Key R&D Program of China
Keywords
heterogeneous data lake; relational schema; query answering; similarity search
DOI
10.1145/3539618.3591637
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recent years have seen growing interest in cross-source search as a way to gain richer knowledge. A data lake collects massive amounts of raw, heterogeneous data with different data schemas and query interfaces. Many real-life applications, such as e-commerce, bioinformatics, and healthcare, require query answering over a heterogeneous data lake. In this paper, we propose LakeAns, which semantically integrates the heterogeneous data schemas of the lake to enhance the semantics of query answers. To this end, we propose a novel framework that performs cross-source search efficiently and effectively. The framework uses a reinforcement learning method to semantically integrate the data schemas and then creates a global relational schema for the heterogeneous data. It then runs a query answering algorithm over the global schema to find answers across multiple data sources. We conduct extensive experimental evaluations on real-life data to verify that our approach outperforms existing solutions in terms of effectiveness and efficiency.
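The idea of answering a query through a global schema can be illustrated with a minimal sketch. This is not the LakeAns implementation: the abstract says the integration is learned with reinforcement learning, whereas the sketch below substitutes a hand-written attribute mapping. The source names, attribute names, sample records, and the functions `to_global` and `answer` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical sources with different local schemas (illustrative data only).
CSV_PRODUCTS: List[dict] = [
    {"prod_name": "laptop", "cost": 900},
    {"prod_name": "phone", "cost": 500},
]
JSON_ITEMS: List[dict] = [
    {"title": "tablet", "price_usd": 300},
    {"title": "laptop", "price_usd": 950},
]
SOURCES: Dict[str, List[dict]] = {
    "csv_products": CSV_PRODUCTS,
    "json_items": JSON_ITEMS,
}

# Assumption: a hand-written mapping from local attributes to global ones.
# It stands in for the learned schema integration described in the abstract.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "csv_products": {"prod_name": "name", "cost": "price"},
    "json_items": {"title": "name", "price_usd": "price"},
}


def to_global(source: str, record: dict) -> dict:
    """Rename a record's local attributes to the global schema's attributes."""
    mapping = MAPPINGS[source]
    row = {mapping[key]: value for key, value in record.items() if key in mapping}
    row["source"] = source  # keep provenance so answers can be traced back
    return row


def answer(name: str) -> List[dict]:
    """Return every record, from any source, whose global 'name' matches."""
    results = []
    for source, records in SOURCES.items():
        for record in records:
            row = to_global(source, record)
            if row.get("name") == name:
                results.append(row)
    return results


if __name__ == "__main__":
    for row in answer("laptop"):
        print(row)
```

A single query on the global attribute `name` returns matches from both sources, even though neither source uses that attribute name locally.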
Pages: 770-780
Number of pages: 11