Answering Approximate String Queries on Large Data Sets Using External Memory

Cited by: 0
Authors
Behm, Alexander [1]
Li, Chen [1]
Carey, Michael J. [1]
Affiliations
[1] Univ Calif Irvine, Irvine, CA 92717 USA
Source
IEEE 27TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2011) | 2011
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
An approximate string query is to find from a collection of strings those that are similar to a given query string. Answering such queries is important in many applications such as data cleaning and record linkage, where errors could occur in queries as well as the data. Many existing algorithms have focused on in-memory indexes. In this paper we investigate how to efficiently answer such queries in a disk-based setting, by systematically studying the effects of storing data and indexes on disk. We devise a novel physical layout for an inverted index to answer queries and we study how to construct it with limited buffer space. To answer queries, we develop a cost-based, adaptive algorithm that balances the I/O costs of retrieving candidate matches and accessing inverted lists. Experiments on large, real datasets verify that simply adapting existing algorithms to a disk-based setting does not work well and that our new techniques answer queries efficiently. Further, our solutions significantly outperform a recent tree-based index, BED-tree.
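To make the notion of an approximate string query concrete, the following is a minimal in-memory sketch, assuming edit distance as the similarity measure and a q-gram inverted index with a count filter. It is not the disk-based layout or the cost-based adaptive algorithm described in the abstract; the function names (`qgrams`, `build_index`, `search`), the padding characters, and the sample strings are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def qgrams(s, q=2):
    """Return the overlapping q-grams of s, padded with '#' and '$' (assumed padding)."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def build_index(strings, q=2):
    """Map each q-gram to a list of (string id, occurrence count) postings."""
    index = defaultdict(list)
    for sid, s in enumerate(strings):
        for gram, cnt in Counter(qgrams(s, q)).items():
            index[gram].append((sid, cnt))
    return index


def search(query, strings, index, k=1, q=2):
    """Return all strings within edit distance k of query, in sorted order."""
    # Count filter: one edit destroys at most q grams, so an answer must
    # share at least this many grams (with multiplicity) with the query.
    threshold = len(query) + q - 1 - k * q
    if threshold <= 0:
        candidates = range(len(strings))  # filter cannot prune; check everything
    else:
        shared = Counter()
        for gram, qcnt in Counter(qgrams(query, q)).items():
            for sid, scnt in index.get(gram, ()):
                shared[sid] += min(qcnt, scnt)
        candidates = [sid for sid, c in shared.items() if c >= threshold]
    # Verification step: compute the real edit distance for each candidate.
    return sorted(strings[sid] for sid in candidates
                  if edit_distance(query, strings[sid]) <= k)


data = ["irvine", "irvin", "ervine", "berkeley", "irving"]
idx = build_index(data)
result = search("irvine", data, idx, k=1)
print(result)  # ['ervine', 'irvin', 'irvine', 'irving']
```

In this sketch the inverted lists prune "berkeley" before any edit distance is computed, while the verification step touches only the surviving candidates. The abstract's concern is what happens to these two costs (list access and candidate retrieval) when both the lists and the strings reside on disk.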
Pages: 888-899
Page count: 12
Related papers
50 in total
  • [1] Answering Approximate Queries Over XML Data
    Liu, Jian
    Yan, D. L.
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2016, 24 (02) : 288 - 305
  • [2] ANSWERING GERONTOLOGICAL RESEARCH QUESTIONS USING LARGE DATA SETS
    O'Connor, M.
    Bowles, K. H.
    GERONTOLOGIST, 2011, 51 : 481 - 481
  • [3] Answering queries using limited external query processors
    Levy, AY
    Rajaraman, A
    Ullman, JD
    JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1999, 58 (01) : 69 - 82
  • [4] Approximate queries and representations for large data sequences
    Shatkay, H
    Zdonik, SB
    PROCEEDINGS OF THE TWELFTH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, 1996, : 536 - 545
  • [5] Hippocampus: Answering Memory Queries using Transactive Search
    Catasta, Michele
    Tonon, Alberto
    Difallah, Djellel Eddine
    Demartini, Gianluca
    Aberer, Karl
    Cudre-Mauroux, Philippe
    WWW'14 COMPANION: PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON WORLD WIDE WEB, 2014, : 535 - 540
  • [6] Answering approximate range aggregate queries on OLAP data cubes with probabilistic guarantees
    Cuzzocrea, A
    Wang, W
    Matrangolo, U
    DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2004, 3181 : 97 - 107
  • [7] Answering Web Queries Using Structured Data Sources
    Paparizos, Stelios
    Ntoulas, Alexandros
    Shafer, John
    Agrawal, Rakesh
    ACM SIGMOD/PODS 2009 CONFERENCE, 2009, : 1127 - 1129
  • [8] Graduated errors in approximate queries using hierarchies and ordered sets
    Guzman-Arenas, A
    Levachkine, S
    MICAI 2004: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2004, 2972 : 119 - 128
  • [9] An external memory data structure for shortest path queries
    Hutchinson, D
    Maheshwari, A
    Zeh, N
    DISCRETE APPLIED MATHEMATICS, 2003, 126 (01) : 55 - 82
  • [10] Approximate Continuous Query Answering over Streams and Dynamic Linked Data Sets
    Dehghanzadeh, Soheila
    Dell'Aglio, Daniele
    Gao, Shen
    Della Valle, Emanuele
    Mileo, Alessandra
    Bernstein, Abraham
    ENGINEERING THE WEB IN THE BIG DATA ERA, 2015, 9114 : 307 - 325