FishStore: Fast Ingestion and Indexing of Raw Data

Cited by: 2
Authors
Chandramouli, Badrish [1]
Xie, Dong [2]
Li, Yinan [1]
Kossmann, Donald [1]
Affiliations
[1] Microsoft Res, Redmond, WA 98052 USA
[2] Univ Utah, Salt Lake City, UT 84112 USA
Source
Proceedings of the VLDB Endowment | 2019 / Volume 12 / Issue 12
DOI
10.14778/3352063.3352100
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
The last decade has witnessed a huge increase in data being ingested into the cloud from a variety of data sources. The ingested data takes various forms such as JSON, CSV, and binary formats. Traditionally, data is either ingested into storage in raw form, indexed ad-hoc using range indices, or cooked into analytics-friendly columnar formats. None of these solutions is able to handle modern requirements on storage: making the data available immediately for ad-hoc and streaming queries while ingesting at extremely high throughputs. We demonstrate FishStore, our open-source concurrent latch-free storage layer for data with flexible schema. FishStore builds on recent advances in parsing and indexing techniques, and is based on multi-chain hash indexing of dynamically registered predicated subsets of data. We find predicated subset hashing to be a powerful primitive that supports a broad range of queries on ingested data and admits a higher performance (by up to an order of magnitude) implementation than current alternatives.
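The indexing idea named in the abstract (hash chains over dynamically registered predicated subsets) can be illustrated with a toy, single-threaded sketch. This is not FishStore's API or implementation: the class `TinySubsetStore`, its methods, and the sample fields `repo` and `size` are all hypothetical names chosen for illustration, and the latch-free concurrency and fast parsing that the abstract mentions are not modeled. The sketch only assumes that each registered predicate maps a record to a value (or to nothing), that a hash table points at the newest record for each (predicate, value) pair, and that each record stores a back pointer to the previous record in the same subset.

```python
import json
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical type: a predicate maps a parsed record to a hashable value,
# or to None when the record does not belong to any subset of that predicate.
Predicate = Callable[[dict], Optional[Any]]


class TinySubsetStore:
    """Toy in-memory illustration of subset hash chains; not the FishStore API."""

    def __init__(self) -> None:
        # Append-only log of (raw record, {predicate id -> previous address}).
        self.log: List[Tuple[str, Dict[int, Optional[int]]]] = []
        self.predicates: List[Predicate] = []
        # (predicate id, value) -> log address of the newest matching record.
        self.heads: Dict[Tuple[int, Any], int] = {}

    def register(self, predicate: Predicate) -> int:
        """Register a predicate at any time; only later records are indexed."""
        self.predicates.append(predicate)
        return len(self.predicates) - 1

    def ingest(self, raw: str) -> int:
        """Append the raw record and link it into one chain per matching subset."""
        record = json.loads(raw)
        address = len(self.log)
        back_pointers: Dict[int, Optional[int]] = {}
        for pid, predicate in enumerate(self.predicates):
            value = predicate(record)
            if value is None:
                continue  # record is outside this predicate's subsets
            key = (pid, value)
            back_pointers[pid] = self.heads.get(key)  # previous chain head
            self.heads[key] = address                 # this record is the new head
        self.log.append((raw, back_pointers))
        return address

    def scan(self, pid: int, value: Any) -> Iterator[str]:
        """Yield raw records of one subset, newest first, by following the chain."""
        address = self.heads.get((pid, value))
        while address is not None:
            raw, back_pointers = self.log[address]
            yield raw
            address = back_pointers[pid]


store = TinySubsetStore()
by_repo = store.register(lambda r: r.get("repo"))
is_big = store.register(lambda r: True if r.get("size", 0) > 100 else None)
store.ingest('{"repo": "a", "size": 5}')
store.ingest('{"repo": "b", "size": 500}')
store.ingest('{"repo": "a", "size": 900}')
print(list(store.scan(by_repo, "a")))
```

In this sketch a scan touches only the records on one chain instead of the whole log, and the data stays in its raw form, which is the trade-off the abstract contrasts with range indices and columnar conversion.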
Pages: 1922-1925
Page count: 4