FishStore: Fast Ingestion and Indexing of Raw Data

Times cited: 2
Authors
Chandramouli, Badrish [1]
Xie, Dong [2]
Li, Yinan [1]
Kossmann, Donald [1]
Affiliations
[1] Microsoft Research, Redmond, WA 98052, USA
[2] University of Utah, Salt Lake City, UT 84112, USA
Source
Proceedings of the VLDB Endowment, 2019, Vol. 12, No. 12
DOI
10.14778/3352063.3352100
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The last decade has witnessed a huge increase in data being ingested into the cloud from a variety of data sources. The ingested data takes various forms such as JSON, CSV, and binary formats. Traditionally, data is either ingested into storage in raw form, indexed ad-hoc using range indices, or cooked into analytics-friendly columnar formats. None of these solutions is able to handle modern requirements on storage: making the data available immediately for ad-hoc and streaming queries while ingesting at extremely high throughputs. We demonstrate FishStore, our open-source concurrent latch-free storage layer for data with flexible schema. FishStore builds on recent advances in parsing and indexing techniques, and is based on multi-chain hash indexing of dynamically registered predicated subsets of data. We find predicated subset hashing to be a powerful primitive that supports a broad range of queries on ingested data and admits a higher performance (by up to an order of magnitude) implementation than current alternatives.
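As a rough illustration of the predicated subset hashing idea described in the abstract, the Python sketch below models it in a single-threaded, in-memory form. It is not FishStore's API or implementation (FishStore itself is a concurrent, latch-free storage layer). The class `ToySubsetStore`, its methods `register_psf`, `ingest`, and `scan`, and the record layout are all hypothetical. A registered function maps each record to a value, or to `None` when the record falls outside every subset that function defines. Each distinct (function, value) pair names one subset. A hash table maps that pair to the newest matching log entry, and each log entry stores one back-pointer per subset it belongs to, so a single record can sit on several hash chains at once.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical subset function: returns a (hashable) value for records in a
# subset, or None if the record is not in any subset this function defines.
PSF = Callable[[dict], Optional[Any]]
ChainKey = Tuple[int, Any]  # (function id, value) identifies one subset


@dataclass
class LogEntry:
    record: dict
    # One back-pointer per chain this record is on: the log offset of the
    # previous record in the same subset, or -1 if this is the oldest one.
    prev: Dict[ChainKey, int] = field(default_factory=dict)


class ToySubsetStore:
    """Hypothetical, single-threaded model of multi-chain subset hashing."""

    def __init__(self) -> None:
        self.log: List[LogEntry] = []            # append-only record log
        self.psfs: Dict[int, PSF] = {}           # registered functions by id
        self.heads: Dict[ChainKey, int] = {}     # hash index: subset -> newest offset
        self._next_id = 0

    def register_psf(self, fn: PSF) -> int:
        """Register a subset function; it only covers records ingested later."""
        psf_id = self._next_id
        self._next_id += 1
        self.psfs[psf_id] = fn
        return psf_id

    def ingest(self, record: dict) -> int:
        """Append a record and link it onto every subset chain it belongs to."""
        offset = len(self.log)
        entry = LogEntry(record)
        for psf_id, fn in self.psfs.items():
            value = fn(record)
            if value is None:
                continue  # record is outside this function's subsets
            key = (psf_id, value)
            entry.prev[key] = self.heads.get(key, -1)  # link to previous head
            self.heads[key] = offset                   # new head of the chain
        self.log.append(entry)
        return offset

    def scan(self, psf_id: int, value: Any) -> List[dict]:
        """Walk one subset chain from newest to oldest record."""
        key = (psf_id, value)
        out: List[dict] = []
        offset = self.heads.get(key, -1)
        while offset != -1:
            entry = self.log[offset]
            out.append(entry.record)
            offset = entry.prev[key]
        return out


store = ToySubsetStore()
by_user = store.register_psf(lambda r: r.get("user"))  # one subset per user
big = store.register_psf(lambda r: True if r.get("bytes", 0) > 1000 else None)
store.ingest({"user": "alice", "bytes": 10})
store.ingest({"user": "bob", "bytes": 5000})
store.ingest({"user": "alice", "bytes": 2000})
print([r["bytes"] for r in store.scan(by_user, "alice")])  # [2000, 10]
print([r["user"] for r in store.scan(big, True)])          # ['alice', 'bob']
```

In this toy, a function only indexes records ingested after it is registered, and a query follows a single chain instead of scanning the whole log. The third record is on two chains at once (its user and the "big record" predicate), which is the multi-chain property the abstract refers to.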
Pages: 1922-1925
Number of pages: 4