[Demo] Low-latency Spark Queries on Updatable Data

Cited by: 4
Authors
Uta, Alexandru [1, 2]
Ghit, Bogdan [2]
Dave, Ankur [3]
Boncz, Peter [4]
Affiliations
[1] Vrije Univ Amsterdam, Amsterdam, Netherlands
[2] Databricks, Amsterdam, Netherlands
[3] Univ Calif Berkeley, Berkeley, CA USA
[4] CWI Amsterdam, Amsterdam, Netherlands
DOI
10.1145/3299869.3320227
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
As data science is increasingly deployed in operational applications, it becomes important for data science frameworks to be able to perform computations in interactive, sub-second time. Indexing and caching are two key techniques that can make interactive query processing on large datasets possible. In this demo, we show the design, implementation, and performance of a new indexing abstraction in Apache Spark, called the Indexed DataFrame. This is a cached DataFrame that incorporates an index to support fast lookup and join operations, and that supports updates with multi-version concurrency. We demonstrate the Indexed DataFrame on a social network dataset using microbenchmarks and real-world graph processing queries, on datasets that are continuously growing.
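To make the idea in the abstract concrete, here is a minimal single-process sketch of an append-only table with a hash index and version-pinned reads. It is a toy illustration only, not the paper's implementation and not a Spark API: the class `ToyIndexedTable`, its methods, and the choice of "row count at append time" as the version number are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Optional, Tuple

Row = Tuple[Any, ...]


class ToyIndexedTable:
    """Hypothetical append-only table with a hash index on one column."""

    def __init__(self, key_col: int) -> None:
        self.key_col = key_col
        self.rows: List[Row] = []  # append-only row store (the "cache")
        # key value -> positions of matching rows in self.rows
        self.index: Dict[Any, List[int]] = defaultdict(list)

    def append(self, new_rows: Iterable[Row]) -> int:
        """Append rows; return the new version (row count after the append)."""
        for row in new_rows:
            self.index[row[self.key_col]].append(len(self.rows))
            self.rows.append(row)
        return len(self.rows)

    def lookup(self, key: Any, version: Optional[int] = None) -> List[Row]:
        """Return rows matching key that are visible at the given version."""
        visible = len(self.rows) if version is None else version
        return [self.rows[p] for p in self.index.get(key, []) if p < visible]

    def join(self, probe: Iterable[Row], probe_key_col: int,
             version: Optional[int] = None) -> List[Row]:
        """Index join: probe the index once per input row, no full scan."""
        return [left + right
                for left in probe
                for right in self.lookup(left[probe_key_col], version)]


# Usage: a small edge list (src, dst) that keeps growing.
edges = ToyIndexedTable(key_col=0)
v1 = edges.append([(1, 2), (1, 3), (2, 3)])
v2 = edges.append([(1, 4)])

print(edges.lookup(1, version=v1))  # [(1, 2), (1, 3)]
print(edges.lookup(1))              # [(1, 2), (1, 3), (1, 4)]
print(edges.join([(10, 1)], probe_key_col=1, version=v1))
# [(10, 1, 1, 2), (10, 1, 1, 3)]
```

Because rows are never modified in place, a reader that pins `v1` keeps seeing the same result while later appends proceed, which is the basic property a multi-version scheme provides. A distributed design would additionally need partitioning and fault tolerance, which this sketch does not attempt.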
Pages: 2009-2012
Number of pages: 4