[Demo] Low-latency Spark Queries on Updatable Data

Cited by: 4
Authors
Uta, Alexandru [1, 2]
Ghit, Bogdan [2 ]
Dave, Ankur [3 ]
Boncz, Peter [4 ]
Affiliations
[1] Vrije Univ Amsterdam, Amsterdam, Netherlands
[2] Databricks, Amsterdam, Netherlands
[3] Univ Calif Berkeley, Berkeley, CA USA
[4] CWI Amsterdam, Amsterdam, Netherlands
DOI
10.1145/3299869.3320227
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
As data science is increasingly deployed in operational applications, it becomes important for data science frameworks to perform computations in interactive, sub-second time. Indexing and caching are two key techniques that can make interactive query processing on large datasets possible. In this demo, we show the design, implementation, and performance of a new indexing abstraction in Apache Spark, called the Indexed DataFrame. This is a cached DataFrame that incorporates an index to support fast lookup and join operations, and supports updates with multi-version concurrency. We demonstrate the Indexed DataFrame on a social network dataset using microbenchmarks and real-world graph processing queries, on datasets that are continuously growing.
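The idea in the abstract (an in-memory table with an index for fast lookups and joins, plus versioned updates) can be illustrated with a small, single-process Python sketch. This is not the paper's implementation and not a Spark API: the class `VersionedIndexedTable`, its methods, and the append-only versioning scheme are all hypothetical, chosen only to show how a hash index and snapshot versions can coexist.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


class VersionedIndexedTable:
    """Toy append-only table with a hash index and snapshot versions.

    Hypothetical illustration only; not the Indexed DataFrame API.
    """

    def __init__(self, key_column: str) -> None:
        self.key_column = key_column
        # Each stored row carries the version of the batch that appended it.
        self.rows: List[Tuple[int, Row]] = []
        # Hash index: key value -> positions of matching rows in self.rows.
        self.index: Dict[Any, List[int]] = defaultdict(list)
        self.version = 0

    def append(self, new_rows: List[Row]) -> int:
        """Append a batch of rows and return the new version number."""
        self.version += 1
        for row in new_rows:
            self.index[row[self.key_column]].append(len(self.rows))
            self.rows.append((self.version, row))
        return self.version

    def lookup(self, key: Any, as_of: Optional[int] = None) -> List[Row]:
        """Return rows matching key that are visible at snapshot as_of."""
        snapshot = self.version if as_of is None else as_of
        positions = self.index.get(key, [])
        return [self.rows[p][1] for p in positions if self.rows[p][0] <= snapshot]

    def join(self, probe_rows: List[Row], probe_column: str,
             as_of: Optional[int] = None) -> List[Row]:
        """Index-based equi-join: probe the index once per probe row."""
        joined = []
        for probe in probe_rows:
            for match in self.lookup(probe[probe_column], as_of):
                joined.append({**match, **probe})
        return joined


# Usage: a growing edge list, queried at two different versions.
edges = VersionedIndexedTable("src")
v1 = edges.append([{"src": 1, "dst": 2}, {"src": 1, "dst": 3}])
v2 = edges.append([{"src": 1, "dst": 4}, {"src": 2, "dst": 3}])
latest = edges.lookup(1)            # sees all three edges from node 1
older = edges.lookup(1, as_of=v1)   # sees only the two edges from the first batch
```

Readers holding an older version number keep seeing a stable snapshot while new batches are appended, which is the basic intuition behind multi-version concurrency; a distributed system would additionally need partitioning and fault tolerance, which this sketch ignores.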
Pages: 2009-2012
Page count: 4
Related papers
50 in total
  • [41] A High-Radix, Low-Latency Optical Switch for Data Centers
    Alistarh, Dan
    Ballani, Hitesh
    Costa, Paolo
    Funnell, Adam
    Benjamin, Joshua
    Watts, Philip
    Thomsen, Benn
    SIGCOMM'15: PROCEEDINGS OF THE 2015 ACM CONFERENCE ON SPECIAL INTEREST GROUP ON DATA COMMUNICATION, 2015, : 367 - 368
  • [42] A Scalable Architecture for Low-Latency Market-Data Processing on FPGA
    Tang, Qiu
    Su, Majing
    Jiang, Lei
    Yang, Jiajia
    Bai, Xu
    2016 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATION (ISCC), 2016, : 597 - 603
  • [43] HyperPlane: A Scalable Low-Latency Notification Accelerator for Software Data Planes
    Mirhosseini, Amirhossein
    Golestani, Hossein
    Wenisch, Thomas F.
    2020 53RD ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO 2020), 2020, : 852 - 867
  • [44] Doughnutie: An efficient and low-latency cloud data center network architecture
    Nasirian, Sara
    Faghani, Farhad
    Daneshvar Farzanegan, Mahmoud
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (20):
  • [45] A High-Radix, Low-Latency Optical Switch for Data Centers
    Alistarh, Dan
    Ballani, Hitesh
    Costa, Paolo
    Funnell, Adam
    Benjamin, Joshua
    Watts, Philip
    Thomsen, Benn
    ACM SIGCOMM COMPUTER COMMUNICATION REVIEW, 2015, 45 (04) : 367 - 368
  • [46] DRILL: Micro Load Balancing for Low-latency Data Center Networks
    Ghorbani, Soudeh
    Yang, Zibin
    Godfrey, P. Brighten
    Ganjali, Yashar
    Firoozshahian, Amin
    SIGCOMM '17: PROCEEDINGS OF THE 2017 CONFERENCE OF THE ACM SPECIAL INTEREST GROUP ON DATA COMMUNICATION, 2017, : 225 - 238
  • [47] Low-latency MLLM Inference with Spatiotemporal Heterogeneous Distributed Multimodal Data
    Xu, Xiangrui
    Liu, Sicong
    Yu, Zhiwen
    Wang, Lehao
    Gu, Bin
    2024 IEEE COUPLING OF SENSING & COMPUTING IN AIOT SYSTEMS, CSCAIOT 2024, 2024, : 19 - 20
  • [48] A Low-latency Secure Data Outsourcing Scheme for Cloud-WSN
    Li, Jing
    Guan, Zhitao
    Du, Xiaojiang
    Zhang, Zijian
    Zhou, Zhenyu
    2017 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE (WCNC), 2017,
  • [49] Carousel: Low-Latency Transaction Processing for Globally-Distributed Data
    Yan, Xinan
    Yang, Linguan
    Zhang, Hongbo
    Lin, Xiayue Charles
    Wong, Bernard
    Salem, Kenneth
    Brecht, Tim
    SIGMOD'18: PROCEEDINGS OF THE 2018 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2018, : 231 - 243
  • [50] Low-Latency Video Semantic Segmentation
    Li, Yule
    Shi, Jianping
    Lin, Dahua
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 5997 - 6005