Fast data series indexing for in-memory data

Cited by: 15
Authors
Peng, Botao [1]
Fatourou, Panagiota [2]
Palpanas, Themis [3, 4]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[2] FORTH ICS, Iraklion, Greece
[3] Univ Paris, LIPADE, Paris, France
[4] French Univ Inst IUF, Paris, France
Source
VLDB JOURNAL | 2021 / Vol. 30 / Issue 06
Keywords
Data series; Indexing; Modern hardware; SIMILARITY SEARCH; TIME;
DOI
10.1007/s00778-021-00677-2
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Data series similarity search is a core operation for several data series analysis applications across many different domains. However, the state-of-the-art techniques fail to deliver the time performance required for interactive exploration or analysis of large data series collections. In this work, we propose MESSI, the first data series index designed for in-memory operation on modern hardware. Our index takes advantage of the parallelization opportunities of modern hardware (i.e., SIMD instructions, multi-socket and multi-core architectures) in order to accelerate both index construction and similarity search processing times. Moreover, it benefits from a careful design in the setup and coordination of the parallel workers and data structures, so that it maximizes its performance for in-memory operations. MESSI supports similarity search using both the Euclidean and dynamic time warping (DTW) distances. Our experiments with synthetic and real datasets demonstrate that overall MESSI is up to 4x faster at index construction and up to 11x faster at query answering than the state-of-the-art parallel approach. MESSI is the first to answer exact similarity search queries on 100 GB datasets in approximately 50 ms (30-75 ms across diverse datasets), which enables real-time, interactive data exploration on very large data series collections.
Pages: 1041-1067
Page count: 27
Related papers
50 in total
  • [31] In-Memory Data Rearrangement for Irregular, Data-Intensive Computing
    Lloyd, Scott
    Gokhale, Maya
    COMPUTER, 2015, 48 (08) : 18 - 25
  • [32] CHOPPER: Optimizing Data Partitioning for In-Memory Data Analytics Frameworks
    Paul, Arnab Kumar
    Zhuang, Wenjie
    Xu, Luna
    Li, Min
    Rafique, M. Mustafa
    Butt, Ali R.
    2016 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2016, : 110 - 119
  • [33] LeanStore: In-Memory Data Management Beyond Main Memory
    Leis, Viktor
    Haubenschild, Michael
    Kemper, Alfons
    Neumann, Thomas
    2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2018, : 185 - 196
  • [34] An In-Memory based Framework for Scientific Data Analytics
    Elia, Donatello
    Fiore, Sandro
    D'Anca, Alessandro
    Palazzo, Cosimo
    Foster, Ian
    Williams, Dean N.
    PROCEEDINGS OF THE ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS (CF'16), 2016, : 424 - 429
  • [35] In-Memory Big Data Management and Processing: A Survey
    Zhang, Hao
    Chen, Gang
    Ooi, Beng Chin
    Tan, Kian-Lee
    Zhang, Meihui
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (07) : 1920 - 1948
  • [36] In-Memory Indexed Caching for Distributed Data Processing
    Uta, Alexandru
    Ghit, Bogdan
    Dave, Ankur
    Rellermeyer, Jan
    Boncz, Peter
    2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2022), 2022, : 104 - 114
  • [37] Online Data Deduplication for In-Memory Big-Data Analytic Systems
    Sun, Yushi
    Zeng, Catherine Y.
    Chung, Jaeyoon
    Huang, Zhe
    2017 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2017,
  • [38] LocationSpark: A Distributed In-Memory Data Management System for Big Spatial Data
    Tang, Mingjie
    Yu, Yongyang
    Malluhi, Qutaibah M.
    Ouzzani, Mourad
    Aref, Walid G.
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2016, 9 (13) : 1565 - 1568
  • [39] Life Cycle of Transactional Data in In-memory Databases
    Pathak, Amit
    Gurajada, Aditya
    Khadilkar, Pushkar
    2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDEW), 2018, : 122 - 133
  • [40] Simba: Spatial In-Memory Big Data Analysis
    Xie, Dong
    Li, Feifei
    Yao, Bin
    Li, Gefei
    Chen, Zhongpu
    Zhou, Liang
    Guo, Minyi
    24TH ACM SIGSPATIAL INTERNATIONAL CONFERENCE ON ADVANCES IN GEOGRAPHIC INFORMATION SYSTEMS (ACM SIGSPATIAL GIS 2016), 2016,