Fast data series indexing for in-memory data

Cited by: 15
Authors
Peng, Botao [1]
Fatourou, Panagiota [2]
Palpanas, Themis [3, 4]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[2] FORTH ICS, Iraklion, Greece
[3] Univ Paris, LIPADE, Paris, France
[4] French Univ Inst IUF, Paris, France
Source
VLDB JOURNAL | 2021 / Vol. 30 / Issue 06
Keywords
Data series; Indexing; Modern hardware; SIMILARITY SEARCH; TIME;
DOI
10.1007/s00778-021-00677-2
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Data series similarity search is a core operation for several data series analysis applications across many different domains. However, the state-of-the-art techniques fail to deliver the time performance required for interactive exploration or analysis of large data series collections. In this work, we propose MESSI, the first data series index designed for in-memory operation on modern hardware. Our index takes advantage of the parallelization opportunities of modern hardware (i.e., SIMD instructions, multi-socket and multi-core architectures) to accelerate both index construction and similarity search. Moreover, it benefits from a careful design in the setup and coordination of the parallel workers and data structures, so that it maximizes its performance for in-memory operations. MESSI supports similarity search using both the Euclidean and dynamic time warping (DTW) distances. Our experiments with synthetic and real datasets demonstrate that overall MESSI is up to 4x faster at index construction and up to 11x faster at query answering than the state-of-the-art parallel approach. MESSI is the first to answer exact similarity search queries on 100GB datasets in approximately 50 ms (30-75 ms across diverse datasets), which enables real-time, interactive data exploration on very large data series collections.
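To make the query type in the abstract concrete, here is a minimal sketch of exact nearest-neighbor similarity search under the two distances the abstract names, Euclidean and DTW. It is a brute-force sequential scan, not MESSI's index or its parallel algorithm; the function names (`euclidean`, `dtw`, `exact_nn`) and the toy data are hypothetical and chosen only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Unconstrained DTW distance: square root of the minimum summed
    squared differences over all warping paths (no warping window)."""
    inf = float("inf")
    m = len(b)
    # prev[j] holds the best cumulative cost for the previous row.
    prev = [0.0] + [inf] * m
    for x in a:
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (x - b[j - 1]) ** 2
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return math.sqrt(prev[m])


def exact_nn(query, collection, distance=euclidean):
    """Return (index, distance) of the series closest to the query.
    Scans every series, so the answer is exact but the cost is linear."""
    best_idx, best_dist = -1, float("inf")
    for idx, series in enumerate(collection):
        d = distance(query, series)
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx, best_dist


# Hypothetical toy collection of three short series.
collection = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [5.0, 5.0, 5.0]]
idx_ed, dist_ed = exact_nn([1.0, 2.0, 3.1], collection)
idx_dtw, dist_dtw = exact_nn([1.0, 2.0, 2.0, 3.0], collection, distance=dtw)
```

An index such as the one described in the abstract aims to return the same exact answer while pruning most of the collection instead of scanning it; the scan above is only the baseline that defines what "exact" means.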
Pages: 1041 - 1067
Number of pages: 27