Fast data series indexing for in-memory data

Cited by: 15
Authors
Peng, Botao [1]
Fatourou, Panagiota [2]
Palpanas, Themis [3, 4]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[2] FORTH ICS, Iraklion, Greece
[3] Univ Paris, LIPADE, Paris, France
[4] French Univ Inst IUF, Paris, France
Source
VLDB JOURNAL | 2021 / Vol. 30 / Issue 06
Keywords
Data series; Indexing; Modern hardware; SIMILARITY SEARCH; TIME;
DOI
10.1007/s00778-021-00677-2
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Data series similarity search is a core operation for several data series analysis applications across many different domains. However, state-of-the-art techniques fail to deliver the time performance required for interactive exploration or analysis of large data series collections. In this work, we propose MESSI, the first data series index designed for in-memory operation on modern hardware. Our index takes advantage of the parallelization opportunities of modern hardware (i.e., SIMD instructions, multi-socket and multi-core architectures) to accelerate both index construction and similarity search processing times. Moreover, it benefits from a careful design in the setup and coordination of the parallel workers and data structures, so that it maximizes its performance for in-memory operations. MESSI supports similarity search using both the Euclidean and dynamic time warping (DTW) distances. Our experiments with synthetic and real datasets demonstrate that overall MESSI is up to 4x faster at index construction and up to 11x faster at query answering than the state-of-the-art parallel approach. MESSI is the first to answer exact similarity search queries on 100 GB datasets in approximately 50 ms (30-75 ms across diverse datasets), which enables real-time, interactive data exploration on very large data series collections.
Pages: 1041-1067
Page count: 27