VEDAS: an efficient GPU alternative for store and query of large RDF data sets

Cited by: 3
Authors
Makpaisit, Pisit [1 ]
Chantrapornchai, Chantana [1 ]
Affiliations
[1] Kasetsart Univ, Dept Comp Engn, Bangkok, Thailand
Keywords
Query processing; Parallel processing; Graphic Processing Units; Resource Description Framework; SPARQL; SPARQL QUERIES;
DOI
10.1186/s40537-021-00513-y
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
Resource Description Framework (RDF) is commonly used as a standard for data interchange on the web. A collection of RDF data sets can form a large graph that is time-consuming to query. Modern Graphic Processing Units (GPUs) can execute parallel programs to reduce running time. In this paper, we propose a novel RDF data representation, along with a query processing algorithm, that is suitable for GPU processing. Since the main challenges of GPU architecture are the limited memory size, the memory transfer latency, and the vast number of GPU cores, our system is designed to make better use of the GPU cores and reduce the effect of memory transfer. We propose a representation consisting of indices and column-based RDF ID data that reduces the GPU memory requirement. Indexing and pre-upload filtering techniques are then applied to reduce the data transfer between host and GPU memory. We add an index swapping process to facilitate sorting and joining the data on a given variable, and a pre-upload step to reduce both the size of the result storage and the data transfer time. The experimental results show that our representation is about 35% smaller than the traditional NT format and 40% smaller than that of gStore. On the WatDiv test suite, query processing achieves speedups ranging from 1.95 to 397.03 over RDF-3X and gStore. On the LUBM benchmark, it achieves speedups of 578.57 and 62.97 over RDF-3X and gStore, respectively. The analysis shows which query cases benefit from our approach.
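As a rough illustration of the ideas named in the abstract (column-based ID data, an index, and filtering before upload), here is a minimal CPU-only Python sketch. It is not the VEDAS implementation: the function names `encode`, `sort_by_predicate`, and `prefilter` are hypothetical, the sample triples are made up, and no GPU transfer is performed. It only shows how dictionary-encoded columns sorted on one position allow a binary search to select the slice that would need to be transferred.

```python
from bisect import bisect_left, bisect_right


def encode(triples):
    """Dictionary-encode string triples into three parallel integer ID columns."""
    ids = {}
    cols = ([], [], [])  # subject, predicate, object columns
    for triple in triples:
        for col, term in zip(cols, triple):
            # Assign the next free integer ID the first time a term is seen.
            col.append(ids.setdefault(term, len(ids)))
    return ids, cols


def sort_by_predicate(cols):
    """Reorder all three columns so that rows are sorted by (p, s, o)."""
    s, p, o = cols
    order = sorted(range(len(p)), key=lambda i: (p[i], s[i], o[i]))
    return tuple([c[i] for i in order] for c in (s, p, o))


def prefilter(sorted_cols, pred_id):
    """Select only the rows for one predicate via binary search.

    In a GPU setting, this slice (rather than the full columns) is what
    would be copied to device memory; that copy is omitted here.
    """
    s, p, o = sorted_cols
    lo, hi = bisect_left(p, pred_id), bisect_right(p, pred_id)
    return s[lo:hi], o[lo:hi]


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    triples = [
        ("alice", "age", "30"),
        ("alice", "knows", "bob"),
        ("carol", "age", "25"),
        ("bob", "knows", "carol"),
    ]
    ids, cols = encode(triples)
    names = {v: k for k, v in ids.items()}
    subj, obj = prefilter(sort_by_predicate(cols), ids["knows"])
    for a, b in zip(subj, obj):
        print(names[a], "knows", names[b])
```

Storing integer IDs in separate columns keeps each column compact and contiguous, and sorting on the queried position means the filter step touches only a small range instead of scanning every triple.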
Pages: 34