VEDAS: an efficient GPU alternative for store and query of large RDF data sets

Cited by: 3

Authors:

Makpaisit, Pisit [1]

Chantrapornchai, Chantana [1]

Affiliations:

[1] Kasetsart Univ, Dept Comp Engn, Bangkok, Thailand

Source:

JOURNAL OF BIG DATA | 2021 / Vol. 8 / Issue 01

Keywords:

Query processing; Parallel processing; Graphic Processing Units; Resource Description Framework; SPARQL; SPARQL QUERIES;

DOI:

10.1186/s40537-021-00513-y

Chinese Library Classification (CLC) number:

TP301 [Theory and methods];

Discipline classification code:

081202 ;

Abstract:

Resource Description Framework (RDF) is commonly used as a standard for data interchange on the web. A collection of RDF data sets can form a large graph that is time-consuming to query. Modern Graphics Processing Units (GPUs) can execute parallel programs to reduce running time. In this paper, we propose a novel RDF data representation, along with a query processing algorithm, that is suitable for GPU processing. Since the main challenges of the GPU architecture are the limited memory size, the memory transfer latency, and the vast number of GPU cores, our system is designed to make better use of the GPU cores and to reduce the effect of memory transfer. We propose a representation that consists of indices and column-based RDF ID data and that can reduce the GPU memory requirement. Indexing and pre-upload filtering techniques are then applied to reduce the data transfer between host and GPU memory. We add an index swapping process to facilitate sorting and joining data on a given variable, and a pre-upload step to reduce the size of the result storage and the data transfer time. The experimental results show that our representation is about 35% smaller than the traditional NT format and 40% smaller than that of gStore. On the WatDiv test suite, the query processing speedup ranges from 1.95 to 397.03 times when compared with RDF-3X and gStore. On the LUBM benchmark, it achieves speedups of 578.57 and 62.97 times when compared to RDF-3X and gStore, respectively. The analysis shows which query cases benefit from our approach.
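The idea of column-based RDF ID data with filtering before upload can be illustrated with a minimal sketch. It is not the VEDAS implementation: the function names (`encode`, `sort_by_predicate`, `select_predicate`), the sample triples, and the single predicate-sorted ordering are all assumptions made for illustration, and plain Python lists stand in for host-side buffers. The sketch dictionary-encodes terms to integer IDs, stores them as three columns, sorts rows by predicate, and uses binary search to pick out only the rows that match a bound predicate, which is the slice one would then copy to GPU memory.

```python
from bisect import bisect_left, bisect_right

def encode(triples):
    """Dictionary-encode string terms to integer IDs; return the dictionary and three ID columns."""
    term_ids = {}
    cols = ([], [], [])  # subject, predicate, object columns (hypothetical layout)
    for triple in triples:
        for col, term in zip(cols, triple):
            col.append(term_ids.setdefault(term, len(term_ids)))
    return term_ids, cols

def sort_by_predicate(cols):
    """Reorder rows so the predicate column is sorted (ties broken by subject, then object)."""
    s, p, o = cols
    order = sorted(range(len(p)), key=lambda i: (p[i], s[i], o[i]))
    return ([s[i] for i in order], [p[i] for i in order], [o[i] for i in order])

def select_predicate(cols, pred_id):
    """Binary-search the sorted predicate column and slice out the matching rows.
    Only this slice, not the whole data set, would need to be transferred to the device."""
    s, p, o = cols
    lo, hi = bisect_left(p, pred_id), bisect_right(p, pred_id)
    return s[lo:hi], o[lo:hi]

# Illustrative data only; these triples do not come from the paper.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
    ("carol", "worksAt", "acme"),
]
term_ids, cols = encode(triples)
sorted_cols = sort_by_predicate(cols)
subjects, objects = select_predicate(sorted_cols, term_ids["knows"])
```

Here `subjects` and `objects` hold only the ID pairs for the `knows` predicate, so a later join on a shared variable would work on two short integer arrays rather than on the full string-based triples.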

Pages: 34
