SRJA: A Research on Optimizing Top-k Join Queries Based on Spark

Cited by: 0
Authors
Ren, Hui [1 ,2 ]
Fu, Haidong [1 ,2 ]
Xu, Fangfang [1 ,2 ]
Gu, Jinguang [1 ,2 ]
Zhao, Di [1 ,2 ]
Affiliations
[1] Wuhan Univ Sci & Technol, Coll Comp Sci & Technol, Wuhan, Hubei, Peoples R China
[2] Hubei Prov Key Lab Intelligent Informat Proc & Re, Wuhan, Hubei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Spark; Top-k query; distributed system; RDF data;
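DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes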
0808 ; 0809 ;
Abstract
With the explosive growth of RDF data, traditional centralized environments are increasingly unable to meet the query-processing requirements of such large datasets, especially for top-k queries. This paper designs and implements an original storage scheme for RDF data based on HBase, and presents two top-k query algorithms, STA (Spark Threshold Algorithm) and SRJA (Spark Rank Join Algorithm), within a massive RDF data query processing framework built on an HBase-based storage system and a Spark-based parallel computing system. STA aims to reduce the number of RDF data join operations during query execution. SRJA is proposed to reduce the sort-related operations on intermediate data, and it is the optimization of [...]. Experimental results show that SRJA outperforms both STA and the traditional top-k algorithm, and has strong applicability.
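The abstract does not give the details of STA or SRJA, so the following is not the authors' method. It is only a minimal single-machine sketch, in plain Python rather than Spark, of the general idea behind threshold-based rank joins: read two score-sorted inputs alternately, join on a shared key, and stop as soon as no unread tuple can beat the current k-th best result. The function name `rank_join_top_k`, the `(join_key, score)` tuple layout, and the sum scoring function are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict


def rank_join_top_k(left, right, k):
    """Hypothetical helper: top-k join results by summed score.

    left, right: lists of (join_key, score), each sorted by score descending.
    Returns (results, reads): results is a list of (join_key, combined_score)
    in descending score order; reads is how many input tuples were consumed.
    """
    seen_left = defaultdict(list)   # join_key -> left scores read so far
    seen_right = defaultdict(list)  # join_key -> right scores read so far
    top = []                        # min-heap holding the best k results
    i = j = 0
    while i < len(left) or j < len(right):
        # Alternate between inputs; fall back to whichever still has tuples.
        pull_left = j >= len(right) or (i < len(left) and i <= j)
        if pull_left:
            key, score = left[i]
            i += 1
            seen_left[key].append(score)
            combined = [score + s for s in seen_right[key]]
        else:
            key, score = right[j]
            j += 1
            seen_right[key].append(score)
            combined = [s + score for s in seen_left[key]]
        for total in combined:
            heapq.heappush(top, (total, key))
            if len(top) > k:
                heapq.heappop(top)  # drop the current worst result
        if i > 0 and j > 0 and len(top) >= k:
            # Upper bound on any join result that involves an unread tuple.
            threshold = max(left[0][1] + right[j - 1][1],
                            left[i - 1][1] + right[0][1])
            if top[0][0] >= threshold:
                break  # nothing unread can displace the current top-k
    results = [(key, total) for total, key in sorted(top, reverse=True)]
    return results, i + j


left = [("a", 90), ("b", 80), ("c", 30), ("d", 10)]
right = [("b", 95), ("a", 70), ("d", 60), ("c", 20)]
print(rank_join_top_k(left, right, 2))
```

On this toy input the loop stops after reading 5 of the 8 tuples, which is the kind of saving in join and sort work that the abstract attributes to its algorithms. A distributed version would additionally have to partition the inputs and merge per-partition candidates, which this sketch does not attempt.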
Pages: 1000-1005
Number of pages: 6
Related papers
50 in total
  • [1] Distributed Top-K Join Queries Optimizing for RDF Datasets
    Gu, Jinguang
    Dong, Hao
    Liu, Zhao
    Xu, Fangfang
    INTERNATIONAL JOURNAL OF WEB SERVICES RESEARCH, 2017, 14 (03) : 67 - 83
  • [2] Processing Top-k Join Queries
    Wu, Minji
    Berti-Equille, Laure
    Marian, Amelie
    Procopiuc, Cecilia M.
    Srivastava, Divesh
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2010, 3 (01): : 860 - 870
  • [3] Optimizing Distributed Top-k Queries
    Neumann, Thomas
    Bender, Matthias
    Michel, Sebastian
    Schenkel, Ralf
    Triantafillou, Peter
    Weikum, Gerhard
    WEB INFORMATION SYSTEMS ENGINEERING - WISE 2008, PROCEEDINGS, 2008, 5175 : 337 - +
  • [4] Supporting top-k join queries in relational databases
    Ilyas, IF
    Aref, WG
    Elmagarmid, AK
    VLDB JOURNAL, 2004, 13 (03): : 207 - 221
  • [5] A top-k spatial join querying processing algorithm based on spark
    Qiao, Baiyou
    Hu, Bing
    Zhu, Junhai
    Wu, Gang
    Giraud-Carrier, Christophe
    Wang, Guoren
    INFORMATION SYSTEMS, 2020, 87
  • [6] Exploratory product search using top-k join queries
    Gkorgkas, Orestis
    Vlachou, Akrivi
    Doulkeridis, Christos
    Norvag, Kjetil
    INFORMATION SYSTEMS, 2017, 64 : 75 - 92
  • [7] Algorithms for Top-k join queries in wireless sensor networks
    Mo, Shang-Feng
    Chen, Ding-Jie
    Chen, Hong
    Li, Ying-Long
    Li, Cui-Ping
    Jisuanji Xuebao/Chinese Journal of Computers, 2013, 36 (03): : 557 - 570
  • [8] Optimizing Distributed Top-k Queries on Uncertain Data
    Zhao Zhibin
    Yu Yang
    Bao Yubin
    Yu Ge
    2013 25TH CHINESE CONTROL AND DECISION CONFERENCE (CCDC), 2013, : 3209 - 3214
  • [9] Optimizing top-k selection queries over multimedia repositories
    Chaudhuri, S
    Gravano, L
    Marian, A
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2004, 16 (08) : 992 - 1009
  • [10] Top-k Pipe Join
    Martinenghi, Davide
    Tagliasacchi, Marco
    2010 IEEE 26TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDE 2010), 2010, : 16 - 19