Computing Prominent Skyline on Massive Data

Authors
Wan, Xiaolong [1]
Han, Xixian [1]
Wang, Jinbao [1]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, 92 Xidazhi St, Harbin, Heilongjiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
P-skyline; Massive data; Selective retrieval; Selective checking; COMPUTATION; ALGORITHMS; QUERIES;
DOI
10.1007/s41019-024-00259-6
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In many practical applications, the skyline query is an important operation that returns the Pareto-optimal tuples, which provide a candidate set for the optimum. On massive data, the skyline often contains too many results, so users are overwhelmed and struggle to find the desired information. This paper devises P-skyline to reduce the size of the returned results. Given an approximation factor, P-skyline returns only the prominent skyline results, as determined by the definition of p-dominance. To the best of our knowledge, this paper is the first to study the P-skyline problem. This paper first proposes a baseline algorithm, which requires one full table scan to compute the results. The baseline algorithm is found to incur a relatively high execution cost on massive data. The PSTP algorithm is then proposed, which consists of two stages: candidate acquisition and refinement. On the presorted table, PSTP uses selective retrieval and selective checking to process P-skyline with much lower I/O and computation cost. Extensive experimental results on synthetic and real-life data sets show that PSTP can compute P-skyline on massive data efficiently.
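For readers unfamiliar with the underlying operation, the following is a minimal sketch of the conventional skyline (Pareto-optimal set) computed in a single pass over the data. It is not the paper's P-skyline, baseline, or PSTP algorithm: the abstract does not state the definition of p-dominance, so only ordinary dominance is shown. The names `dominates` and `skyline`, and the convention that smaller attribute values are better, are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Ordinary dominance, assuming smaller values are better:
    a is no worse than b in every attribute and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Single scan over the points, keeping a window of tuples not yet dominated."""
    window: List[Point] = []
    for p in points:
        # Skip p if some tuple already in the window dominates it.
        if any(dominates(w, p) for w in window):
            continue
        # Otherwise p evicts every window tuple that it dominates.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


if __name__ == "__main__":
    data = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
    print(skyline(data))
```

In this toy input, `(3, 3)` and `(4, 4)` are both dominated by `(2, 2)`, while the remaining three tuples are mutually incomparable. On large inputs the number of such incomparable tuples can grow quickly, which is the result-size problem the abstract describes.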
Pages: 117-146
Number of pages: 30