Effective data management strategy and RDD weight cache replacement strategy in Spark

Cited by: 3
Authors
Jiang, Kun [1 ,3 ]
Du, Shaofeng [2 ]
Zhao, Fu [2 ]
Huang, Yong [4 ]
Li, Chunlin [1 ,3 ]
Luo, Youlong [3 ]
Affiliations
[1] China Inst Water Resources & Hydropower Res, Key Lab Construction & Safety Water Engn, Minist Water Resources, Beijing, Peoples R China
[2] State Key Lab Smart Mfg Special Vehicles & Transmi, Baotou, Peoples R China
[3] Wuhan Univ Technol, Sch Comp Sci & Artificial Intelligence, Wuhan 430063, Peoples R China
[4] Chongqing Univ, Key Lab New Technol Construction Cities Mt Area, Minist Educ, Chongqing 400045, Peoples R China
Keywords
Data shuffling; Data management; Cache gain; RDD partition weights; Adaptive cache replacement; DATA PLACEMENT; ALGORITHM
DOI
10.1016/j.comcom.2022.07.008
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the dramatic increase in internet users and their demand for real-time network performance, the Spark distributed computing environment has emerged. It is widely used because of its high-performance caching mechanism and high scalability. Because data access patterns in the current big data environment are unpredictable, the data shuffling phase is prone to under-utilization of Spark cluster resources, high computational latency, and high task processing latency. To address this, this paper proposes an intermediate data management strategy for the data shuffling phase. First, the size of the data generated in the data shuffling phase of the Spark platform is predicted by random sampling. Next, a strength division strategy computes the degree of data skew to find the part with excessive skew deviation. Finally, an adaptive data management strategy performs the corresponding computation tasks according to the data deviation. In addition, to improve the response time, memory usage, and computation latency of Spark applications, an adaptive cache replacement algorithm based on RDD partition weights is proposed. It calculates each RDD partition's weight from four factors: computation cost, number of uses, partition size, and life cycle of the RDD. Compared with current mainstream baseline algorithms, the proposed data management algorithm for the data shuffling phase effectively reduces resource usage and computational response latency, and the proposed adaptive cache replacement algorithm based on RDD partition weights can make full use of memory resources and effectively reduce resource waste.
Pages: 66-85
Number of pages: 20
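
The abstract names four weight factors for cache replacement (computation cost, number of uses, partition size, and life cycle) but does not give the formula that combines them. The following is a minimal sketch of weight-based eviction under an assumed formula, not the paper's algorithm: the `Partition` class, the `weight` function, the `remaining_life` fraction, and the multiplicative combination are all hypothetical and chosen only to illustrate the idea that partitions that are cheap to recompute, rarely reused, large, or near the end of their life cycle are evicted first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Partition:
    """Hypothetical record of the four factors named in the abstract."""
    name: str
    compute_cost: float    # cost to recompute the partition if evicted
    remaining_uses: int    # expected number of future uses
    size_mb: float         # memory footprint of the partition
    remaining_life: float  # assumed fraction of the life cycle left, in [0, 1]


def weight(p: Partition) -> float:
    # Assumed combination (not from the paper): value grows with cost,
    # reuse, and remaining life, and shrinks with the memory occupied.
    return p.compute_cost * p.remaining_uses * p.remaining_life / p.size_mb


def select_victims(cache: List[Partition], needed_mb: float) -> List[Partition]:
    """Pick lowest-weight partitions until at least needed_mb is freed."""
    victims: List[Partition] = []
    freed = 0.0
    for p in sorted(cache, key=weight):
        if freed >= needed_mb:
            break
        victims.append(p)
        freed += p.size_mb
    return victims


cache = [
    Partition("a", compute_cost=10, remaining_uses=3, size_mb=100, remaining_life=0.5),
    Partition("b", compute_cost=1, remaining_uses=1, size_mb=200, remaining_life=0.1),
    Partition("c", compute_cost=5, remaining_uses=2, size_mb=50, remaining_life=1.0),
]
victims = select_victims(cache, needed_mb=250)
print([p.name for p in victims])
```

In this sketch, partition "b" has the lowest weight and is chosen first; "a" follows because "b" alone does not free enough memory. A least-recently-used policy would ignore recomputation cost and size, which is the gap a weighted policy of this kind aims to close.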