Effective data management strategy and RDD weight cache replacement strategy in Spark

Cited by: 3
Authors
Jiang, Kun [1, 3]
Du, Shaofeng [2]
Zhao, Fu [2]
Huang, Yong [4]
Li, Chunlin [1, 3]
Luo, Youlong [3]
Affiliations
[1] China Inst Water Resources & Hydropower Res, Key Lab Construction & Safety Water Engn, Minist Water Resources, Beijing, Peoples R China
[2] State Key Lab Smart Mfg Special Vehicles & Transmi, Baotou, Peoples R China
[3] Wuhan Univ Technol, Sch Comp Sci & Artificial Intelligence, Wuhan 430063, Peoples R China
[4] Chongqing Univ, Key Lab New Technol Construction Cities Mt Area, Minist Educ, Chongqing 400045, Peoples R China
Keywords
Data shuffling; Data management; Cache gain; RDD partition weights; Adaptive cache replacement; DATA PLACEMENT; ALGORITHM;
DOI
10.1016/j.comcom.2022.07.008
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
With the dramatic increase in internet users and their demand for real-time network performance, the Spark distributed computing environment has emerged. It is widely used because of its high-performance caching mechanism and high scalability. Because data access patterns in current big data environments are unpredictable, the data shuffling phase is prone to under-utilization of Spark cluster resources, high computational latency, and high task processing latency. To address this, the paper proposes an intermediate data management strategy for the data shuffling phase. First, the size of the data generated in the data shuffling phase of the Spark platform is predicted by random sampling. Next, a strength division strategy measures the degree of data skew and identifies the portions with excessive skew deviation. Finally, an adaptive data management strategy assigns the corresponding computation tasks according to the data deviation. In addition, to improve the response time, memory usage, and computation latency of Spark applications, the paper proposes an adaptive cache replacement algorithm based on RDD partition weights, which computes each partition's weight from four factors: computation cost, usage count, partition size, and life cycle of the RDD. Compared with current mainstream baseline algorithms, the proposed data management algorithm for the data shuffling phase effectively reduces resource usage and computational response latency, and the proposed RDD-partition-weight adaptive cache replacement algorithm makes fuller use of memory resources and reduces resource waste.
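The abstract names the four weight factors but does not give the weight formula, so the following is only a minimal sketch of what weight-based cache replacement could look like. The `Partition` fields, the `weight` formula (cost times usage count times life cycle, divided by size), and the `admit` helper are all hypothetical choices made for illustration; they are not the paper's algorithm and not part of the Spark API.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    compute_cost: float  # cost to recompute the partition (hypothetical unit: seconds)
    use_count: int       # how many times the partition is used
    size: float          # memory footprint (hypothetical unit: MB)
    life_cycle: float    # remaining lifetime (hypothetical unit: stages)


def weight(p: Partition) -> float:
    # Assumed formula for illustration only: partitions that are costly to
    # recompute, used often, and long-lived score higher; large ones score lower.
    return p.compute_cost * p.use_count * p.life_cycle / p.size


def admit(cache, capacity, incoming):
    """Try to cache `incoming`, evicting lowest-weight partitions first.

    Returns (new_cache, evicted). If `incoming` can never fit, the cache
    is returned unchanged.
    """
    if incoming.size > capacity:
        return list(cache), []
    kept = sorted(cache, key=weight)  # lowest weight at the front
    used = sum(p.size for p in kept)
    evicted = []
    while kept and used + incoming.size > capacity:
        victim = kept.pop(0)
        used -= victim.size
        evicted.append(victim)
    kept.append(incoming)
    return kept, evicted


cache = [
    Partition("a", compute_cost=10, use_count=2, size=100, life_cycle=1),
    Partition("b", compute_cost=1, use_count=1, size=100, life_cycle=1),
    Partition("c", compute_cost=5, use_count=4, size=50, life_cycle=2),
]
new_cache, evicted = admit(cache, 300, Partition("d", 3, 1, 100, 1))
print([p.name for p in evicted])    # partitions dropped to make room
print([p.name for p in new_cache])  # cache contents after admission
```

In this sketch the cheap, rarely used partition `b` has the lowest weight and is the one evicted, whereas a recency-only policy such as LRU would ignore recomputation cost and size entirely.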
Pages: 66-85
Number of pages: 20