A Study of Resilient Distributed Datasets for Big Data System

Cited by: 0
Authors
Kim, Da-yeon [1]
Shin, Dong-ryeol [1]
Affiliations
[1] Sungkyunkwan Univ, Coll Informat & Commun Engn, Suwon, South Korea
Keywords
Big data software platform; Hadoop ecosystem; Bigdata service;
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we present the Resilient Distributed Dataset (RDD) abstraction, on which the rest of the work builds a general-purpose cluster computing stack. RDDs extend the data flow programming model introduced by MapReduce, which is the most widely used model for large-scale data analysis today. We propose RDDs as a new abstraction that gives users direct control of data sharing. RDDs are fault-tolerant, parallel data structures that let users explicitly store data on disk or in memory, control its partitioning, and manipulate it using a rich set of operators. They offer a simple and efficient programming interface that can capture both current specialized models and new applications.
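The properties listed in the abstract (partitioned data, explicit persistence, operator-based transformation, and fault tolerance) can be illustrated with a small toy model. The sketch below is not the authors' implementation and not Spark's API: the class `ToyRDD` and its methods `parallelize`, `persist`, `lose_partition`, and `collect` are hypothetical names chosen for illustration. It assumes that fault tolerance is achieved by recomputing a lost partition from its parent dataset (lineage) rather than by replicating data.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Toy stand-in for an RDD: a fixed number of partitions plus the
    function (lineage) needed to rebuild any one of them. Hypothetical
    illustration only, not a real library API."""

    def __init__(self, num_partitions: int,
                 compute: Callable[[int], List],
                 parent: Optional["ToyRDD"] = None) -> None:
        self.num_partitions = num_partitions
        self._compute = compute   # partition index -> list of records
        self.parent = parent      # lineage pointer (None for source data)
        self._cache: Optional[Dict[int, List]] = None  # set by persist()

    @staticmethod
    def parallelize(data: List, num_partitions: int) -> "ToyRDD":
        # User-controlled partitioning: round-robin split of the input.
        chunks = [list(data[i::num_partitions]) for i in range(num_partitions)]
        return ToyRDD(num_partitions, lambda i: chunks[i])

    def map(self, fn: Callable) -> "ToyRDD":
        # Lazy: only records how to derive each partition from the parent.
        return ToyRDD(self.num_partitions,
                      lambda i: [fn(x) for x in self.partition(i)],
                      parent=self)

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(self.num_partitions,
                      lambda i: [x for x in self.partition(i) if pred(x)],
                      parent=self)

    def persist(self) -> "ToyRDD":
        # Ask for computed partitions to be kept in memory for reuse.
        if self._cache is None:
            self._cache = {}
        return self

    def partition(self, i: int) -> List:
        if self._cache is None:
            return self._compute(i)
        if i not in self._cache:
            # Cache miss (first use, or the partition was lost):
            # rebuild it from lineage.
            self._cache[i] = self._compute(i)
        return self._cache[i]

    def lose_partition(self, i: int) -> None:
        # Simulate a node failure that drops one cached partition.
        if self._cache is not None:
            self._cache.pop(i, None)

    def collect(self) -> List:
        return [x for i in range(self.num_partitions)
                for x in self.partition(i)]


numbers = ToyRDD.parallelize(list(range(10)), 3)
even_squares = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x).persist()

before = even_squares.collect()
even_squares.lose_partition(1)   # drop one cached partition
after = even_squares.collect()   # partition 1 is recomputed from lineage
print(before == after)
```

In this toy model only the lost partition is recomputed, and the other cached partitions are reused unchanged. That is the kind of data-sharing control the abstract describes, though a real system would distribute partitions across machines.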
Pages: 290-293
Page count: 4