A Study of Resilient Distributed Datasets for Big Data System

Times cited: 0
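Authors
Kim, Da-yeon [1]
Shin, Dong-ryeol [1]
Affiliation
[1] Sungkyunkwan Univ, Coll Informat & Commun Engn, Suwon, South Korea
Keywords
Big data software platform; Hadoop ecosystem; Big data service
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract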
In this paper, we present the Resilient Distributed Dataset (RDD) abstraction, on which a general-purpose cluster computing stack can be built. RDDs extend the data flow programming model introduced by MapReduce, the most widely used model for large-scale data analysis today. This new abstraction gives users direct control of data sharing. RDDs are fault-tolerant, parallel data structures that let users explicitly store data on disk or in memory, control its partitioning, and manipulate it using a rich set of operators. They offer a simple and efficient programming interface that can capture both current specialized models and new applications.
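The properties listed in the abstract (lazy parallel operators, explicit persistence, user-controlled partitioning, and fault tolerance) can be illustrated with a small single-process sketch. The class `ToyRDD` and all of its methods are hypothetical names chosen for this illustration; this is not Spark's API and not the authors' implementation. Real Spark RDDs (with operations such as `map`, `filter`, `persist`, `partitionBy`, and `reduceByKey`) run across a cluster and let users pick memory or disk storage, whereas this sketch only caches in memory. It shows the core idea under those assumptions: each dataset records the function that derives it from its parent (its lineage), so a lost cached result can be recomputed instead of being restored from a replica.

```python
from typing import Callable, Iterable, List, Optional


class ToyRDD:
    """Hypothetical single-process stand-in for an RDD (not Spark's API).

    Each ToyRDD keeps a reference to its parent and a function that rebuilds
    its partitions from the parent (its lineage). Transformations are lazy:
    nothing is computed until an action such as collect() runs.
    """

    def __init__(self, compute: Callable[[], List[list]],
                 parent: Optional["ToyRDD"] = None):
        self._compute = compute          # rebuilds all partitions from lineage
        self.parent = parent
        self._cache: Optional[List[list]] = None
        self._persisted = False

    @staticmethod
    def parallelize(data: Iterable, num_partitions: int) -> "ToyRDD":
        items = list(data)

        def compute() -> List[list]:
            # Round-robin the input items into num_partitions lists.
            parts: List[list] = [[] for _ in range(num_partitions)]
            for i, x in enumerate(items):
                parts[i % num_partitions].append(x)
            return parts

        return ToyRDD(compute)

    def partitions(self) -> List[list]:
        if self._cache is not None:
            return self._cache
        parts = self._compute()          # recompute from lineage
        if self._persisted:
            self._cache = parts          # keep in memory for later reuse
        return parts

    def persist(self) -> "ToyRDD":
        self._persisted = True
        return self

    def lose_cache(self) -> None:
        """Simulate losing cached partitions, e.g. when a node fails."""
        self._cache = None

    # Transformations: each returns a new ToyRDD whose parent is self.
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(lambda: [[f(x) for x in p] for p in self.partitions()], self)

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(lambda: [[x for x in p if pred(x)] for p in self.partitions()], self)

    def partition_by(self, num_partitions: int) -> "ToyRDD":
        """Hash-partition (key, value) pairs so equal keys share a partition."""
        def compute() -> List[list]:
            parts: List[list] = [[] for _ in range(num_partitions)]
            for p in self.partitions():
                for k, v in p:
                    parts[hash(k) % num_partitions].append((k, v))
            return parts

        return ToyRDD(compute, self)

    def reduce_by_key(self, f: Callable, num_partitions: int) -> "ToyRDD":
        """Combine values per key; correct because partition_by co-locates keys."""
        shuffled = self.partition_by(num_partitions)

        def compute() -> List[list]:
            out = []
            for p in shuffled.partitions():
                acc: dict = {}
                for k, v in p:
                    acc[k] = f(acc[k], v) if k in acc else v
                out.append(list(acc.items()))
            return out

        return ToyRDD(compute, shuffled)

    # Action: forces evaluation and returns all elements.
    def collect(self) -> list:
        return [x for p in self.partitions() for x in p]
```

A short usage example: `ToyRDD.parallelize(["a", "b", "a"], 2).map(lambda w: (w, 1)).reduce_by_key(lambda a, b: a + b, 2).collect()` yields the pairs `("a", 2)` and `("b", 1)` in some order. Recomputing from lineage, rather than replicating data, is the design choice that keeps fault tolerance cheap for coarse-grained transformations like these.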
Pages: 290-293
Number of pages: 4