The Performance Analysis of Distributed Storage Systems Used in Scalable Web Systems

Cited by: 0
Authors
Oles, Dominik [1]
Nowak, Ziemowit [2]
Affiliations
[1] Tieto Czech Sro, 28 Rijna 3346-91, Ostrava 70200, Czech Republic
[2] Wroclaw Univ Sci & Technol, Fac Comp Sci & Management, Wybrzeze Wyspianskiego 27, PL-50370 Wroclaw, Poland
Keywords
Big Data; Hadoop; HBase; Kudu
DOI
10.1007/978-3-319-99981-4_27
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Scalable web systems are directly related to the distributed storage systems used to process large amounts of data (big data). One example of such a system is Hadoop, with its many extensions supporting data storage, such as SQL-on-Hadoop systems and the "Parquet" file format. Another class of systems for storing and processing big data is NoSQL databases, such as HBase, which are used in applications requiring fast random access. The Kudu system was created to combine the advantages of Hadoop and HBase, enabling both effective analysis of data sets and fast random access. This research analyzes the performance of these systems. The experiment was conducted in the Amazon Web Services public cloud environment, where a cluster of nine virtual machines was configured. The test data was a fragment of the public "Wikipedia Page Traffic Statistics" dataset containing about one billion rows. The measurement results confirm that Kudu is a promising alternative to the commonly used technologies.
Pages: 287-298
Number of pages: 12
Related papers
50 in total
  • [31] Distributed & Scalable Terahertz Systems in Silicon
    Sherry, Hani
    2017 42ND INTERNATIONAL CONFERENCE ON INFRARED, MILLIMETER, AND TERAHERTZ WAVES (IRMMW-THZ), 2017,
  • [32] Scalable Online Monitoring of Distributed Systems
    Basin, David
    Gras, Matthieu
    Krstic, Srdan
    Schneider, Joshua
    RUNTIME VERIFICATION (RV 2020), 2020, 12399 : 197 - 220
  • [33] Web Services in Distributed Information Systems: Availability, Performance and Composition
    Zhao, Xia
    Wang, Tao
    Liu, Enjie
    Clapworthy, Gordon J.
    INTERNATIONAL JOURNAL OF DISTRIBUTED SYSTEMS AND TECHNOLOGIES, 2010, 1 (01) : 1 - 16
  • [34] Distributed systems management on the web
    Reed, B
    Peercy, M
    Robinson, E
    INTEGRATED NETWORK MANAGEMENT V: INTEGRATED MANAGEMENT IN A VIRTUAL WORLD, 1997, : 85 - 95
  • [35] Network Aware Reliability Analysis for Distributed Storage Systems
    Epstein, Amir
    Kolodner, Elliot K.
    Sotnikov, Dmitry
    PROCEEDINGS OF 2016 IEEE 35TH SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS (SRDS), 2016, : 249 - 258
  • [36] Scalable performance analysis of parallel systems: Concepts and experiences
    Brunsta, H
    Nagel, WE
    PARALLEL COMPUTING: SOFTWARE TECHNOLOGY, ALGORITHMS, ARCHITECTURES AND APPLICATIONS, 2004, 13 : 737 - 744
  • [37] Performance measurement and analysis tools for extremely scalable systems
    Mohr, B.
    Wylie, B. J. N.
    Wolf, F.
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2010, 22 (16) : 2212 - 2229
  • [38] Delay Performance of Direct Reads in Distributed Storage Systems with Coding
    Shuai, Qiqi
    Li, Victor O. K.
    2015 IEEE 17TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2015 IEEE 7TH INTERNATIONAL SYMPOSIUM ON CYBERSPACE SAFETY AND SECURITY, AND 2015 IEEE 12TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (ICESS), 2015, : 184 - 189
  • [39] A Study on the Performance of Distributed Storage Systems in Edge Computing Environments
    Makris, Antonios
    Kontopoulos, Ioannis
    Xyalis, Stylianos Nektarios
    Psomakelis, Evangelos
    Theodoropoulos, Theodoros
    Varvarigos, Andreas
    Tserpes, Konstantinos
    2024 IEEE INTERNATIONAL CONFERENCE ON JOINT CLOUD COMPUTING, JCC, 2024, : 29 - 36
  • [40] Quota enforcement for high-performance distributed storage systems
    Pollack, Kristal T.
    Long, Darrell D. E.
    Golding, Richard A.
    Becker-Szendy, Ralph A.
    Reed, Benjamin
    24TH IEEE CONFERENCE ON MASS STORAGE SYSTEMS AND TECHNOLOGIES, PROCEEDINGS, 2007, : 72 - +