Survey on Cloud-native Databases

Authors
Dong H.-W. [1]
Zhang C. [1]
Li G.-L. [1]
Feng J.-H. [1]
Affiliation
[1] Department of Computer Science and Technology, Tsinghua University, Beijing
Source
Ruan Jian Xue Bao/Journal of Software | 2024 / Vol. 35 / No. 02
Keywords
cloud-native database; compute-storage disaggregation; database-as-a-service (DBaaS)
DOI
10.13328/j.cnki.jos.006952
Abstract
The virtualization, high availability, and elastic scheduling of cloud infrastructure give cloud databases many advantages, such as out-of-the-box deployment, high reliability and availability, and a pay-as-you-go model. Cloud databases fall into two categories according to their architecture: cloud-hosted databases and cloud-native databases. Cloud-hosted databases deploy the database system in a virtual machine environment on the cloud and offer low cost, easy operation and maintenance, and high reliability. Cloud-native databases, in contrast, take full advantage of the elastic scaling of cloud infrastructure. They adopt a disaggregated compute and storage architecture so that computing and storage resources scale independently, which further improves the cost-performance ratio of the database. However, the disaggregated compute and storage architecture poses new challenges to the design of database systems. This survey provides an in-depth analysis of the architecture and technology of cloud-native database systems. Specifically, the architectures of cloud-native online transaction processing (OLTP) and online analytical processing (OLAP) databases are each classified and analyzed according to their resource disaggregation mode, and the advantages and limitations of each architecture are compared. Then, on the basis of the disaggregated compute and storage architectures, the survey examines the key technologies of cloud-native databases by functional module. The technologies discussed include those of cloud-native OLTP (data organization, replica consistency, main/standby synchronization, failure recovery, and mixed workload processing) and those of cloud-native OLAP (storage management, query processing, serverless-aware compute, data protection, and machine learning optimization). Finally, the survey summarizes the technical challenges facing existing cloud-native databases and suggests directions for future research. © 2024 Chinese Academy of Sciences. All rights reserved.
Pages: 899-926
Page count: 27