A Holistic Approach to Data Access for Cloud-Native Analytics and Machine Learning

Cited by: 3
Authors
Koutsovasilis, Panos [1]
Venugopal, Srikumar [1]
Gkoufas, Yiannis [1]
Pinto, Christian [1]
Affiliations
[1] IBM Res Europe Dublin, Dublin, Ireland
DOI
10.1109/CLOUD53861.2021.00084
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Cloud providers offer a variety of storage solutions for hosting data, which differ in both price and performance. For analytics and machine learning applications, object storage services are the go-to solution for hosting datasets that exceed tens of gigabytes in size. However, this choice degrades the performance of these applications and requires extra engineering effort in the form of code changes to access the data on remote storage. In this paper, we present a generic end-to-end solution that offers seamless data access for remote object storage services, transparent data caching within the compute infrastructure, and data-aware topologies that boost the performance of applications deployed in Kubernetes. We introduce a custom-implemented cache mechanism that supports all the requirements of the former, and we demonstrate that our holistic solution leads to an improvement of up to 48% for the Spark implementation of the TPC-DS benchmark and up to 191% for the training of deep learning models from the MLPerf benchmark suite.
Pages: 654-659
Page count: 6