ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation

Cited by: 29
Authors
Kara, Kaan [1]
Eguro, Ken [2]
Zhang, Ce [1]
Alonso, Gustavo [1]
Affiliations
[1] Swiss Federal Institute of Technology (ETH Zurich), Department of Computer Science, Systems Group, Zurich, Switzerland
[2] Microsoft Research, Redmond, WA, USA
Source
Proceedings of the VLDB Endowment | 2018 / Vol. 12 / No. 4
Keywords
REAL-TIME; SCALE;
DOI
10.14778/3297753.3297756
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The ability to perform machine learning (ML) tasks in a database management system (DBMS) provides the data analyst with a powerful tool. Unfortunately, integrating ML into a DBMS is challenging for reasons ranging from differences in execution model to data layout requirements. In this paper, we assume a column-store main-memory DBMS, optimized for online analytical processing, as our initial system. On this system, we explore the integration of coordinate-descent-based methods that work natively on the columnar format to train generalized linear models. We use a cache-efficient, partitioned stochastic coordinate descent algorithm that provides linear throughput scalability with the number of cores while preserving convergence quality, up to 14 cores in our experiments. Existing column-oriented DBMSs rely on compression and even encryption to store data in memory. When those features are considered, the performance of a CPU-based solution suffers. Thus, in the paper we also show how to exploit hardware acceleration as part of a hybrid CPU+FPGA system to provide on-the-fly data transformation combined with an FPGA-based coordinate-descent engine. The resulting system is a column-store DBMS with its important features preserved (e.g., data compression) that offers high-performance machine learning capabilities.
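The abstract's central idea, coordinate descent operating directly on column-oriented data, can be illustrated with a small sketch. The code below is not the paper's partitioned, multi-core algorithm and does not cover compression, encryption, or the FPGA engine; it is a plain single-threaded stochastic coordinate descent for (ridge) least squares, the simplest generalized linear model. The function name scd_least_squares and its parameters are hypothetical and chosen only for illustration. The point it shows is that each update reads exactly one feature column plus a shared residual vector, which is the access pattern a column store serves without any layout conversion.

```python
import random


def scd_least_squares(columns, y, epochs=50, l2=0.0, seed=0):
    """Illustrative stochastic coordinate descent on columnar data.

    Minimizes 0.5 * ||y - Xw||^2 + 0.5 * l2 * ||w||^2, where X is given
    as a list of feature columns (column-store layout), each of length n.
    All names here are hypothetical, not taken from the paper.
    """
    rng = random.Random(seed)
    d = len(columns)
    w = [0.0] * d
    residual = list(y)  # residual = y - Xw; equals y because w starts at 0
    sq_norms = [sum(v * v for v in col) for col in columns]
    order = list(range(d))

    for _ in range(epochs):
        rng.shuffle(order)  # visit coordinates in random order each epoch
        for j in order:
            col = columns[j]
            denom = sq_norms[j] + l2
            if denom == 0.0:
                continue  # all-zero column without regularization: skip
            # Exact minimizer along coordinate j, using only column j
            # and the current residual.
            rho = sum(c * r for c, r in zip(col, residual)) + sq_norms[j] * w[j]
            new_wj = rho / denom
            delta = new_wj - w[j]
            if delta != 0.0:
                # Keep the residual consistent with the updated weight.
                for i, c in enumerate(col):
                    residual[i] -= delta * c
                w[j] = new_wj
    return w


if __name__ == "__main__":
    # Two feature columns; targets generated as 2 * col0 - 3 * col1.
    cols = [[1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 1.0, 0.0]]
    targets = [-1.0, 4.0, 3.0, 8.0]
    print(scd_least_squares(cols, targets, epochs=100))
```

Because the residual is the only state shared across coordinates, a row-oriented layout would force every update to touch every row record, whereas here each step scans one contiguous column.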
Pages: 348-361
Page count: 14