ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation

Cited by: 29
Authors
Kara, Kaan [1]
Eguro, Ken [2]
Zhang, Ce [1]
Alonso, Gustavo [1]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, Syst Grp, Zurich, Switzerland
[2] Microsoft Res, Redmond, WA USA
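Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2018 / Volume 12 / Issue 04
Keywords
REAL-TIME; SCALE;
DOI
10.14778/3297753.3297756
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract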
The ability to perform machine learning (ML) tasks in a database management system (DBMS) provides the data analyst with a powerful tool. Unfortunately, integration of ML into a DBMS is challenging for reasons varying from differences in execution model to data layout requirements. In this paper, we assume a column-store main-memory DBMS, optimized for online analytical processing, as our initial system. On this system, we explore the integration of coordinate-descent-based methods working natively on columnar format to train generalized linear models. We use a cache-efficient, partitioned stochastic coordinate descent algorithm providing linear throughput scalability with the number of cores while preserving convergence quality, up to 14 cores in our experiments. Existing column-oriented DBMSs rely on compression and even encryption to store data in memory. When those features are considered, the performance of a CPU-based solution suffers. Thus, in the paper we also show how to exploit hardware acceleration as part of a hybrid CPU+FPGA system to provide on-the-fly data transformation combined with an FPGA-based coordinate-descent engine. The resulting system is a column-store DBMS with its important features preserved (e.g., data compression) that offers high-performance machine learning capabilities.
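To illustrate why coordinate descent suits a columnar layout, the following is a minimal sketch, not the authors' implementation. It trains a least-squares linear model (one instance of a generalized linear model) by visiting one feature column at a time, so each update reads a single contiguous column plus a shared residual vector. The function name `train_scd`, the ridge parameter `lam`, and the use of exact per-coordinate minimization are assumptions made for illustration. The partitioning across cores, compression handling, and FPGA offload described in the abstract are not modeled here.

```python
import random


def train_scd(columns, labels, epochs=50, lam=0.0, seed=0):
    """Stochastic coordinate descent for (ridge) least squares.

    `columns` is a list of feature columns (column-store layout),
    each a list of floats with one entry per sample.
    All names and defaults here are illustrative assumptions.
    """
    n = len(labels)
    w = [0.0] * len(columns)
    # residual = labels - X @ w; with w = 0 this is just a copy of labels.
    residual = list(labels)
    rng = random.Random(seed)
    order = list(range(len(columns)))

    for _ in range(epochs):
        rng.shuffle(order)  # visit the coordinates in random order
        for j in order:
            col = columns[j]  # only this one column is scanned
            sq_norm = sum(v * v for v in col) / n
            if sq_norm + lam == 0.0:
                continue  # all-zero column without regularization
            corr = sum(v * r for v, r in zip(col, residual)) / n
            # Exact minimizer of the objective along coordinate j.
            new_wj = (corr + sq_norm * w[j]) / (sq_norm + lam)
            delta = new_wj - w[j]
            if delta != 0.0:
                # Keep the residual consistent with the updated weight.
                for i, v in enumerate(col):
                    residual[i] -= delta * v
                w[j] = new_wj
    return w


# Toy data: labels are generated as 2 * column0 - 1 * column1.
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 0.0, 1.0, 0.0],
]
labels = [2.0 * a - 1.0 * b for a, b in zip(*columns)]
model = train_scd(columns, labels, epochs=200)
```

Because every inner step touches one column and the residual only, a row-to-column conversion is never needed. That access pattern is the property the abstract relies on when it describes training natively on the columnar format.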
Pages: 348 - 361
Page count: 14