Formal semantics and high performance in declarative machine learning using Datalog

Cited by: 4
Authors
Wang, Jin [1]
Wu, Jiacheng [2]
Li, Mingda [1]
Gu, Jiaqi [1]
Das, Ariyam [1]
Zaniolo, Carlo [1]
Affiliations
[1] University of California, Los Angeles, Los Angeles, CA 90095, USA
[2] Tsinghua University, Beijing, People's Republic of China
Source
The VLDB Journal | 2021 / Vol. 30 / No. 5
Keywords
Datalog; Declarative machine learning; Apache Spark; Scalability; Compressed linear algebra; Scaling-up; Analytics; Optimization; Aggregation; SociaLite; Systems; Power
DOI
10.1007/s00778-021-00665-6
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
With an escalating arms race to adopt machine learning (ML) in diverse application domains, there is an urgent need to support declarative machine learning over distributed data platforms. Toward this goal, a new framework is needed in which users can specify ML tasks with programming decoupled from the underlying algorithmic and system concerns. In this paper, we argue that declarative abstractions based on Datalog are a natural fit for machine learning and propose a purely declarative ML framework with a Datalog query interface. We show that using aggregates in recursive Datalog programs enables concise expression of ML applications while providing a strictly declarative formal semantics. This is achieved by introducing simple conditions under which the semantics of recursive programs is guaranteed to be equivalent to that of aggregate-stratified ones. We further provide specialized compilation and planning techniques for semi-naive fixpoint computation in the presence of aggregates, together with optimization strategies that are effective on diverse recursive programs and distributed data platforms. To test and demonstrate these research advances, we have developed a powerful and user-friendly system on top of Apache Spark. Extensive evaluations on large-scale datasets show that this approach achieves promising performance gains while improving both programming flexibility and the ease of developing and deploying ML applications.
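The abstract's central idea, that an aggregate such as min can be evaluated inside the recursion during semi-naive fixpoint computation and still agree with the aggregate-stratified program, can be illustrated on the classic shortest-path example. The Python below is a minimal sketch under our own assumptions: the Datalog rules in the docstrings, the function names `seminaive_min_paths` and `stratified_min_paths`, and the edge data are hypothetical, the paper's formal equivalence conditions are not reproduced here, and this is not the authors' Spark-based system.

```python
from collections import defaultdict

INF = float("inf")


def _adjacency(edges):
    """Index hypothetical edge(X, Y, W) facts by their source node X."""
    adj = defaultdict(list)
    for x, y, w in edges:
        adj[x].append((y, w))
    return adj


def seminaive_min_paths(edges, source):
    """Semi-naive evaluation with min applied inside the recursion.

    Hypothetical program:
        path(Y, min<D>) <- edge(source, Y, D).
        path(Y, min<D>) <- path(X, D1), edge(X, Y, W), D = D1 + W.
    Terminates on graphs with non-negative weights, cycles included.
    """
    adj = _adjacency(edges)
    best = {}
    # Exit rule: direct edges from the source.
    for y, w in adj[source]:
        best[y] = min(w, best.get(y, INF))
    delta = dict(best)
    while delta:
        new_delta = {}
        # Join only the facts that changed in the last round (the delta).
        for x, d1 in delta.items():
            for y, w in adj[x]:
                d = d1 + w
                # Keep a candidate only if it beats the current minimum.
                if d < best.get(y, INF) and d < new_delta.get(y, INF):
                    new_delta[y] = d
        best.update(new_delta)
        delta = new_delta
    return best


def stratified_min_paths(edges, source):
    """Aggregate-stratified version: derive every path length, then take min.

    The first stratum enumerates all (node, length) pairs, so it is only
    guaranteed to terminate on acyclic graphs.
    """
    adj = _adjacency(edges)
    all_paths = set()
    frontier = {(y, w) for y, w in adj[source]}
    while frontier:
        all_paths |= frontier
        frontier = {(y, d + w) for x, d in frontier for y, w in adj[x]} - all_paths
    result = {}
    # Second stratum: apply min only after the recursion has finished.
    for y, d in all_paths:
        result[y] = min(d, result.get(y, INF))
    return result


if __name__ == "__main__":
    edges = [("a", "b", 1), ("a", "c", 4), ("b", "c", 2), ("c", "d", 1), ("b", "d", 5)]
    print(seminaive_min_paths(edges, "a"))   # {'b': 1, 'c': 3, 'd': 4}
    print(stratified_min_paths(edges, "a"))  # same mapping (key order may differ)
```

Adding a back edge such as `("d", "b", 1)` makes the graph cyclic: the semi-naive version still returns the same minima, whereas the stratified version would never finish its first stratum. This illustrates why evaluating the aggregate inside the recursion is attractive, provided conditions guaranteeing equivalence with the stratified semantics hold.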
Pages: 859-881
Number of pages: 23
Related papers (50 in total)
  • [2] Formal Rules for Concept and Semantics Manipulations in Cognitive Linguistics and Machine Learning
    Wang, Yingxu
    2017 IEEE 16TH INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS & COGNITIVE COMPUTING (ICCI*CC), 2017, : 43 - 50
  • [3] A FORMAL SEMANTICS FOR A DATA-FLOW MACHINE - USING VDM
    JONES, KD
    LECTURE NOTES IN COMPUTER SCIENCE, 1987, 252 : 331 - 355
  • [4] Foundations of Declarative Data Analysis Using Limit Datalog Programs
    Kaminski, Mark
    Grau, Bernardo Cuenca
    Kostylev, Egor V.
    Motik, Boris
    Horrocks, Ian
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 1123 - 1130
  • [5] Declarative Machine Learning Systems
    Molino, Piero
    Ré, Christopher
    2021, Association for Computing Machinery (19):
  • [6] Declarative Machine Learning Systems
    Molino, Piero
    Re, Christopher
    COMMUNICATIONS OF THE ACM, 2022, 65 (01) : 42 - 49
  • [7] Formal Semantics and Scalability for Datalog with Aggregates: A Cardinality-Based Solution (Extended Abstract)
    Zaniolo, Carlo
    Das, Ariyam
    Li, Youfu
    Li, Mingda
    Wang, Jin
    ELECTRONIC PROCEEDINGS IN THEORETICAL COMPUTER SCIENCE, 2020, (325):
  • [8] SystemML: Declarative Machine Learning on MapReduce
    Ghoting, Amol
    Krishnamurthy, Rajasekar
    Pednault, Edwin
    Reinwald, Berthold
    Sindhwani, Vikas
    Tatikonda, Shirish
    Tian, Yuanyuan
    Vaithyanathan, Shivakumar
    IEEE 27TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2011), 2011, : 231 - 242
  • [9] SystemML: Declarative Machine Learning on Spark
    Boehm, Matthias
    Dusenberry, Michael W.
    Eriksson, Deron
    Evfimievski, Alexandre V.
    Manshadi, Faraz Makari
    Pansare, Niketan
    Reinwald, Berthold
    Reiss, Frederick R.
    Sen, Prithviraj
    Surve, Arvind C.
    Tatikonda, Shirish
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2016, 9 (13): : 1425 - 1436
  • [10] Modelling Machine Learning Algorithms on Relational Data with Datalog
    Makrynioti, Nantia
    Vasiloglou, Nikolaos
    Pasalic, Emir
    Vassalos, Vasilis
    PROCEEDINGS OF THE SECOND WORKSHOP ON DATA MANAGEMENT FOR END-TO-END MACHINE LEARNING, 2018,