Learning Models over Relational Data Using Sparse Tensors and Functional Dependencies

Cited by: 19
Authors
Khamis, Mahmoud Abo [1]
Ngo, Hung Q. [1]
Nguyen, Xuanlong [2]
Olteanu, Dan [3]
Schleich, Maximilian [3]
Affiliations
[1] Relat AI Inc, 2120 Univ Ave, Berkeley, CA 94704 USA
[2] Univ Michigan, 461 West Hall, 1085 South Univ, Ann Arbor, MI 48109 USA
[3] Univ Oxford, Wolfson Bldg, Parks Rd, Oxford OX1 3QD, England
Source
ACM TRANSACTIONS ON DATABASE SYSTEMS | 2020 / Vol. 45 / No. 2
Keywords
In-database analytics; functional aggregate queries; functional dependencies; model reparameterization; tensors; LINEAR ALGEBRA; LIBRARY
DOI
10.1145/3375661
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Integrated solutions for analytics over relational databases are of great practical importance, as they avoid the costly repeated loop that data scientists deal with daily: select features from data residing in relational databases using feature extraction queries involving joins, projections, and aggregations; export the training dataset defined by such queries; convert this dataset into the format of an external learning tool; and train the desired model using this tool. These integrated solutions are also fertile ground for theoretically fundamental and challenging problems at the intersection of relational and statistical data models. This article introduces a unified framework for training and evaluating a class of statistical learning models over relational databases. This class includes ridge linear regression, polynomial regression, factorization machines, and principal component analysis. We show that, by combining key tools from database theory (schema information, query structure, functional dependencies, and recent advances in query evaluation algorithms) with tools from linear algebra (tensor and matrix operations), one can formulate relational analytics problems and design efficient algorithms that are aware of both query structure and data structure to solve them. This theoretical development informed the design and implementation of the AC/DC system for structure-aware learning. We benchmark the performance of AC/DC against R, MADlib, libFM, and TensorFlow. For typical retail forecasting and advertisement planning applications, AC/DC can learn polynomial regression models and factorization machines with at least the same accuracy as its competitors and up to three orders of magnitude faster, in the cases where the competitors do not run out of memory, exceed the 24-hour timeout, or encounter internal design limitations.
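The idea of training "over relational databases" without exporting the training dataset can be illustrated with a minimal sketch, under our own assumptions and not as the AC/DC implementation. For ridge linear regression, the training data enters only through aggregates such as COUNT, SUM(x), SUM(x*x), SUM(y), and SUM(x*y) over the join result, and these sums can be pushed past the join: group one relation by the join key, then combine the partial sums with the other relation. The relations `Sales(store, units)` and `Stores(store, size)`, the toy data, and the function names are hypothetical. For brevity the sketch solves the 2x2 regularized normal equations directly instead of iterating, and it penalizes the intercept too.

```python
from collections import defaultdict

# Hypothetical toy relations (not from the article), joined on `store`:
#   Sales(store, units)  -> label y = units
#   Stores(store, size)  -> feature x = size
sales = [("s1", 10.0), ("s1", 14.0), ("s2", 3.0), ("s2", 5.0), ("s2", 4.0)]
stores = {"s1": 2.0, "s2": 1.0}


def aggregates_materialized(sales, stores):
    """Sufficient statistics computed after materializing the join."""
    joined = [(stores[s], u) for s, u in sales if s in stores]
    n = len(joined)
    sx = sum(x for x, _ in joined)
    sxx = sum(x * x for x, _ in joined)
    sy = sum(y for _, y in joined)
    sxy = sum(x * y for x, y in joined)
    return n, sx, sxx, sy, sxy


def aggregates_factorized(sales, stores):
    """Same statistics, with the sums pushed past the join on `store`."""
    cnt = defaultdict(int)    # per-store number of Sales tuples
    su = defaultdict(float)   # per-store SUM(units)
    for s, u in sales:        # one pass over Sales, grouped by join key
        cnt[s] += 1
        su[s] += u
    n = sx = sxx = sy = sxy = 0.0
    for s, x in stores.items():  # one pass over Stores
        c = cnt.get(s, 0)
        u = su.get(s, 0.0)
        n += c
        sx += x * c           # SUM(x) over the join = x * multiplicity
        sxx += x * x * c
        sy += u
        sxy += x * u          # SUM(x*y) = x * per-store SUM(y)
    return n, sx, sxx, sy, sxy


def ridge_fit(stats, lam):
    """Solve (X^T X + lam*I) theta = X^T y for y ~ a + b*x (2x2 system)."""
    n, sx, sxx, sy, sxy = stats
    # X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy]
    a11, a12, a22 = n + lam, sx, sxx + lam
    det = a11 * a22 - a12 * a12
    a = (a22 * sy - a12 * sxy) / det
    b = (a11 * sxy - a12 * sy) / det
    return a, b


stats = aggregates_factorized(sales, stores)
assert stats == aggregates_materialized(sales, stores)
print(ridge_fit(stats, lam=0.0))  # → (-4.0, 8.0)
```

Both routes produce the same statistics, so the model is identical; only the cost of obtaining the statistics differs. In this key/foreign-key toy example the join is no larger than `Sales`, so the saving is negligible; the benefit of avoiding materialization is expected to grow with many-to-many joins or joins over several relations.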
Pages: 66