Neural Factorization Machines for Sparse Predictive Analytics

Cited by: 997

Authors:

He, Xiangnan [1]

Chua, Tat-Seng [1]

Affiliations:

[1] Natl Univ Singapore, Sch Comp, Singapore 117417, Singapore

Source:

SIGIR'17: PROCEEDINGS OF THE 40TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL | 2017

Funding:

National Research Foundation, Singapore;

Keywords:

Factorization Machines; Neural Networks; Deep Learning; Sparse Data; Regression; Recommendation;

DOI:

10.1145/3077136.3080777

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Many predictive tasks of web applications need to model categorical variables, such as user IDs and demographics like genders and occupations. To apply standard machine learning techniques, these categorical predictors are always converted to a set of binary features via one-hot encoding, making the resultant feature vector highly sparse. To learn from such sparse data effectively, it is crucial to account for the interactions between features. Factorization Machines (FMs) are a popular solution for efficiently using the second-order feature interactions. However, FM models feature interactions in a linear way, which can be insufficient for capturing the non-linear and complex inherent structure of real-world data. While deep neural networks have recently been applied to learn non-linear feature interactions in industry, such as the Wide&Deep by Google and DeepCross by Microsoft, the deep structure meanwhile makes them difficult to train. In this paper, we propose a novel model Neural Factorization Machine (NFM) for prediction under sparse settings. NFM seamlessly combines the linearity of FM in modelling second-order feature interactions and the non-linearity of neural network in modelling higher-order feature interactions. Conceptually, NFM is more expressive than FM since FM can be seen as a special case of NFM without hidden layers. Empirical results on two regression tasks show that with one hidden layer only, NFM significantly outperforms FM with a 7.3% relative improvement. Compared to the recent deep learning methods Wide&Deep and DeepCross, our NFM uses a shallower structure but offers better performance, being much easier to train and tune in practice.
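
The abstract's claim that FM is a special case of NFM without hidden layers can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the standard second-order FM form (a bias, a linear term, and pairwise interactions weighted by inner products of latent vectors) and assumes that NFM pools those pairwise interactions into a single k-dimensional vector before applying hidden layers. All function names, parameter names, the ReLU activation, and the shapes are illustrative choices, not details taken from this record.

```python
import numpy as np

def bi_interaction(x, V):
    """Pool pairwise interactions into a k-vector:
    sum over i<j of (x_i * v_i) * (x_j * v_j), elementwise.
    Uses the identity 0.5 * ((sum a_i)^2 - sum a_i^2), linear in the number of features."""
    xv = x[:, None] * V                      # (n, k): each latent vector scaled by its feature value
    square_of_sum = xv.sum(axis=0) ** 2      # (k,)
    sum_of_squares = (xv ** 2).sum(axis=0)   # (k,)
    return 0.5 * (square_of_sum - sum_of_squares)

def fm_predict(x, w0, w, V):
    """Second-order FM: bias + linear term + summed pairwise interactions."""
    return w0 + w @ x + bi_interaction(x, V).sum()

def fm_pairwise_naive(x, w0, w, V):
    """Reference FM with an explicit double loop, for checking only."""
    total = w0 + w @ x
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            total += (V[i] @ V[j]) * x[i] * x[j]
    return total

def nfm_predict(x, w0, w, V, hidden, h):
    """NFM-style sketch (assumed structure): pooled interactions -> hidden layers -> projection.
    `hidden` is a list of (W, b) pairs; `h` is the final projection vector."""
    z = bi_interaction(x, V)
    for W, b in hidden:
        z = np.maximum(0.0, W @ z + b)       # ReLU is an assumed activation
    return w0 + w @ x + h @ z
```

With an empty `hidden` list and `h` set to a vector of ones, `nfm_predict` returns the same value as `fm_predict`, which is the special-case relationship the abstract describes. Adding even one `(W, b)` pair makes the interaction term non-linear.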


Pages: 355-364

Page count: 10

