A Parallel and Efficient Algorithm for Learning to Match

Cited by: 3
Authors
Shang, Jingbo [1, 4]
Chen, Tianqi [2 ]
Li, Hang [3 ]
Lu, Zhengdong [3 ]
Yu, Yong [4 ]
Affiliations
[1] Univ Illinois, Champaign, IL 61801 USA
[2] Univ Washington, Seattle, WA 98195 USA
[3] Huawei Noah's Ark Lab, Hong Kong, Hong Kong, Peoples R China
[4] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
Source
2014 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM) | 2014
Keywords
MATRIX FACTORIZATION;
DOI
10.1109/ICDM.2014.71
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many tasks in data mining and related fields, including collaborative filtering, link prediction, image tagging, and web search, can be formalized as matching between objects in two heterogeneous domains. Machine learning techniques, referred to as learning-to-match in this paper, have been successfully applied to these problems. Among them, a class of state-of-the-art methods, named feature-based matrix factorization, formalizes the task as an extension of matrix factorization by incorporating auxiliary features into the model. Unfortunately, making those algorithms scale to real-world problems is challenging, and simple parallelization strategies fail due to the complex cross-talk patterns between sub-tasks. In this paper, we tackle this challenge with a novel parallel and efficient algorithm. Our algorithm, based on coordinate descent, can easily handle hundreds of millions of instances and features on a single machine. The key recipe of this algorithm is an iterative relaxation of the objective to facilitate parallel updates of parameters, with guaranteed convergence on minimizing the original objective function. Experimental results demonstrate that the proposed method is effective on a wide range of matching problems, with efficiency significantly improved over the baselines while accuracy remains unchanged.
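As a rough illustration of the model family the abstract names, the sketch below scores a pair of objects with a feature-based matrix factorization: each side's latent vector is a feature-weighted sum of per-feature latent vectors, and the matching score is their inner product. This is a common textbook form and only an assumption here; the abstract does not give the paper's exact model, objective, or parallel update rule, and the names `combine`, `predict`, `P`, and `Q` are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def combine(features, factors, dim):
    """Feature-weighted sum of latent vectors: sum_j features[j] * factors[j]."""
    out = [0.0] * dim
    for j, weight in features.items():
        for k in range(dim):
            out[k] += weight * factors[j][k]
    return out


def predict(feats_a, feats_b, P, Q, dim):
    """Matching score: inner product of the two combined latent vectors."""
    return dot(combine(feats_a, P, dim), combine(feats_b, Q, dim))


# Hypothetical toy data: sparse feature maps {feature_index: value}.
P = {0: [1.0, 0.0], 1: [0.0, 2.0]}   # latent vectors for first-domain features
Q = {0: [0.5, 0.5], 1: [1.0, -1.0]}  # latent vectors for second-domain features

# An object described by auxiliary features on the first side.
score = predict({0: 1.0, 1: 0.5}, {0: 2.0}, P, Q, dim=2)

# With one-hot identity features the model reduces to plain matrix factorization.
plain_mf_score = predict({0: 1.0}, {1: 1.0}, P, Q, dim=2)
```

In this form, ordinary matrix factorization is the special case where each object has a single indicator feature; auxiliary features simply add more terms to each weighted sum.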
Pages: 971 - 976
Page count: 6