Determining the number of latent factors in statistical multi-relational learning

Cited by: 0
Authors
Shi, Chengchun [1]
Lu, Wenbin [1]
Song, Rui [1]
Affiliations
[1] Department of Statistics, North Carolina State University, Raleigh, NC 27695, United States
Keywords
Knowledge graph; Maximum likelihood estimation; Learning systems; Sampling; Statistics; Factorization
DOI
Not available
Abstract
Statistical relational learning is primarily concerned with learning and inferring relationships between entities in large-scale knowledge graphs. Nickel et al. (2011) proposed a RESCAL tensor factorization model for statistical relational learning, which achieves better or at least comparable results on common benchmark data sets when compared to other state-of-the-art methods. Given a positive integer s, RESCAL computes an s-dimensional latent vector for each entity. The latent factors can be further used for solving relational learning tasks, such as collective classification, collective entity resolution and link-based clustering. The focus of this paper is to determine the number of latent factors in the RESCAL model. Due to the structure of the RESCAL model, its log-likelihood function is not concave. As a result, the corresponding maximum likelihood estimators (MLEs) may not be consistent. Nonetheless, we design a specific pseudometric, prove the consistency of the MLEs under this pseudometric and establish their rate of convergence. Based on these results, we propose a general class of information criteria and prove their model selection consistency when the number of relations is either bounded or diverges at a suitable rate relative to the number of entities. Simulations and real data examples show that our proposed information criteria have good finite sample properties. © 2019 Chengchun Shi, Wenbin Lu, Rui Song.
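To make the objects in the abstract concrete, the following is a minimal sketch of a RESCAL-style bilinear model with s latent factors per entity. It is not the authors' implementation. The abstract does not state the link function or the form of the information criteria, so the logistic (Bernoulli) likelihood, the raw parameter count, the penalty weight `kappa`, and all function names are assumptions made for illustration only.

```python
import numpy as np


def rescal_logits(A, R):
    """Return theta[k, i, j] = a_i^T R_k a_j for every relation and entity pair.

    A: (n, s) array, one s-dimensional latent vector per entity.
    R: (K, s, s) array, one interaction matrix per relation.
    """
    return np.einsum("is,kst,jt->kij", A, R, A)


def bernoulli_loglik(Y, A, R):
    """Log-likelihood of a binary tensor Y, assuming a logistic link."""
    theta = rescal_logits(A, R)
    # log P(y) = y * theta - log(1 + exp(theta)); logaddexp is numerically stable.
    return float(np.sum(Y * theta - np.logaddexp(0.0, theta)))


def num_parameters(n, K, s):
    """Raw count of entries: n*s for A plus K*s*s for the relation matrices."""
    return n * s + K * s * s


def penalized_criterion(loglik, n, K, s, kappa):
    """Hypothetical generic criterion: fit minus kappa times the parameter count."""
    return loglik - kappa * num_parameters(n, K, s)


# Toy data simulated from the model itself (illustrative sizes only).
rng = np.random.default_rng(0)
n, K, s = 6, 2, 3
A = rng.normal(size=(n, s))
R = rng.normal(size=(K, s, s))
P = 1.0 / (1.0 + np.exp(-rescal_logits(A, R)))
Y = (rng.random(size=P.shape) < P).astype(float)

ll = bernoulli_loglik(Y, A, R)
# kappa = log(n) is an arbitrary placeholder, not a recommended choice.
print(ll, penalized_criterion(ll, n, K, s, kappa=np.log(n)))
```

In a model selection setting, one would fit A and R by maximizing the likelihood for each candidate s and compare the resulting criterion values; the fitting step is omitted here because the likelihood is not concave and the optimizer choice is outside the scope of the abstract.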
Related papers
  • [31] Learning Product Embedding from Multi-relational User Behavior
    Zhang, Zhao
    Chen, Weizheng
    Ren, Xiaoxuan
    Zhang, Yan
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2018, PT I, 2018, 10937 : 513 - 524
  • [32] Limits of multi-relational graphs
    Alvarado, Juan
    Wang, Yuyi
    Ramon, Jan
    MACHINE LEARNING, 2023, 112 (01) : 177 - 216
  • [33] Classification of Multi-relational Databases
    Wang, Xinchun
    Zhang, Sujuan
    APPLIED INFORMATICS AND COMMUNICATION, PT 2, 2011, 225 : 390 - +
  • [34] Multi-relational Clustering Based on Relational Distance
    Luan, Luan
    Li, Yun
    Yin, Jiang
    Sheng, Yan
    INTERNATIONAL CONFERENCE ON APPLIED PHYSICS AND INDUSTRIAL ENGINEERING 2012, PT C, 2012, 24 : 1982 - 1989
  • [35] Multi-relational discretization methods
    He, Jun
    Xie, Yebo
    Liu, Hongyan
    Gu, Yingqin
    Du, Xiaoyong
    Qinghua Daxue Xuebao/Journal of Tsinghua University, 2010, 50 (01) : 40 - 44
  • [36] Limits of multi-relational graphs
    Juan Alvarado
    Yuyi Wang
    Jan Ramon
    Machine Learning, 2023, 112 : 177 - 216
  • [37] FACTORBASE: multi-relational structure learning with SQL all the way
    Oliver Schulte
    Zhensong Qian
    International Journal of Data Science and Analytics, 2019, 7 : 289 - 309
  • [38] FACTORBASE: Multi-Relational Model Learning with SQL All The Way
    Qian, Zhensong
    Schulte, Oliver
    PROCEEDINGS OF THE 2015 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (IEEE DSAA 2015), 2015, : 438 - 447
  • [39] FACTORBASE: multi-relational structure learning with SQL all the way
    Schulte, Oliver
    Qian, Zhensong
    INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2019, 7 (04) : 289 - 309
  • [40] Multi-relational Clustering Based on Relational Distance
    Wei, Liting
    Li, Yun
    2015 12TH WEB INFORMATION SYSTEM AND APPLICATION CONFERENCE (WISA), 2015, : 297 - 300