Heterogeneous information network representation learning based on transition probability matrix (HINtpm)

Cited by: 0
Authors:
Zhao T.-T. [1 ]
Wang Z. [1 ]
Lu Y.-N. [1 ]
Affiliations:
[1] College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering, Jilin University, Changchun
Keywords:
Auto-encoder; Heterogeneous information network (HIN); Meta-path; Network representation learning; Nodes' similarity; Transition probability matrix;
DOI:
10.3785/j.issn.1008-973X.2019.03.016
Abstract:
First, the final transition probability matrix was obtained by combining the first-order and second-order similarities of the nodes, computed according to the meta-path and the commuting matrix. Then, a denoising auto-encoder was used to reduce the dimension of the transition probability matrix, yielding node representations for the heterogeneous information network. Finally, the node representations were classified with a gradient boosting decision tree (GBDT), and the classification accuracy was measured under different training-set percentages. Clustering quality was evaluated with the normalized mutual information (NMI) index, and t-SNE was used for visualization. Experiments were performed on the DBLP and AMiner data sets, comparing the proposed method, heterogeneous information network representation learning based on transition probability matrix (HINtpm), with DeepWalk, node2vec and metapath2vec. Compared with DeepWalk, HINtpm improved the classification accuracy by up to 24% on the node classification task and increased the clustering index NMI by up to 13%. © 2019, Zhejiang University Press. All rights reserved.
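The first step of the pipeline described above can be sketched in Python. This is an illustrative reconstruction, not the paper's exact formulation: the bipartite adjacency matrix, the meta-path A-P-A, and the equal weighting of the first- and second-order similarities are all assumptions made for the example.

```python
# Sketch (assumed details): build a transition probability matrix from a
# meta-path commuting matrix, combining first- and second-order similarity.
import numpy as np

def row_normalize(m):
    """Turn a nonnegative matrix into a row-stochastic transition matrix."""
    sums = m.sum(axis=1, keepdims=True)
    sums[sums == 0] = 1.0  # avoid division by zero for isolated nodes
    return m / sums

# Toy author-paper bipartite adjacency (4 authors x 3 papers) -- hypothetical data.
A_ap = np.array([[1, 0, 0],
                 [1, 1, 0],
                 [0, 1, 1],
                 [0, 0, 1]], dtype=float)

# Commuting matrix for the meta-path A-P-A (co-authorship counts).
M = A_ap @ A_ap.T

P1 = row_normalize(M)       # first-order similarity: one-step transitions
P2 = row_normalize(M @ M)   # second-order similarity: two-step transitions
P = 0.5 * P1 + 0.5 * P2     # combined matrix (equal weights are an assumption)

assert np.allclose(P.sum(axis=1), 1.0)  # rows remain probability distributions
```

The resulting matrix `P` would then be fed to a denoising auto-encoder for dimension reduction, as the abstract describes; that step is omitted here.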
Pages: 548-554
Number of pages: 6