Automatic feature selection for supervised learning in link prediction applications: a comparative study

Cited by: 43
Authors
Pecli, Antonio [1 ]
Cavalcanti, Maria Claudia [2 ]
Goldschmidt, Ronaldo [1 ]
Affiliations
[1] Mil Inst Engn IME, Praca Gen Tiburcio 80, Rio De Janeiro, Brazil
[2] Mil Inst Engn IME, Comp Engn Dept, Praca Gen Tiburcio 80, Rio De Janeiro, Brazil
Keywords
Complex network analysis; Link prediction; Binary classification; Feature selection;
DOI
10.1007/s10115-017-1121-6
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, considerable attention has been devoted to research on the link prediction (LP) problem in complex networks. The problem is to predict the likelihood that an association between two currently unconnected nodes in a network will appear in the future. One of the most important approaches to the LP problem is based on supervised machine learning (ML) techniques for classification. Although many works have presented promising results with this approach, choosing the set of features (variables) used to train the classifiers is still a major challenge. In this article, we report on the effects of three different automatic variable selection strategies (Forward, Backward and Evolutionary) applied to the feature-based supervised learning approach in LP applications. The results of the experiments show that these strategies lead to better classification models than classifiers built with the complete set of variables. The experiments were performed on three datasets (Microsoft Academic Network, Amazon and Flickr), each containing more than twenty different features, including topological and domain-specific ones. We also describe the specification and implementation of the process used to support the experiments. It combines the feature selection strategies, six different classification algorithms (SVM, K-NN, naïve Bayes, CART, random forest and multilayer perceptron) and three evaluation metrics (Precision, F-Measure and Area Under the Curve). Moreover, this process includes a novel approach, inspired by ML voting committees, that suggests sets of features to represent data in LP applications. It mines the log of the experiments to identify sets of features frequently selected to produce classification models with high performance. The experiments showed interesting correlations between frequently selected features and datasets.
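As a rough illustration of two ideas named in the abstract, the sketch below shows a greedy forward selection loop and a simple count of how often each feature appears in high-scoring runs of an experiment log. It is not the authors' implementation: the function names, the feature names, the toy scoring function and the logged runs are all hypothetical, and in a real setting `evaluate` would train a classifier on the chosen subset and return a metric such as AUC.

```python
from collections import Counter
from typing import Callable, FrozenSet, List, Sequence, Tuple


def forward_selection(
    features: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], float]:
    """Greedy forward selection: repeatedly add the single feature that
    raises the score the most, and stop when no addition improves it."""
    selected: List[str] = []
    remaining = list(features)
    best_score = float("-inf")
    while remaining:
        # Score each subset obtained by adding one remaining feature.
        scored = [(evaluate(frozenset(selected + [f])), f) for f in remaining]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # no candidate improves on the current subset
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


def frequent_features(
    runs: Sequence[Tuple[Sequence[str], float]],
    min_score: float,
    top_k: int,
) -> List[str]:
    """Return the top_k features that occur most often in logged runs
    whose score is at least min_score."""
    counts: Counter = Counter()
    for subset, score in runs:
        if score >= min_score:
            counts.update(subset)
    return [name for name, _ in counts.most_common(top_k)]


# Hypothetical toy scorer (assumption): each useful feature adds a fixed
# gain, and any other feature costs 5 points.
GAINS = {"common_neighbors": 30.0, "jaccard": 20.0, "same_venue": 10.0}


def toy_evaluate(subset: FrozenSet[str]) -> float:
    return 50.0 + sum(GAINS.get(f, -5.0) for f in subset)


candidates = ["common_neighbors", "jaccard", "degree_product", "same_venue"]
chosen, chosen_score = forward_selection(candidates, toy_evaluate)

# Hypothetical experiment log (assumption): (selected features, score).
log = [
    (["common_neighbors", "jaccard"], 0.90),
    (["common_neighbors", "same_venue"], 0.85),
    (["jaccard", "common_neighbors"], 0.60),
    (["common_neighbors", "jaccard", "degree_product"], 0.95),
]
suggested = frequent_features(log, min_score=0.80, top_k=2)
```

Backward selection would mirror the same loop, starting from the full set and removing one feature at a time, while an evolutionary strategy would search over subsets with mutation and recombination instead of greedy steps.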
Pages: 85 - 121
Page count: 37
Related papers
50 in total
  • [21] Integrative Clustering and Supervised Feature Selection for Clinical Applications
    Xin, Bowen
    Xu, Chongrui
    Wang, Linlin
    Dong, Taotao
    Zheng, Chaojie
    Wang, Xiuying
    2018 15TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, ROBOTICS AND VISION (ICARCV), 2018, : 1316 - 1320
  • [22] Machine-learning models for activity class prediction: A comparative study of feature selection and classification algorithms
    Chong, Joana
    Tjurin, Petra
    Niemela, Maisa
    Jamsa, Timo
    Farrahi, Vahid
    GAIT & POSTURE, 2021, 89 : 45 - 53
  • [23] LINK PREDICTION IN SOCIAL NETWORK BY SNA AND SUPERVISED LEARNING
    Limsaiprom, Prajit
    Tantatsanawong, Panjai
    2011 INTERNATIONAL CONFERENCE ON MECHANICAL ENGINEERING AND TECHNOLOGY (ICMET 2011), 2011, : 765 - 770
  • [24] Link Prediction in Social Network by SNA and Supervised Learning
    Limsaiprom, Prajit
    Tantatsanawong, Panjai
    2011 INTERNATIONAL CONFERENCE ON COMPUTERS, COMMUNICATIONS, CONTROL AND AUTOMATION (CCCA 2011), VOL I, 2010, : 474 - 479
  • [25] Employment Prediction Models Based on Weighted Feature Selection and Semi-Supervised Machine Learning
    Department of Computer, Hebei University of Water Resources and Electric Engineering, Cangzhou
    061000, China
    Unknown
    061000, China
    Unknown
    35900, Malaysia
    J. Network Intell., 2024, 2 (971-987): : 971 - 987
  • [26] Supervised Learning Using Community Detection for Link Prediction
    Kerkache, Mohamed Hassen
    Sadeg-Belkacem, Lamia
    Tayeb, Fatima Benbouzid-Si
    Ali, Amri
    ADVANCES IN COMPUTING SYSTEMS AND APPLICATIONS, 2022, 513 : 85 - 94
  • [27] GROUP-WISE FEATURE SELECTION FOR SUPERVISED LEARNING
    Xiao, Qi
    Li, Hebi
    Tian, Jin
    Wang, Zhengdao
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 3149 - 3153
  • [28] A novel framework for online supervised learning with feature selection
    Sun, Lizhe
    Wang, Mingyuan
    Zhu, Siquan
    Barbu, Adrian
    JOURNAL OF NONPARAMETRIC STATISTICS, 2024,
  • [29] Supervised feature selection by self-paced learning regression
    Gan, Jiangzhang
    Wen, Guoqiu
    Yu, Hao
    Zheng, Wei
    Lei, Cong
    PATTERN RECOGNITION LETTERS, 2020, 132 : 30 - 37
  • [30] A Supervised Learning Approach to Link Prediction in Dynamic Networks
    Xu, Shuai
    Han, Kai
    Xu, Naiting
    WIRELESS ALGORITHMS, SYSTEMS, AND APPLICATIONS (WASA 2018), 2018, 10874 : 799 - 805