Automatic feature selection for supervised learning in link prediction applications: a comparative study

Cited by: 43
Authors
Pecli, Antonio [1]
Cavalcanti, Maria Claudia [2]
Goldschmidt, Ronaldo [1]
Affiliations
[1] Mil Inst Engn IME, Praca Gen Tiburcio 80, Rio De Janeiro, Brazil
[2] Mil Inst Engn IME, Comp Engn Dept, Praca Gen Tiburcio 80, Rio De Janeiro, Brazil
Keywords
Complex network analysis; Link prediction; Binary classification; Feature selection
DOI
10.1007/s10115-017-1121-6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, considerable attention has been devoted to research on the link prediction (LP) problem in complex networks. The problem is to predict the likelihood that an association between two currently unconnected nodes in a network will appear in the future. One of the most important approaches to the LP problem is based on supervised machine learning (ML) techniques for classification. Although many works have presented promising results with this approach, choosing the set of features (variables) to train the classifiers is still a major challenge. In this article, we report on the effects of three different automatic variable selection strategies (Forward, Backward and Evolutionary) applied to the feature-based supervised learning approach in LP applications. The results of the experiments show that the use of these strategies does lead to better classification models than classifiers built with the complete set of variables. The experiments were performed over three datasets (Microsoft Academic Network, Amazon and Flickr), each containing more than twenty different features, including topological and domain-specific ones. We also describe the specification and implementation of the process used to support the experiments. It combines the feature selection strategies, six different classification algorithms (SVM, K-NN, naïve Bayes, CART, random forest and multilayer perceptron) and three evaluation metrics (Precision, F-Measure and Area Under the Curve). Moreover, this process includes a novel approach, inspired by ML voting committees, that suggests sets of features to represent data in LP applications. It mines the log of the experiments in order to identify sets of features frequently selected to produce classification models with high performance. The experiments showed interesting correlations between frequently selected features and datasets.
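As an illustration of the Forward strategy named in the abstract, the following is a minimal sketch of greedy forward feature selection wrapped around a classifier. It is not the authors' implementation. The toy node-pair data, the feature names (`common_neighbors`, `random_attribute`), the helper names, and the use of leave-one-out 1-nearest-neighbour accuracy as the score are all assumptions made for the example; the paper itself reports Precision, F-Measure and Area Under the Curve.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]

# Hypothetical, hand-made node-pair examples (not data from the paper).
# label = 1 means the link appeared later, 0 means it did not.
PAIRS: List[Row] = [
    {"common_neighbors": 0.0, "random_attribute": 0.90, "label": 0},
    {"common_neighbors": 0.1, "random_attribute": 0.10, "label": 0},
    {"common_neighbors": 0.2, "random_attribute": 0.50, "label": 0},
    {"common_neighbors": 0.8, "random_attribute": 0.85, "label": 1},
    {"common_neighbors": 0.9, "random_attribute": 0.15, "label": 1},
    {"common_neighbors": 1.0, "random_attribute": 0.55, "label": 1},
]


def loo_1nn_accuracy(rows: Sequence[Row], features: Sequence[str]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the given features (assumed scoring choice)."""
    correct = 0
    for i, query in enumerate(rows):
        # Nearest other row by squared Euclidean distance on `features`.
        nearest = min(
            (r for j, r in enumerate(rows) if j != i),
            key=lambda r: sum((r[f] - query[f]) ** 2 for f in features),
        )
        if nearest["label"] == query["label"]:
            correct += 1
    return correct / len(rows)


def forward_selection(
    candidates: Sequence[str],
    score: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Start from the empty set and repeatedly add the single feature
    that most improves the score; stop when no addition improves it."""
    selected: List[str] = []
    remaining = list(candidates)
    best = float("-inf")
    while remaining:
        trials = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(trials, key=lambda t: t[0])
        if top_score <= best:
            break  # no strict improvement: keep the current subset
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


selected, best = forward_selection(
    ["random_attribute", "common_neighbors"],
    lambda feats: loo_1nn_accuracy(PAIRS, feats),
)
print(selected, best)
```

On this toy data the search keeps only the informative feature and discards the uninformative one. The Backward strategy would follow the same loop in reverse, starting from the full feature set and removing one feature at a time.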
Pages: 85-121
Page count: 37
Source
Knowledge and Information Systems, 2018, 56: 85-121