Automatic feature selection for supervised learning in link prediction applications: a comparative study

Cited by: 43
Authors
Pecli, Antonio [1]
Cavalcanti, Maria Claudia [2]
Goldschmidt, Ronaldo [1]
Affiliations
[1] Mil Inst Engn IME, Praca Gen Tiburcio 80, Rio De Janeiro, Brazil
[2] Mil Inst Engn IME, Comp Engn Dept, Praca Gen Tiburcio 80, Rio De Janeiro, Brazil
Keywords
Complex network analysis; Link prediction; Binary classification; Feature selection
DOI
10.1007/s10115-017-1121-6
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, considerable attention has been devoted to research on the link prediction (LP) problem in complex networks. This problem consists of predicting the likelihood that an association between two currently unconnected nodes in a network will appear in the future. One of the most important approaches to the LP problem is based on supervised machine learning (ML) techniques for classification. Although many works have presented promising results with this approach, choosing the set of features (variables) used to train the classifiers is still a major challenge. In this article, we report on the effects of three different automatic variable selection strategies (Forward, Backward and Evolutionary) applied to the feature-based supervised learning approach in LP applications. The results of the experiments show that these strategies do lead to better classification models than those built with the complete set of variables. The experiments were performed over three datasets (Microsoft Academic Network, Amazon and Flickr), each containing more than twenty different features, including topological and domain-specific ones. We also describe the specification and implementation of the process used to support the experiments. It combines the feature selection strategies, six different classification algorithms (SVM, K-NN, naïve Bayes, CART, random forest and multilayer perceptron) and three evaluation metrics (Precision, F-Measure and Area Under the Curve). Moreover, this process includes a novel approach, inspired by ML voting committees, that suggests sets of features to represent data in LP applications. It mines the log of the experiments in order to identify sets of features that are frequently selected to produce high-performance classification models. The experiments revealed interesting correlations between frequently selected features and datasets.
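The Forward and Backward strategies and the log-mining step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `score` callable is a stand-in for training one of the classifiers and measuring a metric such as AUC, the log format `(feature set, classifier name, metric value)` and the `min_score` threshold are hypothetical, and the Evolutionary strategy is omitted. Counting votes over high-scoring runs is one plausible reading of the committee-inspired idea.

```python
from collections import Counter
from typing import Callable, FrozenSet, Iterable, List, Sequence, Tuple

# Assumption: a scorer maps a candidate feature subset to a quality value
# (higher is better). In the study this would be a classifier's evaluation
# metric (e.g. AUC); here it is any callable supplied by the caller.
Scorer = Callable[[FrozenSet[str]], float]

# Hypothetical log entry: (feature set, classifier name, metric value).
LogEntry = Tuple[FrozenSet[str], str, float]


def forward_selection(features: Sequence[str], score: Scorer) -> Tuple[FrozenSet[str], float]:
    """Greedy forward selection: start empty, repeatedly add the feature that
    yields the best score, stop when no addition strictly improves it."""
    selected: FrozenSet[str] = frozenset()
    best = float("-inf")
    remaining = set(features)
    while remaining:
        # sorted() makes tie-breaking deterministic.
        feat, value = max(((f, score(selected | {f})) for f in sorted(remaining)),
                          key=lambda pair: pair[1])
        if value <= best:
            break
        selected, best = selected | {feat}, value
        remaining.discard(feat)
    return selected, best


def backward_elimination(features: Sequence[str], score: Scorer) -> Tuple[FrozenSet[str], float]:
    """Greedy backward elimination: start with all features, repeatedly drop the
    feature whose removal yields the best score, stop when no removal strictly
    improves it. At least one feature is always kept."""
    selected = frozenset(features)
    best = score(selected)
    while len(selected) > 1:
        feat, value = max(((f, score(selected - {f})) for f in sorted(selected)),
                          key=lambda pair: pair[1])
        if value <= best:
            break
        selected, best = selected - {feat}, value
    return selected, best


def frequent_feature_sets(log: Iterable[LogEntry], min_score: float,
                          top_k: int = 3) -> List[Tuple[FrozenSet[str], int]]:
    """Vote counting over an experiment log: every run whose metric reaches
    min_score casts one vote for the feature set it used."""
    votes = Counter(fs for fs, _clf, value in log if value >= min_score)
    return votes.most_common(top_k)


def toy_score(subset: FrozenSet[str]) -> float:
    """Toy additive scorer with made-up weights and hypothetical feature names,
    for illustration only (a real scorer would train and evaluate a model)."""
    weights = {"common_neighbors": 3, "adamic_adar": 2, "jaccard": 1, "noise": -1}
    return sum(weights[f] for f in subset)


if __name__ == "__main__":
    names = ["common_neighbors", "adamic_adar", "jaccard", "noise"]
    print(forward_selection(names, toy_score))
    print(backward_elimination(names, toy_score))
```

In practice `score` would wrap cross-validated training on an LP dataset. Both greedy searches stop at the first step that does not strictly improve the score, which is only one of several common stopping rules.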
Pages: 85-121
Number of pages: 37