Automatic feature selection for supervised learning in link prediction applications: a comparative study

Cited by: 43

Authors:

Pecli, Antonio [1]

Cavalcanti, Maria Claudia [2]

Goldschmidt, Ronaldo [1]

Affiliations:

[1] Mil Inst Engn IME, Praca Gen Tiburcio 80, Rio De Janeiro, Brazil

[2] Mil Inst Engn IME, Comp Engn Dept, Praca Gen Tiburcio 80, Rio De Janeiro, Brazil

Source:

KNOWLEDGE AND INFORMATION SYSTEMS | 2018 / Volume 56 / Issue 01

Keywords:

Complex network analysis; Link prediction; Binary classification; Feature selection;

DOI:

10.1007/s10115-017-1121-6

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

For the last years, a considerable amount of attention has been devoted to the research about the link prediction (LP) problem in complex networks. This problem tries to predict the likelihood of an association between two not interconnected nodes in a network to appear in the future. One of the most important approaches to the LP problem is based on supervised machine learning (ML) techniques for classification. Although many works have presented promising results with this approach, choosing the set of features (variables) to train the classifiers is still a major challenge. In this article, we report on the effects of three different automatic variable selection strategies (Forward, Backward and Evolutionary) applied to the feature-based supervised learning approach in LP applications. The results of the experiments show that the use of these strategies does lead to better classification models than classifiers built with the complete set of variables. Such experiments were performed over three datasets (Microsoft Academic Network, Amazon and Flickr) that contained more than twenty different features each, including topological and domain-specific ones. We also describe the specification and implementation of the process used to support the experiments. It combines the use of the feature selection strategies, six different classification algorithms (SVM, K-NN, naïve Bayes, CART, random forest and multilayer perceptron) and three evaluation metrics (Precision, F-Measure and Area Under the Curve). Moreover, this process includes a novel ML voting committee inspired approach that suggests sets of features to represent data in LP applications. It mines the log of the experiments in order to identify sets of features frequently selected to produce classification models with high performance. The experiments showed interesting correlations between frequently selected features and datasets.
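
The Forward strategy named in the abstract can be illustrated with a minimal sketch of greedy forward selection. This is not the authors' implementation: the function names, the feature names, and the toy scoring function are all assumptions made for illustration. In practice the scorer would train one of the classifiers and return a metric such as AUC.

```python
from typing import Callable, Sequence


def forward_select(
    features: Sequence[str],
    score: Callable[[frozenset], float],
) -> list:
    """Greedy forward selection: start with no features, repeatedly add the
    single feature that gives the best score, and stop when no addition
    improves on the current best."""
    selected = []
    best = float("-inf")
    remaining = list(features)
    while remaining:
        # Score every subset formed by adding one unused feature.
        candidates = [(score(frozenset(selected + [f])), f) for f in remaining]
        top_score, top_feature = max(candidates)
        if top_score <= best:
            break  # no single addition helps, so stop
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected


# Hypothetical scorer standing in for "train a classifier, measure AUC".
# The feature names and weights are invented for illustration only.
_USEFUL = {"common_neighbors": 0.20, "jaccard": 0.10, "same_venue": 0.05}


def toy_score(subset: frozenset) -> float:
    # Useful features raise the score; any other feature costs a small penalty.
    return 0.5 + sum(_USEFUL.get(f, -0.02) for f in subset)


chosen = forward_select(
    ["common_neighbors", "jaccard", "same_venue", "noise_a", "noise_b"],
    toy_score,
)
print(chosen)
```

With this toy scorer the search keeps the three helpful features and stops before adding either noise feature. Backward selection would follow the same loop in reverse, starting from the full set and removing one feature at a time.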


Pages: 85-121

Page count: 37

