Large-scale multi-label ensemble learning on Spark

Cited by: 12
Authors
Gonzalez-Lopez, Jorge [1 ]
Cano, Alberto [1 ]
Ventura, Sebastian [2 ]
Affiliations
[1] Virginia Commonwealth Univ, Dept Comp Sci, Richmond, VA 23284 USA
[2] Univ Cordoba, Dept Comp Sci, Cordoba, Spain
Keywords
Multi-label learning; Ensemble learning; Distributed computing; Apache Spark; Big data; MAPREDUCE; PERFORMANCE;
DOI
10.1109/Trustcom/BigDataSE/ICESS.2017.328
Chinese Library Classification (CLC) number
TP [Automation and computer technology]
Discipline classification code
0812
Abstract
Multi-label learning is a challenging problem that has received growing attention from the research community in recent years. Consequently, there is a growing demand for effective and scalable multi-label learning methods for larger datasets, both in terms of the number of instances and the number of output labels. Ensemble classifiers are a popular approach for improving multi-label model accuracy, especially on datasets with high-dimensional label spaces. However, the increasing computational complexity of the algorithms in such ever-growing high-dimensional label spaces requires new approaches to manage data effectively and efficiently in distributed computing environments. Spark is a framework based on MapReduce, a distributed programming model that offers a robust paradigm for handling large-scale datasets on a cluster of nodes. This paper focuses on multi-label ensembles and proposes five implementations based on parallel and distributed computing with Spark, analyzing their impact on ensemble performance. The experimental study, conducted on 29 benchmark datasets, shows the benefits of the distributed implementations over traditional single-node, single-thread execution, both in performance across multiple metrics and in significant speedups.
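The distributed ensemble idea the abstract describes can be illustrated in miniature. The sketch below is not the authors' Spark code: it is a minimal single-machine analogue of binary relevance with bagging, where the per-label training tasks (the axis Spark would distribute across a cluster) run concurrently in a thread pool. All names and the trivial majority-class base learner are assumptions made for illustration only.

```python
# Toy sketch of a data-parallel multi-label ensemble (binary relevance + bagging).
# Illustrative only: the paper's implementations run on Apache Spark; the base
# learner here is a trivial constant (majority-class) predictor.
import random
from concurrent.futures import ThreadPoolExecutor


def train_base(y, seed):
    """Toy base learner: fit on a bootstrap sample of one label's column and
    return the majority class (it ignores features entirely)."""
    rng = random.Random(seed)
    idx = [rng.randrange(len(y)) for _ in range(len(y))]
    ones = sum(y[i] for i in idx)
    return 1 if ones * 2 >= len(idx) else 0


def train_label(Y, label, n_models=5):
    """Binary relevance: fit an independent bagged ensemble for one label."""
    y = [row[label] for row in Y]
    return [train_base(y, seed) for seed in range(n_models)]


def fit(Y, n_labels):
    # One task per label, run concurrently -- the dimension a Spark
    # implementation would distribute across worker nodes.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda q: train_label(Y, q), range(n_labels)))


def predict(models, x):
    # Majority vote inside each label's ensemble; x is unused by the toy learner.
    return [1 if sum(m) * 2 >= len(m) else 0 for m in models]


# Four instances, two output labels each.
Y = [[1, 0], [1, 0], [1, 1], [1, 1]]
models = fit(Y, n_labels=2)
print(predict(models, [0.5]))
```

A real Spark version would replace the thread pool with operations on distributed datasets (e.g. mapping the training of each label's ensemble over cluster partitions), which is the design space the paper's five implementations explore.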
Pages: 893-900
Number of pages: 8