Large-scale multi-label ensemble learning on Spark

Cited by: 12
Authors
Gonzalez-Lopez, Jorge [1]
Cano, Alberto [1]
Ventura, Sebastian [2]
Affiliations
[1] Virginia Commonwealth Univ, Dept Comp Sci, Richmond, VA 23284 USA
[2] Univ Cordoba, Dept Comp Sci, Cordoba, Spain
Keywords
Multi-label learning; Ensemble learning; Distributed computing; Apache Spark; Big data; MapReduce; Performance
DOI
10.1109/Trustcom/BigDataSE/ICESS.2017.328
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Multi-label learning is a challenging problem that has received growing attention in the research community in recent years. Hence, there is a growing demand for effective and scalable multi-label learning methods for larger datasets, both in terms of the number of instances and the number of output labels. The use of ensemble classifiers is a popular approach for improving multi-label model accuracy, especially for datasets with high-dimensional label spaces. However, the increasing computational complexity of the algorithms in such ever-growing high-dimensional label spaces requires new approaches to manage data effectively and efficiently in distributed computing environments. Spark is a framework based on MapReduce, a distributed programming model that offers a robust paradigm for handling large-scale datasets on a cluster of nodes. This paper focuses on multi-label ensembles and proposes five different implementations that use parallel and distributed computing on Spark, analyzing the impact of each implementation on the performance of the ensemble. The experimental study, conducted on 29 benchmark datasets, shows the benefits of the distributed implementations over traditional single-node, single-thread execution, both in predictive performance across multiple metrics and in significant speedup.
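The abstract does not describe how the five implementations distribute the ensemble, so the following is only a minimal sketch of the general idea, under assumptions of our own: independent ensemble members (bagging) are trained in a "map" over member seeds, and their predictions are combined by a per-label majority vote. All names (`train_member`, `predict_member`, `ensemble_predict`, `train_ensemble`) are hypothetical, and the base learner is a deliberately simple 1-nearest-neighbour multi-label predictor, not anything taken from the paper. A thread pool stands in for cluster workers; on Spark, the training map would roughly correspond to `sc.parallelize(seeds).map(...)`.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Sequence, Tuple

Instance = Tuple[float, ...]
LabelSet = Tuple[int, ...]  # binary indicator vector, one entry per label


def train_member(data: Sequence[Tuple[Instance, LabelSet]], seed: int):
    """Hypothetical ensemble member: a bootstrap sample (bagging).

    The 'model' is simply the resampled training set, later queried by a
    1-nearest-neighbour multi-label predictor.
    """
    rng = random.Random(seed)
    return [rng.choice(data) for _ in range(len(data))]


def predict_member(model, x: Instance) -> LabelSet:
    """Return the full label vector of the nearest training instance."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, labels = min(model, key=lambda pair: sq_dist(pair[0], x))
    return labels


def ensemble_predict(models, x: Instance) -> LabelSet:
    """'Reduce' step: combine members with a per-label majority vote."""
    preds = [predict_member(m, x) for m in models]
    n_labels = len(preds[0])
    votes = [sum(p[j] for p in preds) for j in range(n_labels)]
    # A label is predicted when strictly more than half of the members vote for it.
    return tuple(int(2 * v > len(models)) for v in votes)


def train_ensemble(data, n_members: int = 10, workers: int = 4):
    """'Map' step: train members independently, one task per seed.

    Threads only stand in for cluster workers here; for CPU-bound pure
    Python, processes or Spark executors would give real parallelism.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: train_member(data, s), range(n_members)))
```

Because members never communicate during training, the map step is embarrassingly parallel; only the vote aggregation needs all members' outputs together. This independence is what makes ensembles a natural fit for distributed frameworks such as Spark.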
Pages: 893-900
Number of pages: 8
Related papers
50 in total
  • [31] Multi-label Pixelwise Classification for Reconstruction of Large-scale Urban Areas
    He, Yuanlie
    Mudur, Sudhir
    Poullis, Charalambos
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE (ICPRAI 2018), 2018: 195-203
  • [32] Deep Supervised Hashing for Multi-Label and Large-Scale Image Retrieval
    Wu, Dayan
    Lin, Zheng
    Li, Bo
    Ye, Mingzhen
    Wang, Weiping
    PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR'17), 2017: 155-163
  • [33] DyLas: A dynamic label alignment strategy for large-scale multi-label text classification
    Ren, Lin
    Liu, Yongbin
    Ouyang, Chunping
    Yu, Ying
    Zhou, Shuda
    He, Yidong
    Wan, Yaping
    INFORMATION FUSION, 2025, 120
  • [34] Extreme multi-label learning: A large scale classification approach in machine learning
    Prajapati, Purvi
    Thakkar, Amit
    JOURNAL OF INFORMATION & OPTIMIZATION SCIENCES, 2019, 40 (04): 983-1001
  • [35] Speeding up k-Nearest Neighbors classifier for large-scale multi-label learning on GPUs
    Skryjomski, Przemyslaw
    Krawczyk, Bartosz
    Cano, Alberto
    NEUROCOMPUTING, 2019, 354: 10-19
  • [36] Tencent ML-Images: A Large-Scale Multi-Label Image Database for Visual Representation Learning
    Wu, Baoyuan
    Chen, Weidong
    Fan, Yanbo
    Zhang, Yong
    Hou, Jinlong
    Liu, Jie
    Zhang, Tong
    IEEE ACCESS, 2019, 7: 172683-172693
  • [37] Parallel Learning of Large-scale Multi-Label Classification Problems with Min-Max Modular LIBLINEAR
    Chen, Yangyang
    Lu, Bao-Liang
    Zhao, Hai
    2012 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2012
  • [38] Discrete semi-supervised learning for multi-label image classification and large-scale image retrieval
    He, Lang
    Xie, Liang
    Shu, Haohao
    Hu, Shengyuan
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (17): 24519-24537
  • [40] An Improved Multi-label Classification Ensemble Learning Algorithm
    Fu, Zhongliang
    Wang, Lili
    Zhang, Danpu
    PATTERN RECOGNITION (CCPR 2014), PT I, 2014, 483: 243-252