Improving Instance Selection Methods for Big Data Classification

Authors
Malhat, Mohamed [1]
El Menshawy, Mohamed [1]
Mousa, Hamdy [1]
El Sisi, Ashraf [1]
Affiliation
[1] Menoufia Univ, Fac Comp & Informat, Comp Sci Dept, Shibin Al Kawm, Egypt
Keywords
Big Data; Data Mining; Data Reduction; Instance Selection; Reduction
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The explosion of data in many application domains has given rise to the term big data. As data volume rapidly exceeds the capacity and processing capabilities of existing data mining algorithms, these algorithms become ineffective, and instance selection becomes a mandatory step before applying them. Instance selection methods scale a training set down by eliminating redundant, erroneous, and irrelevant instances. Recently, instance selection methods have been adapted to big data sets by splitting the training data into disjoint subsets and applying instance selection to each subset individually. However, these adapted methods vary in performance with respect to reduction rate and classification accuracy. In this work, we propose an operational and unified framework that balances reduction rate against classification accuracy. It starts by splitting a training set into class-balanced subsets, in order to analyze the impact of the splitting process on reduction rate and classification accuracy. It then applies two different instance selection methods to each subset. We implement the framework and test it experimentally on two standard data sets. With the random splitting process as a benchmark, the results show that the class-balanced splitting process is preferable with respect to classification accuracy. The results also show that combining two instance selection methods markedly reduces the performance variability.
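The two stages described in the abstract (class-balanced splitting, then two instance selection methods per subset) can be illustrated with a minimal sketch. The abstract does not name the two methods or say how they are combined, so everything specific here is an assumption made for illustration: Edited Nearest Neighbour (ENN) and Condensed Nearest Neighbour (CNN) are used as stand-ins, they are chained sequentially, and the function names `class_balanced_split`, `enn`, `cnn`, and `select_instances` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_balanced_split(labels, n_subsets, seed=0):
    """Deal the shuffled indices of each class round-robin into disjoint subsets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    subsets = [[] for _ in range(n_subsets)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            subsets[pos % n_subsets].append(idx)
    return subsets


def enn(X, y, subset, k=3):
    """ENN (assumed stand-in): drop instances misclassified by their k nearest neighbours."""
    kept = []
    for i in subset:
        neighbours = sorted(
            (j for j in subset if j != i), key=lambda j: sq_dist(X[j], X[i])
        )[:k]
        votes = Counter(y[j] for j in neighbours)
        if not neighbours or votes.most_common(1)[0][0] == y[i]:
            kept.append(i)
    return kept


def cnn(X, y, subset):
    """CNN (assumed stand-in): keep only instances the current store misclassifies."""
    if not subset:
        return []
    store = [subset[0]]
    changed = True
    while changed:
        changed = False
        for i in subset:
            if i in store:
                continue
            nearest = min(store, key=lambda j: sq_dist(X[j], X[i]))
            if y[nearest] != y[i]:
                store.append(i)
                changed = True
    return store


def select_instances(X, y, n_subsets=2, seed=0):
    """Split class-balanced, run ENN then CNN on each subset, merge the survivors."""
    selected = []
    for subset in class_balanced_split(y, n_subsets, seed):
        selected.extend(cnn(X, y, enn(X, y, subset)))
    return sorted(selected)
```

Given feature vectors `X` and labels `y`, `select_instances(X, y)` returns the indices of the retained instances, and the reduction rate follows as `1 - len(selected) / len(y)`. Chaining a noise filter (ENN) before a condensation step (CNN) is only one plausible way to combine two methods; the paper's own combination may differ.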
Pages: 213-218
Number of pages: 6