Improving Instance Selection Methods for Big Data Classification

Cited by: 0
Authors
Malhat, Mohamed [1]
El Menshawy, Mohamed [1]
Mousa, Hamdy [1]
El Sisi, Ashraf [1]
Affiliation
[1] Menoufia Univ, Fac Comp & Informat, Comp Sci Dept, Shibin Al Kawm, Egypt
Keywords
Big Data; Data Mining; Data Reduction; Instance Selection; Reduction
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104; 0812; 0835; 1405
Abstract
The explosion of data in many application domains has given rise to the term big data. As data volume rapidly exceeds the capacity and processing capabilities of existing data mining algorithms, those algorithms become ineffective, and instance selection becomes a mandatory step before they are applied. Instance selection methods scale the training set down by eliminating redundant, erroneous, and irrelevant instances. Recently, instance selection methods have been adapted to big data sets by splitting the training data into disjoint subsets and applying instance selection to each subset individually. However, these adapted methods vary in performance with respect to reduction rate and classification accuracy. In this work, we propose an operational and unified framework to balance reduction rate against classification accuracy. It first splits the training set into class-balanced subsets, in order to analyze the impact of the splitting process on reduction rate and classification accuracy. It then applies two different instance selection methods to each subset. We implement the framework and test it experimentally on two standard data sets. With the random splitting process as a benchmark, the results show that the class-balanced splitting process is preferable with respect to classification accuracy. The results also show that combining two instance selection methods remarkably reduces the performance variability.
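The pipeline described in the abstract (class-balanced splitting into disjoint subsets, followed by instance selection on each subset) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the two instance selection methods or say how they are combined, so Edited Nearest Neighbor (ENN) is used here purely as a stand-in selector, selectors are assumed to be applied one after another, and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def class_balanced_split(labels, n_subsets, seed=0):
    """Partition instance indices into disjoint subsets whose class
    proportions are close to those of the full training set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    subsets = [[] for _ in range(n_subsets)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class round-robin so every subset gets a similar share.
        for position, idx in enumerate(indices):
            subsets[position % n_subsets].append(idx)
    return subsets


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def enn_select(points, labels, indices, k=3):
    """Stand-in selector (assumption): Edited Nearest Neighbor. Keeps an
    instance only if the majority label of its k nearest neighbors
    within the subset matches its own label."""
    kept = []
    for i in indices:
        neighbors = sorted(
            (j for j in indices if j != i),
            key=lambda j: sq_dist(points[i], points[j]),
        )[:k]
        votes = Counter(labels[j] for j in neighbors)
        if votes and votes.most_common(1)[0][0] == labels[i]:
            kept.append(i)
    return kept


def select_instances(points, labels, n_subsets, selectors, seed=0):
    """Apply the selectors in sequence (assumed combination) to every
    class-balanced subset and merge the surviving indices."""
    kept = []
    for subset in class_balanced_split(labels, n_subsets, seed):
        for selector in selectors:
            subset = selector(points, labels, subset)
        kept.extend(subset)
    return sorted(kept)


def reduction_rate(n_original, n_kept):
    """Fraction of training instances removed by selection."""
    return 1.0 - n_kept / n_original
```

In this sketch, a second selector would simply be appended to the `selectors` list; the reduction rate is computed as the fraction of instances removed, and classification accuracy would be measured separately by training a classifier on the kept instances.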
Pages: 213-218
Number of pages: 6