Hashing Supported Iterative MapReduce Based Scalable SBE Reduct Computation

Cited by: 4
Authors
Divya, U. Venkata [1]
Prasad, P. S. V. S. Sai [2]
Affiliations
[1] Quadrat Insights Pvt Ltd, Hyderabad, India
[2] Univ Hyderabad, Sch Comp & Informat Sci, Hyderabad, India
Keywords
Rough Sets; Reduct; Iterative MapReduce; Apache Spark; Scalable feature selection;
DOI
10.1007/978-3-319-72344-0_13
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Feature selection plays a major role in the preprocessing stage of data mining and helps in model construction by identifying relevant features. Rough set theory has emerged in recent years as an important paradigm for feature selection, i.e., finding a reduct of the conditional attributes in a given data set. Two control strategies for reduct computation are Sequential Forward Selection (SFS) and Sequential Backward Elimination (SBE). With the objective of scalable feature selection, several MapReduce-based approaches have been proposed in the literature. All of these approaches are SFS-based and result in a superset of a reduct, i.e., a set that still contains redundant attributes. Even though SBE approaches result in an exact reduct, they require a lot of data movement in the shuffle-and-sort phase of MapReduce. To overcome this problem and to optimize network bandwidth utilization, a novel hashing-supported SBE reduct algorithm (MRSBER Hash) is proposed in this work and implemented using the iterative MapReduce framework of Apache Spark. Experiments conducted on large benchmark decision systems have empirically established the relevance of the proposed approach for decision systems with a large number of conditional attributes.
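As a rough illustration of the SBE control strategy mentioned in the abstract, the single-machine Python sketch below starts from all conditional attributes and drops each one whose removal leaves the positive region unchanged. It is not the MRSBER Hash algorithm and does not use Spark or MapReduce. The function names and the toy decision table are hypothetical, and the dictionary keyed on projected attribute tuples only stands in for the idea of hashing objects by their attribute values.

```python
def positive_region_size(rows, attrs):
    """Count objects whose projection onto attrs determines the decision."""
    groups = {}
    for cond, decision in rows:
        # Dict lookup hashes the projected tuple (assumed stand-in for hashing).
        key = tuple(cond[a] for a in attrs)
        groups.setdefault(key, []).append(decision)
    # A group belongs to the positive region if all its decisions agree.
    return sum(len(ds) for ds in groups.values() if len(set(ds)) == 1)


def sbe_reduct(rows, n_attrs):
    """Sequential backward elimination over attribute indices 0..n_attrs-1."""
    reduct = list(range(n_attrs))
    target = positive_region_size(rows, reduct)
    for a in range(n_attrs):
        candidate = [b for b in reduct if b != a]
        # Drop attribute a only if the positive region is preserved without it.
        if positive_region_size(rows, candidate) == target:
            reduct = candidate
    return reduct


# Hypothetical toy decision table: (conditional attribute values, decision).
rows = [
    ((0, 0, 1), "y"),
    ((0, 1, 1), "n"),
    ((1, 0, 0), "y"),
    ((1, 1, 0), "n"),
]
reduct = sbe_reduct(rows, 3)
print(reduct)
```

In this toy table the decision depends only on the second attribute, so the elimination pass keeps index 1 and discards the other two. A distributed version would have to compute the same per-group agreement across partitions, which is where the shuffle cost described in the abstract arises.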
Pages: 163-170
Number of pages: 8