Parallel attribute reduction in high-dimensional data: An efficient MapReduce strategy with fuzzy discernibility matrix

Times cited: 0
Authors
Sowkuntla, Pandu [1]
Prasad, P. S. V. S. Sai [2]
Affiliations
[1] SRM Univ AP, Dept Comp Sci & Engn, Amaravati 522502, Andhra Pradesh, India
[2] Univ Hyderabad, Sch Comp & Informat Sci, Hyderabad 500046, Telangana, India
Keywords
Apache Spark; Attribute reduction; Fuzzy-rough sets; Fuzzy discernibility matrix; High dimensionality; Hybrid decision systems; Incremental feature selection; Rough; Model; Sets
DOI
10.1016/j.asoc.2025.112870
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The hybrid paradigm of fuzzy-rough set theory, which combines fuzzy and rough sets, has proven effective for attribute reduction in hybrid decision systems, which contain both numerical and categorical attributes. However, current parallel/distributed approaches are limited to datasets with either categorical or numerical attributes and often rely on fuzzy dependency measures. Little research exists on parallel/distributed attribute reduction for large-scale hybrid decision systems. Handling high-dimensional data in hybrid decision systems requires efficient distributed computing techniques to ensure scalability and performance. MapReduce, a widely used framework for distributed processing, provides a structured approach to processing large-scale data. Despite this potential, few attribute reduction techniques combine MapReduce with a fuzzy discernibility matrix, a combination that can significantly improve the efficiency of processing high-dimensional hybrid datasets. This paper introduces a vertically partitioned fuzzy discernibility matrix within the MapReduce computation model to address the high dimensionality of hybrid datasets. The proposed MapReduce strategy for attribute reduction minimizes data movement during the shuffle-and-sort phase, overcoming limitations of existing approaches. Furthermore, efficiency is improved by integrating a feature called SAT-region removal, which removes matrix entries that satisfy the maximum satisfiability conditions during the attribute reduction process. Extensive experiments validate the proposed method and show that it outperforms recent parallel/distributed attribute reduction methods.
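To make the pipeline in the abstract more concrete, here is a minimal Python sketch of greedy attribute reduction over a column-wise (vertically partitioned) fuzzy discernibility matrix, with satisfied entries pruned after each selection. It is not the authors' algorithm. The following are all assumptions made for illustration: the similarity relations (1 - |x - y| / range for numerical attributes, crisp equality for categorical ones), the min-based combination of relations, the satisfaction test (a pair counts as satisfied once the selected attributes discern it as strongly as the full attribute set), the count-based greedy heuristic, and the names `discernibility_column`, `map_gains`, `reduce_attributes` and `n_partitions`. MapReduce is only simulated with plain Python lists. Each partition holds whole attribute columns, so only (attribute, gain) pairs would need to cross the shuffle.

```python
# Hypothetical sketch, not the paper's algorithm: greedy fuzzy-rough attribute
# reduction over a vertically partitioned discernibility matrix.
from itertools import combinations


def discernibility_column(values, kind, pairs):
    """One matrix column: degree 1 - R_a(x_i, x_j) to which attribute a discerns each pair.

    Assumed relations: numeric -> 1 - |x - y| / range; categorical -> 1 if equal else 0.
    """
    if kind == "num":
        span = (max(values) - min(values)) or 1.0  # guard against constant columns
        return {(i, j): abs(values[i] - values[j]) / span for i, j in pairs}
    return {(i, j): 0.0 if values[i] == values[j] else 1.0 for i, j in pairs}


def map_gains(partition, reduct, remaining, current, target):
    """Mapper for one vertical partition: emit (attribute, number of entries it would satisfy)."""
    return [
        (a, sum(1 for p in remaining if max(current[p], col[p]) >= target[p]))
        for a, col in partition.items()
        if a not in reduct
    ]


def reduce_attributes(columns, kinds, decision, n_partitions=2):
    """Greedily select attributes until every discernible pair is satisfied."""
    # Only object pairs with different decision values need to be discerned.
    pairs = [(i, j) for i, j in combinations(range(len(decision)), 2)
             if decision[i] != decision[j]]
    # Vertical partitioning: each partition owns whole attribute columns.
    names = list(columns)
    partitions = [
        {a: discernibility_column(columns[a], kinds[a], pairs) for a in names[k::n_partitions]}
        for k in range(n_partitions)
    ]
    # Target per pair: max over all attributes of 1 - R_a, i.e. 1 - R_C when
    # R_C is the minimum of the per-attribute relations (assumed min t-norm).
    target = {p: max(col[p] for part in partitions for col in part.values()) for p in pairs}
    current = {p: 0.0 for p in pairs}  # discernibility reached by the selected subset
    remaining = {p for p in pairs if target[p] > 0}
    reduct = []
    while remaining:
        # Only (attribute, gain) pairs are "shuffled"; the "reducer" keeps the best one.
        emitted = [kv for part in partitions
                   for kv in map_gains(part, reduct, remaining, current, target)]
        best = max(emitted, key=lambda kv: kv[1])[0]
        col = next(part[best] for part in partitions if best in part)
        reduct.append(best)
        for p in remaining:
            current[p] = max(current[p], col[p])
        # SAT-region-style pruning (assumed analogue): drop entries already satisfied.
        remaining = {p for p in remaining if current[p] < target[p]}
    return reduct


# Toy hybrid decision system: two categorical attributes and one numerical attribute.
columns = {
    "color": ["red", "blue", "green", "blue"],
    "weight": [1.0, 1.0, 1.0, 3.0],
    "shape": ["sq", "sq", "ci", "sq"],
}
kinds = {"color": "cat", "weight": "num", "shape": "cat"}
decision = ["no", "no", "yes", "yes"]
print(reduce_attributes(columns, kinds, decision))  # ['color', 'weight']
```

Keeping whole columns on one partition lets each mapper score its own attributes locally, so only one integer per attribute is exchanged per iteration. This is in the spirit of the shuffle reduction the abstract describes, though the paper's actual data layout may differ. In a real deployment, the shared `current` and `remaining` state would also have to be broadcast to the mappers.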
Pages: 16
Related papers
50 in total
  • [1] MapReduce based parallel fuzzy-rough attribute reduction using discernibility matrix
    Sowkuntla, Pandu
    Prasad, P. S. V. S. Sai
    Applied Intelligence, 2022, 52(1): 154-173
  • [3] Parallel attribute reduction algorithm for unlabeled data based on fuzzy discernibility matrix and soft deletion behavior
    Wen, Haotong
    Xu, Yi
    Liang, Meishe
    Information Sciences, 2025, 689
  • [4] Efficient attribute reduction based on discernibility matrix
    Xu, Zhangyan
    Zhang, Chengqi
    Zhang, Shichao
    Song, Wei
    Yang, Bingru
    Rough Sets and Knowledge Technology, Proceedings, 2007, 4481: 13+
  • [5] Efficient attribute reduction algorithm by modificatory discernibility matrix
    Cai, Weidong
    Li, Fan
    Xu, Zhangyan
    Yang, Bingru
    Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition), 2007, 35(9): 110-113
  • [6] Discernibility Matrix Based Attribute Reduction in Intuitionistic Fuzzy Decision Systems
    Feng, Qinrong
    Li, Rui
    Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, 2013, 8170: 147-156
  • [7] Discernibility matrix based incremental attribute reduction for dynamic data
    Wei, Wei
    Wu, Xiaoying
    Liang, Jiye
    Cui, Junbiao
    Sun, Yijun
    Knowledge-Based Systems, 2018, 140: 142-157
  • [8] Efficient dimension reduction for high-dimensional matrix-valued data
    Wang, Dong
    Shen, Haipeng
    Truong, Young
    Neurocomputing, 2016, 190: 25-34
  • [9] Parallel similarity joins on massive high-dimensional data using MapReduce
    Ma, Youzhong
    Meng, Xiaofeng
    Wang, Shaoya
    Concurrency and Computation: Practice and Experience, 2016, 28(1): 166-183
  • [10] PHiDJ: Parallel Similarity Self-Join for High-Dimensional Vector Data with MapReduce
    Fries, Sergej
    Boden, Brigitte
    Stepien, Grzegorz
    Seidl, Thomas
    2014 IEEE 30th International Conference on Data Engineering (ICDE), 2014: 796-807