Parallel attribute reduction in high-dimensional data: An efficient MapReduce strategy with fuzzy discernibility matrix

Cited by: 0
Authors
Sowkuntla, Pandu [1]
Prasad, P. S. V. S. Sai [2]
Affiliations
[1] SRM Univ AP, Dept Comp Sci & Engn, Amaravati 522502, Andhra Pradesh, India
[2] Univ Hyderabad, Sch Comp & Informat Sci, Hyderabad 500046, Telangana, India
Keywords
Apache Spark; Attribute reduction; Fuzzy-rough sets; Fuzzy discernibility matrix; High dimensionality; Hybrid decision systems; INCREMENTAL FEATURE-SELECTION; ROUGH; MODEL; SETS
DOI
10.1016/j.asoc.2025.112870
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The hybrid paradigm of fuzzy-rough set theory, which combines fuzzy and rough sets, has proven effective in attribute reduction for hybrid decision systems encompassing both numerical and categorical attributes. However, current parallel/distributed approaches are limited to handling datasets with either categorical or numerical attributes and often rely on fuzzy dependency measures. There exists little research on parallel/distributed attribute reduction for large-scale hybrid decision systems. The challenge of handling high-dimensional data in hybrid decision systems necessitates efficient distributed computing techniques to ensure scalability and performance. MapReduce, a widely used framework for distributed processing, provides an organized approach to handling large-scale data. Despite its potential, there is a noticeable lack of attribute reduction techniques that leverage MapReduce's capabilities with a fuzzy discernibility matrix, which can significantly improve the efficiency of processing high-dimensional hybrid datasets. This paper introduces a vertically partitioned fuzzy discernibility matrix within the MapReduce computation model to address the high dimensionality of hybrid datasets. The proposed MapReduce strategy for attribute reduction minimizes data movement during the shuffle and sort phase, overcoming limitations present in existing approaches. Furthermore, the method's efficiency is enhanced by integrating a feature known as SAT-region removal, which removes matrix entries that satisfy the maximum satisfiability conditions during the attribute reduction process. Extensive experimental analysis validates the proposed method, demonstrating its superior performance compared to recent parallel/distributed methods in attribute reduction.
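The ideas named in the abstract can be illustrated with a small single-machine sketch. It is not the authors' algorithm: the discernibility definitions, the max-based satisfaction measure, the greedy selection rule, and every function name below are assumptions made for illustration, and the "map" and "reduce" steps are simulated with ordinary Python loops rather than Spark. The sketch stores the fuzzy discernibility matrix column by column (one column per attribute), lets each partition score only its own columns, and drops entries once they reach their maximum satisfiability.

```python
from itertools import combinations


def discernibility(a, b, kind, value_range):
    """Fuzzy discernibility of two values on one attribute, in [0, 1].

    Assumed definitions: crisp inequality for categorical values and a
    range-normalised distance for numerical values.
    """
    if kind == "cat":
        return 0.0 if a == b else 1.0
    return abs(a - b) / value_range if value_range else 0.0


def build_columns(rows, decisions, kinds):
    """Build the matrix column by column (one column per attribute).

    Each column holds one entry per object pair with different decisions.
    """
    pairs = [(i, j) for i, j in combinations(range(len(rows)), 2)
             if decisions[i] != decisions[j]]
    columns = []
    for a, kind in enumerate(kinds):
        values = [row[a] for row in rows]
        value_range = max(values) - min(values) if kind == "num" else None
        columns.append([discernibility(rows[i][a], rows[j][a], kind, value_range)
                        for i, j in pairs])
    return columns


def local_best(partition, columns, current, live):
    """Simulated map step: score only the attribute columns in this partition."""
    best = (0.0, None)
    for a in partition:
        gain = sum(max(columns[a][k] - current[k], 0.0) for k in live)
        if gain > best[0]:
            best = (gain, a)
    return best  # only this small (gain, attribute) pair leaves the partition


def reduct(columns, n_partitions=2, eps=1e-9):
    """Greedy attribute selection with removal of satisfied entries."""
    if not columns:
        return []
    n_attrs, n_entries = len(columns), len(columns[0])
    # Vertical partitioning: each partition owns a subset of attribute columns.
    partitions = [range(p, n_attrs, n_partitions) for p in range(n_partitions)]
    # Maximum satisfiability of each entry: the best any attribute can do.
    target = [max(col[k] for col in columns) for k in range(n_entries)]
    current = [0.0] * n_entries
    live = [k for k in range(n_entries) if target[k] > eps]
    selected = []
    while live:
        # Simulated reduce step: keep the best candidate across partitions.
        _, attr = max((local_best(p, columns, current, live) for p in partitions),
                      key=lambda pair: pair[0])
        selected.append(attr)
        current = [max(c, v) for c, v in zip(current, columns[attr])]
        # Assumed form of SAT-region removal: drop entries that already
        # reach their maximum satisfiability.
        live = [k for k in live if current[k] < target[k] - eps]
    return selected


rows = [(0.1, "a", 5.0), (0.2, "a", 1.0), (0.9, "b", 5.0), (0.8, "b", 1.0)]
decisions = [0, 0, 1, 1]
kinds = ["num", "cat", "num"]
print(reduct(build_columns(rows, decisions, kinds)))
```

In this toy hybrid table the categorical attribute alone separates the two decision classes, so every matrix entry is satisfied after one selection and the loop stops. In a distributed setting the point of the column-wise layout would be that each partition emits only a small candidate summary rather than matrix entries, which is one way to keep shuffle traffic low.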
Pages: 16
Related papers
50 in total
  • [11] Parallel Attribute Reduction Algorithm for Complex Heterogeneous Data Using MapReduce
    Zhang, Tengfei
    Ma, Fumin
    Cao, Jie
    Peng, Chen
    Yue, Dong
    COMPLEXITY, 2018,
  • [12] Efficient Parallel Skyline Query Processing for High-Dimensional Data
    Tang, Mingjie
    Yu, Yongyang
    Aref, Walid G.
    Malluhi, Qutaibah M.
    Ouzzani, Mourad
    2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019), 2019, : 2113 - 2114
  • [13] Efficient Parallel Skyline Query Processing for High-Dimensional Data
    Tang, Mingjie
    Yu, Yongyang
    Aref, Walid G.
    Malluhi, Qutaibah M.
    Ouzzani, Mourad
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2018, 30 (10) : 1838 - 1851
  • [14] A novel approach of rough set-based attribute reduction using fuzzy discernibility matrix
    Yang, Ming
    Chen, Songcan
    Yang, Xubing
    FOURTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 3, PROCEEDINGS, 2007, : 96 - 101
  • [15] A Distributed Attribute Reduction Algorithm for High-Dimensional Data under the Spark Framework
    Wu, Zhengjiang
    Mei, Qiuyu
    Zhang, Yaning
    Yang, Tian
    Luo, Junwei
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2022, 15 (01)
  • [16] A Distributed Attribute Reduction Algorithm for High-Dimensional Data under the Spark Framework
    Zhengjiang Wu
    Qiuyu Mei
    Yaning Zhang
    Tian Yang
    Junwei Luo
    International Journal of Computational Intelligence Systems, 15
  • [17] Efficient indexing of high-dimensional data through dimensionality reduction
    Goh, CH
    Lim, A
    Ooi, BC
    Tan, KL
    DATA & KNOWLEDGE ENGINEERING, 2000, 32 (02) : 115 - 130
  • [18] Compressed binary discernibility matrix based incremental attribute reduction algorithm for group dynamic data
    Ma, Fumin
    Ding, Mianwei
    Zhang, Tengfei
    Cao, Jie
    NEUROCOMPUTING, 2019, 344 : 20 - 27
  • [19] PaMPa-HD: a Parallel MapReduce-based frequent Pattern miner for High-Dimensional data
    Apiletti, Daniele
    Baralis, Elena
    Cerquitelli, Tania
    Garza, Paolo
    Pulvirenti, Fabio
    Michiardi, Pietro
    2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOP (ICDMW), 2015, : 839 - 846
  • [20] Parallel coordinate order for high-dimensional data
    Tilouche, Shaima
    Partovi Nia, Vahid
    Bassetto, Samuel
    STATISTICAL ANALYSIS AND DATA MINING, 2021, 14 (05) : 501 - 515