Parallel attribute reduction in high-dimensional data: An efficient MapReduce strategy with fuzzy discernibility matrix

Cited by: 0
Authors
Sowkuntla, Pandu [1]
Prasad, P. S. V. S. Sai [2]
Affiliations
[1] SRM Univ AP, Dept Comp Sci & Engn, Amaravati 522502, Andhra Pradesh, India
[2] Univ Hyderabad, Sch Comp & Informat Sci, Hyderabad 500046, Telangana, India
Keywords
Apache Spark; Attribute reduction; Fuzzy-rough sets; Fuzzy discernibility matrix; High dimensionality; Hybrid decision systems; INCREMENTAL FEATURE-SELECTION; ROUGH; MODEL; SETS
DOI
10.1016/j.asoc.2025.112870
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The hybrid paradigm of fuzzy-rough set theory, which combines fuzzy and rough sets, has proven effective in attribute reduction for hybrid decision systems encompassing both numerical and categorical attributes. However, current parallel/distributed approaches are limited to handling datasets with either categorical or numerical attributes and often rely on fuzzy dependency measures. There exists little research on parallel/distributed attribute reduction for large-scale hybrid decision systems. The challenge of handling high-dimensional data in hybrid decision systems necessitates efficient distributed computing techniques to ensure scalability and performance. MapReduce, a widely used framework for distributed processing, provides an organized approach to handling large-scale data. Despite its potential, there is a noticeable lack of attribute reduction techniques that leverage MapReduce's capabilities with a fuzzy discernibility matrix, which can significantly improve the efficiency of processing high-dimensional hybrid datasets. This paper introduces a vertically partitioned fuzzy discernibility matrix within the MapReduce computation model to address the high dimensionality of hybrid datasets. The proposed MapReduce strategy for attribute reduction minimizes data movement during the shuffle and sort phase, overcoming limitations present in existing approaches. Furthermore, the method's efficiency is enhanced by integrating a feature known as SAT-region removal, which removes matrix entries that satisfy the maximum satisfiability conditions during the attribute reduction process. Extensive experimental analysis validates the proposed method, demonstrating its superior performance compared to recent parallel/distributed methods in attribute reduction.
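As an illustration of the ideas named in the abstract, the following is a minimal single-machine sketch in Python, not the authors' algorithm. It assumes a common formulation in which each attribute contributes one column of the fuzzy discernibility matrix (one entry per pair of objects with different decisions), a pair is "satisfied" once the selected attributes reach the maximum discernibility that the full attribute set can give for that pair, and satisfied pairs are dropped from later scoring. The vertical partitioning is simulated with plain lists of attribute indices. All function names, the max-based aggregation, the range-normalised distance, and the toy data are assumptions made for this example.

```python
from itertools import combinations


def discernibility(x, y, numeric_range=None):
    """Fuzzy discernibility of two values of one attribute, in [0, 1].

    Assumption: categorical attributes are crisp (0 or 1); numerical
    attributes use a range-normalised absolute difference.
    """
    if numeric_range is None:
        return 0.0 if x == y else 1.0
    return min(1.0, abs(x - y) / numeric_range)


def build_columns(rows, labels, ranges):
    """Return one matrix column per attribute.

    Each column has one entry per object pair with different decisions.
    """
    pairs = [(i, j) for i, j in combinations(range(len(rows)), 2)
             if labels[i] != labels[j]]
    return {a: [discernibility(rows[i][a], rows[j][a], ranges[a])
                for i, j in pairs]
            for a in range(len(ranges))}


def greedy_reduct(columns, partitions):
    """Greedy attribute selection over vertically partitioned columns.

    `partitions` is a list of attribute-index lists that together cover
    every attribute; each plays the role of one worker's share.
    """
    n = len(next(iter(columns.values())))
    # Maximum satisfiability of each pair under the full attribute set.
    target = [max(col[k] for col in columns.values()) for k in range(n)]
    current = [0.0] * n
    active = [k for k in range(n) if target[k] > 0.0]
    reduct = []
    while active:
        # "Map" step: each partition scores only its own attributes and
        # emits a single (gain, -attribute) candidate.
        local_best = []
        for part in partitions:
            scored = [
                (sum(max(current[k], columns[a][k]) - current[k]
                     for k in active), -a)
                for a in part if a not in reduct
            ]
            if scored:
                local_best.append(max(scored))
        # "Reduce" step: pick the global best among the local candidates.
        _, neg_a = max(local_best)
        best = -neg_a
        reduct.append(best)
        current = [max(current[k], columns[best][k]) for k in range(n)]
        # Drop pairs that have reached their maximum satisfiability.
        active = [k for k in active if current[k] < target[k]]
    return reduct


# Toy hybrid data: attribute 0 numerical, 1 categorical, 2 constant.
rows = [(0.1, "a", 5.0), (0.9, "a", 5.0), (0.2, "b", 5.0), (0.8, "b", 5.0)]
labels = [0, 1, 0, 1]
ranges = [1.0, None, 1.0]  # None marks a categorical attribute

columns = build_columns(rows, labels, ranges)
reduct = greedy_reduct(columns, partitions=[[0, 1], [2]])
print(reduct)
```

In this sketch each partition sends only one candidate per iteration to the global selection step, which is one way to read the abstract's point about limiting data movement; how the paper itself organises the shuffle and sort phase is not stated in this record.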
Pages: 16
Related papers
50 in total
  • [21] Comparing and Exploring High-Dimensional Data with Dimensionality Reduction Algorithms and Matrix Visualizations
    Cutura, Rene
    Aupetit, Michael
    Fekete, Jean-Daniel
    Sedlmair, Michael
    PROCEEDINGS OF THE WORKING CONFERENCE ON ADVANCED VISUAL INTERFACES AVI 2020, 2020,
  • [22] SeekAView: An Intelligent Dimensionality Reduction Strategy for Navigating High-Dimensional Data Spaces
    Krause, Josua
    Dasgupta, Aritra
    Fekete, Jean-Daniel
    Bertini, Enrico
    2016 IEEE 6TH SYMPOSIUM ON LARGE DATA ANALYSIS AND VISUALIZATION (LDAV), 2016, : 11 - 19
  • [23] Asynchronous Parallel Fuzzy Stochastic Gradient Descent for High-Dimensional Incomplete Data Representation
    Qin, Wen
    Luo, Xin
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2024, 32 (02) : 445 - 459
  • [24] streamingRPHash: Random Projection Clustering of High-Dimensional Data in a MapReduce Framework
    Franklin, Jacob
    Wenke, Samuel
    Quasem, Sadiq
    Carraher, Lee A.
    Wilsey, Philip A.
    2016 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2016, : 168 - 169
  • [25] High-Dimensional Data Clustering for Customers with Duplicate Attribute Values
    Wu, Sen
    Fu, Liwei
    2016 INTERNATIONAL CONFERENCE ON LOGISTICS, INFORMATICS AND SERVICE SCIENCES (LISS' 2016), 2016,
  • [26] Improved Model for Attribute Selection on High-Dimensional Economic Data
    Somol, Petr
    Pudil, Pavel
    Castek, Ondrej
    Pokorna, Jana
    PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON MANAGEMENT, LEADERSHIP AND GOVERNANCE (ICMLG 2014), 2014, : 276 - 285
  • [27] Efficient Learning on High-dimensional Operational Data
    Samani, Forough Shahab
    Zhang, Hongyi
    Stadler, Rolf
    2019 15TH INTERNATIONAL CONFERENCE ON NETWORK AND SERVICE MANAGEMENT (CNSM), 2019,
  • [28] An Efficient Kernelized Fuzzy Possibilistic C-Means for High-Dimensional Data Clustering
    Shanmugapriya, B.
    Punithavalli, M.
    COMPUTATIONAL VISION AND ROBOTICS, 2015, 332 : 219 - 230
  • [29] Efficient Density Estimation for High-Dimensional Data
    Majdara, Aref
    Nooshabadi, Saeid
    IEEE ACCESS, 2022, 10 : 16592 - 16608
  • [30] Efficient Outlier Detection for High-Dimensional Data
    Liu, Huawen
    Li, Xuelong
    Li, Jiuyong
    Zhang, Shichao
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2018, 48 (12) : 2451 - 2461