Parallel attribute reduction in high-dimensional data: An efficient MapReduce strategy with fuzzy discernibility matrix

Times cited: 0
Authors
Sowkuntla, Pandu [1]
Prasad, P. S. V. S. Sai [2]
Affiliations
[1] SRM Univ AP, Dept Comp Sci & Engn, Amaravati 522502, Andhra Pradesh, India
[2] Univ Hyderabad, Sch Comp & Informat Sci, Hyderabad 500046, Telangana, India
Keywords
Apache Spark; Attribute reduction; Fuzzy-rough sets; Fuzzy discernibility matrix; High dimensionality; Hybrid decision systems; INCREMENTAL FEATURE-SELECTION; ROUGH; MODEL; SETS
DOI
10.1016/j.asoc.2025.112870
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The hybrid paradigm of fuzzy-rough set theory, which combines fuzzy and rough sets, has proven effective for attribute reduction in hybrid decision systems that contain both numerical and categorical attributes. However, current parallel/distributed approaches are limited to datasets with either categorical or numerical attributes and often rely on fuzzy dependency measures. Little research exists on parallel/distributed attribute reduction for large-scale hybrid decision systems. Handling high-dimensional data in hybrid decision systems requires efficient distributed computing techniques to ensure scalability and performance. MapReduce, a widely used framework for distributed processing, provides an organized approach to handling large-scale data. Despite this potential, there is a noticeable lack of attribute reduction techniques that combine MapReduce with a fuzzy discernibility matrix, a combination that can significantly improve the efficiency of processing high-dimensional hybrid datasets. This paper introduces a vertically partitioned fuzzy discernibility matrix within the MapReduce computation model to address the high dimensionality of hybrid datasets. The proposed MapReduce strategy for attribute reduction minimizes data movement during the shuffle and sort phase, overcoming limitations of existing approaches. Furthermore, the method's efficiency is enhanced by integrating a technique called SAT-region removal, which removes matrix entries that satisfy the maximum satisfiability conditions during the attribute reduction process. Extensive experimental analysis validates the proposed method, demonstrating its superior performance compared to recent parallel/distributed attribute reduction methods.
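The abstract describes the approach only at a high level. The Python sketch below illustrates the general idea of a vertically partitioned discernibility matrix with removal of satisfied entries, under simplifying assumptions that are not taken from the paper: each attribute's fuzzy similarity is thresholded by a hypothetical `LAMBDA` to decide whether it discerns a pair of objects with different decisions, the reduct is found with a plain greedy hitting-set heuristic, and ordinary Python functions stand in for MapReduce/Spark tasks. The toy table and all names (`build_columns`, `greedy_reduct`, `LAMBDA`) are hypothetical, and the paper's actual fuzzy discernibility matrix and SAT-region removal conditions may differ.

```python
from itertools import combinations

# Hypothetical toy hybrid decision table: a1, a2 numerical; a3 categorical; d = decision.
ROWS = [
    {"a1": 0.1, "a2": 1.0, "a3": "red", "d": "yes"},
    {"a1": 0.9, "a2": 1.2, "a3": "red", "d": "no"},
    {"a1": 0.2, "a2": 5.0, "a3": "blue", "d": "no"},
    {"a1": 0.8, "a2": 4.9, "a3": "green", "d": "yes"},
]
NUMERIC = {"a1", "a2"}
LAMBDA = 0.5  # hypothetical threshold on the discernibility degree 1 - R_a(x, y)


def similarity(a, u, v, spread):
    """Fuzzy similarity R_a(u, v) in [0, 1]: scaled distance for numerical a, equality otherwise."""
    if a in NUMERIC:
        return 1.0 if spread == 0 else max(0.0, 1.0 - abs(u - v) / spread)
    return 1.0 if u == v else 0.0


def build_columns(rows, attrs):
    """One vertical partition (e.g. one worker): for each attribute it owns, the set of
    decision-discordant object pairs that attribute discerns, i.e. one matrix column."""
    cols = {}
    for a in attrs:
        spread = (max(r[a] for r in rows) - min(r[a] for r in rows)) if a in NUMERIC else 0.0
        cols[a] = {
            (i, j)
            for i, j in combinations(range(len(rows)), 2)
            if rows[i]["d"] != rows[j]["d"]
            and 1.0 - similarity(a, rows[i][a], rows[j][a], spread) >= LAMBDA
        }
    return cols


def greedy_reduct(partitions):
    """Greedy hitting-set reduct over vertically partitioned columns.

    Each round, partitions report only (count, attribute) tuples, so little data
    moves between workers. Matrix entries (pairs) hit by the chosen attribute are
    then dropped, a loose analogue of removing already-satisfied entries.
    """
    remaining = set().union(*(p[a] for p in partitions for a in p))
    reduct = []
    while remaining:
        # "Map": per-attribute counts of still-uncovered pairs, computed locally.
        counts = [(len(p[a] & remaining), a) for p in partitions for a in p if a not in reduct]
        _, best = max(counts)
        reduct.append(best)
        # "Broadcast": pairs discerned by the chosen attribute are satisfied; remove them.
        owner = next(p for p in partitions if best in p)
        remaining -= owner[best]
    return reduct


if __name__ == "__main__":
    # Two vertical partitions owning disjoint attribute subsets.
    parts = [build_columns(ROWS, ["a1", "a2"]), build_columns(ROWS, ["a3"])]
    print(greedy_reduct(parts))  # ['a3', 'a1']
```

On this toy table the categorical attribute `a3` discerns three of the four discordant pairs and is chosen first; `a1` then covers the remaining pair, so `a2` is dropped as redundant. Keeping each partition's columns local and exchanging only counts is one way to limit shuffled data, which is the concern the abstract raises, but the paper's own data flow is not specified here.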
Pages: 16