Parallel attribute reduction in high-dimensional data: An efficient MapReduce strategy with fuzzy discernibility matrix

Authors
Sowkuntla, Pandu [1]
Prasad, P. S. V. S. Sai [2]
Affiliations
[1] SRM Univ AP, Dept Comp Sci & Engn, Amaravati 522502, Andhra Pradesh, India
[2] Univ Hyderabad, Sch Comp & Informat Sci, Hyderabad 500046, Telangana, India
Keywords
Apache Spark; Attribute reduction; Fuzzy-rough sets; Fuzzy discernibility matrix; High dimensionality; Hybrid decision systems; INCREMENTAL FEATURE-SELECTION; ROUGH; MODEL; SETS
DOI
10.1016/j.asoc.2025.112870
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The hybrid paradigm of fuzzy-rough set theory, which combines fuzzy and rough sets, has proven effective in attribute reduction for hybrid decision systems encompassing both numerical and categorical attributes. However, current parallel/distributed approaches are limited to handling datasets with either categorical or numerical attributes and often rely on fuzzy dependency measures. There exists little research on parallel/distributed attribute reduction for large-scale hybrid decision systems. The challenge of handling high-dimensional data in hybrid decision systems necessitates efficient distributed computing techniques to ensure scalability and performance. MapReduce, a widely used framework for distributed processing, provides an organized approach to handling large-scale data. Despite its potential, there is a noticeable lack of attribute reduction techniques that leverage MapReduce's capabilities with a fuzzy discernibility matrix, which can significantly improve the efficiency of processing high-dimensional hybrid datasets. This paper introduces a vertically partitioned fuzzy discernibility matrix within the MapReduce computation model to address the high dimensionality of hybrid datasets. The proposed MapReduce strategy for attribute reduction minimizes data movement during the shuffle and sort phase, overcoming limitations present in existing approaches. Furthermore, the method's efficiency is enhanced by integrating a feature known as SAT-region removal, which removes matrix entries that satisfy the maximum satisfiability conditions during the attribute reduction process. Extensive experimental analysis validates the proposed method, demonstrating its superior performance compared to recent parallel/distributed methods in attribute reduction.
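The ideas named in the abstract (a fuzzy discernibility matrix, vertical partitioning by attribute, and removal of entries that have reached maximum satisfiability) can be illustrated with a small single-process sketch. This is not the authors' algorithm: the abstract does not give their formulas. The sketch assumes a range-normalised difference for numerical attributes, a crisp 0/1 mismatch for categorical ones, `max` as the aggregation operator, and a greedy selection rule. All function names (`discernibility`, `build_columns`, `greedy_reduct`) are hypothetical, and the "map" and "reduce" steps are simulated with plain loops rather than Spark.

```python
from itertools import combinations


def discernibility(x, y, value_range):
    """Assumed fuzzy degree to which one attribute tells two values apart."""
    if isinstance(x, str) or isinstance(y, str):
        return 0.0 if x == y else 1.0          # categorical: crisp mismatch
    return abs(x - y) / value_range if value_range else 0.0


def build_columns(rows, decisions):
    """Build one matrix column per attribute.

    Entries exist only for object pairs whose decision values differ.
    """
    pairs = [(i, j) for i, j in combinations(range(len(rows)), 2)
             if decisions[i] != decisions[j]]
    columns = {}
    for a in range(len(rows[0])):
        vals = [r[a] for r in rows]
        rng = 0.0 if isinstance(vals[0], str) else max(vals) - min(vals)
        columns[a] = [discernibility(rows[i][a], rows[j][a], rng)
                      for i, j in pairs]
    return columns


def greedy_reduct(columns, n_partitions=2, eps=1e-9):
    """Greedy attribute selection over vertically partitioned columns."""
    attrs = sorted(columns)
    n_pairs = len(columns[attrs[0]])
    # Vertical partitioning: each simulated worker owns whole columns.
    partitions = [attrs[p::n_partitions] for p in range(n_partitions)]
    # Maximum satisfiability of each entry, reached with all attributes.
    target = [max(columns[a][k] for a in attrs) for k in range(n_pairs)]
    current = [0.0] * n_pairs                  # satisfiability reached so far
    active = [k for k in range(n_pairs) if target[k] > eps]
    reduct = []
    while active:
        # "Map": every partition scores only its own attributes.
        local_best = []
        for part in partitions:
            scored = [(sum(max(columns[a][k] - current[k], 0.0)
                           for k in active), -a)
                      for a in part if a not in reduct]
            if scored:
                local_best.append(max(scored))
        # "Reduce": one (gain, attribute) pair per partition is compared.
        _, neg_a = max(local_best)
        best = -neg_a
        reduct.append(best)
        current = [max(current[k], columns[best][k]) for k in range(n_pairs)]
        # Drop entries that have reached their maximum satisfiability.
        active = [k for k in active if target[k] - current[k] > eps]
    return reduct
```

In this sketch only one small `(gain, attribute)` tuple per partition crosses the partition boundary in each iteration, which is the kind of reduced data movement the abstract attributes to vertical partitioning. The shrinking `active` list plays the role of SAT-region removal.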
Pages: 16