Merging of Numerical Intervals in Entropy-Based Discretization

Cited by: 3
Authors
Grzymala-Busse, Jerzy W. [1, 2]
Mroczek, Teresa [2]
Affiliations
[1] Univ Kansas, Dept Elect Engn & Comp Sci, Lawrence, KS 66045 USA
[2] Univ Informat Technol & Management, Dept Expert Syst & Artificial Intelligence, PL-35225 Rzeszow, Poland
Keywords
data mining; discretization; numerical attributes; entropy; continuous attributes; prediction
DOI
10.3390/e20110880
Chinese Library Classification number
O4 [Physics]
Discipline classification code
0702
Abstract
As previous research indicates, a multiple-scanning methodology for discretization of numerical datasets, based on entropy, is very competitive. Discretization is the process of converting numerical values of the data records into discrete values associated with numerical intervals defined over the domains of the data records. In multiple-scanning discretization, the last step is the merging of neighboring intervals in discretized datasets as a kind of postprocessing. Our objective is to check how the error rate, measured by tenfold cross validation within the C4.5 system, is affected by such merging. We conducted experiments on 17 numerical datasets, using the same setup of multiple scanning, with three different options for merging: no merging at all, merging based on the smallest entropy, and merging based on the biggest entropy. Using the Friedman rank sum test (5% significance level), we concluded that the differences between the three approaches are not statistically significant. There is no universally best approach. We then repeated all experiments 30 times, recording averages and standard deviations. The test of the difference between averages shows that, for a comparison of no merging with merging based on the smallest entropy, there are statistically highly significant differences (at the 1% significance level). In some cases the smaller error rate is associated with no merging; in others, with merging based on the smallest entropy. A comparison of no merging with merging based on the biggest entropy showed similar results. Our final conclusion is that there are highly significant differences between no merging and merging, depending on the dataset. The best approach should be chosen by trying all three approaches.
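The merging step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the entropy of a merge is computed, so this sketch assumes that each pair of neighboring intervals is scored by the class entropy of the labels the merged interval would contain; the names `entropy` and `merge_once` and the `(low, high, labels)` representation are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def merge_once(intervals, pick=min):
    """Merge one pair of neighboring intervals and return the new list.

    intervals: list of (low, high, labels) tuples sorted by `low`.
    pick: `min` for smallest-entropy merging, `max` for biggest-entropy
          merging (an assumed reading of the two options in the abstract).
    """
    if len(intervals) < 2:
        return list(intervals)
    # Score each neighboring pair by the entropy of the merged label set.
    scores = [
        entropy(intervals[i][2] + intervals[i + 1][2])
        for i in range(len(intervals) - 1)
    ]
    i = scores.index(pick(scores))
    low, _, left = intervals[i]
    _, high, right = intervals[i + 1]
    return intervals[:i] + [(low, high, left + right)] + intervals[i + 2:]


# Toy data: three intervals over one attribute, with class labels per interval.
toy = [(0, 1, ["a", "a"]), (1, 2, ["a"]), (2, 3, ["b", "b"])]
smallest = merge_once(toy, min)  # merges the two pure-"a" intervals
biggest = merge_once(toy, max)   # merges the mixed "a"/"b" neighbors
```

Passing `min` or `max` as `pick` keeps the two merging options in one function; the "no merging" option corresponds to leaving the interval list unchanged.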
Pages: 12