Merging of Numerical Intervals in Entropy-Based Discretization

Cited by: 3
Authors
Grzymala-Busse, Jerzy W. [1, 2]
Mroczek, Teresa [2]
Affiliations
[1] Univ Kansas, Dept Elect Engn & Comp Sci, Lawrence, KS 66045 USA
[2] Univ Informat Technol & Management, Dept Expert Syst & Artificial Intelligence, PL-35225 Rzeszow, Poland
Keywords
data mining; discretization; numerical attributes; entropy; continuous attributes; prediction
DOI
10.3390/e20110880
Chinese Library Classification
O4 [Physics]
Discipline classification code
0702
Abstract
As previous research indicates, a multiple-scanning methodology for discretization of numerical datasets, based on entropy, is very competitive. Discretization is a process of converting numerical values of the data records into discrete values associated with numerical intervals defined over the domains of the data records. In multiple-scanning discretization, the last step is the merging of neighboring intervals in discretized datasets as a kind of postprocessing. Our objective is to check how the error rate, measured by ten-fold cross-validation within the C4.5 system, is affected by such merging. We conducted experiments on 17 numerical datasets, using the same setup of multiple scanning, with three different options for merging: no merging at all, merging based on the smallest entropy, and merging based on the biggest entropy. As a result of the Friedman rank sum test (5% significance level), we concluded that the differences between the three approaches are not statistically significant. There is no universally best approach. Then, we repeated all experiments 30 times, recording averages and standard deviations. The test of the difference between averages shows that, for a comparison of no merging with merging based on the smallest entropy, there are statistically highly significant differences (at the 1% significance level). In some cases the smaller error rate is associated with no merging; in other cases it is associated with merging based on the smallest entropy. A comparison of no merging with merging based on the biggest entropy showed similar results. Our final conclusion is that there are highly significant differences between no merging and merging, depending on the dataset. The best approach should be chosen by trying all three approaches.
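The abstract does not spell out how a pair of neighboring intervals is selected for merging, so the following Python sketch is only an illustration under stated assumptions, not the authors' procedure. It assumes a single numerical attribute, a list of cut points, and a merging step that removes the one cut point whose removal yields the smallest (or biggest) conditional entropy of the class labels given the discretized attribute. The function names `entropy`, `conditional_entropy`, and `merge_once` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def interval_labels(values, labels, cuts):
    """Group class labels by the interval their attribute value falls into."""
    bounds = sorted(cuts)
    groups = [[] for _ in range(len(bounds) + 1)]
    for v, y in zip(values, labels):
        # Interval index = number of cut points at or below the value.
        groups[sum(v >= c for c in bounds)].append(y)
    return groups


def conditional_entropy(values, labels, cuts):
    """Entropy of the labels given the intervals induced by the cut points."""
    n = len(labels)
    groups = interval_labels(values, labels, cuts)
    return sum(len(g) / n * entropy(g) for g in groups if g)


def merge_once(values, labels, cuts, pick=min):
    """Merge one pair of neighboring intervals by removing one cut point.

    Assumption (not from the source): pick=min removes the cut point whose
    removal gives the smallest conditional entropy, pick=max the biggest.
    """
    if not cuts:
        return []
    removed = pick(
        cuts,
        key=lambda c: conditional_entropy(
            values, labels, [k for k in cuts if k != c]
        ),
    )
    return [k for k in cuts if k != removed]


# Toy data: one attribute, two classes, two initial cut points.
values = [1, 2, 3, 4, 5, 6]
labels = ["a", "a", "a", "a", "b", "b"]
cuts = [2.5, 4.5]

print(merge_once(values, labels, cuts, pick=min))  # smallest-entropy merge
print(merge_once(values, labels, cuts, pick=max))  # biggest-entropy merge
```

On this toy data, the smallest-entropy option merges the two pure "a" intervals and keeps the cut at 4.5, while the biggest-entropy option removes that cut and keeps 2.5. In the study itself, each merging option is evaluated by the resulting C4.5 error rate rather than by entropy alone.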
Pages: 12
Related papers
50 in total
  • [1] Entropy-based image merging
    German, A
    Jenkin, MR
    Lespérance, Y
    2nd Canadian Conference on Computer and Robot Vision, Proceedings, 2005 : 81 - 85
  • [2] Entropy-based discretization methods for ranking data
    de Sa, Claudio Rebelo
    Soares, Carlos
    Knobbe, Arno
    INFORMATION SCIENCES, 2016, 329 : 921 - 936
  • [3] Evolutionary Optimization Guided by Entropy-Based Discretization
    Sheri, Guleng
    Corne, David W.
    APPLICATIONS OF EVOLUTIONARY COMPUTING, PROCEEDINGS, 2009, 5484 : 695 - 704
  • [4] Biometric recognition using entropy-based discretization
    Kumar, Ajay
    Zhang, David
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PTS 1-3, 2007 : 125 - +
  • [5] Reduced Data Sets and Entropy-Based Discretization
    Grzymala-Busse, Jerzy W.
    Hippe, Zdzislaw S.
    Mroczek, Teresa
    ENTROPY, 2019, 21 (11)
  • [6] Online entropy-based discretization for data streaming classification
    Ramirez-Gallego, S.
    Garcia, S.
    Herrera, F.
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 86 : 59 - 70
  • [7] An entropy-based discretization method for classification rules with inconsistency checking
    Li, RP
    Wang, ZO
    2002 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-4, PROCEEDINGS, 2002 : 243 - 246
  • [8] Hand-geometry recognition using entropy-based discretization
    Kumar, Ajay
    Zhang, David
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2007, 2 (02) : 181 - 187
  • [9] Forecasting Stock Price Based on Fuzzy Time-Series with Entropy-Based Discretization Partitioning
    Chen, Bo-Tsuen
    Chen, Mu-Yen
    Chiang, Hsiu-Sen
    Chen, Chia-Chen
    KNOWLEDGE-BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT II: 15TH INTERNATIONAL CONFERENCE, KES 2011, 2011, 6882 : 382 - 391
  • [10] Hybrid Fuzzy Genetics-based Machine Learning with Entropy-based Inhomogeneous Interval Discretization
    Takahashi, Yuji
    Nojima, Yusuke
    Ishibuchi, Hisao
    2014 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2014 : 1512 - 1517