Merging of Numerical Intervals in Entropy-Based Discretization

Cited by: 3
Authors
Grzymala-Busse, Jerzy W. [1 ,2 ]
Mroczek, Teresa [2 ]
Affiliations
[1] Univ Kansas, Dept Elect Engn & Comp Sci, Lawrence, KS 66045 USA
[2] Univ Informat Technol & Management, Dept Expert Syst & Artificial Intelligence, PL-35225 Rzeszow, Poland
Keywords
data mining; discretization; numerical attributes; entropy; continuous attributes; prediction
DOI
10.3390/e20110880
Chinese Library Classification (CLC)
O4 [Physics]
Discipline Code
0702
Abstract
As previous research indicates, an entropy-based multiple-scanning methodology for the discretization of numerical datasets is very competitive. Discretization is the process of converting the numerical values of data records into discrete values associated with numerical intervals defined over the attribute domains. In multiple-scanning discretization, the last step is a postprocessing step that merges neighboring intervals in the discretized datasets. Our objective is to check how the error rate, measured by tenfold cross-validation with the C4.5 system, is affected by such merging. We conducted experiments on 17 numerical datasets, using the same multiple-scanning setup, with three different options for merging: no merging at all, merging based on the smallest entropy, and merging based on the biggest entropy. According to the Friedman rank sum test (5% significance level), the differences between the three approaches are statistically insignificant; there is no universally best approach. We then repeated all experiments 30 times, recording averages and standard deviations. A test of the difference between averages shows that, when comparing no merging with merging based on the smallest entropy, the differences are statistically highly significant (1% significance level). In some cases the smaller error rate is associated with no merging, in others with merging based on the smallest entropy. A comparison of no merging with merging based on the biggest entropy showed similar results. Our final conclusion is therefore that there are highly significant differences between no merging and merging, depending on the dataset, and the best approach should be chosen by trying all three.
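The entropy-based merging described above can be illustrated with a short Python sketch. The function names, the data layout (a list of per-interval class-label lists for one attribute), and the single-merge stopping point are assumptions for illustration; the paper's multiple-scanning discretizer, its stopping condition, and any consistency checks are not reproduced. Each candidate merge of two neighboring intervals is scored by the conditional entropy of the class given the intervals, and the pair whose merge yields the smallest (or biggest) entropy is merged:

    # Illustrative sketch of entropy-guided merging of neighboring intervals.
    from collections import Counter
    from math import log2

    def conditional_entropy(interval_labels):
        # Weighted average class entropy over the intervals, H(class | interval).
        total = sum(len(labels) for labels in interval_labels)
        h = 0.0
        for labels in interval_labels:
            if not labels:
                continue
            n = len(labels)
            h_i = -sum((c / n) * log2(c / n) for c in Counter(labels).values())
            h += (n / total) * h_i
        return h

    def merge_once(interval_labels, smallest=True):
        # Merge the pair of neighboring intervals whose union yields the
        # smallest (or, with smallest=False, the biggest) conditional entropy.
        if len(interval_labels) < 2:
            return interval_labels, conditional_entropy(interval_labels)
        best_i, best_h = None, None
        for i in range(len(interval_labels) - 1):
            candidate = (interval_labels[:i]
                         + [interval_labels[i] + interval_labels[i + 1]]
                         + interval_labels[i + 2:])
            h = conditional_entropy(candidate)
            if best_h is None or (h < best_h if smallest else h > best_h):
                best_i, best_h = i, h
        merged = (interval_labels[:best_i]
                  + [interval_labels[best_i] + interval_labels[best_i + 1]]
                  + interval_labels[best_i + 2:])
        return merged, best_h

    # Example: class labels of the cases falling into four neighboring
    # intervals of one numerical attribute.
    intervals = [['a', 'a', 'b'], ['a', 'a'], ['b', 'b'], ['b', 'a']]
    merged, h = merge_once(intervals, smallest=True)
    print(len(merged), round(h, 3))   # 3 intervals remain after one merge

In practice, the merge step would be repeated until some stopping condition (for example, a limit on the number of intervals) is reached; that loop is omitted from this sketch.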
Pages: 12