Merging of Numerical Intervals in Entropy-Based Discretization

Cited by: 3

Authors:

Grzymala-Busse, Jerzy W. [1,2]

Mroczek, Teresa [2]

Affiliations:

[1] Univ Kansas, Dept Elect Engn & Comp Sci, Lawrence, KS 66045 USA

[2] Univ Informat Technol & Management, Dept Expert Syst & Artificial Intelligence, PL-35225 Rzeszow, Poland

Source:

ENTROPY | 2018 / Vol. 20 / Issue 11

Keywords:

data mining; discretization; numerical attributes; entropy; CONTINUOUS ATTRIBUTES; PREDICTION;

DOI:

10.3390/e20110880

Chinese Library Classification (CLC) number:

O4 [Physics];

Discipline classification code:

0702;

Abstract:

As previous research indicates, a multiple-scanning methodology for discretization of numerical datasets, based on entropy, is very competitive. Discretization is a process of converting numerical values of the data records into discrete values associated with numerical intervals defined over the domains of the data records. In multiple-scanning discretization, the last step is the merging of neighboring intervals in discretized datasets as a kind of postprocessing. Our objective is to check how the error rate, measured by tenfold cross validation within the C4.5 system, is affected by such merging. We conducted experiments on 17 numerical datasets, using the same setup of multiple scanning, with three different options for merging: no merging at all, merging based on the smallest entropy, and merging based on the biggest entropy. As a result of the Friedman rank sum test (5% significance level) we concluded that the differences between all three approaches are statistically insignificant. There is no universally best approach. Then, we repeated all experiments 30 times, recording averages and standard deviations. The test of the difference between averages shows that, for a comparison of no merging with merging based on the smallest entropy, there are statistically highly significant differences (with a 1% significance level). In some cases, the smaller error rate is associated with no merging, in some cases the smaller error rate is associated with merging based on the smallest entropy. A comparison of no merging with merging based on the biggest entropy showed similar results. So, our final conclusion was that there are highly significant differences between no merging and merging, depending on the dataset. The best approach should be chosen by trying all three approaches.
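To make the notions of discretization and interval merging in the abstract concrete, here is a minimal Python sketch. It is not the authors' multiple-scanning procedure. It assumes that intervals are defined by a sorted list of cut points, that merging two neighboring intervals means removing the cut point between them, and that candidate merges are scored by the conditional entropy of the class given the discretized attribute. The function names (`discretize`, `conditional_entropy`, `merge_candidates`) and the toy data are hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from math import log2


def discretize(values, cut_points):
    """Map each numerical value to the index of the interval it falls into."""
    cuts = sorted(cut_points)
    return [bisect_right(cuts, v) for v in values]


def conditional_entropy(interval_ids, labels):
    """H(class | interval): per-interval class entropy, weighted by interval size."""
    total = len(labels)
    by_interval = {}
    for i, y in zip(interval_ids, labels):
        by_interval.setdefault(i, Counter())[y] += 1
    h = 0.0
    for counts in by_interval.values():
        n = sum(counts.values())
        h_interval = -sum((c / n) * log2(c / n) for c in counts.values())
        h += (n / total) * h_interval
    return h


def merge_candidates(values, labels, cut_points):
    """Score each possible merge: the conditional entropy after removing one cut point.

    Assumption: a merge of two neighboring intervals is modeled as dropping
    the cut point that separates them.
    """
    cuts = sorted(cut_points)
    scores = {}
    for c in cuts:
        remaining = [x for x in cuts if x != c]
        scores[c] = conditional_entropy(discretize(values, remaining), labels)
    return scores


# Toy data (hypothetical): one numerical attribute and a two-class decision.
values = [1.0, 1.5, 2.0, 3.0, 3.5, 4.0]
labels = ["a", "a", "a", "b", "b", "b"]
cuts = [1.75, 2.5]

scores = merge_candidates(values, labels, cuts)
smallest = min(scores, key=scores.get)  # merge giving the smallest entropy
biggest = max(scores, key=scores.get)   # merge giving the biggest entropy
```

On this toy data, removing the cut point 1.75 leaves two class-pure intervals, so it is the smallest-entropy merge, while removing 2.5 mixes the classes and is the biggest-entropy merge. How the paper itself ranks and applies merges may differ from this scoring.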


Pages: 12

Total: 50 items

[31] Entropy-based Dyslalia Screening
Mahmut, Emilian-Erman
Della Ventura, Michele
Berian, Dorin
Stoicu-Tivadar, Vasile
HEALTH INFORMATICS VISION: FROM DATA VIA INFORMATION TO KNOWLEDGE, 2019, 262 : 252 - 255
[32] A Similarity Measurement with Entropy-Based Weighting for Clustering Mixed Numerical and Categorical Datasets
Que, Xia
Jiang, Siyuan
Yang, Jiaoyun
An, Ning
ALGORITHMS, 2021, 14 (06)
[33] Unsupervised Discretization Method based on Adjustable Intervals
Bennasar, Mohamed
Setchi, Rossitza
Hicks, Yulia
ADVANCES IN KNOWLEDGE-BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, 2012, 243 : 79 - 87
[34] An entropy-based measure of founder informativeness
Reyes-Valdés, MH
Williams, CG
GENETICS RESEARCH, 2005, 85 (01) : 81 - 88
[35] An entropy-based metric for product remanufacturability
Ramoni M.O.
Zhang H.-C.
Journal of Remanufacturing, 2 (1)
[36] Entropy-Based Static Index Pruning
Zheng, Lei
Cox, Ingemar J.
ADVANCES IN INFORMATION RETRIEVAL, PROCEEDINGS, 2009, 5478 : 713 - 718
[37] Data Entropy-Based Imbalanced Learning
Fan, Yutao
Huang, Heming
RECENT ADVANCES IN NEXT-GENERATION DATA SCIENCE, SDSC 2024, 2024, 2158 : 95 - 109
[38] EWRPL: entropy-based weighted RPL
Kamble, Sneha
Chandavarkar, B. R.
WIRELESS NETWORKS, 2025, 31 (01) : 613 - 622
[39] Entropy-based fade modeling and detection
San Pedro Wandelmer, Jose
Dominguez Cabrerizo, Sergio
Denis, Nicolas
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2007, 23 (04) : 1265 - 1280
[40] Entropy-based operations on fuzzy sets
Rudas, IJ
Kaynak, MO
IEEE TRANSACTIONS ON FUZZY SYSTEMS, 1998, 6 (01) : 33 - 40
