Performance Improvement Validation of Decision Tree Algorithms with Non-normalized Information Distance in Experiments

Authors
Araki, Takeru [1]
Luo, Yuan [1]
Guo, Minyi [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
Keywords
Decision tree; ID3; algorithm; Information distance; Information gain; Gain ratio;
DOI
10.1007/978-3-031-20862-1_33
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The performance of the ID3 decision tree algorithm depends on the information gain, which has the drawback of tending to select attributes with many values as branching attributes. The gain ratio (notably used in C4.5) was proposed to improve on the information gain, but it does not always improve performance, nor is it always defined. Some researchers have used a normalized information distance to improve on the gain ratio, but this has proved ineffective. In this paper, we investigate two non-normalized information distance selection criteria as replacements for the information gain and the gain ratio, and we conduct detailed experiments on 13 datasets classified into four types, accompanied by theoretical analysis. Surprisingly, on datasets where the number of values differs greatly between attributes, i.e. Type1 and Type2, the non-normalized information distance-based algorithms improve on the accuracy of the ID3 algorithm by about 15-25%. The first reason is that more values for an attribute do not reduce the distances, as suggested by Mantaras. The second reason is that the conditional entropy taken in the opposite direction to the one used in the information gain can counterbalance the bias toward multi-valued attributes. Furthermore, our methods maintain results comparable to those of existing algorithms in the other cases. Compared to the gain ratio, the algorithms with non-normalized information distances overcome the drawback much better on Type1 datasets, which is strongly confirmed by experiments and the corresponding analysis. This suggests that "normalization"-based improvements such as the normalized information distance and the gain ratio are not always effective.
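The criteria named in the abstract can be compared on a toy example. The abstract does not define the paper's two non-normalized distances, so the sketch below assumes one plausible form, the unnormalized sum of both conditional entropies, H(C|A) + H(A|C), in the style of the Mantaras distance. All function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(target, given):
    """H(target | given): entropy of target within each group of given."""
    n = len(target)
    groups = {}
    for t, g in zip(target, given):
        groups.setdefault(g, []).append(t)
    return sum(len(ts) / n * entropy(ts) for ts in groups.values())


def information_gain(attr, cls):
    """ID3 criterion: H(C) - H(C|A). Larger is better."""
    return entropy(cls) - conditional_entropy(cls, attr)


def gain_ratio(attr, cls):
    """C4.5 criterion: gain / H(A). Undefined (None) when H(A) is zero."""
    split_info = entropy(attr)
    if split_info == 0:
        return None
    return information_gain(attr, cls) / split_info


def information_distance(attr, cls):
    """ASSUMED non-normalized distance: H(C|A) + H(A|C). Smaller is better."""
    return conditional_entropy(cls, attr) + conditional_entropy(attr, cls)


# Toy data: both attributes predict the class perfectly,
# but id_like has one distinct value per row.
cls = ["yes", "yes", "no", "no"]
id_like = ["a", "b", "c", "d"]
binary = ["x", "x", "y", "y"]
constant = ["k", "k", "k", "k"]

for name, attr in [("id_like", id_like), ("binary", binary)]:
    print(name, information_gain(attr, cls), information_distance(attr, cls))
```

In this toy case the information gain cannot separate the two attributes, while the H(A|C) term of the assumed distance penalizes the many-valued one, which matches the intuition given in the abstract. The gain ratio of `constant` is returned as `None`, illustrating the case where the gain ratio is not defined.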
Pages: 450-464
Number of pages: 15