Incremental Optimization Mechanism for Constructing a Decision Tree in Data Stream Mining

Cited by: 16
Authors
Yang, Hang [1]
Fong, Simon [1]
Affiliations
[1] Univ Macau, Fac Sci & Technol, Dept Comp & Informat Sci, Taipa, Peoples R China
DOI
10.1155/2013/580397
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
Imperfect data streams lead to tree size explosion and degraded accuracy. Overfitting and imbalanced class distributions reduce the performance of the original decision-tree algorithm for stream mining. In this paper, we propose an incremental optimization mechanism to address these problems. The mechanism, called the Optimized Very Fast Decision Tree (OVFDT), has an optimized node-splitting control mechanism. Accuracy, tree size, and learning time are the main factors that determine the algorithm's performance; naturally, a bigger tree takes longer to compute. OVFDT is a pioneering model equipped with an incremental optimization mechanism that seeks a balance between accuracy and tree size for data stream mining. It operates incrementally using a test-then-train approach. Three types of functional tree leaves improve the accuracy with which the tree model predicts new data stream instances in the testing phase, and the optimized node-splitting mechanism controls tree growth in the training phase. The experiments show that OVFDT obtains an optimal tree structure on both numeric and nominal datasets.
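The test-then-train approach mentioned in the abstract can be illustrated with a minimal sketch: each arriving instance is first used to test the current model and only afterwards to train it. The learner below is a hypothetical majority-class placeholder, not OVFDT itself, and the names `MajorityClassLearner` and `prequential_accuracy` are assumptions introduced only for illustration.

```python
from collections import Counter
from typing import Any, Hashable, Iterable, Optional, Tuple


class MajorityClassLearner:
    """Hypothetical stand-in for an incremental learner (NOT OVFDT)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def predict(self, x: Any) -> Optional[Hashable]:
        # Before any training, no prediction is possible.
        if not self.counts:
            return None
        # Predict the class seen most often so far.
        return self.counts.most_common(1)[0][0]

    def learn(self, x: Any, y: Hashable) -> None:
        # Incremental update: one instance at a time, no stored history.
        self.counts[y] += 1


def prequential_accuracy(
    stream: Iterable[Tuple[Any, Hashable]], learner: MajorityClassLearner
) -> float:
    """Test-then-train: predict on each instance first, then learn from it."""
    correct = 0
    total = 0
    for x, y in stream:
        if learner.predict(x) == y:  # test phase
            correct += 1
        learner.learn(x, y)          # train phase
        total += 1
    return correct / total if total else 0.0


# Toy stream of (features, label) pairs, made up for this example.
stream = [((0,), "a"), ((1,), "a"), ((2,), "b"), ((3,), "a")]
accuracy = prequential_accuracy(stream, MajorityClassLearner())
print(accuracy)
```

Because every instance is scored before the model has seen its label, the running accuracy is measured on unseen data without a separate hold-out set, which is why this scheme suits unbounded streams.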
Pages: 14
Related papers
50 in total
  • [41] Constructing a decision tree for graph-structured data and its applications
    Geamsakul, W
    Yoshida, T
    Ohara, K
    Motoda, H
    Yokoi, H
    Takabayashi, K
    FUNDAMENTA INFORMATICAE, 2005, 66 (1-2) : 131 - 160
  • [42] Extremely Fast Decision Tree Mining for Evolving Data Streams
    Bifet, Albert
    Zhang, Jiajin
    Fan, Wei
    He, Cheng
    Zhang, Jianfeng
    Qian, Jianfeng
    Holmes, Geoff
    Pfahringer, Bernhard
    KDD'17: PROCEEDINGS OF THE 23RD ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2017, : 1733 - 1742
  • [43] Generalization and decision tree induction: Efficient classification in data mining
    Kamber, M
    Winstone, L
    Gong, W
    Cheng, S
    Han, JW
    SEVENTH INTERNATIONAL WORKSHOP ON RESEARCH ISSUES IN DATA ENGINEERING, PROCEEDINGS: HIGH PERFORMANCE DATABASE MANAGEMENT FOR LARGE-SCALE APPLICATIONS, 1997, : 111 - 120
  • [44] Research on the application of data mining algorithm based on decision tree
    Song, Liangong
    Metallurgical and Mining Industry, 2015, 7 (09): : 843 - 848
  • [45] Collective data mining in the ant colony decision tree approach
    Kozak, Jan
    Boryczka, Urszula
    INFORMATION SCIENCES, 2016, 372 : 126 - 147
  • [46] A hybrid decision tree/genetic algorithm method for data mining
    Carvalho, DR
    Freitas, AA
    INFORMATION SCIENCES, 2004, 163 (1-3) : 13 - 35
  • [47] Data Mining And Analysis Of Our Agriculture Based On The Decision Tree
    Gao Yi-yang
    Ren Nan-ping
    2009 ISECS INTERNATIONAL COLLOQUIUM ON COMPUTING, COMMUNICATION, CONTROL, AND MANAGEMENT, VOL II, 2009, : 134 - +
  • [48] Improving the Prediction Accuracy of Decision Tree Mining with Data Preprocessing
    Dept. of Computer Science, Kennesaw State University, Marietta
    GA, United States
    Unknown
    MD, United States
    Proc Int Comput Software Appl Conf, (481-484):
  • [49] Parallelism of spatial data mining based on autocorrelation decision tree
    Zhang Shuyu & Zhu Zhongying, Dept. of Automation
    Journal of Systems Engineering and Electronics, 2005, (04) : 947 - 956
  • [50] Case study: Visualization for decision tree analysis in data mining
    Barlow, T
    Neville, P
    IEEE SYMPOSIUM ON INFORMATION VISUALIZATION 2001, PROCEEDINGS, 2001, : 149 - 152