Mining significant association rules from uncertain data

Authors
Anshu Zhang
Wenzhong Shi
Geoffrey I. Webb
Affiliations
[1] The Hong Kong Polytechnic University, Department of Land Surveying and Geo
[2] Monash University, Informatics
Keywords
Pattern discovery; Association rules; Statistical evaluation; Uncertain data
DOI
Not available
Abstract
In association rule mining, the trade-off between avoiding harmful spurious rules and preserving authentic ones is a persistent barrier to obtaining reliable and useful results. The statistically sound technique for evaluating the statistical significance of association rules is superior at preventing spurious rules, yet it can also cause severe loss of true rules in the presence of data error. This study presents a new and improved method for statistically testing association rules mined from uncertain, error-prone data. An original mathematical model was established to describe how data error propagates through the computational procedures of the statistical test. Based on this error model, a scheme combining analytic and simulative processes was designed to correct the statistical test for distortions caused by data error. Experiments on both synthetic and real-world data show that the method significantly recovers the true rules that the original statistically sound method loses to data error (that is, it reduces type II error). Meanwhile, the new method maintains effective control over the familywise error rate, which is the distinctive advantage of the original statistically sound technique. Furthermore, the method is robust to inaccurate information about data error probabilities and to situations that violate the commonly accepted assumption that the error probabilities of different data items are independent. The method is particularly effective for the rules that are most practically meaningful yet most sensitive to data error. The method is promising for enhancing the value of association rule mining results and helping users make correct decisions.
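The abstract does not state the formulas of the test, so the following is only a rough sketch of the problem it describes, not the authors' method. It assumes a one-sided Fisher exact test with a Bonferroni-adjusted threshold as a stand-in for a statistically sound significance test, and it shows how independent random item errors inflate the p-value of a genuinely associated rule X -> Y. The function names, the toy data, and the 20% error probability are all hypothetical; the paper's error-propagation model and correction scheme are not reproduced here.

```python
import math
import random


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a = transactions with X and Y, b = X without Y,
    c = Y without X, d = neither. Small p suggests positive association.
    """
    n = a + b + c + d
    row1 = a + b  # transactions containing X
    col1 = a + c  # transactions containing Y
    # Sum hypergeometric weights of tables at least as extreme as observed.
    tail = sum(
        math.comb(row1, k) * math.comb(n - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return tail / math.comb(n, col1)


def contingency(transactions, x, y):
    """Count the four cells of the X-by-Y contingency table."""
    a = b = c = d = 0
    for t in transactions:
        has_x, has_y = x in t, y in t
        if has_x and has_y:
            a += 1
        elif has_x:
            b += 1
        elif has_y:
            c += 1
        else:
            d += 1
    return a, b, c, d


def flip_items(transactions, items, error_prob, rng):
    """Assumed error model: each item's presence flips independently."""
    noisy = []
    for t in transactions:
        s = set(t)
        for item in items:
            if rng.random() < error_prob:
                s.symmetric_difference_update({item})
        noisy.append(frozenset(s))
    return noisy


def is_significant(p_value, n_rules_tested, alpha=0.05):
    """Bonferroni threshold: keeps the familywise error rate at most alpha."""
    return p_value <= alpha / n_rules_tested


def make_toy_data():
    """Hypothetical clean data in which X and Y are strongly associated."""
    return (
        [frozenset({"X", "Y"})] * 200
        + [frozenset({"X"})] * 50
        + [frozenset({"Y"})] * 50
        + [frozenset()] * 700
    )


if __name__ == "__main__":
    clean = make_toy_data()
    noisy = flip_items(clean, ["X", "Y"], error_prob=0.2, rng=random.Random(0))
    for name, data in (("clean", clean), ("noisy", noisy)):
        p = fisher_exact_greater(*contingency(data, "X", "Y"))
        print(name, p, is_significant(p, n_rules_tested=10_000))
```

The noisy p-value is larger than the clean one because item errors push the observed table toward independence. With a large enough search space or a weaker true association, that inflation can carry a true rule past the adjusted threshold, which is the loss the abstract attributes to data error.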
Pages: 928-963
Page count: 35