Data mining in large databases using domain generalization graphs

Times cited: 20
Authors
Hilderman, RJ [1]
Hamilton, HJ
Cercone, N
Affiliations
[1] Univ Regina, Dept Comp Sci, Regina, SK S4S 0A2, Canada
[2] Univ Waterloo, Fac Math, Dept Comp Sci, Waterloo, ON N2L 3G1, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
data mining; knowledge discovery; machine learning; knowledge representation; attribute-oriented generalization; domain generalization graphs;
DOI
10.1023/A:1008769516670
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Attribute-oriented generalization summarizes the information in a relational database by repeatedly replacing specific attribute values with more general concepts according to user-defined concept hierarchies. We introduce domain generalization graphs (DGGs) for controlling the generalization of a set of attributes and show how they are constructed. We then present serial and parallel versions of the Multi-Attribute Generalization algorithm, which traverses the generalization state space described by joining the DGGs for multiple attributes. Using a generate-and-test approach, the algorithm generates all possible summaries consistent with the DGGs. Our experimental results show that significant speedups are possible by partitioning path combinations from the DGGs across multiple processors. We also rank the interestingness of the resulting summaries using measures based on variance and relative entropy. The experimental results also show that these measures provide an effective basis for analyzing summary data generated from relational databases. Variance appears to be the more useful of the two measures because it tends to rank less complex summaries (i.e., those with few attributes and/or tuples) as more interesting.
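The abstract describes generating every summary consistent with a set of DGGs and then ranking the summaries by variance and relative entropy. The following is a minimal Python sketch of that idea, not the authors' implementation. Assumptions: the toy relation, the attribute names, and the DGGs are hypothetical; each DGG node stores a mapping composed directly from the raw values instead of explicit edges; the variance measure is taken as the sample variance of tuple probabilities around the uniform value 1/m, and the relative entropy measure as the KL divergence from the uniform distribution, which may differ in detail from the paper's definitions.

```python
from collections import Counter
from itertools import product
from math import log2

# Hypothetical toy relation with attributes (city, age).
RELATION = [
    ("Regina", 23), ("Regina", 35), ("Saskatoon", 41),
    ("Waterloo", 29), ("Toronto", 52), ("Toronto", 38),
]
ATTRS = ["city", "age"]

# Hypothetical concept hierarchy for the city attribute.
PROVINCE = {"Regina": "SK", "Saskatoon": "SK", "Waterloo": "ON", "Toronto": "ON"}

# Hypothetical DGGs: for each attribute, every node (domain) maps a raw value
# straight into that domain. Every DGG ends at the most general node "ANY".
DGGS = {
    "city": {
        "city": lambda v: v,
        "province": lambda v: PROVINCE[v],
        "ANY": lambda v: "ANY",
    },
    "age": {
        "age": lambda v: v,
        "decade": lambda v: f"{v // 10 * 10}s",
        "ANY": lambda v: "ANY",
    },
}


def summarize(relation, nodes):
    """Generalize each tuple to the chosen DGG node for every attribute and
    merge identical generalized tuples, keeping a count for each."""
    funcs = [DGGS[attr][node] for attr, node in zip(ATTRS, nodes)]
    return Counter(tuple(f(v) for f, v in zip(funcs, row)) for row in relation)


def all_summaries(relation):
    """Generate one summary per combination of DGG nodes (one node per attribute)."""
    node_lists = [list(DGGS[attr]) for attr in ATTRS]
    return {nodes: summarize(relation, nodes) for nodes in product(*node_lists)}


def variance_interest(summary):
    """Sample variance of tuple probabilities around the uniform value 1/m."""
    counts = list(summary.values())
    m, total = len(counts), sum(counts)
    if m < 2:
        return 0.0  # assumption: a single-tuple summary scores 0
    q = 1 / m
    return sum((c / total - q) ** 2 for c in counts) / (m - 1)


def relative_entropy_interest(summary):
    """KL divergence (in bits) of the tuple distribution from the uniform one."""
    counts = list(summary.values())
    m, total = len(counts), sum(counts)
    q = 1 / m
    return sum((c / total) * log2((c / total) / q) for c in counts)


if __name__ == "__main__":
    summaries = all_summaries(RELATION)
    ranked = sorted(summaries.items(),
                    key=lambda kv: variance_interest(kv[1]), reverse=True)
    for nodes, summary in ranked:
        print(nodes, dict(summary),
              round(variance_interest(summary), 4),
              round(relative_entropy_interest(summary), 4))
```

Enumerating the Cartesian product of DGG nodes visits each candidate summary once. The paper's parallel version instead partitions path combinations from the DGGs across processors, which this serial sketch does not attempt.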
Pages: 195-234
Number of pages: 40
Related papers
50 in total
  • [1] Data Mining in Large Databases Using Domain Generalization Graphs
    Robert J. Hilderman
    Howard J. Hamilton
    Nick Cercone
    Journal of Intelligent Information Systems, 1999, 13 : 195 - 234
  • [2] Methods of data mining and knowledge generalization in large databases
    Vagin, VN
    Fedotov, AA
    Fomin, MV
    JOURNAL OF COMPUTER AND SYSTEMS SCIENCES INTERNATIONAL, 1999, 38 (05) : 714 - 727
  • [3] Spatio-temporal data mining with expected distribution domain generalization graphs
    Hamilton, HJ
    Geng, L
    Findlater, L
    Randall, DJ
    TIME-ICTL 2003: 10TH INTERNATIONAL SYMPOSIUM ON TEMPORAL REPRESENTATION AND REASONING AND FOURTH INTERNATIONAL CONFERENCE ON TEMPORAL LOGIC, PROCEEDINGS, 2003, : 181 - 191
  • [4] Data mining on large graphs
    Palmer, CR
    Gibbons, PB
    Faloutsos, C
    DYNAMIC SOCIAL NETWORK MODELING AND ANALYSIS, 2003, : 265 - 286
  • [5] Generalization for calendar attributes using domain generalization graphs
    Randall, DJ
    Hamilton, HJ
    Hilderman, RJ
    FIFTH INTERNATIONAL WORKSHOP ON TEMPORAL REPRESENTATION AND REASONING - PROCEEDINGS: TIME-98, 1998, : 177 - 184
  • [6] Handling large databases in data mining
    Owrang, MM
    CHALLENGES OF INFORMATION TECHNOLOGY MANAGEMENT IN THE 21ST CENTURY, 2000, : 121 - 125
  • [7] Data mining by attribute generalization with fuzzy hierarchies in fuzzy databases
    Petry, Frederick E.
    Zhao, Lei
    FUZZY SETS AND SYSTEMS, 2009, 160 (15) : 2206 - 2223
  • [8] Temporal generalization with domain generalization graphs
    Randall, DJ
    Hamilton, HJ
    Hilderman, RJ
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 1999, 13 (02) : 195 - 217
  • [9] Using databases and data mining in vaccinology
    Davies, Matthew N.
    Guan, Pingping
    Blythe, Martin J.
    Salomon, Jesper
    Toseland, Christopher P.
    Hattotuwagama, Channa
    Walshe, Valerie
    Doytchinova, Irini A.
    Flower, Darren R.
    EXPERT OPINION ON DRUG DISCOVERY, 2007, 2 (01) : 19 - 35
  • [10] A new data clustering approach for data mining in large databases
    Tsai, CF
    Wu, HC
    Tsai, CW
    I-SPAN'02: INTERNATIONAL SYMPOSIUM ON PARALLEL ARCHITECTURES, ALGORITHMS AND NETWORKS, PROCEEDINGS, 2002, : 315 - 320