Data mining in large databases using domain generalization graphs

Citations: 20

Authors:

Hilderman, RJ [1]

Hamilton, HJ

Cercone, N

Affiliations:

[1] Univ Regina, Dept Comp Sci, Regina, SK S4S 0A2, Canada

[2] Univ Waterloo, Fac Math, Dept Comp Sci, Waterloo, ON N2L 3G1, Canada

Source:

JOURNAL OF INTELLIGENT INFORMATION SYSTEMS | 1999 / Vol. 13 / No. 03

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

data mining; knowledge discovery; machine learning; knowledge representation; attribute-oriented generalization; domain generalization graphs;

DOI:

10.1023/A:1008769516670

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Attribute-oriented generalization summarizes the information in a relational database by repeatedly replacing specific attribute values with more general concepts according to user-defined concept hierarchies. We introduce domain generalization graphs for controlling the generalization of a set of attributes and show how they are constructed. We then present serial and parallel versions of the Multi-Attribute Generalization algorithm for traversing the generalization state space described by joining the domain generalization graphs for multiple attributes. Based upon a generate-and-test approach, the algorithm generates all possible summaries consistent with the domain generalization graphs. Our experimental results show that significant speedups are possible by partitioning path combinations from the DGGs across multiple processors. We also rank the interestingness of the resulting summaries using measures based upon variance and relative entropy. Our experimental results also show that these measures provide an effective basis for analyzing summary data generated from relational databases. Variance appears more useful because it tends to rank the less complex summaries (i.e., those with few attributes and/or tuples) as more interesting.
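
As a rough illustration of the ideas in the abstract, the sketch below generalizes a toy relation along per-attribute concept hierarchies, enumerates every combination of generalization levels in generate-and-test fashion, and scores each summary with a variance-style measure. It is not the authors' Multi-Attribute Generalization algorithm: the names HIERARCHIES, ROWS, summarize, all_summaries, and variance_score, the sample data, and the exact form of the variance score are all assumptions made for this example, and the parallel partitioning of path combinations is omitted.

```python
from collections import Counter
from itertools import product

# Hypothetical concept hierarchies: one list of mappings per attribute,
# where mapping i lifts a value from level i to level i + 1.
HIERARCHIES = {
    "city": [
        {"Regina": "Saskatchewan", "Saskatoon": "Saskatchewan",
         "Waterloo": "Ontario", "Toronto": "Ontario"},
        {"Saskatchewan": "Canada", "Ontario": "Canada"},
    ],
    "age": [
        {23: "young", 31: "young", 47: "old", 65: "old"},
    ],
}

# Hypothetical input relation.
ROWS = [
    {"city": "Regina", "age": 23},
    {"city": "Saskatoon", "age": 31},
    {"city": "Waterloo", "age": 47},
    {"city": "Toronto", "age": 65},
    {"city": "Regina", "age": 47},
]


def generalize_value(attr, value, level):
    """Lift a raw value through the first `level` mappings of its hierarchy."""
    for mapping in HIERARCHIES[attr][:level]:
        value = mapping[value]
    return value


def summarize(rows, attrs, levels):
    """Generalize each row to the given levels and count identical tuples."""
    counts = Counter()
    for row in rows:
        key = tuple(generalize_value(a, row[a], lvl)
                    for a, lvl in zip(attrs, levels))
        counts[key] += 1
    return counts


def all_summaries(rows, attrs):
    """Generate one summary per combination of generalization levels."""
    level_ranges = [range(len(HIERARCHIES[a]) + 1) for a in attrs]
    return {levels: summarize(rows, attrs, levels)
            for levels in product(*level_ranges)}


def variance_score(counts):
    """Assumed measure: variance of tuple proportions around a uniform split."""
    total = sum(counts.values())
    m = len(counts)
    uniform = 1.0 / m
    return sum((c / total - uniform) ** 2 for c in counts.values()) / m


if __name__ == "__main__":
    summaries = all_summaries(ROWS, ["city", "age"])
    ranked = sorted(summaries.items(),
                    key=lambda item: variance_score(item[1]), reverse=True)
    for levels, counts in ranked:
        print(levels, dict(counts), round(variance_score(counts), 4))
```

Under these assumptions, a summary whose tuples are evenly distributed scores zero, and more skewed summaries rank higher. Exhaustive enumeration is feasible here only because the toy hierarchies are tiny; the number of level combinations grows multiplicatively with the number of attributes.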


Pages: 195-234

Page count: 40
