Data mining in large databases using domain generalization graphs

Cited by: 20

Authors:

Hilderman, RJ [1]

Hamilton, HJ

Cercone, N

Affiliations:

[1] Univ Regina, Dept Comp Sci, Regina, SK S4S 0A2, Canada

[2] Univ Waterloo, Fac Math, Dept Comp Sci, Waterloo, ON N2L 3G1, Canada

Source:

JOURNAL OF INTELLIGENT INFORMATION SYSTEMS | 1999 / Vol. 13 / No. 03

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

data mining; knowledge discovery; machine learning; knowledge representation; attribute-oriented generalization; domain generalization graphs;

DOI:

10.1023/A:1008769516670

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Attribute-oriented generalization summarizes the information in a relational database by repeatedly replacing specific attribute values with more general concepts according to user-defined concept hierarchies. We introduce domain generalization graphs (DGGs) for controlling the generalization of a set of attributes and show how they are constructed. We then present serial and parallel versions of the Multi-Attribute Generalization algorithm for traversing the generalization state space described by joining the domain generalization graphs for multiple attributes. Based upon a generate-and-test approach, the algorithm generates all possible summaries consistent with the domain generalization graphs. Our experimental results show that significant speedups are possible by partitioning path combinations from the DGGs across multiple processors. We also rank the interestingness of the resulting summaries using measures based upon variance and relative entropy. Our experimental results also show that these measures provide an effective basis for analyzing summary data generated from relational databases. Variance appears more useful because it tends to rank the less complex summaries (i.e., those with few attributes and/or tuples) as more interesting.


Pages: 195-234

Number of pages: 40
