Data mining in large databases using domain generalization graphs

Cited by: 20
Authors
Hilderman, RJ [1]
Hamilton, HJ
Cercone, N
Affiliations
[1] Univ Regina, Dept Comp Sci, Regina, SK S4S 0A2, Canada
[2] Univ Waterloo, Fac Math, Dept Comp Sci, Waterloo, ON N2L 3G1, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
data mining; knowledge discovery; machine learning; knowledge representation; attribute-oriented generalization; domain generalization graphs;
DOI
10.1023/A:1008769516670
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Attribute-oriented generalization summarizes the information in a relational database by repeatedly replacing specific attribute values with more general concepts according to user-defined concept hierarchies. We introduce domain generalization graphs (DGGs) for controlling the generalization of a set of attributes and show how they are constructed. We then present serial and parallel versions of the Multi-Attribute Generalization algorithm for traversing the generalization state space described by joining the DGGs for multiple attributes. Based upon a generate-and-test approach, the algorithm generates all possible summaries consistent with the DGGs. Our experimental results show that significant speedups are possible by partitioning path combinations from the DGGs across multiple processors. We also rank the interestingness of the resulting summaries using measures based upon variance and relative entropy, and our results show that these measures provide an effective basis for analyzing summary data generated from relational databases. Variance appears more useful because it tends to rank the less complex summaries (i.e., those with few attributes and/or tuples) as more interesting.
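The generate-and-test idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Multi-Attribute Generalization algorithm: the toy relation, the concept hierarchies, and all function names are hypothetical. Each DGG is simplified to a single chain of generalization levels, and the variance score (spread of tuple probabilities around the uniform value 1/m) is one assumed form of a variance-based measure. Relative entropy is not shown.

```python
from collections import Counter
from itertools import product

# Hypothetical toy relation with two attributes: (city, age).
ROWS = [
    ("Regina", 23), ("Regina", 37), ("Saskatoon", 41),
    ("Waterloo", 29), ("Toronto", 65), ("Toronto", 52),
]

# Hypothetical concept hierarchy mapping a city to its province.
PROVINCE = {"Regina": "SK", "Saskatoon": "SK", "Waterloo": "ON", "Toronto": "ON"}

# Each simplified DGG is a chain of (level name, generalization function),
# ordered from the most specific level to the most general one.
CITY_DGG = [
    ("city", lambda c: c),
    ("province", lambda c: PROVINCE[c]),
    ("ANY", lambda c: "ANY"),
]
AGE_DGG = [
    ("age", lambda a: a),
    ("decade", lambda a: f"{a // 10 * 10}s"),
    ("ANY", lambda a: "ANY"),
]


def summarize(rows, levels):
    """Generalize every attribute to its chosen level and count merged tuples."""
    return Counter(
        tuple(fn(value) for (_, fn), value in zip(levels, row)) for row in rows
    )


def all_summaries(rows, dggs):
    """Generate one summary per combination of levels, one level per attribute."""
    return {
        tuple(name for name, _ in combo): summarize(rows, combo)
        for combo in product(*dggs)
    }


def variance(summary):
    """Assumed variance measure: spread of tuple probabilities around 1/m."""
    m = len(summary)
    if m < 2:
        return 0.0
    total = sum(summary.values())
    uniform = 1 / m
    return sum((c / total - uniform) ** 2 for c in summary.values()) / (m - 1)


summaries = all_summaries(ROWS, [CITY_DGG, AGE_DGG])

# Rank summaries from highest to lowest variance score.
for names, summary in sorted(
    summaries.items(), key=lambda kv: variance(kv[1]), reverse=True
):
    print(names, len(summary), round(variance(summary), 4))
```

Because the combinations produced by `product` are independent of one another, they could be split across worker processes, which is the kind of partitioning the abstract credits for the parallel speedups.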
Pages: 195-234
Page count: 40