Efficient attribute-oriented generalization for knowledge discovery from large databases

Cited by: 38

Authors:

Carter, CL [1]

Hamilton, HJ [1]

Affiliations:

[1] Univ Regina, Dept Comp Sci, Networks Ctr Excellence Program, Ctr Excellence Lab, IRIS, Regina, SK S4S 0A2, Canada

Source:

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 1998 / Vol. 10 / No. 02

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

knowledge discovery from databases; data mining; attribute-oriented induction;

DOI:

10.1109/69.683752

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present GDBR (Generalize DataBase Relation) and FIGR (Fast, Incremental Generalization and Regeneralization), two enhancements of Attribute-Oriented Generalization, a well-known knowledge discovery from databases technique. GDBR and FIGR are both O(n) and, as such, are optimal. GDBR is an on-line algorithm and requires only a small, constant amount of space. FIGR also requires a constant amount of space that is generally reasonable although, under certain circumstances, may grow large. FIGR is incremental, allowing changes to the database to be reflected in the generalization results without rereading input data. FIGR also allows fast regeneralization to both higher and lower levels of generality without rereading input. We compare GDBR and FIGR to two previous algorithms, LCHR and AOI, which are O(n log n) and O(np), respectively, where n is the number of input tuples and p the number of tuples in the generalized relation. Both require O(n) space that, for large input, causes memory problems. We implemented all four algorithms and ran empirical tests, and we found that GDBR and FIGR are faster. In addition, their runtimes increase only linearly as input size increases, while the runtimes of LCHR and AOI increase greatly when input size exceeds memory limitations.


Pages: 193 - 208

Page count: 16

50 items in total

[31] Towards process-oriented tool support for knowledge discovery in databases
Wirth, R
Shearer, C
Grimmer, U
Reinartz, T
Schlosser, J
Breitner, C
Engels, R
Lindner, G
PRINCIPLES OF DATA MINING AND KNOWLEDGE DISCOVERY, 1997, 1263 : 243 - 253
[32] Knowledge discovery from object-oriented databases using an association rules mining algorithm
Changchien, SW
Lu, TC
KNOWLEDGE-BASED INTELLIGENT INFORMATION ENGINEERING SYSTEMS & ALLIED TECHNOLOGIES, PTS 1 AND 2, 2001, 69 : 1083 - 1088
[33] Generalized Knowledge Discovery from Relational Databases
Wu, Yu-Ying
Chen, Yen-Liang
Chang, Ray-I
INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2009, 9 (06) : 148 - 153
[34] From data mining to knowledge discovery in databases
Fayyad, U
Piatetsky-Shapiro, G
Smyth, P
AI MAGAZINE, 1996, 17 (03) : 37 - 54
[35] Knowledge discovery from databases: An introductory review
Vickery, B
JOURNAL OF DOCUMENTATION, 1997, 53 (02) : 107 - 122
[36] Knowledge discovery from databases on the semantic web
Scotney, B
McClean, S
16TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, PROCEEDINGS, 2004, : 333 - 336
[37] Knowledge discovery in large text databases using the MST algorithm
Romanov, V
Pantileeva, E
Data Mining VI: Data Mining, Text Mining and Their Business Applications, 2005, : 153 - 162
[38] PHD: an efficient data clustering scheme using partition space technique for knowledge discovery in large databases
Cheng-Fa Tsai
Heng-Fu Yeh
Jui-Fang Chang
Ning-Han Liu
Applied Intelligence, 2010, 33 : 39 - 53
[39] PHD: an efficient data clustering scheme using partition space technique for knowledge discovery in large databases
Tsai, Cheng-Fa
Yeh, Heng-Fu
Chang, Jui-Fang
Liu, Ning-Han
APPLIED INTELLIGENCE, 2010, 33 (01) : 39 - 53
[40] ShadowAQP: Efficient Approximate Group-by and Join Query via Attribute-oriented Sample Size Allocation and Data Generation
Gu, Rong
Li, Han
Dai, Haipeng
Huang, Wenjie
Xue, Jie
Li, Meng
Zheng, Jiaqi
Cai, Haoran
Huang, Yihua
Chen, Guihai
PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 16 (13) : 4216 - 4229
