Efficient attribute-oriented generalization for knowledge discovery from large databases

Cited by: 38
Authors
Carter, CL [1 ]
Hamilton, HJ [1 ]
Affiliations
[1] Univ Regina, Dept Comp Sci, Networks Ctr Excellence Program, Ctr Excellence Lab, IRIS, Regina, SK S4S 0A2, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
knowledge discovery from databases; data mining; attribute-oriented induction;
DOI
10.1109/69.683752
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present GDBR (Generalize DataBase Relation) and FIGR (Fast, Incremental Generalization and Regeneralization), two enhancements of Attribute-Oriented Generalization, a well-known knowledge discovery from databases technique. GDBR and FIGR are both O(n) and, as such, are optimal. GDBR is an on-line algorithm and requires only a small, constant amount of space. FIGR also requires a constant amount of space that is generally reasonable although, under certain circumstances, may grow large. FIGR is incremental, allowing changes to the database to be reflected in the generalization results without rereading input data. FIGR also allows fast regeneralization to both higher and lower levels of generality without rereading input. We compare GDBR and FIGR to two previous algorithms, LCHR and AOI, which are O(n log n) and O(np), respectively, where n is the number of input tuples and p the number of tuples in the generalized relation. Both LCHR and AOI require O(n) space, which causes memory problems for large input. We implemented all four algorithms and ran empirical tests, and we found that GDBR and FIGR are faster. In addition, their runtimes increase only linearly as input size increases, while the runtimes of LCHR and AOI increase greatly when input size exceeds memory limitations.
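To make the idea of attribute-oriented generalization concrete, the following is a minimal sketch of a single-pass version: each tuple is generalized by climbing a concept hierarchy and then counted in a hash table, so the expected work per tuple is constant and the table size depends on the number of distinct generalized tuples rather than on n. This is not the published GDBR or FIGR algorithm; the hierarchies, the sample rows, and the names `HIERARCHIES`, `climb`, and `generalize` are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical concept hierarchies: each value maps to its parent concept.
HIERARCHIES = {
    "city": {
        "Regina": "Saskatchewan",
        "Saskatoon": "Saskatchewan",
        "Calgary": "Alberta",
        "Saskatchewan": "Canada",
        "Alberta": "Canada",
    },
    "degree": {
        "BSc": "Undergraduate",
        "BA": "Undergraduate",
        "MSc": "Graduate",
        "PhD": "Graduate",
    },
}


def climb(attribute, value, levels):
    """Replace a value by its ancestor `levels` steps up the hierarchy."""
    parents = HIERARCHIES[attribute]
    for _ in range(levels):
        if value not in parents:  # already at the top of this hierarchy
            break
        value = parents[value]
    return value


def generalize(rows, attributes, levels):
    """Generalize every row once and count identical generalized tuples."""
    counts = Counter()
    for row in rows:  # single pass over the input
        key = tuple(climb(a, v, levels[a]) for a, v in zip(attributes, row))
        counts[key] += 1
    return counts


# Illustrative input (invented data).
rows = [
    ("Regina", "BSc"),
    ("Saskatoon", "BA"),
    ("Calgary", "PhD"),
    ("Regina", "MSc"),
]
attributes = ("city", "degree")

by_province = generalize(rows, attributes, {"city": 1, "degree": 1})
by_country = generalize(rows, attributes, {"city": 2, "degree": 1})
```

Raising the level for `city` from 1 to 2 merges more tuples into fewer generalized ones. Note that this sketch rereads the input to change levels, which is exactly the cost the abstract says FIGR avoids.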
Pages: 193-208
Page count: 16