On efficiently summarizing categorical databases

Cited by: 29

Authors:

Wang, JY

Karypis, G [1]

Affiliations:

[1] Univ Minnesota, Digital Technol Ctr, Dept Comp Sci, Minneapolis, MN 55455 USA

[2] Univ Minnesota, Army HPC Res Ctr, Minneapolis, MN 55455 USA

[3] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China

Source:

KNOWLEDGE AND INFORMATION SYSTEMS | 2006 / Vol. 9 / Issue 01

Keywords:

data mining; frequent itemset; categorical database; clustering;

DOI:

10.1007/s10115-005-0216-7

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Frequent itemset mining was initially proposed and has been studied extensively in the context of association rule mining. In recent years, several studies have also extended its application to transaction or document clustering. However, most of the frequent itemset based clustering algorithms need to first mine a large intermediate set of frequent itemsets in order to identify a subset of the most promising ones that can be used for clustering. In this paper, we study how to directly find a subset of high quality frequent itemsets that can be used as a concise summary of the transaction database and to cluster the categorical data. By exploring key properties of the subset of itemsets that we are interested in, we proposed several search space pruning methods and designed an efficient algorithm called SUMMARY. Our empirical results show that SUMMARY runs very fast even when the minimum support is extremely low and scales very well with respect to the database size, and surprisingly, as a pure frequent itemset mining algorithm it is very effective in clustering the categorical data and summarizing the dense transaction databases.
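For readers unfamiliar with the terms used in the abstract, the following is a minimal sketch of what "frequent itemset" and "minimum support" mean on a toy transaction database. It is a brute-force illustration only and is not the SUMMARY algorithm, whose pruning methods the abstract does not detail. The names `support`, `frequent_itemsets`, and `toy_db` are hypothetical and chosen for this example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration (illustrative, not SUMMARY).

    Returns a dict mapping each itemset whose support is at least
    `min_support` to its support count.
    """
    all_items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(all_items) + 1):
        found = False
        for candidate in combinations(all_items, size):
            count = support(candidate, transactions)
            if count >= min_support:
                result[frozenset(candidate)] = count
                found = True
        if not found:
            # If no itemset of this size is frequent, no larger one can be,
            # because support never increases when items are added.
            break
    return result


# Hypothetical toy database: each transaction is a set of categorical items.
toy_db = [frozenset(t) for t in (
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
)]

freq = frequent_itemsets(toy_db, min_support=3)
```

The exhaustive enumeration here grows exponentially with the number of items, which is the kind of large intermediate result the abstract says earlier clustering approaches had to mine first.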

Citation

Pages: 19-37

Number of pages: 19

