Organizing hidden-Web databases by clustering visible Web documents

Cited by: 0

Authors:

Barbosa, Luciano [1]

Freire, Juliana [1]

Silva, Altigran [2]

Affiliations:

[1] Univ Utah, Salt Lake City, UT 84112 USA

[2] Univ Fed Amazonas, Manaus, Amazonas, Brazil

Source:

2007 IEEE 23RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3 | 2007

Funding:

U.S. National Science Foundation

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

In this paper, we address the problem of organizing hidden-Web databases. Given a heterogeneous set of Web forms that serve as entry points to hidden-Web databases, our goal is to cluster the forms according to the database domains to which they belong. We propose a new clustering approach that models Web forms as a set of hyperlinked objects and considers visible information in the form context, both within and in the neighborhood of forms, as the basis for similarity comparison. Since the clustering is performed over features that can be automatically extracted, the process is scalable. In addition, because it uses a rich set of metadata, our approach is able to handle a wide range of forms, including content-rich forms that contain multiple attributes, as well as simple keyword-based search interfaces. An experimental evaluation over real Web data shows that our strategy generates high-quality clusters, measured both in terms of entropy and F-measure. This indicates that our approach provides an effective and general solution to the problem of organizing hidden-Web databases.
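The abstract names entropy and F-measure as the cluster-quality measures but does not define them. As a rough illustration only, the sketch below uses definitions that are common in the clustering literature (size-weighted cluster entropy, and a class-weighted best-match F1); these definitions, the function names, and the toy domain labels are assumptions and may differ from what the paper actually computes.

```python
import math
from collections import Counter


def cluster_entropy(labels):
    """Base-2 entropy of the gold-label distribution inside one cluster."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def total_entropy(clusters):
    """Size-weighted average of per-cluster entropy (lower is better)."""
    n = sum(len(c) for c in clusters)
    return sum(len(c) / n * cluster_entropy(c) for c in clusters)


def f_measure(clusters):
    """For each gold class, take the best F1 over all clusters,
    then average those scores weighted by class size (higher is better)."""
    n = sum(len(c) for c in clusters)
    class_sizes = Counter(label for c in clusters for label in c)
    total = 0.0
    for label, size in class_sizes.items():
        best = 0.0
        for c in clusters:
            hits = c.count(label)
            if hits == 0:
                continue
            precision = hits / len(c)
            recall = hits / size
            best = max(best, 2 * precision * recall / (precision + recall))
        total += size / n * best
    return total


# Hypothetical toy data: each inner list is one cluster, and each item is
# the true database domain of a form that was placed in that cluster.
clusters = [
    ["airfare", "airfare", "hotel"],
    ["hotel", "hotel"],
    ["job", "job", "job"],
]

print(f"entropy   = {total_entropy(clusters):.3f}")
print(f"F-measure = {f_measure(clusters):.3f}")
```

A clustering in which every cluster contains forms from a single domain, and every domain falls in a single cluster, would score an entropy of 0 and an F-measure of 1 under these definitions.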

Citation

Pages: 301+

Page count: 2
