Parallel Web Crawler Architecture for Clickstream Analysis

Cited by: 0
Authors
Ahmadi-Abkenari, Fatemeh [1]
Selamat, Ali [1]
Affiliations
[1] Univ Technol Malaysia, Software Engn Dept, Fac Comp Sci & Informat Syst, UTM 81310, Johor, Malaysia
Source
KNOWLEDGE TECHNOLOGY | 2012 / Vol. 295
Keywords
Clickstream analysis; Parallel crawlers; Web data management; Web page importance metrics
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
The tremendous growth of the Web poses many challenges for single-process crawlers, including irrelevant answers among search results and coverage and scaling issues. As a result, more robust algorithms are needed to produce more precise and relevant search results in a timely manner. Existing Web crawlers mostly implement link-dependent Web page importance metrics. One barrier to applying these metrics is that they produce considerable communication overhead in multi-agent crawlers. Moreover, they depend heavily on the crawler's own index size, which prevents them from ranking Web pages with complete accuracy. Hence, more enhanced metrics need to be addressed in this area. Proposing a new Web page importance metric requires defining a new architecture as a framework in which to implement the metric. The aim of this paper is to propose an architecture for a focused parallel crawler. In this framework, decision-making on Web page importance is based on a combined metric of clickstream analysis and analysis of context similarity to the issued queries.
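The abstract does not state how the combined metric is computed. As a rough illustration only, the idea of mixing a clickstream signal with query-context similarity could be sketched as follows. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the log-scaled click normalization, the bag-of-words cosine similarity, and the weighting parameter `alpha`.

```python
import math
from collections import Counter


def cosine_similarity(query: str, page_text: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors.

    Assumption: whitespace tokenization stands in for real text processing.
    """
    q = Counter(query.lower().split())
    p = Counter(page_text.lower().split())
    dot = sum(q[t] * p[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in p.values())
    )
    return dot / norm if norm else 0.0


def click_score(page_clicks: int, max_clicks: int) -> float:
    """Log-scaled click count normalized to [0, 1] (hypothetical normalization)."""
    if max_clicks <= 0:
        return 0.0
    return math.log1p(page_clicks) / math.log1p(max_clicks)


def combined_importance(
    query: str,
    page_text: str,
    page_clicks: int,
    max_clicks: int,
    alpha: float = 0.5,
) -> float:
    """Weighted sum of the click signal and the query-context similarity.

    alpha is a hypothetical mixing weight: 1.0 uses clicks only,
    0.0 uses context similarity only.
    """
    return alpha * click_score(page_clicks, max_clicks) + (
        1.0 - alpha
    ) * cosine_similarity(query, page_text)
```

A weighted sum is only one possible way to combine the two signals; it is used here because it keeps each component's contribution easy to inspect.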
Pages: 123-132
Number of pages: 10
Related papers
  • [1] Architecture for a Parallel Focused Crawler for Clickstream Analysis
    Selamat, Ali
    Ahmadi-Abkenari, Fatemeh
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2011, PT I, 2011, 6591 : 27 - 35
  • [2] An architecture for a focused trend parallel Web crawler with the application of clickstream analysis
    Ahmadi-Abkenari, Fatemeh
    Selamat, Ali
    INFORMATION SCIENCES, 2012, 184 (01) : 266 - 281
  • [3] Parallel crawler architecture and web page change detection
    Computer Science and Information Technology, Jaypee Institute of Information Technology University, Noida, India
    WSEAS Trans. Comput., 2008, 7 : 929 - 940
  • [4] A Novel Architecture for Deep Web Crawler
    Sharma, Dilip Kumar
    Sharma, A. K.
    INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY AND WEB ENGINEERING, 2011, 6 (01) : 25 - 48
  • [5] A Cloud-based Web Crawler Architecture
    Bahrami, Mehdi
    Singhal, Mukesh
    Zhuang, Zixuan
    2015 8TH INTERNATIONAL CONFERENCE ON INTELLIGENCE IN NEXT GENERATION NETWORKS, 2015, : 216 - 223
  • [6] A Critical Review of Migrating Parallel Web Crawler
    Farooqui, Md. Faizan
    Beg, Md. Rizwan
    Rafiq, Md. Qasim
    ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY, VOL 2, 2013, 177 : 631 - +
  • [7] Design of a Parallel and Scalable Crawler for the Hidden Web
    Gupta, Sonali
    Bhatia, Komal Kumar
    INTERNATIONAL JOURNAL OF INFORMATION RETRIEVAL RESEARCH, 2022, 12 (01)
  • [8] Scrawler: A seed-by-seed parallel web crawler
    Lee, Joo Yong
    Lee, Sang Ho
    Kim, Yanggon
    ICE-B 2007: PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON E-BUSINESS, 2007, : 151 - +
  • [9] Architecture Design of Subject-Oriented Web Crawler
    Cao Xin
    Zhang Yong
    Zhang Fuyan
    Ni Changbao
    2013 FOURTH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND ENGINEERING APPLICATIONS, 2013, : 174 - 177
  • [10] An architecture for SCS: A specialized web crawler on the topic of security
    Özmutlu, HC
    Özmutlu, S
    ASIST 2004: PROCEEDINGS OF THE 67TH ASIS&T ANNUAL MEETING, VOL 41, 2004: MANAGING AND ENHANCING INFORMATION: CULTURES AND CONFLICTS, 2004, 41 : 317 - 326