Parallel Web Crawler Architecture for Clickstream Analysis

Times cited: 0
Authors
Ahmadi-Abkenari, Fatemeh [1]
Selamat, Ali [1]
Affiliation
[1] Universiti Teknologi Malaysia, Software Engineering Department, Faculty of Computer Science and Information Systems, UTM 81310, Johor, Malaysia
Source
KNOWLEDGE TECHNOLOGY | 2012 / Vol. 295
Keywords
Clickstream analysis; Parallel crawlers; Web data management; Web page importance metrics
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The tremendous growth of the Web poses many challenges for single-process crawlers, including irrelevant answers among search results as well as coverage and scaling issues. As a result, more robust algorithms are needed to produce more precise and relevant search results in a timely manner. Most existing Web crawlers implement link-dependent Web page importance metrics. One of the barriers to applying these metrics is that they produce considerable communication overhead in multi-agent crawlers. Moreover, they depend heavily on the size of the crawler's own index, which prevents them from ranking Web pages with complete accuracy. Hence, more advanced metrics need to be addressed in this area. Proposing a new Web page importance metric requires defining a new architecture as a framework in which to implement it. The aim of this paper is to propose an architecture for a focused parallel crawler. In this framework, decisions on Web page importance are based on a combined metric of clickstream analysis and the similarity of page context to the issued queries.
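The abstract does not state how the clickstream signal and the context-similarity signal are combined. As a minimal sketch only, assuming a linear weighting with a hypothetical weight `alpha`, normalized visit counts as the clickstream signal, and bag-of-words cosine similarity as the context signal, a crawler frontier could be ordered as follows. All function names here are hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def clickstream_score(visits: dict[str, int], url: str) -> float:
    """Assumed clickstream signal: the URL's share of all logged visits."""
    total = sum(visits.values())
    return visits.get(url, 0) / total if total else 0.0


def combined_importance(url: str, page_text: str, query: str,
                        visits: dict[str, int], alpha: float = 0.5) -> float:
    """Hypothetical linear mix of clickstream and context-similarity scores."""
    page_terms = Counter(page_text.lower().split())
    query_terms = Counter(query.lower().split())
    return (alpha * clickstream_score(visits, url)
            + (1 - alpha) * cosine_similarity(page_terms, query_terms))


def rank_frontier(pages: dict[str, str], query: str,
                  visits: dict[str, int], alpha: float = 0.5) -> list[str]:
    """Order candidate URLs (url -> page text) by descending combined score."""
    return sorted(
        pages,
        key=lambda u: combined_importance(u, pages[u], query, visits, alpha),
        reverse=True,
    )


if __name__ == "__main__":
    visits = {"a.example/crawl": 3, "b.example/food": 1}
    pages = {"a.example/crawl": "web crawler design",
             "b.example/food": "cooking recipes"}
    print(rank_frontier(pages, "web crawler", visits))
```

Because neither signal depends on the link graph, each crawler agent could in principle compute this score from its own clickstream log and page text, which is the kind of reduced inter-agent communication the abstract motivates; the actual weighting and normalization in the paper may differ.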
Pages: 123-132
Number of pages: 10
Related papers
50 in total
  • [41] THE WEB-CRAWLER WARS
    TAUBES, G
    SCIENCE, 1995, 269 (5229) : 1355 - 1355
  • [42] Web crawler on client machine
    Shettar, Rajashree
    Shobha, G.
    IMECS 2008: INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, VOLS I AND II, 2008, : 1121 - 1124
  • [43] An semantic rank for web crawler based on formal concept analysis
    Du, Yajun
    Li, Xinchun
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND KNOWLEDGE ENGINEERING (ISKE 2007), 2007,
  • [44] A Novel Architecture for a Blog Crawler
    Madaan, Rosy
    Sharma, Ashok. Kumar
    Dixit, Ashutosh
    2012 2ND IEEE INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND GRID COMPUTING (PDGC), 2012, : 452 - 456
  • [45] NoSQL Web Crawler Application
    Deka, Ganesh Chandra
    DEEP DIVE INTO NOSQL DATABASES: THE USE CASES AND APPLICATIONS, 2018, 109 : 77 - 100
  • [46] Smart Focused Web Crawler for Hidden Web
    Kaur, Sawroop
    Geetha, G.
    INFORMATION AND COMMUNICATION TECHNOLOGY FOR COMPETITIVE STRATEGIES, 2019, 40 : 419 - 427
  • [47] A Web crawler that knows the difference
    Unknown
    ONLINE & CDROM REVIEW, 1997, 21 (06): : 384 - 384
  • [48] Design of the Distributed Web Crawler
    Chen, Xing
    Li, Weijiang
    Zhao, Tiejun
    Piao, Xinghai
    ADVANCED RESEARCH ON INDUSTRY, INFORMATION SYSTEMS AND MATERIAL ENGINEERING, PTS 1-7, 2011, 204-210 : 1454 - +
  • [49] Smart distributed web crawler
    Bal, Sawroop Kaur
    Geetha, G.
    2016 INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND EMBEDDED SYSTEMS (ICICES), 2016,
  • [50] ARABELLA A Directed Web Crawler
    Lopes, Pedro
    Pinto, Davide
    Campos, David
    Oliveira, Jose Luis
    KDIR 2009: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND INFORMATION RETRIEVAL, 2009, : 270 - 273