Parallel Web Crawler Architecture for Clickstream Analysis

Cited by: 0
Authors
Ahmadi-Abkenari, Fatemeh [1]
Selamat, Ali [1]
Affiliations
[1] Univ Technol Malaysia, Software Engn Dept, Fac Comp Sci & Informat Syst, Utm 81310, Johor, Malaysia
Source
KNOWLEDGE TECHNOLOGY | 2012 / Vol. 295
Keywords
Clickstream analysis; Parallel crawlers; Web data management; Web page importance metrics
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
The tremendous growth of the Web poses many challenges for single-process crawlers, including irrelevant answers among search results and coverage and scaling issues. As a result, more robust algorithms are needed to produce more precise and relevant search results in a timely manner. Existing Web crawlers mostly implement link-dependent Web page importance metrics. One barrier to applying these metrics is that they produce considerable communication overhead in multi-agent crawlers. Moreover, they depend heavily on the size of their own index, which prevents them from ranking Web pages with complete accuracy. Hence, more enhanced metrics need to be addressed in this area. Proposing a new Web page importance metric requires defining a new architecture as a framework to implement the metric. The aim of this paper is to propose an architecture for a focused parallel crawler. In this framework, decision-making on Web page importance is based on a combined metric of clickstream analysis and context similarity analysis with respect to the issued queries.
Pages: 123-132
Page count: 10
Related papers
50 in total
  • [11] Web farming with clickstream
    Hu, Jia
    Zhong, Ning
    INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING, 2008, 7 (02) : 291 - 308
  • [12] Applying clickstream data mining to real-time Web crawler detection and containment using ClickTips platform
    Lourenco, Analia
    Belo, Orlando
    ADVANCES IN DATA ANALYSIS, 2007, : 351 - +
  • [13] More effective, efficient, and scalable Web crawler system architecture
    El-Ramly, NA
    Harb, HM
    Amin, N
    Tolba, AM
    ICEEC'04: 2004 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTER ENGINEERING, PROCEEDINGS, 2004, : 120 - 123
  • [14] A novel incremental parallel web crawler based on focused crawling
    Huang, Qiuyan
    Li, Qingzhong
    Yan, Zhongmin
    Fu, Hong
    Journal of Computational Information Systems, 2013, 9 (06) : 2461 - 2469
  • [15] A COMPARATIVE ANALYSIS OF CLICKSTREAM AS WEB PAGE IMPORTANCE METRIC
    Surya, Anupama
    Sharma, Dilip Kumar
    2013 IEEE CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGIES (ICT 2013), 2013, : 776 - 781
  • [16] A novel combining method of dynamic and static web crawler with parallel computing
    Liu, Qingyang
    Yahyapour, Ramin
    Liu, Hongjiu
    Hu, Yanrong
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (21) : 60343 - 60364
  • [17] Performance Metrics of Web Crawler in Client-Server and MVC Architecture
    Badgujar, Jyotsana
    Jailia, Manisha
    Kumar, Ashok
    2015 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTER ENGINEERING AND APPLICATIONS (ICACEA), 2015, : 393 - 398
  • [18] Self Adjusting Refresh Time Based Architecture for Incremental Web Crawler
    Sharma, A. K.
    Dixit, Ashutosh
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2008, 8 (12) : 349 - 354
  • [19] IMPLEMENTATION OF WEB CRAWLER
    Gupta, Pooja
    Johari, Kalpana
    2009 SECOND INTERNATIONAL CONFERENCE ON EMERGING TRENDS IN ENGINEERING AND TECHNOLOGY (ICETET 2009), 2009, : 775 - 780
  • [20] Elastic Web Crawler Service-Oriented Architecture Over Cloud Computing
    M. E. ElAraby
    Hossam M. Moftah
    Sherihan M. Abuelenin
    M. Z. Rashad
    Arabian Journal for Science and Engineering, 2018, 43 : 8111 - 8126