Parallel Web Crawler Architecture for Clickstream Analysis

Cited by: 0
Authors
Ahmadi-Abkenari, Fatemeh [1]
Selamat, Ali [1]
Affiliations
[1] Univ Technol Malaysia, Software Engn Dept, Fac Comp Sci & Informat Syst, Utm 81310, Johor, Malaysia
Source
KNOWLEDGE TECHNOLOGY | 2012 / Vol. 295
Keywords
Clickstream analysis; Parallel crawlers; Web data management; Web page importance metrics
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The tremendous growth of the Web poses many challenges for single-process crawlers, including irrelevant answers among search results and coverage and scaling issues. As a result, more robust algorithms are needed to produce more precise and relevant search results in a timely manner. Existing Web crawlers mostly implement link-dependent Web page importance metrics. One of the barriers to applying these metrics is that they produce considerable communication overhead in multi-agent crawlers. Moreover, they depend heavily on the size of their own index, which prevents them from ranking Web pages with complete accuracy. Hence, more enhanced metrics need to be addressed in this area. Proposing a new Web page importance metric requires defining a new architecture as a framework in which to implement the metric. The aim of this paper is to propose an architecture for a focused parallel crawler. In this framework, the decision on Web page importance is based on a combined metric of clickstream analysis and analysis of context similarity to the issued queries.
Pages: 123 - 132
Page count: 10
Related papers
50 in total
  • [21] Design of Analysis System for Documents Based on Web Crawler
    Shang, Jingtao
    Lin, Jianjun
    Qin, Van
    Li, Bo
    Wu, Mengmeng
    2016 2ND IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), 2016, : 289 - 293
  • [22] Extensible Web Crawler - Towards Multimedia Material Analysis
    Turek, Wojciech
    Opalinski, Andrzej
    Kisiel-Dorohinicki, Marek
    MULTIMEDIA COMMUNICATIONS, SERVICES, AND SECURITY, 2011, 149 : 183 - 190
  • [23] Elastic Web Crawler Service-Oriented Architecture Over Cloud Computing
    ElAraby, M. E.
    Moftah, Hossam M.
    Abuelenin, Sherihan M.
    Rashad, M. Z.
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2018, 43 (12) : 8111 - 8126
  • [24] Analysis and implementation of an Ajax-enabled web crawler
    1600, Science and Engineering Research Support Society, 20 Virginia Court, Sandy Bay, Tasmania, Prof B.H.Kang's Office,, Australia (06):
  • [25] Analysis and Implementation of an Ajax-enabled Web Crawler
    Cui, Li-Lie
    He, Hui
    Xuan, Hong-Wei
    INTERNATIONAL JOURNAL OF FUTURE GENERATION COMMUNICATION AND NETWORKING, 2013, 6 (02): : 139 - 146
  • [26] Visualization and analysis of clickstream data of online stores with a parallel coordinate system
    Lee, J
    Podlaseck, M
    ELECTRONIC COMMERCE AND WEB TECHNOLOGIES, PROCEEDINGS, 2000, 1875 : 145 - 154
  • [27] Clickstream log acquisition with Web farming
    Hu, J
    Zhong, N
    2005 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, PROCEEDINGS, 2005, : 257 - 263
  • [28] Visualization and analysis of clickstream data of online stores for understanding Web merchandising
    Lee, J
    Podlaseck, M
    Schonberg, E
    Hoch, R
    DATA MINING AND KNOWLEDGE DISCOVERY, 2001, 5 (1-2) : 59 - 84
  • [29] Reducing web crawler overhead using mobile crawler
    M.E. Computer Science and Engineering, Arunai Engineering College, Tiruvannamalai-606 603, Tamil Nadu, India
    Unknown
    Int. Conf. Emerg. Trends Electr. Comput. Technol., ICETECT, 2011, (926-932):
  • [30] Analysis and Detection of Bogus Behavior in Web Crawler Measurement
    Bai, Quan
    Xiong, Gang
    Zhao, Yong
    He, Longtao
    2ND INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND QUANTITATIVE MANAGEMENT, ITQM 2014, 2014, 31 : 1084 - 1091