An architecture for a focused trend parallel Web crawler with the application of clickstream analysis

Cited by: 30
Authors
Ahmadi-Abkenari, Fatemeh [1]
Selamat, Ali [1]
Affiliation
[1] Univ Teknol Malaysia, Fac Comp Sci & Informat Syst, Dept Software Engn, Software Engn Res Grp, Johor Baharu, Malaysia
Keywords
Clickstream analysis; Focused crawlers; Parallel crawlers; Web data management; Web page importance metrics
DOI
10.1016/j.ins.2011.08.022
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The tremendous growth of the Web poses many challenges for all-purpose single-process crawlers, including irrelevant answers among search results and coverage and scaling problems arising from the enormous size of the World Wide Web. Hence, more advanced and convincing algorithms are in demand to yield more precise and relevant search results in an appropriate amount of time. Employing link-based Web page importance metrics within a multi-process crawler imposes considerable communication overhead on the overall system and cannot produce a precise answer set, so these metrics are not an absolute solution for identifying the best answer set across the overall search system. A link-independent Web page importance metric is therefore required to govern the priority rule within the queue of fetched URLs. The aim of this paper is to propose a modest weighted architecture for a focused, structured parallel Web crawler that employs a link-independent, clickstream-based Web page importance metric. Experiments with this metric over the restricted-boundary Web zone of our crowded UTM University Web site show the efficiency of the proposed metric. (C) 2011 Elsevier Inc. All rights reserved.
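The abstract's central idea, ordering each crawler process's URL queue by a link-independent, clickstream-based importance score, can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the authors' metric or architecture: the `Hit` log schema, the scoring formula (distinct sessions multiplied by mean seconds on page per hit), the host-hash partitioning in `owner`, and all URLs are hypothetical.

```python
# Hypothetical sketch: the log schema, scoring formula, and partitioning
# below are illustrative assumptions, not the paper's design.
import heapq
import zlib
from collections import defaultdict
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Hit:
    """One clickstream record from a server log (hypothetical schema)."""
    session_id: str
    url: str
    seconds_on_page: float


def clickstream_importance(hits):
    """Score each URL from user behaviour only, without any link graph.

    Hypothetical formula: number of distinct visiting sessions multiplied
    by the mean time spent on the page per hit.
    """
    sessions = defaultdict(set)
    hit_count = defaultdict(int)
    dwell = defaultdict(float)
    for h in hits:
        sessions[h.url].add(h.session_id)
        hit_count[h.url] += 1
        dwell[h.url] += h.seconds_on_page
    return {
        url: len(sessions[url]) * (dwell[url] / hit_count[url])
        for url in sessions
    }


def owner(url, n_processes):
    """Assign each host to one crawler process (host-hash partitioning, an assumption).

    crc32 is stable across runs, unlike Python's salted built-in hash().
    """
    host = urlsplit(url).netloc
    return zlib.crc32(host.encode("utf-8")) % n_processes


class Frontier:
    """Per-process URL queue that pops the highest clickstream score first."""

    def __init__(self, scores):
        self._scores = scores
        self._heap = []
        self._seen = set()
        self._counter = 0  # tie-breaker: equal scores leave in insertion order

    def push(self, url):
        if url in self._seen:  # enqueue each URL at most once
            return
        self._seen.add(url)
        score = self._scores.get(url, 0.0)  # pages absent from the log score 0
        heapq.heappush(self._heap, (-score, self._counter, url))  # negate for max-first
        self._counter += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    hits = [
        Hit("s1", "http://a.example/news", 30.0),
        Hit("s2", "http://a.example/news", 50.0),
        Hit("s1", "http://a.example/staff", 10.0),
        Hit("s3", "http://a.example/staff", 20.0),
        Hit("s3", "http://a.example/staff", 30.0),
        Hit("s2", "http://a.example/library", 120.0),
    ]
    frontier = Frontier(clickstream_importance(hits))
    for u in ["http://a.example/staff", "http://a.example/news",
              "http://a.example/unvisited", "http://a.example/library"]:
        frontier.push(u)
    print([frontier.pop() for _ in range(len(frontier))])
```

Because each score is computed from log records alone, a process can rank the URLs of the hosts it owns without exchanging in-link information with other processes, which is the kind of communication overhead the abstract attributes to link-based metrics in multi-process crawlers. With the sample log above, the script should print the library page first (score 120), then news (80), staff (40), and the never-visited page (0).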
Pages: 266-281
Number of pages: 16
Related papers (50 in total)
  • [31] Scrawler: A seed-by-seed parallel web crawler
    Lee, Joo Yong
    Lee, Sang Ho
    Kim, Yanggon
    ICE-B 2007: PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON E-BUSINESS, 2007, : 151 - +
  • [32] Architecture Design of Subject-Oriented Web Crawler
    Cao Xin
    Zhang Yong
    Zhang Fuyan
    Ni Changbao
    2013 FOURTH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND ENGINEERING APPLICATIONS, 2013, : 174 - 177
  • [33] An architecture for SCS: A specialized web crawler on the topic of security
    Özmutlu, HC
    Özmutlu, S
    ASIST 2004: PROCEEDINGS OF THE 67TH ASIS&T ANNUAL MEETING, VOL 41, 2004: MANAGING AND ENHANCING INFORMATION: CULTURES AND CONFLICTS, 2004, 41 : 317 - 326
  • [34] HAWK: A Focused Crawler with Content and Link Analysis
    Chen, Xiaoyun
    Zhang, Xin
    PROCEEDINGS OF THE ICEBE 2008: IEEE INTERNATIONAL CONFERENCE ON E-BUSINESS ENGINEERING, 2008, : 677 - 680
  • [35] Designing a Modular and Distributed Web Crawler Focused on Unstructured Cybersecurity Intelligence
    Jenkins, Donovan
    Liebrock, Lorie M.
    Urias, Vince
    2021 INTERNATIONAL CARNAHAN CONFERENCE ON SECURITY TECHNOLOGY (ICCST), 2021,
  • [36] LSCrawler: A framework for an enhanced focused web crawler based on link semantics
    Yuvarani, M.
    Iyengar, N. Ch. S. N.
    Kannan, A.
    2006 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, (WI 2006 MAIN CONFERENCE PROCEEDINGS), 2006, : 794 - 797
  • [37] Research on improved focused crawler and its application in food safety public opinion analysis
    Geng, ZhiQiang
    Shang, Dirui
    Zhu, QunXiong
    Wu, QiangQiang
    Han, YongMing
    2017 CHINESE AUTOMATION CONGRESS (CAC), 2017, : 2847 - 2852
  • [38] More effective, efficient, and scalable Web crawler system architecture
    El-Ramly, NA
    Harb, HM
    Amin, N
    Tolba, AM
    ICEEC'04: 2004 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTER ENGINEERING, PROCEEDINGS, 2004, : 120 - 123
  • [39] A novel focused crawler combining Web space evolution and domain ontology
    Liu, Jingfa
    Li, Xin
    Zhang, Qiansheng
    Zhong, Guo
    KNOWLEDGE-BASED SYSTEMS, 2022, 243
  • [40] An ontology-supported web focused-crawler for Java programs
    Dept. of Computer and Communication Engineering, St. John's University, Taiwan
    Unknown
    IEEE Int. Conf. Ubi-Media Comput. (U-Media), : 266 - 271