An architecture for a focused trend parallel Web crawler with the application of clickstream analysis

Cited by: 30
Authors
Ahmadi-Abkenari, Fatemeh [1]
Selamat, Ali [1]
Affiliations
[1] Univ Teknol Malaysia, Fac Comp Sci & Informat Syst, Dept Software Engn, Software Engn Res Grp, Johor Baharu, Malaysia
Keywords
Clickstream analysis; Focused crawlers; Parallel crawlers; Web data management; Web page importance metrics;
DOI
10.1016/j.ins.2011.08.022
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The tremendous growth of the Web poses many challenges for all-purpose single-process crawlers, including the presence of some irrelevant answers among search results and the coverage and scaling issues regarding the enormous dimension of the World Wide Web. Hence, more enhanced and convincing algorithms are in demand to yield more precise and relevant search results in an appropriate amount of time. Since employing link-based Web page importance metrics within a multi-process crawler imposes a considerable communication overhead on the overall system and cannot produce the precise answer set, employing these metrics in search engines is not an absolute solution to identify the best search answer set by the overall search system. Thus, a link-independent Web page importance metric is required to govern the priority rule within the queue of fetched URLs. The aim of this paper is to propose a modest weighted architecture for a focused structured parallel Web crawler which employs a link-independent clickstream-based Web page importance metric. The experiments with this metric over the restricted boundary Web zone of our crowded UTM University Web site show the efficiency of the proposed metric. (C) 2011 Elsevier Inc. All rights reserved.
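The abstract's idea of a link-independent metric governing the priority of queued URLs can be illustrated with a minimal sketch. This is not the paper's metric, which the abstract does not specify: the score used here is simply the number of logged visits per URL, and the `(session_id, url)` log format, the `clickstream_scores` function, and the `Frontier` class are all hypothetical names introduced for illustration.

```python
import heapq
from collections import Counter


def clickstream_scores(log_entries):
    """Count visits per URL from (session_id, url) pairs.

    Assumption: raw visit counts stand in for the paper's
    clickstream-based importance metric, which is not given in the abstract.
    """
    return Counter(url for _session, url in log_entries)


class Frontier:
    """URL queue that yields the highest-scored URL first (hypothetical helper)."""

    def __init__(self, scores):
        self._scores = scores
        self._heap = []
        self._seen = set()
        self._order = 0  # insertion counter used to break ties between equal scores

    def push(self, url):
        if url in self._seen:
            return  # each URL is queued at most once
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the largest first.
        heapq.heappush(self._heap, (-self._scores.get(url, 0), self._order, url))
        self._order += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


log = [("s1", "/a"), ("s1", "/b"), ("s2", "/b"),
       ("s3", "/b"), ("s3", "/c"), ("s2", "/c")]
frontier = Frontier(clickstream_scores(log))
for url in ["/a", "/c", "/b", "/d"]:
    frontier.push(url)
crawl_order = [frontier.pop() for _ in range(len(frontier))]
```

Because the scores come from a server log rather than from the link graph, a crawler process in this sketch could order its own queue without exchanging link information with other processes, which is the overhead the abstract attributes to link-based metrics.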
Pages: 266-281
Page count: 16