An architecture for a focused trend parallel Web crawler with the application of clickstream analysis

Cited by: 30
Authors
Ahmadi-Abkenari, Fatemeh [1]
Selamat, Ali [1]
Affiliation
[1] Univ Teknol Malaysia, Fac Comp Sci & Informat Syst, Dept Software Engn, Software Engn Res Grp, Johor Baharu, Malaysia
Keywords
Clickstream analysis; Focused crawlers; Parallel crawlers; Web data management; Web page importance metrics
DOI
10.1016/j.ins.2011.08.022
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The tremendous growth of the Web poses many challenges for general-purpose single-process crawlers, including irrelevant answers among search results and coverage and scaling issues arising from the enormous size of the World Wide Web. Hence, more advanced and convincing algorithms are in demand to yield more precise and relevant search results in an appropriate amount of time. Since employing link-based Web page importance metrics within a multi-process crawler imposes considerable communication overhead on the overall system and cannot produce a precise answer set, employing these metrics in search engines is not a complete solution for identifying the best answer set across the overall search system. Thus, a link-independent Web page importance metric is required to govern the priority rule within the queue of fetched URLs. The aim of this paper is to propose a modest weighted architecture for a focused structured parallel Web crawler that employs a link-independent, clickstream-based Web page importance metric. Experiments with this metric over the restricted-boundary Web zone of our heavily visited UTM University Web site show the efficiency of the proposed metric. (C) 2011 Elsevier Inc. All rights reserved.
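The abstract's central idea is that a link-independent, clickstream-based importance score governs the priority rule in each crawler's queue of fetched URLs. The sketch below illustrates that idea under loudly stated assumptions: the log record format `(session_id, url, seconds_on_page)`, the scoring formula (distinct visiting sessions plus dwell time in minutes), the host-hash partitioning across crawler processes, and the names `clickstream_importance`, `assign_process`, and `Frontier` are all hypothetical illustrations, not the metric or architecture defined in the paper.

```python
import heapq
import zlib
from collections import defaultdict
from urllib.parse import urlsplit


def clickstream_importance(log):
    """Score URLs from clickstream records, ignoring hyperlink structure.

    `log` is an iterable of HYPOTHETICAL (session_id, url, seconds_on_page)
    tuples, e.g. parsed from a Web server access log. ASSUMED metric:
    number of distinct visiting sessions + total dwell time in minutes.
    """
    sessions = defaultdict(set)
    dwell = defaultdict(float)
    for session_id, url, seconds in log:
        sessions[url].add(session_id)
        dwell[url] += seconds
    return {url: len(sessions[url]) + dwell[url] / 60.0 for url in sessions}


def assign_process(url, n_processes):
    """HYPOTHETICAL static partitioning: all URLs of one host go to the same
    crawler process. crc32 is used because Python's built-in hash() of str
    is salted per interpreter run, so separate processes would disagree."""
    host = urlsplit(url).hostname or ""
    return zlib.crc32(host.encode("utf-8")) % n_processes


class Frontier:
    """Per-process URL queue that pops the highest-importance URL first."""

    def __init__(self, importance):
        self._importance = importance
        self._heap = []       # entries: (-score, insertion_order, url)
        self._seen = set()    # skip URLs that were already queued
        self._counter = 0     # tie-breaker: equal scores pop in FIFO order

    def push(self, url):
        if url in self._seen:
            return
        self._seen.add(url)
        score = self._importance.get(url, 0.0)  # pages absent from the log score 0
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    log = [
        ("s1", "http://a.example/x", 120),
        ("s2", "http://a.example/x", 60),
        ("s1", "http://a.example/y", 30),
    ]
    scores = clickstream_importance(log)  # x: 2 + 3.0 = 5.0, y: 1 + 0.5 = 1.5
    frontier = Frontier(scores)
    for u in ["http://a.example/y", "http://a.example/z", "http://a.example/x"]:
        frontier.push(u)
    print([frontier.pop() for _ in range(len(frontier))])
    # prints ['http://a.example/x', 'http://a.example/y', 'http://a.example/z']
```

Because the score depends only on access logs and not on link structure, each process can rank its own queue without exchanging link counts with other processes, which is the communication overhead the abstract attributes to link-based metrics in multi-process crawlers.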
Pages: 266-281
Number of pages: 16
Related papers
50 in total
  • [21] A Cloud-based Web Crawler Architecture
    Bahrami, Mehdi
    Singhal, Mukesh
    Zhuang, Zixuan
    2015 8TH INTERNATIONAL CONFERENCE ON INTELLIGENCE IN NEXT GENERATION NETWORKS, 2015, : 216 - 223
  • [22] A Critical Review of Migrating Parallel Web Crawler
    Farooqui, Md. Faizan
    Beg, Md. Rizwan
    Rafiq, Md. Qasim
    ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY, VOL 2, 2013, 177 : 631 - +
  • [23] Design of a Parallel and Scalable Crawler for the Hidden Web
    Gupta, Sonali
    Bhatia, Komal Kumar
    INTERNATIONAL JOURNAL OF INFORMATION RETRIEVAL RESEARCH, 2022, 12 (01)
  • [24] A Focused Crawler for Web Feature Service and Web Map Service Discovering
    Alexandrino, Victor Macedo
    Comarela, Giovanni
    da Silva, Altigran Soares
    Lisboa-Filho, Jugurta
    WEB AND WIRELESS GEOGRAPHICAL INFORMATION SYSTEMS (W2GIS 2020), 2020, 12473 : 111 - 124
  • [25] Design Crawler: A Web Application For Digital Design Metadata Analysis
    Hosny, Sherif
    Baher, Amr
    2019 20TH INTERNATIONAL WORKSHOP ON MICROPROCESSOR/SOC TEST, SECURITY AND VERIFICATION (MTV 2019), 2019, : 31 - 34
  • [26] A Focused Crawler Based on Correlation Analysis
    Qin, Qiuli
    Peng, Xin
    INTERNATIONAL JOURNAL OF FUTURE GENERATION COMMUNICATION AND NETWORKING, 2014, 7 (06): : 13 - 20
  • [27] A New Architecture of Ajax Web Application Security Crawler with Finite-State Machine
    An Huiyao
    Song Yang
    Yu Tao
    Li Hui
    Zhang Peng
    Zha Jun
    2014 INTERNATIONAL CONFERENCE ON CYBER-ENABLED DISTRIBUTED COMPUTING AND KNOWLEDGE DISCOVERY (CYBERC), 2014, : 112 - 117
  • [28] Customized focused crawler for peer-to-peer Web search
    Fang, Qiming
    Yang, Guangwen
    Wu, Yongwei
    Zhu, Anping
    Zheng, Weimin
    Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition), 2007, 35 (SUPPL. 2): : 148 - 152
  • [29] A Semantic Focused Web Crawler Based on a Knowledge Representation Schema
    Hernandez, Julio
    Marin-Castro, Heidy M.
    Morales-Sandoval, Miguel
    APPLIED SCIENCES-BASEL, 2020, 10 (11):
  • [30] Template-Driven Semantic Parsing for Focused Web Crawler
    Blinkiewicz, Michal
    Galler, Mariusz
    Szwabe, Andrzej
    SEMANTIC TECHNOLOGY (JIST 2014), 2015, 8943 : 351 - 358