Web crawling

Cited by: 151
Authors
Olston C. [1 ]
Najork M. [2 ]
Affiliations
[1] Yahoo Research, Sunnyvale, CA, 94089
[2] Microsoft Research, Mountain View, CA, 94043
DOI
10.1561/1500000017
Abstract
This is a survey of the science and practice of web crawling. While at first glance web crawling may appear to be merely an application of breadth-first-search, the truth is that there are many challenges ranging from systems concerns such as managing very large data structures to theoretical questions such as how often to revisit evolving content sources. This survey outlines the fundamental challenges and describes the state-of-the-art models and solutions. It also highlights avenues for future work. © 2010 C. Olston and M. Najork.
Pages: 175-246
Page count: 71
Related papers
50 in total
  • [1] Board forum crawling: A web crawling method for web forum
    Guo, Yan
    Li, Kui
    Zhang, Kai
    Zhang, Gang
    2006 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, (WI 2006 MAIN CONFERENCE PROCEEDINGS), 2006, : 745 - +
  • [2] Crawling the web with OntoDir
    Picariello, Antonio
    Rinaldi, Antonio M.
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2007, 4653 : 730 - +
  • [3] Crawling the infinite web
    Baeza-Yates, Ricardo
    Castillo, Carlos
    JOURNAL OF WEB ENGINEERING, 2007, 6 (01): : 49 - 72
  • [4] Crawling toward the Web
    Sinclair, Ken
    Engineered Systems, 2002, 19 (11):
  • [5] On the Stability of Web Crawling and Web Search
    Anderson, Reid
    Borgs, Christian
    Chayes, Jennifer
    Hopcroft, John
    Mirrokni, Vahab
    Teng, Shang-Hua
    ALGORITHMS AND COMPUTATION, PROCEEDINGS, 2008, 5369 : 680 - 691
  • [6] EFFECTS OF CRAWLING STRATEGIES ON THE PERFORMANCE OF FOCUSED WEB CRAWLING
    Pirkola, Ari
    Talvensaari, Tuomas
    WEBIST 2009: PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS AND TECHNOLOGIES, 2009, : 376 - 381
  • [7] Web Crawling Technique for Vulnerability Assessment on Web
    Yudha, Fietyata
    Panji, Andi Muhammad T.
    Adiputro, Laksono A. R.
    Ramadhani, Erika
    LECTURE NOTES IN ELECTRICAL, ELECTRONIC AND COMPUTER ENGINEERING, 2019, : 48 - 54
  • [8] An effective approach of web crawling for deep web
    Wang, Shunyan
    Wu, Binghua
    Zhong, Luo
    DCABES 2007 Proceedings, Vols I and II, 2007, : 855 - 858
  • [9] An Architecture for Efficient Web Crawling
    Hernandez, Inma
    Rivero, Carlos R.
    Ruiz, David
    Corchuelo, Rafael
    ADVANCED INFORMATION SYSTEMS ENGINEERING WORKSHOPS, CAISE 2012, 2012, 112 : 228 - 234
  • [10] Crawling toward a Wiser Web
    Hayes, Brian
    AMERICAN SCIENTIST, 2015, 103 (03) : 184 - 187