Using Web Pages Dynamicity to Prioritise Web Crawling

Cited by: 1
Authors
Alderratia, Nisreen [1]
Elsheh, Mohammed [1]
Affiliation
[1] Libyan Acad Misurata, Third Ring Rd, Misurata, Libya
Keywords
Web crawler; importance metric; dynamicity
DOI
10.1145/3366750.3366757
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Web crawling collects pages from the web so that they can be indexed and used to answer users' search queries. Crawlers must also revisit pages continually to keep the search engine's database up to date. Because crawling is constrained by time and network resources, the crawler needs to determine which pages are most important and should be recrawled first. This research introduces a method that tells the crawler in what order to recrawl previously crawled pages, so that the more important and valuable pages are acquired earlier than others. The proposed crawling strategy combines topic similarity with the dynamicity of web pages: the crawler downloads relevant pages and recrawls them repeatedly, and each time a change is detected in a page, that page's counter is incremented. A page that is relevant and changes frequently is therefore considered important and is given high priority in the crawling process. The results indicate that web page dynamicity is an effective basis for prioritising pages during crawling: the most dynamic pages, whose content is most likely to have changed, are obtained before the least dynamic ones.
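The prioritisation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how topic similarity is computed, how changes are detected, or how ties are broken, so the relevance score passed in, the `relevance_threshold` value, the SHA-256 content comparison, and all class and method names below are assumptions made for illustration only.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PageRecord:
    url: str
    relevance: float       # topic similarity in [0, 1]; how it is computed is assumed
    last_digest: str = ""  # hash of the content seen on the most recent crawl
    change_count: int = 0  # the per-page change counter described in the abstract


class RecrawlScheduler:
    """Hypothetical scheduler: relevant pages are ordered by how often they changed."""

    def __init__(self, relevance_threshold=0.5):
        # Assumed cut-off below which a page is treated as off-topic.
        self.relevance_threshold = relevance_threshold
        self.pages = {}

    def observe(self, url, content, relevance):
        """Record one crawl of a page and count a change if its content differs."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        record = self.pages.get(url)
        if record is None:
            # First visit: store a baseline, nothing to compare against yet.
            self.pages[url] = PageRecord(url, relevance, digest)
            return
        record.relevance = relevance
        if digest != record.last_digest:
            record.change_count += 1
            record.last_digest = digest

    def recrawl_order(self):
        """Return relevant URLs, most frequently changed first."""
        relevant = [p for p in self.pages.values()
                    if p.relevance >= self.relevance_threshold]
        # Assumed tie-break: higher relevance first, then URL for determinism.
        relevant.sort(key=lambda p: (-p.change_count, -p.relevance, p.url))
        return [p.url for p in relevant]


# Usage with made-up pages: "b" changes twice, "a" once, "c" is off-topic.
scheduler = RecrawlScheduler()
for url, content, relevance in [
    ("a", "v1", 0.9), ("b", "v1", 0.8), ("c", "v1", 0.2),
    ("b", "v2", 0.8), ("b", "v3", 0.8), ("a", "v2", 0.9), ("c", "v2", 0.2),
]:
    scheduler.observe(url, content, relevance)
order = scheduler.recrawl_order()
```

In this sketch a page must pass the relevance filter before its change counter affects the ordering, which mirrors the abstract's condition that a page be both relevant and frequently changing; other ways of combining the two signals are equally plausible.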
Pages: 40-44
Number of pages: 5