Deep Reinforcement Learning for Web Crawling

Cited by: 4

Authors:

Avrachenkov, Konstantin [1]

Borkar, Vivek [2]

Patil, Kishor [1]

Affiliations:

[1] Inria Sophia Antipolis, F-06902 Valbonne, France

[2] Indian Inst Technol, Mumbai 400076, Maharashtra, India

Source:

2021 Seventh Indian Control Conference (ICC) | 2021

Keywords:

Reinforcement Learning; Adaptive Web Crawling; Thompson Sampling; Multi-armed Restless Bandits;

DOI:

10.1109/ICC54714.2021.9703160

Chinese Library Classification (CLC) number:

TP301 [Theory, methods]

Discipline classification code:

081202

Abstract:

A search engine uses a web crawler to fetch pages from the World Wide Web (WWW) and aims to keep its local cache as fresh as possible. Unfortunately, the rates at which different pages change on the WWW are highly non-uniform and, in many real-life scenarios, unknown. In addition, the finite available bandwidth and possible server restrictions on crawling frequency make it very difficult for the crawler to find the optimal scheduling policy that maximises the freshness of the local cache. We model this problem in a multi-armed restless bandits framework, where each arm represents a web page or an aggregate of statistically identical web pages. The objective is to find the scheduling policy that gives the exact indices of the pages to be crawled at a particular instant. We provide an online learning scheme using a deep reinforcement learning (DRL) framework, which learns the unknown page change dynamics on the fly along with the optimal crawling policy. Finally, we run numerical simulations to compare our approach with state-of-the-art algorithms such as static optimisation and Thompson sampling. We observe better performance for DRL.
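
The freshness objective and the bandwidth constraint described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' DRL scheme, and it is not the static optimisation or Thompson sampling baselines either. It assumes a simple round-robin schedule, pages that change independently with a fixed per-step probability, and a crawler that can refresh at most `budget` pages per step. The function name `simulate_freshness` and all of its parameters are hypothetical and chosen only for illustration.

```python
import random


def simulate_freshness(change_probs, budget, steps=1000, seed=0):
    """Average cache freshness under a round-robin crawl schedule.

    Assumptions (illustrative, not from the paper): page i changes
    independently with probability change_probs[i] at every step, and
    the crawler refreshes at most `budget` pages per step.
    """
    rng = random.Random(seed)
    n = len(change_probs)
    fresh = [True] * n   # True while the cached copy matches the live page
    next_page = 0        # round-robin pointer into the page list
    total = 0.0
    for _ in range(steps):
        # Live pages change; the matching cached copies become stale.
        for i, p in enumerate(change_probs):
            if rng.random() < p:
                fresh[i] = False
        # Bandwidth limit: re-crawl at most `budget` pages this step.
        for _ in range(min(budget, n)):
            fresh[next_page] = True
            next_page = (next_page + 1) % n
        # Freshness is the fraction of cached pages that are up to date.
        total += sum(fresh) / n
    return total / steps


if __name__ == "__main__":
    # Three pages with very different change rates, one crawl per step.
    print(simulate_freshness([0.05, 0.2, 0.6], budget=1))
```

A learning-based scheduler of the kind the abstract describes would replace the round-robin pointer with a policy that estimates the unknown change probabilities online and spends the crawl budget where it improves freshness most.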

Pages: 201-206

Page count: 6

