Reinforcement online learning to rank with unbiased reward shaping

Cited by: 0
Authors: Shengyao Zhuang; Zhihao Qiao; Guido Zuccon
Affiliations: [1] The University of Queensland
Keywords: Online learning to rank; Unbiased reward shaping; Reinforcement learning
DOI: not available
Abstract
Online learning to rank (OLTR) aims to learn a ranker directly from implicit feedback derived from users’ interactions, such as clicks. Clicks, however, are a biased signal: specifically, top-ranked documents are likely to attract more clicks than documents lower in the ranking (position bias). In this paper, we propose a novel learning algorithm for OLTR that uses reinforcement learning to optimize rankers: Reinforcement Online Learning to Rank (ROLTR). In ROLTR, the gradients of the ranker are estimated from the rewards assigned to clicked and unclicked documents. To remove the position bias contained in the reward signals, we introduce unbiased reward shaping functions that exploit inverse propensity scoring for clicked and unclicked documents. Because our method can also model unclicked documents, fewer user interactions are required to effectively train a ranker, which provides gains in efficiency. Empirical evaluation on standard OLTR datasets shows that ROLTR achieves state-of-the-art performance and provides a significantly better user experience than other OLTR approaches. To facilitate the reproducibility of our experiments, we make all experiment code available at https://github.com/ielab/OLTR.
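The core idea described in the abstract — de-biasing click signals with inverse propensity scoring (IPS) so that both clicked and unclicked documents contribute reward — can be sketched as follows. The propensity model, reward values, and function names below are illustrative assumptions, not the paper's exact formulation.

```python
def position_propensity(rank, eta=1.0):
    """Examination probability under a simple position-bias model
    (an assumption for illustration): P(observe | rank) = (1 / rank) ** eta."""
    return (1.0 / rank) ** eta


def shaped_rewards(clicks, eta=1.0):
    """Assign IPS-weighted rewards to every document in a ranked list.

    clicks: list of booleans, one per rank position (True = clicked).
    Clicked documents receive a positive reward divided by their examination
    propensity (rarely-examined positions count more, which de-biases the
    signal); unclicked documents receive a propensity-weighted negative
    reward, so skips also inform the gradient estimate.
    """
    rewards = []
    for i, clicked in enumerate(clicks):
        p = position_propensity(i + 1, eta)
        if clicked:
            rewards.append(1.0 / p)   # IPS de-biasing of the click signal
        else:
            rewards.append(-1.0 * p)  # weak negative signal for skipped docs
    return rewards


# clicked@1 -> 1.0, skipped@2 -> -0.5, clicked@3 -> ~3.0
print(shaped_rewards([True, False, True]))
```

In a REINFORCE-style setup, these per-document rewards would weight the log-probability gradients of the ranker's sampled ranking; the negative rewards for unclicked documents are what let the method learn from interactions that produce no clicks at all.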
Pages: 386–413 (27 pages)
Related papers (50 total; items [21]–[30] shown)
  • [21] Generalization in Deep Reinforcement Learning for Robotic Navigation by Reward Shaping
    Miranda, Victor R. F.
    Neto, Armando A.
    Freitas, Gustavo M.
    Mozelli, Leonardo A.
    IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2024, 71 (06) : 6013 - 6020
  • [22] Reward Shaping from Hybrid Systems Models in Reinforcement Learning
    Qian, Marian
    Mitsch, Stefan
    NASA FORMAL METHODS, NFM 2023, 2023, 13903 : 122 - 139
  • [23] Obstacle Avoidance and Navigation Utilizing Reinforcement Learning with Reward Shaping
    Zhang, Daniel
    Bailey, Colleen P.
    ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR MULTI-DOMAIN OPERATIONS APPLICATIONS II, 2020, 11413
  • [24] Guaranteeing Control Requirements via Reward Shaping in Reinforcement Learning
    De Lellis, Francesco
    Coraggio, Marco
    Russo, Giovanni
    Musolesi, Mirco
    di Bernardo, Mario
    IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, 2024, 32 (06) : 2102 - 2113
  • [25] Multi-Objectivization of Reinforcement Learning Problems by Reward Shaping
    Brys, Tim
    Harutyunyan, Anna
    Vrancx, Peter
    Taylor, Matthew E.
    Kudenko, Daniel
    Nowe, Ann
    PROCEEDINGS OF THE 2014 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2014, : 2315 - 2322
  • [26] Landmark Based Reward Shaping in Reinforcement Learning with Hidden States
    Demir, Alper
    Cilden, Erkin
    Polat, Faruk
    AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 1922 - 1924
  • [27] Reward Shaping for Model-Based Bayesian Reinforcement Learning
    Kim, Hyeoneun
    Lim, Woosang
    Lee, Kanghoon
    Noh, Yung-Kyun
    Kim, Kee-Eung
    PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015, : 3548 - 3555
  • [28] Graph convolutional recurrent networks for reward shaping in reinforcement learning
    Sami, Hani
    Bentahar, Jamal
    Mourad, Azzam
    Otrok, Hadi
    Damiani, Ernesto
    INFORMATION SCIENCES, 2022, 608 : 63 - 80
  • [29] Optimizing Reinforcement Learning Agents in Games Using Curriculum Learning and Reward Shaping
    Khan, Adil
    Muhammad, Muhammad
    Naeem, Muhammad
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2025, 36 (01)
  • [30] Unbiased Learning to Rank with Unbiased Propensity Estimation
    Ai, Qingyao
    Bi, Keping
    Luo, Cheng
    Guo, Jiafeng
    Croft, W. Bruce
    ACM/SIGIR PROCEEDINGS 2018, 2018, : 385 - 394