Reinforcement online learning to rank with unbiased reward shaping

Cited by: 0
Authors
Shengyao Zhuang
Zhihao Qiao
Guido Zuccon
Institution
The University of Queensland
Source
Information Retrieval Journal
Keywords
Online learning to rank; Unbiased reward shaping; Reinforcement learning
DOI
Not available
Abstract
Online learning to rank (OLTR) aims to learn a ranker directly from implicit feedback derived from users' interactions, such as clicks. Clicks, however, are a biased signal: specifically, top-ranked documents are likely to attract more clicks than documents further down the ranking (position bias). In this paper, we propose a novel learning algorithm for OLTR that uses reinforcement learning to optimize rankers: Reinforcement Online Learning to Rank (ROLTR). In ROLTR, the gradients of the ranker are estimated based on the rewards assigned to clicked and unclicked documents. To remove the position bias contained in the reward signals, we introduce unbiased reward shaping functions that exploit inverse propensity scoring for clicked and unclicked documents. Because our method also models unclicked documents, fewer user interactions are required to effectively train a ranker, thus providing gains in efficiency. Empirical evaluation on standard OLTR datasets shows that ROLTR achieves state-of-the-art performance and provides a significantly better user experience than other OLTR approaches. To facilitate the reproducibility of our experiments, we make all experiment code available at https://github.com/ielab/OLTR.
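The abstract describes estimating the ranker's gradients from rewards assigned to both clicked and unclicked documents, de-biased with inverse propensity scoring (IPS). The sketch below illustrates one standard IPS reward-shaping construction under the examination hypothesis (a click requires a document to be both examined and attractive); the function name, reward values, and propensity estimates are illustrative assumptions, not the paper's actual shaping functions (see the linked repository for those).

```python
import numpy as np

def ips_shaped_rewards(clicks, propensities, r_click=1.0, r_skip=-0.5):
    """IPS-based reward shaping for one displayed ranking (illustrative).

    clicks[i]       : 1 if the document at rank i was clicked, else 0.
    propensities[i] : estimated probability that rank i is examined
                      (the position bias), e.g. from a click model.

    Under the examination hypothesis, E[clicks[i] / propensities[i]]
    equals the document's attraction probability a_i, so the shaped
    reward is an unbiased estimate of a_i * r_click + (1 - a_i) * r_skip
    for clicked AND unclicked documents.
    """
    c = np.asarray(clicks, dtype=float)
    p = np.clip(np.asarray(propensities, dtype=float), 1e-6, 1.0)
    w = c / p  # inverse-propensity-scored click indicator
    return w * r_click + (1.0 - w) * r_skip

# Example: three results, only the top one clicked; lower ranks are
# less likely to have been examined at all.
rewards = ips_shaped_rewards(clicks=[1, 0, 0], propensities=[0.9, 0.5, 0.3])
print(rewards)  # approx. [ 1.17 -0.5 -0.5 ]: the click reward is boosted by 1/p
```

In a REINFORCE-style learner, such per-document rewards would weight the policy-gradient terms, roughly sum_i r_i * grad log pi_theta(d_i | q), so unclicked documents also contribute learning signal; this is the source of the efficiency gain the abstract mentions.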
Pages: 386-413
Page count: 27
Related papers
50 in total (first 10 shown below)
  • [1] Reinforcement online learning to rank with unbiased reward shaping
    Zhuang, Shengyao
    Qiao, Zhihao
    Zuccon, Guido
    Information Retrieval Journal, 2022, 25(4): 386-413
  • [2] Differentiable Unbiased Online Learning to Rank
    Oosterhuis, Harrie
    de Rijke, Maarten
    CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018: 1293-1302
  • [3] Unbiased Learning to Rank: Online or Offline?
    Ai, Qingyao
    Yang, Tao
    Wang, Huazheng
    Mao, Jiaxin
    ACM Transactions on Information Systems, 2021, 39(2)
  • [4] Belief Reward Shaping in Reinforcement Learning
    Marom, Ofir
    Rosman, Benjamin
    Thirty-Second AAAI Conference on Artificial Intelligence / Thirtieth Innovative Applications of Artificial Intelligence Conference / Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, 2018: 3762-3769
  • [5] Reward Shaping in Episodic Reinforcement Learning
    Grzes, Marek
    AAMAS '17: Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems, 2017: 565-573
  • [6] Multigrid Reinforcement Learning with Reward Shaping
    Grzes, Marek
    Kudenko, Daniel
    Artificial Neural Networks - ICANN 2008, Part I, 2008, 5163: 357-366
  • [7] Unbiased Learning to Rank: Counterfactual and Online Approaches
    Oosterhuis, Harrie
    Jagerman, Rolf
    de Rijke, Maarten
    WWW '20: Companion Proceedings of The Web Conference 2020, 2020: 299-300
  • [8] Reward Shaping for Reinforcement Learning by Emotion Expressions
    Hwang, K. S.
    Ling, J. L.
    Chen, Yu-Ying
    Wang, Wei-Han
    2014 IEEE International Conference on Systems, Man and Cybernetics (SMC), 2014: 1288-1293
  • [9] Hindsight Reward Shaping in Deep Reinforcement Learning
    de Villiers, Byron
    Sabatta, Deon
    2020 International SAUPEC/RobMech/PRASA Conference, 2020: 653-659
  • [10] Reward Shaping Based Federated Reinforcement Learning
    Hu, Yiqiu
    Hua, Yun
    Liu, Wenyan
    Zhu, Jun
    IEEE Access, 2021, 9: 67259-67267