Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation

Cited by: 0
Authors:
Gao, Yang [1 ]
Meyer, Christian M. [2 ]
Mesgar, Mohsen [2 ]
Gurevych, Iryna [2 ]
Affiliations:
[1] Royal Holloway Univ London, Dept Comp Sci, London, England
[2] Tech Univ Darmstadt, Ubiquitous Knowledge Proc Lab UKP TUDA, Darmstadt, Germany
Source:
PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019
Keywords: (none listed)
DOI: not available
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
Document summarisation can be formulated as a sequential decision-making problem, which can be solved by Reinforcement Learning (RL) algorithms. The predominant RL paradigm for summarisation learns a cross-input policy, which requires considerable time, data and parameter tuning due to the huge search spaces and the delayed rewards. Learning input-specific RL policies is a more efficient alternative, but so far it has depended on handcrafted rewards, which are difficult to design and yield poor performance. We propose RELIS, a novel RL paradigm that learns a reward function with Learning-to-Rank (L2R) algorithms at training time and uses this reward function to train an input-specific RL policy at test time. We prove that, with appropriate L2R and RL algorithms, RELIS is guaranteed to generate near-optimal summaries. Empirically, we evaluate our approach on extractive multi-document summarisation. We show that RELIS reduces the training time by two orders of magnitude compared to the state-of-the-art models while performing on par with them.
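The sketch below is an illustrative rendering of the two-stage paradigm described in the abstract, not the authors' implementation: a pairwise hinge-loss ranker stands in for the L2R reward-learning step at training time, and a REINFORCE-style update over extractive sentence selections stands in for the input-specific RL policy at test time. All names, features and hyper-parameters (summary_features, train_l2r_reward, train_input_specific_policy, the toy documents) are assumptions made for illustration only.

# Illustrative sketch of the RELIS two-stage idea (assumptions throughout;
# this is not the authors' code). Requires only numpy.
import numpy as np

rng = np.random.default_rng(0)

def summary_features(summary_sents, doc_sents):
    # Toy feature vector for a candidate extractive summary (assumed features:
    # coverage, average position, length); the paper uses richer representations.
    chosen = set(summary_sents)
    coverage = len(chosen) / max(len(doc_sents), 1)
    avg_pos = np.mean([i / len(doc_sents) for i in chosen]) if chosen else 0.0
    length = sum(len(doc_sents[i].split()) for i in chosen)
    return np.array([coverage, avg_pos, length / 100.0])

def train_l2r_reward(pairs, dim=3, epochs=200, lr=0.1):
    # Training time: learn weights w so that w.f(better) > w.f(worse) for every
    # preference pair, using a pairwise hinge (L2R-style) loss.
    w = np.zeros(dim)
    for _ in range(epochs):
        for f_better, f_worse in pairs:
            if w @ f_better - w @ f_worse < 1.0:
                w += lr * (f_better - f_worse)
    return w

def train_input_specific_policy(doc_sents, w, budget=2, episodes=500, lr=0.05):
    # Test time: train a policy for this one input only, rewarded by the learned
    # function w (no reference summaries needed), via a REINFORCE-style update.
    theta = np.zeros(len(doc_sents))            # one selection logit per sentence
    for _ in range(episodes):
        probs = np.exp(theta) / np.exp(theta).sum()
        picked = rng.choice(len(doc_sents), size=budget, replace=False, p=probs)
        reward = w @ summary_features(picked, doc_sents)
        grad = -probs
        grad[picked] += 1.0                     # approximate grad of log-probability
        theta += lr * reward * grad
    return sorted(np.argsort(theta)[-budget:])

# Toy usage with hypothetical data: one preference pair trains the reward, then
# an unseen "document" gets its own freshly trained, input-specific policy.
docs = ["first sentence about the topic .",
        "a second , longer sentence with supporting detail .",
        "an off-topic aside .",
        "a concluding sentence ."]
w = train_l2r_reward([(summary_features([0, 3], docs), summary_features([2], docs))])
print(train_input_specific_policy(docs, w))

In this reading, only the learned reward is reused at test time; the policy is trained from scratch per input, which is consistent with the efficiency gain over cross-input policies reported in the abstract.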
Pages: 2350-2356
Number of pages: 7