Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation

Cited by: 0
Authors:
Gao, Yang [1 ]
Meyer, Christian M. [2 ]
Mesgar, Mohsen [2 ]
Gurevych, Iryna [2 ]
Affiliations:
[1] Royal Holloway Univ London, Dept Comp Sci, London, England
[2] Tech Univ Darmstadt, Ubiquitous Knowledge Proc Lab UKP TUDA, Darmstadt, Germany
Source:
PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019
Keywords: (none listed)
DOI: not available
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
Document summarisation can be formulated as a sequential decision-making problem, which can be solved by Reinforcement Learning (RL) algorithms. The predominant RL paradigm for summarisation learns a cross-input policy, which requires considerable time, data and parameter tuning due to the huge search spaces and the delayed rewards. Learning input-specific RL policies is a more efficient alternative, but so far it has depended on handcrafted rewards, which are difficult to design and yield poor performance. We propose RELIS, a novel RL paradigm that learns a reward function with Learning-to-Rank (L2R) algorithms at training time and uses this reward function to train an input-specific RL policy at test time. We prove that, with appropriate L2R and RL algorithms, RELIS is guaranteed to generate near-optimal summaries. Empirically, we evaluate our approach on extractive multi-document summarisation. We show that RELIS reduces the training time by two orders of magnitude compared to the state-of-the-art models while performing on par with them.
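The sketch below is an illustrative rendering of the two-stage paradigm described in the abstract, not the authors' implementation: a pairwise hinge-loss ranker stands in for the L2R reward-learning step at training time, and a REINFORCE-style update over extractive sentence selections stands in for the input-specific RL policy at test time. All names, features and hyper-parameters (summary_features, train_l2r_reward, train_input_specific_policy, the toy documents) are assumptions made for illustration only.

# Illustrative sketch of the RELIS two-stage idea (assumptions throughout;
# this is not the authors' code). Requires only numpy.
import numpy as np

rng = np.random.default_rng(0)

def summary_features(summary_sents, doc_sents):
    # Toy feature vector for a candidate extractive summary (assumed features:
    # coverage, average position, length); the paper uses richer representations.
    chosen = set(summary_sents)
    coverage = len(chosen) / max(len(doc_sents), 1)
    avg_pos = np.mean([i / len(doc_sents) for i in chosen]) if chosen else 0.0
    length = sum(len(doc_sents[i].split()) for i in chosen)
    return np.array([coverage, avg_pos, length / 100.0])

def train_l2r_reward(pairs, dim=3, epochs=200, lr=0.1):
    # Training time: learn weights w so that w.f(better) > w.f(worse) for every
    # preference pair, using a pairwise hinge (L2R-style) loss.
    w = np.zeros(dim)
    for _ in range(epochs):
        for f_better, f_worse in pairs:
            if w @ f_better - w @ f_worse < 1.0:
                w += lr * (f_better - f_worse)
    return w

def train_input_specific_policy(doc_sents, w, budget=2, episodes=500, lr=0.05):
    # Test time: train a policy for this one input only, rewarded by the learned
    # function w (no reference summaries needed), via a REINFORCE-style update.
    theta = np.zeros(len(doc_sents))            # one selection logit per sentence
    for _ in range(episodes):
        probs = np.exp(theta) / np.exp(theta).sum()
        picked = rng.choice(len(doc_sents), size=budget, replace=False, p=probs)
        reward = w @ summary_features(picked, doc_sents)
        grad = -probs
        grad[picked] += 1.0                     # approximate grad of log-probability
        theta += lr * reward * grad
    return sorted(np.argsort(theta)[-budget:])

# Toy usage with hypothetical data: one preference pair trains the reward, then
# an unseen "document" gets its own freshly trained, input-specific policy.
docs = ["first sentence about the topic .",
        "a second , longer sentence with supporting detail .",
        "an off-topic aside .",
        "a concluding sentence ."]
w = train_l2r_reward([(summary_features([0, 3], docs), summary_features([2], docs))])
print(train_input_specific_policy(docs, w))

In this reading, only the learned reward is reused at test time; the policy is trained from scratch per input, which is consistent with the efficiency gain over cross-input policies reported in the abstract.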
Pages: 2350-2356
Number of pages: 7