Model-based offline reinforcement learning for sustainable fishery management

Times cited: 0

Authors:

Ju, Jun [1,3]

Kurniawati, Hanna [2]

Kroese, Dirk [1]

Ye, Nan [1,3]

Affiliations:

[1] Univ Queensland, Sch Math & Phys, St Lucia, Qld, Australia

[2] Australian Natl Univ, Sch Comp, Canberra, ACT, Australia

[3] Univ Queensland, Sch Math & Phys, St Lucia, Qld 4072, Australia

Source:

EXPERT SYSTEMS | 2025 / Vol. 42 / Issue 01

Funding:

Australian Research Council

Keywords:

Beverton-Holt model; fishery management; incomplete data; model misspecification; offline reinforcement learning; POMDP; Schaefer model; ADAPTIVE MANAGEMENT; DECISION; UNCERTAINTY; INFERENCE

DOI:

10.1111/exsy.13324

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Fisheries, as indispensable natural resources for humans, need to be managed with both short-term economic benefits and long-term sustainability in mind. This remains a challenge because the population and catch dynamics of fisheries are complex and noisy, while the available data are often scarce and provide only partial information on the dynamics. To address these challenges, we formulate the population and catch dynamics as a Partially Observable Markov Decision Process (POMDP) and propose a model-based offline reinforcement learning approach to learn an optimal management policy. Our approach allows fishery management policies to be learned from possibly incomplete fishery data generated by a stochastic fishery system. This involves first learning a POMDP fishery model using a novel least squares approach, and then computing the optimal policy for the learned POMDP. The learned fishery dynamics model is useful for explaining the resulting policy's performance. We perform a systematic and comprehensive simulation study to quantify the effects of stochasticity in fishery dynamics, proliferation rates, missing values in fishery data, dynamics model misspecification, and variability of effort (e.g., the number of boat days). When the effort is sufficiently variable and the noise is moderate, our method can produce a competitive policy that achieves 85% of the optimal value, even for the hardest case of noisy incomplete data and a misspecified model. Interestingly, the learned policies seem to be robust in the presence of model learning errors. However, non-identifiability arises if there is insufficient variability in the effort level and the fishery system is stochastic. This often results in poor policies, highlighting the need for sufficiently informative data. We also provide a theoretical analysis of model misspecification and discuss the tendency of a Schaefer model to overfit compared with a Beverton-Holt model.
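The abstract stresses that effort variability determines whether fishery dynamics can be identified from catch and effort data. The sketch below illustrates that point with a simple CPUE-based linear regression for a deterministic Schaefer surplus production model; it is not the paper's novel least squares estimator, and it ignores noise, missing values and the POMDP formulation. Assumptions (not from the source): dynamics B_{t+1} = B_t + r B_t (1 - B_t/K) - C_t, catch C_t = q E_t B_t, made-up values r = 0.4, K = 1000, q = 0.001, and the hypothetical helper names `simulate_schaefer` and `fit_schaefer_ls`. Because catch per unit effort is U_t = q B_t, dividing the update by B_t gives U_{t+1}/U_t - 1 = r - (r/(qK)) U_t - q E_t, which is linear in (1, U_t, E_t).

```python
import numpy as np

# Illustrative sketch only: a simple regression fit of a deterministic
# Schaefer model, NOT the paper's least squares method. All parameter values
# and function names here are hypothetical.


def simulate_schaefer(r, K, q, b0, efforts):
    """Simulate B_{t+1} = B_t + r*B_t*(1 - B_t/K) - C_t with catch C_t = q*E_t*B_t."""
    biomass, catches = [b0], []
    for e in efforts:
        b = biomass[-1]
        c = q * e * b  # catch is proportional to effort and current biomass
        catches.append(c)
        biomass.append(b + r * b * (1.0 - b / K) - c)
    return np.array(biomass), np.array(catches)


def fit_schaefer_ls(catches, efforts):
    """Estimate (r, K, q) from catch and effort only; biomass is never observed.

    With CPUE U_t = C_t / E_t = q*B_t, the Schaefer update becomes
    U_{t+1}/U_t - 1 = r - (r/(q*K)) * U_t - q * E_t, linear in (1, U_t, E_t).
    """
    u = catches / efforts                     # catch per unit effort
    y = u[1:] / u[:-1] - 1.0                  # relative change in CPUE
    X = np.column_stack([np.ones(len(y)), u[:-1], efforts[:-1]])
    (a, b, c), _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    q_hat = -c
    r_hat = a
    K_hat = a / (b * c)                       # b*c = (-r/(q*K)) * (-q) = r/K
    return r_hat, K_hat, q_hat, rank


# Varied effort: the design matrix has full rank, so r, K and q are recovered
# (up to floating-point round-off, since the data are noise-free).
efforts = np.array([100, 300, 50, 400, 200, 150, 350, 80, 250, 120,
                    300, 60, 380, 180, 220, 90, 330, 140, 270, 110], dtype=float)
_, catches = simulate_schaefer(r=0.4, K=1000.0, q=0.001, b0=800.0, efforts=efforts)
r_hat, K_hat, q_hat, rank = fit_schaefer_ls(catches, efforts)
print(f"varied effort: rank={rank}, r={r_hat:.3f}, K={K_hat:.1f}, q={q_hat:.4f}")

# Constant effort: the effort column is a multiple of the intercept column,
# so the design matrix is rank-deficient.
const_efforts = np.full(20, 200.0)
_, const_catches = simulate_schaefer(0.4, 1000.0, 0.001, 800.0, const_efforts)
rank_const = fit_schaefer_ls(const_catches, const_efforts)[3]
print(f"constant effort: rank={rank_const} (r and q not separately identifiable)")
```

With constant effort, only the combinations r - qE and r/(qK) can be recovered from this regression, not r and q separately. This is one simple way to see why sufficiently variable effort matters; the paper's own analysis concerns its estimator in the stochastic, partially observed setting.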

Pages: 28
