Novel data-driven two-dimensional Q-learning for optimal tracking control of batch process with unknown dynamics

Cited by: 18
Authors
Wen, Xin [1 ]
Shi, Huiyuan [1 ,2 ,3 ]
Su, Chengli [1 ,4 ,7 ]
Jiang, Xueying [5 ]
Li, Ping [1 ,4 ]
Yu, Jingxian [6 ]
Affiliations
[1] Liaoning Petrochem Univ, Sch Informat & Control Engn, Fushun, Peoples R China
[2] Northwestern Polytech Univ, Sch Automat, Xian, Peoples R China
[3] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang, Peoples R China
[4] Univ Sci & Technol Liaoning, Sch Elect & Informat Engn, Anshan, Peoples R China
[5] Northeastern Univ, Sch Informat Sci & Engn, Shenyang, Peoples R China
[6] Liaoning Petrochem Univ, Sch Sci, Fushun, Peoples R China
[7] Liaoning Petrochem Univ, Sch Informat & Control Engn, Fushun 113001, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Batch process; Data-driven; 2D off-policy Q-learning; Optimal tracking control; Injection molding; MODEL PREDICTIVE CONTROL; FAULT-TOLERANT CONTROL; STATE DELAY; DESIGN; FEEDBACK;
DOI
10.1016/j.isatra.2021.06.007
CLC number
TP [Automation Technology, Computer Technology];
Discipline code
0812;
Abstract
Because previous control methods usually rely heavily on models of the batch process and are difficult to apply to practical batch processes with unknown dynamics, a novel data-driven two-dimensional (2D) off-policy Q-learning approach for optimal tracking control (OTC) is proposed to obtain a model-free control law for the batch process. Firstly, an extended state-space equation composed of the state and the output tracking error is established to ensure the tracking performance of the designed controller. Secondly, based on this extended system, a behavior policy for generating data and a target policy for optimization and learning are introduced. Then, a Bellman equation independent of model parameters is derived by analyzing the relation between the 2D value function and the 2D Q-function. Only measured data along the batch and time directions of the batch process are used to carry out policy iteration, so the optimal control problem can be solved despite the lack of system dynamics information. The unbiasedness and convergence of the designed 2D off-policy Q-learning algorithm are proved. Finally, a simulation case of an injection molding process shows that the control and tracking performance gradually improve as the number of batches increases. (c) 2021 ISA. Published by Elsevier Ltd. All rights reserved.
Pages: 10-21
Number of pages: 12
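
The abstract describes a model-free scheme in which data generated by a behavior policy are used to evaluate and improve a target policy through a Bellman equation that contains no model parameters. As a rough illustration of that idea only (not the authors' 2D batch/time algorithm), the sketch below runs least-squares off-policy Q-learning on a small linear-quadratic problem; the plant matrices A and B, the cost weights, and all variable names are illustrative assumptions, and the learner touches only measured data.

```python
import numpy as np

# Minimal sketch of off-policy Q-learning for a linear-quadratic problem,
# illustrating the model-free Bellman-equation / policy-iteration idea from
# the abstract. This is a 1D-in-time toy, NOT the paper's 2D (batch x time)
# algorithm; A, B, the cost weights, and all names are illustrative.

rng = np.random.default_rng(0)
n, m = 2, 1
A = np.array([[0.9, 0.1], [0.0, 0.8]])   # "unknown" plant: used only to simulate
B = np.array([[0.0], [0.1]])
Qc, Rc = np.eye(n), np.eye(m)            # stage cost x'Qc x + u'Rc u

def phi(x, u):
    # Quadratic features of the joint state-input vector z = [x; u].
    z = np.concatenate([x, u])
    return np.outer(z, z)[np.triu_indices(n + m)]

K = np.zeros((m, n))                     # initial admissible target policy u = K x
for _ in range(15):                      # policy iteration
    Phi, y = [], []
    x = rng.standard_normal(n)
    for _ in range(500):
        u = K @ x + 0.5 * rng.standard_normal(m)  # behavior = target + probing noise
        cost = x @ Qc @ x + u @ Rc @ u
        x_next = A @ x + B @ u           # "measurement"; the learner never sees A, B
        u_next = K @ x_next              # target policy evaluated off-policy
        Phi.append(phi(x, u) - phi(x_next, u_next))
        y.append(cost)
        x = x_next
    # Policy evaluation: solve the model-free Bellman equation in least squares.
    theta, *_ = np.linalg.lstsq(np.asarray(Phi), np.asarray(y), rcond=None)
    H = np.zeros((n + m, n + m))
    H[np.triu_indices(n + m)] = theta
    H = (H + H.T) / 2                    # unpack the symmetric Q-function matrix
    # Policy improvement: u = argmin_u Q(x, u)  =>  K = -H_uu^{-1} H_ux.
    K = -np.linalg.solve(H[n:, n:], H[n:, :n])

print("learned feedback gain K:", K)
```

In the paper's setting, the same evaluation/improvement cycle is carried out over data indexed along both the batch and time directions, with the state extended by the output tracking error; the toy above only conveys the model-free mechanics.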