TY - JOUR
T1 - A novel sim2real reinforcement learning algorithm for process control
AU - Liang, Huiping
AU - Xie, Junyao
AU - Huang, Biao
AU - Li, Yonggang
AU - Sun, Bei
AU - Yang, Chunhua
N1 - Publisher Copyright:
© 2024 Elsevier Ltd
PY - 2025/2
Y1 - 2025/2
N2 - While reinforcement learning (RL) has potential in advanced process control and optimization, its direct interaction with real industrial processes can pose safety concerns. Model-based pre-training of RL may alleviate such risks. However, the intricate nature of industrial processes makes it difficult to build fully accurate simulation models, so RL-based controllers trained on simulation models can easily suffer from model-plant mismatch. Alternatively, pre-training RL on offline data can also mitigate safety risks, but it requires well-represented historical datasets, which are demanding to obtain because industrial processes mostly run in a regulatory mode under basic controllers. To address these issues, this paper proposes a novel sim2real reinforcement learning algorithm. First, a state adaptor (SA) is proposed to align simulated states with real states and thereby mitigate the model-plant mismatch. Then, a fixed-horizon return is designed to replace the traditional infinite-step return, providing genuine labels for the critic network and enhancing learning efficiency and stability. Finally, building on proximal policy optimization (PPO), the SA-PPO method is introduced to implement the proposed sim2real algorithm. Experimental results on a roasting process simulation show that SA-PPO improves MSE by 1.96% and R by 21.64% on average, verifying the effectiveness of the proposed method.
AB - While reinforcement learning (RL) has potential in advanced process control and optimization, its direct interaction with real industrial processes can pose safety concerns. Model-based pre-training of RL may alleviate such risks. However, the intricate nature of industrial processes makes it difficult to build fully accurate simulation models, so RL-based controllers trained on simulation models can easily suffer from model-plant mismatch. Alternatively, pre-training RL on offline data can also mitigate safety risks, but it requires well-represented historical datasets, which are demanding to obtain because industrial processes mostly run in a regulatory mode under basic controllers. To address these issues, this paper proposes a novel sim2real reinforcement learning algorithm. First, a state adaptor (SA) is proposed to align simulated states with real states and thereby mitigate the model-plant mismatch. Then, a fixed-horizon return is designed to replace the traditional infinite-step return, providing genuine labels for the critic network and enhancing learning efficiency and stability. Finally, building on proximal policy optimization (PPO), the SA-PPO method is introduced to implement the proposed sim2real algorithm. Experimental results on a roasting process simulation show that SA-PPO improves MSE by 1.96% and R by 21.64% on average, verifying the effectiveness of the proposed method.
KW - Fixed-horizon return
KW - Industrial roasting process
KW - Model-plant mismatch
KW - Process control
KW - Reinforcement learning
UR - https://www.scopus.com/pages/publications/85209252040
U2 - 10.1016/j.ress.2024.110639
DO - 10.1016/j.ress.2024.110639
M3 - Article
AN - SCOPUS:85209252040
SN - 0951-8320
VL - 254
JO - Reliability Engineering and System Safety
JF - Reliability Engineering and System Safety
M1 - 110639
ER -