A novel sim2real reinforcement learning algorithm for process control

Huiping Liang, Junyao Xie, Biao Huang, Yonggang Li, Bei Sun*, Chunhua Yang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

While reinforcement learning (RL) has potential in advanced process control and optimization, its direct interaction with real industrial processes can pose safety concerns. Model-based pre-training of RL may alleviate such risks. However, the intricate nature of industrial processes complicates the establishment of entirely accurate simulation models. Consequently, RL-based controllers relying on simulation models can easily suffer from model-plant mismatch. Alternatively, pre-training RL on offline data can also mitigate safety risks, but it requires well-represented historical datasets, which is demanding because industrial processes mostly run in a regulatory mode under basic controllers. To handle these issues, this paper proposes a novel sim2real reinforcement learning algorithm. First, a state adaptor (SA) is proposed to align simulated states with real states and mitigate the model-plant mismatch. Then, a fix-horizon return is designed to replace the traditional infinite-step return and provide genuine labels for the critic network, enhancing learning efficiency and stability. Finally, by applying proximal policy optimization (PPO), the SA-PPO method is introduced to implement the proposed sim2real algorithm. Experimental results show that SA-PPO improves performance in MSE by 1.96% and in R by 21.64% on average on a roasting process simulation, verifying the effectiveness of the proposed method.
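The fix-horizon return mentioned in the abstract can be illustrated with a minimal sketch: instead of bootstrapping an infinite discounted sum from a value estimate, the critic target is an exact H-step discounted sum of observed rewards. The function name, the horizon value, and the use of a plain reward array are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def fix_horizon_return(rewards, gamma=0.99, horizon=10):
    """Hypothetical sketch of an H-step (fix-horizon) return.

    For each timestep t, sum at most `horizon` discounted future
    rewards. Because the sum is truncated, every target is a fully
    observed quantity rather than a bootstrapped estimate, which is
    the property the abstract credits for stabler critic learning.
    """
    rewards = np.asarray(rewards, dtype=float)
    T = len(rewards)
    returns = np.zeros(T)
    for t in range(T):
        end = min(t + horizon, T)          # truncate at the horizon (or episode end)
        discounts = gamma ** np.arange(end - t)
        returns[t] = np.dot(discounts, rewards[t:end])
    return returns
```

In a PPO-style setup, these truncated returns would serve directly as regression labels for the critic network in place of bootstrapped targets.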

Original language: English
Article number: 110639
Journal: Reliability Engineering and System Safety
Volume: 254
DOIs
Publication status: Published - Feb 2025
Externally published: Yes

Keywords

  • Fix-horizon return
  • Industrial roasting process
  • Model-plant mismatch
  • Process control
  • Reinforcement learning
