Abstract
Operation optimization plays a crucial role in process control, directly influencing product quality and profitability. Reinforcement learning (RL), with its capabilities in autonomous learning and dynamic adaptability, has become a promising solution in this domain. However, its real-world application is constrained by the high costs and risks associated with its interactions with environments. Offline RL, which leverages fixed datasets without interactions, offers an alternative but faces significant challenges in the process industry due to imbalanced multioperating condition scenarios and heightened safety sensitivity. To address these challenges, this article introduces a novel offline actor-critic algorithm with expert knowledge guidance (EKG-AC). The method begins with a diffusion-transformer-based action generation framework that mitigates the out-of-distribution problem by capturing the evolution of decision sequences and the interdependencies between states and actions. An expert knowledge guidance mechanism is then integrated, steering the model to generate safe and adaptive candidate actions aligned with current operating conditions and expert knowledge. Subsequently, within the actor-critic framework, the optimal action is selected from the candidate pool based on the evaluated Q-value, thereby setting the operational variables for the optimization task. The proposed algorithm is validated through two real-world industrial processes, demonstrating superior optimization performance and behavior that is closely aligned with expert decision-making, underscoring its substantial practical value.
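The final step described in the abstract, choosing the candidate action with the highest evaluated Q-value, can be sketched as follows. This is a minimal illustration and not the authors' implementation: `sample_candidates` is a hypothetical random stand-in for the expert-guided diffusion-transformer generator, `toy_critic` is a made-up Q-function, and the action bounds of [-1, 1] are assumed.

```python
import numpy as np

def sample_candidates(state, n_candidates, rng):
    # Hypothetical stand-in for the guided diffusion-transformer generator:
    # draws actions around a state-dependent mean, clipped to assumed bounds.
    mean = np.tanh(state[:2])
    noise = rng.normal(scale=0.1, size=(n_candidates, mean.size))
    return np.clip(mean + noise, -1.0, 1.0)

def toy_critic(state, actions):
    # Hypothetical Q-function: scores each action by closeness to a fixed target.
    target = np.array([0.5, -0.2])
    return -np.sum((actions - target) ** 2, axis=1)

def select_action(state, candidates, critic):
    # Evaluate every candidate with the critic and keep the highest-scoring one.
    q_values = critic(state, candidates)
    best = int(np.argmax(q_values))
    return candidates[best], q_values[best]

rng = np.random.default_rng(0)
state = np.array([0.3, -0.1, 0.8])
candidates = sample_candidates(state, n_candidates=16, rng=rng)
action, q_value = select_action(state, candidates, toy_critic)
```

In this sketch the critic only ranks actions that the generator has already proposed, which reflects the abstract's description of selecting from a candidate pool rather than searching the full action space.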
| Original language | English |
|---|---|
| Journal | IEEE Transactions on Cybernetics |
| Publication status | Accepted/In press - 2025 |
| Externally published | Yes |
Keywords
- Diffusion transformer
- expert knowledge guidance
- offline reinforcement learning (RL)
- process industrial optimization
Fingerprint
Dive into the research topics of 'EKG-AC: A New Paradigm for Process Industrial Optimization Based on Offline Reinforcement Learning With Expert Knowledge Guidance'. Together they form a unique fingerprint.