TY - JOUR
T1 - Online reinforcement learning for condition-based group maintenance using factored Markov decision processes
AU - Xu, Jianyu
AU - Liu, Bin
AU - Zhao, Xiujie
AU - Wang, Xiao Lin
N1 - Publisher Copyright: © 2023 The Author(s)
PY - 2024/5/16
Y1 - 2024/5/16
AB - We investigate a condition-based group maintenance problem for multi-component systems, where the degradation process of a specific component is affected only by its neighbouring ones, leading to a special type of stochastic dependence among components. We formulate the maintenance problem into a factored Markov decision process taking advantage of this dependence property, and develop a factored value iteration algorithm to efficiently approximate the optimal policy. Through both theoretical analyses and numerical experiments, we show that the algorithm can significantly reduce computational burden and improve efficiency in solving the optimization problem. Moreover, since model parameters are unknown a priori in most practical scenarios, we further develop an online reinforcement learning algorithm to simultaneously learn the model parameters and determine an optimal maintenance action upon each inspection. A novel feature of this online learning algorithm is that it is capable of learning both transition probabilities and system structure indicating the stochastic dependence among components. We discuss the error bound and sample complexity of the developed learning algorithm theoretically, and test its performance through numerical experiments. The results reveal that our algorithm can effectively learn the model parameters and approximate the optimal maintenance policy.
KW - Condition-based group maintenance
KW - Factored Markov decision process
KW - Factored value iteration
KW - Maintenance
KW - Online reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85178145416&partnerID=8YFLogxK
U2 - 10.1016/j.ejor.2023.11.039
DO - 10.1016/j.ejor.2023.11.039
M3 - Article
AN - SCOPUS:85178145416
SN - 0377-2217
VL - 315
SP - 176
EP - 190
JO - European Journal of Operational Research
JF - European Journal of Operational Research
IS - 1
ER -