Optimal Control of Stochastic Markovian Jump Systems with Wiener and Poisson Noises: Two Reinforcement Learning Approaches

Academic Context

Optimal control is a central topic in modern control theory: the goal is to design a control strategy for a dynamic system that minimizes a given cost function under various constraints. For stochastic systems, traditional optimal control methods typically require complete knowledge of the system model, which is a significant limitation in practice. Recently, reinforcement learning (RL) has emerged as a vital model-free tool for solving optimal control problems: by learning directly from data, RL can recover optimal value functions and policies, and policy iteration provides a mechanism for successive performance improvement.

Stochastic Markovian Jump Systems (SMJS) are a significant class of stochastic models widely applied in finance, engineering, and other fields. However, SMJS are often subject to several kinds of noise, in particular Wiener noise and Poisson noise: Wiener noise models continuous random fluctuations, while Poisson noise models abrupt events (e.g., natural disasters, equipment breakdowns). The complexity introduced by the combination of these two noise types makes such systems difficult to handle with traditional control methods. Hence, it is of great theoretical and practical importance to design optimal control strategies under the combined influence of Wiener and Poisson noises.

This paper, co-authored by Zhiguo Yan, Tingkun Sun, and Guolin Hu, was published in the December 2024 issue of the IEEE Transactions on Artificial Intelligence. The authors proposed two novel policy iteration algorithms to address the optimal control problem for SMJS affected by Wiener and Poisson noises, and validated their effectiveness and convergence through numerical experiments.

Research Content

Research Process

The study’s procedure comprises the following key steps:

  1. Problem Definition and System Modeling: The paper defines the SMJS model influenced by Wiener and Poisson noises and presents the system’s state equations and cost function. The state equations of the system are as follows:

\[
dx(t) = \bigl[A_1(\delta_t)x(t) + B_1(\delta_t)u(t)\bigr]\,dt + \bigl[A_2(\delta_t)x(t) + B_2(\delta_t)u(t)\bigr]\,dw(t) + \bigl[A_3(\delta_t)x(t) + B_3(\delta_t)u(t)\bigr]\,dp(t)
\]

where \(x(t)\) is the system state, \(u(t)\) is the control input, \(w(t)\) is a Wiener process, \(p(t)\) is a Poisson process, and \(\delta_t\) is the Markov jump process governing the system mode.
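
The summary does not reproduce the cost function itself. For this class of linear-quadratic problems it typically takes the infinite-horizon quadratic form sketched below; the mode-dependent weighting matrices \(Q(\delta_t)\succeq 0\) and \(R(\delta_t)\succ 0\) are standard assumptions for such problems rather than the paper's exact choices:

\[
J(x_0, u) = \mathbb{E}\left\{\int_0^{\infty}\bigl[x(t)^{\top} Q(\delta_t)\,x(t) + u(t)^{\top} R(\delta_t)\,u(t)\bigr]\,dt\right\}
\]

Minimizing \(J\) over admissible mode-dependent feedback policies \(u(t) = -K(\delta_t)x(t)\) is what leads to the coupled Riccati equations mentioned in the next step.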

  2. Design of Policy Iteration Algorithms: The authors introduced two new policy iteration algorithms based on Integral Reinforcement Learning (IRL) and the Subsystems Transformation Technique (ST). The crux of these algorithms is to iteratively update policies and value functions to approximate the optimal control strategy without directly solving the complex Stochastic Coupled Algebraic Riccati Equation (SCARE); a generic sketch of the shared iteration structure is given after this list.

    • Algorithm 1: A policy iteration algorithm based on IRL and ST techniques. This algorithm iteratively updates policies and value functions to converge to the optimal solution. Its convergence is rigorously proven.
    • Algorithm 2: An improved policy iteration algorithm. This algorithm does not rely on the Poisson jump intensity \(\lambda\) and performs the policy-improvement step in a flexible manner, relying solely on the system’s state trajectory information.
  3. Numerical Experiments and Validation: The authors conducted numerical experiments to validate the effectiveness and convergence of the proposed algorithms. The results demonstrated that both algorithms effectively solve the optimal control problem for SMJS and exhibit robustness under varying Poisson jump intensity \(\lambda\).
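
For illustration only, the sketch below shows the generic model-free policy-iteration loop that both algorithms share: evaluate the current policy from measured trajectory data, then improve it, and repeat until the gains converge. The helper callables `collect_trajectory`, `evaluate_policy_from_data`, and `improve_policy` are hypothetical placeholders standing in for the paper's IRL-based evaluation and improvement steps, not the authors' actual implementation.

```python
import numpy as np

def policy_iteration(K0, collect_trajectory, evaluate_policy_from_data,
                     improve_policy, tol=1e-6, max_iter=50):
    """Alternate policy evaluation and policy improvement until the gains converge."""
    K = K0                                     # initial stabilizing gains, one per Markov mode
    P = None                                   # value matrices estimated at each iteration
    for _ in range(max_iter):
        data = collect_trajectory(K)           # run the system under u = -K[mode] x, record x(t)
        P = evaluate_policy_from_data(data, K)  # policy evaluation from trajectory data only
        K_next = improve_policy(P, data)        # policy improvement step
        if max(np.linalg.norm(K_next[i] - K[i]) for i in range(len(K))) < tol:
            return K_next, P                    # gains have stopped changing: done
        K = K_next
    return K, P
```

The point of this structure is that the coupled Riccati equations are never formed explicitly; each iteration only consumes state-trajectory data gathered under the current policy.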

Key Results

  1. Results of Algorithm 1: Using Algorithm 1, the authors derived the optimal control strategy and value functions for the system. The experimental results indicated that Algorithm 1 can effectively approximate the optimal solution and demonstrated good convergence under varying \(\lambda\) values.

  2. Results of Algorithm 2: Algorithm 2 also exhibited good convergence and did not depend on changes in \(\lambda\). Experimental results showed that Algorithm 2 could effectively solve the optimal control problem under different \(\lambda\) values.

  3. Impact of Poisson Jump Intensity \(\lambda\): The study explored the effect of \(\lambda\) on the convergence and equation error of the algorithms. The results revealed that while the accuracy of convergence decreases with increasing \(\lambda\), the algorithms remain effective within a wide range of \(\lambda\).
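
As a rough illustration of how state-trajectory data under different jump intensities can be generated, the sketch below uses an assumed Euler-type discretization of the state equation with the Markov mode held fixed for brevity; it is not the authors' simulation code, and the Poisson increment is approximated by a Bernoulli indicator with success probability \(\lambda\,dt\).

```python
import numpy as np

def simulate(A1, B1, A2, B2, A3, B3, K, x0, lam, dt=1e-3, steps=5000, rng=None):
    """Euler-type simulation of dx = (A1 x + B1 u)dt + (A2 x + B2 u)dw + (A3 x + B3 u)dp
    for a single Markov mode (mode switching omitted for brevity)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        u = -K @ x                              # state feedback under the current policy
        dw = rng.normal(0.0, np.sqrt(dt))       # Wiener increment over one step
        dp = float(rng.random() < lam * dt)     # Poisson jump indicator with intensity lam
        x = x + (A1 @ x + B1 @ u) * dt \
              + (A2 @ x + B2 @ u) * dw \
              + (A3 @ x + B3 @ u) * dp
        traj.append(x.copy())
    return np.array(traj)
```

Larger values of `lam` make jumps more frequent in the generated data, which is consistent with the reported loss of convergence accuracy as \(\lambda\) grows.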

Conclusion

This paper studied the infinite-horizon optimal control problem for SMJS under Wiener and Poisson noises and proposed two novel policy iteration algorithms based on RL. These algorithms obtain the optimal solution without directly solving the complex SCARE and rely solely on the system’s state trajectory information. The experimental results demonstrate that the proposed algorithms exhibit good convergence and robustness under varying \(\lambda\) values. Additionally, the study of the influence of the Poisson jump intensity on convergence and equation error provides valuable insights for further exploration.

Research Highlights

  1. Handling Complex Noise Models: The paper is the first to simultaneously incorporate Wiener and Poisson noises into the optimal control problem for SMJS, proposing policy iteration algorithms suitable for complex noise environments.

  2. Model-Free Strategy: The proposed algorithms do not require complete system model information and rely solely on state trajectory information, making them highly beneficial for practical applications.

  3. Flexible Algorithm Design: Algorithm 2 adopts a flexible policy improvement approach that is independent of the Poisson jump intensity \(\lambda\), broadening its application scope.

Research Significance and Value

The research carries significant theoretical and practical implications. Theoretically, the proposed policy iteration algorithms offer new solutions for complex noise environments in optimal control problems, enriching the applications of RL in control theory. Practically, the algorithms can be applied to areas such as risk control in financial markets and fault diagnosis in engineering systems, showcasing extensive practical potential.

Through innovative algorithm design and rigorous experimental validation, this study provides effective solutions to the optimal control problem of SMJS under Wiener and Poisson noises, contributing significantly to both academic research and real-world applications.