Discounted Stable Adaptive Critic Design for Zero-Sum Games with Application Verifications

Discounted Adaptive Critic Design with Application Verification in Zero-Sum Games

Research Background

In the field of control, optimal control is a core research direction aimed at designing and analyzing control systems to optimize system performance. As system complexity increases, traditional optimal control methods based on the Hamilton-Jacobi-Bellman (HJB) equation face the “curse of dimensionality” problem. To address this challenge, researchers have proposed adaptive dynamic programming (ADP) methods, which combine reinforcement learning and function approximation techniques to effectively enhance the control capabilities of complex systems.

Zero-sum games are an important topic in optimal control, used to model dynamic systems in which a controller acts against an adversarial disturbance. The goal is to design a policy pair that optimizes system performance while attenuating the worst-case effect of the disturbance. However, traditional value iteration cannot guarantee that the iterative policy pairs are admissible, and introducing a discount factor may compromise system stability. Both issues remain significant challenges in current research.

To address this, this paper proposes a discounted adaptive critic design method based on discounted value iteration to solve the optimal control problem in discrete-time zero-sum games while ensuring the asymptotic stability of the system. The innovations of this paper are as follows: 1) a discounted value iteration algorithm suitable for nonlinear and linear discrete-time systems is proposed; 2) the impact of the discount factor on system stability is thoroughly investigated; 3) the effectiveness of the proposed method is validated through practical applications in power systems and ball-beam systems.

Research Team and Publication Information

This paper was co-authored by Jin Ren, Ding Wang, Menghua Li, and Junfei Qiao from the School of Information Science and Technology, Beijing University of Technology, and was published in the IEEE Transactions on Automation Science and Engineering journal in 2025. The research was supported by the National Natural Science Foundation of China, the National Key R&D Program of China, and the Beijing Natural Science Foundation.

Research Methods and Technical Details

Problem Description

The nonlinear discrete-time system model considered in this paper is as follows:

[ x_{k+1} = f(x_k, u_k, \omega_k), \quad k \in \mathbb{N} ]

where ( x_k ) is the system state, ( u_k ) is the control input, and ( \omega_k ) is the disturbance input. The goal is to design a policy pair ( (u_k, \omega_k) ) in which the control input minimizes the cost function while the disturbance input maximizes it, so that system performance is optimized under the worst-case disturbance.
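The summary does not state the cost function explicitly. A common formulation for discounted zero-sum games, given here only as an assumption, uses a quadratic utility; the weights Q, R and the attenuation level \beta are illustrative symbols that do not appear in the text above.

```latex
% Assumed discounted cost with a quadratic utility; Q, R and \beta are
% illustrative weights, not values taken from the paper.
J(x_0, u, \omega) = \sum_{k=0}^{\infty} \gamma^{k}\, U(x_k, u_k, \omega_k),
\qquad
U(x_k, u_k, \omega_k) = x_k^{\top} Q x_k + u_k^{\top} R u_k - \beta^{2} \omega_k^{\top} \omega_k

% Bellman optimality equation of the game:
% the control minimizes and the disturbance maximizes.
v^{*}(x_k) = \min_{u_k} \max_{\omega_k}
  \left\{ U(x_k, u_k, \omega_k) + \gamma\, v^{*}(x_{k+1}) \right\}
```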

Discounted Value Iteration Algorithm

To solve the zero-sum game problem, this paper proposes an adaptive critic design method based on discounted value iteration. First, the initial cost function ( v_0(x_k) ) and the initial policy pair ( (u_0(x_k), \omega_0(x_k)) ) are defined. Then, iterative optimization is performed through the following steps:

  1. Policy Evaluation: Update the cost function ( v_{i+1}(x_k) ) based on the current policy pair.
  2. Policy Improvement: Optimize the control policy ( u_i(x_k) ) and the disturbance policy ( \omega_i(x_k) ) based on the updated cost function.

Through continuous iteration, the policy pair gradually converges to approximate the optimal policy pair ( (u^*(x_k), \omega^*(x_k)) ).
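The two steps above can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the scalar dynamics `f`, the quadratic `utility`, the grids, and the discount factor 0.95 are all made up for the example, and the cost function is stored on a state grid rather than approximated by a critic network.

```python
import numpy as np

def f(x, u, w):
    # Hypothetical scalar dynamics (an assumption, not the paper's model).
    return 0.8 * np.sin(x) + u + 0.5 * w

def utility(x, u, w):
    # Assumed quadratic utility; the disturbance term enters with a negative sign.
    return x**2 + u**2 - 4.0 * w**2

def discounted_value_iteration(xs, us, ws, gamma=0.95, iters=1000, tol=1e-8):
    """Grid-based discounted value iteration for a scalar zero-sum game."""
    X, U, W = np.meshgrid(xs, us, ws, indexing="ij")
    stage = utility(X, U, W)
    # Keep successor states on the grid so interpolation stays well defined.
    x_next = np.clip(f(X, U, W), xs[0], xs[-1])
    v = np.zeros_like(xs)  # v_0(x) = 0
    for _ in range(iters):
        # q(x, u, w) = U(x, u, w) + gamma * v_i(f(x, u, w))
        v_succ = np.interp(x_next.ravel(), xs, v).reshape(x_next.shape)
        q = stage + gamma * v_succ
        # Policy improvement: worst-case disturbance, then best control.
        inner = q.max(axis=2)
        # Policy evaluation: v_{i+1}(x) is q at the improved pair.
        v_new = inner.min(axis=1)
        gap = np.max(np.abs(v_new - v))
        v = v_new
        if gap < tol:
            break
    # Greedy policy pair from the last sweep.
    u_idx = inner.argmin(axis=1)
    w_idx = q[np.arange(len(xs)), u_idx].argmax(axis=1)
    return v, us[u_idx], ws[w_idx]

xs = np.linspace(-2.0, 2.0, 41)   # state grid
us = np.linspace(-1.0, 1.0, 21)   # control grid
ws = np.linspace(-1.0, 1.0, 11)   # disturbance grid
v, u_pol, w_pol = discounted_value_iteration(xs, us, ws)
```

The returned arrays hold the approximate cost function and the greedy control and disturbance actions at each grid state. A grid is used only to keep the sketch short; it does not scale to higher-dimensional states, which is where function approximation becomes necessary.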

Stability Analysis

In zero-sum games, the choice of the discount factor strongly affects system stability. Through theoretical analysis, this paper derives an admissible range for the discount factor that guarantees stability. Specifically, the system is asymptotically stable under the iterative policy pair ( (u_i(x_k), \omega_i(x_k)) ) when the following condition is met:

[ \gamma \in (\max\{0, \gamma_{\min}\}, 1] ]

where ( \gamma_{\min} = 1 - U(x_k, u_i(x_k), \omega_i(x_k)) / v_i(x_k) ) and ( U ) denotes the utility (stage cost) function evaluated under the iterative policy pair.
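As a small worked example of this condition, the check can be written as a pair of helper functions. The function names are hypothetical, and the sketch evaluates the bound at a single state only, treating the stage cost and the iterative cost value at that state as given numbers.

```python
def gamma_min(stage_cost, value):
    """Lower bound 1 - U / v_i at one state (helper name is hypothetical)."""
    if value <= 0:
        raise ValueError("the iterative cost v_i(x_k) must be positive")
    return 1.0 - stage_cost / value

def discount_is_admissible(gamma, stage_cost, value):
    """Check gamma in (max{0, gamma_min}, 1] at one state."""
    lower = max(0.0, gamma_min(stage_cost, value))
    return lower < gamma <= 1.0

# With a stage cost of 1.0 and an iterative cost of 4.0 the bound is
# 1 - 1/4 = 0.75, so 0.9 passes the check and 0.7 does not.
bound = gamma_min(1.0, 4.0)
ok_high = discount_is_admissible(0.9, 1.0, 4.0)
ok_low = discount_is_admissible(0.7, 1.0, 4.0)
```

When the stage cost exceeds the iterative cost, the bound is negative and the lower limit falls back to 0, so any discount factor in (0, 1] passes at that state.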

Special Case of Linear Systems

For linear systems, this paper further explores the discounted value iteration algorithm and its stability analysis. Through the game algebraic Riccati equation (GARE), this paper proposes policy evaluation and policy improvement methods for linear systems and provides conditions for selecting the discount factor to ensure system stability.
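For the linear case, a minimal sketch of discounted value iteration on the cost matrix might look as follows. It assumes dynamics x_{k+1} = A x_k + B u_k + D \omega_k, a quadratic utility x'Qx + u'Ru - \omega'S\omega, and a quadratic cost v_i(x) = x'P_i x. These matrices, the function name, and the scalar numbers in the example are illustrative assumptions, not the paper's power-system model.

```python
import numpy as np

def discounted_gare_iteration(A, B, D, Q, R, S, gamma, iters=500, tol=1e-10):
    """Iterate P_{i+1} = Q + g A'P_i A - g^2 A'P_i G (M + g G'P_i G)^{-1} G'P_i A."""
    n, m, p = A.shape[0], B.shape[1], D.shape[1]
    G = np.hstack([B, D])  # stacked input matrix for [u; w]
    M = np.block([[R, np.zeros((m, p))],
                  [np.zeros((p, m)), -S]])  # control penalised, disturbance rewarded
    P = np.zeros((n, n))  # P_0 = 0
    for _ in range(iters):
        Lam = M + gamma * G.T @ P @ G
        # Policy improvement: saddle-point gains, [u; w] = -K x.
        K = gamma * np.linalg.solve(Lam, G.T @ P @ A)
        # Policy evaluation: cost matrix under the improved pair.
        P_next = Q + gamma * A.T @ P @ A - gamma * A.T @ P @ G @ K
        gap = np.max(np.abs(P_next - P))
        P = P_next
        if gap < tol:
            break
    # Gains associated with the final cost matrix.
    Lam = M + gamma * G.T @ P @ G
    K = gamma * np.linalg.solve(Lam, G.T @ P @ A)
    return P, K[:m], K[m:]

# Hypothetical scalar example.
A = np.array([[0.9]]); B = np.array([[1.0]]); D = np.array([[0.5]])
Q = np.array([[1.0]]); R = np.array([[1.0]]); S = np.array([[4.0]])
P, Ku, Kw = discounted_gare_iteration(A, B, D, Q, R, S, gamma=0.95)

# Closed-loop matrix under u = -Ku x and w = -Kw x.
A_cl = A - B @ Ku - D @ Kw
rho = np.max(np.abs(np.linalg.eigvals(A_cl)))
```

A spectral radius `rho` below 1 indicates that the closed-loop system is asymptotically stable under the resulting policy pair. Repeating the run with smaller values of `gamma` is one simple way to see how the discount factor affects that check.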

Experimental Results and Verification

Power System

First, the proposed method is validated on a power system, which serves as the linear test case. The experimental results show that the system state gradually converges to the equilibrium point, and the iterative cost function and policy pair converge to their optimal values.

Ball-Beam System

Next, the method is applied to a ball-beam system, which serves as the nonlinear test case. The experimental results demonstrate that the proposed method ensures the asymptotic stability of the system and that the obtained policy pair is admissible.

Conclusion and Contributions

This paper proposes a discounted adaptive critic design method based on discounted value iteration, effectively solving the optimal control problem in discrete-time zero-sum games while ensuring the asymptotic stability of the system. Through theoretical analysis and experimental verification, this paper provides important insights into the selection of the discount factor and the assurance of system stability, offering new ideas for the optimal design of complex control systems.

Research Highlights

  1. Innovation: A discounted value iteration algorithm suitable for nonlinear and linear systems is proposed.
  2. Theoretical Contribution: The impact of the discount factor on system stability is thoroughly investigated, and conditions for selecting the discount factor are provided.
  3. Practical Value: The effectiveness and practicality of the proposed method are validated through experiments on power systems and ball-beam systems.

Future Outlook

Future research will explore how to determine the appropriate range of the discount factor and ensure the asymptotic stability of the system when the system model is unknown. Additionally, the proposed method can be further extended to other complex control systems, such as smart grids and robotic control, with broad application prospects.