Advanced Optimal Tracking Integrating a Neural Critic Technique for Asymmetric Constrained Zero-Sum Games

Academic Report: Advanced Optimal Tracking Integrating a Neural Critic Technique for Asymmetric Constrained Zero-Sum Games

Background and Research Problem

In modern control, game theory provides the mathematical models for studying competition and cooperation among intelligent decision-makers; it concerns interactive decision problems with at least two players. In recent years, differential games have attracted increasing attention in the control field. The optimal control problem of a complex system subject to disturbances is often formulated as a Zero-Sum Game (ZSG), in which the control input and the disturbance act as opposing players. A control problem that involves multiple control strategies but no disturbances is referred to as a Non-Zero-Sum Game (Non-ZSG). Because real systems are subject to various disturbances, ZSG problems deserve further study in order to mitigate the impact of disturbances on system performance.

For Continuous-Time (CT) nonlinear systems in particular, traditional dynamic programming, despite its significant value, suffers from the curse of dimensionality when solving nonlinear optimal control problems. To address this difficulty, Werbos proposed Adaptive Dynamic Programming (ADP) in 1974, which combines dynamic programming, neural networks, and reinforcement learning into an efficient optimization tool. The paper therefore uses a neural critic technique, i.e., ADP, to study the tracking control problem of CT nonlinear systems under asymmetric constraints in the context of zero-sum games.

Source and Author Information

This research paper, titled “Advanced Optimal Tracking Integrating a Neural Critic Technique for Asymmetric Constrained Zero-Sum Games,” is authored by Menghua Li, Ding Wang, Jin Ren, and Junfei Qiao from the College of Information Technology at Beijing University of Technology. The authors are also affiliated with the Beijing Computational Intelligence and Intelligent Systems Laboratory, the Beijing Artificial Intelligence Research Institute, and the Beijing Smart Environmental Protection Laboratory. The paper will be published online in the journal Neural Networks on May 15, 2024.

Workflow

First, the study proposes an improved algorithm to solve the tracking control problem in CT nonlinear multi-player zero-sum games. A new non-quadratic function is designed to handle asymmetric constraints while relaxing the strict requirements on the control matrix. The study then derives the optimal control, the worst disturbance, and the tracking Hamilton-Jacobi-Isaacs (HJI) equation. Next, a neural critic network is constructed to estimate the optimal cost function, from which approximations of the optimal control and worst disturbance are obtained. Finally, based on the Lyapunov method, the stability of the tracking error and of the critic network's weight estimation error is analyzed.
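To make the quantities in this workflow concrete, the following is a minimal sketch of a zero-sum tracking formulation under simplifying assumptions that are not taken from the paper: a single scalar control u constrained to [u_min, u_max], a single disturbance v, tracking-error dynamics of the form ė = F(e) + g u + h v, and a commonly used shifted inverse-hyperbolic-tangent control penalty. The symbols F, g, h, Q, R, γ, ū, and λ are illustrative, and the paper's own non-quadratic function and multi-player equations may differ.

```latex
% Assumed cost function with attenuation level \gamma > 0
V(e(t)) = \int_t^{\infty} \Big( e^{\top} Q e + W(u) - \gamma^{2} v^{\top} v \Big)\, d\tau ,
\qquad
W(u) = 2 \lambda R \int_{\bar{u}}^{u} \tanh^{-1}\!\Big( \frac{s - \bar{u}}{\lambda} \Big)\, ds ,

% Midpoint and half-width of the asymmetric interval [u_{\min}, u_{\max}]
\bar{u} = \tfrac{1}{2}\,(u_{\max} + u_{\min}), \qquad
\lambda = \tfrac{1}{2}\,(u_{\max} - u_{\min}).

% Hamiltonian
H(e, u, v, \nabla V) = e^{\top} Q e + W(u) - \gamma^{2} v^{\top} v
  + (\nabla V)^{\top} \big( F(e) + g\,u + h\,v \big).

% Stationarity conditions \partial H / \partial u = 0 and \partial H / \partial v = 0
u^{*} = \bar{u} - \lambda \tanh\!\Big( \frac{g^{\top} \nabla V^{*}}{2 \lambda R} \Big), \qquad
v^{*} = \frac{1}{2 \gamma^{2}}\, h^{\top} \nabla V^{*}.

% Tracking HJI equation
0 = e^{\top} Q e + W(u^{*}) - \gamma^{2} v^{*\top} v^{*}
  + (\nabla V^{*})^{\top} \big( F(e) + g\,u^{*} + h\,v^{*} \big).
```

Under these assumptions the tanh term keeps u* inside [u_min, u_max] by construction, which is the reason a non-quadratic penalty is used in place of the usual quadratic term.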

Research Steps

  1. Construct Nonlinear System Model: Set system state variables, control inputs, and external disturbances. Define the desired trajectory generated by the reference system and describe the tracking error dynamics of the system by introducing a tracking error vector.

  2. Solve HJI Equation: Using the Bellman optimality principle, derive the system’s tracking HJI equation. Obtain the optimal control and worst disturbance via the stationarity principle.

  3. Implement Neural Critic Technique for Tracking Control: Because the HJI equation is difficult to solve in high dimensions, the study adopts a neural critic technique. A critic network is built, and approximations of the optimal control and worst disturbance are obtained through its weight update rule.

  4. Stability Analysis: Using a Lyapunov candidate function, prove that the tracking error and the critic weight estimation error are uniformly ultimately bounded (UUB).

  5. Simulation Verification: Two examples, an inverted pendulum system and a four-player CT nonlinear system, are used to validate the effectiveness of the proposed tracking control scheme.
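Steps 1 to 3 above can be illustrated with a short Python sketch. It is not the paper's implementation: the two-dimensional error dynamics, the polynomial critic features, the bounds, and all numerical values are hypothetical, and the weight update is the standard normalized gradient descent on the squared Bellman (Hamiltonian) residual, which may differ from the rule used by the authors.

```python
import numpy as np

# Hypothetical toy setup (not from the paper): 2-D tracking error,
# scalar control and scalar disturbance.
U_MIN, U_MAX = -0.5, 1.5          # asymmetric control bounds (assumed)
U_BAR = 0.5 * (U_MAX + U_MIN)     # midpoint of the admissible interval
LAM = 0.5 * (U_MAX - U_MIN)       # half-width of the admissible interval
Q = np.eye(2)                     # tracking-error weight (assumed)
R = 1.0                           # control weight (assumed)
GAMMA = 2.0                       # disturbance attenuation level (assumed)
G = np.array([0.0, 1.0])          # control input vector g (assumed constant)
H = np.array([0.0, 0.5])          # disturbance input vector h (assumed constant)


def error_drift(e):
    """Assumed drift F(e) of the tracking-error dynamics."""
    return np.array([-e[0] + e[1], -0.5 * e[0] - 0.5 * e[1]])


def grad_features(e):
    """Jacobian of the critic features sigma(e) = [e1^2, e1*e2, e2^2]."""
    return np.array([[2 * e[0], 0.0], [e[1], e[0]], [0.0, 2 * e[1]]])


def grad_value(w, e):
    """Gradient of the critic output V_hat(e) = w^T sigma(e) w.r.t. e."""
    return grad_features(e).T @ w


def constrained_control(w, e):
    """Shifted-tanh control law; the result stays within [U_MIN, U_MAX]."""
    return U_BAR - LAM * np.tanh(G @ grad_value(w, e) / (2.0 * LAM * R))


def worst_disturbance(w, e):
    """Approximate worst disturbance derived from the critic gradient."""
    return H @ grad_value(w, e) / (2.0 * GAMMA**2)


def control_cost(u):
    """Closed form of 2*LAM*R * integral from U_BAR to u of atanh((s-U_BAR)/LAM)."""
    z = np.clip((u - U_BAR) / LAM, -1 + 1e-9, 1 - 1e-9)  # avoid atanh(+-1)
    return 2.0 * LAM**2 * R * (z * np.arctanh(z) + 0.5 * np.log(1.0 - z**2))


def bellman_residual(w, e, u, v):
    """Approximate Hamiltonian evaluated with the current critic weights."""
    e_dot = error_drift(e) + G * u + H * v
    utility = e @ Q @ e + control_cost(u) - GAMMA**2 * v**2
    return utility + w @ (grad_features(e) @ e_dot)


def critic_step(w, e, u, v, alpha=0.5):
    """One normalized gradient-descent step on 0.5 * residual^2."""
    e_dot = error_drift(e) + G * u + H * v
    rho = grad_features(e) @ e_dot
    delta = bellman_residual(w, e, u, v)
    return w - alpha * delta * rho / (1.0 + rho @ rho) ** 2
```

A typical use would evaluate `constrained_control` and `worst_disturbance` at the current error, integrate the error dynamics for one time step, and call `critic_step` with the same sample. For fixed `u` and `v`, the normalization keeps each step from increasing the magnitude of the residual at that sample when `alpha` is at most 1.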

Research Results

Through weight training and simulation verification, this study achieves several significant results:

  1. Weight Convergence: As the critic network is trained, its weights converge over time, so that the approximate optimal control correctly reflects the system state.

  2. Tracking Error Convergence: Simulation experiments show that the tracking error quickly converges to zero, validating the effectiveness of the proposed method in different disturbance environments.

  3. Disturbance Resistance: The system demonstrates strong disturbance resistance, as the tracking error can quickly recover even after the introduction of disturbance signals.

Conclusion and Significance

This paper proposes an effective method, based on a neural critic technique, for solving the tracking control problem of CT nonlinear zero-sum games with asymmetric constraints. By relaxing the strict requirements on the control matrix, the method extends the applicability of the algorithm and achieves effective control without requiring the reference trajectory to ultimately converge to zero. The study thus provides a new theoretical method and covers a broader range of practical situations.

Research Highlights

  1. Innovative Algorithm: The proposed algorithm relaxes previous constraints on the control matrix, enabling effective operation in a broader range of applications.

  2. Application of Neural Critic Techniques: By approximating optimal control using neural networks, the method better handles the curse of dimensionality, achieving efficient control strategies.

  3. Validation in Multiple Application Scenarios: The proposed algorithm’s broad applicability and effectiveness are validated through simulation examples involving an inverted pendulum system and a four-player system.