Adaptive Composite Fixed-Time RL-Optimized Control for Nonlinear Systems and Its Application to Intelligent Ship Autopilot

In recent years, intelligent autopilot technology has become a research hotspot in automation control. For complex nonlinear systems, designing optimized control strategies, and in particular achieving system stability and performance optimization within a fixed time, remains a major challenge for control engineers and researchers. Existing fixed-time control theory, however, often overlooks the trade-off between resource utilization and performance during state convergence, which can cause over- or under-compensation and thereby increase the system's steady-state error. Moreover, studies that minimize the estimation error of nonlinear uncertainties under time constraints remain scarce. This study presents an adaptive composite fixed-time reinforcement learning optimized control solution to address these issues.

Research Background and Objectives

Since its inception, fixed-time control theory has attracted wide attention because its guaranteed convergence time is independent of initial conditions, whereas the settling time of finite-time control methods depends on the initial state. However, while existing research has addressed optimal control problems for nonlinear systems within finite time frames, most of it focuses on affine nonlinear systems rather than strict-feedback systems. Additionally, although neural network (NN) techniques have been widely applied to handle nonlinear uncertainties thanks to their strong learning and approximation capabilities, improving their estimation accuracy and reducing system error in practical applications remains an important open problem.

Against this backdrop, this study was conducted collaboratively by several researchers, including Siwen Liu and Yi Zuo from the Navigation College of Dalian Maritime University, Tieshan Li and Xiaoyang Gao from the School of Automation Engineering at the University of Electronic Science and Technology of China and its Yangtze Delta Research Institute, Huanqing Wang from the College of Mathematical Sciences at Bohai University, and Yang Xiao from the Department of Computer Science at the University of Alabama. The paper was published in the January 2025 issue of IEEE Transactions on Artificial Intelligence and was supported by the National Natural Science Foundation of China (grant numbers 51939001, 61976033, 62173046, and 52301418).

Research Process and Approach

Research Design

This study, based on strict-feedback systems, proposes an adaptive composite fixed-time reinforcement learning optimized control strategy to address the issue of nonlinear uncertainties. The main steps of the research process are outlined as follows:

  1. Problem Modeling:
    Nonlinear systems are represented in a strict-feedback structure, with state equations
    \[ \dot{x}_i(t) = x_{i+1}(t) + f_i(\overline{x}_i(t)), \quad i = 1, \ldots, n-1, \qquad \dot{x}_n(t) = u(t) + f_n(\overline{x}_n(t)), \qquad y(t) = x_1(t), \]
    where the system state is \( x \in \mathbb{R}^n \) and \( \overline{x}_i = [x_1, \ldots, x_i]^T \). The authors define the tracking errors \( z_i \) and formulate the problem of achieving error convergence within a fixed time (see the Step 1 sketch following this list).

  2. Constructing the Approximation Model:
    Radial basis function neural networks (RBFNNs) are used to model each uncertain function \( f_i \), with the approximation expressed as
    \[ f(x) = W^{T} S(x) + \epsilon, \]
    where \( W \) is the ideal weight vector to be estimated, \( S(x) \) is the vector of Gaussian basis functions, and the approximation error \( \epsilon \) is bounded (see the Step 2 sketch following this list).

  3. Introducing a Fixed-Time Smooth Estimation System:
    To improve the approximation performance of the neural networks, the authors design a new composite adaptive update rule, combining an adaptive weight-adjustment law \( \dot{\hat{\theta}}_i \) with a tracking-error prediction feedback mechanism. This significantly enhances the stability and precision of the RBFNN weight estimates (see the Step 3 sketch following this list).

  4. Designing the Reinforcement Learning Control Strategy:
    The study adopts a reinforcement learning (RL) critic-actor architecture: the critic approximates the value function that minimizes the Hamilton-Jacobi-Bellman (HJB) equation, while the actor implements the optimized control law. Feedback weight updating and an event-triggered mechanism further balance system performance against computational resources (see the Step 4 sketch following this list).

  5. Analysis of Algorithm Stability:
    Using Lyapunov functions, the authors rigorously prove the stability of the proposed controller and the convergence of the errors, concluding that the errors converge to a controllable region near zero within a fixed time (the underlying fixed-time lemma is recalled in the Step 5 note following this list).

  6. Simulation Verification:
    Finally, the algorithm's effectiveness and practicality are validated through numerical simulations of an intelligent ship autopilot (see the Step 6 sketch following this list).
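
For concreteness, the sketches below illustrate the six steps in simplified, hedged form. All dynamics, basis functions, gains, and control laws in them are placeholder assumptions chosen for illustration, not the paper's exact designs. First, Step 1: a minimal second-order strict-feedback system simulated with Euler integration under a placeholder feedback law.

```python
import numpy as np

# Second-order strict-feedback system (Step 1):
#   x1_dot = x2 + f1(x1),  x2_dot = u + f2(x1, x2),  y = x1.
# The nonlinearities f1, f2 and the PD-style control law are illustrative
# assumptions, not the controller designed in the paper.

def f1(x1):
    return 0.1 * np.sin(x1)            # hypothetical smooth uncertainty

def f2(x1, x2):
    return 0.2 * x1 * np.cos(x2)       # hypothetical smooth uncertainty

def simulate(y_r, T=20.0, dt=1e-3):
    steps = int(T / dt)
    x1, x2 = 0.5, 0.0                  # arbitrary initial state
    out = np.zeros((steps, 2))
    for k in range(steps):
        t = k * dt
        z1 = x1 - y_r(t)               # tracking error z1 = x1 - y_r
        u = -5.0 * z1 - 3.0 * x2       # placeholder stabilizing feedback
        x1 += dt * (x2 + f1(x1))
        x2 += dt * (u + f2(x1, x2))
        out[k] = (x1, y_r(t))
    return out

trace = simulate(lambda t: np.sin(0.5 * t))
print("final tracking error:", trace[-1, 0] - trace[-1, 1])
```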
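
Step 2: a minimal RBFNN evaluator with Gaussian basis functions. The centers, width, and offline least-squares fit are illustrative choices; the paper instead adapts the weights online.

```python
import numpy as np

# RBFNN approximation (Step 2): f(x) = W^T S(x) + eps, with Gaussian basis
#   S_j(x) = exp(-||x - c_j||^2 / (2 * eta^2)).
# The center grid, width eta, and the batch least-squares fit below are
# illustrative assumptions only.

def gaussian_basis(x, centers, eta=1.0):
    d2 = np.sum((centers - x) ** 2, axis=1)   # squared distance to each center
    return np.exp(-d2 / (2.0 * eta ** 2))

centers = np.linspace(-2.0, 2.0, 11).reshape(-1, 1)   # 11 centers on a line

def f_true(x):                                        # unknown function to fit
    return np.sin(x) + 0.3 * x ** 2

# Offline least-squares fit of the output weights W (for illustration only)
X = np.linspace(-2.0, 2.0, 200).reshape(-1, 1)
Phi = np.array([gaussian_basis(x, centers) for x in X])
W, *_ = np.linalg.lstsq(Phi, f_true(X).ravel(), rcond=None)

x0 = np.array([0.7])
print(f"f(0.7) = {f_true(0.7):.4f}, RBFNN estimate = {W @ gaussian_basis(x0, centers):.4f}")
```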
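
Step 3: a rough rendering of the composite idea, in which the weight estimate is driven by both the tracking error and a prediction error from a serial-parallel estimator. The estimator structure and all gains below are assumptions; the paper's fixed-time smooth estimation system is more elaborate than this sketch.

```python
import numpy as np

# Composite adaptive update (Step 3, illustrative): the weight estimate W_hat
# is driven by BOTH the tracking error z and a prediction error e = x - x_hat
# produced by a serial-parallel estimation model. The gains gamma, beta,
# kappa, sigma are placeholder assumptions.

def composite_step(W_hat, x_hat, x, x_next, z, S, dt,
                   gamma=5.0, beta=10.0, kappa=2.0, sigma=0.01):
    e = x - x_hat                                   # prediction error
    # Serial-parallel estimator: x_hat_dot = x_{i+1} + W_hat^T S + beta * e
    x_hat = x_hat + dt * (x_next + W_hat @ S + beta * e)
    # Composite law: tracking-error term + prediction-error term - leakage
    W_hat = W_hat + dt * (gamma * S * (z + kappa * e) - sigma * W_hat)
    return W_hat, x_hat

# One illustrative update with arbitrary signals
W_hat, x_hat = np.zeros(5), 0.0
W_hat, x_hat = composite_step(W_hat, x_hat, x=0.2, x_next=0.1, z=0.05,
                              S=np.ones(5) / 5, dt=1e-3)
```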
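
Step 4: a toy scalar critic-actor loop. The critic weights parameterize the value function and descend the continuous-time Bellman residual, while the actor weights track the critic; the basis, the assumed dynamics, and the quadratic cost are all hypothetical.

```python
import numpy as np

# Critic-actor sketch (Step 4, toy scalar version). The critic weights
# theta_c parameterize the value function V(x) ~ theta_c^T phi(x), with
# phi(x) = [x, x^2, x^3]; the actor weights theta_a shape the control.
# The basis, the dynamics x_dot = -0.5*x + u, and the cost r = x^2 + u^2
# are hypothetical assumptions, not the paper's design.

def phi_grad(x):
    # Gradient of the basis phi(x) = [x, x^2, x^3] with respect to x
    return np.array([1.0, 2.0 * x, 3.0 * x ** 2])

def critic_actor_step(theta_c, theta_a, x, dt, lr_c=1.0, lr_a=0.5):
    u = -0.5 * theta_a @ phi_grad(x)       # actor: u ~ -(1/2) R^{-1} g^T dV/dx
    x_dot = -0.5 * x + u                   # assumed stable affine dynamics
    r = x ** 2 + u ** 2                    # quadratic running cost
    delta = r + theta_c @ phi_grad(x) * x_dot   # continuous-time Bellman residual
    theta_c = theta_c - dt * lr_c * delta * phi_grad(x) * x_dot  # critic descent
    theta_a = theta_a - dt * lr_a * (theta_a - theta_c)          # actor tracks critic
    return theta_c, theta_a, x + dt * x_dot

theta_c, theta_a, x = np.zeros(3), np.zeros(3), 1.0
for _ in range(20000):                     # 20 s of simulated time at dt = 1 ms
    theta_c, theta_a, x = critic_actor_step(theta_c, theta_a, x, dt=1e-3)
print("state:", round(x, 4), "critic weights:", np.round(theta_c, 3))
```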
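
Step 5: the fixed-time conclusion typically rests on a standard practical fixed-time stability lemma. If a Lyapunov function \( V \) satisfies
\[ \dot{V}(x) \le -a V^{p}(x) - b V^{q}(x) + c, \qquad a, b > 0, \; 0 < p < 1 < q, \; c \ge 0, \]
then the trajectories reach a residual set around the origin within a settling time bounded, independently of the initial condition, by
\[ T_s \le \frac{1}{a \vartheta (1 - p)} + \frac{1}{b \vartheta (q - 1)}, \qquad 0 < \vartheta < 1. \]
This is the generic form found in the fixed-time control literature; the paper's exact constants and Lyapunov candidates differ in detail.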
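
Step 6: heading-control studies of this kind commonly use a Norrbin-type nonlinear extension of the Nomoto model. The sketch below simulates that model under a saturated PD rudder law; the ship coefficients and gains are illustrative assumptions, not the paper's simulation settings or its RL-optimized controller.

```python
import numpy as np

# Norrbin-type nonlinear ship heading model (a common autopilot benchmark):
#   T * psi_ddot + psi_dot + alpha * psi_dot^3 = K * delta,
# with heading psi and rudder angle delta. The coefficients K, T_c, alpha
# and the simple saturated PD rudder law are illustrative assumptions.

K, T_c, alpha = 0.5, 30.0, 0.4           # hypothetical ship parameters
dt, steps = 0.05, 12000                   # 600 s of simulated time

psi, r = 0.0, 0.0                         # heading [rad] and yaw rate [rad/s]
for k in range(steps):
    t = k * dt
    psi_ref = np.deg2rad(20.0) * np.sign(np.sin(0.01 * t))  # reference heading
    e = psi - psi_ref
    delta = np.clip(-4.0 * e - 60.0 * r, -0.6, 0.6)  # PD rudder law, saturated
    psi += dt * r
    r += dt * (K * delta - r - alpha * r ** 3) / T_c
print(f"final heading error: {np.rad2deg(psi - psi_ref):.3f} deg")
```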

Highlights of the Methodology

a) A fixed-time smooth estimation system is introduced to fundamentally improve the approximation performance;
b) Robust update rules are designed within the critic-actor architecture, achieving optimal weight learning through fixed-time parameter control;
c) A dual-feedback regulation mechanism is established to avoid the singularity issues that arise when the intermediate (virtual) control laws are differentiated;
d) Potential applications to fixed-time control problems in multi-agent systems are demonstrated.

Major Results and Analysis

Modeling and Optimization Results

The stability analysis of the fixed-time tracking-error dynamics yielded the following key findings:

  • For the performance function \( J(x(0), u(x(0))) \), the authors derive a unique optimized control law \( u^{*}(x) \) from the associated Hamilton-Jacobi-Bellman equation (its standard form is recalled after this list).
  • Lyapunov-based derivations demonstrate that the error variables \( z_i, \chi_i \), etc., converge to a controllable region near the origin within a fixed time \( T_s \).
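
For orientation, with affine dynamics \( \dot{x} = f(x) + g(x)u \) and a quadratic running cost \( c(x, u) = x^{T} Q x + u^{T} R u \) (a standard choice shown for context; the paper's exact performance index may differ), the HJB equation and the resulting optimized control law take the familiar form
\[ 0 = \min_{u} \Big[ c(x, u) + \Big( \frac{\partial J^{*}}{\partial x} \Big)^{T} \big( f(x) + g(x) u \big) \Big], \qquad u^{*}(x) = -\frac{1}{2} R^{-1} g(x)^{T} \frac{\partial J^{*}}{\partial x}. \]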

Simulation Experiments

In the intelligent ship autopilot simulation experiment, the proposed algorithm was applied to control the heading angle of the ship. The results showed:

  • System response curves (e.g., \( x_1(t) \) against the reference trajectory \( y_r(t) \)): the tracking error converges rapidly to a small neighborhood of zero, significantly improving tracking performance.
  • Performance-function convergence: along the optimized trajectory, the cost functions \( c_1 \) and \( c_2 \) converge quickly, indicating efficient resource utilization.

Research Conclusions and Value

Significance of the Study

  1. Theoretical Perspective:
    This study fills a gap in composite adaptive fixed-time optimized control, providing an important reference for nonlinear control theory.

  2. Practical Applications:
    The proposed method shows significant potential in the intelligent shipping domain and could also be applied to multi-agent robot collaboration, autonomous vehicles, and similar areas.

Research Highlights

  • The innovative fixed-time smooth estimation system effectively reduces neural network approximation errors, providing an efficient tool for related fields;
  • The seamless integration of reinforcement learning with the composite control strategy proves reliable and practical in handling nonlinear system uncertainties;
  • The rigorous stability analysis establishes the robustness of the method and delineates its range of applicability.

This study not only holds considerable academic value in the field of intelligent autonomous control but also showcases its potential for impactful real-world engineering applications.