Effective Probabilistic Neural Networks Model for Model-Based Reinforcement Learning in USV

A New Approach to Model Predictive Control for Unmanned Surface Vehicles (USV): A Probabilistic Neural Network-Based MBRL Framework

Academic Background

Unmanned Surface Vehicles (USVs) have seen rapid development in recent years within the field of marine science, finding extensive applications in scenarios such as marine transportation, environmental monitoring, and disaster rescue. However, USV control systems still face numerous challenges, particularly in handling external disturbances in complex marine environments. Traditional Model-Free Reinforcement Learning (MFRL) methods, while effective in certain tasks, rely heavily on large amounts of data and simulated training, and lack robustness in uncertain environments. To address these issues, Model-Based Reinforcement Learning (MBRL) methods have emerged. By simultaneously learning environment models and optimizing control policies, MBRL can more efficiently respond to external disturbances.

However, most mainstream MBRL methods are based on Gaussian Process (GP) models, whose computational cost grows rapidly with sample size (cubically for exact GP inference), limiting their application in complex scenarios. To overcome this limitation, this paper proposes a novel Probabilistic Neural Networks Model Predictive Control (PNMPC) method, which models USV dynamics from a probabilistic perspective using neural networks while reducing computational complexity and enhancing control performance.

Source of the Paper

This paper is co-authored by Wenjun Huang, Yunduan Cui, Huiyun Li, and Xinyu Wu, who are affiliated with the University of Chinese Academy of Sciences and the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. Published in IEEE Transactions on Automation Science and Engineering, the paper was formally released in 2025. The research was supported by the National Natural Science Foundation of China and the Shenzhen R&D Foundation.

Research Process

1. Problem Definition and Model Construction

The research goal of this paper is to design an MBRL framework capable of efficiently controlling USVs in complex marine environments. First, the authors model USV dynamics as a Markov Decision Process (MDP). The state space of the USV includes variables such as position, heading, speed, rudder angle, and throttle, while the action space consists of control commands for the rudder angle and throttle. Through this modeling approach, the authors can better capture the dynamic behavior of USVs under external disturbances.
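As a rough illustration of this state and action layout, the sketch below packs the listed variables into vectors that a learned dynamics model could consume. The field names, units, ordering, and the `model_input` helper are assumptions made for this example; the paper's exact state definition may differ.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class USVState:
    """Hypothetical state layout based on the variables named in the text."""
    x: float         # position, first coordinate (assumed metres)
    y: float         # position, second coordinate (assumed metres)
    heading: float   # heading angle (assumed radians)
    speed: float     # forward speed
    rudder: float    # current rudder angle
    throttle: float  # current throttle setting

    def to_vector(self) -> np.ndarray:
        return np.array(
            [self.x, self.y, self.heading, self.speed, self.rudder, self.throttle]
        )


@dataclass
class USVAction:
    """Hypothetical action layout: commands for rudder angle and throttle."""
    rudder_cmd: float
    throttle_cmd: float

    def to_vector(self) -> np.ndarray:
        return np.array([self.rudder_cmd, self.throttle_cmd])


def model_input(state: USVState, action: USVAction) -> np.ndarray:
    """Concatenate state and action into one input vector for a dynamics model."""
    return np.concatenate([state.to_vector(), action.to_vector()])
```

With this layout, a dynamics model maps an 8-dimensional input (6 state variables plus 2 commands) to a prediction of the next 6-dimensional state.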

2. Design of the Probabilistic Neural Network Model

To address the high computational complexity of GP models, the authors propose a probabilistic neural network model. This model captures uncertainties in USV dynamics through random dropout and neural network ensembles. Specifically, by using multiple independent neural networks and random dropout units, the model predicts the next state of the USV from a probabilistic perspective. To improve prediction accuracy, the model incorporates a continuous two-step dynamic loss function during training, which helps better capture the temporal dynamics of USVs.
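The following minimal NumPy sketch shows one way an ensemble with dropout can yield a predictive distribution, and one plausible reading of a two-step loss. It is not the authors' implementation: the network size, the dropout rate, the choice to predict a state change, and the exact loss form are all assumptions, and the networks here are untrained.

```python
import numpy as np

STATE_DIM, ACTION_DIM, HIDDEN = 6, 2, 32  # assumed sizes, for illustration only


def init_mlp(rng):
    """Randomly initialise one small two-layer network (untrained)."""
    return {
        "W1": rng.normal(0.0, 0.3, (STATE_DIM + ACTION_DIM, HIDDEN)),
        "b1": np.zeros(HIDDEN),
        "W2": rng.normal(0.0, 0.3, (HIDDEN, STATE_DIM)),
        "b2": np.zeros(STATE_DIM),
    }


def forward(params, state, action, rng, p_drop=0.1):
    """One stochastic pass: dropout stays active, so repeated calls differ."""
    x = np.concatenate([state, action])
    h = np.tanh(x @ params["W1"] + params["b1"])
    keep = rng.random(HIDDEN) >= p_drop       # random dropout mask
    h = h * keep / (1.0 - p_drop)             # inverted-dropout rescaling
    # Assumption: the network predicts a change that is added to the state.
    return state + h @ params["W2"] + params["b2"]


def predict_distribution(ensemble, state, action, rng, n_passes=10):
    """Empirical mean and variance over ensemble members and dropout masks."""
    samples = np.array(
        [forward(p, state, action, rng) for p in ensemble for _ in range(n_passes)]
    )
    return samples.mean(axis=0), samples.var(axis=0)


def two_step_loss(params, s0, a0, s1, a1, s2, rng):
    """Assumed form: one-step error plus the error after feeding the model's
    own prediction back in for a second step."""
    pred1 = forward(params, s0, a0, rng)
    pred2 = forward(params, pred1, a1, rng)
    return float(np.mean((pred1 - s1) ** 2) + np.mean((pred2 - s2) ** 2))
```

The spread of `samples` reflects both disagreement between ensemble members and variation across dropout masks. Penalising the second step pushes the model to stay accurate when it is rolled forward on its own outputs, which is the situation it faces during multi-step planning.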

3. Model Predictive Control Strategy

Based on the probabilistic neural network model, the authors design a Model Predictive Control (MPC) strategy. This strategy optimizes a sequence of actions to maximize future rewards while accounting for uncertainties in USV dynamics. Unlike traditional GP-MPC methods, PNMPC propagates uncertainties through neural network ensembles and random dropout units, avoiding excessive error amplification in multi-step predictions.
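To make the planning loop concrete, here is a hedged sketch of MPC with particle-based uncertainty propagation. The random-shooting optimizer, the 2-D toy dynamics, and the distance-based reward are stand-ins chosen for brevity; the summary does not specify the optimizer or reward used in PNMPC, and `toy_stochastic_model` merely plays the role of one stochastic pass through a learned model.

```python
import numpy as np


def toy_stochastic_model(states, actions, rng):
    """Hypothetical stand-in for a learned stochastic model.
    states: (P, 2) positions; actions: (P, 2) velocity-like commands."""
    noise = rng.normal(0.0, 0.05, size=states.shape)
    return states + 0.5 * actions + noise


def reward(states, target):
    """Negative distance to the target, one value per particle."""
    return -np.linalg.norm(states - target, axis=-1)


def plan(state, target, model, rng, horizon=5, n_candidates=200, n_particles=10):
    """Random-shooting MPC: score each candidate action sequence by the
    average return of several particles rolled through the stochastic model."""
    best_return, best_action = -np.inf, None
    for _ in range(n_candidates):
        seq = rng.uniform(-1.0, 1.0, size=(horizon, state.shape[0]))
        particles = np.tile(state, (n_particles, 1))
        total = 0.0
        for a in seq:
            # Each particle draws its own noise, so uncertainty spreads
            # through the horizon instead of being collapsed at each step.
            particles = model(particles, np.tile(a, (n_particles, 1)), rng)
            total += reward(particles, target).mean()
        if total > best_return:
            best_return, best_action = total, seq[0]
    return best_action  # only the first action is executed, then replan
```

In a receding-horizon loop, `plan` would be called at every control step with the latest observed state, and only the returned first action would be sent to the vehicle.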

4. Experiments and Evaluation

To validate the effectiveness of PNMPC, the authors conducted experiments in a simulation environment driven by real USV data, covering position-keeping and multi-target tracking tasks. The experiments simulated complex marine environments with three levels of external disturbances. The results show that PNMPC significantly outperforms traditional GP-based methods in both model accuracy and control performance, and its computational complexity is independent of sample size, making it suitable for large-scale applications.

Key Results

1. Model Learning and Prediction Accuracy

Experimental results demonstrate that PNMPC exhibits higher accuracy and lower prediction error variance when forecasting the next state of USVs. Compared to traditional GP models and existing neural network methods, PNMPC better captures the dynamic features of USVs, especially under strong disturbances. Furthermore, PNMPC’s model prediction error decreases significantly with increasing sample size, indicating strong generalization capabilities.

2. Control Performance

In the position-keeping task, PNMPC achieves lower average position offsets and higher task success rates compared to other baseline methods. In the multi-target tracking task, PNMPC also shows significant advantages in tracking distance and task completion rates. Even under the highest level of disturbance, PNMPC maintains high control stability, while the performance of other methods deteriorates rapidly.

3. Computational Efficiency

Since PNMPC’s computational complexity is independent of sample size, it can operate efficiently on large-scale datasets. In contrast, the optimization time of GP-based methods grows significantly as the dataset grows, making them unsuitable for real-time control.
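A back-of-the-envelope operation count illustrates the contrast. The figures below use standard textbook estimates for exact GP inference (a Cholesky factorisation of the n-by-n kernel matrix, and a triangular solve per predictive variance) and a plain multiply-add count for a fully connected network. They are approximations for intuition, not measurements from the paper, and the layer sizes are assumed.

```python
def gp_cost(n_samples):
    """Approximate multiply-adds for exact GP regression on n training points."""
    return {
        "fit": n_samples ** 3 // 3,        # Cholesky of the n x n kernel matrix
        "predict_var": n_samples ** 2,     # one triangular solve per query
    }


def mlp_predict_cost(layer_sizes):
    """Multiply-adds for one forward pass; depends only on network size,
    not on how many samples the network was trained on."""
    return sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))
```

For example, doubling the training set from 1,000 to 2,000 samples quadruples the GP's per-query variance cost and multiplies the fitting cost by eight, while `mlp_predict_cost([8, 64, 64, 6])` stays at 4,992 multiply-adds per pass regardless of dataset size.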

Conclusion

The PNMPC method proposed in this paper effectively addresses the computational complexity and robustness issues of traditional MBRL methods in USV control by combining probabilistic neural networks with model predictive control strategies. Experimental results show that PNMPC significantly outperforms existing methods in model accuracy, control performance, and computational efficiency, providing an efficient solution for USV control in complex marine environments.

Research Highlights

  1. Innovative Probabilistic Neural Network Model: Through random dropout and neural network ensembles, PNMPC efficiently captures USV dynamics from a probabilistic perspective, avoiding the high computational complexity of traditional GP models.
  2. Efficient Uncertainty Propagation Mechanism: PNMPC combines the strengths of Deep PILCO and PETS, proposing a novel uncertainty propagation mechanism that enhances the stability of multi-step predictions.
  3. Sample-Size-Independent Computational Complexity: PNMPC’s computational complexity is independent of sample size, making it suitable for large-scale applications.
  4. Robust Control Performance: Under strong disturbances, PNMPC demonstrates significant control advantages and generalization capabilities.

Significance and Value

The introduction of PNMPC not only provides new theories and methods for the field of USV control but also has broad application prospects. Its efficient modeling and optimization capabilities can be extended to the control of other unmanned systems (e.g., drones, unmanned vehicles), offering technical support for autonomous control in complex environments. Additionally, PNMPC’s sample-size-independent computational complexity opens possibilities for applications with large datasets, holding significant engineering and practical importance.