Reinforcement Learned Multiagent Cooperative Navigation in Hybrid Environment with Relational Graph Learning

2025-02-05 Wed
multiagent system, reinforcement learning, relational graph learning, hybrid environment, cooperative navigation, decentralized execution, collision avoidance
Multi-agent Cooperative Navigation in Hybrid Environments: A New Reinforcement Learning Approach Based on Relational Graph Learning
Mobile robotics is witnessing a surge in applications, fueled by advancements in artificial intelligence, with navigation capabilities being one of the core focus areas of research. Traditional navigation methods often face challenges such as algorithmic complexity, computational resource requirements, and a lack of model generalizability when addressing tasks in dynamic environments, obstacle avoidance, and multi-robot collaboration. To address these issues, a research team from Central South University and Zhejiang University of Technology has proposed a novel approach based on Graph Attention Networks (GAT), called GAR-CoNav, offering a new solution for the Multi-Robot Cooperative Navigation Problem (MCNP) in hybrid environments. This method, published in the IEEE Transactions on Artificial Intelligence, not only introduces a novel model but also demonstrates its superiority through extensive simulations in highly complex hybrid environments.
Background and Research Significance
With the maturation of artificial intelligence and robotics technologies, there is an increasing demand for multiple robots to collaborate and complete complex tasks in dynamic hybrid environments. The goal of MCNP is to develop strategies for multiple robots to cooperate, avoid obstacles, and efficiently navigate to their respective destinations. Solving this problem has direct implications for enhancing automation in industries like manufacturing and logistics and driving technological advancements in fields such as intelligent transportation, public safety, and building inspection.
Currently, solutions to MCNP primarily fall into centralized and decentralized methods. Centralized methods rely on global environmental observations but are computationally expensive and lack scalability. Decentralized methods prioritize autonomy but face issues such as environmental non-stationarity, which limits their collaboration efficiency and reliability. Additionally, existing methods struggle with flexibility in adapting to dynamic obstacles and multi-goal complex environments. Traditional approaches often assume static or pre-assigned goals and cannot reassign tasks dynamically.
Addressing these challenges, the authors developed the GAR-CoNav model, which combines a centralized training and decentralized execution (CTDE) framework with Graph Attention Networks and reinforcement learning. This approach delivers a scalable solution capable of achieving collaborative multi-goal navigation.
Authors and Source
The paper is co-authored by Wen Ou, Biao Luo, Xiaodong Xu, Yu Feng, and Yuqian Zhao, with Biao Luo and Yu Feng being Senior Members of IEEE. The team members are affiliated with the School of Automation, Central South University, and the Information Engineering College, Zhejiang University of Technology. The paper was published online in August 2024 and featured in the January 2025 issue of IEEE Transactions on Artificial Intelligence.
Methods and Research Process
Research Framework and Problem Description
The authors modeled MCNP as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) and proposed a hybrid environment global representation method by combining Velocity Obstacle (VO) encoding with graph structures. The research process comprised the following components:
1. Graph Representation and Connection Rules:

The hybrid environment is represented as a graph where nodes include robots, static obstacles, dynamic obstacles, and destinations, while edges represent their interactions. The authors defined the following connection rules:
- Robot nodes are influenced by all other nodes.
- Dynamic obstacle nodes are only influenced by other obstacle nodes (both static and dynamic).
- Static obstacle and destination nodes are not influenced by any other nodes.
This design ensures that the graph structure aligns with the dynamic interactions in a hybrid environment and prevents robots from adopting extremely aggressive behaviors toward dynamic obstacles.
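The three connection rules above can be turned into an adjacency matrix fairly directly. The following is a minimal sketch, not the authors' implementation: the node-type labels, the function names `build_adjacency` and `scenario_nodes`, the convention that `adj[i, j] = 1` means "node i is influenced by node j", and the absence of self-loops are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical node-type labels; the paper's own encoding may differ.
ROBOT, STATIC, DYNAMIC, GOAL = "robot", "static", "dynamic", "goal"


def build_adjacency(node_types):
    """Return adj where adj[i, j] = 1 means node i is influenced by node j."""
    n = len(node_types)
    adj = np.zeros((n, n), dtype=np.int8)
    for i, ti in enumerate(node_types):
        for j, tj in enumerate(node_types):
            if i == j:
                continue  # assumption: no self-loops at this stage
            if ti == ROBOT:
                adj[i, j] = 1  # rule 1: robots are influenced by all other nodes
            elif ti == DYNAMIC and tj in (STATIC, DYNAMIC):
                adj[i, j] = 1  # rule 2: dynamic obstacles see only obstacles
            # rule 3: static obstacles and destinations keep all-zero rows
    return adj


def scenario_nodes(n_robots, n_static, n_dynamic, n_goals):
    """Order nodes as robots, static obstacles, dynamic obstacles, destinations."""
    return ([ROBOT] * n_robots + [STATIC] * n_static
            + [DYNAMIC] * n_dynamic + [GOAL] * n_goals)


# A scenario with 3 robots, 4 static obstacles, 3 dynamic obstacles, 3 destinations.
types = scenario_nodes(3, 4, 3, 3)
adj = build_adjacency(types)
print(adj.shape, adj.sum(axis=1))
```

Because dynamic obstacle rows never contain a robot column, a dynamic obstacle's representation does not depend on robot states, which is one way to read the rule that keeps robots from behaving aggressively toward dynamic obstacles.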
2. Feature Encoding and Representations:

Each node’s features are encoded into specific attribute vectors. For example, the encoding for robot nodes includes position, velocity, radius, and orientation information. For obstacles and destinations, VO cone boundary vectors are introduced to capture potential collision information during navigation. These features are concatenated into sparse matrices and input into a graph neural network alongside adjacency matrices.
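To make the idea of VO cone boundary vectors concrete, here is a small sketch of the classic velocity obstacle construction for two discs in the plane. It is an illustration under stated assumptions rather than the paper's exact encoding: the function names `vo_cone` and `in_cone`, the dictionary layout, and the choice to return `None` for overlapping discs are all hypothetical.

```python
import math


def vo_cone(p_robot, r_robot, p_obs, r_obs, v_obs=(0.0, 0.0)):
    """Return apex and unit boundary vectors of the velocity obstacle cone.

    The cone's apex sits at the obstacle velocity; its axis points from the
    robot to the obstacle; its half-angle is asin((r_robot + r_obs) / distance).
    """
    dx, dy = p_obs[0] - p_robot[0], p_obs[1] - p_robot[1]
    dist = math.hypot(dx, dy)
    combined = r_robot + r_obs
    if dist <= combined:
        return None  # discs already overlap, so the cone is undefined
    half_angle = math.asin(combined / dist)
    center = math.atan2(dy, dx)
    left = (math.cos(center + half_angle), math.sin(center + half_angle))
    right = (math.cos(center - half_angle), math.sin(center - half_angle))
    return {"apex": tuple(v_obs), "left": left, "right": right,
            "half_angle": half_angle}


def in_cone(v_robot, cone):
    """True if the robot velocity lies strictly inside the cone (collision course)."""
    rx, ry = v_robot[0] - cone["apex"][0], v_robot[1] - cone["apex"][1]
    lx, ly = cone["left"]
    gx, gy = cone["right"]
    cross_left = lx * ry - ly * rx    # negative: clockwise of the left boundary
    cross_right = gx * ry - gy * rx   # positive: counter-clockwise of the right
    return cross_left < 0 and cross_right > 0


cone = vo_cone((0.0, 0.0), 0.5, (2.0, 0.0), 0.5)
print(cone["half_angle"], in_cone((1.0, 0.0), cone), in_cone((0.0, 1.0), cone))
```

In a feature vector, the two boundary vectors (and possibly the apex) would sit alongside position, velocity, and radius; how the paper orders or normalizes these entries is not stated here.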
3. Reinforcement Learning Algorithms and Reward Design:

The reinforcement learning component employs a centralized training and decentralized execution (CTDE) framework based on GAT. Within this framework:
- Graph Attention Network (GAT): Explores dynamic interaction weights between nodes, aggregating information from related nodes to generate new state representations.
- Bidirectional Gated Recurrent Unit (Bi-GRU): Processes obstacle features to capture sequential dynamic variations in the environment.
- Reward Function Design: The reward design includes shared and individual rewards, punishing collision behavior while encouraging cooperative destination achievements. This reward mechanism overcomes the traditional focus on optimizing single-destination distances, enabling safer and more cooperative navigation strategies.
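The attention-based aggregation in the first bullet can be sketched with a single-head graph attention layer in the standard GAT formulation, where an adjacency mask restricts which nodes each node may attend to. This is a NumPy illustration only: the function name `gat_layer`, the added self-loops, the random weights, and the toy three-node graph are assumptions, and the Bi-GRU branch and the output non-linearity are omitted.

```python
import numpy as np


def gat_layer(h, adj, W, a, slope=0.2):
    """Single-head graph attention.

    h:   (N, F_in) node features
    adj: (N, N), adj[i, j] = 1 if node i may attend to node j
    W:   (F_in, F_out) shared linear map
    a:   (2 * F_out,) attention vector
    """
    z = h @ W                                   # transformed features, (N, F_out)
    f_out = z.shape[1]
    # a^T [z_i || z_j] splits into a source term plus a destination term.
    e = (z @ a[:f_out])[:, None] + (z @ a[f_out:])[None, :]
    e = np.where(e > 0, e, slope * e)           # LeakyReLU
    mask = (adj + np.eye(len(h))) > 0           # assumption: add self-loops
    e = np.where(mask, e, -np.inf)              # block disallowed edges
    e = e - e.max(axis=1, keepdims=True)        # numerically stable softmax
    w = np.exp(e)
    alpha = w / w.sum(axis=1, keepdims=True)    # attention weights per row
    return alpha @ z, alpha


rng = np.random.default_rng(0)
h = rng.normal(size=(3, 5))                     # toy nodes: robot, dynamic, static
adj = np.array([[0, 1, 1],                      # robot attends to both obstacles
                [0, 0, 1],                      # dynamic obstacle attends to static
                [0, 0, 0]])                     # static obstacle attends to nothing
W = rng.normal(size=(5, 4))
a = rng.normal(size=(8,))
out, alpha = gat_layer(h, adj, W, a)
print(alpha.round(3))
```

With the self-loop assumption, a node that has no incoming influence (the static obstacle here) simply keeps its own transformed features, while the robot row spreads its attention across every node it is connected to.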
Experiments and Research Results
The paper validates the effectiveness of the GAR-CoNav model through various experiments conducted in complex simulated environments. The main tasks include obstacle avoidance, traversal, and coverage.
1. Simulation Environment and Experimental Setup
The simulated environments feature hybrid obstacles (static and dynamic) and multi-goal configurations, with robots allowed only local observations. A typical scenario is described in the format (3, 4, 3, 3): 3 robots, 4 static obstacles, 3 dynamic obstacles, and 3 destinations.
2. Obstacle Avoidance Performance
In obstacle avoidance experiments, GAR-CoNav demonstrated significant performance improvements. In both static and dynamic obstacle environments, the model showed lower Collision Rates (Rc) and Danger Rates (Rd), with smoother navigation paths. Compared to baselines such as Non-Holonomic ORCA (NH-ORCA) and RL-RVO, GAR-CoNav balanced path efficiency with higher safety.
3. Cooperative Navigation Performance
In both traversal and coverage tasks (ensuring all targets are reached by robots), GAR-CoNav exhibited superior performance:
- Autonomous task allocation significantly improved Success Rates (Rs).
- Dynamic, real-time target reassignment maximized overall path optimization.
Trajectories in certain complex scenarios demonstrated the cooperative nature of the model. For example, in a highly complex obstacle layout, robots autonomously avoided resource conflicts and prioritized global utility, yielding optimized solutions for complex multi-goal navigation problems.
Research Significance and Contributions
Scientific Value:

GAR-CoNav addresses the limitations of conventional centralized or decentralized navigation systems in dynamic obstacle avoidance, task allocation flexibility, and multi-agent collaboration. It also validates the feasibility of applying reinforcement learning and graph neural networks to MCNP in complex dynamic environments.
Practical Value:

The research results can be directly applied to various real-world robotic navigation tasks, such as warehouse logistics optimization, collaborative aerial traffic management for drones, and multi-agent behavior planning in urban traffic optimization.
Innovative Highlights:

a) Integration of VO encoding into graph networks for dynamic environment modeling.

b) Attention mechanism-based methods for target assignment and information aggregation.

c) Comprehensive reward function design emphasizing both local obstacle avoidance and global cooperation.
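As a rough illustration of highlight (c), a per-step reward that mixes an individual collision penalty with a shared team term might be structured as below. This is a sketch under loud assumptions: the function name `step_rewards`, the weights `w_collision` and `w_team`, and the choice of "fraction of destinations covered" as the shared signal are all hypothetical and are not taken from the paper.

```python
def step_rewards(collided, goals_covered, n_goals,
                 w_collision=-1.0, w_team=1.0):
    """Return one reward per robot for a single time step.

    collided:      list of bools, one per robot (True if that robot collided)
    goals_covered: number of destinations currently occupied by some robot
    n_goals:       total number of destinations
    """
    # Shared term: identical for every robot, rewards team-level coverage
    # rather than any single robot's distance to a single destination.
    shared = w_team * goals_covered / n_goals
    rewards = []
    for hit in collided:
        individual = w_collision if hit else 0.0  # local safety penalty
        rewards.append(individual + shared)
    return rewards


rewards = step_rewards([False, True], goals_covered=1, n_goals=2)
print(rewards)
```

The point of the split is that the shared term pushes robots toward covering all destinations collectively, while the individual term keeps each robot accountable for its own collisions.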
Conclusion and Future Outlook
Compared to traditional methods or existing RL-RVO, GAR-CoNav demonstrates superior performance and stability in hybrid obstacle environments and multi-goal collaborative tasks. This paper provides a new framework to tackle complex cooperative navigation problems in dynamic environments, with implications for both scientific research and engineering applications. However, the authors acknowledge the need for further exploration to improve GAR-CoNav’s efficiency in single-target scenarios and address the “Sim-to-Real” challenge of transferring simulation results to real-world applications. Future work will focus on enhancing task efficiency and addressing real-world adaptation challenges to expand the scope of practical applications.