Real-World Humanoid Locomotion with Reinforcement Learning

Background Introduction

Humanoid robots have enormous potential for autonomous operation in diverse environments: they could alleviate labor shortages in factories, assist elderly people at home, and explore other planets. Although classical model-based controllers perform well in certain scenarios, adapting and generalizing them to new environments remains a major challenge. To address this, the paper proposes a purely learning-based method for real-world humanoid locomotion control.

Research Motivation

Classical control methods have achieved stable and robust locomotion, but their adaptability and generality are limited. Learning-based methods, which can learn from diverse simulated or real environments, have therefore attracted growing attention. This paper uses reinforcement learning to train a Transformer-based controller for humanoid locomotion in complex environments.

Authors and Publication Information

This paper was collaboratively completed by Ilija Radosavovic, Tete Xiao, Bike Zhang, Trevor Darrell, Jitendra Malik, and Koushil Sreenath, all from the University of California, Berkeley. The study was published in Science Robotics on April 17, 2024.

Workflow

Research Process

The study proceeds in two main phases:

  1. Large-scale training in simulated environments:

    • First conducted large-scale, model-free reinforcement learning in simulation. The training environments were heavily randomized (terrain, dynamics, and external perturbations) so that the policy adapts to disturbances at deployment.
    • Used a causal Transformer that extracts information from the history of observations and actions and predicts the next action (a minimal sketch follows this list).
  2. Initial deployment in the real world:

    • Deployed policies trained entirely in simulation directly to the real world, zero-shot, without any fine-tuning of model parameters.
    • Deployment environments included various outdoor terrains such as sidewalks, tracks, and grass.
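
The following is a minimal PyTorch sketch of such a history-conditioned causal-Transformer policy. It is not the authors' code: the module name, layer sizes, context length, and the simple per-timestep (observation, action) tokenization are illustrative assumptions.

```python
# Minimal sketch of a causal-Transformer locomotion policy (illustrative, not
# the authors' implementation). All dimensions and hyperparameters are placeholders.
import torch
import torch.nn as nn

class CausalTransformerPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, d_model=192, n_heads=4, n_layers=4, ctx_len=16):
        super().__init__()
        # One token per timestep, embedded from its (observation, action) pair.
        self.embed = nn.Linear(obs_dim + act_dim, d_model)
        self.pos = nn.Parameter(torch.zeros(ctx_len, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, act_dim)  # predicts the next action

    def forward(self, obs_hist, act_hist):
        # obs_hist: (B, T, obs_dim); act_hist: (B, T, act_dim); T <= ctx_len
        x = self.embed(torch.cat([obs_hist, act_hist], dim=-1))
        x = x + self.pos[: x.size(1)]
        # Causal mask: step t may attend only to steps <= t.
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        x = self.encoder(x, mask=mask)
        return self.head(x[:, -1])  # action for the current timestep

# Usage with placeholder dimensions:
policy = CausalTransformerPolicy(obs_dim=47, act_dim=20)
action = policy(torch.randn(1, 16, 47), torch.randn(1, 16, 20))
```

The causal mask is what enables in-context adaptation: the action at each step can depend on the entire preceding observation-action history, but never on the future.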

Experimentation and Testing

  1. Outdoor environment deployment:

    • Conducted tests in everyday environments such as plazas, sidewalks, and grass. The controller remained stable with no falls, even without safety supports.
  2. Indoor experiments:

    • Performed controlled experiments covering external forces, varied terrains, and different loads. The robot maintained balance under external disturbances and on complex terrain, and could carry objects of different weights and shapes.
  3. Simulation comparison:

    • Conducted comparative tests against state-of-the-art baseline controllers. The new controller performed well on slopes, steps, and unstable surfaces, and in some scenarios exhibited recovery capabilities superior to existing methods.

Main Results

  1. Outdoor test results:

    • The robot was able to walk on surfaces of different materials and conditions, such as dry and wet concrete, sidewalks, and grass.
    • During a week-long, round-the-clock (24/7) test, the robot did not fall.
  2. Indoor experiment results:

    • Verified the controller’s stability against sudden external forces by simulating various disturbances such as pushes and collisions.
    • Set up various types of rough surfaces in the laboratory, and the robot successfully adapted and walked through.
    • The robot could carry different types of loads and adjust its posture to maintain balance.
  3. Real-time commands and natural walking:

    • The controller accurately tracked real-time velocity commands and supported omnidirectional walking (a common formulation of such command tracking is sketched after this list).
    • In experiments, the robot displayed human-like walking characteristics, such as arm swinging, which further reduced energy consumption.
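
As context for the command-tracking result, here is how velocity-command tracking is commonly expressed as a reward term in legged-robot reinforcement learning. This is a generic formulation with assumed names and an assumed tolerance parameter, not the paper's exact reward.

```python
# Generic velocity-command tracking reward (an assumed formulation, not the
# paper's exact reward). The command and measured base velocity are
# (vx, vy, yaw_rate); the reward peaks when they match.
import numpy as np

def tracking_reward(cmd_vel, base_vel, sigma=0.25):
    # sigma (assumed value) controls how sharply mismatches are penalized.
    err = np.sum((cmd_vel - base_vel) ** 2)
    return float(np.exp(-err / sigma))

# Omnidirectional example: a sideways command, tracked almost exactly.
print(tracking_reward(np.array([0.0, 0.3, 0.0]), np.array([0.02, 0.28, 0.0])))
```

Feeding the command into the observation and rewarding agreement in this way is what lets a single policy walk in any commanded direction.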

Conclusion and Significance

The experimental results show that a simple, general learning-based controller is feasible for complex, high-dimensional humanoid robot control in real-world environments. The main contributions include:

  1. Adaptability and Robustness:

    • The controller performs stably in previously unseen environments, adapting to different terrains and external disturbances.
  2. Behavioral Performance:

    • The controller demonstrated natural walking behavior, including adjusting its gait in response to commands and quickly adapting to sudden obstacles.
  3. Scientific and Application Value:

    • This research not only provides a new perspective on robot control theory but also supports diverse tasks for humanoid robots in practical applications.

Method Innovations and Results Support

The methods in this paper achieve success mainly through the following innovations:

  1. Causal Transformer Model:

    • A causal Transformer extracts information from the observation-action history, enabling adaptation to different environments and dynamic adjustment of behavior.
  2. Large-scale Simulation Training:

    • Training in massively randomized simulated environments ensured high adaptability and robustness of the model.
  3. Combination of Imitation Learning and Reinforcement Learning:

    • Jointly optimized teacher imitation and reinforcement learning, improving training efficiency and model performance (a minimal sketch of such a joint objective follows this list).
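
Below is a minimal sketch of what such a joint objective can look like, assuming a Gaussian student policy, a PPO-style RL term, and a mean-squared imitation term toward a teacher's actions. The weighting, clip value, and the PPO choice itself are illustrative assumptions; the paper's exact objective may differ.

```python
# Sketch of a joint RL + teacher-imitation loss (assumed formulation). `dist`
# is a torch.distributions.Normal over actions from the student policy.
import torch
import torch.nn.functional as F

def joint_loss(dist, actions, old_logp, adv, teacher_actions,
               clip=0.2, imitation_weight=0.5):
    # RL term: PPO-style clipped surrogate on the sampled actions.
    logp = dist.log_prob(actions).sum(-1)
    ratio = torch.exp(logp - old_logp)
    rl_loss = -torch.min(
        ratio * adv, torch.clamp(ratio, 1 - clip, 1 + clip) * adv
    ).mean()
    # Imitation term: pull the student's mean action toward the teacher's.
    imitation_loss = F.mse_loss(dist.mean, teacher_actions)
    return rl_loss + imitation_weight * imitation_loss
```

In combinations like this, the imitation term provides dense supervision early in training, while the RL term allows the student to improve beyond the teacher where pure imitation would plateau.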

Future Prospects

Although the method shows strong adaptability and robustness, limitations remain, such as stability under extreme external disturbances. Future work could improve the model's adaptability to such extreme conditions and explore further applications of Transformer models to control.

Summary

This research achieved effective locomotion control of humanoid robots in real-world environments through learning-based methods, providing new theoretical perspectives and practical insights. Future research is expected to further optimize and extend such methods.