Exploration-based Model Learning with Self-Attention for Risk-Sensitive Robot Control

Overall schematic of the algorithm, which consists of the main loop for robot execution (yellow line) and the computation performed by the agent (red line).

Discussion on Risk-Sensitive Robot Control Based on Self-Attention Mechanism

Research Background

Accurate kinematic and dynamic models are key to completing robot tasks precisely. Most robot control schemes rely on such models for task optimization, scheduling, and priority control. However, computing the dynamic characteristics in traditional models is usually complex and error-prone. Acquiring the models automatically through machine learning and reinforcement learning has therefore become a feasible alternative. When applied directly to actual robot systems, however, this approach carries the risk of drastic motion changes and undesirable behavior.

Research Origin

This paper was written by Dongwook Kim, Sudong Lee, Tae Hwa Hong, and Yong-Lae Park from Seoul National University and École Polytechnique Fédérale de Lausanne. The study was published in the journal npj Robotics in 2023.

Research Content

Research Process

This paper proposes an online model update algorithm that is applied directly to actual robot systems. The algorithm embeds a self-attention mechanism in a neural network that models the kinematics and dynamics of the target system. Its novelty lies in the redundant self-attention path and in establishing a time-independent model. Together these allow anomalies to be detected by calculating the trace value of the self-attention matrix, and they reduce random changes during exploration while the model is being updated.
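As a rough illustration of what "the trace value of the self-attention matrix" refers to, the sketch below computes standard scaled dot-product attention weights over a short window of time steps and sums the diagonal. It is not the paper's network: the function names, the toy feature windows, and the use of raw features as both queries and keys (with no learned projections) are assumptions made for illustration. How the trace is thresholded for anomaly detection is not described in this summary and is left out.

```python
import math

def softmax(row):
    """Numerically stable softmax of one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention_matrix(features):
    """Scaled dot-product attention weights between all pairs of time steps.

    Assumption: the raw feature vectors serve as both queries and keys;
    a trained network would apply learned projections first.
    """
    dim = len(features[0])
    scores = [
        [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in features]
        for query in features
    ]
    return [softmax(row) for row in scores]

def attention_trace(features):
    """Sum of the diagonal of the self-attention matrix."""
    weights = self_attention_matrix(features)
    return sum(weights[i][i] for i in range(len(weights)))

# Toy windows of three time steps (hypothetical data).
distinct = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
repeated = [[1.0, 1.0, 1.0]] * 3

print(attention_trace(distinct))  # close to 3: each step attends mostly to itself
print(attention_trace(repeated))  # about 1: attention is spread evenly over the window
```

With a window of three time steps the trace lies strictly between 0 and 3, so a single scalar summarizes how concentrated the attention is on the diagonal.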

Experimental Process

  1. Introduction to Multi-Step Process:

    1. At the input stage, the kinematic model uses a Radial Basis Function Neural Network (RBFNN) to approximate the kinematics.
    2. Encoder, decoder, and self-attention layers are introduced to extract correlated features across time steps.
    3. The robot’s exploration area is adjusted using the trace value of the self-attention matrix to optimize trajectory control.
    4. The dynamic model captures the relationship between the control input and the robot’s configuration state. It excludes the influence of external forces by detecting interference and discarding the disturbed datasets.
    5. Finally, two self-attention networks robustly shield and adjust the task space and the datasets, and a real-time feedback control law is generated from the result.
  2. Specific Operational Steps:

    1. The robot receives control input and executes a single-step action.
    2. Observe the robot’s state (configuration state and task state).
    3. Calculate kinematic and dynamic equation components.
    4. Prepare the next target task state.
    5. Determine the next control input.
    6. Train the neural network approximation model using the replay buffer.
    7. Adjust the task space and shield datasets through the self-attention network matrix.
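The seven operational steps above can be sketched as a toy loop. Everything concrete here is an assumption made for illustration: the 1-D plant, the closed-form least-squares fit that stands in for the neural network approximation model, and the residual-based rule that stands in for the self-attention-based adjustment of the task space and dataset shielding. Only the order of the steps follows the text.

```python
import random

def run_loop(steps=40, true_gain=2.0, push_step=10, seed=0):
    rng = random.Random(seed)
    x = 0.0          # task state of a hypothetical 1-D plant
    target = 1.0     # desired task state
    gain_hat = 1.0   # learned model: dx ~= gain_hat * u
    limit = 0.1      # exploration limit: largest task-space step allowed
    u = 0.05         # first control input
    buffer = []      # replay buffer of (u, dx) samples
    errors = []      # tracking error after each step
    for t in range(steps):
        # 1. Execute a single step with the current control input.
        push = 0.3 if t == push_step else 0.0  # one external disturbance
        x_next = x + true_gain * u + push + rng.gauss(0.0, 1e-3)
        # 2. Observe the resulting state change.
        dx = x_next - x
        x = x_next
        # 3. Evaluate the model: how far off was its prediction?
        residual = abs(dx - gain_hat * u)
        buffer.append((u, dx))
        errors.append(abs(target - x))
        # 4. Prepare the next target step, clipped to the exploration limit.
        step = max(-limit, min(limit, target - x))
        # 5. Determine the next control input from the learned model.
        u = step / gain_hat
        # 6. Refit the model on the replay buffer (least squares, a stand-in).
        den = sum(ui * ui for ui, _ in buffer)
        if den > 1e-12:
            gain_hat = sum(ui * di for ui, di in buffer) / den
        # 7. Widen or shrink the exploration limit, then drop ("shield")
        #    samples the refitted model cannot explain (assumed rule).
        if residual < 0.01:
            limit = min(0.5, limit * 1.5)
        else:
            limit = max(0.05, limit * 0.5)
        buffer = [(ui, di) for ui, di in buffer if abs(di - gain_hat * ui) < 0.05]
    return gain_hat, errors

gain_hat, errors = run_loop()
print(round(gain_hat, 2), round(errors[-1], 4))
```

In this toy setting the estimated gain should settle near the true value of 2.0, the exploration limit grows while predictions are accurate, and the single disturbed sample is removed from the buffer rather than being allowed to bias later fits.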

Experimental Methods

  1. Kinematic Self-Attention Model: The self-attention layer correlates features from past time steps through the encoding and decoding layers. Reducing the gap between the trace value of the self-attention matrix and that of the identity matrix improves the model’s prediction accuracy. At the same time, anomaly detection is combined with exploration area constraints.

  2. Dynamic Self-Attention Model: The control input is passed to the encoding layer and processed by the self-attention network together with other time-related inputs. The model predicts changes in the configuration state while identifying and ignoring external disturbances.
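One way to read the comparison between the trace value and the identity matrix is sketched below. It assumes the self-attention matrix is the usual row-wise softmax of scaled dot products over a window of T >= 2 time steps; the symbols Q, K, d, and T are assumptions introduced here, not notation taken from the paper.

```latex
A = \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right) \in \mathbb{R}^{T \times T},
\qquad A_{ij} > 0, \qquad \sum_{j=1}^{T} A_{ij} = 1 \quad \text{for every row } i .

\operatorname{tr}(A) = \sum_{i=1}^{T} A_{ii},
\qquad 0 < \operatorname{tr}(A) < T = \operatorname{tr}(I_T) .

T - \operatorname{tr}(A) = \sum_{i=1}^{T} \sum_{j \neq i} A_{ij} \;\ge\; 0 .
```

Under this assumption, the gap T - tr(A) equals the total attention weight placed off the diagonal, so a trace close to T means the matrix is close to the identity.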

Main Results

  1. Verification in a Simulated Environment: The robot completed two tasks (round-trip motion and circular trajectory tracking) in the PyBullet simulation environment. Across both tasks, using a self-attention network to adjust exploration significantly improved tracking accuracy and allowed interference to be detected and handled in time.

  2. Soft Robot Arm Trajectory Tracking: A soft robot arm with three-dimensional control capability was built and tested on hardware. The arm successfully tracked an 'S'-shaped curve. The exploration area constraint value was reduced dynamically, which effectively expanded the motion range and reduced errors.

  3. Autonomous Operation of an Industrial Robot: A UR5e industrial robot was used to play the piano. During the process, the self-attention mechanism gradually expanded the task space. After 25 experimental cycles, the robot had learned to play complex musical pieces successfully.

  4. Quadruped Robot Gait Control: A quadruped robot reproduced reference trajectories without using simulation and achieved stable walking. Enlarging the exploration domain around the reference trajectory increased its motion range.

Research Conclusion

Applying the self-attention-based model update algorithm to actual robot systems significantly improved the control accuracy of complex task states and reduced the effect of external disturbances, verifying the algorithm’s effectiveness across a wide range of robot applications.

Research Highlights

  1. Applying the self-attention mechanism directly to real-time robot control enables rapid detection and adjustment by the model.
  2. The method does not rely on a simulation environment or prior model knowledge, which supports generalization and efficient use of data.
  3. Both dynamics and kinematics are considered, greatly reducing the risk of errors in actual operation.

With the continuous optimization of the algorithm, it is expected that similar methods can be further used in robot operation control for more complex and high-risk tasks.