Reinforcement Learning for Stable Bipedal Robot Locomotion

A new two-stage AI system combines physics-driven trajectory planning with adaptive reinforcement learning, enabling robots to walk smoothly, remain upright in the face of shocks, and navigate real-world uncertainty.


In an article published in the journal Scientific Reports, researchers presented a deep reinforcement learning (DRL) system for bipedal robot walking. They integrated an optimization-based trajectory planning stage with a DRL controller to generate optimal joint torques.

The goal was stable, periodic, and efficient locomotion. The trained robot demonstrated strong balance, handled mass and length variations, and rejected disturbances, improving robustness for real-world applications.

Limitations of Current Controllers

The growing demand for service robots underscores the need for robust bipedal locomotion, a challenging area due to nonlinear dynamics and external disturbances. Traditional control methods, such as zero moment point (ZMP) control and inverse dynamics, require precise models and lack adaptability. While reinforcement learning (RL) offers model-free alternatives, it has historically struggled with high-dimensional state and action spaces.

DRL later addressed this issue, with algorithms such as deep deterministic policy gradient (DDPG) enabling control in continuous action spaces. However, many studies focus solely on either control or gait generation rather than treating the two together. This paper fills that gap by integrating a deep learning-based trajectory planner with a DRL control system, creating a unified framework that explicitly handles model uncertainties and disturbances to achieve more resilient and efficient walking.
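
The paper does not include source code, but the core of DDPG can be sketched briefly: a deterministic actor network maps the robot's state to continuous joint torques, while a critic network scores state-action pairs. The minimal Python sketch below is illustrative only; the state dimension, layer sizes, and torque limit are assumptions, not values from the study.

```python
# Hypothetical DDPG actor-critic structure for continuous torque control.
# Dimensions, layer sizes, and the torque limit are illustrative assumptions.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, MAX_TORQUE = 18, 6, 50.0  # assumed values

actor = nn.Sequential(                      # pi(s) -> continuous torque vector
    nn.Linear(STATE_DIM, 128), nn.ReLU(),
    nn.Linear(128, ACTION_DIM), nn.Tanh(),  # output in [-1, 1], scaled below
)

critic = nn.Sequential(                     # Q(s, a) -> scalar value estimate
    nn.Linear(STATE_DIM + ACTION_DIM, 128), nn.ReLU(),
    nn.Linear(128, 1),
)

def act(state: torch.Tensor) -> torch.Tensor:
    """Deterministic policy output, scaled to the assumed torque limits."""
    with torch.no_grad():
        return MAX_TORQUE * actor(state)
```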

Hybrid Two-Stage Control for Stable and Efficient Bipedal Locomotion

This study presented a novel two-stage control framework for bipedal walking robots that integrates deep learning-based trajectory planning with an RL-based torque controller to achieve stable and efficient locomotion.

The first stage focuses on generating optimal and stable walking trajectories. Through optimization, the system creates joint trajectories that are both energy-efficient and dynamically feasible, ensuring stability by keeping the ZMP safely within the robot’s support polygon. These optimized trajectories are then used to train a deep neural network, which learns to predict smooth and stable joint motions based on input parameters such as step length, mass, and body length.
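
The study's planner is implemented within a simulation environment and is not published as code. As a rough illustration of the idea only, the sketch below shows a small multilayer perceptron that maps the three gait parameters (step length, mass, body length) to a flattened joint-angle trajectory and is fit by regression to the offline-optimized trajectories. The joint count, number of time samples, layer sizes, and function names are all assumptions.

```python
# Hypothetical trajectory planner: a small MLP trained to reproduce
# ZMP-constrained, optimization-generated joint trajectories.
import torch
import torch.nn as nn

N_JOINTS = 6       # assumed number of actuated joints
N_SAMPLES = 50     # assumed time samples per step cycle

planner = nn.Sequential(
    nn.Linear(3, 64),                      # inputs: step length, mass, body length
    nn.Tanh(),
    nn.Linear(64, 64),
    nn.Tanh(),
    nn.Linear(64, N_JOINTS * N_SAMPLES),   # flattened joint-angle trajectory
)

optimizer = torch.optim.Adam(planner.parameters(), lr=1e-3)

def train_step(params, target_traj, loss_fn=nn.MSELoss()):
    """One supervised update: fit the network to an optimized trajectory."""
    pred = planner(params).view(-1, N_SAMPLES, N_JOINTS)
    loss = loss_fn(pred, target_traj)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```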

In the second stage, a DDPG RL algorithm takes over for torque control. Instead of learning from scratch, the DDPG agent uses the pre-planned trajectories from the first stage as a reference guide. Its goal is to output the precise joint torques needed to follow these trajectories. The agent is trained with a tailored reward function that encourages forward velocity, long simulation times (indicating stability), and minimal energy consumption, while penalizing falls.
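
The paper describes this reward qualitatively rather than as code. A minimal sketch of a reward in that spirit, with assumed weights and an assumed fall test, might look like this:

```python
# Hypothetical reward shaping: reward forward velocity and time spent upright,
# penalize torque (energy) use, and apply a large penalty on falling.
# All weights and the fall threshold are illustrative assumptions.
import numpy as np

def step_reward(forward_velocity, torques, torso_pitch, dt,
                w_vel=1.0, w_alive=0.1, w_energy=1e-3, fall_penalty=100.0):
    reward = w_vel * forward_velocity                        # encourage walking forward
    reward += w_alive * dt                                   # encourage longer stable episodes
    reward -= w_energy * float(np.sum(np.square(torques)))   # discourage torque use
    fallen = abs(torso_pitch) > np.deg2rad(45)               # crude fall check (assumed)
    if fallen:
        reward -= fall_penalty                               # falling ends the episode
    return reward, fallen
```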

To ensure robustness, the training incorporates randomized variations in the robot's mass and leg length, as well as external disturbances applied at various points throughout the walking cycle.
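
The exact randomization scheme is not given as code. The sketch below illustrates one plausible way to sample per-episode parameters within the ranges reported later in the article (up to 20% mass variation, 5% leg-length variation, and joint-velocity pushes between -30% and +55%); the sampling distributions themselves are assumptions.

```python
# Hypothetical per-episode domain randomization for training robustness.
import random

def sample_episode(nominal_mass, nominal_leg_length):
    return {
        "mass": nominal_mass * random.uniform(0.8, 1.2),                # up to +/-20%
        "leg_length": nominal_leg_length * random.uniform(0.95, 1.05),  # up to +/-5%
        "push_phase": random.uniform(0.0, 1.0),                         # fraction of gait cycle
        "push_velocity_scale": random.uniform(-0.30, 0.55),             # assumed range of velocity jump
    }
```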

The method’s main strength is its hybrid approach, which delivers both stability and adaptability, overcoming the limitations of rigid model-based controllers and the inefficiencies often found in purely learning-driven approaches. The result is a controller that produces stable, human-like, and energy-efficient walking while robustly adapting to uncertainties and external pushes, making it well suited for real-world applications.

How the Robot Performed

Based on comprehensive simulations in a high-fidelity MATLAB environment, the proposed two-stage control framework successfully enabled stable and robust bipedal walking. The deep neural network for trajectory planning was highly accurate, replicating optimized joint angles with minimal error. The DDPG controller then used these trajectories to learn a stable walking policy, producing smooth and periodic joint motions and torques over ten consecutive steps.

When compared to a traditional inverse dynamics controller, the RL controller demonstrated superior performance. It achieved a more stable and human-like gait, characterized by reduced torso sway, a more upright posture, and less variation in joint angular velocities. Although it consumed slightly more torque on average, it produced a more efficient swing-leg motion.

Most notably, the system exhibited exceptional robustness. It maintained stable walking across 100 tests with random variations of up to 20% in mass and 5% in leg length, achieving a 100% success rate. Furthermore, the controller effectively rejected a wide range of external disturbances, including sudden changes in joint velocities applied at different phases of the gait cycle. It consistently recovered balance after velocity perturbations as large as +55% and -30%, returning the robot to its stable, periodic walking pattern without falling.

Why This Method Matters for Robotics

In conclusion, this research successfully demonstrated the efficacy of a hybrid, two-stage control framework that integrates deep learning-based trajectory planning with a DDPG controller for bipedal robots. The system achieved stable, periodic walking and exhibited exceptional robustness, maintaining a 100% success rate across 100 tests despite significant model uncertainties, including mass variations of up to 20% and length variations of up to 5%.

Furthermore, the controller effectively rejected a wide range of external disturbances applied at different gait phases, consistently recovering balance. When compared to a traditional inverse dynamics controller, the proposed method produced a smoother, more human-like gait. These results confirm the framework's superior adaptability and its significant potential for enabling reliable bipedal locomotion in real-world, dynamic environments.

Journal Reference

Shiyu, F. (2025). Reinforcement learning-driven deep learning approaches for optimized robot trajectory planning. Scientific Reports, 15(1), 37898. DOI: 10.1038/s41598-025-21664-5. https://www.nature.com/articles/s41598-025-21664-5

