Reinforcement Learning for Stable Bipedal Robot Locomotion

A new two-stage AI system combines physics-driven trajectory planning with adaptive reinforcement learning, enabling robots to walk smoothly, remain upright in the face of shocks, and navigate real-world uncertainty.


In an article published in the journal Scientific Reports, researchers presented a deep reinforcement learning (DRL) system for bipedal robot walking. They integrated an optimization-based trajectory planning stage with a DRL controller to generate optimal joint torques.

The goal was stable, periodic, and efficient locomotion. The trained robot demonstrated strong balance, handled mass and length variations, and rejected disturbances, improving robustness for real-world applications.

Limitations of Current Controllers

The growing demand for service robots underscores the need for robust bipedal locomotion, a challenging area due to nonlinear dynamics and external disturbances. Traditional control methods, such as zero moment point (ZMP) control and inverse dynamics, require precise models and lack adaptability. While reinforcement learning (RL) offers model-free alternatives, it has historically struggled with high-dimensional state and action spaces.

DRL later addressed this issue, with algorithms such as deep deterministic policy gradient (DDPG) enabling control in continuous action spaces. However, many studies focus solely on either control or gait generation rather than treating the two together. This paper fills that gap by integrating a deep learning-based trajectory planner with a DRL control system, creating a unified framework that explicitly handles model uncertainties and disturbances to achieve more resilient and efficient walking.
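
The paper does not include source code, but the core of DDPG can be sketched briefly: a deterministic actor network maps the robot's state to continuous joint torques, while a critic network scores state-action pairs. The minimal Python sketch below is illustrative only; the state dimension, layer sizes, and torque limit are assumptions, not values from the study.

```python
# Hypothetical DDPG actor-critic structure for continuous torque control.
# Dimensions, layer sizes, and the torque limit are illustrative assumptions.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, MAX_TORQUE = 18, 6, 50.0  # assumed values

actor = nn.Sequential(                      # pi(s) -> continuous torque vector
    nn.Linear(STATE_DIM, 128), nn.ReLU(),
    nn.Linear(128, ACTION_DIM), nn.Tanh(),  # output in [-1, 1], scaled below
)

critic = nn.Sequential(                     # Q(s, a) -> scalar value estimate
    nn.Linear(STATE_DIM + ACTION_DIM, 128), nn.ReLU(),
    nn.Linear(128, 1),
)

def act(state: torch.Tensor) -> torch.Tensor:
    """Deterministic policy output, scaled to the assumed torque limits."""
    with torch.no_grad():
        return MAX_TORQUE * actor(state)
```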

Hybrid Two-Stage Control for Stable and Efficient Bipedal Locomotion

This study presented a novel two-stage control framework for bipedal walking robots that integrates deep learning-based trajectory planning with an RL-based torque controller to achieve stable and efficient locomotion.

The first stage focuses on generating optimal and stable walking trajectories. Through optimization, the system creates joint trajectories that are both energy-efficient and dynamically feasible, ensuring stability by keeping the ZMP safely within the robot’s support polygon. These optimized trajectories are then used to train a deep neural network, which learns to predict smooth and stable joint motions based on input parameters such as step length, mass, and body length.
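
The study's planner is implemented within a simulation environment and is not published as code. As a rough illustration of the idea only, the sketch below shows a small multilayer perceptron that maps the three gait parameters (step length, mass, body length) to a flattened joint-angle trajectory and is fit by regression to the offline-optimized trajectories. The joint count, number of time samples, layer sizes, and function names are all assumptions.

```python
# Hypothetical trajectory planner: a small MLP trained to reproduce
# ZMP-constrained, optimization-generated joint trajectories.
import torch
import torch.nn as nn

N_JOINTS = 6       # assumed number of actuated joints
N_SAMPLES = 50     # assumed time samples per step cycle

planner = nn.Sequential(
    nn.Linear(3, 64),                      # inputs: step length, mass, body length
    nn.Tanh(),
    nn.Linear(64, 64),
    nn.Tanh(),
    nn.Linear(64, N_JOINTS * N_SAMPLES),   # flattened joint-angle trajectory
)

optimizer = torch.optim.Adam(planner.parameters(), lr=1e-3)

def train_step(params, target_traj, loss_fn=nn.MSELoss()):
    """One supervised update: fit the network to an optimized trajectory."""
    pred = planner(params).view(-1, N_SAMPLES, N_JOINTS)
    loss = loss_fn(pred, target_traj)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```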

In the second stage, a DDPG RL algorithm takes over for torque control. Instead of learning from scratch, the DDPG agent uses the pre-planned trajectories from the first stage as a reference guide. Its goal is to output the precise joint torques needed to follow these trajectories. The agent is trained with a tailored reward function that encourages forward velocity, long simulation times (indicating stability), and minimal energy consumption, while penalizing falls.
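
The paper describes this reward qualitatively rather than as code. A minimal sketch of a reward in that spirit, with assumed weights and an assumed fall test, might look like this:

```python
# Hypothetical reward shaping: reward forward velocity and time spent upright,
# penalize torque (energy) use, and apply a large penalty on falling.
# All weights and the fall threshold are illustrative assumptions.
import numpy as np

def step_reward(forward_velocity, torques, torso_pitch, dt,
                w_vel=1.0, w_alive=0.1, w_energy=1e-3, fall_penalty=100.0):
    reward = w_vel * forward_velocity                        # encourage walking forward
    reward += w_alive * dt                                   # encourage longer stable episodes
    reward -= w_energy * float(np.sum(np.square(torques)))   # discourage torque use
    fallen = abs(torso_pitch) > np.deg2rad(45)               # crude fall check (assumed)
    if fallen:
        reward -= fall_penalty                               # falling ends the episode
    return reward, fallen
```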

To ensure robustness, the training incorporates randomized variations in the robot's mass and leg length, as well as external disturbances applied at various points throughout the walking cycle.
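
The exact randomization scheme is not given as code. The sketch below illustrates one plausible way to sample per-episode parameters within the ranges reported later in the article (up to 20% mass variation, 5% leg-length variation, and joint-velocity pushes between -30% and +55%); the sampling distributions themselves are assumptions.

```python
# Hypothetical per-episode domain randomization for training robustness.
import random

def sample_episode(nominal_mass, nominal_leg_length):
    return {
        "mass": nominal_mass * random.uniform(0.8, 1.2),                # up to +/-20%
        "leg_length": nominal_leg_length * random.uniform(0.95, 1.05),  # up to +/-5%
        "push_phase": random.uniform(0.0, 1.0),                         # fraction of gait cycle
        "push_velocity_scale": random.uniform(-0.30, 0.55),             # assumed range of velocity jump
    }
```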

The method’s main strength is its hybrid approach, which delivers both stability and adaptability, overcoming the limitations of rigid model-based controllers and the inefficiencies often found in purely learning-driven approaches. The result is a controller that produces stable, human-like, and energy-efficient walking while robustly adapting to uncertainties and external pushes, making it well suited for real-world applications.

How the Robot Performed

Based on comprehensive simulations in a high-fidelity MATLAB environment, the proposed two-stage control framework successfully enabled stable and robust bipedal walking. The deep neural network for trajectory planning was highly accurate, replicating optimized joint angles with minimal error. The DDPG controller then used these trajectories to learn a stable walking policy, producing smooth and periodic joint motions and torques over ten consecutive steps.

When compared to a traditional inverse dynamics controller, the RL controller demonstrated superior performance. It achieved a more stable and human-like gait, characterized by reduced torso sway, a more upright posture, and less variation in joint angular velocities. Although it consumed slightly more torque on average, it produced a more efficient swing-leg motion.

Most notably, the system exhibited exceptional robustness. It maintained stable walking across 100 tests with random variations of up to 20% in mass and 5% in leg length, achieving a 100% success rate. Furthermore, the controller effectively rejected a wide range of external disturbances, including sudden changes in joint velocities applied at different phases of the gait cycle. It consistently recovered balance after velocity perturbations as large as +55% and -30%, returning the robot to its stable, periodic walking pattern without falling.

Why This Method Matters for Robotics

In conclusion, this research successfully demonstrated the efficacy of a hybrid, two-stage control framework that integrates deep learning-based trajectory planning with a DDPG controller for bipedal robots. The system achieved stable, periodic walking and exhibited exceptional robustness, maintaining a 100% success rate across 100 tests despite significant model uncertainties, including mass variations of up to 20% and length variations of up to 5%.

Furthermore, the controller effectively rejected a wide range of external disturbances applied at different gait phases, consistently recovering balance. When compared to a traditional inverse dynamics controller, the proposed method produced a smoother, more human-like gait. These results confirm the framework's superior adaptability and its significant potential for enabling reliable bipedal locomotion in real-world, dynamic environments.

Journal Reference

Shiyu, F. (2025). Reinforcement learning-driven deep learning approaches for optimized robot trajectory planning. Scientific Reports, 15(1), 37898. DOI: 10.1038/s41598-025-21664-5. https://www.nature.com/articles/s41598-025-21664-5

