Simulation results indicate that this approach enhances robot intelligence, promotes more human-like motion, and improves patient comfort during training.
Why This Matters
As the number of patients with lower limb impairments, particularly following stroke, continues to rise, robot-assisted rehabilitation has become increasingly important for restoring motor function. However, controlling rehabilitation robots remains challenging. Their dynamics are highly nonlinear, which makes precise mathematical modeling difficult.
Traditional control strategies such as proportional-derivative (PD) control, sliding mode control, and fuzzy control are widely used, but they often struggle to adapt to individual patient variability. PD control, in particular, is valued for its simplicity, yet it lacks the flexibility needed for complex, human-centered movement.
Recent progress in deep reinforcement learning (DRL) has shown promise in handling nonlinear, high-dimensional control problems. Still, most systems rely either on learning-based methods or traditional controllers alone. This study brings the two together, using DRL for intelligent adaptation while retaining PD control for stable, precise tracking.
How the Controller Works
At the core of the system is the deep deterministic policy gradient (DDPG) algorithm, chosen for its effectiveness in continuous control tasks. DDPG uses an actor–critic structure:
- The actor network generates control actions.
- The critic network evaluates those actions and guides learning.
This setup allows the system to learn directly from experience without requiring a fully defined environmental model.
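For readers who want a concrete picture of the actor–critic split, the following is a minimal sketch in PyTorch. The paper's networks were implemented in MATLAB, so the layer sizes, activations, and torque bound shown here are illustrative assumptions rather than the authors' architecture.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps the robot state to continuous torque-scaled actions (tanh-bounded)."""
    def __init__(self, state_dim, action_dim, max_action):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )
        self.max_action = max_action  # assumed action bound, not from the paper

    def forward(self, state):
        return self.max_action * self.net(state)

class Critic(nn.Module):
    """Scores a (state, action) pair with an estimated Q-value to guide the actor."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```

In DDPG, the critic is trained on replayed experience while the actor is updated to maximize the critic's score, which is what lets the system learn without an explicit model of the environment.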
The overall control architecture is layered. The PD controller operates at a low level, ensuring accurate tracking of joint angles. Above it, the DRL agent acts as a high-level decision-maker, learning human gait characteristics and adapting control signals accordingly. The system focuses on two degrees of freedom (hip and knee joints) in a fully actuated configuration.
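The summary does not spell out exactly how the high-level agent and the low-level loop are coupled. A common arrangement, assumed in the sketch below, is for the DRL policy to output reference hip and knee angles that the PD loop then tracks; the gains, the `agent.act` interface, and the saturation limits are placeholders rather than the paper's values.

```python
import numpy as np

# Hypothetical PD gains for the hip and knee joints (not the paper's tuned values).
KP = np.array([120.0, 90.0])
KD = np.array([8.0, 6.0])

def pd_torque(q_ref, q, dq):
    """Low-level PD loop: track the reference joint angles supplied from above."""
    return KP * (q_ref - q) - KD * dq

def control_step(agent, state, q, dq):
    """High-level DRL agent proposes gait-shaped joint targets; PD ensures tracking."""
    q_ref = agent.act(state)       # learned reference for (hip, knee), hypothetical API
    tau = pd_torque(q_ref, q, dq)  # precise low-level tracking torque
    # Illustrative saturation near the torque ranges reported in the results section.
    return np.clip(tau, [0.0, -10.0], [50.0, 60.0])
```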
A central strength of the study lies in its reward function design. Rather than simply encouraging forward motion, the reward incorporates multiple safety and comfort factors. It promotes:
- Stable walking speed
- Upright trunk posture
- Consistent hip height
- Adequate foot clearance
- Reduced joint torque
A specific penalty term, p_effort, discourages excessive torque output to limit discomfort during rehabilitation. The agent is also penalized for deviating from reference human joint angles, reinforcing natural gait patterns.
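A rough sense of how these terms might be composed is given below. The weights, thresholds, and functional forms are placeholders chosen for illustration; the paper's exact reward expression is not reproduced in this summary.

```python
import numpy as np

def reward(v, v_target, trunk_pitch, hip_height, hip_height_ref,
           foot_clearance, tau, q, q_ref_human, w=None):
    """Illustrative composition of the reward terms listed above.
    All weights and the clearance threshold are placeholders, not the paper's values."""
    w = w or dict(speed=1.0, posture=0.5, height=0.5,
                  clearance=0.5, effort=1e-3, imitation=1.0)
    r_speed     = -w["speed"]     * (v - v_target) ** 2                 # stable walking speed
    r_posture   = -w["posture"]   * trunk_pitch ** 2                    # upright trunk posture
    r_height    = -w["height"]    * (hip_height - hip_height_ref) ** 2  # consistent hip height
    r_clearance =  w["clearance"] * min(foot_clearance, 0.05)           # adequate foot clearance
    p_effort    = -w["effort"]    * float(np.sum(np.square(tau)))       # torque/comfort penalty
    r_imitation = -w["imitation"] * float(np.sum(np.square(q - q_ref_human)))  # human-like gait
    return r_speed + r_posture + r_height + r_clearance + p_effort + r_imitation
```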
Training was conducted in MATLAB’s Reinforcement Learning Toolbox over 10,000 episodes. Key hyperparameters included a learning rate of 1e-4, a discount factor of 0.99, and a replay buffer size of 1e6. Lyapunov stability analysis further confirmed the theoretical stability of the combined DRL-PD control system.
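Gathered into a single configuration, the reported settings look roughly as follows; the fields marked as assumed are common DDPG defaults, not values stated in the study.

```python
# Reported training settings; "assumed" fields are typical DDPG defaults,
# not values given in this summary.
ddpg_config = {
    "episodes": 10_000,
    "actor_learning_rate": 1e-4,    # reported learning rate
    "critic_learning_rate": 1e-4,   # assumed equal to the actor's
    "discount_factor": 0.99,
    "replay_buffer_size": int(1e6),
    "batch_size": 128,              # assumed
    "target_update_tau": 1e-3,      # assumed soft-update rate
}
```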
Data Collection and Experimental Setup
To ground the controller in realistic movement, researchers collected gait data from healthy subjects using a Nokov motion capture system. Reflective markers were placed on anatomical landmarks to capture joint motion as participants walked at self-selected speeds. A synchronized three-dimensional force plate recorded ground reaction forces.
All procedures were ethically approved, and participants provided informed consent.
What the Results Showed
After training, the DRL-PD controller produced stable, periodic joint motions resembling human gait. The hip, knee, and ankle joints moved within realistic ranges:
- Hip: −0.3 to 0.3 rad
- Knee: −0.3 to 0.4 rad
- Ankle: 0 to 0.1 rad
Closed-loop limit cycles confirmed system stability.
Tracking accuracy was strong. Errors converged rapidly to below 0.1 rad. Across 10 stable gait cycles, root mean square errors were:
- 0.028 ± 0.003 rad (hip)
- 0.035 ± 0.004 rad (knee)
Joint torques remained within controlled ranges of 0 to 50 N·m for the hip and −10 to 60 N·m for the knee.
The most notable improvement appeared in motion smoothness. Measured using the root mean square of torque derivatives, the DRL-PD controller achieved:
- 5.3 ± 0.3 N·m/s (hip)
- 6.6 ± 0.4 N·m/s (knee)
By comparison, conventional PD controllers reported values of 12.5 and 15.8 N·m/s, respectively. That amounts to roughly a 60 % reduction in torque variability, an improvement that directly supports patient comfort during rehabilitation exercises.
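The smoothness metric itself is straightforward to compute from a logged torque trajectory. The sketch below assumes a uniformly sampled signal; the sampling rate and example waveforms are illustrative only.

```python
import numpy as np

def torque_smoothness(tau, dt):
    """Root mean square of the torque time-derivative (N·m/s).
    `tau` is a 1-D array of joint torques sampled every `dt` seconds."""
    dtau_dt = np.diff(tau) / dt
    return float(np.sqrt(np.mean(dtau_dt ** 2)))

# Example: a smoother torque profile yields a smaller value than a jerky one.
t = np.linspace(0, 1, 1001)
smooth = torque_smoothness(20 + 5 * np.sin(2 * np.pi * t), dt=1e-3)
jerky  = torque_smoothness(20 + 5 * np.sign(np.sin(2 * np.pi * 5 * t)), dt=1e-3)
```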
Robustness testing further strengthened the findings. With ±5 % mass variation, tracking errors increased by less than 10 %, indicating stability under parameter uncertainty.
Although training required approximately 12 hours on an RTX 3060 GPU, the trained policy ran efficiently once deployed. The authors note that future work will focus on optimizing real-time implementation.
Conclusion
This study demonstrates that combining DRL with traditional PD control can produce more natural, stable, and comfortable movement in lower limb rehabilitation robots. By grounding the controller in real human gait data and embedding comfort directly into the reward structure, the researchers created a system that balances adaptability with reliability.
While the results are currently based on simulations and controlled testing, the next critical step will be clinical validation with patients. If successful, this approach could strengthen the role of intelligent control systems in personalized robot-assisted therapy.
Journal Reference
Jin, Y., Zhang, J., Li, W., Yu, J., Wang, Z., & Sun, S. (2026). A humanoid control strategy based on deep reinforcement learning for enhanced comfort in lower limb rehabilitation robots. Scientific Reports. DOI:10.1038/s41598-026-39011-7. https://www.nature.com/articles/s41598-026-39011-7