Experiments show that the framework performs reliably across tasks such as long-horizon planning and tabletop rearrangement, using only open-source models.
Background
Traditional robotic development depends on expert engineers to decompose tasks into atomic actions and assemble them into behaviors. While this method works well, it is inherently rigid and struggles to adapt to dynamic environments like homes or healthcare settings, where non-experts often need to update capabilities quickly.
Frameworks such as ROS offer a modular foundation, but they still require expert input for defining tasks and expanding skills. At the same time, advances in LLMs have made natural language interaction with robots increasingly practical.
Even with these developments, a gap remains.
Current systems do not fully support intuitive task composition by non-experts, nor do they make it easy to extend action libraries through demonstration or refine behaviors iteratively. This work addresses that gap by combining LLM-based reasoning with ROS, alongside imitation learning and feedback-driven adaptation.
System Architecture
To bridge this divide, the proposed framework allows non-expert users to program robots through natural language interaction. It separates responsibilities between:
- Experts, who define an initial library of pre-trained atomic actions (e.g., “pick,” “navigate”)
- Non-experts, who interact with the robot through a chat interface without needing technical expertise
At its core, the system is organized around four tightly connected components.
First, the atomic action library serves as the foundation, storing primitive robot skills along with their textual descriptions and executable code. Building on this, the imitation learning module allows non-experts to expand the library by physically guiding the robot or demonstrating tasks, removing the need for manual coding.
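To make this concrete, the pairing of textual descriptions with executable code can be sketched as a simple registry. This is an illustrative sketch, not the paper's implementation; the class names, the `describe_all` helper, and the lambda-based skills are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class AtomicAction:
    """One primitive skill: a textual description the LLM can read,
    plus the executable routine that realizes it."""
    name: str
    description: str
    execute: Callable[..., str]

class ActionLibrary:
    """Registry of atomic actions, queried by the agent at planning time."""
    def __init__(self) -> None:
        self._actions: Dict[str, AtomicAction] = {}

    def register(self, action: AtomicAction) -> None:
        self._actions[action.name] = action

    def describe_all(self) -> str:
        # Rendered into the agent's prompt so the LLM knows what it may call.
        return "\n".join(f"- {a.name}: {a.description}"
                         for a in self._actions.values())

    def run(self, name: str, **kwargs) -> str:
        if name not in self._actions:
            raise KeyError(f"Unknown action '{name}' - not in library")
        return self._actions[name].execute(**kwargs)

library = ActionLibrary()
library.register(AtomicAction("pick", "Grasp a named object.",
                              lambda obj: f"picked {obj}"))
library.register(AtomicAction("navigate", "Drive to a named location.",
                              lambda goal: f"at {goal}"))

# A skill learned by demonstration can be registered the same way at
# runtime, once the demonstration is distilled into a policy callable.
library.register(AtomicAction("stir", "Stir the contents of a container.",
                              lambda obj: f"stirred {obj}"))

print(library.describe_all())
print(library.run("pick", obj="mug"))
```

Registering imitation-learned skills through the same interface as the expert-defined primitives is what lets the library grow without code changes elsewhere in the system.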
Once actions are defined, the atomic action optimizer refines them. It uses LLMs to identify key parameters within the action code and improves them through Bayesian optimization, enhancing performance without requiring expert intervention.
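The optimization loop can be sketched as follows. The parameter names, bounds, and synthetic objective below are illustrative assumptions, and plain random search stands in for Bayesian optimization (a real system would fit a surrogate model and select candidates by an acquisition function such as expected improvement).

```python
import random

# Hypothetical tunable parameters an LLM might extract from the action
# code, each with bounds (illustrative values, not from the paper).
param_bounds = {"approach_speed": (0.05, 0.5), "grip_force": (5.0, 40.0)}

def execute_and_score(params: dict) -> float:
    """Stand-in for running the action on the robot and measuring
    success; here a synthetic objective peaking at speed=0.2, force=20."""
    return -((params["approach_speed"] - 0.2) ** 2
             + ((params["grip_force"] - 20.0) / 40.0) ** 2)

def optimize(n_trials: int = 200, seed: int = 0) -> dict:
    """Random search as a simple stand-in for Bayesian optimization:
    sample candidate parameter sets, evaluate each, keep the best."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        cand = {k: rng.uniform(lo, hi) for k, (lo, hi) in param_bounds.items()}
        score = execute_and_score(cand)
        if score > best_score:
            best_params, best_score = cand, score
    return best_params

best = optimize()
print(best)
```

The key idea is that the LLM turns opaque action code into a named, bounded search space, after which any black-box optimizer can refine the values from execution feedback alone.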
Tying everything together is the AI agent, which acts as the decision-making core. It processes user instructions alongside environmental observations, converted into text, and selects appropriate actions. Depending on the task, it can operate in different modes: executing single actions in dynamic settings, chaining sequences for multi-step tasks, running custom code, or using behavior trees for more complex logic.
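The four execution modes can be pictured as a dispatcher over the agent's output. This is a schematic sketch only; the mode names and the echo-style bodies are assumptions, and each branch would drive real robot interfaces rather than return strings.

```python
from enum import Enum, auto

class Mode(Enum):
    SINGLE_ACTION = auto()   # one action, then re-observe (dynamic scenes)
    SEQUENCE = auto()        # chained plan for multi-step tasks
    CUSTOM_CODE = auto()     # run generated code directly
    BEHAVIOR_TREE = auto()   # structured control flow for complex logic

def dispatch(mode: Mode, plan) -> list:
    """Illustrative dispatcher: each branch reports what it would do
    instead of commanding a real robot."""
    if mode is Mode.SINGLE_ACTION:
        return [f"execute {plan[0]}, then re-observe the scene"]
    if mode is Mode.SEQUENCE:
        return [f"execute {step}" for step in plan]
    if mode is Mode.CUSTOM_CODE:
        return [f"run generated code: {plan}"]
    return [f"tick behavior tree with root '{plan}'"]

print(dispatch(Mode.SEQUENCE, ["pick(mug)", "navigate(machine)", "pour(coffee)"]))
```

Separating the modes this way lets the same LLM output format serve both reactive single-step control and fully planned multi-step execution.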
Throughout this process, prompts are carefully structured to include task goals, available actions, and user feedback.
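The structured prompt described above can be assembled along these lines. The field labels and wording are illustrative assumptions, not the paper's actual template.

```python
def build_prompt(goal: str, action_descriptions: str, feedback: list) -> str:
    """Assemble a prompt from the three ingredients named in the text:
    task goal, available actions, and accumulated user feedback."""
    sections = [
        "You control a robot. Respond with a sequence of action calls.",
        f"Task goal: {goal}",
        "Available actions:\n" + action_descriptions,
    ]
    if feedback:
        sections.append("User corrections from earlier attempts:\n"
                        + "\n".join(f"- {note}" for note in feedback))
    sections.append("Only use actions listed above.")
    return "\n\n".join(sections)

prompt = build_prompt(
    goal="make a cup of coffee",
    action_descriptions=("- pick: Grasp a named object.\n"
                         "- navigate: Drive to a named location."),
    feedback=["verify the object's location before grasping"],
)
print(prompt)
```

Carrying prior corrections forward in the feedback section is what allows the system, as reported later, to reapply a correction such as verifying object locations in subsequent trials without being told again.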
Rather than relying on continuous retraining, the system improves iteratively through interaction, allowing non-experts to shape robot behavior over time. While the framework supports both open-source and commercial LLMs, only open-source models were used in these experiments.
Experimental Validation
The researchers evaluated the framework across a range of robots and task environments, with results that highlight both its flexibility and consistency.
In a kitchen setup using a UR5 arm, the robot completed a 12-step coffee-making task from a single natural language prompt, demonstrating strong long-horizon planning without human intervention. Building on this, non-experts introduced new actions like stirring and pouring through demonstration, which the system then used to carry out a “cook me pasta” task.
This progression illustrates how the framework naturally expands its capabilities through imitation.
As tasks became more complex, the role of feedback became clearer. In tabletop rearrangement experiments, performance dropped when relying solely on the language model.
However, when human corrections were incorporated, success rates remained consistently high. Importantly, the system was able to reuse feedback. For example, after being instructed to verify object locations before grasping, it applied that correction independently in later trials.
The framework also proved effective in distributed settings. An operator in Europe successfully controlled a robot in Asia using natural language, completing pick-and-place tasks despite a 2–3 second delay. In a laboratory scenario, the system interpreted unstructured textbook instructions to perform a pH test, showing its ability to work with less formal inputs.
Further experiments reinforced these findings.
Bayesian optimization improved air hockey performance from 30% to 52%, while a quadruped robot demonstrated real-time failure recovery by resolving issues such as gripper obstructions in an office environment.
Insights and Challenges
The results show that LLMs can make robotic systems far more accessible, allowing non-experts to define complex tasks through natural language. The framework performs particularly well in long-horizon planning and modular action sequencing.
At the same time, several challenges affect reliability. Performance is highly sensitive to prompt wording, with even small phrasing changes sometimes leading to failure. The model can also be misled by examples, attempting actions involving objects mentioned only in passing. In some cases, it generates actions that do not exist in the library, although few-shot prompting helps reduce this behavior.
Despite these issues, the system remains effective when paired with clear, actionable human feedback, especially as task complexity increases. Still, there is no single correction strategy that works across all scenarios, and the framework continues to rely on careful prompt design and human-in-the-loop guidance for more demanding tasks.
Conclusion
Overall, this work presents a cohesive framework that integrates LLM-based AI agents with ROS, making robot programming more accessible through natural language. By combining imitation learning, action optimization, and iterative feedback, the system supports flexible and adaptive task execution across a wide range of applications.
While challenges such as prompt sensitivity and hallucinated actions remain, the framework marks steady progress toward more usable and adaptable robotic systems without yet fully achieving general-purpose autonomy.
Journal Reference
Mower et al. (2026). A robot operating system framework for using large language models in embodied AI. Nature Machine Intelligence. DOI:10.1038/s42256-026-01186-z. https://www.nature.com/articles/s42256-026-01186-z