Editorial Feature

How Has Robotic Dexterity Changed with AI?

For years, robots have been great at repeating tasks, whether that be lifting heavy parts, moving packages, or welding in perfect straight lines. But ask one to pick up something soft, adjust to a wobbly object, or do anything slightly unpredictable, and that's where they fall short.

Image: Concept of a robotic hand. Image Credit: Rawpixel.com/Shutterstock.com

That kind of flexible, real-world movement - what we call robotic dexterity - has always been a major challenge, because robots still struggle to adapt in the moment, react to feedback, or make quick decisions when things don’t go exactly as planned.

Now, with the rise of artificial intelligence (AI) foundation models, that is finally starting to change. These massive, pretrained models, built on data from language, vision, and even physical actions, are giving robots the ability to understand their environment, respond to natural commands, and learn new tasks without needing to be reprogrammed from scratch.

In this article, we’ll break down what’s new, how it works, and why robotic dexterity might finally be catching up to the real world.


What Are Foundation Models?

If you’ve used ChatGPT or seen AI image generators online, you’ve already seen foundation models in action. They’re essentially huge neural networks trained on massive amounts of data, from text and images to video and even human demonstrations. The overarching goal of these foundation models is to give AI a broad understanding of the world, not just one narrow skill.

Originally built for language and vision tasks, these models are now being adapted for robotics. And that’s a big shift.

Instead of training a robot from scratch for every single task, like "grasp object A" or "stack box B," foundation models let robots generalize. They learn from diverse, internet-scale data and apply that knowledge to new, unfamiliar situations. This is what researchers call zero-shot generalization: the robot hasn’t done the task before, but it figures it out based on what it already knows.1,2

In practical terms, this means a robot can:

  • Understand a natural language command (“Put the red cup on the table”)
  • Use its vision system to find the cup and the table
  • Plan and execute the right movements - without someone coding every single step

By using large vision-language models (VLMs) and large language models (LLMs), robots can build a multimodal understanding of the world from both visual and linguistic inputs. That makes it easier for them to follow natural language commands and behave more adaptably, so they can tackle a broader range of tasks - grasping unfamiliar objects, say, or navigating safely through a messy environment - rather than being limited to one specific job.1
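To make that concrete, here is a minimal sketch (in Python) of the command-to-action pipeline described above. The detection, planning, and execution functions are hypothetical placeholders standing in for a real robot’s perception and control stack, not any specific product’s API.

```python
# Minimal sketch of a language-conditioned pick-and-place pipeline.
# detect_objects, plan_grasp, and execute_trajectory are hypothetical
# placeholders for a robot's actual perception and control stack.
from dataclasses import dataclass

@dataclass
class Detection:
    label: str                   # open-vocabulary label, e.g. "red cup"
    position: tuple[float, ...]  # (x, y, z) in the robot's workspace frame

def detect_objects(camera_image) -> list[Detection]:
    """Placeholder: a vision-language model returns labelled 3D detections."""
    raise NotImplementedError

def plan_grasp(target: Detection):
    """Placeholder: a motion planner returns a grasp-and-place trajectory."""
    raise NotImplementedError

def execute_trajectory(trajectory) -> None:
    """Placeholder: the low-level controller runs the trajectory."""
    raise NotImplementedError

def handle_command(command: str, camera_image) -> None:
    # 1. Perception: find candidate objects in the current camera frame.
    detections = detect_objects(camera_image)
    # 2. Grounding: match the language command to one of the detections.
    target = next((d for d in detections if d.label in command.lower()), None)
    if target is None:
        raise ValueError(f"Nothing in the scene matches: {command!r}")
    # 3. Planning and control: grasp the object and move it to its goal.
    execute_trajectory(plan_grasp(target))
```

The division of labour is the point here: the foundation model supplies the grounding between language and perception, while conventional planning and control still carry out the motion.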

Improving Perception for Better Dexterity

Dexterity in robots depends heavily on accurate perception of objects, environments, and interaction forces. To handle real-world tasks with precision, they need to see what’s around them, feel what they’re touching, and understand how their own movements affect the environment. 

Foundation models improve perception capabilities by fusing multimodal sensor data - like vision, touch (haptics), and proprioception (basically the robot’s sense of its own position) - into one big-picture understanding. This helps robots recognize a wider range of objects and better estimate things like size, shape, and texture. That level of detail is crucial for delicate tasks, like handling fragile items or fitting parts together.1,2
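As a rough illustration of what that fusion looks like in code, here is a small PyTorch sketch: each modality gets its own encoder, and the embeddings are concatenated into a single shared representation. The input sizes and layer widths are arbitrary assumptions for the example, not taken from any published model.

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """Toy fusion of vision, tactile, and proprioceptive inputs.
    All dimensions are illustrative placeholders."""

    def __init__(self, embed_dim: int = 256):
        super().__init__()
        # One small encoder per modality.
        self.vision_enc = nn.Sequential(nn.Flatten(),
                                        nn.Linear(3 * 64 * 64, embed_dim), nn.ReLU())
        self.tactile_enc = nn.Sequential(nn.Linear(16, embed_dim), nn.ReLU())  # 16 pressure taxels
        self.proprio_enc = nn.Sequential(nn.Linear(7, embed_dim), nn.ReLU())   # 7 joint angles
        # Project the concatenated embeddings into one shared state vector.
        self.fuse = nn.Linear(3 * embed_dim, embed_dim)

    def forward(self, image, touch, joints):
        z = torch.cat([self.vision_enc(image),
                       self.tactile_enc(touch),
                       self.proprio_enc(joints)], dim=-1)
        return self.fuse(z)

# One fused "big picture" vector per timestep, used by downstream policy heads.
model = MultimodalFusion()
state = model(torch.rand(1, 3, 64, 64), torch.rand(1, 16), torch.rand(1, 7))
print(state.shape)  # torch.Size([1, 256])
```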

Unlike older models that only recognize objects from a fixed list, multimodal foundation models are trained on massive, diverse datasets. That means they can handle open-vocabulary object recognition, spotting unfamiliar items and making educated guesses about what they are.1,2 These models also help robots connect visual and language cues, so when you give a command like “pick up the black mug,” they can actually figure out what you mean and act on it. 
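Open-vocabulary recognition is often built on CLIP-style vision-language models, which score how well an image matches arbitrary text labels rather than picking from a fixed category list. The sketch below uses the Hugging Face transformers implementation of CLIP; the image path and candidate labels are made-up examples.

```python
# Open-vocabulary recognition with a CLIP-style vision-language model
# (Hugging Face transformers). Image path and labels are illustrative.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("scene.jpg")  # a frame from the robot's camera
labels = ["a black mug", "a red cup", "a roll of tape", "a screwdriver"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]

# The highest-scoring label is the model's best guess - even for objects
# that never appeared in any fixed training list of categories.
best = labels[probs.argmax().item()]
print(best, round(float(probs.max()), 3))
```

Swapping in a different set of labels requires no retraining, which is what makes the recognition "open vocabulary."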

Even more impressive, robots using these models can adjust in real time. If something shifts or slips while they’re holding it, they can feel the change and adapt, tweaking their grip or movement on the fly.
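A toy version of that feedback loop might look like the following; read_tactile() and set_grip_force() are stand-ins for a real tactile sensor and gripper driver, and the threshold and force limit are invented for the example.

```python
# Illustrative slip-recovery loop. read_tactile() and set_grip_force()
# stand in for a real tactile sensor and gripper driver.
import time

SLIP_THRESHOLD = 0.2   # assumed normalised slip signal above which we react
MAX_FORCE_N = 5.0      # assumed grip-force ceiling in newtons

def hold_object(read_tactile, set_grip_force, force: float = 1.0) -> None:
    while True:
        slip = read_tactile()                      # e.g. shear or micro-vibration estimate
        if slip > SLIP_THRESHOLD and force < MAX_FORCE_N:
            force = min(force * 1.2, MAX_FORCE_N)  # tighten gently rather than clamping hard
            set_grip_force(force)
        time.sleep(0.01)                           # roughly a 100 Hz control loop
```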

By combining vision and tactile feedback, robots can now tackle hands-on tasks like:

  • Folding laundry without tangling it
  • Assembling parts with tight tolerances
  • Using tools with just the right amount of force

This integration of vision and touch improves their ability to manipulate objects and complete simple, everyday tasks.2,3

How Robots Make Decisions and Stay in Control

Dexterity is also about deciding what to do next. For a robot, that means figuring out how to move, how much force to apply, and how to react when things don’t go as planned.

This is where foundation models play another key role: decision-making and control.

Some of the same models used to write text (like large language models, or LLMs) are now being adapted to generate robot control code. That means you can give a robot a high-level instruction, like “clean up the desk,” and it can break that down into smaller actions: identifying objects, figuring out how to grasp them, and deciding where to move them.1,2
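A simplified sketch of that decomposition step is shown below. The primitive skill names, prompt wording, and the call_llm helper are all hypothetical; any chat-completion style model could sit behind it.

```python
import json

# Hypothetical library of low-level skills the robot already knows how to run.
PRIMITIVES = ["detect(object)", "pick(object)", "place(object, location)"]

PROMPT_TEMPLATE = """You control a robot with these primitives: {skills}.
Rewrite the instruction as an ordered JSON list of primitive calls.
Instruction: {instruction}
JSON:"""

def plan_with_llm(instruction: str, call_llm) -> list[str]:
    """call_llm is a placeholder for any chat-completion style API call."""
    prompt = PROMPT_TEMPLATE.format(skills=", ".join(PRIMITIVES),
                                    instruction=instruction)
    # Expected output, e.g.: ["detect('mug')", "pick('mug')", "place('mug', 'shelf')"]
    return json.loads(call_llm(prompt))
```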

One effective approach pairs reinforcement learning with foundation models so robots can keep improving from their own experience. Because these models are pretrained on large datasets of human demonstrations and simulated interactions, they need much less task-specific data - better sample efficiency - and can transfer skills learned in one setting to new tasks or contexts, which improves their flexibility and dexterity.2,4

A major benefit here is skill transfer. Foundation models allow for:

  • Zero-shot learning: Robots can perform tasks they’ve never been trained on
  • Few-shot learning: They can learn new tasks from just a few examples

That’s a game-changer for real-world use, where tasks don’t always resemble the training data exactly. Whether it’s working in a warehouse or helping out in a home, robots need to handle unexpected situations - and now, they’re starting to do just that. Overall, these advancements contribute to improved robot performance and independence.2,4

Why Multimodal Learning Matters for Dexterity

One of the biggest reasons foundation models are improving robotic dexterity comes down to a single word: multimodality. That simply means robots aren’t relying on one sense (like vision) anymore; instead, they’re learning from a mix of visual, language, tactile, and proprioceptive data all at once.

This kind of integrated learning helps robots understand context better and make more precise movements. For example, a robot might use vision to identify an object, language to interpret an instruction, touch to feel pressure, and proprioception to know where its joints and limbs are in space - all at the same time.

What ties this all together are alignment mechanisms inside foundation models. These systems connect what a robot sees with what it means and what it should do next. So, a visual cue (like seeing a fragile glass) doesn’t just get recognized - it leads directly to a different, more careful kind of action.1,2

Researchers are also combining different types of models to build more capable systems:5

  • Vision-language models (VLMs) handle images and connect them to meaning
  • Large language models (LLMs) understand and reason with natural language
  • Reinforcement learning models help with real-time decision-making and control

Together, they form a sort of teamwork system that lets robots reason, act, and adapt in ways that look more and more human-like.5

However, applying these models to robotics poses challenges, particularly when it comes to generating quick, reliable motor commands. Recent developments such as diffusion models address this by producing smooth, continuous action outputs, which helps robots respond swiftly and maintain control and dexterity during real-world interactions.3,5
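In very rough schematic form, a diffusion-style policy samples a short chunk of continuous actions by starting from noise and repeatedly denoising it, conditioned on the current observation. The update rule and step count below are deliberately simplified; real diffusion policies use a proper noise schedule and a trained denoising network.

```python
import torch

@torch.no_grad()
def sample_action_chunk(denoiser, obs_embedding,
                        horizon: int = 16, action_dim: int = 7, steps: int = 10):
    """Schematic diffusion-style sampler: refine a chunk of continuous actions
    from pure noise, conditioned on the current observation embedding.
    `denoiser(actions, obs, t)` is assumed to predict the noise to remove."""
    actions = torch.randn(1, horizon, action_dim)    # start from random noise
    for t in reversed(range(steps)):
        predicted_noise = denoiser(actions, obs_embedding, t)
        actions = actions - predicted_noise / steps  # simplified denoising update
    return actions  # e.g. the next 16 joint-velocity targets for a 7-DoF arm
```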

So while robots still have a long way to go, this kind of multimodal, integrated learning is a major leap toward more natural, coordinated movement.

What’s Still Hard About Giving Robots Dexterity

As promising as foundation models are, there’s still a long list of challenges to work through before robots can fully match human dexterity - especially in unpredictable, real-world environments.

1. The Data Problem

One of the main issues is the lack of large-scale, diverse, and high-quality data on robot interactions. This data is essential for fine-tuning models for specific tasks, such as manipulation.1,2

To get good at manipulating objects, robots need access to huge amounts of training data, and that means more than just images or text. Instead, it means actual recordings of robot interactions. The problem is that this kind of data is expensive and slow to collect. Robots have to physically perform tasks over and over again, and recording that across lots of scenarios takes serious time and resources.

2. Safety and Stability

Another concern is ensuring safety and reliability when robots operate in unpredictable environments. Even the most intelligent model can make unusual decisions when it encounters something entirely new. That’s a big issue when you’re deploying robots in shared spaces - like hospitals, homes, or warehouses - where unpredictable behavior can be risky.

Researchers are actively working on ways to estimate uncertainty and give robots the ability to say, “I’m not sure about this.” That kind of self-awareness is key to keeping people safe around autonomous machines.2,6
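One common recipe (among several) is an ensemble check: run a few independently trained policy heads and, if they disagree too much about the next action, stop and defer to a person. The disagreement threshold and the ask_for_help hook below are illustrative assumptions.

```python
import torch

def act_or_ask(policies, observation, ask_for_help, disagreement_limit: float = 0.1):
    """Ensemble-based uncertainty check: if independently trained policies
    disagree too much on the next action, defer to a human instead of acting."""
    with torch.no_grad():
        actions = torch.stack([p(observation) for p in policies])  # (n_models, ...)
    if actions.std(dim=0).mean() > disagreement_limit:
        return ask_for_help("I'm not sure how to handle this - please take over.")
    return actions.mean(dim=0)  # models agree: act on the ensemble average
```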

3. Speed and Efficiency

Foundation models are powerful, but they’re also computationally heavy. Running them on real-world robots, particularly smaller or mobile ones, can lead to delays and high energy use. In robotics, even a tiny bit of lag can throw everything off.

That’s why there’s so much work happening around model compression and hardware acceleration to help make these models faster and more efficient without losing accuracy.5,6
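As one concrete example of compression, post-training quantization stores weights as 8-bit integers instead of 32-bit floats, shrinking the model and usually speeding up CPU inference. The tiny network below is a stand-in, not a real robot policy:

```python
import torch
import torch.nn as nn

# Stand-in policy network; a real foundation-model policy would be far larger.
policy = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 7))

# Post-training dynamic quantization: Linear weights are stored as int8,
# cutting memory use and typically speeding up inference on CPU.
quantized = torch.quantization.quantize_dynamic(policy, {nn.Linear}, dtype=torch.qint8)

obs = torch.rand(1, 512)
print(quantized(obs).shape)  # torch.Size([1, 7]) - same interface, smaller model
```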

What’s Next for Robotic Dexterity?

Even with the challenges, there’s still a lot to be excited about. Foundation models are opening the door to general-purpose robotics - in other words, robots that don’t need a new program every time their task changes.

Researchers are working on systems that bring together perception, planning, and control, all within one model, or at least one tightly connected stack of models. These systems will also understand natural language, so you can interact with them using normal instructions, not programming code.1,2

That kind of intuitive interface makes robots much more accessible for real-world use, especially in dynamic environments like homes, kitchens, or healthcare settings.

To help track progress, the robotics community is building standardized benchmarks and datasets focused on dexterity tasks. Think of these like fitness tests for robots - ways to measure how well they can handle a wide variety of actions. These tools will help researchers compare results, improve models faster, and push the field forward.

One interesting direction is combining AI-based skills with traditional robot control methods. That hybrid approach could give us the best of both worlds: the flexibility of learning-based systems, with the safety and predictability of classic robotics.1,2

Looking ahead, combining foundation models with fast communication networks like 6G could let robots share information with each other in real time. That means robots coordinating on tasks, learning from each other’s mistakes, and working together more efficiently like an actual team.5,7

Companies like Mimic Robotics are already applying this in manufacturing and logistics by training robots on huge datasets of human actions. And that’s just the beginning.

A New Era of Dexterity

Robotic dexterity has always been a tough problem, but foundation models are finally moving the needle. By combining massive amounts of multimodal data with smarter decision-making, robots are getting better at sensing, responding, and acting in real-world environments.

Instead of relying on rigid programming, today’s AI-powered robots can adapt on the fly, learn new tasks with minimal training, and even understand language instructions. That kind of flexibility is unlocking a whole new range of applications.

Of course, there’s still work to do - especially around safety, speed, and scalability. But with progress being made in areas like reinforcement learning, diffusion models, and model efficiency, the gap between human and robotic dexterity is narrowing fast.


References and Further Reading

  1. Firoozi, R. et al. (2025). Foundation models in robotics: Applications, challenges, and the future. The International Journal of Robotics Research, Vol. 44, Issue 5. DOI:10.1177/02783649241281508. https://journals.sagepub.com/doi/full/10.1177/02783649241281508
  2. Xiao, X. et al. (2025). Robot learning in the era of foundation models: A survey. Neurocomputing, 638, 129963. DOI:10.1016/j.neucom.2025.129963. https://www.sciencedirect.com/science/article/abs/pii/S0925231225006356
  3. π0: Our First Generalist Policy. (2024). Physical Intelligence (π). https://www.physicalintelligence.company/blog/pi0
  4. Mitchell, S. (2025). Mimic raises USD $16 million to scale dexterous AI robotics. IT Brief. https://itbrief.asia/story/mimic-raises-usd-16-million-to-scale-dexterous-ai-robotics
  5. Fang, B. et al. (2024). What Foundation Models can Bring for Robot Learning in Manipulation: A Survey. arXiv. DOI:10.48550/arXiv.2404.18201. https://arxiv.org/abs/2404.18201v2
  6. Hu, Y. et al. (2023). Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis. arXiv. DOI:10.48550/arXiv.2312.08782. https://ui.adsabs.harvard.edu/abs/2023arXiv231208782H/abstract
  7. Wang, G. et al. (2025). Robots Empowered by AI Foundation Models and the Opportunities for 6G. Huawei. https://www.huawei.com/en/huaweitech/future-technologies/robots-empowered-ai-foundation-models-6g



Written by

Ankit Singh

Ankit is a research scholar based in Mumbai, India, specializing in neuronal membrane biophysics. He holds a Bachelor of Science degree in Chemistry and has a keen interest in building scientific instruments. He is also passionate about content writing and can adeptly convey complex concepts. Outside of academia, Ankit enjoys sports, reading books, and exploring documentaries, and has a particular interest in credit cards and finance. He also finds relaxation and inspiration in music, especially songs and ghazals.

