Researchers at UC Berkeley have developed a robotic learning technology that lets robots imagine the future consequences of their actions, so they can figure out how to manipulate objects they have never encountered before.
In the future, this technology could help self-driving cars anticipate events on the road and lead to more intelligent robotic assistants in homes. The initial prototype, however, focuses on learning simple manual skills entirely through autonomous play.
Vestri the robot imagines how to perform tasks
UC Berkeley video by Roxanne Makasdjian and Stephen McNally
Using this technology, known as visual foresight, the robots can predict what their cameras will see if they perform a particular sequence of movements. These robotic imaginations are still relatively simple, reaching only a few seconds into the future, but they are enough for the robot to figure out how to move objects around on a table without disturbing obstacles. Crucially, the robot can learn to perform these tasks without any help from humans or prior knowledge about physics, its environment, or the objects.
This is because the visual imagination is learned entirely from scratch through unattended, unsupervised exploration, in which the robot plays with objects on a table. After this play phase, the robot builds a predictive model of the world and can use this model to manipulate new objects that it has not seen before.
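The idea of learning a predictive model purely from play can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the real system learns from raw camera images with a deep network, whereas this toy tracks a 2-D object position, uses a made-up `true_world` function as the unknown physics, and fits a linear model with least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_world(position, action):
    """Hypothetical physics, unknown to the robot: a push moves the object
    by 0.8 times the commanded displacement."""
    return position + 0.8 * action

# Play phase: try random pushes and record what happened (no labels, no goals).
positions = rng.uniform(-1.0, 1.0, size=(500, 2))
actions = rng.uniform(-1.0, 1.0, size=(500, 2))
next_positions = np.array([true_world(p, a) for p, a in zip(positions, actions)])

# Fit a linear predictive model: next = [position, action] @ weights.
inputs = np.hstack([positions, actions])  # shape (500, 4)
weights, *_ = np.linalg.lstsq(inputs, next_positions, rcond=None)  # shape (4, 2)

def predict(position, action):
    """Learned model: predict the next position from a position and an action."""
    return np.concatenate([position, action]) @ weights

# Query the model on a situation that was not part of the play data.
predicted = predict(np.array([0.5, -0.2]), np.array([1.0, 0.0]))
```

The only supervision is the recorded outcome of each random push, which the robot observes by itself; no human labels the data.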
In the same way that we can imagine how our actions will move the objects in our environment, this method can enable a robot to visualize how different behaviors will affect the world around it. This can enable intelligent planning of highly flexible skills in complex real-world situations.
Sergey Levine, Assistant Professor, Department of Electrical Engineering and Computer Sciences
The researchers demonstrated the visual foresight technology on December 5th, 2017 at the Neural Information Processing Systems conference in Long Beach, California.
At the core of this system is a deep learning technology based on convolutional recurrent video prediction, or dynamic neural advection (DNA). DNA-based models predict how the pixels in an image will move from one frame to the next, based on the robot's actions. Recent improvements to this class of models, together with greatly improved planning capabilities, have enabled robotic control based on video prediction to perform increasingly complex tasks, such as sliding toys around obstacles and repositioning multiple objects.
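A minimal sketch of the pixel-motion idea follows, under stated assumptions. In a DNA-style model, a network would output, for every pixel, a small set of weights over that pixel's neighbourhood in the previous frame, conditioned on the robot's action. Here the network is replaced by a hypothetical hand-built function, `shift_kernels`, so that only the step that moves pixels is shown.

```python
import numpy as np

def advect(frame, kernels):
    """Build the next frame by applying a separate k x k kernel at each pixel.

    frame:   (H, W) array of pixel intensities.
    kernels: (H, W, k, k) array; kernels[y, x] holds non-negative weights over
             the k x k neighbourhood of (y, x) and sums to 1.
    """
    H, W = frame.shape
    k = kernels.shape[2]
    r = k // 2
    padded = np.pad(frame, r, mode="edge")
    out = np.zeros_like(frame, dtype=float)
    for dy in range(k):
        for dx in range(k):
            # Contribution of the neighbour at offset (dy - r, dx - r).
            out += kernels[:, :, dy, dx] * padded[dy:dy + H, dx:dx + W]
    return out

def shift_kernels(H, W, k, dy, dx):
    """Hypothetical stand-in for the network output: every pixel copies the
    value located at offset (dy, dx) from itself."""
    kernels = np.zeros((H, W, k, k))
    r = k // 2
    kernels[:, :, r + dy, r + dx] = 1.0
    return kernels

frame = np.zeros((5, 5))
frame[2, 1] = 1.0  # a single bright "object" pixel

# Each output pixel reads from its left neighbour, so the object moves right.
next_frame = advect(frame, shift_kernels(5, 5, 3, dy=0, dx=-1))
```

Because each kernel sums to 1, the predicted frame is a rearrangement of pixels that already exist rather than an image generated from nothing, which is the intuition behind predicting motion instead of appearance.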
“In the past, robots have learned skills with a human supervisor helping and providing feedback. What makes this work exciting is that the robots can learn a range of visual object manipulation skills entirely on their own,” said Chelsea Finn, a doctoral student in Levine’s lab and inventor of the original DNA model.
With the new technology, a robot pushes objects on a table, then uses the learned prediction model to choose motions that will move an object to a desired location. Robots use the model, learned from raw camera observations, to teach themselves how to avoid obstacles and push objects around obstructions.
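Choosing motions with a prediction model can be sketched as a simple sampling-based planner. The article does not describe the planner itself, so this is only an illustration under stated assumptions: candidate action sequences are sampled at random, each is scored by where the model predicts the object will end up, and the best one is kept. The function `toy_predict` is a hypothetical stand-in for the learned video-prediction model and tracks a 2-D object position instead of camera images.

```python
import numpy as np

def toy_predict(position, actions):
    """Hypothetical stand-in for the learned model: the object simply moves
    by each pushing action in turn."""
    return position + actions.sum(axis=0)

def plan(position, goal, predict, horizon=5, num_samples=200, seed=0):
    """Sample candidate action sequences and return the one whose predicted
    outcome lands closest to the goal, together with its predicted error."""
    rng = np.random.default_rng(seed)
    # candidates has shape (num_samples, horizon, 2): sequences of small 2-D pushes.
    candidates = rng.uniform(-1.0, 1.0, size=(num_samples, horizon, 2))
    costs = np.array(
        [np.linalg.norm(predict(position, seq) - goal) for seq in candidates]
    )
    best = int(np.argmin(costs))
    return candidates[best], costs[best]

start = np.array([0.0, 0.0])
goal = np.array([2.0, 1.0])
actions, predicted_error = plan(start, goal, toy_predict)

# Baseline for comparison: the error if the robot does nothing at all.
do_nothing_error = np.linalg.norm(start - goal)
```

A common refinement of this kind of planner is to execute only the first action of the chosen sequence, observe the result, and plan again, so that prediction errors do not accumulate over the whole sequence.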
Humans learn object manipulation skills without any teacher through millions of interactions with a variety of objects during their lifetime. We have shown that it is possible to build a robotic system that also leverages large amounts of autonomously collected data to learn widely applicable manipulation skills, specifically object pushing skills.
Frederik Ebert, Graduate Student in Levine’s lab
Because control through video prediction relies only on observations that the robot can collect autonomously, such as camera images, the resulting method is general and broadly applicable. Unlike conventional computer vision methods, which require humans to manually label thousands or even millions of images, building video prediction models requires only unannotated video, which the robot can collect entirely on its own. Indeed, video prediction models have also been applied to datasets depicting everything from human activities to driving, with compelling results.
Children can learn about their world by playing with toys, moving them around, grasping, and so forth. Our aim with this research is to enable a robot to do the same: to learn about how the world works through autonomous interaction. The capabilities of this robot are still limited, but its skills are learned entirely automatically, and allow it to predict complex physical interactions with objects that it has never seen before by building on previously observed patterns of interaction.
The UC Berkeley team has continued to study control through video prediction, focusing on further improving video prediction and prediction-based control, and on developing more sophisticated methods by which robots can collect more focused video data for complex tasks such as picking and placing objects, manipulating soft and deformable objects such as cloth or rope, and assembly.