Computer scientists at Brown University have developed a new system that helps robots follow spoken instructions more efficiently, no matter how abstract or specific those instructions are.
The development, presented recently at the Robotics: Science and Systems 2017 conference in Boston, is a step toward robots that can communicate more seamlessly with human collaborators.
The research was led by Dilip Arumugam and Siddharth Karamcheti, both undergraduates at Brown when the work was performed. Arumugam is now a Brown graduate student. They worked with graduate student Nakul Gopalan and postdoctoral researcher Lawson L.S. Wong in the lab of Stefanie Tellex, a professor of computer science at Brown.
The issue we’re addressing is language grounding, which means having a robot take natural language commands and generate behaviors that successfully complete a task. The problem is that commands can have different levels of abstraction, and that can cause a robot to plan its actions inefficiently or fail to complete the task at all.
Dilip Arumugam, Undergraduate, Brown University
For instance, picture someone in a warehouse working alongside a robotic forklift. The person might say to the robotic partner, “Grab that pallet.” That is a highly abstract command that implies a number of smaller sub-steps: lining up the lift, putting the forks underneath and hoisting the pallet up. Other common commands, however, might be more fine-grained, involving only a single action: “Tilt the forks back a little,” for example.
The researchers say these varying levels of abstraction can cause problems for existing robot language models. Most models try to identify cues from the words in a command, as well as from the sentence structure, and then infer a desired action from that language. The inference results then trigger a planning algorithm that attempts to solve the task. But if the model does not take the specificity of the instructions into account, the robot might overplan for simple instructions or underplan for more abstract instructions that involve more sub-steps. The result can be incorrect actions or an excessively long planning lag before the robot acts.
The new system adds a level of sophistication to existing models. In addition to inferring a desired task from language, it also analyzes the language to infer a distinct level of abstraction.
That allows us to couple our task inference as well as our inferred specificity level with a hierarchical planner, so we can plan at any level of abstraction. In turn, we can get dramatic speed-ups in performance when executing tasks compared to existing systems.
Dilip Arumugam, Undergraduate, Brown University
To develop their new model, the researchers used Mechanical Turk, Amazon’s crowdsourcing marketplace, and a virtual task domain called Cleanup World. The online domain consists of a robotic agent, a few color-coded rooms and an object that can be manipulated, in this case a chair that can be moved from room to room.
Mechanical Turk volunteers watched the robot agent perform a task in the Cleanup World domain, for example, moving the chair from a red room to an adjacent blue room. The volunteers were then asked to say what instructions they would have given the robot to get it to perform the task they had just watched. They were given guidance on the level of specificity their directions should have. The instructions ranged from the high-level, “Take the chair to the blue room,” to the stepwise, “Take five steps north, turn right, take two more steps, get the chair, turn left, turn left, take five steps south.” A third level of abstraction used terminology somewhere in between those two.
The researchers then used the volunteers’ spoken instructions to train their system to recognize the kinds of words used at each level of abstraction. From there, the system learned to infer not only a desired action but also the abstraction level of the command. Knowing both, the system could then trigger its hierarchical planning algorithm to solve the task at the appropriate level.
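As a rough illustration of the idea, and not the researchers' actual model, the two inferences and the hand-off to a level-specific planner might be sketched as below. Everything here is an assumption made for illustration: the hand-written cue words stand in for what the real system learned from the crowdsourced data, the names `infer` and `plan` are hypothetical, the "planners" only return a description string, and the intermediate abstraction level is left out for brevity.

```python
from dataclasses import dataclass

# Hypothetical cue words that suggest a stepwise (low-level) command.
# The system described above learned such associations from data;
# this hand-written set is only a stand-in.
LOW_LEVEL_CUES = {"north", "south", "east", "west",
                  "step", "steps", "turn", "left", "right"}

# Room colors assumed for this toy version of Cleanup World.
COLORS = {"red", "blue", "green", "yellow"}


@dataclass
class Interpretation:
    level: str  # "high" or "low" (the middle level is omitted here)
    goal: str   # toy description of the inferred task


def tokenize(command):
    """Lowercase the command and strip punctuation from each word."""
    return [w.strip(".,!?\"'") for w in command.lower().split()]


def infer(command):
    """Infer both an abstraction level and a toy task from one command."""
    words = tokenize(command)
    cue_count = sum(w in LOW_LEVEL_CUES for w in words)
    # Assumption: two or more stepwise cues mark a low-level command.
    level = "low" if cue_count >= 2 else "high"
    if level == "high":
        color = next((w for w in words if w in COLORS), "unknown")
        goal = f"chair_in_{color}_room"
    else:
        goal = "follow_listed_steps"
    return Interpretation(level, goal)


def plan(interp):
    """Dispatch to the (placeholder) planner for the inferred level."""
    planners = {
        "high": lambda g: f"plan over rooms toward {g}",
        "low": lambda g: f"plan over single moves: {g}",
    }
    return planners[interp.level](interp.goal)


if __name__ == "__main__":
    for cmd in ("Take the chair to the blue room.",
                "Take five steps north, turn right, take two more steps."):
        print(cmd, "->", plan(infer(cmd)))
```

The point of the sketch is only the structure: the command yields a task and a specificity level together, and the level selects where in the planning hierarchy the search starts.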
The researchers tested their trained system both in the virtual Cleanup World and with an actual Roomba-like robot operating in a physical world similar to the Cleanup World space. They showed that when a robot was able to infer both the task and the specificity of the instructions, it responded to commands within one second 90% of the time. In comparison, when no level of specificity was inferred, half of all tasks needed 20 or more seconds of planning time.
We ultimately want to see robots that are helpful partners in our homes and workplaces. This work is a step toward the goal of enabling people to communicate with robots in much the same way that we communicate with each other.
Stefanie Tellex, a Professor of Computer Science at Brown, who specializes in human-robot collaboration
The National Science Foundation (IIS-1637614), DARPA (W911NF-15-1-0503), NASA (NNX16AR61G) and the Croucher Foundation supported the work.