Two teams of Princeton graduate students are making strong showings in national robotics competitions this year. The teams are combining advances in computation with those in sensing technology.
Princeton graduate students are developing a software system that can engagingly converse with humans on a variety of topics. Team members, from left, Niranjani Prasad, Ari Seff, Karan Singh and Daniel Suo work on the project in the computer science building. (Credit: David Kelly Crow for the Office of Engineering Communications)
One group is joining with teammates at the Massachusetts Institute of Technology later this month for the third annual Amazon Robotics Challenge in Nagoya, Japan. The challenge asks teams to develop a robot that can recognize various objects that it has never seen before, pick them up and pack them in a box. The team finished third overall in last year’s challenge. The second Princeton team is a finalist in Amazon’s ongoing Alexa Prize competition, which challenges teams to create software that converses naturally with people.
The Alexa team is exploring methods to understand and work with language, while the Robotics Challenge team is pushing the boundaries of computer vision and image processing.
“I’m excited to see Princeton teams leading in robotics competitions,” said Jennifer Rexford, the Gordon Y. S. Wu Professor in Engineering and chair of the Department of Computer Science. “The transition of robotics from controlled environments, like factories, to the complex human world brings tremendous opportunity to serve society, while also raising difficult technical challenges. Princeton is tackling these challenges by bringing advances in sensing and computation together to allow future robots to understand the world around them and interact safely in human society.”
A sharp eye and a delicate touch
During a recent demonstration in their lab, Shuran Song and Andy Zeng held up a small black metal-mesh basket to illustrate the challenge of creating accurate robotic vision. It was an everyday object, the kind of thing used to hold pencils on a desk. But as Zeng slowly spun the small basket in his hand, he said, “It may not seem like it, but this is actually one of the more challenging objects to handle.
“This is because the black reflective surface makes it hard for 3-D sensors to see,” Zeng explained. “Such surfaces appear often in our everyday environments, but are less addressed in recent computer vision research. By using various objects with special properties like this one, the competition forces us to tackle challenging vision problems for real-world scenarios.”
In a video the team made last year, viewers can see a robot painted Princeton orange look into a red container to identify an object using a sensor on the robotic arm. After determining it is a coffee can, the robot cranes down into the container, uses high-powered suction cups to pick up the can, then lifts its arm upward and places the coffee can onto a shelf.
Princeton’s team is led by Zeng and Song, both graduate students in computer science, and their faculty adviser is Thomas Funkhouser, the David M. Siegel Professor of Computer Science. While the Princeton students work on building an algorithm for identifying objects, their teammates from MIT are working on “manipulation,” or using the robotic arm and hand to grasp and move the objects.
This will be the second year the team has taken part in the challenge, which originally was known as the Amazon Picking Challenge. The students said that although the algorithm they built last year was accurate, the two teams that finished ahead of them built robots that worked more quickly. “Our biggest weakness was speed,” Song said.
They are working to solve that problem by installing more cameras to supplement the camera on the robotic arm. With that change, along with an improved algorithm, hopefully “we can obtain a speedup similar to the teams that performed well last year,” Zeng said.
Last year, the team was given a list of objects in advance that the robot might be asked to identify, grasp and move. But this year, the challenge is harder: Participants will have that information only a half-hour in advance. That has required them to create an algorithm that is more versatile, the students said.
“We have to adapt our algorithm to be versatile enough so that it can still recognize these new objects” with less time, Zeng said.
A better voice
Since childhood, Cyril Zhang has dreamed of simulating consciousness in a machine. Although scientists still debate whether creating a truly conscious machine is even possible, for now the doctoral student in computer science is working on something that is “perhaps as close as you can get.”
In early October, Zhang and 12 other graduate students in computer science began working on a software system designed to converse coherently and engagingly with humans on a variety of popular topics. The effort is part of Amazon’s Alexa Prize competition, which requires international teams of university students to create software that can carry on such a conversation for at least 20 minutes using Alexa, Amazon’s voice service, as a starting point. The winning team will receive a $500,000 award as well as a $1 million research grant for their department.
In November, Amazon notified the Princeton team that it was one of 12 university groups chosen to be sponsored to participate in the competition, bringing a $100,000 stipend. Since then, team members have been meeting every Thursday with their faculty adviser, Sanjeev Arora, the Charles C. Fitzmorris Professor of Computer Science, to coordinate their respective tasks.
Software that emulates human behavior is known as a socialbot. The Princeton team chose to name its socialbot “Pixie,” as the most concise combination of “Princeton” and “Alexa” that they could devise, according to team member Daniel Suo.
The team grew out of a weekly graduate student reading group on the topic of deep learning. Zhang described the team as “outsiders” since most of the students do not specialize in natural language processing research but instead have experience in other fields such as machine learning theory, deep learning, computer vision, robotics or distributed systems.
“Our simultaneous strength and weakness is that we come from a variety of research backgrounds,” Zhang said. “What that means is that I’m optimistic we can come up with something that may never have occurred to someone who has spent a long time in the natural language processing field. But at the same time, we are definitely spending a lot of effort getting oriented to techniques that researchers in the field are already completely comfortable with.”
Emulating human conversation has long been a challenge in software design. Humans communicate in ambiguous terms, and correctly interpreting words and sentences depends on context, common sense and some understanding of the world. Because computers lack such prior knowledge and rely on precision, programming computers to make sense of ambiguities is extremely difficult.
“For any particular input, the bot has to determine – is the user trying to talk about a specific topic? Is it more just general chitchat? Which sources might be needed for generating a suitable response?” said team member Ari Seff.
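The routing decision Seff describes can be pictured with a toy example. The sketch below is purely illustrative and is not the Pixie team's code: the keyword lists, the source names and the `route` function are all hypothetical, and a real socialbot would likely rely on learned models rather than keyword matching.

```python
import re

# Hypothetical topic keywords, invented for illustration only.
TOPIC_KEYWORDS = {
    "sports": {"game", "team", "score", "football", "basketball"},
    "movies": {"movie", "film", "actor", "director"},
    "science": {"robot", "physics", "space", "computer"},
}

# Hypothetical mapping from each topic to the source a reply would draw on.
TOPIC_SOURCES = {
    "sports": "sports_news_feed",
    "movies": "movie_database",
    "science": "encyclopedia",
}


def route(utterance):
    """Return (intent, topic, source) for one user utterance."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    # Count keyword overlaps per topic and keep the strongest match.
    best_topic, best_hits = None, 0
    for topic, keywords in TOPIC_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_topic, best_hits = topic, hits
    if best_topic is None:
        # No topic keyword matched: treat the input as general chitchat.
        return ("chitchat", None, "smalltalk_responses")
    return ("topic", best_topic, TOPIC_SOURCES[best_topic])


if __name__ == "__main__":
    print(route("Did you see the football game last night?"))
    print(route("How are you today?"))
```

Even this crude version shows the three questions in order: whether the input names a topic, whether it is closer to chitchat, and which source a response should come from.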
The team also faces the broader challenge of designing a coherent personality that will entertain the user and keep the conversation natural and fluid.
“The Amazon competition challenges us to think about conversation from a social perspective,” Suo said. “It would get boring to talk to a bot that just told endless one-liners or just answered fact-based questions. But what about language cues indicating how interested or bored someone is? Can we guide the conversation to a new area rather than just react to the user?”
Members of the team expressed excitement at the collaborative nature of the project and the possibility of new and disruptive ideas growing from it.
“It’s an opportunity for us to build something together, but to also learn from each other,” Suo said.
In April, entrant teams received feedback from real-life Amazon Echo users on the success of their socialbot based on the relevance, coherence, interest and speed of the conversation. The final prize winners will be announced in November.
“If we win or if we don’t win is not the point,” team member Davit Buniatyan said. “The fact is that this research is advancing the future of machine learning.”
The Pixie team members are: Oluwatosin Adewale, Alex Beatson, Davit Buniatyan, Jason Ge, Misha Khodak, Holden Lee, Niranjani Prasad, Nikunj Saunshi, Ari Seff, Karan Singh, Daniel Suo, Kiran Vodrahalli and Cyril Zhang.