A team of researchers at the U.S. Army Research Laboratory and the University of Texas at Austin has developed new techniques for robots and computer programs to learn how to perform tasks by interacting with a human instructor. The research findings will be presented and published at the Association for the Advancement of Artificial Intelligence Conference in New Orleans, Louisiana, February 2 to 7.
The ARL and UT researchers considered a specific case in which a human provides real-time feedback in the form of critique. This approach was first introduced as TAMER, or Training an Agent Manually via Evaluative Reinforcement, by collaborator Dr. Peter Stone, a professor at the University of Texas at Austin, together with his former doctoral student, Brad Knox. Building on it, the ARL/UT team developed a new algorithm called Deep TAMER.
Deep TAMER is an extension of TAMER that uses deep learning, a class of machine learning algorithms loosely inspired by the brain, to give a robot the ability to learn how to perform tasks by viewing video streams during a short period of training with a human.
According to Army researcher Dr. Garrett Warnell, the team considered situations where a human teaches an agent how to behave by observing it and providing critique - for instance, "good job" or "bad job" - much as a person might train a dog to perform a trick. Warnell said the researchers extended earlier work in this field to enable this type of training for robots or computer programs that currently see the world through images, which is an important first step in designing learning agents that can operate in the real world.
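The critique-based training described above can be illustrated with a toy sketch. It is not the Deep TAMER algorithm, which according to the article applies deep learning to video input; it only shows the general idea of an agent that estimates the feedback a trainer would give and then picks the action it expects the trainer to approve of. The corridor environment, the scripted `simulated_trainer` standing in for the human, and every name and parameter below are hypothetical.

```python
ACTIONS = ["left", "right"]
N_STATES = 5  # positions 0..4 in a corridor; the hypothetical goal is position 4


def simulated_trainer(action):
    """Stand-in for the human: +1 ("good job") for moving toward the goal, -1 ("bad job") otherwise."""
    return 1.0 if action == "right" else -1.0


def step(state, action):
    """Move one position left or right, staying inside the corridor."""
    move = 1 if action == "right" else -1
    return max(0, min(N_STATES - 1, state + move))


def train(episodes=5, steps_per_episode=10, lr=0.5):
    # h[state][action] is the agent's estimate of the feedback the trainer would give.
    h = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(steps_per_episode):
            # Act greedily on predicted feedback (ties go to the first action listed).
            action = max(ACTIONS, key=lambda a: h[state][a])
            feedback = simulated_trainer(action)
            # Nudge the estimate toward the critique that was actually received.
            h[state][action] += lr * (feedback - h[state][action])
            state = step(state, action)
    return h


def greedy_policy(h):
    """Action the agent would choose in each state after training."""
    return [max(ACTIONS, key=lambda a: h[s][a]) for s in range(N_STATES)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this sketch the agent initially tries "left" in each position, receives a "bad job," and switches to "right" from then on, so the learned policy moves toward the goal everywhere. A real trainer's feedback would be delayed and noisy, which this toy version ignores.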
Many current techniques in artificial intelligence require robots to interact with their environment for extended periods of time to learn how to perform a task optimally. During this process, the agent might take actions that are not only wrong, like a robot bumping into a wall, but catastrophic, like a robot running off the side of a cliff. Warnell said help from humans will speed things up for the agents and help them avoid potential pitfalls.
As a first step, the researchers demonstrated Deep TAMER's success by using it with 15 minutes of human-provided feedback to train an agent to perform better than humans on the Atari game Bowling - a task that has proven difficult even for state-of-the-art approaches in artificial intelligence. Deep-TAMER-trained agents displayed superhuman performance, surpassing both their amateur trainers and, on average, an expert human Atari player.
In the next few years, the researchers hope to explore the applicability of their newest technique in a wider range of environments, for instance video games other than Atari Bowling and additional simulation environments, to better represent the types of agents and environments found when fielding robots in the real world.
Their research will be published in the AAAI 2018 conference proceedings.
The Army of the future will consist of Soldiers and autonomous teammates working side-by-side. While both humans and autonomous agents can be trained in advance, the team will inevitably be asked to perform tasks, for example, search and rescue or surveillance, in new environments they have not seen before. In these situations, humans are remarkably good at generalizing their training, but current artificially intelligent agents are not.
Dr. Garrett Warnell, Army Researcher
Deep TAMER is the first step in a line of research that the researchers envision will enable more effective human-autonomy teams in the Army. Ultimately, they want autonomous agents that can quickly and safely learn from their human teammates through a wide variety of teaching styles, such as demonstration, natural language instruction and critique.