
Swarming Method Helps Unmanned Vehicles Accomplish Various Missions



Aug 11 2020

A reinforcement learning method developed by U.S. Army researchers will enable swarms of unmanned ground and aerial vehicles to optimally accomplish various missions while minimizing performance uncertainty.

A small unmanned Clearpath Husky robot, which was used by ARL researchers to develop a new technique to quickly teach robots novel traversal behaviors with minimal human oversight. Image Credit: U.S. Army.

Swarming is a method of operations in which multiple unmanned systems act as a cohesive unit by actively coordinating their actions.

According to Army researchers, future multi-domain battles will require swarms of dynamically coupled, coordinated heterogeneous mobile platforms to overmatch enemy capabilities and threats targeting U.S. forces.

The U.S. Army is looking to swarming technology to carry out dangerous and time-consuming tasks, said Dr Jemin George of the Army Research Laboratory (ARL) at the U.S. Army Combat Capabilities Development Command.

Finding optimal guidance policies for these swarming vehicles in real-time is a key requirement for enhancing warfighters’ tactical situational awareness, allowing the U.S. Army to dominate in a contested environment.

Dr Jemin George, Army Research Laboratory, U.S. Army Combat Capabilities Development Command

Reinforcement learning provides a way to optimally control uncertain agents to achieve multi-objective goals when an accurate model of the agent is not available. However, existing reinforcement learning methods can only be applied in a centralized manner, which requires pooling the state information of the entire swarm at a central learner.

This significantly increases the communication requirements and computational complexity, resulting in unreasonable learning times, added George.

To address this problem, George collaborated with Professor Aranya Chakrabortty of North Carolina State University and Professor He Bai of Oklahoma State University on a research effort to tackle the large-scale, multi-agent reinforcement learning problem.

The Army funded the study through the Director’s Research Award for External Collaborative Initiative (ECI), a laboratory program established to stimulate and support new and innovative research in collaboration with external partners.

The main goal of the study is to develop a theoretical foundation for data-driven optimal control of large-scale swarm networks, in which control actions are computed from low-dimensional measurement data rather than from dynamic models.

The current approach, called Hierarchical Reinforcement Learning (HRL), decomposes the global control objective into multiple hierarchies: several small group-level microscopic controls and one broad swarm-level macroscopic control.

Each hierarchy has its own learning loop with respective local and global reward functions. We were able to significantly reduce the learning time by running these learning loops in parallel.

Dr Jemin George, Army Research Laboratory, U.S. Army Combat Capabilities Development Command

According to George, online reinforcement learning control of a swarm comes down to solving a large-scale algebraic matrix Riccati equation using the input-output data of the system, or swarm.
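For readers unfamiliar with the term, the textbook continuous-time linear-quadratic form of the algebraic Riccati equation is shown below. The article does not say which formulation the team uses, so the linear dynamics and the symbols A, B, Q, R, P and K are assumptions made here for illustration only.

```latex
% Assumed setting: linear dynamics with a quadratic cost (not stated in the article)
\dot{x} = A x + B u, \qquad
J = \int_0^\infty \left( x^\top Q x + u^\top R u \right) \, dt

% Algebraic Riccati equation for the symmetric matrix P, and the resulting feedback law
A^\top P + P A - P B R^{-1} B^\top P + Q = 0, \qquad
u = -K x, \quad K = R^{-1} B^\top P
```

If the stacked state x of the whole swarm has dimension n, then P is an n-by-n symmetric matrix with n(n+1)/2 distinct unknowns, so the size of the problem grows quadratically with the size of the swarm.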

The team's initial approach to this large-scale matrix Riccati equation was to divide the swarm into multiple smaller groups and run group-level local reinforcement learning in parallel, while running a global reinforcement learning scheme on a smaller-dimensional compressed state from each group.
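To give a rough sense of why grouping helps, the sketch below counts the distinct entries of the symmetric matrix each learner would have to estimate. It is only back-of-the-envelope arithmetic under assumptions that are not in the article: a quadratic value function, equal-sized groups, and each group compressed to a single vector with the same dimension as one agent's state. All function names are hypothetical.

```python
def sym_unknowns(n: int) -> int:
    """Number of distinct entries in an n-by-n symmetric matrix."""
    return n * (n + 1) // 2


def centralized_unknowns(num_agents: int, state_dim: int) -> int:
    """Unknowns for one central learner that sees the full stacked swarm state."""
    return sym_unknowns(num_agents * state_dim)


def grouped_unknowns(num_agents: int, state_dim: int, num_groups: int) -> tuple:
    """Unknowns per local learner and for the global learner.

    Assumption (hypothetical): equal-sized groups, and each group is
    compressed to one vector of length state_dim for the global learner.
    """
    if num_agents % num_groups != 0:
        raise ValueError("num_agents must be divisible by num_groups")
    group_size = num_agents // num_groups
    local = sym_unknowns(group_size * state_dim)    # one such problem per group
    global_ = sym_unknowns(num_groups * state_dim)  # problem on the compressed state
    return local, global_


if __name__ == "__main__":
    print("centralized:", centralized_unknowns(100, 4))
    print("per group, global:", grouped_unknowns(100, 4, 10))
```

Under these assumptions, the local problems are independent of one another and can be learned at the same time, which is the parallelism the researchers describe.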

The researchers' current HRL scheme uses a decoupling mechanism that lets the team hierarchically approximate a solution to the large-scale matrix equation. It first solves the local reinforcement learning problems and then synthesizes the global control from the local controllers by solving a least-squares problem, rather than running a global reinforcement learning scheme on the aggregated state.
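The trade-off between solving small local problems and losing some optimality can be illustrated with a toy calculation. The sketch below is not the researchers' algorithm: it is model-based rather than data-driven, it covers only a block-decoupled local stage and omits the global least-squares step (which the article does not describe in detail), and it assumes single-integrator agents (x' = u) with a hypothetical ring-coupled quadratic cost. For that assumed system, the Riccati equation reduces to P squared equals Q, so each solve is a matrix square root. All names are illustrative.

```python
import numpy as np


def sym_sqrt(M):
    """Symmetric positive-semidefinite square root via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T


def ring_cost_matrix(n, coupling):
    """Hypothetical state cost Q = I + coupling * (ring-graph Laplacian), n >= 3."""
    eye = np.eye(n)
    laplacian = 2 * eye - np.roll(eye, 1, axis=1) - np.roll(eye, -1, axis=1)
    return eye + coupling * laplacian


def block_decoupled_gain(Q, groups):
    """Solve each group's small Riccati equation on its own block of Q."""
    K = np.zeros_like(Q)
    for idx in groups:
        block = np.ix_(idx, idx)
        K[block] = sym_sqrt(Q[block])  # local solution: P_g = sqrt(Q_gg), K_g = P_g
    return K


def closed_loop_cost(K, Q):
    """Cost matrix X of u = -K x for x' = u: solves K X + X K = Q + K^2.

    Assumes K is symmetric positive definite, so it can be diagonalized.
    """
    lam, V = np.linalg.eigh(K)
    C = V.T @ (Q + K @ K) @ V
    X_tilde = C / (lam[:, None] + lam[None, :])
    return V @ X_tilde @ V.T


def optimality_loss(n=12, group_size=3, coupling=0.2):
    """Relative extra cost of the block-decoupled gain versus the centralized one."""
    Q = ring_cost_matrix(n, coupling)
    P_central = sym_sqrt(Q)  # centralized solution of P^2 = Q
    groups = [list(range(s, s + group_size)) for s in range(0, n, group_size)]
    X = closed_loop_cost(block_decoupled_gain(Q, groups), Q)
    return np.trace(X) / np.trace(P_central) - 1.0


if __name__ == "__main__":
    print("relative optimality loss:", optimality_loss())
```

In this toy setting the loss is zero when the agents are uncoupled and grows with the coupling strength, which is the kind of optimality gap the reported figures refer to. The numbers it prints say nothing about the team's actual results.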

This further reduces the learning time. Experiments showed that, compared with a centralized approach, HRL reduced the learning time by as much as 80% while limiting the optimality loss to 5%.

“Our current HRL efforts will allow us to develop control policies for swarms of unmanned aerial and ground vehicles so that they can optimally accomplish different mission sets even though the individual dynamics for the swarming agents are unknown,” added George.

George said he is confident that this research will have an impact on the future battlefield, and that it was made possible by the collaboration with external partners.

The core purpose of the ARL science and technology community is to create and exploit scientific knowledge for transformational overmatch. By engaging external research through ECI and other cooperative mechanisms, we hope to conduct disruptive foundational research that will lead to Army modernization while serving as Army’s primary collaborative link to the world-wide scientific community.

Dr Jemin George, Army Research Laboratory, U.S. Army Combat Capabilities Development Command

The researchers are now working to further improve the HRL control scheme by considering the optimal grouping of agents in the swarm, to minimize communication and computation complexity while limiting the optimality gap.

The team is also investigating the use of deep recurrent neural networks to learn and predict the best grouping patterns, as well as the application of advanced methods for optimal coordination of unmanned ground and air vehicles in multi-domain operations in dense urban terrain.

Together with the ECI partners, George recently organized and chaired an invited virtual session on multi-agent reinforcement learning at the 2020 American Control Conference, where the team presented its results.

Source:

https://www.army.mil/