Background
Assistive technologies, including robotic arms, are vital for patients with severe motor impairments to regain independence. Earlier systems often relied on physical input devices, such as joysticks, that are difficult for the target users to operate. Gaze-based controls later emerged, but they were constrained by fixed monitors, collision risks, and a lack of real-world object awareness, often requiring manual calibration or supplementary inputs such as voice commands.
This paper addresses these limitations by proposing a system that integrates augmented reality (AR), delivered through the Microsoft HoloLens 2 headset, with real-time object detection based on You Only Look Once version 8 (YOLOv8). The combination enables intuitive, gaze-only control by automatically locating objects in three-dimensional (3D) space and providing visual feedback, eliminating the need for external hardware or complex calibration.
The Study
This system brought together a mix of technologies to create a hands-free, user-friendly assistive tool. At its center was the Microsoft HoloLens 2 headset, which handled spatial mapping, tracked the user’s gaze for input, and displayed the AR interface.
For object detection, a lightweight YOLOv8n model ran on a connected PC. It identified items like cups or bottles in real time and sent the results back to the HoloLens, where bounding boxes were overlaid in the user’s view.
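As a rough illustration of this pipeline - not the authors’ code - the following Python sketch runs a YOLOv8n model on camera frames with the Ultralytics API and streams 2D bounding boxes to the headset. The UDP transport, address, and frame source are assumptions made for the example.

import json
import socket

import cv2
from ultralytics import YOLO  # pip install ultralytics

HOLOLENS_ADDR = ("192.168.0.42", 9000)   # hypothetical headset address
model = YOLO("yolov8n.pt")               # lightweight "nano" variant
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
cap = cv2.VideoCapture(0)                # stand-in for the headset's video feed

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame, verbose=False)[0]
    detections = [
        {
            "label": model.names[int(box.cls)],
            "conf": float(box.conf),
            "bbox": [float(v) for v in box.xyxy[0]],  # x1, y1, x2, y2 in pixels
        }
        for box in result.boxes
    ]
    # Send detections to the HoloLens, which overlays them as AR bounding boxes.
    sock.sendto(json.dumps(detections).encode("utf-8"), HOLOLENS_ADDR)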
To select an object, users simply looked at its bounding box for five seconds. The system then used ray casting on the spatial mesh to estimate the object’s 3D position in the real world.
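The selection logic can be sketched conceptually in Python, although the actual system runs on the HoloLens itself. The dwell accumulator and the flat surface standing in for the spatial mesh below are simplifying assumptions made purely for illustration.

import numpy as np

DWELL_SECONDS = 5.0  # gaze dwell time reported for selecting a bounding box

def update_dwell(dwell_timer, gaze_on_box, dt):
    """Accumulate gaze time on a bounding box; selection fires after DWELL_SECONDS."""
    dwell_timer = dwell_timer + dt if gaze_on_box else 0.0
    return dwell_timer, dwell_timer >= DWELL_SECONDS

def raycast_to_surface(origin, direction, plane_point, plane_normal):
    """Intersect the gaze ray with a plane (a stand-in for the spatial mesh)
    to estimate the selected object's 3D position."""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    direction /= np.linalg.norm(direction)
    plane_normal = np.asarray(plane_normal, dtype=float)
    denom = np.dot(plane_normal, direction)
    if abs(denom) < 1e-6:
        return None  # gaze ray is parallel to the surface
    t = np.dot(np.asarray(plane_point, dtype=float) - origin, plane_normal) / denom
    return None if t < 0 else origin + t * direction

# Example: gaze from head height down toward a tabletop 0.75 m above the floor.
hit = raycast_to_surface(origin=[0.0, 1.4, 0.0], direction=[0.0, -0.5, 1.0],
                         plane_point=[0.0, 0.75, 0.0], plane_normal=[0.0, 1.0, 0.0])
print(hit)  # estimated 3D point where the gaze ray meets the surface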
Identifying where the robotic arm was in the AR space was a key step. This was done by scanning a QR code placed near the arm during setup, which gave the system a fixed reference point. From there, it calculated a path for the Kinova Gen2 robotic arm to move in and grasp the selected object.
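The frame registration that the QR code provides can be illustrated with homogeneous transforms. The sketch below is not the authors’ implementation, and the numeric poses are invented for the example; it simply shows how a measured QR pose plus a known QR-to-robot-base offset map a gaze-selected point into the robot’s frame.

import numpy as np

def make_transform(R, t):
    """Build a 4x4 homogeneous transform from a rotation matrix and translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Pose of the QR code as observed in the headset (AR) frame - hypothetical values.
T_headset_qr = make_transform(np.eye(3), [0.30, -0.20, 0.80])
# Fixed offset from the QR code to the robot arm's base - hypothetical values.
T_qr_robot = make_transform(np.eye(3), [0.05, 0.00, -0.10])

# Transform mapping points from the headset frame into the robot base frame.
T_robot_headset = np.linalg.inv(T_headset_qr @ T_qr_robot)

# A gaze-selected object position in the headset frame (metres), homogeneous form.
p_headset = np.array([0.40, -0.35, 1.10, 1.0])
p_robot = T_robot_headset @ p_headset  # grasp target expressed in the robot's frame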
After grasping, an AR menu appeared, controlled entirely by gaze. Users could choose from simple options like “Bring” (to bring the object to their mouth), “Move” (to shift it out of the way), or “Cancel.”
Two experiments were run to test how well the system worked. The first checked how accurately the system could locate objects by comparing its estimated 3D coordinates against the actual positions across a grid of 64 points. The second looked at how reliably the robot could grasp objects at different distances - helping define the system’s practical limits.
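The error metrics for the first experiment can be computed along these lines; the arrays below are placeholders rather than the paper’s data, and treating the z axis as depth is an assumption.

import numpy as np

estimated = np.random.rand(64, 3)     # system-reported (x, y, z) for the 64 grid points
ground_truth = np.random.rand(64, 3)  # measured ground-truth positions

per_axis_error = np.abs(estimated - ground_truth)             # |dx|, |dy|, |dz| per point
euclidean_error = np.linalg.norm(estimated - ground_truth, axis=1)

print("mean depth error (m):     ", per_axis_error[:, 2].mean())  # z assumed to be depth
print("mean horizontal error (m):", per_axis_error[:, 0].mean())
print("mean 3D error (m):        ", euclidean_error.mean())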
Four people took part in the study (three men, one woman), each using the gaze-based interface in the same way to keep results consistent.
Altogether, this integrated approach combined advanced AR, computer vision, and robotics to deliver a hands-free assistive solution that’s both technically robust and user-friendly.
Results and Discussion
The first experiment focused on measuring how accurately the system could localize objects in 3D space. The results showed that while localization errors increased with distance, this effect was much more pronounced along the depth axis. Up to 50 centimeters (cm), depth error stayed relatively consistent - but beyond that, it rose sharply, with a noticeable jump of 5.72 cm between 60 and 70 cm.
In comparison, horizontal errors were smaller and increased more gradually. A 2D plot helped visualize this, clearly highlighting the “predominance of depth error” as the main challenge.
The second experiment tied these localization findings to real-world performance by testing how well the robotic arm could grasp objects. A key outcome was identifying a depth threshold around 1 meter (corresponding to about 61 cm in ground truth distance). Within this range, the system achieved a 100 % success rate across all tested horizontal positions. However, once objects were placed beyond that one-meter depth, the success rate dropped significantly to around 56–60 %.
Further analysis showed a clear link between failed grasps and depth errors that exceeded a critical threshold - between 9.6 and 12 cm - making it difficult for the arm to reach the target accurately.
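The kind of threshold analysis described here can be sketched as follows; the trial values and the exact cut-off are placeholders chosen within the reported 9.6-12 cm band, not the paper’s data.

CRITICAL_DEPTH_ERROR_CM = 10.0  # placeholder within the reported 9.6-12 cm band

trials = [  # (depth error in cm, grasp succeeded) - invented example values
    (3.1, True), (5.8, True), (9.9, False), (12.4, False), (7.2, True),
]

for depth_error_cm, succeeded in trials:
    predicted_fail = depth_error_cm > CRITICAL_DEPTH_ERROR_CM
    print(f"depth error {depth_error_cm:4.1f} cm -> "
          f"predicted {'fail' if predicted_fail else 'success'}, "
          f"observed {'success' if succeeded else 'fail'}")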
Compared to previous systems, this setup performed competitively - or even better - especially within its optimal operating zone. It not only delivered reliable grasping but also offered the added advantages of real-time object detection and a fully gaze-controlled AR interface.
The authors acknowledged that the system’s localization accuracy doesn't match the sub-centimeter precision of some highly calibrated methods. Still, it proved accurate enough for consistent performance within a defined one-meter workspace.
These findings help establish the system’s practical limits and confirm its viability as an assistive tool for object manipulation - while also pointing to depth accuracy as the key area for future improvement if the goal is to extend its reach beyond one meter.
Conclusion
In conclusion, this research presented a novel assistive system that controls a robotic arm using AR and eye tracking, designed for individuals with severe motor impairments. By integrating the HoloLens 2 headset with real-time object detection, it enables intuitive, hands-free operation for tasks like grasping objects.
Empirical testing demonstrated the system is highly reliable, achieving a 100 % grasp success rate within a one-meter range, though performance declines beyond this due to depth perception limits. The authors emphasize that these findings are preliminary, based on four participants, and that further studies with larger cohorts will be required to validate the generalizability of the results.
This work validates a practical and user-friendly alternative to traditional control methods, marking a significant step forward in accessible assistive technology, while also identifying AR-based depth estimation as a key target for future refinement.
Journal Reference
Hyung, J.-W., Na, W., Won, K., & Kim, D.-J. (2025). Robotic arm control by augmented reality-assisted object detection. Scientific Reports, 15(1). DOI: 10.1038/s41598-025-19514-5. https://www.nature.com/articles/s41598-025-19514-5