Enhancing Surgical Robot Precision With Advanced Datasets

A recent paper published in the journal Applied Sciences introduced a novel dataset called Robotic Surgical Maneuvers (ROSMA), alongside a methodology for instrument detection and gesture segmentation in robotic surgical tasks. The dataset contains kinematic and video data from common surgical training tasks performed with the da Vinci Research Kit (dVRK).

ROSMA Dataset Redefines Instrument Detection and Gesture Segmentation
Patient-side and surgeon-side kinematics. The kinematics of each PSM are defined with respect to the common frame ECM, while the MTMs are described with respect to frame HRSV. Image Credit: https://www.mdpi.com/2076-3417/14/9/3701

The researchers manually annotated the dataset to detect instruments and segment gestures. Additionally, they introduced a neural network model that combines YOLOv4 (You Only Look Once, version 4) and bi-directional Long Short-Term Memory (LSTM) networks to validate their annotations.


Surgical data science is a new and emerging field focused on enhancing the quality and efficiency of surgical procedures by analyzing data from surgical robots, sensors, and images. A primary challenge in this field is understanding the surgical scene and the activities of both surgeons and robotic instruments.

Addressing this challenge holds promise for various applications, including virtual coaching, skill assessment, task recognition, and automation. However, the scarcity of large, annotated datasets tailored for surgical data analysis remains a significant obstacle. Most existing datasets focus on specific tasks or scenarios, limiting their applicability to broader contexts.

About the Research

In this study, the authors developed the ROSMA dataset, a comprehensive resource tailored for surgical data analysis. This dataset includes 206 trials across three common surgical training tasks—post and sleeve, pea on a peg, and wire chaser—designed to evaluate skills crucial for laparoscopic surgery, such as hand-eye coordination, bimanual dexterity, depth perception, and precision.

The tasks were performed by 10 subjects with varied expertise levels using the dVRK platform. The dataset captures 154-dimensional kinematic data at a frequency of 50 Hz, and video data at 15 fps with a resolution of 1024 × 768 pixels. It also includes annotations for task evaluation based on time and errors.
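Because the kinematic and video streams are recorded at different rates, any joint analysis needs a rule for pairing each video frame with a kinematic sample. The sketch below shows one simple option, nearest-sample matching. It assumes that both streams start at the same instant and run at exactly their nominal rates; that assumption and the function name are illustrative and are not taken from the paper.

```python
KINEMATIC_HZ = 50      # kinematic sampling rate reported for ROSMA
VIDEO_FPS = 15         # video frame rate reported for ROSMA
KINEMATIC_DIMS = 154   # kinematic values per sample reported for ROSMA


def nearest_kinematic_index(frame_idx: int,
                            video_fps: int = VIDEO_FPS,
                            kinematic_hz: int = KINEMATIC_HZ) -> int:
    """Return the index of the kinematic sample closest in time to a frame.

    Hypothetical helper: assumes both streams start at t = 0 with no drift.
    Integer arithmetic is used so the rounding is exact.
    """
    if frame_idx < 0:
        raise ValueError("frame_idx must be non-negative")
    # Equivalent to rounding frame_idx * kinematic_hz / video_fps
    # to the nearest integer.
    return (2 * frame_idx * kinematic_hz + video_fps) // (2 * video_fps)


# Kinematic values recorded per second at the nominal rate
values_per_second = KINEMATIC_HZ * KINEMATIC_DIMS
```

Under these assumptions, frames 0, 1, 2, and 3 map to kinematic samples 0, 3, 7, and 10, and each second of recording yields 7,700 kinematic values alongside 15 images.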

Building on the ROSMA dataset, the study introduced two annotated subsets: ROSMAT24, comprising 24 videos with bounding box annotations for instrument detection, and ROSMAG40, comprising 40 videos with both high-level and low-level gesture annotations.

The researchers proposed an annotation methodology that assigns independent labels to right-handed and left-handed tools, which can facilitate the identification of the main and supporting tools across various scenarios. Furthermore, they defined two gesture classes: maneuver descriptors, delineating high-level actions common to all surgical tasks, and fine-grain descriptors, detailing low-level actions specific to the ROSMA training tasks.
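To make the idea of independent left and right tool labels concrete, the sketch below stores one bounding box per tool and writes it in the normalized text format commonly used to train YOLO-family detectors (class id, box center, width, and height, each divided by the image size). The class ids, label names, and the choice of this file format are assumptions for illustration; the article does not state how the ROSMAT24 annotations are stored.

```python
from dataclasses import dataclass

IMAGE_WIDTH = 1024   # ROSMA video width in pixels
IMAGE_HEIGHT = 768   # ROSMA video height in pixels

# Hypothetical class ids; the dataset's actual label names may differ.
CLASS_IDS = {"left_tool": 0, "right_tool": 1}


@dataclass(frozen=True)
class ToolBox:
    """Pixel-space bounding box for one instrument in one frame."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def to_yolo_line(box: ToolBox,
                 width: int = IMAGE_WIDTH,
                 height: int = IMAGE_HEIGHT) -> str:
    """Format a box as 'class x_center y_center w h', normalized to [0, 1]."""
    x_center = (box.x_min + box.x_max) / 2 / width
    y_center = (box.y_min + box.y_max) / 2 / height
    box_w = (box.x_max - box.x_min) / width
    box_h = (box.y_max - box.y_min) / height
    return (f"{CLASS_IDS[box.label]} "
            f"{x_center:.6f} {y_center:.6f} {box_w:.6f} {box_h:.6f}")
```

Keeping the hand in the label, rather than a single generic "tool" class, means that a downstream system can tell which instrument is which without any extra tracking logic.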

To validate the annotation approach, the paper introduced a neural network model merging a YOLOv4 network for instrument detection with a bi-directional LSTM network for gesture segmentation. The authors assessed their model across two experimental scenarios: one mimicking the task and tool configuration of the training set and another with a different configuration.
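A sequence model used for gesture segmentation typically emits one label per time step, which then has to be collapsed into contiguous segments. The sketch below shows that post-processing step only; it is not the authors' network, and the label strings in the usage note are placeholders rather than the gesture names defined in the paper.

```python
from itertools import groupby
from typing import List, Tuple


def labels_to_segments(frame_labels: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse per-frame labels into (label, first_frame, last_frame) runs."""
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive frames in the run
        segments.append((label, start, start + length - 1))
        start += length
    return segments
```

For example, the placeholder sequence `["A", "A", "B", "B", "B", "A"]` becomes three segments: `("A", 0, 1)`, `("B", 2, 4)`, and `("A", 5, 5)`.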

Additionally, they benchmarked their model against other state-of-the-art methods, reporting performance metrics such as mean average precision, accuracy, recall, and F1-score.
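For readers unfamiliar with these metrics, the sketch below computes accuracy and per-class precision, recall, and F1-score from two equal-length label sequences using their standard definitions. The function names are illustrative, and how the authors aggregated the metrics across classes is not covered here.

```python
from typing import Dict, List


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of time steps whose predicted label matches the annotation."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true: List[str],
                      y_pred: List[str]) -> Dict[str, Dict[str, float]]:
    """Precision, recall, and F1-score for every label seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("sequences must be of equal length")
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report
```

F1-score is the harmonic mean of precision and recall, so it stays low unless both are high, which makes it more informative than accuracy when some gestures are much rarer than others.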

Research Findings

The results indicated that the model achieved high accuracy and generalized well in both instrument detection and gesture segmentation. For instrument detection, the methodology achieved a mean average precision of 98.5% under the same task and tool configuration as the training set and 97.8% under a different configuration. For gesture segmentation, the model reached an accuracy of 77.35% for maneuver descriptors and 75.16% for fine-grain descriptors.
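Mean average precision for object detection is built on intersection over union (IoU), the overlap ratio that decides whether a predicted box counts as a match for an annotated one. The sketch below computes IoU for two axis-aligned boxes; the 0.5 matching threshold is a common convention used here as an assumption, since the article does not state the threshold the authors applied.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """Treat a prediction as correct when IoU meets the assumed threshold."""
    return iou(pred, truth) >= threshold
```

Two 2 × 2 boxes offset by one pixel in each direction overlap in a 1 × 1 square, giving an IoU of 1/7, which would not count as a match at the assumed threshold.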

Furthermore, the model surpassed alternative methods such as faster region-based convolutional neural networks (Faster R-CNN), residual networks (ResNet), and LSTM. The study attributed the model's success to the utilization of YOLOv4, renowned for its speed and accuracy in object detection, and the incorporation of bidirectional LSTM, which effectively captures temporal dependencies and contextual variation within sequential data.


The authors explored the potential implications of their dataset and methodology for surgical data science. They proposed that the dataset could serve as a valuable resource for developing and evaluating new algorithms and techniques in surgical scene understanding, task recognition, skill assessment, and automation.

Additionally, they suggested that their methodology could extend beyond surgical tasks to other domains, including industrial robotics, human-robot interaction, and activity recognition. They also emphasized the benefits of their annotation method, which can provide useful information for supervisory or autonomous systems, such as the role and state of each tool in the surgical scene.


In summary, the dataset and methodology proved effective for instrument detection and gesture segmentation in robotic surgical tasks. The researchers reported high accuracy and good generalization for both tasks, surpassing the other methods they compared against.

Moving forward, they proposed directions for future work, including expanding the dataset with additional tasks and scenarios, integrating image data and multimodal fusion techniques, and investigating the utilization of attention mechanisms and graph neural networks for gesture segmentation.

Journal Reference

Rivas-Blanco, I.; López-Casado, C.; Herrera-López, J.M.; Cabrera-Villa, J.; Pérez-del-Pulgar, C.J. Instrument Detection and Descriptive Gesture Segmentation on a Robotic Surgical Maneuvers Dataset. Appl. Sci. 2024, 14, 3701. https://doi.org/10.3390/app14093701, https://www.mdpi.com/2076-3417/14/9/3701.


Article Revisions

  • May 17 2024 - Title changed from "ROSMA Dataset Enhances Robotic Surgery Analysis" to "Enhancing Surgical Robot Precision With Advanced Dataset"

Written by

Muhammad Osama

Muhammad Osama is a full-time data analytics consultant and freelance technical writer based in Delhi, India. He specializes in transforming complex technical concepts into accessible content. He has a Bachelor of Technology in Mechanical Engineering with specialization in AI & Robotics from Galgotias University, India, and he has extensive experience in technical content writing, data science and analytics, and artificial intelligence.


Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Osama, Muhammad. (2024, May 17). Enhancing Surgical Robot Precision With Advanced Datasets. AZoRobotics. Retrieved on June 20, 2024 from https://www.azorobotics.com/News.aspx?newsID=14882.

  • MLA

    Osama, Muhammad. "Enhancing Surgical Robot Precision With Advanced Datasets". AZoRobotics. 20 June 2024. <https://www.azorobotics.com/News.aspx?newsID=14882>.

  • Chicago

    Osama, Muhammad. "Enhancing Surgical Robot Precision With Advanced Datasets". AZoRobotics. https://www.azorobotics.com/News.aspx?newsID=14882. (accessed June 20, 2024).

  • Harvard

    Osama, Muhammad. 2024. Enhancing Surgical Robot Precision With Advanced Datasets. AZoRobotics, viewed 20 June 2024, https://www.azorobotics.com/News.aspx?newsID=14882.
