New AI Model Helps Pinpoint Genetic Variants That Cause Rare Diseases

A new AI tool developed by Harvard Medical School researchers could significantly improve how we diagnose and understand rare genetic diseases.

An example output from the popEVE portal. The left and center panels show variant scores in chart and list formats, ranging from most likely to cause disease (dark purple) to least likely (yellow). The right panel depicts a protein crystal structure colored with variant scores. Image Credit: Marks Lab, Harvard Medical School

Every human genome contains tens of thousands of small genetic changes, known as variants, that affect how cells make proteins. Yet only a few of these actually cause disease. The challenge for scientists has been identifying the harmful variants hidden among the vast majority that are harmless.

To address this, a team from Harvard Medical School (HMS) and collaborators has introduced popEVE, a machine learning model that scores each variant in a person’s genome based on its likelihood of causing disease. Unlike many previous tools, popEVE places these variants on a continuous spectrum, making it easier to prioritize them for diagnosis and research.

In a study published in Nature Genetics, the researchers show that popEVE can reliably distinguish between benign and disease-causing variants and even predict whether a variant is more likely to cause death in childhood or adulthood.

The model identified over 100 previously unknown genetic alterations responsible for rare, undiagnosed conditions.

Our goal was to develop a model that ranks variants by disease severity – providing a prioritized, clinically meaningful view of a person’s genome.

Debora Marks, Study Co-Senior Author and Professor, Systems Biology, Blavatnik Institute, Harvard Medical School

The team hopes that popEVE will be used to help clinicians diagnose single-variant genetic diseases, particularly rare diseases, more quickly and accurately. The model could also be used to identify new drug targets for genetic conditions.

The tool complements efforts across the HMS community to conduct research, build AI tools, and engage in nationwide collaborations to improve the diagnosis and treatment of rare diseases.

Turning EVE into popEVE

As genomic sequencing has become more accessible, physicians now have access to vast amounts of data on their patients’ genetic variants.

But when it comes to variants whose connection to disease isn’t well understood, figuring out which ones are responsible for a patient’s condition is often slow, inefficient, and inconclusive. As a result, many individuals with rare or atypical genetic diseases go undiagnosed for years.

To help address this challenge, the Marks Lab developed a generative AI model called EVE several years ago. EVE uses deep evolutionary data from a range of species to learn which mutations tend to be conserved across biology - insights that help it predict how variants in human genes might impact protein function.

However, EVE and other existing variant prediction models have their limitations. Specifically, they can’t easily compare variants across different human genes to assess which ones pose the greatest risk to health.

That gap inspired the team to look for a better way to prioritize variants across genes, particularly to support clinicians trying to identify the root cause of a patient’s symptoms, explained Rose Orenbuch, a research fellow in the Marks Lab and lead author of the new study.

To build popEVE, the researchers added two key components to the original model: a large protein language model that learns from amino acid sequences, and human population data that reflects natural genetic variation. These additions allowed them to calibrate the model so that variant scores could be compared across genes, a critical step for real-world clinical use.
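
To make the idea of cross-gene comparison concrete, here is a toy sketch in Python. It is not popEVE's method: the gene names, the scores, and the per-gene constraint weights are all hypothetical, and multiplying a within-gene score by a gene-level weight is only a stand-in for the calibration described above. It shows how a variant that looks most damaging within its own gene can drop in a genome-wide ranking once scores are placed on a shared scale.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    name: str
    # Hypothetical score from 0 (tolerated) to 1 (damaging),
    # meaningful only relative to other variants in the same gene.
    within_gene_score: float


# Hypothetical per-gene weights standing in for how poorly the human
# population tolerates damaging changes in each gene (assumed values).
GENE_CONSTRAINT = {"GENE_A": 0.9, "GENE_B": 0.2}


def calibrated_score(variant: Variant, constraint: dict[str, float]) -> float:
    """Place a within-gene score on a shared scale (toy rule, an assumption)."""
    return variant.within_gene_score * constraint[variant.gene]


def rank_variants(variants: list[Variant], constraint: dict[str, float]) -> list[Variant]:
    """Sort variants from highest to lowest calibrated score."""
    return sorted(variants, key=lambda v: calibrated_score(v, constraint), reverse=True)


variants = [
    Variant("GENE_B", "b1", 0.95),
    Variant("GENE_A", "a1", 0.60),
    Variant("GENE_A", "a2", 0.10),
]

# Ranking on the raw within-gene score alone puts b1 first.
raw_ranked = sorted(variants, key=lambda v: v.within_gene_score, reverse=True)

# Ranking on the calibrated score moves a1 ahead of b1.
ranked = rank_variants(variants, GENE_CONSTRAINT)

print([v.name for v in raw_ranked])
print([v.name for v in ranked])
```

In this sketch, the variant in the weakly constrained gene tops the raw list but falls behind once the gene-level weight is applied, which is the kind of reordering a cross-gene calibration is meant to allow.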

By integrating both cross-species and within-species data, popEVE captures not only how a variant disrupts protein function, but also how critical that disruption might be for human biology, said Debora Marks.

Putting popEVE Through its Paces

When the researchers tested popEVE on documented genetic variants and real-world case studies, the model delivered strong results. It was able to:

  • Distinguish between pathogenic (disease-causing) and benign variants
  • Differentiate healthy individuals from patients with severe developmental disorders
  • Predict whether a variant was likely to cause death in childhood or adulthood
  • Determine whether a mutation was inherited or occurred spontaneously - even without access to parental genetic data

Crucially, popEVE showed no significant ancestry bias. It performed consistently across individuals from diverse genetic backgrounds and avoided overestimating the number of disease-causing variants.

To further assess its clinical potential, the researchers applied popEVE to a cohort of about 30,000 patients with severe developmental disorders who had not yet received a diagnosis.

These are diseases that we assumed were genetic and caused by a single variant based on their severity, but the variant hadn’t been found.

Rose Orenbuch, Study Lead Author and Research Fellow, Harvard Medical School

Using popEVE, the team was able to pinpoint a likely diagnosis in roughly one-third of these cases.

One of the most significant outcomes was the model’s identification of disease-linked variants in 123 genes that had never before been associated with developmental disorders. Since then, independent studies have confirmed that 25 of those genes are indeed disease-causing, further validating popEVE’s power to uncover hidden genetic contributors to rare diseases.

Moving popEVE into the Clinic

Marks and her team are now working to make popEVE widely available for real-world use by clinicians and researchers.

The model can already be accessed through an online portal, where users can explore variant scores through interactive visualizations. The interface includes a heat map (ranging from dark purple to yellow to represent disease severity), detailed variant lists, and a 3D protein structure colored by variant scores, offering a multi-layered view of how specific mutations might impact protein function.

The team is also collaborating with several major institutions, including the Children’s Rare Disease Collaborative at Boston Children’s Hospital, the Division of Human Genetics at the Children’s Hospital of Philadelphia, and Genomics England in partnership with the Wellcome Sanger Institute.

According to Marks, a clinician-researcher at the Centro Nacional de Análisis Genómico in Barcelona has already been using popEVE to help interpret variants in his patients, work that has led to multiple rare-disease diagnoses.

“I feel like we are a step closer to popEVE being useful in the day-to-day pipeline of trying to diagnose genetic diseases faster,” said Orenbuch.

She noted that the model holds particular promise for patients who haven’t been diagnosed through conventional genetic testing.

These are the cases where we have to look outside of the known disease genes, and popEVE has already found a lot of gene candidates.

Rose Orenbuch, Study Lead Author and Research Fellow, Harvard Medical School

While the model still needs further validation to confirm its accuracy and safety before it can be fully adopted in clinical practice, the team is optimistic about its potential. They hope popEVE will boost confidence among clinicians in using computational tools to support genetic diagnoses.

To expand its reach, the researchers are integrating popEVE scores into widely used variant and protein databases like ProtVar and UniProt. This will enable scientists around the world to compare variant scores across genes more easily and incorporate the tool into their research.

By helping identify the genetic roots of rare or complex diseases, the team believes popEVE could also lead to new drug targets and therapeutic approaches.

“We think prioritizing variants based on predicted disease severity will improve the odds of diagnosis and ultimately pave the way for better treatment and drug discovery,” said Marks.

Journal Reference:

Orenbuch, R., et al. (2025) Proteome-wide model for human disease genetics. Nature Genetics. DOI: 10.1038/s41588-025-02400-1. https://www.nature.com/articles/s41588-025-02400-1
