AI Framework from University of Sheffield and AstraZeneca Enhances Protein Design for Drug Development

A new AI-driven approach developed by researchers at the University of Sheffield, AstraZeneca, and the University of Southampton could make it easier to design proteins crucial for emerging therapies.


Published in Nature Machine Intelligence, the study introduces a machine learning framework that has demonstrated improved accuracy in inverse protein folding compared to the current leading methods.

Inverse protein folding is essential for engineering novel proteins. It involves identifying amino acid sequences—proteins’ building blocks—that will fold into a specific 3D structure, enabling targeted biological functions. This process is central to drug development, where precision protein design is needed to interact with specific molecules in the body. However, predicting how sequences fold remains a complex challenge.

To address this, scientists are increasingly turning to machine learning. By training models on large datasets of known protein structures and sequences, researchers aim to better predict which sequences will fold into stable, functional proteins.

The new framework, called MapDiff, outperformed current state-of-the-art AI methods in simulation tests, predicting with greater accuracy the amino acid sequences likely to form desired structures.
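To give a sense of what "accuracy" can mean here, one widely used measure for inverse folding is sequence recovery: the share of positions at which a predicted sequence matches the native sequence of the target structure. The short sketch below illustrates that idea only. It is not taken from the study, the article does not state which metrics the authors reported, and the function name and the two example sequences are made up for illustration.

```python
def sequence_recovery(predicted: str, native: str) -> float:
    """Fraction of positions where the predicted residue equals the native one.

    Illustrative helper (hypothetical name); not code from the MapDiff study.
    """
    if len(predicted) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    # Count position-by-position matches between the two sequences.
    matches = sum(p == n for p, n in zip(predicted, native))
    return matches / len(native)


# Made-up ten-residue example: the sequences differ at two positions.
native = "MKTAYIAKQR"
predicted = "MKTAYLAKQK"
print(f"{sequence_recovery(predicted, native):.2f}")  # prints 0.80
```

A higher recovery means the model's proposed sequence is closer to one already known to fold into the target structure, although several different sequences can fold into similar shapes, so recovery is only one indicator of quality.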

This advancement offers a promising path toward accelerating the design of proteins used in vaccines, gene therapies, and other treatments. It also complements tools like AlphaFold, which predicts 3D protein structures starting from amino acid sequences—essentially the reverse of inverse folding.

“This work represents a significant step forward in using AI to design proteins with desired structures. By learning how to generate amino acid sequences that are likely to fold into specific 3D structures, our method opens new possibilities for designing new therapeutic proteins, which can be used in various therapeutic applications. It’s exciting to see AI helping us tackle such a fundamental challenge in biology.”

Haiping Lu, Professor of Machine Learning and Study Corresponding Author, University of Sheffield

“During my PhD, I was motivated by the potential of AI to accelerate biological discovery. I’m proud that our method, MapDiff, helps design protein sequences that are more likely to fold into desired 3D structures, a key step towards advancing next-generation therapeutics,” added Peizhen Bai, Senior Machine Learning Scientist at AstraZeneca, who earned a PhD in AI at the University of Sheffield’s School of Computer Science.

The study stems from an unfunded collaboration that combines academic and industry expertise. It builds on earlier work between Sheffield and AstraZeneca, including the development of DrugBAN, an AI model that predicts whether a drug will bind to its target proteins in the body. The DrugBAN paper was among the most cited papers in Nature Machine Intelligence in 2023.

Journal Reference:

Bai, P., et al. (2025) Mask-prior-guided denoising diffusion improves inverse protein folding. Nature Machine Intelligence. doi.org/10.1038/s42256-025-01042-6
