Machine-Learning Algorithms Can 'Predict' the Biological Language of Cancer and Neurodegenerative Diseases

Scientists have found that powerful algorithms of the kind used by Facebook, Amazon and Netflix can 'predict' the biological language of cancer and of neurodegenerative disorders such as Alzheimer’s disease.

Fluorescence microscopy image of protein condensates forming inside living cells. Image Credit: Weitz lab, Harvard University.

The researchers fed a computer language model with big data generated over many years of research to see whether artificial intelligence could make discoveries beyond those made by human researchers.

Scientists based at St John’s College, University of Cambridge, have now found that this machine-learning technology can decode the 'biological language' of cancer and of neurodegenerative disorders, including Alzheimer’s disease.

The team’s study, recently published in the journal PNAS, could in future be used to “correct the grammatical mistakes inside cells that cause disease.”

Bringing machine-learning technology into research into neurodegenerative diseases and cancer is an absolute game-changer. Ultimately, the aim will be to use artificial intelligence to develop targeted drugs to dramatically ease symptoms or to prevent dementia happening at all.

Tuomas Knowles, Study Lead Author, Professor, and Fellow, St John’s College, University of Cambridge

Whenever Facebook suggests a new friend or Netflix recommends a series to watch, the platforms are using powerful machine-learning algorithms to make highly educated guesses about what people will do next. Voice assistants such as Alexa and Siri can even recognise individual people and instantly 'talk' back to them.

Dr Kadi Liis Saar, the study’s first author and a Research Fellow at St John’s College, used similar machine-learning technology to find out exactly what goes wrong with proteins in the body to cause disease.

The human body is home to thousands and thousands of proteins and scientists don’t yet know the function of many of them. We asked a neural network-based language model to learn the language of proteins.

Kadi Liis Saar, Study First Author and Research Fellow, St John’s College, University of Cambridge

“We specifically asked the programme to learn the language of shapeshifting biomolecular condensates, droplets of proteins found in cells, that scientists really need to understand to crack the language of biological function and malfunction that cause cancer and neurodegenerative diseases like Alzheimer’s. We found it could learn, without being explicitly told, what scientists have already discovered about the language of proteins over decades of research,” added Dr Saar.

Proteins are large, complex molecules that play many crucial roles in the body. They do most of the work in cells and are needed for the structure, function and regulation of the body’s tissues and organs. Antibodies, for instance, are proteins that help defend the body.

Alzheimer’s, Parkinson’s and Huntington’s diseases are three of the best-known neurodegenerative disorders, but scientists estimate that there are several hundred.

In Alzheimer’s disease, which affects 50 million people worldwide, proteins go rogue, form clumps and kill healthy nerve cells. In a healthy brain, a quality control system efficiently disposes of these potentially harmful masses of proteins, known as aggregates.

Researchers now believe that some disordered proteins also form condensates: liquid-like droplets of proteins that lack a membrane and merge freely with one another.

Unlike irreversible protein aggregates, protein condensates can form and re-form, and they are often compared to the blobs of shapeshifting wax in lava lamps.

Protein condensates have recently attracted a lot of attention in the scientific world because they control key events in the cell such as gene expression—how our DNA is converted into proteins—and protein synthesis—how the cells make proteins.

Tuomas Knowles, Study Lead Author, Professor, and Fellow, St John’s College, University of Cambridge

Professor Knowles continued, “Any defects connected with these protein droplets can lead to diseases such as cancer. This is why bringing natural language processing technology into research into the molecular origins of protein malfunction is vital if we want to be able to correct the grammatical mistakes inside cells that cause disease.”

Dr Saar added, “We fed the algorithm all of the data held on known proteins so it could learn and predict the language of proteins in the same way these models learn about human language and how WhatsApp knows how to suggest words for you to use.

“Then we were able to ask it about the specific grammar that leads only some proteins to form condensates inside cells. It is a very challenging problem and unlocking it will help us learn the rules of the language of disease,” Dr Saar said.
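To give a rough feel for the "suggest the next word" idea Dr Saar describes, the short Python sketch below applies it to protein sequences written as strings of one-letter amino-acid codes. It is a deliberately simple toy and not the study’s method: it assumes a bigram counter that only records which residue tends to follow which, whereas the Cambridge team used a neural network-based language model. The function names and the sequences are hypothetical and invented for illustration.

```python
from collections import Counter, defaultdict

# Toy bigram "language model" over amino-acid letters (hypothetical example,
# far simpler than the neural network language model used in the study).

def train_bigram(sequences):
    """Count how often each residue is followed by each other residue."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def suggest_next(counts, residue):
    """Return the most frequent follower of `residue`, or None if never seen."""
    followers = counts.get(residue)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Made-up sequences in one-letter code; not real protein data.
toy_sequences = ["MGSSGGYGQ", "GYGQSSGYG", "MSGYGQGG"]
model = train_bigram(toy_sequences)
print(suggest_next(model, "Y"))  # suggests the residue most often seen after Y
```

Running it prints G, because in these invented sequences tyrosine (Y) is always followed by glycine (G), much as a phone keyboard suggests the word you most often type next. A neural language model instead learns far richer, longer-range patterns from very large collections of protein sequences.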

Machine-learning technology is advancing rapidly, thanks to increased computing power, the growing availability of data and technical advances that have produced more powerful algorithms.

Wider use of machine learning could transform future research into neurodegenerative diseases and cancer. Discoveries could be made beyond what researchers currently know and speculate about these diseases, and perhaps even beyond what the human brain can decipher without the help of machine learning.

“Machine learning can be free of the limitations of what researchers think are the targets for scientific exploration, and it will mean new connections will be found that we have not even conceived of yet. It is really very exciting indeed,” concluded Dr Saar.

The new network has been made freely available to scientists worldwide so that more researchers can build on it and make significant progress.

Journal Reference:

Saar, K. L., et al. (2021) Learning the molecular grammar of protein condensates from sequence determinants and embeddings. PNAS.
