For most biomolecules, three-dimensional structure is essential to function. Scientists are therefore interested not only in the sequence of a biomolecule's building blocks but also in their spatial arrangement.
Thanks to artificial intelligence (AI), bioinformaticians can now accurately predict the three-dimensional structure of a protein from its amino acid sequence. For RNA molecules, however, this technique is still in its infancy.
In a paper published in the journal PLOS Computational Biology on July 7th, 2022, researchers from Ruhr-Universität Bochum (RUB) describe a method that uses AI to accurately predict the structure of specific RNA molecules from their nucleotide sequence.
Teams led by Vivian Brandenburg and Franz Narberhaus from the RUB Chair of Biology of Microorganisms collaborated on the project with Professor Axel Mosig from the Bioinformatics Competence Area at the Bochum Centre for Protein Diagnostics.
Cell Environment Must Be Taken into Account
“RNA is often only seen as a messenger between genomic DNA and proteins. But many RNA molecules take over cellular functions,” says Axel Mosig.
Their spatial organization is crucial for this. Complementary regions within a nucleotide sequence can pair with one another, folding the molecule into a three-dimensional structure.
“Identifying these self-similarities in an RNA sequence is like a mathematical puzzle. If the RNA were isolated and floating in aqueous solution, the model could predict the structure very accurately,” says Vivian Brandenburg of the RUB Chair of Biology of Microorganisms.
A biophysical model, with accompanying prediction methods, exists for this puzzle. However, the model does not take into account the RNA’s cellular environment, which affects the folding process.
This is where artificial intelligence comes in. From known structures, AI can learn subtle patterns that reflect the cellular environment, and it could then incorporate them into its structure predictions. To learn, however, the AI needs sufficient training data, which in practice is scarce.
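To make the "mathematical puzzle" concrete, here is a minimal Python sketch of structure prediction as a search for the best set of base pairs. It uses the classic Nussinov base-pair maximization, which is a much simpler stand-in for an energy-based biophysical model and is not the method used by the Bochum team. The function names and the `min_loop` parameter are illustrative assumptions.

```python
# Toy structure prediction: maximize the number of base pairs (Nussinov).
# Assumption: this is a simplified stand-in, not the energy model from the study.

ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """Watson-Crick pairs plus the G-U wobble pair."""
    return (a, b) in ALLOWED


def nussinov(seq, min_loop=3):
    """Return (max_pairs, dot_bracket) for an RNA sequence.

    min_loop is the minimum number of unpaired bases inside a hairpin loop.
    """
    n = len(seq)
    if n == 0:
        return 0, ""
    # best[i][j] holds the maximum number of pairs within seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: position j stays unpaired
            for k in range(i, j - min_loop):  # case: position k pairs with j
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Trace back one optimal structure in dot-bracket notation.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if can_pair(seq[k], seq[j]):
                left = best[i][k - 1] if k > i else 0
                if left + 1 + best[k + 1][j - 1] == best[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return best[0][n - 1], "".join(structure)


if __name__ == "__main__":
    pairs, fold = nussinov("GGGAAAUCCC")
    print(pairs, fold)
```

Like the biophysical model described in the article, this sketch treats the RNA as if it were isolated: nothing in it reflects the cellular environment, which is the gap the AI approach is meant to fill.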
Obtaining Training Data with a Trick
The Bochum team used a trick to overcome the shortage of training data: they started from known RNA structure motifs. Working from the energy models of these structures in a kind of reverse gear, they could generate virtually any number of nucleotide sequences that would fold into these spatial patterns.
Using this so-called inverse folding, the researchers generated pairs of nucleotide sequences and structures with which to train the AI.
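The idea of generating training data "in reverse" can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the procedure from the study: it samples sequences that are merely compatible with a target structure (paired positions receive complementary bases), whereas real inverse folding also checks candidates against an energy model so that the target is the preferred fold. The dot-bracket structure notation and all function names are illustrative choices.

```python
import random

# Base pairs allowed at paired positions (Watson-Crick plus G-U wobble).
PAIRS = [("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")]


def pair_table(structure):
    """Map each opening-bracket position to its closing partner."""
    stack, pairs = [], {}
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs[stack.pop()] = pos
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def sample_sequence(structure, rng):
    """Draw one random sequence whose paired positions are complementary."""
    seq = [rng.choice("ACGU") for _ in structure]
    for i, j in pair_table(structure).items():
        seq[i], seq[j] = rng.choice(PAIRS)
    return "".join(seq)


def make_training_pairs(structure, count, seed=0):
    """Generate (sequence, structure) examples for one target structure."""
    rng = random.Random(seed)
    return [(sample_sequence(structure, rng), structure) for _ in range(count)]


if __name__ == "__main__":
    hairpin = "((((....))))"  # hypothetical hairpin motif
    for sequence, target in make_training_pairs(hairpin, 3):
        print(sequence, target)
```

Because the sampler only needs the target structure as input, the number of examples is limited by compute rather than by the number of experimentally solved structures, which is the point of the trick.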
New Structures Reliably Predictable
The AI was then given a new task: to predict the structure of specific bacterial RNA molecules. These molecules, known as transcription terminators, serve as crucial stop signals during the transcription of genomic DNA in bacteria.
Like many other RNA molecules with important biological roles, they are often hidden in the genome and hard to distinguish from regions with other functions.
The AI correctly recognized and predicted the characteristic hairpin-like shape of the transcription terminators. The study team demonstrated this using publicly available experimental data.
Mosig concluded, “While AI approaches are now almost inevitable in the prediction of protein structures, the development of RNA structures is only just beginning.”
Brandenburg, V., et al. (2022) Inverse folding based pre-training for the reliable identification of intrinsic transcription terminators. PLOS Computational Biology. https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010240