New Algorithm Based on Machine Learning Precisely Detects Methylation Sites

Using machine learning, researchers from the New Jersey Institute of Technology and Children’s Hospital of Philadelphia have created a new algorithm that helps estimate the locations of DNA methylation.


DNA methylation is a biological process that can alter DNA activity without changing the underlying DNA sequence. Studying it could reveal disease-causing mechanisms that traditional screening techniques would otherwise overlook.

The study was recently published online in the journal Nature Machine Intelligence.

DNA methylation is an important regulator of gene expression and plays a crucial role in several major cellular processes. Accordingly, defects in methylation can be associated with a range of human disorders.

Although genomic sequencing tools can effectively pinpoint polymorphisms that may lead to disease, the same techniques cannot capture the effects of methylation, because methylation leaves the DNA sequence that these tools read unchanged.

In particular, considerable effort has gone into analyzing DNA methylation on N6-adenine, or 6mA, in eukaryotic cells, a group that includes human cells. However, despite the genomic information available, the function of this methylation in such cells remains a mystery.

Previously, methods that had been developed to identify these methylation sites in the genome were very conservative and could only look at certain nucleotide lengths at a given time, so a large number of methylation sites were missed.

Hakon Hakonarson, MD, PhD, Study Senior Co-Author and Director of the Center for Applied Genomics, Children’s Hospital of Philadelphia

Dr Hakonarson continued, “We needed to develop a better way of identifying and predicting methylation sites with a tool that could identify these motifs throughout the genome that may have a robust functional impact and are potentially disease causing.”

To tackle this problem, the Center for Applied Genomics and its collaborators at the New Jersey Institute of Technology turned to deep learning.

Zhi Wei, PhD, a professor of computer science at the New Jersey Institute of Technology and the study’s senior co-author, worked with Hakonarson and his group to design a deep learning algorithm that could predict where these methylation sites occur. Such insights would then help scientists determine what effect the sites could have on specific nearby genes.

Wei named the software Deep6mA. To predict the locations of these methylation sites, Wei led the development of a neural network, a machine learning model loosely inspired by the way the brain learns.
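
To give a rough sense of what "learning" means here, the sketch below trains a single artificial neuron, the basic unit of a neural network, to classify short one-hot-encoded DNA windows centered on an adenine. It is a hypothetical toy in plain Python: the data, labels, encoding, and hyperparameters are invented for illustration. Deep6mA itself is a larger network built with TensorFlow and Keras, and its architecture is not reproduced here.

```python
import math
import random

BASES = "ACGT"

def one_hot(seq):
    """Flatten a DNA string into 4 * len(seq) indicator values (A, C, G, T order)."""
    vec = []
    for base in seq:
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(examples, epochs=200, lr=0.5, seed=0):
    """Fit one logistic neuron to (sequence, label) pairs with stochastic gradient descent."""
    rng = random.Random(seed)
    n_features = 4 * len(examples[0][0])
    weights = [rng.uniform(-0.01, 0.01) for _ in range(n_features)]
    bias = 0.0
    for _ in range(epochs):
        for seq, label in examples:
            x = one_hot(seq)
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - label  # gradient of the log-loss with respect to the neuron's input
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def predict(weights, bias, seq):
    """Score in (0, 1) that the central adenine is a (toy) methylation site."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, one_hot(seq))) + bias)

# Hypothetical toy labels: 5-nt windows centered on an A, labeled 1 when a T follows it.
TOY_DATA = [
    ("GCATC", 1), ("TTATG", 1), ("CGATA", 1), ("AAATT", 1),
    ("GCAGC", 0), ("TTAGG", 0), ("CGAGA", 0), ("AAAGT", 0),
]

weights, bias = train_neuron(TOY_DATA)
```

After training, `predict(weights, bias, "CCATC")` should score above 0.5 and `predict(weights, bias, "CCAGC")` below it, because the neuron has learned to weight the base that follows the central adenine. A real model would learn from experimentally labeled sites rather than a hand-made rule.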

Although neural networks have been used in cellular research before, this is the first time one has been applied to analyze DNA methylation sites in natural multicellular organisms.

Wei highlighted four benefits of the new technique: it combines a broad spectrum of methylation sequences flanking target genes; it automates the representation of sequence features at varying levels of detail; it facilitates model development and prediction on large-scale genomic data; and it makes it possible to visualize intrinsic sequence motifs for interpretation.
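
The points about automated feature representation and motif visualization can be illustrated with the core operation of a convolutional layer, a common building block of sequence models that is assumed here for illustration: a small weight matrix (a filter) slides along a one-hot-encoded sequence and scores every position. In the sketch below the filter is hand-built to respond to GATC, the motif whose adenine is methylated by the Dam methyltransferase in E. coli; in a trained network such weights are learned from data. Listing a filter's strongest-matching subsequences is one simple way to see which motif it has picked up. All function names are hypothetical.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]

def conv1d(encoded, filt, bias=0.0):
    """Valid 1D convolution (cross-correlation, as in deep learning libraries) with ReLU."""
    k = len(filt)
    scores = []
    for i in range(len(encoded) - k + 1):
        s = bias
        for j in range(k):
            s += sum(w * x for w, x in zip(filt[j], encoded[i + j]))
        scores.append(max(0.0, s))  # ReLU activation
    return scores

def motif_filter(motif, match=1.0, mismatch=-1.0):
    """Hand-built filter rewarding one motif; a trained model would learn these weights."""
    return [[match if b == base else mismatch for b in BASES] for base in motif]

def top_hits(seq, filt, bias=0.0, n=3):
    """Return the n highest-activating subsequences, a simple way to inspect a filter."""
    k = len(filt)
    scores = conv1d(one_hot(seq), filt, bias)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(seq[i:i + k], scores[i]) for i in ranked[:n]]

if __name__ == "__main__":
    filt = motif_filter("GATC")
    print(top_hits("TTGATCAAGTTCGATC", filt, bias=-2.0, n=2))  # [('GATC', 2.0), ('GATC', 2.0)]
```

The negative bias means a window must match at least three of the four positions before the ReLU lets any signal through, so only close matches to the motif are reported.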

The researchers applied the algorithm to three representative organisms: Escherichia coli, Arabidopsis thaliana, and Drosophila melanogaster, the last two of which are eukaryotes. Deep6mA detected 6mA methylation sites at single-nucleotide resolution, that is, down to an individual DNA base.
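
Predicting at single-nucleotide resolution means scoring each individual adenine rather than a whole region. A minimal sketch of how candidate inputs might be prepared follows: every A in a sequence becomes one candidate, represented by a fixed-width window centered on it. The default flank of 20 bases (41-nucleotide windows), the "N" padding, and the `gatc_rule` stand-in scorer are assumptions for illustration, not details confirmed by the article.

```python
def candidate_windows(seq, flank=20, pad="N"):
    """Yield (position, window) for every adenine, with `flank` bases on each side.

    Positions near the ends are padded with `pad`, so every window has length
    2 * flank + 1 and the candidate adenine always sits at index `flank`.
    """
    seq = seq.upper()
    padded = pad * flank + seq + pad * flank
    for pos, base in enumerate(seq):
        if base == "A":
            # seq[pos] is padded[pos + flank], so the centered window starts at padded[pos].
            yield pos, padded[pos:pos + 2 * flank + 1]

def predict_sites(seq, score_fn, threshold=0.5, flank=20):
    """Score every candidate adenine and keep positions at or above `threshold`.

    `score_fn` stands in for a trained model; it is a hypothetical placeholder here.
    """
    return [pos for pos, window in candidate_windows(seq, flank)
            if score_fn(window) >= threshold]

def gatc_rule(window, flank):
    """Toy stand-in scorer: 1.0 if the central A sits inside a GATC motif, else 0.0."""
    return 1.0 if window[flank - 1:flank + 3] == "GATC" else 0.0

if __name__ == "__main__":
    seq = "TTGATCAAGTTCGATC"
    print(predict_sites(seq, lambda w: gatc_rule(w, 3), flank=3))  # [3, 13]
```

Because every adenine gets its own window and its own score, the output is a list of exact base positions rather than a region, which is what single-nucleotide resolution refers to.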

Even in this preliminary validation analysis, the team observed regulatory patterns that could not be visualized with previously prevalent techniques.

“One limitation is that our proposed prediction is purely based on sequence information. Whether a candidate is a 6mA site or not will also depend on many other factors,” Wei said in discussing the research.

Methylation, including 6mA, is a dynamic process, which will change with cellular context. In the future, we would like to take other factors into consideration [such as] gene expression. We hope to predict 6mA across cellular context by integrating other data.

Zhi Wei, PhD, Study Senior Co-Author and Professor of Computer Science, New Jersey Institute of Technology

“We already know that a number of genes have a disease-causing mechanism brought about by methylation, and while this study was not done in human cells, the eukaryotic cell models were very comparable,” added Dr Hakonarson.

Genomic scientists looking to translate their findings into clinical applications would find this tool very useful, and the level of precision could eventually lead to the discovery of specific cells or targets that are candidates for therapeutic intervention.

Hakon Hakonarson, MD, PhD, Study Senior Co-Author and Director of the Center for Applied Genomics, Children’s Hospital of Philadelphia

The research was funded by the Children’s Hospital of Philadelphia Endowed Chair in Genomic Research and an Institutional Development Award to the Center for Applied Genomics from Children’s Hospital of Philadelphia.

The study was also supported by the Extreme Science and Engineering Discovery Environment (XSEDE) through allocations CIE160021 and CIE170034, with XSEDE funded by National Science Foundation grant ACI-1548562.

The open-source software used in the study included TensorFlow 1.12, Keras v2+, and the Python 3 programming language.

Journal Reference:

Tan, F., et al. (2020) Elucidation of DNA methylation on N6-adenine with deep learning. Nature Machine Intelligence.

