A team led by MDC bioinformatician Altuna Akalin reports in the journal Genome Biology that a new machine learning algorithm called “ikarus” can identify how cancer cells differ from healthy cells. The algorithm has found a gene signature that is characteristic of tumors.
Humans are no match for artificial intelligence (AI) when it comes to recognizing patterns in mountains of data. Machine learning, a form of AI, is frequently used to detect regularities in data sets, whether for stock market analysis, image and speech recognition, or cell classification.
A team led by Dr. Altuna Akalin, head of the Bioinformatics and Omics Data Science Platform at the Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), has developed a machine learning program dubbed “ikarus” that can reliably distinguish cancer cells from healthy cells.
The algorithm discovered a pattern in tumor cells that is common to several types of cancer and consists of a characteristic set of genes. According to the team’s paper in Genome Biology, the pattern also includes types of genes that had never before been clearly linked to cancer.
Machine learning is the process through which an algorithm learns to solve problems on its own by using training data. It does this by looking for patterns in the data that help it solve the problem. After the training phase, the system can apply what it has learned to assess unknown data.
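The train-then-apply cycle described above can be sketched in a few lines of Python. This is only an illustration of the general idea, using a simple nearest-centroid classifier. It is not the method used by ikarus, and the function names, feature values, and labels are all made up for the example.

```python
from statistics import mean

def train_centroids(samples, labels):
    """Training phase: learn one mean vector (centroid) per label."""
    centroids = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, sample):
    """Apply what was learned: pick the label with the closest centroid."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(sample, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

# Toy training data: two invented features per cell (not real measurements).
train_x = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
train_y = ["healthy", "healthy", "cancerous", "cancerous"]

model = train_centroids(train_x, train_y)

# Unknown data the model has never seen during training.
prediction = predict(model, [0.85, 0.75])
```

Here the "pattern" the model learns is simply the average position of each class, and an unseen cell receives the label of the nearest average.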
It was a major challenge to get suitable training data where experts had already distinguished clearly between ‘healthy’ and ‘cancerous’ cells.
Jan Dohmen, Paper First Author, Max Delbrück Center for Molecular Medicine
A Surprisingly High Success Rate
Furthermore, single-cell sequencing data is frequently noisy. This means that the information it contains about the molecular properties of individual cells is not always precise, either because a different number of genes is detected in each cell or because the samples are not always processed in the same way.
To obtain suitable data sets, Dohmen and his colleague Dr. Vedran Franke, co-head of the study, searched through countless publications and visited a number of research groups. The scientists eventually trained the system with data from lung and colorectal cancer cells before applying it to data sets from other types of cancer.
During the training phase, ikarus had to find a list of distinguishing genes, which it then used to categorize the cells.
Dohmen added, “We tried out and refined various approaches.”
According to all three scientists, it was a time-consuming task.
The key was for ikarus to ultimately use two lists: one for cancer genes and one for genes from other cells.
Dr. Vedran Franke, Study Co-head, Max Delbrück Center for Molecular Medicine
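The two-list idea can be illustrated with a minimal Python sketch. It assumes a deliberately simple rule: score each cell by its average expression over each gene list and pick the higher score. The article does not describe how ikarus actually scores or classifies cells, so this rule, the function names, and the placeholder gene names (GENE_A to GENE_D) are all hypothetical.

```python
def signature_score(expression, gene_list):
    """Average expression over a gene list; genes not detected in the cell count as 0."""
    if not gene_list:
        raise ValueError("gene_list must not be empty")
    return sum(expression.get(gene, 0.0) for gene in gene_list) / len(gene_list)

def classify_cell(expression, tumor_genes, normal_genes):
    """Label a cell by whichever of the two signatures scores higher."""
    tumor = signature_score(expression, tumor_genes)
    normal = signature_score(expression, normal_genes)
    return "cancerous" if tumor > normal else "healthy"

# Placeholder gene lists (hypothetical names, not the published signature).
TUMOR_GENES = ["GENE_A", "GENE_B"]
NORMAL_GENES = ["GENE_C", "GENE_D"]

# Invented expression values for two cells; missing genes were not detected.
cell_1 = {"GENE_A": 5.0, "GENE_B": 3.0, "GENE_C": 1.0}
cell_2 = {"GENE_A": 0.5, "GENE_C": 4.0, "GENE_D": 2.0}

label_1 = classify_cell(cell_1, TUMOR_GENES, NORMAL_GENES)
label_2 = classify_cell(cell_2, TUMOR_GENES, NORMAL_GENES)
```

Comparing two scores rather than thresholding a single one means that a cell with low overall expression is still judged by which signature dominates, which is one plausible reason a second list could help with noisy data.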
After the training phase, the system was able to correctly distinguish between healthy and malignant cells in additional types of cancer, such as tissue samples from individuals with liver cancer or neuroblastoma. Its success rate was unusually high, which surprised even the research team.
We didn’t expect there to be a common signature that so precisely defined the tumor cells of different kinds of cancer.
Dr. Altuna Akalin, Head, Bioinformatics and Omics Data Science Platform, Max Delbrück Center for Molecular Medicine
Dohmen further commented, “But we still can’t say if the method works for all kinds of cancer.”
AI As a Fully Automated Diagnostic Tool
The project’s goal is to move beyond categorizing cells as “healthy” or “cancerous.” In preliminary experiments, ikarus has already shown that the approach can also distinguish other types (and specific subtypes) of cells from tumor cells.
“We want to make the approach more comprehensive, developing it further so that it can distinguish between all possible cell types in a biopsy,” added Akalin.
Pathologists in hospitals typically analyze tumor tissue samples only under a microscope to identify the various cell types. It is an exhausting and time-consuming task. With ikarus, this stage could become completely automated in the future.
Akalin further points out that the data could be used to draw conclusions about the tumor’s immediate environment.
As a result, doctors might be able to identify the optimal treatment option, as the composition of the malignant cells and the microenvironment often determines whether or not a treatment or drug will be effective. Furthermore, AI may prove useful in the development of new drugs.
Akalin added, “Ikarus lets us identify genes that are potential drivers of cancer.”
These molecular structures could then be targeted with new drugs.
A notable aspect of the publication is that it was produced entirely during the COVID outbreak. At the Berlin Institute for Medical Systems Biology (BIMSB), which is part of the MDC, none of the participants were at their usual desks. Instead, they worked from home offices and communicated with one another only digitally.
Franke stated, “The project shows that a digital structure can be created to facilitate scientific work under these conditions.”
Dohmen, J., et al. (2022) Identifying tumor cells at the single-cell level using machine learning. Genome Biology. doi:10.1186/s13059-022-02683-1.