The system, called LION LBD, was developed by computer scientists and cancer researchers at the University of Cambridge to help scientists search for cancer-related discoveries. It is the first literature-based discovery system intended to support cancer research. The results have been published in the journal Bioinformatics.
Skin cancer cells from a mouse show how cells attach at contact points. (Credit: NIH Image Gallery)
Cancer research attracts huge amounts of funding worldwide, and the scientific literature is now so massive that scientists are finding it hard to keep up: critical hypothesis-generating evidence is often discovered long after it is published.
Cancer is a complex group of diseases that is not fully understood and is the second-leading cause of death around the world. Cancer development involves changes in many chemical and biochemical molecules, pathways, and reactions, and cancer research is carried out across a wide range of scientific fields that are inconsistent in how they define similar concepts.
“As a cancer researcher, even if you knew what you were looking for, there are literally thousands of papers appearing every day,” said Professor Anna Korhonen, Co-Director of Cambridge’s Language Technology Lab who led the development of LION LBD in partnership with Dr Masashi Narita at Cancer Research UK Cambridge Institute and Professor Ulla Stenius at Karolinska Institutet in Sweden. “LION LBD uses AI to help scientists keep up-to-date with published discoveries in their field, but could also help them make new discoveries by combining what is already known in the literature by making connections between sources that may appear to be unrelated.”
The ‘LBD’ in LION LBD stands for Literature-Based Discovery, a concept formulated in the 1980s that aims to make new discoveries by combining pieces of information from disconnected sources. The main idea behind the original version of LBD is that concepts that are never explicitly associated in the literature may be indirectly connected through intermediate concepts.
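The indirect-connection idea can be illustrated with a minimal sketch of the classic "A–B–C" formulation: start from a concept A, collect the concepts B it co-occurs with, then look for concepts C that co-occur with some B but never directly with A. This is not LION LBD's implementation or API. It assumes each publication has already been reduced to a set of concept labels (the step LION LBD performs with natural language processing), the concept names are hypothetical, and ranking candidates by the number of distinct intermediates is just one simple scoring choice.

```python
from collections import defaultdict


def build_cooccurrence(documents):
    """Map each concept to the set of concepts it appears alongside in any document."""
    neighbours = defaultdict(set)
    for concepts in documents:
        for a in concepts:
            for b in concepts:
                if a != b:
                    neighbours[a].add(b)
    return neighbours


def open_discovery(documents, start):
    """Return concepts C linked to `start` (A) only through intermediates B.

    Each candidate C is paired with the sorted list of intermediates B that
    connect it to A; candidates are ranked by how many intermediates they have
    (an illustrative scoring assumption), then alphabetically.
    """
    neighbours = build_cooccurrence(documents)
    direct = neighbours.get(start, set())  # B concepts: directly associated with A
    links = defaultdict(set)
    for b in direct:
        for c in neighbours[b]:
            # Keep only C concepts never mentioned together with A.
            if c != start and c not in direct:
                links[c].add(b)
    return sorted(
        ((c, sorted(bs)) for c, bs in links.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )


# Hypothetical toy corpus: each document is the set of concepts it mentions.
docs = [
    {"drug_X", "pathway_P"},
    {"pathway_P", "drug_Y"},
    {"drug_X", "protein_Q"},
    {"protein_Q", "drug_Y"},
    {"protein_Q", "drug_Z"},
]

print(open_discovery(docs, "drug_X"))
# [('drug_Y', ['pathway_P', 'protein_Q']), ('drug_Z', ['protein_Q'])]
```

Here drug_Y is never mentioned with drug_X, yet it ranks first because two separate intermediates link them. Real systems replace raw co-occurrence with extracted entities and relations and use more careful association measures.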
The LION LBD system’s design allows real-time searches for indirect associations between entities in a database of tens of millions of publications, while preserving users’ ability to examine each mention in its original context.
For example, you may know that a cancer drug affects the behaviour of a certain pathway, but with LION LBD, you may find that a drug developed for a totally different disease affects the same pathway.
Anna Korhonen, Professor and Co-Director, Cambridge’s Language Technology Lab.
LION LBD is the first system created specifically for the requirements of cancer research. It has a particular focus on the molecular biology of cancer and uses advanced machine learning and natural language processing methods to detect references to the hallmarks of cancer in text. Evaluations of the system have shown its ability to identify undiscovered links and to rank relevant concepts highly among possible connections.
The system is designed using open source, open data, and open standards, and is available as a programmable API or an interactive web-based interface.
The scientists are now working on expanding the scope of LION LBD to include more concepts and relations. They are also working closely with cancer scientists to refine the technology for end users.
The system was developed as a partnership between the University of Cambridge Language Technology Lab, Cancer Research UK Cambridge Institute, and Karolinska Institutet in Sweden, and was sponsored by the Medical Research Council.