All AI techniques rely on thorough validation, and AI-based drug discovery is no exception. Receptor.AI pays special attention to experimental validation and testing of all pieces of technology which are used in our SaaS platform and in-house services.
For virtual screening, a core technology of our platform, performance can be measured in two ways. The first is the ability to distinguish "binders" from "non-binders": the fewer non-binders appear among the top-ranked molecules, and the fewer true binders are missed, the better the virtual screening method.
The second measure is the correct ranking of molecules according to their affinity and/or activity. The higher the correlation between measured and predicted binding affinities and/or biological activities, the better the method.
These two performance metrics are usually suitable for two different stages of virtual screening. The first one is more relevant for an initial screening, which is designed to scan huge chemical space and select potential binders quickly and with reasonable precision. The second one is usually applied to the secondary screening in the selected pool of potential binders, which has to prioritise the compounds with the best characteristics for further development.
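The first measure can be made concrete as precision and recall within the top-ranked part of a screening list. The sketch below is only an illustration: the function name, the cut-off k, and the toy scores and labels are assumptions, not part of the Receptor.AI platform.

```python
def top_k_precision_recall(scores, is_binder, k):
    """Return (precision, recall) for the k highest-scoring molecules.

    scores    -- predicted scores, higher means "more likely a binder"
    is_binder -- ground-truth labels (True for binders), same order as scores
    """
    # Rank molecules from the highest to the lowest predicted score.
    ranked = sorted(zip(scores, is_binder), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(1 for _, binder in ranked[:k] if binder)
    total_binders = sum(1 for binder in is_binder if binder)
    precision = hits_in_top / k  # few non-binders among the top-ranked
    recall = hits_in_top / total_binders if total_binders else 0.0  # few binders missed
    return precision, recall


# Toy data, invented for illustration only.
scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [True, False, True, True, False, False, True, False]

precision, recall = top_k_precision_recall(scores, labels, k=4)
print(f"precision@4 = {precision:.2f}, recall@4 = {recall:.2f}")
```

A high precision means few non-binders reach the next stage of the funnel, while a high recall means few real binders are lost at this stage.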
Receptor.AI Virtual Screening Technologies
Our stack of technologies follows the idea of a virtual screening funnel based on a holistic approach. The funnel starts with the chemical space, which can be pre-processed and clustered to speed up screening substantially (multi-billion-compound databases can be screened in just a few hours). After that, the initial AI-based virtual screening module is applied. The initial screening results are filtered with an advanced AI-based ADME-Tox module consisting of 38 predictive endpoints and fed into the selectivity prediction module, which covers ~10K human proteins. Finally, the secondary screening, which is based on fully automated docking with AI rescoring, is performed, and the final set of ranked hit candidates is formed.
The initial screening stage uses two drug-target interaction models, 3DProtDTA and FB-DTI, which are applied in parallel in a consensus mode.
Testing Initial Screening Performance
In order to test the performance of model architectures for initial screening, we performed two experiments using different test datasets.
The first experiment was done with two widespread benchmark datasets for AI-based drug-target affinity predictions referred to as "Davis" and "KIBA".
We compared our 3DProtDTA model with 8 state-of-the-art open-source AI algorithms for drug-target affinity prediction using the same training set, test set, and performance metrics.
Our approach outperformed all eight competitors by a significant margin, which supports our choice of model architecture and training protocol.
In the second experiment, we tested the ability of 3DProtDTA to discriminate binders from non-binders on a large in-house test dataset containing 6,618 unique proteins and 80,079 unique hit compounds with known affinities. This translates to 157,809 experimentally validated protein-ligand pairs (the binders). We augmented them with 1,408,400 non-binder pairs used as negative controls, which were composed of experimentally validated pairs with non-active compounds and randomly generated pairs.
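We computed the Precision-Recall curve, which is routinely used to evaluate predictive models on imbalanced data such as this set, where non-binders outnumber binders roughly nine to one. The area under this curve (AUC) summarises in a single number how well the model keeps precision high as recall increases.
Our model achieves an AUC of 0.917. This is not the fraction of correct predictions; it means that precision stays high across nearly the whole range of recall. For comparison, a random ranking on this dataset would score about 0.10, the share of binder pairs.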
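The area under a Precision-Recall curve can be computed directly from ranked predictions. The sketch below uses the step-wise estimate commonly called average precision. This is one common convention, and it is an assumption here, since the text does not state which estimator was used. The function names and the toy data are illustrative, and the code assumes that no two scores are tied.

```python
def precision_recall_points(scores, labels):
    """Return (recall, precision) pairs obtained by lowering the score
    threshold one prediction at a time. Labels are 1 (binder) or 0."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(labels)
    points = []
    true_positives = 0
    for n_selected, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        recall = true_positives / total_positives
        precision = true_positives / n_selected
        points.append((recall, precision))
    return points


def average_precision(scores, labels):
    """Step-wise area under the Precision-Recall curve:
    the sum of (recall gain) * (precision at that point)."""
    area = 0.0
    previous_recall = 0.0
    for recall, precision in precision_recall_points(scores, labels):
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area


# Toy data, invented for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 1, 0]

pr_auc = average_precision(scores, labels)
print(f"PR AUC (average precision) = {pr_auc:.3f}")
```

A ranking that places every binder above every non-binder gives an area of 1.0, whereas a random ranking gives an area close to the fraction of binders in the dataset.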
Testing the Secondary Screening Performance
To test the secondary screening performance, we took four common proteins with a significant number of known ligands having reliable binding affinities.
We selected the 16 most widely used docking techniques for predicting ligand poses and affinities. Some of them are based on AI scoring functions, which makes them especially interesting for us.
On our side, we tested not only Receptor.AI docking with AI rescoring (which is our dedicated method for secondary screening) but also our DTI and FB-DTI models, as well as the consensus model of DTI and docking with AI rescoring.
There is an elaborate framework of consensus functions used in our technology stack. For example, DTI and FB-DTI models are balanced by giving them different weights depending on the number of ligands for a particular protein, reliability of its binding pocket annotation, size of the binding pocket and user preferences. Such smart weighting allows automatic prioritisation of the most relevant and reliable DTI model for a given protein target. Another proprietary consensus function is used to combine the results of DTI models with docking scores.
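The idea of weighting two models differently for each target can be illustrated with a simple weighted average. The sketch below is not the proprietary consensus function: the heuristic, the function names, the parameters (including `saturation` and `user_bias`) and the numbers are all assumptions made for illustration. It also assumes that both models output scores on the same scale.

```python
def clamp(value, low=0.0, high=1.0):
    """Restrict a value to the interval [low, high]."""
    return max(low, min(high, value))


def illustrative_weight(n_known_ligands, pocket_reliability, user_bias=0.0, saturation=100):
    """Toy weight in [0, 1] for model A (ASSUMPTION: invented heuristic).

    n_known_ligands    -- number of known ligands for the target
    pocket_reliability -- confidence in the binding pocket annotation, 0..1
    user_bias          -- manual shift reflecting user preference
    saturation         -- ligand count beyond which more data adds no weight
    """
    data_term = min(n_known_ligands, saturation) / saturation
    return clamp(0.5 * data_term + 0.5 * pocket_reliability + user_bias)


def consensus_score(score_a, score_b, weight_a):
    """Weighted average of two model scores; model B gets the remaining weight."""
    return weight_a * score_a + (1.0 - weight_a) * score_b


# Hypothetical target with 50 known ligands and a well-annotated pocket.
weight = illustrative_weight(n_known_ligands=50, pocket_reliability=0.8)
combined = consensus_score(score_a=0.9, score_b=0.5, weight_a=weight)
print(f"weight for model A = {weight:.2f}, consensus score = {combined:.2f}")
```

With such a scheme, the model that is considered more reliable for a given target dominates the combined score, while the other model still contributes.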
It is necessary to emphasise that the DTI models are designed for initial screening, so they are not expected to rank molecules with significant binding affinities as precisely as dedicated docking techniques. For such models, the crucial task is to discriminate binders from non-binders.
First, we augmented the sets of known ligands for the selected proteins with a large number of decoys (which are guaranteed to be non-binders) and checked whether our DTI model recovers the real ligands from among the decoys. As expected, the results are excellent: the top 20 compounds contain all 10 known ligands for three of the proteins and 13 out of 16 for the fourth one.
Then, we evaluated the binding scores for known ligands using our techniques and all 16 competing docking techniques and compared the correlations between predicted and experimental values for all of them.
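Such a comparison comes down to computing a correlation coefficient between predicted and experimental values for each method. The text does not specify which coefficient was used, so the sketch below shows two common choices, Pearson (linear agreement) and Spearman (agreement of ranks). The function names and the affinity values are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up affinities on a pKd-like scale, for illustration only.
experimental = [6.1, 7.4, 8.0, 5.2, 9.1]
predicted = [5.9, 7.0, 8.3, 5.5, 8.8]

print(f"Pearson  = {pearson(experimental, predicted):.3f}")
print(f"Spearman = {spearman(experimental, predicted):.3f}")
```

Spearman correlation is often preferred when only the order of the compounds matters, because it ignores differences in the scale of the scores produced by different methods.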
Quite surprisingly, our DTI and FB-DTI techniques, which are not designed for the correct fine-grained ranking of compounds with high binding affinities, perform on par with the best dedicated docking techniques.
Our in-house docking with AI rescoring performs slightly better, while the combination of DTI with docking and AI rescoring gives the best result of all the methods tested.
This is a remarkable result. It shows that Receptor.AI virtual screening techniques can compete with dedicated docking algorithms in their ability to correctly rank ligands with high binding affinity, while their combination with docking and AI rescoring outperforms the competitors.