AI in Drug Discovery
- Binding Affinity
- Virtual Screening
- Ligand parameterization
- Receptor parameterization
- Graph neural networks
- Example of implementation
Author note: some words and definitions may be unfamiliar to a reader. Feel free to click and follow the links — you’ll land at the appropriate wiki page where you can find more information.
- Cut expenses and duration of early drug discovery phases (target discovery and validation, lead identification and optimization). The phases are marked in red on the image below.
- Extend virtual screening capabilities and accuracy by rejecting dubious molecular coordinates and transition to a more efficient parameterization of (bio)molecules.
To get a context of the problem and solution, let’s first consider the main concepts.
2. Binding Affinity
3.1 What is binding affinity?
Binding affinity is the strength of the binding interaction between a single biomolecule (e.g. protein or DNA) to its ligand/binding partner (e.g. drug or inhibitor).
Binding affinity is typically measured and reported by the equilibrium inhibition constant (Ki), which is used to evaluate and rank order strengths of biomolecular interactions. The smaller the Ki value, the greater the binding affinity of the ligand for its target. The larger the Ki value, the more weakly the target molecule and ligand are attracted to and bind to one another.
Binding affinity is influenced by non-covalent intermolecular interactions such as hydrogen bonding, electrostatic interactions, hydrophobic and Van der Waals forces between the two molecules. In addition, the binding affinity between a ligand and its target molecule may be affected by the presence of other molecules.
3.2 Why is it important?
Whenever you are characterizing proteins, nucleic acids, and any biomolecule, understanding the binding affinity to substrates, inhibitors, and cofactors is key to the appreciation of the intermolecular interactions driving biological processes, structural biology, and structure-function relationships.
In drug discovery, binding affinity is used to rank hits binding to the target and design drugs that bind their targets selectively.
“Selectively” means the drug must have a high affinity to the selected target and the lowest possible affinities to other targets to avoid off-target binding and caused side-effects.
3.3 How to measure it?
There are many experimental ways to measure binding affinity and inhibition constants, such as ELISAs, gel-shift assays, pull-down assays, equilibrium dialysis, analytical ultracentrifugation, surface plasmon resonance, and spectroscopic assays.
Experimental methods are expensive in terms of required human efforts, time, and resources. Due to the tremendous number of chemical compounds, experimental bioactivity screening efforts require the aid of computational approaches. A set of such approaches is called virtual screening.
3. Virtual Screening
Let’s give a definition and a brief classification for the concept:
Virtual screening is a set of computational techniques for the selection of molecules that are most likely to bind to a drug target (protein or polynucleotide).
There are two main branches of virtual screening:
Example of a legacy virtual screening method failure for affinity evaluation
Conventional structure-based virtual screening methods are AutoDock, AutoDock Vina, Vina with modern scoring functions like RF .
Recently, our group tried to separate known thrombin active and inactive ligands using AutoDock, AutoDock Vina, and Vina with RF Score. Selected active and inactive molecules were established experimentally (i.e., their activity was known in advance; we rather tested virtual screening methods correspondence to reality). All ligands are available at the DUDE database web interface.
As a result, AutoDock, AutoDock Vina, and Vina RF were not able to separate active (blue bars) and inactive (red bars) ligands even with a properly set binding site.
Graph neural networks as a virtual screening tool
Within the current context, the proposed AI technique is classified as a kind of virtual screening. Let’s review how graph neural networks are used for the parameterization of molecules.
4. Ligand parameterization
Ligands are small molecules usually. Thus, the atom/chemical bond scale is appropriate. For large molecules, such as peptides, refer to the next chapter.
4.1 Atom parameterization
Each atom of a ligand is featured with mass, total and partial charges, number of radical electrons (integers); atom type, valence, hybridization, aromaticity, chirality types (one-hot encodings), etc. Atomic coordinates may be used as a feature or not (especially if unknown). Atom features are shown as a vector with three circles in the image below.
4.2 Bond parameterization
Each bond of a ligand is featured with a type (single, double, triple, aromatic), ring affiliation, whether a bond is conjugated (0/1), stereo configuration (cis-/trans-, E/Z, S/R, none), direction (upright/downright) of a bond, etc. Bond features are shown as a vector with two circles in the image below.
4.3 Intramolecular forces
Intramolecular = inside a molecule. Learned by attention mechanism. The mechanism by design should capture electronic density distribution effects within a molecule. Recursively trained attention context vector aggregates information about local neighborhoods to provide expressive representations of small molecules. In higher time steps, target node embedding will include information from further nodes recursively. A more intense (smaller) yellow circle implicates a higher attention level of the neighbor node.
5. Receptor parameterization
Receptors are typically large biomolecules: polynucleotides (DNA, RNA) or proteins. There are different receptor parameterization techniques for graph neural networks. Let’s review some of them.
5.1 As a linear graph
A receptor can be represented as a linked list (linear graph) of amino acids (for proteins and peptides) or nucleotides (for DNA, RNA). A node, in this case, is one amino acid or one nucleotide. A node can be parameterized with such features as charge, flexibility (Smith), hydrogen bond donors/acceptors, hydrophobicity, polarity (Zimmerman), Van-der-Waals volume, etc. Edges are mostly the same (peptide bonds for proteins/peptides and alternating sugar-phosphate backbone along the polynucleotide chain) and do not require parameterization. I can only hope that kind of attention mechanism would be able to simulate/learn the secondary/tertiary structure of these biomolecules. Cartesian coordinates are not required.
5.2 As a graph of intermolecular interfaces
A protein-ligand graph is computed from the atomic coordinates in a PDB file. In this graph, vertices represent secondary structure elements (usually alpha helices and beta strands) or ligand molecules while the edges represent contacts and relative orientations between them. Atomic coordinates are a must for the case.
5.3 As adjacency matrix
Interatomic (or inter-amino-acid, or inter-nucleic-acid) distances matrix is constructed from the Cartesian atomic coordinates. In these matrices, the rows and columns are assigned to the nodes in the network and the presence of an edge is symbolized by a numerical value.
It is possible to build undirected and directed neighbor matrices without Cartesian coordinates, but they are mandatory to create weighted adjacency matrices where the distance between nodes is taken into account. For weighted molecular adjacency matrices, the radial interaction cutoff (typically 12 Angstrom) is used to truncate nodes that do not interact.
5.4 As a raw FASTA string
Applicable only for NLP models, such as Transformers, GRU, LSTM, etc. In this case, raw receptor FASTA string is used, parameterization is made by the model implicitly by learning embeddings, for example. Coordinates are not required in this case.
Node and edge featurization define the type and architecture of the receiver graph neural network.
6. Graph neural networks
This section provides a short introduction to graph neural-networks (GNN) for molecular properties prediction and outlines their categorization.
6.1 Convolutional graph neural network (Conv-GNN)
Convolutional neural networks (CNNs) are networks specialized for interacting with grid-like data, such as a 2D image. As molecules are typically not represented as 2D grids, chemists have focused on a variant of this approach: the Conv-GNN on molecular graphs.
Molecular graphs confer key advantages: they bypass the conformational challenge of using 3D representations while maintaining invariance to rotation and translation due to their pairwise definition. The MoleculeNet paper (Wu et al., 2017) offers a concise conceptual comparison of six major variants. To facilitate the following explanation, the framework of neural message passing networks put forth (Gilmer et al., 2017) is used.
Neural message passing networks utilize a convolutional layer, simply a matrix of scalar weights, to exchange information between atoms or bonds within a molecule and produce a fixed-length, real-valued vector that embeds the molecular information. To begin, they generate or compute a feature vector for each atom within the molecule. These feature vectors are then collected into a matrix. Additionally, they generate a graph topology matrix that specifies the connectivity of the graph. In a forward convolutional pass, these three matrices are multiplied together. This allows information to be exchanged between the feature vectors of each atom with its immediate neighbors, in accordance with the connectivity specified by the topology matrix. This updates each atom’s feature vector to include information about its local environment. This updated feature vector matrix is then passed through an activation function (i.e., ReLU) and can then be iteratively updated by using it as the feature matrix in another convolutional pass. This propagates information throughout the molecule. Finally, these atom feature vectors are either summed or concatenated to give a unique, learned representation of the molecule as a real-valued vector (see the figure below).
The learned representation in vector form is referred to as a representation in latent space and is then used as the input for a traditional fully connected DNN to finally make the classification or prediction. Backpropagation is once again used to train these networks by propagating gradients backward and determining how to change the convolution matrix weights and the parameters in the DNN.
6.2 Recurrent graph neural network (Rec-GNN)
Recurrent neural networks (RNNs), introduced by Hopfield in 1982, are specialized for dealing with sequences of arbitrary length. This makes them ideally suited to handling the textual representation of chemical information, such as FASTA and SMILES. The critical difference is that in the previous architectures each data input is distinct, while in an RNN each input will influence the next one. An illustrative example is viewing any particular input, such as a SMILES string, as time-series data. The presence of a carbon atom at one moment in time influences what the next character is likely to be. This is expressed in the architecture by feeding the output of the hidden layer for that carbon into the hidden layer of the next atom. The feeding of one hidden state into the next gives the system a recursive relationship within the hidden layer, but it can be viewed as directional by “unfolding” the network to form of an unfolded, acyclic network graph. By doing this, it maintains a history of all previous inputs, and they influence its prediction at a later time. The network can then be trained using a recursive form of backpropagation.
Difference between recurrent (Rec-GNN) and convolution (Conv-GNN) based graph neural networks: Rec-GNNs apply the same set of weights (W1=W2) until a convergence criterion is met, whereas Conv-GNNs apply different weights at each iteration.
These GNN architectures have several additions to the basic GNNs (Conv-GNN and Rec-GNN) like different pooling strategies, skip-connections, attention mechanisms, super-nodes, isomorphic graphs.
7. Example of implementation
As an example, we will train Atomic Convolutional Neural Network (ACNN) by Gomes et al., 2017. The dataset is PDBBind 2015 — it contains three subsets: core (195 structures), refined (3,706 structures), and full (14,260). The target is to predict ∆G — free energy of receptor-ligand complexation, which serves as a binding affinity metric.
1. Select configuration
See the full list of available configurations here.
idx = 0 choices=[...] args.update(get_exp_configure(choices[idx]))
2. Prepare train and test datasets
dataset, train_set, test_set = load_dataset(args) args['train_mean'] = train_set.labels_mean.to(args['device']) args['train_std'] = train_set.labels_std.to(args['device']) train_loader = DataLoader(dataset=train_set, batch_size=args['batch_size'], shuffle=False, collate_fn=collate) test_loader = DataLoader(dataset=test_set, batch_size=args['batch_size'], shuffle=True, collate_fn=collate)
3. Load ACNN model, initialize loss function, optimizer, and early stopping callback
model = load_model(args) loss_fn = nn.MSELoss() optimizer = torch.optim.Adam(model.parameters(), lr=args['lr']) stopper = EarlyStopping(mode=args['mode'], patience=args['patience'], filename=choices[idx]+'_model.h5')
4. Train until early stopping
for epoch in range(args['num_epochs']): run_a_train_epoch(args, epoch, model, train_loader, loss_fn, optimizer) test_scores = run_an_eval_epoch(args, model, test_loader) test_msg = update_msg_from_scores('test results', test_scores) early_stop = stopper.step(test_scores['mae'], model) if early_stop: print('Early stopping') break
5. Results and discussion
With the default (ACNN_PDBBind_core_pocket_random) configuration, ACNN achieves 1.5006 MAE, 0.1461 R2.
epoch 12/500, training | loss 0.7162, r2 0.2815, mae 1.6027 test results, r2 0.1461, mae 1.5006
In terms of the task, 1.5006 MAE means that the mean average error of the model is ~1.5 kcal/mol of Gibbs free energy of binding affinity. It is precise enough, compared to the energy of water-water hydrogen bond O−H···:O (21 kJ/mol or 5.0 kcal/mol).
- Challenges. Predicting drug-target interactions is crucial for novel drug discovery, drug repurposing, and uncovering off-target effects. Experimental bioactivity screening takes significant time (1–3 years) and expense (more than 100 million USD on average per new drug-on-market) but has low efficiency. Bioassays are typically backed by computational methods, but legacy simulations fail to deliver either sufficient precision — like in the example of AutoDock Vina with modern RF Score which failed to separate active and inactive thrombin ligands — or sufficient speed — like in the example of molecular dynamics or first-principle quantum mechanics simulations. As a result, more than 90% of the proposed leads are declined (He et al., 2017).
- Solution. In silico methods are highly demanded since they can expedite the drug development process by systemically suggesting a new set of candidate molecules promptly, which saves time and reduces the cost of the whole process by up to 43% (DiMasi et al., 2016). Graph neural networks deliver superior accuracy for the task in a matter of milliseconds per receptor-ligand pair and extend docking capabilities by accepting structures without coordinates.
- Results. The ACNN model easily achieves ~1.5 kcal/mol MAE predicting Gibbs free energy of binding affinity. This repository contains the full implementation.
- Philosophy. Bio-/cheminformatics is now on the edge of a similar paradigm shift as computer vision before the deep learning model AlexNet won the 2012 ImageNet contest. Instead of selecting manually crafted features for molecules, integrative features are learned by optimization methods.
- Gurbych O., Druchok M., Yarish D., Garkot S., “High throughput screening with machine learning”, presented at NeurIPS 2020
- Wang et al. (2005) The PDBbind database: methodologies and updates. J Med Chem 16;48(12):4111–9.
- Gomes et al. (2017) Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity. arXiv:1703.10603.
- Li H. et. al, “Improving AutoDock Vina Using Random Forest: The Growing Accuracy of Binding Affinity Prediction by the Effective Exploitation of Larger Data Sets”, Molecular Informatics 34(2), February 2015
- Adam C. et. al, “Deep Learning in Chemistry”, J. of Chem. Inf. Model. 2019, 59 (6), 2545–2559
Graph Neural Networks for Binding Affinity Prediction was originally published in The Startup on Medium, where people are continuing the conversation by highlighting and responding to this story.