
7 Virtual Screening
Introduction
Drug-likeness and Compound Filters
Structure-Based Virtual Screening
Billones Lecture Notes

7.1 Introduction
Virtual screening
• the computational or in silico analogue of biological screening
• the aim is to score, rank and/or filter a set of structures using one or more computational procedures
• helps decide which compounds to screen, which libraries to synthesize and which compounds to purchase from an external company
• employed when analyzing the results of an experiment, such as an HTS run
• typically uses a succession of virtual screening methods of increasing complexity
• each method acts as a filter to remove structures of no further interest, until at the end of the process a series of candidate structures is available for final selection, synthesis or purchase.
Figure: Many virtual screening processes involve a sequence of methodologies.

7.2 Drug-likeness and Compound Filters
Combinatorial chemistry and HTS did not lead to the expected improvements in the number of lead molecules being identified. This observation sparked much interest in the concept of “drug-likeness”.
Million Dollar Question
Which features of drug molecules confer their biological activity and distinguish them from general “organic” compounds?
Methods that can be used to assess “drug-likeness”:
• use of substructure filters
• analysis of the values of simple properties, such as molecular weight, the number of rotatable bonds and the calculated logP, in known drug molecules.

“Rule of Five” (ROF) [Lipinski et al. 1997]
• constitutes a set of simple filters that suggest whether or not a molecule is likely to be poorly absorbed.
The “rule of five” states that poor absorption or permeation is more likely when:
1. The molecular weight is greater than 500.
2. The logP is greater than 5.
3. There are more than 5 H-bond donors (i.e. the sum of OH and NH groups).
4. There are more than 10 H-bond acceptors (i.e. the number of N and O atoms).
The ROF does not apply to compounds that are substrates for biological transporters.

More extensive evaluation [Oprea 2000]: 70% of the “drug-like” compounds had:
• 0–2 H-bond donors
• 2–9 H-bond acceptors
• 2–8 rotatable bonds
• 1–4 rings

Feed-forward NN [Sadowski and Kubinyi 1998]
• used 92 input nodes, 5 hidden nodes and 1 output node to predict “drug-likeness”
• the drugs were taken from the WDI (World Drug Index), the nondrugs from the ACD (Available Chemicals Directory), and the descriptors were atom types
• the NN correctly assigned 83% of the molecules from the ACD to the nondrug class and 77% of the WDI molecules to the drug class.

Decision Trees [Wagener and van Geerestein 2000]
• used the same databases and descriptors
• correctly classified 91.9% of the drugs, but at the expense of an increased false positive rate (34.3% of nondrugs misclassified as drugs).
  o rules in the decision tree suggested that merely testing for the presence of some simple functional groups, such as OH, NR3, NHR2, COOH, phenol or enol groups, would distinguish a large proportion of the drug molecules.

Our own contribution in this area:
Liza T. Billones, Nadia B. Morales, Junie B. Billones. Logistic regression and random forest unveil key molecular descriptors of druglikeness. Chem-Bio Informatics Journal, Vol. 21, pp. 39–58 (2021).
Department of Physical Sciences and Mathematics, College of Arts and Sciences, University of the Philippines Manila, Padre Faura, Ermita, Manila 1000 Philippines
(Received February 13, 2021; accepted August 15, 2021; published online September 8, 2021)
Figure: Prediction accuracy of the 10 topmost one-variable LogR models on Drug Class, by Dragon-type descriptor.
Abstract
The identification of molecular descriptors that embody the chemical information for druglikeness will be a step forward in data-driven drug discovery and development endeavor.
In this study, over 4000 Dragon-type molecular properties were generated for approximately 2000 known drugs and 2000 surrogate nondrugs. Logistic Regression (LogR) and Random Forest (RF) techniques were carried out to unveil the crucial molecular descriptors that can adequately classify a compound as drug or nondrug. Ten one-variable LogR models each demonstrated at least 70% prediction accuracy. A two-variable model consisting of HVcpx and MDDD correctly classified 85% of the test compounds. The best LogR model with 89.0% prediction accuracy identified five most influential descriptors for druglikeness: an information index HVcpx, topological index MDDD, a ring descriptor NNRS, X2A or average connectivity index of order 2, and walk and path count SRW05. The best RF model involving 10 only weakly correlated descriptors was found to be 92.5% accurate and at par with the RF and LogR models that consisted of over 200 variables. The model featured: molecular weight, MW; average molecular weight, AMW; rotatable bond fraction, RBF; percentage carbon, C%; maximal electrotopological negative variation, MAXDN; all-path Wiener index, Wap; structural information content index, neighborhood symmetry of 1 order, SIC1; number of nitrogen atoms, nN; 2D Petitjean shape index, PJI2; and self-returning walk count of order 5, SRW05. Many of these descriptors have straightforward chemical interpretability and future applicability as druglikeness filters in virtual high throughput drug discovery.

Figure 7. Predictive performance of Model 8 (RF model on Drug Class with 10 weakly correlated predictors: MAXDN, nN, Wap, MW, AMW, SRW05, SIC1, PJI2, RBF and C%). The heat map displays the test set: compounds 1–536 are drugs and 537–1072 are surrogate nondrugs.

“Lead-likeness” is a concept distinct from “drug-likeness”.
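To make the drug-like versus lead-like distinction concrete, the rule-of-five criteria listed in Section 7.2 can be written as a simple property filter. This is a minimal sketch, assuming the four properties have already been computed by some cheminformatics tool; the `Props` record and the function names are hypothetical. The lead-like check uses the commonly quoted “rule of three” cutoffs (MW < 300, logP ≤ 3, H-bond donors ≤ 3, H-bond acceptors ≤ 3), which are an assumption here because these notes do not list them.

```python
from dataclasses import dataclass


@dataclass
class Props:
    """Precomputed properties of one molecule (hypothetical record)."""
    mol_weight: float  # molecular weight
    logp: float        # calculated logP
    hbd: int           # H-bond donors (sum of OH and NH groups)
    hba: int           # H-bond acceptors (number of N and O atoms)


def rof_violations(p: Props) -> list[str]:
    """Return the rule-of-five criteria that the molecule exceeds."""
    checks = [
        ("MW > 500", p.mol_weight > 500),
        ("logP > 5", p.logp > 5),
        ("HBD > 5", p.hbd > 5),
        ("HBA > 10", p.hba > 10),
    ]
    return [label for label, exceeded in checks if exceeded]


def is_lead_like(p: Props) -> bool:
    """Assumed 'rule of three' cutoffs: MW < 300 and logP, HBD, HBA all <= 3."""
    return p.mol_weight < 300 and p.logp <= 3 and p.hbd <= 3 and p.hba <= 3


if __name__ == "__main__":
    # Made-up property values, not measurements of any real compound.
    examples = {"cpd_A": Props(620.0, 3.2, 2, 11), "cpd_B": Props(250.0, 2.0, 1, 3)}
    for name, p in examples.items():
        print(name, rof_violations(p), is_lead_like(p))
    # cpd_A ['MW > 500', 'HBA > 10'] False
    # cpd_B [] True
```

The filter reports which criteria are exceeded rather than a single pass/fail, because how many violations to tolerate is a policy choice for the screening campaign.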
Lead-likeness
• the underlying premise is that during the optimization of a lead molecule into the final drug there is an increase in molecular “complexity”
• “lead-like” criteria are therefore used when performing virtual screening, rather than the “drug-like” criteria typified by the “rule of five”
• interest in lead-likeness led in turn to fragment-based approaches to drug discovery, wherein less complex molecules are screened to provide starting points for subsequent optimization
• one of the theoretical developments associated with lead-likeness is the “rule of three”

7.3 Structure-Based Virtual Screening
• the number of protein crystal structures has increased tremendously
• interest in using this structural knowledge for library design, compound acquisition and data analysis should increase accordingly
Factors that contributed to higher-throughput structure-based methods:
• high-performance computer hardware
• new algorithms, particularly for molecular docking
• tools for the analysis of the output of calculations

7.3.1 Protein-Ligand Docking
• the aim of a docking experiment is to predict the 3D structure (or structures) formed when one or more molecules form an intermolecular complex.
There are two components to the docking problem:
• exploring the space of possible protein–ligand geometries (called poses)
• scoring or ranking the poses in order to identify the most likely binding mode for each compound and to assign a priority order to the molecules
• docking is difficult because it involves many degrees of freedom:
  o translation, rotation, and conformational degrees of freedom of both the ligand and the protein
  o the solvent may also play a significant role

DOCK method
• an early method that constructs a “negative image” of the binding site
  o this negative image consists of a series of overlapping spheres of varying radii, derived from the molecular surface of the protein
  o each sphere touches the molecular surface at just two points
Figure: Operation of the DOCK algorithm.
• ligand atoms are then matched to the sphere centers
• the orientation is checked to ensure that there are no unacceptable steric interactions, and it is then scored
• new orientations are produced by generating new sets of matching ligand atoms and sphere centers
• the procedure continues until all possible matches have been considered

More recent algorithms take the orientational and conformational degrees of freedom of the ligand into account:
Monte Carlo algorithm [Goodsell and Olson 1990]
• at each iteration of the procedure, either the internal conformation of the ligand is changed (by rotating about a bond) or the entire molecule is subjected to a translation or a rotation within the binding site.
Genetic and evolutionary algorithms [Judson 1994; Oshiro 1995]
• each chromosome in a population encodes one conformation of the ligand together with its orientation within the binding site.
• a scoring function is used to calculate the fitness of each member of the population and to select individuals for each iteration.
• because the genetic algorithm is inherently random, it is usual to perform a number of runs and to select the structures with the highest scores
Incremental construction methods [Leach and Kuntz 1990]
• construct conformations of the ligand within the binding site in a series of stages
• a typical algorithm of this type first identifies one or more “base fragments”, which are docked into the binding site.
• the orientations of the base fragment then form the basis for a systematic conformational analysis of the remainder of the ligand

7.3.2 Scoring Functions for Protein-Ligand Docking
• docking involves the prediction of the binding mode of individual molecules
• the aim is to identify the orientation that is closest in geometry to the observed (X-ray) structure
• when tested on data sets derived from the PDB, docking programs correctly predict the binding geometry in more than 70% of cases
• for virtual screening, it is also necessary to score or rank the ligands using some function related to the free energy of association of the protein and ligand
• ideally, the same function would be used both for docking the ligands and for predicting their free energies of binding
• other programs use different functions for docking and for scoring
  o this may be because the large number of orientations generated during a typical docking run requires a function that can be calculated very rapidly
Figure: Illustration of the range of results produced by a typical docking program, with the three examples labeled “good”, “close” and “wrong”.
• results obtained by running the GOLD program on three ligands from the PDB; in each case the X-ray conformation is shown in brown and the top-ranked docking result in yellow-green
• docking-scoring methods are rarely able to predict the ΔG of binding accurately

Assumption in scoring functions: the ΔG of binding can be written as a linear summation of terms that reflect the various contributions to binding:
ΔGbind = ΔGsolvent + ΔGconf + ΔGint + ΔGrot + ΔGt/r + ΔGvib
• ΔGsolvent captures contributions to the free energy of binding from solvent effects.
• ΔGconf arises from conformational changes to the protein and to the ligand.
• ΔGint is the contribution from protein–ligand interactions, arising from electrostatic and van der Waals forces.
• ΔGrot is the penalty associated with freezing internal rotations of the protein and the ligand.
• ΔGt/r is the loss in translational and rotational degrees of freedom arising from the association of two bodies to give a single body.
• ΔGvib is the free energy due to changes in vibrational modes.

Figure: Two simple scoring functions used in docking. Left: the basic scoring scheme used by the DOCK program. Right: the piecewise linear potential (with the parameters used to calculate steric interactions).

Böhm (1994) introduced a linear scoring function in which the various terms can be calculated rapidly:
ΔGbind = ΔG0 + ΔGhb Σh-bonds f(ΔR, Δα) + ΔGionic Σionic f(ΔR, Δα) + ΔGlipo |Alipo| + ΔGrot NROT
• the function includes contributions from hydrogen bonding, ionic interactions, lipophilic interactions, and the loss of internal conformational freedom of the ligand.
• the H-bonding and ionic terms both depend on the geometry of the interaction, through a penalty function f(ΔR, Δα) of the deviations from ideal distance and angle.
• the lipophilic term is proportional to the contact surface area (Alipo) between protein and ligand involving non-polar atoms.
• the conformational entropy term is directly proportional to the number of rotatable bonds in the ligand (NROT).

7.3.3 Practical Aspects of Structure-Based Virtual Screening
• the ligand structures for structure-based virtual screening usually originate as a series of connection tables or SMILES strings.
• docking programs require a reasonable 3D conformation as the starting point.
Generating the Starting Conformation:
• geometry of undefined chiral centers
• ionization and tautomeric state of the ligand
• partial atomic charges for the protein and each ligand
Protein Preparation
• missing atoms (e.g. hydrogens)
• ionization and protonation state of the protein
Binding Site Definition
• too small a binding site may mean that some potential ligands will be discarded because they are deemed not to fit
• too large a binding site requires much computational time for exploring unproductive regions of the search space.
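The binding-site trade-off above is often handled by defining an axis-aligned search box around a known (e.g. co-crystallized) ligand plus a margin. The sketch below assumes that box-style definition; `site_box`, `box_volume`, `inside` and the default padding are hypothetical choices, not taken from any particular docking program. It shows how quickly the search volume grows as the margin is enlarged.

```python
def site_box(coords, padding=4.0):
    """Return (center, size) of an axis-aligned box around ligand atoms.

    coords: iterable of (x, y, z) in angstroms.
    padding: margin added on every side (hypothetical default).
    """
    coords = list(coords)
    if not coords:
        raise ValueError("need at least one atom coordinate")
    lo = [min(c[i] for c in coords) for i in range(3)]
    hi = [max(c[i] for c in coords) for i in range(3)]
    center = tuple((a + b) / 2 for a, b in zip(lo, hi))
    size = tuple((b - a) + 2 * padding for a, b in zip(lo, hi))
    return center, size


def box_volume(size):
    """Search volume in cubic angstroms; grows with the cube of the edge length."""
    x, y, z = size
    return x * y * z


def inside(point, center, size):
    """True if a point lies within the box (e.g. to reject poses outside the site)."""
    return all(abs(p - c) <= s / 2 for p, c, s in zip(point, center, size))


if __name__ == "__main__":
    ligand = [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)]  # made-up coordinates
    for pad in (1.0, 4.0, 8.0):
        center, size = site_box(ligand, pad)
        print(pad, center, size, box_volume(size))
```

For these made-up coordinates the volume rises from 192 Å³ at a 1 Å margin to 7920 Å³ at 8 Å, which is the “too large” cost described above; too small a margin instead risks rejecting ligands that extend beyond the reference ligand.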
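Böhm's function from Section 7.3.2 is a weighted sum, which is why it is cheap enough to evaluate for every pose. The following is a minimal sketch of a Böhm-style score, not Böhm's published implementation: the piecewise-linear penalty tolerances (0.2/0.6 Å for distance, 30°/80° for angle) and the weights in the test call are assumptions chosen for illustration, and `bohm_score`, `geometry_factor` and `Weights` are hypothetical names.

```python
from dataclasses import dataclass


def ramp(dev: float, tol: float, cutoff: float) -> float:
    """Piecewise-linear penalty: 1 up to tol, falling linearly to 0 at cutoff."""
    if dev <= tol:
        return 1.0
    if dev >= cutoff:
        return 0.0
    return 1.0 - (dev - tol) / (cutoff - tol)


def geometry_factor(d_r: float, d_alpha: float) -> float:
    """f(dR, dalpha): product of distance and angle penalties (assumed tolerances)."""
    return ramp(abs(d_r), 0.2, 0.6) * ramp(abs(d_alpha), 30.0, 80.0)


@dataclass
class Weights:
    """Hypothetical weights in kJ/mol; in practice these are fitted to data."""
    g0: float       # constant term
    g_hb: float     # per ideal hydrogen bond
    g_ionic: float  # per ideal ionic interaction
    g_lipo: float   # per square angstrom of lipophilic contact area
    g_rot: float    # per rotatable bond (entropy penalty)


def bohm_score(w: Weights, hbonds, ionic, a_lipo: float, n_rot: int) -> float:
    """Linear Böhm-style binding free energy estimate (more negative = tighter).

    hbonds, ionic: lists of (dR, dalpha) deviations from ideal geometry.
    """
    g = w.g0
    g += w.g_hb * sum(geometry_factor(dr, da) for dr, da in hbonds)
    g += w.g_ionic * sum(geometry_factor(dr, da) for dr, da in ionic)
    g += w.g_lipo * abs(a_lipo)
    g += w.g_rot * n_rot
    return g
```

Each term is a simple count, area or geometric lookup, so no energy minimization is needed; this is the speed argument made in the notes for using a separate, fast function during docking.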
Practical Tips
• apply filters prior to docking to eliminate undesirable or inappropriate structures
• use knowledge about possible binding modes to make the docking procedure more efficient (e.g. by introducing constraints)
• use a 3D pharmacophore search as a filter prior to docking
• use one of the faster docking programs to reduce the size of the data set prior to a more thorough analysis with a slower but more accurate program

Drug Design, Development and Therapy (Dove Press), ORIGINAL RESEARCH, Open Access Full Text Article
In silico discovery and in vitro activity of inhibitors against Mycobacterium tuberculosis 7,8-diaminopelargonic acid synthase (Mtb BioA)
Published in Drug Design, Development and Therapy, 2 March 2017
Junie B Billones, Maria Constancia O Carrillo, Voltaire G Organo, Jamie Bernadette A Sy, Nina Abigail B Clavio, Stephani Joy Y Macalino, Inno A Emnacen, Alexandra P Lee, Paul Kenny L Ko, Gisela P Concepcion
OVPAA-EIDR Program, “Computer-Aided Discovery of Compounds for the Treatment of Tuberculosis in the …”
Abstract: Computer-aided drug discovery and development approaches such as virtual screening, molecular docking, and in silico drug property calculations have been utilized in this effort to discover new lead compounds against tuberculosis. The enzyme 7,8-diaminopelargonic acid aminotransferase (BioA) in Mycobacterium tuberculosis (Mtb), primarily involved in the lipid biosynthesis pathway, was chosen as the drug target due to the fact that humans are not capable of synthesizing biotin endogenously. The computational screening of 4.5 million compounds from the Enamine REAL database has ultimately yielded 45 high-scoring, high-affinity compounds with desirable in silico absorption, distribution, metabolism, excretion, and toxicity properties.
Seventeen of the 45 compounds were subjected to bioactivity validation using the resazurin microtiter assay. Among the 4 actives, compound 7 ((Z)-N-(2-isopropoxyphenyl)-2-oxo-2-((3-(trifluoromethyl)cyclohexyl)amino)acetimidic acid) displayed inhibitory activity up to 83% at 10 μg/mL concentration against the growth of the Mtb H37Ra strain.
Keywords: CADDD, ADMET, TOPKAT, BioA inhibitor, structure-based pharmacophore, pharmacophore, molecular docking, resazurin microtiter assay

Assignment
1. Using SwissSimilarity (http://www.swisssimilarity.ch), generate the top 10 structures that are similar to Celecoxib (Celebrex), a COX-2-selective nonsteroidal anti-inflammatory drug (NSAID). Use FP2 fingerprint descriptors and the bioactive ChEMBL full database.
SMILES: O=S(=O)(c3ccc(n1nc(cc1c2ccc(cc2)C)C(F)(F)F)cc3)N
Copy the SMILES of each compound.
2. Similarly, generate the top 10 structures that are similar to mefenamic acid, a non-selective NSAID.
SMILES: O=C(O)c2c(Nc1cccc(c1C)C)cccc2
Copy the SMILES of each compound.
3. Generate the descriptors of each molecule using the BlueDesc descriptor calculator in ChemDes.
4. Combine the data in one datasheet. Remove the descriptors whose values do not vary. Add an additional column for Class, and enter “selective” for the Celecoxib top hits and “nonselective” for the mefenamic acid top hits.
5. Perform Logistic Regression in RapidMiner using descriptors that are not highly correlated (r < 0.4).
6. Perform a similar analysis using the Random Forest machine learning technique. (Use the Information Gain criterion for splitting the nodes, Maximal depth = 20, and ntree = 100.)
7. Dock the correctly classified compounds to the COX-2 enzyme (PDB: 3LN1) using SwissDock and rank-order them according to their binding energy (from most negative down to least negative BE). (Choose only the pose with the most negative estimated ΔG in each case.)
8. Predict the druglikeness of the top 5 hits by calculating their properties using Molinspiration (https://www.molinspiration.com) and their pharmacokinetics and druglikeness properties in SwissADME.

Figure: The Logistic Regression process should be similar to this (data split 0.7/0.3; attributes correlated at r > 0.4 removed; label attribute: Class).
Figure: The Random Forest process should be similar to this (data split 0.7/0.3; attributes correlated at r > 0.4 removed; label attribute: Class; ntree = 100, maximal depth = 20, criterion = Information Gain).
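Step 1 ranks database compounds by fingerprint similarity to a query. The Tanimoto coefficient is the usual measure for binary fingerprints; this sketch represents each fingerprint as a set of “on” bit positions, which is an assumption for illustration and does not reproduce SwissSimilarity's internal scoring. The names `tanimoto` and `top_k` and the bit sets in the test are hypothetical.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two sets of on-bits."""
    union = len(a | b)
    # By convention here, two empty fingerprints score 0.0.
    return len(a & b) / union if union else 0.0


def top_k(query: set[int], library: dict[str, set[int]], k: int = 10):
    """Return the k (name, score) pairs most similar to the query fingerprint."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

With k = 10 this mirrors the “top 10 structures” requested in steps 1 and 2, given fingerprints computed by whatever tool is used.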
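Steps 4 and 5 remove constant descriptors and then keep only descriptors that are not highly correlated (r < 0.4). RapidMiner does this with its own operator; as a tool-independent sketch, assuming a simple greedy rule (keep a descriptor only if its absolute Pearson r with every already-kept descriptor is below the threshold), the filtering looks like this. The function names and the descriptor table in the test are hypothetical, and RapidMiner's operator may choose which member of a correlated pair to drop differently.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_constant(table):
    """Step 4: remove descriptors whose values do not vary (r is undefined for them)."""
    return {name: vals for name, vals in table.items() if len(set(vals)) > 1}


def greedy_uncorrelated(table, threshold=0.4):
    """Step 5: keep descriptors in order, skipping any with |r| >= threshold to a kept one."""
    kept = []
    for name, values in table.items():
        if all(abs(pearson(values, table[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

Constant columns must be dropped first, because their variance is zero and the Pearson coefficient would divide by zero.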
