3D Molecular Structures - Lecture Notes PDF
University of the Philippines Manila
Billones
Summary
This document contains lecture notes regarding the representation and manipulation of 3D molecular structures, touching on topics such as databases, pharmacophores, and theoretical methods. Some examples of theoretical methods are explored, along with practical applications.
2.1 REPRESENTATION AND MANIPULATION OF 3D MOLECULAR STRUCTURES
• Introduction
• Experimental 3D Databases
• 3D Pharmacophores
• Implementation of 3D Database Searching
• Methods to Derive 3D Pharmacophores
Billones Lecture Notes

The 3D structure, or conformation, of a molecule determines its steric and electronic properties. Modeling these properties accurately requires very complex computational models such as quantum mechanics or molecular simulation.

2.2 EXPERIMENTAL 3D DATABASES
The data stored in a 3D database either come from experiment or are calculated using a computational method.
• The Cambridge Structural Database (CSD), in operation since 1965, contains X-ray structures of about 1,000,000 organic and organometallic compounds.
• The Protein Data Bank (PDB), established in 1971, is a communal repository of data files, one for each protein structure. It now contains approximately 180,000 structures, most obtained using X-ray crystallography but some determined using NMR and cryo-EM techniques.
• The CSD and the PDB are used extensively for data mining, to derive knowledge and rules about conformational properties and intermolecular interactions. This knowledge can then be used in theoretical methods such as conformational search algorithms or protein–ligand docking programs.
• The structures in the PDB form the basis of comparative modeling (also known as homology modeling).
• The CSD is used to determine:
  - how bond lengths depend upon the nature and environment of the atoms involved;
  - the conformations of specific molecular fragments.
Figure: A comparison of the torsion angle distributions for the O–C–C–O fragment (left) and the C–C–C–C fragment (right), derived from the CSD.
The CSD and PDB can also be mined to provide information about intermolecular interactions.
Figure: The intermolecular distribution of hydroxyl groups about esters, as extracted from the CSD.
• The much larger number of hydroxyl groups that interact with the carbonyl oxygen of the ester functionality demonstrates that this oxygen has a greater propensity to act as a hydrogen-bond acceptor than the ether oxygen.

2.3 3D PHARMACOPHORES
3D database systems are used to identify compounds that possess 3D properties believed to be important for interaction with a biological target.
A 3D pharmacophore is a set of features together with their relative spatial orientation.
• The use of pharmacophore features is a natural extension of the concept of bioisosterism, which recognizes that certain functional groups have similar biological, chemical, and physical properties.
Figure: A selection of common bioisosteres; each line contains a set of distinct functional groups that can often be substituted for each other while retaining the biological activity.
• 3D pharmacophore searching is useful when trying to identify structures that have the desired activity but come from a previously unexplored chemical series (also referred to as “lead hopping”).
• The spatial relationships between the features in a 3D pharmacophore can be specified as distances or by defining the (x, y, z) locations of the features.
Figure: An example of a simple H1 antihistamine pharmacophore together with some typical inhibitors.
• A 3D pharmacophore may also include geometric features such as centroids, planes, and angles, together with regions of space (excluded volumes) that should not be occupied by the ligand.
Figure: Some of the features that can be incorporated into 3D pharmacophores.

2.4 IMPLEMENTATION OF 3D DATABASE SEARCHING
3D database searches use a two-stage procedure:
1. Rapid screening eliminates molecules that cannot match the query.
2. Graph matching identifies those structures that truly match the query.
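A minimal sketch of the two-stage idea, assuming each stored conformation has already been reduced to labeled feature points with (x, y, z) coordinates. The query is a hypothetical three-point pharmacophore given as distance ranges. Stage 1 encodes every feature-pair distance as a (pair type, 1 Å bin) bit and rejects molecules lacking a bit that some query distance needs; stage 2 checks the survivors exactly. For brevity the exact stage is a brute-force search over feature assignments rather than the Ullmann algorithm, and all labels, coordinates and the bin width are invented.

```python
import itertools
import math

BIN_WIDTH = 1.0  # hypothetical distance-bin width (angstroms)


def bin_keys(label_a, label_b, lo, hi):
    """Screening keys (feature-pair type, distance bin) covered by the range [lo, hi]."""
    pair = tuple(sorted((label_a, label_b)))
    return {(pair, b) for b in range(int(lo // BIN_WIDTH), int(hi // BIN_WIDTH) + 1)}


def fingerprint(features):
    """Set of 'on' bits for one conformation: one key per feature pair and its distance bin."""
    bits = set()
    for (la, pa), (lb, pb) in itertools.combinations(features, 2):
        d = math.dist(pa, pb)
        bits |= bin_keys(la, lb, d, d)
    return bits


def screen(query, bits):
    """Stage 1: each query distance must hit at least one 'on' bit (necessary, not sufficient)."""
    labels, ranges = query
    return all(bin_keys(labels[i], labels[j], lo, hi) & bits
               for (i, j), (lo, hi) in ranges.items())


def exact_match(query, features):
    """Stage 2: look for one assignment of features to query points satisfying every range."""
    labels, ranges = query
    for combo in itertools.permutations(features, len(labels)):
        if any(f[0] != lab for f, lab in zip(combo, labels)):
            continue
        if all(lo <= math.dist(combo[i][1], combo[j][1]) <= hi
               for (i, j), (lo, hi) in ranges.items()):
            return True
    return False


def search(query, database):
    passed_screen, hits = [], []
    for name, features in database.items():
        if screen(query, fingerprint(features)):
            passed_screen.append(name)
            if exact_match(query, features):
                hits.append(name)
    return passed_screen, hits


# Hypothetical query: donor, acceptor and aromatic centroid with distance ranges (angstroms)
query = (("donor", "acceptor", "aromatic"),
         {(0, 1): (2.5, 3.5), (0, 2): (4.0, 5.0), (1, 2): (5.5, 6.5)})
database = {
    "A": [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("aromatic", (0, 4.8, 0))],
    "B": [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)),
          ("aromatic", (9, 0, 0)), ("donor", (9, 4.5, 0))],
    "C": [("donor", (0, 0, 0)), ("acceptor", (8, 0, 0))],
}
print(search(query, database))  # (['A', 'B'], ['A'])
```

Molecule B shows why the second stage is needed: each required distance occurs somewhere in B, so it passes the screen, but no single donor satisfies all three distances at once.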
First, information about the distances (or angles) between relevant functional groups or features in each molecular conformation is encoded in bitstrings, which are used for the screening stage. Then, a subgraph isomorphism method such as the Ullmann algorithm is employed for the graph-matching stage. In the 3D graph there is an edge between every pair of atoms, labeled with the interatomic distance, so the graph represents the topography of the molecule. It can therefore distinguish stereoisomers such as cis/trans isomers, although distances alone cannot tell a pair of enantiomers apart.

2.5 THEORETICAL 3D DATABASES
• For most compounds, including virtual compounds generated using combinatorial chemistry, no crystal structure is available.
• Experimental data, where available, describe a single conformation only.
• Most molecules have many conformations.
• 3D database searching must therefore have a mechanism for taking conformational space into account.

2.5.1 Structure-Generation Programs
• These programs generate a single, low-energy conformation from a 2D representation (e.g., a connection table or SMILES string) given as input.
• The two most widely used structure-generation programs are CONCORD and CORINA.
The first step is to identify features such as rings, bond orders, and stereocenters in the molecule. Next, the acyclic side chains are added in a default low-energy conformation, and finally the structure is adjusted to deal with any high-energy, unfavorable interactions.

2.5.2 Conformational Search and Analysis
Conformational analysis aims to identify all accessible minimum-energy structures of a molecule.
Global minimum-energy conformation: the conformation with the lowest energy, although it does not necessarily correspond to the biologically active 3D structure.
A conformational search involves:
• a search algorithm that generates a series of initial conformations;
• energy minimization of each conformation in turn, using molecular mechanics or quantum mechanical methods.
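The two components can be illustrated on a molecule with a single rotatable bond. This is a toy sketch: the energy function is a hypothetical torsion potential in arbitrary units (not a real force field), the "search algorithm" simply generates starting torsions at 60° intervals, and the minimizer is plain gradient descent. Starting points that relax into the same minimum are merged, and the lowest-energy survivor is the global minimum of this toy potential.

```python
import math


def energy(phi):
    """Hypothetical torsional energy (arbitrary units) for a single rotatable bond."""
    return (1 + math.cos(3 * phi)) + 0.5 * (1 + math.cos(phi))


def gradient(phi):
    """Analytical derivative dE/dphi of the toy energy above."""
    return -3 * math.sin(3 * phi) - 0.5 * math.sin(phi)


def minimize(phi, step=0.05, iterations=2000):
    """Energy minimization by plain gradient descent on the torsion angle (radians)."""
    for _ in range(iterations):
        phi -= step * gradient(phi)
    return phi % (2 * math.pi)


def conformational_search(start_degrees):
    """1) generate starting torsions, 2) minimize each, 3) keep the distinct minima."""
    minima = {}
    for start in start_degrees:
        phi = minimize(math.radians(start))
        key = round(math.degrees(phi)) % 360   # treat minima within ~1 degree as identical
        minima[key] = energy(phi)
    return sorted(minima.items(), key=lambda item: item[1])  # lowest energy first


for angle, e in conformational_search(range(30, 360, 60)):
    print(f"{angle:3d} deg  E = {e:.3f}")
```

Six starting structures collapse to three distinct minima here (an anti conformation near 180° and two equivalent gauche-like ones), which is the kind of reduction conformational analysis is meant to achieve.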
• H₂ has only 1 configuration.
• H₂O has only 1 configuration.
• NH₃ has only 2 configurations (pyramidal and planar).
• The number of local minima grows exponentially with the number of variables (degrees of freedom): a combinatorial explosion problem.
• Number of conformations $= s^N$, where N is the number of freely rotatable torsion angles, s is the number of discrete values for each torsion angle, $s = 360^\circ/\Delta\theta$, and $\Delta\theta$ is the dihedral increment.
e.g., butane: N = 1, $\Delta\theta = 60^\circ$, s = 360/60 = 6, so the number of conformers is $6^1 = 6$.

2.5.3 Systematic Conformational Search
Systematic search method: conformations are generated by systematically assigning predetermined values to the torsion angles of the rotatable bonds in the molecule.
e.g., Grid search: conformations corresponding to all possible combinations of torsion angle values are generated. If the number of values permitted to torsion angle i is $n_i$ and there are N variable torsion angles in the molecule, then the total number of conformations C generated by the grid search is:
$$C = \prod_{i=1}^{N} n_i$$
• Efficiency can be improved if the large proportion of structures that have high energy because of unfavorable interactions (clashes between parts of the structure) can be eliminated.
Figure: an example of a clash.
• This can be achieved using a depth-first search with tree pruning. First, the order in which the torsion angles are varied is determined. A conformation is generated in which all torsion angles have their first value assigned. The second conformation is generated by modifying the value of the last torsion angle. When all of its values are exhausted, the algorithm backtracks to consider the penultimate torsion angle, and so on. This can be represented as a tree. Should a problem be encountered, the algorithm immediately rejects all combinations that lie below that particular node in the tree.
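The grid-search count and the depth-first search with pruning can be sketched as below. The 3 × 2 × 2 torsion values mirror the shape of the tree example that accompanies the figure (12 conformations, 22 tree nodes including the root), but the actual angle values and the clash test are hypothetical: here a "clash" is simply any partial conformation whose first two torsions are both 60°.

```python
import math


def grid_size(values_per_torsion):
    """C = product of n_i over the N rotatable torsions (equals s**N when every n_i = s)."""
    return math.prod(len(values) for values in values_per_torsion)


def has_clash(partial):
    """Hypothetical clash test: flags a partial conformation whose first two torsions are both 60."""
    return len(partial) >= 2 and partial[0] == 60 and partial[1] == 60


def depth_first_search(values_per_torsion, clash=has_clash):
    """Systematic search that prunes every branch below a node where a clash is detected."""
    accepted = []
    visited = 0

    def extend(partial):
        nonlocal visited
        visited += 1
        if partial and clash(partial):
            return                                  # backtrack: skip this whole subtree
        if len(partial) == len(values_per_torsion):
            accepted.append(tuple(partial))         # all torsions assigned: a full conformation
            return
        for value in values_per_torsion[len(partial)]:
            extend(partial + [value])               # recursion varies the last torsion fastest

    extend([])
    return accepted, visited


print(grid_size([list(range(0, 360, 60))]))       # butane, 60-degree increment: 6
torsions = [[60, 180, 300], [60, 180], [60, 180]]  # 3 x 2 x 2 values, as in the tree example
print(grid_size(torsions))                        # 12
conformations, nodes_visited = depth_first_search(torsions)
print(len(conformations), nodes_visited)          # 10 20
```

Because the clash is detected at the second level, the two leaves below that node are never generated: 20 nodes are visited instead of 22, and 10 of the 12 grid conformations survive.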
Figure: Tree representation of conformational space used in the depth-first search with backtracking. There are three variable torsion angles with three, two, and two values permitted to each, respectively, giving a total of 12 possible conformations. Each node in the tree represents a state in which between one and three of these torsions have been assigned. The nodes are visited in the sequence 0–1–4–10–4–11–4–1–5–12–5–13–5–1–0–2–6–14 and so on. Should a problem be detected, the algorithm immediately backtracks to the appropriate node. For example, if a problem arises at node 4, the algorithm moves to node 5; a problem at node 2 leads to node 3.

2.5.4 Random Conformational Search
• Random conformational search methods involve an iterative procedure in which a structure is selected from those previously generated, randomly modified, and then minimized.
• If this results in a new conformation, it is added to the list of structures found. The process is then repeated.
• There is no natural end point for a random search; the process continues until a predefined number of iterations has been attempted and/or no new conformations can be generated.
• Modifications are most frequently achieved either by varying torsion angles (keeping the bond lengths and angles fixed) or by changing the (x, y, z) coordinates of the atoms.
A Metropolis Monte Carlo scheme is used to select the starting structure for the next iteration:
1. Make a random move to produce a new conformation.
2. If the energy of the new structure ($V_{new}$) is lower than that of its predecessor ($V_{old}$), it is used as the next starting structure.
3. Metropolis criterion: if $V_{new}$ is higher than $V_{old}$, the Boltzmann factor $P = \exp[-(V_{new} - V_{old})/kT]$ is calculated (k is the Boltzmann constant and T is the temperature). If P is larger than a random number between zero and one, the new structure is selected.
If not, the previous structure is retained. The Metropolis method makes it possible for higher-energy structures to be selected; these may correspond to previously unexplored regions of conformational space.
Simulated annealing involves a gradual reduction of the temperature:
• At high temperatures the system can overcome high-energy barriers; this enables it to explore the search space very widely.
• As the temperature falls, the lower-energy states become more probable.

2.5.5 Other Approaches to Conformational Search
Distance geometry uses a description of molecular conformation based on interatomic distances, together with various mathematical procedures, to generate structures for energy minimization:
• A matrix containing the maximum and minimum values permitted to each interatomic distance in the molecule is calculated.
• Each interatomic distance is arbitrarily assigned a value between its upper and lower bounds.
• The distance matrix is transformed into a trial set of Cartesian coordinates.
• The structure is refined and a conformation is generated.
Molecular dynamics solves Newton’s equations of motion for the atoms in the system to give a trajectory that defines how the positions of the atoms vary with time.

2.6 METHODS TO DERIVE 3D PHARMACOPHORES
Pharmacophore mapping: the process of deriving a 3D pharmacophore.
Two key issues to consider when deriving 3D pharmacophores:
• conformational flexibility;
• the many different possible combinations of pharmacophoric groups.
As a consequence, there may be hundreds of potential 3D pharmacophores. The objective is to determine which of these potential pharmacophores best fits the data.
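The "hundreds of potential 3D pharmacophores" follow from simple counting. As a rough illustration, assume a hypothetical molecule with six pharmacophoric groups and ten low-energy conformations, and count every three- and four-group subset in every conformation as a candidate; real programs merge equivalent candidates, so actual numbers differ.

```python
from math import comb


def candidate_count(n_groups, n_conformations, sizes=(3, 4)):
    """Count (subset of pharmacophoric groups, conformation) pairs as candidate pharmacophores."""
    subsets = sum(comb(n_groups, k) for k in sizes)
    return subsets * n_conformations


print(candidate_count(6, 10))  # (C(6,3) + C(6,4)) * 10 = (20 + 15) * 10 = 350
```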
In general, the aim is to identify the 3D pharmacophore(s) that contain the largest number of features common to all of the active molecules, where these common features can be presented by each molecule in a low-energy conformation.

2.6.1 Pharmacophore Mapping using Constrained Systematic Search
• First, identify the pharmacophoric groups in each molecule that will be overlaid in the final pharmacophore.

Drug Design, Development and Therapy (Dove Press: open access to scientific and medical research)
ORIGINAL RESEARCH
In silico discovery and in vitro activity of inhibitors against Mycobacterium tuberculosis 7,8-diaminopelargonic acid synthase (Mtb BioA)
This article was published in the following Dove Press journal: Drug Design, Development and Therapy, 2 March 2017
Junie B Billones, Maria Constancia O Carrillo, Voltaire G Organo, Jamie Bernadette A Sy, Nina Abigail B Clavio, Stephani Joy Y Macalino, Inno A Emnacen, Alexandra P Lee, Paul Kenny L Ko, Gisela P Concepcion
Abstract: Computer-aided drug discovery and development approaches such as virtual screening, molecular docking, and in silico drug property calculations have been utilized in this effort to discover new lead compounds against tuberculosis. The enzyme 7,8-diaminopelargonic acid aminotransferase (BioA) in Mycobacterium tuberculosis (Mtb), primarily involved in the lipid biosynthesis pathway, was chosen as the drug target due to the fact that humans are not capable of synthesizing biotin endogenously. The computational screening of 4.5 million compounds from the Enamine REAL database has ultimately yielded 45 high-scoring, high-affinity compounds with desirable in silico absorption, distribution, metabolism, excretion, and toxicity properties. Seventeen of the 45 compounds were subjected to bioactivity validation using the resazurin microtiter assay.
Among the 4 actives, compound 7 ((Z)-N-(2-isopropoxyphenyl)-2-oxo-2-((3-(trifluoromethyl)cyclohexyl)amino)acetimidic acid) displayed inhibitory activity up to 83% at 10 μg/mL concentration against the growth of the Mtb H37Ra strain.

2.6.1 Pharmacophore Mapping using Constrained Systematic Search (continued)
• The most rigid molecule is then taken and its conformational space explored. During the conformational search, the distances between all pairs of the selected pharmacophoric groups are recorded.
• The second most rigid molecule is then taken, and constraints on the values permitted to each of its torsion angles are derived from the inter-pharmacophore distance ranges found for the first molecule. The only torsion angle values explored for the rotatable bonds in the second molecule are those that may permit it to match the pharmacophore distances found for the first molecule.
• As more and more molecules are considered, the distance ranges become more and more restricted.
• When the more flexible compounds are considered, the expectation is that only very limited ranges are possible for each of their torsion angles, making the search very efficient.
Figure: As successive molecules are considered in the constrained systematic search procedure, the distance constraints between pharmacophoric features become more restricted. The figure on the left shows the distances permitted to the first molecule; these regions may be further restricted after analysis of the second molecule.
Figure: Some typical ACE inhibitors together with the features and the five distances used to define the 3D pharmacophores as discovered by the constrained systematic search method.
• The two pharmacophores found correspond to different values of the distances d1–d5.
• Du indicates the presumed location of the zinc atom in the enzyme (the extended point).
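The progressive narrowing of the distance ranges can be sketched with simple interval intersection. In this simplified, hypothetical version each molecule is described only by the pharmacophore distances seen in a few explored conformations (invented values, in Å), and each feature pair is treated independently; the real method instead uses the current ranges to restrict which torsion values of the next molecule are explored at all.

```python
def observed_ranges(conformations):
    """Minimum and maximum of each feature-pair distance over a molecule's conformations."""
    return {pair: (min(c[pair] for c in conformations), max(c[pair] for c in conformations))
            for pair in conformations[0]}


def intersect(allowed, ranges):
    """Keep only distances reachable by both; return None if any pair no longer overlaps."""
    narrowed = {}
    for pair, (lo_a, hi_a) in allowed.items():
        lo_b, hi_b = ranges[pair]
        lo, hi = max(lo_a, lo_b), min(hi_a, hi_b)
        if lo > hi:
            return None
        narrowed[pair] = (lo, hi)
    return narrowed


def constrained_ranges(molecules):
    """Process molecules from most to least rigid, narrowing the allowed distance ranges."""
    allowed = None
    for name, (_, conformations) in sorted(molecules.items(), key=lambda m: m[1][0]):
        ranges = observed_ranges(conformations)
        allowed = ranges if allowed is None else intersect(allowed, ranges)
        print(name, allowed)
        if allowed is None:
            break                      # no common arrangement of these features
    return allowed


# Hypothetical data: (number of rotatable bonds, pharmacophore distances per conformation)
molecules = {
    "flexible": (6, [{"D-A": 3.0, "D-Ar": 4.0}, {"D-A": 5.2, "D-Ar": 6.3},
                     {"D-A": 8.0, "D-Ar": 6.6}]),
    "rigid": (1, [{"D-A": 4.5, "D-Ar": 6.0}, {"D-A": 5.5, "D-Ar": 7.0}]),
    "medium": (3, [{"D-A": 5.0, "D-Ar": 5.5}, {"D-A": 6.0, "D-Ar": 6.5},
                   {"D-A": 7.0, "D-Ar": 8.0}]),
}
constrained_ranges(molecules)
```

Sorting by the number of rotatable bonds reproduces the "most rigid first" order; each later molecule can only shrink the ranges, never widen them.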
2.6.2 Pharmacophore Mapping using Clique Detection
Clique:
• a completely connected subgraph in a graph;
• it contains a subset of the nodes in the graph such that there is an edge between every pair of nodes in the subgraph.
Maximum clique: the largest clique present in the graph.
Figure: Highlighted in bold are the two cliques present in this graph (other than trivial two-node cliques).
Figure: Simple illustration of the clique detection method. Consider two molecules, each of which contains two donors (D/d) and two acceptors (A/a). (Panel labels: "max clique", "not a clique".)
• The two graphs at the top illustrate the distances between these sets of features in the two conformations being considered.
• There are eight nodes in the correspondence graph (bottom) but only six edges.
• From these, there is one maximum clique, which consists of three pairs of matching features (bottom left).
The DISCO program, which uses clique detection:
• generates a series of low-energy conformations for each molecule;
• chooses the “reference” molecule, i.e., the one with the smallest number of conformations;
• considers each conformation of the reference molecule in turn as the reference conformation;
• compares all the conformations of the other molecules in the set to this reference conformation and identifies the cliques;
• examines the entire set of cliques obtained over all the conformations of the reference molecule. Any clique that is common to all of the molecules, such that it is matched by at least one conformation of each molecule in the set, is a common pharmacophore.
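A minimal sketch of clique-based matching between one conformation of each of two molecules, each reduced to two donors and two acceptors with invented coordinates. Correspondence-graph nodes pair features of the same type (giving eight nodes, as in the figure); two nodes are joined when the matching intramolecular distances agree within an assumed 0.5 Å tolerance and no feature is used twice. The maximum clique is found with the basic Bron–Kerbosch algorithm, a standard clique-enumeration method; this is not necessarily what DISCO itself uses.

```python
import itertools
import math

TOLERANCE = 0.5  # hypothetical distance tolerance (angstroms)


def correspondence_graph(mol_a, mol_b):
    """Nodes pair same-type features; edges join pairs with matching distances."""
    nodes = [(i, j) for i, (ta, _) in enumerate(mol_a)
             for j, (tb, _) in enumerate(mol_b) if ta == tb]
    edges = {n: set() for n in nodes}
    for (i1, j1), (i2, j2) in itertools.combinations(nodes, 2):
        if i1 == i2 or j1 == j2:
            continue  # a feature may be matched only once
        da = math.dist(mol_a[i1][1], mol_a[i2][1])
        db = math.dist(mol_b[j1][1], mol_b[j2][1])
        if abs(da - db) <= TOLERANCE:
            edges[(i1, j1)].add((i2, j2))
            edges[(i2, j2)].add((i1, j1))
    return edges


def bron_kerbosch(r, p, x, graph, cliques):
    """Basic Bron-Kerbosch recursion: records every maximal clique in `cliques`."""
    if not p and not x:
        cliques.append(r)
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & graph[v], x & graph[v], graph, cliques)
        p = p - {v}
        x = x | {v}


def maximum_clique(mol_a, mol_b):
    graph = correspondence_graph(mol_a, mol_b)
    cliques = []
    bron_kerbosch(set(), set(graph), set(), graph, cliques)
    return max(cliques, key=len)


# Invented coordinates: D1, D2, A1, A2 and d1, d2, a1, a2
mol_a = [("donor", (0, 0, 0)), ("donor", (4, 0, 0)),
         ("acceptor", (0, 3, 0)), ("acceptor", (10, 0, 0))]
mol_b = [("donor", (0, 0, 0)), ("donor", (4, 0, 0)),
         ("acceptor", (0, 3, 0)), ("acceptor", (0, 0, 7))]
print(len(correspondence_graph(mol_a, mol_b)))   # 8 nodes
print(sorted(maximum_clique(mol_a, mol_b)))      # [(0, 0), (1, 1), (2, 2)]
```

With these coordinates the graph has only four edges (the figure's has six), and the single maximum clique pairs D1 with d1, D2 with d2 and A1 with a1: three pairs of matching features.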
2.6.3 Maximum Likelihood Method for Pharmacophore Mapping
• Catalyst/HipHop uses a pre-calculated set of low-energy conformations obtained using poling, which generates a small set of conformations that “covers” pharmacophore space.
• The poling method adds an additional penalty term to the energy function during minimization to “push” each conformation away from those found previously.
• The first step is to identify all configurations of pharmacophoric groups that are present in the molecules.
• Each molecule is taken in turn as a reference structure and its conformations examined.
• All possible combinations of pharmacophoric groups are generated exhaustively.
• Each configuration of pharmacophoric groups is compared to the other molecules in the set to determine whether a conformation can be successfully superimposed on the configuration.
• The configurations (referred to as hypotheses) that are well matched by the active molecules in the set, but which are less likely to be matched by a large set of arbitrary molecules, are given higher scores.
• The scoring function used is:
$$\text{score} = M \sum_{x=0}^{K+1} q(x) \log_2 \frac{q(x)}{p(x)}$$
where M is the number of active molecules in the set and K is the number of pharmacophoric groups in the hypothesis.
• An active molecule is assigned to class x = K + 1 if it matches all the features; to one of the classes x = 1 through x = K if it matches one of the configurations with one feature removed; and to class x = 0 if it matches neither the full hypothesis nor one of the subconfigurations.
q(x): fraction of active molecules that match each class x.
p(x): fraction of a large set of arbitrary molecules that would match the “rare” class-x configuration.
• A higher score for a hypothesis is thus associated with higher values of q(x) (i.e., more of the active molecules match) and lower values of p(x) (i.e., it is less likely that an arbitrary, inactive molecule would match).
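The score can be computed directly once the class fractions are known. This sketch takes the score to be M Σ q(x) log2[q(x)/p(x)] over the classes x = 0, …, K + 1; q(x) and p(x) are supplied as hypothetical inputs, since how Catalyst estimates p(x) for a large set of arbitrary molecules is not covered here. Classes with q(x) = 0 contribute nothing (the usual 0 · log 0 = 0 convention).

```python
import math


def hypothesis_score(n_actives, q, p):
    """score = M * sum over classes x = 0..K+1 of q(x) * log2(q(x) / p(x))."""
    total = 0.0
    for qx, px in zip(q, p):
        if qx > 0:                       # empty classes contribute nothing
            total += qx * math.log2(qx / px)
    return n_actives * total


# Hypothetical two-feature hypotheses (K = 2, classes x = 0, 1, 2, 3) and M = 8 actives
all_match = [0.0, 0.0, 0.0, 1.0]         # every active matches both features (class K + 1)
rare = hypothesis_score(8, all_match, [0.5, 0.2, 0.175, 0.125])
frequent = hypothesis_score(8, all_match, [0.3, 0.1, 0.1, 0.5])
print(rare, frequent)                    # 24.0 8.0
```

Both hypotheses are matched by every active, but the first describes an arrangement that arbitrary molecules rarely match (p = 0.125 versus 0.5), so it scores three times higher.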
2.6.4 Pharmacophore Mapping Using a Genetic Algorithm
Genetic algorithms (GAs):
• are a class of optimization methods based on models of Darwinian evolution;
• have found widespread use in many fields (protein–ligand docking, the generation of quantitative structure–activity relationship (QSAR) models, and library design);
• involve the creation of a population of potential solutions that gradually evolves towards better solutions;
• depend on a fitness function, which is used to assess this evolution by providing a mechanism to score, and therefore rank, different members of the population;
• use a chromosome to encode each member of the population; chromosomes form the basis for generating new potential members of the population.
In GASP the chromosome consists of 2N − 1 strings, where N is the number of molecules.
• N binary strings represent the torsion angle values in the molecules (so each specifies one molecule’s conformation).
Figure: The three torsion angles of the molecule indicated are each encoded by one byte (eight bits) for use in the GA.
• N − 1 integer strings represent the way in which a base molecule (the one with the smallest number of features) maps onto each of the other molecules.
• The mappings are used to superimpose each molecule, in the appropriate conformation, onto the base molecule using a molecular-fitting procedure.
• When the molecules have been superimposed, the fitness value is calculated.
• The fitness function depends on the number and similarity of the features that are overlaid, the volume integral of the superimposed ensemble of conformations, and the van der Waals energy of the molecular conformations.
• An initial population of solutions is first created by randomly generating a set of chromosomes.
• Each is scored using the fitness function. The population then evolves to explore the search space and (hopefully) to identify better solutions.
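A minimal sketch of the chromosome handling for a GASP-style chromosome, together with the crossover and mutation operators that the notes go on to describe. It assumes each torsion byte maps linearly onto 0–360° (angle = value × 360/256), which the notes do not specify; the fitness function and molecular fitting are omitted, and GASP's actual operator details may differ.

```python
import random

BITS_PER_TORSION = 8  # one byte per torsion angle


def decode_torsions(bits):
    """Convert a bitstring (list of 0/1) into torsion angles in degrees, one byte per torsion."""
    angles = []
    for start in range(0, len(bits), BITS_PER_TORSION):
        byte = bits[start:start + BITS_PER_TORSION]
        value = int("".join(map(str, byte)), 2)    # 0..255
        angles.append(value * 360 / 256)           # assumed linear mapping onto 0-360 degrees
    return angles


def crossover(parent_a, parent_b, rng):
    """One-point crossover: children swap everything after a random cross position."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]


def mutate_bits(bits, rng):
    """Flip one randomly chosen bit of a binary (torsion) string."""
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]


def mutate_mapping(mapping, allowed, rng):
    """Replace one integer of a mapping string with a different allowed value."""
    i = rng.randrange(len(mapping))
    new_value = rng.choice([v for v in allowed if v != mapping[i]])
    return mapping[:i] + [new_value] + mapping[i + 1:]


rng = random.Random(0)
parent_a = [0] * 24                           # three torsions, all 0 degrees
parent_b = [1] * 24                           # three torsions, all 358.59375 degrees
print(decode_torsions(parent_b))              # [358.59375, 358.59375, 358.59375]
child_a, child_b = crossover(parent_a, parent_b, rng)
print(decode_torsions(child_a), decode_torsions(child_b))
print(mutate_mapping([0, 1, 2], [0, 1, 2, 3], rng))
```

With 8 bits per torsion the angular resolution is 360/256 ≈ 1.4°, which is why one byte per torsion is a reasonable encoding.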
• To generate new members of the population, the genetic operators crossover and mutation are applied.
• In crossover, two parent chromosomes are taken, a cross position is randomly selected, and two new child chromosomes are produced by swapping the bits on either side of the cross position.
• The mutation operator involves flipping a bit or changing the value of an integer to a new value chosen at random from the set of allowed values.

ACTIVITY 2
1. List five organic compounds in the I-CARE Early COVID Treatment protocol (First-Line Therapies) of the FLCCC Alliance (https://covid19criticalcare.com/covid-19-protocols/i-care-earlycovid-treatment/). Draw the structure of each molecule in CORINA (https://demos.mnam.com/corina_interactive.html) and take an image of the 3D structure.
2. Identify the top 10 hits from the ZINC database in a pharmacophore-based search on the ZINCPharmer platform (http://zincpharmer.csb.pitt.edu) using the following pharmacophore features: Show the 10 structures along with the pharmacophore features.