Analysis of Protein Structures

Residue solvent accessibility
❑ Solvent accessible surface area (ASA, SASA or SAS, in Å²)
  ▪ Quantifies the extent to which a residue in a protein structure is accessible to the solvent
  ▪ Typically calculated by rolling a spherical probe of a given radius over the protein surface and summing, for each residue, the area that the probe can access
❑ Solvent excluded surface (SES), also known as the molecular surface or Connolly surface
  ▪ Usually the surface shown in "surface" visualizations
  ▪ Water probe radius ≈ 1.4 Å; atoms are represented by their van der Waals (VdW) radii
[Figure: SASA vs. SES traced by a water probe rolling over VdW spheres]
❑ Relative accessible surface area (rASA)
  ▪ Ratio of the actual accessible area of a given residue to the maximum possible accessible area of that residue type: rASA = ASA / ASAmax
  ▪ Enables comparison of the accessibility of different amino acids (e.g., long extended vs. spherical amino acids)
❑ Simplified two-state description
  ▪ Buried vs. exposed residues
  ▪ The threshold separating surface from buried residues is not well defined (usually rASA = 15–25 %)
  ▪ rASA < threshold → buried; rASA ≥ threshold → exposed

Protein solubility
❑ Definition: the concentration of protein in a saturated solution that is in equilibrium with the solid phase
❑ For proteins expressed in the lab, solubility depends on multiple factors:
  ▪ Hydrophilic/hydrophobic balance of the solvent-exposed residues
  ▪ Aggregation-prone regions (APRs): stretches of mainly hydrophobic residues prone to forming β-structures
  ▪ Protein expressibility in the cells
[Figure: cross-β spines of amyloid fibrils]

Molecular interactions
❑ Intramolecular: within the same protein structure
❑ Intermolecular: between different proteins in assemblies
❑ Essential for understanding the molecular basis of the function and stability of proteins and their complexes

Types of interactions
❑ Charge-charge (ionic) interactions
  ▪ Between charged residues; e.g., salt bridges
❑ Hydrogen bonds (H-bonds)
  ▪ Donor and acceptor atoms sharing a hydrogen atom
❑ Aromatic (π-π) interactions
  ▪ Attractive interaction between aromatic rings
❑ Van der Waals (vdW) interactions
  ▪ Between any two atoms; more important for non-polar residues
❑ Hydrophobic interactions
  ▪ Entropic in origin; important for non-polar/hydrophobic residues
❑ Disulfide bonds (cysteine bridges)
  ▪ Covalent S–S bond formed between two Cys residues
❑ Cation-π interactions
  ▪ Electrostatic interaction of a positively charged residue (Lys or Arg) with an aromatic residue (Phe, Trp or Tyr)
[Figure: cation-π interaction between a Lys cation and the aromatic ring of Trp]

Polar interactions
❑ Arginine interactions (the Arg guanidinium group carries a positive charge)
  ▪ Cation-π: positively charged Arg interacts with aromatic rings
  ▪ Arginine-arginine stacking: two Arg side chains form parallel "aromatic" stacking
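The rolling-probe SASA calculation described at the start of this section can be sketched with the Shrake–Rupley approach: test points are placed on a sphere of radius (VdW radius + probe radius) around each atom, and a point counts as accessible if it does not fall inside any other atom's expanded sphere. This is a minimal pure-Python sketch, not the implementation of any particular tool; the number of test points, the example coordinates, the 1.7 Å carbon-like radius and the 20 % two-state cut-off (picked from the 15–25 % range above) are assumptions. Per-residue SASA would be the sum over a residue's atoms.

```python
import math

PROBE_RADIUS = 1.4  # water probe radius in Å


def sphere_points(n):
    """Return n roughly uniform unit vectors (golden-spiral construction)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n  # evenly spaced heights in (-1, 1)
        r_xy = math.sqrt(1.0 - z * z)
        theta = golden_angle * k
        points.append((r_xy * math.cos(theta), r_xy * math.sin(theta), z))
    return points


def shrake_rupley(atoms, probe=PROBE_RADIUS, n_points=500):
    """Per-atom SASA in Å^2; atoms are (x, y, z, vdw_radius) tuples."""
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe  # surface traced by the probe centre
        accessible = 0
        for ux, uy, uz in unit:
            p = (xi + big_r * ux, yi + big_r * uy, zi + big_r * uz)
            # The test point is buried if it lies inside another atom's expanded sphere.
            buried = any(
                math.dist(p, (xj, yj, zj)) < rj + probe
                for j, (xj, yj, zj, rj) in enumerate(atoms)
                if j != i
            )
            if not buried:
                accessible += 1
        areas.append(4.0 * math.pi * big_r ** 2 * accessible / n_points)
    return areas


def classify(asa, asa_max, threshold=0.20):
    """Two-state call from rASA = ASA / ASAmax (the threshold is an assumption)."""
    rasa = asa / asa_max
    return "exposed" if rasa >= threshold else "buried"


# Usage: one isolated carbon-like atom vs. two atoms in contact.
single = shrake_rupley([(0.0, 0.0, 0.0, 1.7)])[0]  # equals 4·π·3.1² for an isolated atom
pair = shrake_rupley([(0.0, 0.0, 0.0, 1.7), (3.0, 0.0, 0.0, 1.7)])
print(f"isolated: {single:.1f} Å^2, pair total: {sum(pair):.1f} Å^2")
```

When two atoms come into contact, part of each surface becomes inaccessible, so the total SASA of the pair is smaller than twice the SASA of an isolated atom.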
❑ How to identify molecular interactions: criteria for recognizing the various types
  ▪ Atom types/functional groups
  ▪ Geometric rules (distances, angles)
  ▪ Energetics (physicochemical rules)
  ▪ Contact surface area between atoms: if SASA_total < SASA_A + SASA_B → interaction

Functional sites
❑ Examples? Why are they important?

Binding sites
❑ Sites on the protein that provide complementarity for the bound molecule (ligand)
  ▪ Binding site: its function is molecular recognition
  ▪ Active/catalytic site: its function is to promote chemical catalysis (breaking/formation of covalent bonds); a special case of the binding site
❑ Binding involves the formation of non-covalent interactions between the protein and the bound molecule
❑ Bound molecule: small molecule or macromolecule
❑ Binding is usually very specific: complementarity in shape and charge distribution between the site and the bound molecule

Binding sites for small molecules
❑ Usually internal cavities, surface pockets or clefts
  ▪ Concave regions
  ▪ Provide a microenvironment different from that of the bulk solvent (e.g., many negatively charged residues → very strong electrostatic field enabling binding of highly charged ligands)
  ▪ Often identifiable by simple examination of the protein structure
❑ Highly conserved by evolution
❑ Low desolvation energy
❑ Characteristic physicochemical properties

Approaches to identify binding sites for small molecules
❑ Evolutionary conservation
❑ Physical detection of "pockets"
  ▪ Geometry-based methods
  ▪ Energy-based methods
❑ Knowledge-based
  ▪ Machine learning-based methods
  ▪ Template-based methods
  ▪ Microenvironment-based methods

Evolutionary conservation
❑ Residues important for protein function or stability tend to be highly conserved during evolution
❑ Residue conservation in a set of related proteins can be derived from a multiple sequence alignment (MSA)
❑ Mapping conservation onto the structure can reveal patches of conserved surface residues, i.e., potential binding sites
❑ The protein interior is usually more conserved than the surface, so conservation is not suitable for predicting buried cavities
❑ Not very specific; better combined with other features

Physical detection of "pockets"
❑ Analyze the protein surface for pockets (clefts, cavities)
❑ Geometry-based methods
  ▪ Define favorable cleft regions based on steric assessments
❑ Energy-based methods
  ▪ Define favorable cleft regions based on energetic evaluations

Energy-based methods
❑ Pockets are defined by energetic criteria
❑ The interaction energy between the protein and a molecular fragment (probe, e.g., a methyl, hydroxyl or amine group) is evaluated to locate energetically favorable binding sites
❑ Can be combined with other methods to assess ligandability (the ability of a cavity to bind ligands)
Note: druggability refers to the likelihood of finding orally bioavailable small molecules that bind to a particular target in a disease-modifying way. Ligandability is a necessary but not sufficient condition for druggability.
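A minimal sketch of the energy-based idea above: a single-site, methyl-like probe is placed at every point of a regular grid around the atoms, and a 12-6 Lennard-Jones energy to nearby atoms is summed; the most negative points mark candidate sites. The ε, σ, cutoff and grid spacing below are hypothetical values chosen for illustration, not force-field parameters; a real probe scan would also include electrostatics and atom-type-specific parameters.

```python
import itertools
import math


def lj_energy(r, epsilon, sigma):
    """12-6 Lennard-Jones pair energy (arbitrary energy units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def axis(lo, hi, spacing):
    """Grid coordinates from lo to hi (inclusive) with the given spacing."""
    n = int(round((hi - lo) / spacing))
    return [lo + k * spacing for k in range(n + 1)]


def probe_grid_scan(atoms, spacing=1.0, margin=4.0, epsilon=0.2, sigma=3.5, cutoff=8.0):
    """Score a single-site probe at every grid point around the atoms.

    atoms: list of (x, y, z) coordinates (hypothetical: all atoms share one epsilon/sigma).
    Returns (energy, point) pairs sorted from most to least favourable;
    grid points that coincide with an atom centre are skipped.
    """
    axes = []
    for dim in range(3):
        values = [a[dim] for a in atoms]
        axes.append(axis(min(values) - margin, max(values) + margin, spacing))
    scored = []
    for point in itertools.product(*axes):
        energy = 0.0
        for atom in atoms:
            r = math.dist(point, atom)
            if r < 1e-9:  # probe sits exactly on an atom centre
                energy = math.inf
                break
            if r <= cutoff:
                energy += lj_energy(r, epsilon, sigma)
        if energy != math.inf:
            scored.append((energy, point))
    scored.sort()
    return scored


# Usage: a toy "cavity" formed by six atoms 4 Å from the origin along each axis.
cage = [(4.0, 0.0, 0.0), (-4.0, 0.0, 0.0), (0.0, 4.0, 0.0),
        (0.0, -4.0, 0.0), (0.0, 0.0, 4.0), (0.0, 0.0, -4.0)]
best_energy, best_point = probe_grid_scan(cage)[0]
print(best_point, round(best_energy, 2))
```

In this toy cage the centre of the cavity is the most favourable probe position, because it is the only grid point close to all six atoms without clashing with any of them.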
Knowledge-based: binding site similarity
❑ Binding sites are predicted from their similarity to other (known) binding sites
❑ Template-based methods
  ▪ Binding sites are represented by 3D templates
  ▪ Based on similarity between homologous proteins
❑ Microenvironment-based methods
  ▪ Based on a description of the local environment, such as the type of residues, their distances, solvent accessibility and physicochemical properties

Template-based methods
❑ Definition and construction of 3D templates of features
  ▪ Local structural motifs, patterns and descriptors that characterize the binding sites (e.g., functional groups, shape, solvent accessibility)
  ▪ Capture the essence of the binding sites in the protein
  ▪ Usually apply constraints on atom types and occasionally on sequential relationships
❑ Search a database of structures using the template as a query
  ▪ Identifies structures with a given binding site
❑ Compare the query structure against a database of 3D templates
  ▪ Identifies potential binding sites in the query structure
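A minimal sketch of searching a structure with a 3D template, assuming the template is a small set of typed atoms; the labels SER_OG, HIS_NE2 and ASP_OD1 and all coordinates below are hypothetical. Because it compares only atom types and inter-atomic distances, the result does not depend on how the structure is rotated or translated. This is one simple geometric-constraint approach; real template tools typically add superposition, RMSD scoring and far more efficient indexing than this exhaustive backtracking.

```python
import math


def match_template(template, structure, tol=0.5):
    """Find all placements of a 3D template in a structure.

    template, structure: lists of (atom_type, (x, y, z)).
    A match assigns one structure atom to each template atom such that atom
    types agree and every pairwise distance agrees within tol (Å).
    Returns a list of tuples of structure indices.
    """
    matches = []

    def extend(assigned):
        k = len(assigned)
        if k == len(template):
            matches.append(tuple(assigned))
            return
        t_type, t_xyz = template[k]
        for idx, (s_type, s_xyz) in enumerate(structure):
            if idx in assigned or s_type != t_type:
                continue
            # Compare distances to every template atom already placed.
            consistent = all(
                abs(math.dist(s_xyz, structure[assigned[m]][1])
                    - math.dist(t_xyz, template[m][1])) <= tol
                for m in range(k)
            )
            if consistent:
                extend(assigned + [idx])

    extend([])
    return matches


# Hypothetical three-atom template and a structure containing a translated copy plus decoys.
template = [("SER_OG", (0.0, 0.0, 0.0)),
            ("HIS_NE2", (2.8, 0.0, 0.0)),
            ("ASP_OD1", (5.5, 1.0, 0.0))]
structure = [("HIS_NE2", (0.0, 0.0, 0.0)),      # decoy
             ("SER_OG", (10.0, 5.0, -3.0)),
             ("HIS_NE2", (12.8, 5.0, -3.0)),
             ("ASP_OD1", (15.5, 6.0, -3.0)),
             ("SER_OG", (30.0, 30.0, 30.0))]    # decoy
print(match_template(template, structure))
```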
Binding sites for macromolecules
❑ Typically protruding loops and large surface clefts, but also flat binding sites; flatter than binding sites for small molecules
  ▪ Recognition of a macromolecule involves interactions over a large continuous surface area or several discrete binding regions
  ▪ Difficult to identify by simple examination of the protein structure
❑ High evolutionary conservation
❑ Low desolvation energy
❑ Characteristic physicochemical properties
❑ DNA-binding sites have characteristic motifs and positively charged electrostatic patches
❑ Approaches to identify binding sites
  ▪ Evolutionary conservation
  ▪ Knowledge-based
❑ Meta-servers (tools that combine several methods)

Knowledge-based methods
❑ Combine multiple interface features
  ▪ Conservation
  ▪ Residue propensity for being at protein-protein interfaces (hydrophobic, aromatic and charged residues are more likely)
  ▪ Physicochemical properties
  ▪ Structural properties
❑ Use known binding sites for parameterization or training → empirical scoring functions and machine learning methods
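One common way to express the residue propensity for protein-protein interfaces mentioned above is a log-odds ratio: the frequency of a residue type at the interface divided by its frequency on the protein surface, so positive values mean enrichment at interfaces. The sketch below assumes that definition; the residue lists are hypothetical toy data, whereas published propensity scales are derived from large sets of complexes.

```python
import math
from collections import Counter


def interface_propensity(interface_residues, surface_residues):
    """Log-odds propensity ln(f_interface / f_surface) for each residue type.

    interface_residues: one-letter codes of residues at the interface.
    surface_residues: one-letter codes of all surface residues (interface included).
    Types never seen at the interface are omitted (the log of zero is undefined).
    """
    n_interface = Counter(interface_residues)
    n_surface = Counter(surface_residues)
    total_interface = sum(n_interface.values())
    total_surface = sum(n_surface.values())
    propensities = {}
    for aa, count in n_surface.items():
        if n_interface[aa] == 0:
            continue
        f_interface = n_interface[aa] / total_interface
        f_surface = count / total_surface
        propensities[aa] = math.log(f_interface / f_surface)
    return propensities


# Hypothetical toy data: seven interface residues within a 16-residue surface.
interface = list("WWYLLKE")
surface = interface + list("KKEEDDSST")
for aa, p in sorted(interface_propensity(interface, surface).items()):
    print(aa, round(p, 2))
```

A pseudocount could replace the skipping of unseen residue types if a value is needed for every amino acid.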
Transport pathways
❑ Mediate the transport of ions and small molecules in proteins; they play an essential role in the function of a large variety of proteins
  ▪ Channels/pores: transport of substances across membranes
  ▪ Tunnels: exchange of ligands between buried active/binding-site cavities and the bulk solvent
  ▪ Intramolecular tunnels: transport of reaction intermediates between two distinct active sites in bifunctional enzymes
❑ The permeability to different substances depends on their size (radii), shape (length and curvature), amino acid composition (physicochemical properties) and dynamics

Transport pathways & voids
[Figure: pocket/cleft/groove, cavity, tunnel and channel in a protein; a protein channel and enzyme tunnels with the bottleneck marked]
❑ Bottleneck: the narrowest part of the tunnel/channel; it is critically important for selectivity

Prediction of transport pathways
❑ Identification of overall voids in proteins
❑ Identification of tunnels
❑ Identification of channels

Identification of overall voids
❑ Methods that aim to represent accurately all types of voids in a protein structure, including channels, tunnels, surface clefts, pockets and internal cavities
❑ Usually provide very limited information on tunnel and channel characteristics; the identified voids have to be separated from each other
❑ Geometry-based methods for pocket detection
  ▪ HOLLOW – http://hollow.sourceforge.net/
  ▪ 3V – http://3vee.molmovdb.org/
  ▪ fPocket, LIGSITEcsc, PASS, CASTp, SURFNET, POCASA …

Identification of tunnels
❑ Methods that calculate tunnels connecting occluded cavities with the surrounding bulk solvent
❑ Identify the pathways from a cavity to the protein surface
❑ Voronoi diagrams (the skeleton of voids between atoms) are used to find all theoretically possible pathways connecting the starting point with the bulk solvent
❑ Optimal pathways in the diagram are found with Dijkstra's algorithm, according to criteria defined by a cost function
❑ The probe size defines the lowest radius threshold
❑ Tunnel geometry is approximated by a sequence of spheres
[Figure: Voronoi diagram between atoms, showing the tunnel origin, the tunnel mouth, pathways allowed for the selected probe and disallowed pathways]
❑ Probe size: the minimum radius specified for the tunnel search
❑ Common limitation: the tools identify two spherical tunnels instead of one asymmetric tunnel

Identification of channels
❑ Methods that calculate channels (or pores) penetrating through the whole protein
❑ Not suitable for identifying tunnels leading from occluded cavities
❑ Usually analyze just one channel per structure
❑ Usually need information about the approximate position and direction of the channel (channel axis), either user-provided or automatically identified

Protein–ligand complexes

Molecular recognition
❑ Molecular recognition refers to the specific interactions between two or more molecules through non-covalent bonding
❑ Different biological roles
  ▪ Specific binding
  ▪ Catalysis
  ▪ Signaling
❑ Several models explain molecular recognition

Lock-and-key model (E. Fischer, 1894)
❑ Complementarity between the receptor's binding site and the ligand
  ▪ Size & shape
  ▪ Physicochemical properties
❑ Both ligand and receptor are considered rigid
❑ Not sufficient to explain allostery, non-competitive inhibition or catalysis
  → Model dismissed; used only for educational purposes

Induced-fit model (D. E. Koshland, 1956)
❑ Only partial complementarity is necessary
❑ Both ligand and receptor can undergo conformational adjustments upon complexation
❑ The conformation of the bound receptor does not exist in its free state

Selected-fit model (B. F. Straub, 1964)
❑ Also called conformational selection, fluctuation fit or population selection
❑ Receptor and ligand are flexible → considered as ensembles
❑ The complex forms in a lock-and-key fashion when two complementary conformations occur
❑ The conformation of the bound receptor also exists in its free state

Keyhole-lock-key model (Z. Prokop, 2012)
❑ Applies when the receptor has a buried active site and tunnels
❑ Complementarity with the ligand is needed both for the active site and for the tunnel
❑ Explains the extra selectivity filter provided by the tunnel

Biocatalysis
❑ Enzymes increase the speed of chemical reactions by decreasing the activation barrier
❑ They provide environments that stabilize the transition state(s)
  ▪ Kinetic rate (Arrhenius equation): $k = A\,e^{-E_a/(RT)}$
  ▪ Lower Ea → higher k (faster reaction)
[Figure: reaction energy diagram (reactants → enzyme–substrate complex → transition state → enzyme–products complex → products) comparing Ea (ΔG‡) with and without the enzyme; ΔH marked]

Structures of complexes
❑ Complexes in the RCSB PDB
❑ Databases of complexes
  ▪ PDBbind
  ▪ BindingDB
  ▪ ChEMBL
  ▪ …
❑ Experimentally determined complexes!
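The Arrhenius relation from the biocatalysis slides makes the effect of a lower activation barrier easy to quantify. The sketch below assumes that the catalyzed and uncatalyzed reactions share the same pre-exponential factor A, so the speed-up depends only on the difference in Ea; the 60 and 50 kJ/mol barriers and the value of A are hypothetical numbers, not data for any real enzyme.

```python
import math

R = 8.314  # gas constant, J/(mol·K)


def arrhenius_rate(a, ea, temperature):
    """k = A * exp(-Ea / (R T)); ea in J/mol, temperature in K."""
    return a * math.exp(-ea / (R * temperature))


def rate_enhancement(ea_uncatalyzed, ea_catalyzed, temperature):
    """k_cat / k_uncat, assuming both reactions share the same pre-exponential factor A."""
    return math.exp((ea_uncatalyzed - ea_catalyzed) / (R * temperature))


# Hypothetical example: the enzyme lowers Ea by 10 kJ/mol at 298 K.
speed_up = rate_enhancement(60_000, 50_000, 298.0)
print(round(speed_up, 1))
```

Because Ea sits in an exponent, lowering the barrier by only 10 kJ/mol at room temperature already gives a speed-up of roughly 57-fold in this example.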
Protein druggability
❑ Druggability
  ▪ Likelihood that a particular protein can be modulated or targeted by a drug-like molecule in a way that leads to a therapeutic effect
  ▪ That is, the protein can bind selective, bioavailable, low-molecular-weight molecules with high affinity
❑ Lipinski's rule of 5 (for orally active drugs)
  ▪ MW ≤ 500 Da
  ▪ ≤ 5 H-bond donors (NH, OH); ≤ 10 H-bond acceptors (F, O, N)
  ▪ Partition coefficient log P(o/w) ≤ 5
  ▪ Usually one violation is acceptable
❑ Prediction of protein druggability
  ▪ By similarity to known targets: sequence of the binding domain, structural features of binding sites
  ▪ From databases of known targets
  ▪ Predictive tools: PockDrug Server, DoGSiteScorer, …
❑ Important in the target identification phase of drug discovery
❑ Unfortunately, many resources are only private or commercial

Small molecules
❑ Representation of small molecules
❑ Databases of small molecules
  ▪ Cambridge Structural Database
  ▪ PubChem database
  ▪ ZINC database
❑ Preparation of small-molecule structures

Representation of small molecules
❑ 1D: atom-based (empirical formula), e.g., C2H5Cl
❑ 2D: chemical structure diagram
  ▪ Topology or SMILES (Simplified Molecular Input Line Entry System), e.g., CCCl, C1=CC=C(C=C1)CN
❑ 3D: atomic coordinates
  ▪ Usually PDB, SDF or MOL2 files
  ▪ Beware: structures may have different protonation states
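A minimal sketch of the Lipinski rule-of-5 check listed above, assuming the four descriptors have already been computed by a cheminformatics toolkit. The `Descriptors` class and the two compounds are hypothetical; the thresholds and the one-violation tolerance are those given in the rule.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float   # Da
    h_donors: int       # NH and OH groups
    h_acceptors: int    # F, O, N atoms as counted in the rule above
    log_p: float        # octanol/water partition coefficient, log P(o/w)


def lipinski_violations(d):
    """Return the list of rule-of-5 criteria that the molecule violates."""
    checks = {
        "MW > 500 Da": d.mol_weight > 500,
        "H-bond donors > 5": d.h_donors > 5,
        "H-bond acceptors > 10": d.h_acceptors > 10,
        "log P > 5": d.log_p > 5,
    }
    return [rule for rule, violated in checks.items() if violated]


def passes_rule_of_five(d, max_violations=1):
    """Usually one violation is tolerated."""
    return len(lipinski_violations(d)) <= max_violations


# Hypothetical compounds
small = Descriptors(mol_weight=320.4, h_donors=2, h_acceptors=5, log_p=2.1)
greasy = Descriptors(mol_weight=650.0, h_donors=3, h_acceptors=8, log_p=6.2)
print(lipinski_violations(greasy), passes_rule_of_five(greasy))
```

Returning the list of violated criteria, rather than a single yes/no, makes it clear why a compound is flagged.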
Molecular docking
❑ Useful when experimental data are not available, or for virtual screening
[Figure: crystal (experimental) pose vs. docking attempts, compared by score and RMSD]
❑ Several components/steps
  ▪ Receptor representation
  ▪ Ligand representation
  ▪ Search of binding modes
  ▪ Scoring
[Figure: receptor + ligand → complex]

Receptor representation
❑ The receptor is represented only by the relevant binding site
  ▪ Descriptor representation: derived from the geometry and interaction abilities of the binding site (H-bond donors/acceptors, hydrophobic contacts, …)
  ▪ Grid representation: the entire search region is covered by orthogonal, equidistant points carrying information about chemical properties or about the interactions of a probe atom at those points with the receptor atoms; precomputing these properties speeds up the calculations
❑ Receptor flexibility
  ▪ Fully rigid approximation
  ▪ Soft docking: uses tolerant "soft" scoring functions to simulate the plasticity of an otherwise rigid receptor
  ▪ Explicit side-chain flexibility: optimization of residues by rotating part of their structure, or rotation of whole side chains using predefined rotamer libraries
  ▪ Docking to a molecular ensemble of protein structures, obtained from multiple crystal structures, from NMR structure determination or from a trajectory produced by MD simulation

Ligand representation
❑ Ligands are represented by all atoms or just some
  ▪ Non-polar hydrogens can be merged with their parent carbon atoms to reduce the number of atoms in the calculation
❑ Ligand flexibility
  ▪ Only rotation about single bonds is considered
  ▪ Docking of a library of pre-generated ligand conformations: applicable only to fairly rigid ligands, because the number of possible conformers grows exponentially with the number of rotatable bonds
  ▪ Direct sampling of the ligand conformational space during the search
  ▪ Fragment-based techniques: the ligand is cut into several fragments that are rigidly docked into the binding site

Molecular docking – search
❑ Many search algorithms are available
  ▪ Rigid docking
  ▪ Semi-flexible docking
  ▪ Fully flexible docking (but computationally demanding)
❑ Geometry-based and combinatorial algorithms
  ▪ Assume that binding is governed by shape and/or physicochemical complementarity between the ligand and the receptor
  ▪ Assume that the degree of complementarity is proportional to the binding energy, which is not always true, especially for more polar ligands
❑ Energy-driven and stochastic algorithms
  ▪ Try to locate directly the global minimum of the binding free energy, corresponding to the experimental structure
  ▪ Their random nature requires multiple independent docking runs to achieve consistent results

Molecular docking – scoring
❑ Scoring function
  ▪ Evaluates all binding modes produced by the search algorithm
  ▪ Must be computationally efficient and provide an accurate description of protein-ligand interactions
❑ Scoring functions are applied to rank
  ▪ Several configurations of one ligand bound to one protein: essential for predicting the best binding mode
  ▪ Different ligands bound to one protein: determination of substrate or inhibitor specificity
  ▪ One ligand bound to several different proteins: functional annotation of proteins and study of drug selectivity
❑ Categories of scoring functions
  ▪ Empirical
  ▪ Knowledge-based
  ▪ Force field-based
  ▪ Machine learning

Evaluation of complexes
❑ Intermolecular interactions
  ▪ Most common types: hydrogen bonds, hydrophobic, aromatic and ionic interactions
❑ Binding energies

Transport of small molecules
❑ Describe the trajectory of ligands through tunnels
❑ Based on geometry, with or without molecular docking
  ▪ Fast but low accuracy; good for screening purposes
  ▪ CaverDock, MoMA-LigPath, SLITHER
❑ Based on force fields
  ▪ Run multiple MD simulations
  ▪ Accurate but computationally demanding
  ▪ Metadynamics, steered MD, adaptive sampling, etc.
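The tunnel search described in the transport-pathway slides (a skeleton of voids between atoms, Dijkstra's algorithm driven by a cost function, and the probe radius as the lowest allowed radius) can be sketched on a small graph. In this minimal sketch the graph is hand-made rather than computed from a real Voronoi diagram, each node carries a hypothetical clearance (radius of the largest free sphere at that point), and the cost function, segment length divided by the narrower clearance, is an illustrative choice, not the one used by any particular tool.

```python
import heapq
import math


def find_tunnel(nodes, edges, start, mouths, probe_radius):
    """Cheapest pathway from a cavity node to any surface ("mouth") node.

    nodes: {node_id: (x, y, z, clearance)} with clearance in Å.
    edges: iterable of (u, v) pairs of the void skeleton.
    Nodes narrower than probe_radius are not allowed on the path.
    Returns (path, cost, bottleneck_radius) or None if no pathway exists.
    """
    def allowed(n):
        return nodes[n][3] >= probe_radius

    if not allowed(start):
        return None
    neighbours = {n: [] for n in nodes}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    best = {start: 0.0}
    previous = {}
    heap = [(0.0, start)]
    done = set()
    while heap:
        cost, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u in mouths:
            path = [u]
            while path[-1] != start:
                path.append(previous[path[-1]])
            path.reverse()
            bottleneck = min(nodes[n][3] for n in path)
            return path, cost, bottleneck
        for v in neighbours[u]:
            if v in done or not allowed(v):
                continue
            length = math.dist(nodes[u][:3], nodes[v][:3])
            # Hypothetical cost: long and narrow segments are penalised.
            step = length / min(nodes[u][3], nodes[v][3])
            if cost + step < best.get(v, math.inf):
                best[v] = cost + step
                previous[v] = u
                heapq.heappush(heap, (cost + step, v))
    return None


# Toy skeleton: a short but narrow route and a longer, wide route to the surface.
nodes = {
    "origin": (0.0, 0.0, 0.0, 2.0),
    "narrow": (2.0, 0.0, 0.0, 0.9),
    "mouth": (4.0, 0.0, 0.0, 2.0),
    "wide1": (0.0, 3.0, 0.0, 2.0),
    "wide2": (4.0, 3.0, 0.0, 2.0),
}
edges = [("origin", "narrow"), ("narrow", "mouth"),
         ("origin", "wide1"), ("wide1", "wide2"), ("wide2", "mouth")]
print(find_tunnel(nodes, edges, "origin", {"mouth"}, probe_radius=1.0))
print(find_tunnel(nodes, edges, "origin", {"mouth"}, probe_radius=0.8))
```

With a 1.0 Å probe the narrow node is disallowed and the wide route is returned; with a 0.8 Å probe the shorter route through the 0.9 Å bottleneck becomes the cheapest, showing how the probe size acts as the lowest radius threshold.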
