Macromolecular Complexes and Interactions PDF

Macromolecular complexes and interactions

Protein-protein complexes
❑ Two or more polypeptide chains (protomers) may associate into an oligomer
❑ Protein-protein and protein-nucleic acid interactions are essential for every cellular process
  ▪ Metabolism
  ▪ Transport
  ▪ Signal transduction
  ▪ Genetic activity (transcription, translation, replication, repair, ...)
  ▪ Membrane trafficking
  ▪ Mobility
  ▪ …

Protein-protein complexes
❑ Obligate complexes
  ▪ Protomers (individual polypeptides) do not function as independent structures, only when associated
  ▪ Examples: GABA receptors, ATP synthase, many ion channels, ribosome, etc.
  [Figure: GABAB receptor]
❑ Non-obligate complexes
  ▪ Protomers can exist and be functional as independent structures
  ▪ Examples: hemoglobin, beta-2 adrenergic receptor, insulin receptor, etc.

Protein-protein complexes
❑ Do individual subunits retain some activity?
  ▪ YES → non-obligate
  ▪ NO → obligate

Protein oligomerization
❑ Oligomerization is common
  ▪ 75 % of proteins in a cell are oligomers
  ▪ Homo-oligomers are the most common
  ▪ Some proteins exist solely in the oligomeric state
❑ Often symmetric
❑ Oligomerization interfaces are complementary
❑ Favored by evolution

Advantages of oligomerization
Why do proteins form oligomers?

Advantages of oligomerization
❑ Morphology
  ▪ More complex structures are often required for multiple functions (e.g. membrane pores)
❑ Cooperativity
  ▪ Allostery (modulation of biological activity)
  ▪ Multivalent binding
❑ Stability against denaturation
  ▪ Smaller surface area
❑ Redundancy and error control
  ▪ E.g. protein translation control

Oligomerization interface
❑ Characteristics of the oligomeric interface
  ▪ Large surface area (> 1400 Å²)
  ▪ Tendency toward a circular and planar shape (not for obligate complexes)
  ▪ Some residues protrude from the surface
  ▪ More non-polar residues (about 2/3) than in other parts of the surface
  ▪ More polar residues (about 1/5) than in protein cores
  ▪ About 1 H-bond per 200 Å²
❑ “Hot-spot” residues
  ▪ Responsible for most of the oligomeric interactions
  ▪ More evolutionarily conserved than other surface residues
  ▪ Frequently polar residues, located near the center of the interface

Oligomerization vs Aggregation

  Oligomerization                       Aggregation
  Oligomers are soluble                 Aggregates are insoluble
  Precise fold                          Can be heterogeneous
  Proteins are native (not denatured)   Denatured proteins aggregate (temperature, pH, salt…)
  Reversible (sometimes)                Irreversible

The function of some proteins is to aggregate. Aggregates ≠ pathology

Non-pathological aggregates
❑ Keratin filaments (hair, skin, nails)
❑ HET-s (fungal reproduction and apoptosis)
[Figure: structures with PDB codes 6EC0 and 6JFV]
Daskalov et al., 2021, Front. Mol. Neurosci.

Pathological aggregates
❑ Amyloid β from human brain (involved in Alzheimer’s disease)
[Figure: amyloid β fibrils, scale bar 50 nm; β-solenoid; two different morphologies (I and II); * marks the transition from I to II]
Kollmer et al., 2019, Nat Commun

Pathological aggregates
❑ Amyloid β from human brain (involved in Alzheimer’s disease)
❑ Has non-pathological functions too!
  ▪ Blood-brain barrier maintenance
  ▪ Anti-microbial peptide
  ▪ Synapse function
  ▪ …
Bishop and Robinson, 2024, Drugs Aging.
Wang et al., 2021, Front. Cell. Neurosci.

Protein-nucleic acid complexes
❑ Protein-nucleic acid interactions
  ▪ Non-specific – electrostatic interactions with the negative charge on the nucleic acid backbone -> Lys and Arg residues
  ▪ Specific – recognition of particular nucleotide sequences
    ▪ Major groove – B-DNA
    ▪ Minor groove – A-DNA or A-RNA
    ▪ Single-stranded RNA
❑ Typical interfaces/motifs
  ▪ DNA binding proteins
  ▪ RNA binding proteins

Protein-nucleic acid complexes
❑ DNA binding proteins
  ▪ Helix-turn-helix
    ▪ Positively charged (+) side chains
    ▪ ≈ perpendicular helices
    ▪ Recognises the major groove
  ▪ Zinc finger
    ▪ Zn²⁺ stabilized by Cys and His residues
    ▪ Zn²⁺ is essential for folding
    ▪ Zn²⁺ mediates DNA binding

Protein-nucleic acid complexes
❑ RNA binding proteins
  ▪ RNA recognition motif (RRM): βαββαβ barrel-like arrangement, sequence-specific RNA recognition
  ▪ K-homology (KH) domain: ssRNA/DNA binding through H-bonds, electrostatic and shape complementarity
  ▪ Pumilio repeat domain (PUF): each helix recognizes a single base

Quaternary structure in PDB database
❑ Asymmetric unit (ASU)
  ▪ Macromolecular structures from X-ray crystallography are deposited in the PDB as a single asymmetric unit
  ▪ The smallest portion of a crystal structure to which symmetry operations can be applied in order to generate the unit cell
❑ Unit cell (crystal unit)
  ▪ The basic unit of a crystal that, when repeated in three dimensions, can generate the entire crystal

Crystalline environment
❑ Crystal contacts
  ▪ Intermolecular contacts solely due to protein crystallization
  ▪ Cause artifacts of crystallization
❑ Crystal packing – complicates identification of the native quaternary structure
[Figure: crystal unit (CU)]
[Figure: asymmetric unit (ASU)]

Crystalline environment
❑ Artifacts of crystallization
  ▪ Concerns about the conformation of some surface regions
  ▪ Often loops or side chains are affected
  ▪ Can complicate the evaluation of the effects of mutations

Quaternary structure in PDB database
❑ Biological unit
  ▪ The functional form of a protein in nature
  ▪ Also called: functional unit, biological assembly, quaternary structure
  ▪ Can depend on the environment, post-translational modifications of proteins and their mutations
[Figure: hemoglobin heterotetramer]

Biological versus asymmetric unit
❑ The biological unit can consist of:
  ▪ Multiple copies of the ASU
  ▪ One copy of the ASU
  ▪ A portion of the ASU

Complex or artifact?
❑ Problem
  ▪ Most proteins in the PDB have three or more crystal contacts that sum up to 30% of the protein's solvent-accessible surface area
  ▪ How to distinguish biologically relevant contacts from crystal contacts?

Complex or artifact?
❑ Experimental knowledge of the oligomeric state helps to identify the structure of the native complex
  ▪ Search the literature
  ▪ Experimental methods
    ▪ Gel filtration, static or dynamic light scattering, analytical ultracentrifugation, native electrophoresis, …
❑ How to get the structure of a biological unit?
  ▪ Author-specified assembly
  ▪ Databases
  ▪ Predictive tools
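
When the biological unit consists of multiple copies of the ASU, each copy is obtained by applying a symmetry operation (a rotation plus a translation) to the ASU coordinates. The following is a minimal Python sketch of that idea only. The function names, the toy coordinates and the two-fold operator are illustrative assumptions, not taken from any PDB entry or tool.

```python
# Minimal sketch: generate copies of an ASU by applying x' = R*x + t.
# All names and numbers here are hypothetical illustrations.

def apply_operator(coords, rotation, translation):
    """Apply a 3x3 rotation matrix and a translation vector to each (x, y, z)."""
    transformed = []
    for point in coords:
        transformed.append(tuple(
            sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return transformed


def build_assembly(asu_coords, operators):
    """Return one transformed copy of the ASU per (rotation, translation) pair."""
    return [apply_operator(asu_coords, rot, trans) for rot, trans in operators]


IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
TWOFOLD_Z = ((-1, 0, 0), (0, -1, 0), (0, 0, 1))  # 180 degree rotation about z

# Toy "ASU" with two atoms (coordinates in angstroms, invented for the example).
asu = [(1.0, 2.0, 3.0), (4.0, 0.0, -1.0)]

# A hypothetical dimer: the ASU itself plus one rotated and shifted copy.
dimer = build_assembly(asu, [
    (IDENTITY, (0.0, 0.0, 0.0)),
    (TWOFOLD_Z, (10.0, 0.0, 0.0)),
])
```

Here `dimer[0]` reproduces the ASU and `dimer[1]` is its symmetry mate. A real assembly would take its operators from the deposited structure rather than from hand-written matrices.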

Author-specified assembly
❑ REMARK 350 in the header of the PDB file
  ▪ Contains symmetry operations to reconstruct the biological unit, but…
❑ Verify the author-proposed biological unit by other means
  ▪ Sometimes the specific oligomers were not known at the time the ASU was published
  ▪ Some authors may have failed to specify the biological unit even when it was known
  ▪ Rarely, the specified biological unit might be incorrect
❑ Employed by
  ▪ RCSB PDB and other tools

Prediction of 3D structure of complexes
Discovering and characterising macromolecular complexes requires heavy experimentation
How can we predict macromolecular complexes?

Prediction of 3D structure of complexes
❑ Homology-based predictions
❑ Machine learning-based predictions
❑ Macromolecular docking

Homology-based methods
❑ A protein complex is built based on a similar protein complex with a known 3D structure
❑ Assumes that the interaction information can be extrapolated from one complex structure to close homologs of the interacting proteins
  ▪ Close homologs (≥ 40% sequence identity) almost always interact in the same way (if they interact with the same partner)
  ▪ At lower sequence identity, sequence similarity is only rarely associated with a similarity in interactions
❑ Limited applicability (low number of templates)

Macromolecular docking
❑ Prediction of the best bound state for given 3D structures of two or more macromolecules
❑ Difficult task
  ▪ Large search space – many potential ways in which macromolecules can interact
  ▪ Flexibility of the macromolecular surface and conformational changes upon binding
❑ Can be facilitated by prior knowledge
  ▪ Ex: known binding site → significant restriction of the search space
  ▪ Distance constraints on some residues

Macromolecular docking
❑ 3 main parameters:
  ▪ Macromolecule representation
  ▪ Search algorithm
  ▪ Scoring function

Macromolecule representation
❑ Representation of the macromolecular surface (applicable to both receptor and ligand)
  ▪ Geometrical descriptors of shape (set of spheres, surface normals, vectors radiating from the center of the molecule, ...)
  ▪ Discretization of space: grid representation

Macromolecule representation
❑ Macromolecule flexibility
  ▪ Fully rigid approximation
  ▪ Soft docking – employs tolerant “soft” potential scoring functions to simulate the plasticity of an otherwise rigid molecule
  ▪ Explicit side-chain flexibility – optimization of residues by rotating part of their structure, or rotation of whole side chains using predefined rotamer libraries
  ▪ Docking to a molecular ensemble of protein structures – composed from multiple crystal structures, from NMR structure determination or from a trajectory produced by MD simulation

Macromolecule representation
❑ Macromolecule flexibility
  ▪ Rigid body docking – basic model that considers the two macromolecules as two rigid solid bodies
  ▪ Semiflexible docking – one of the molecules is rigid, and one is flexible (typically the smaller one)
  ▪ Flexible docking – both molecules are considered flexible

Macromolecular docking - search
❑ Generally based on the idea of complementarity between the interacting molecules (geometric, electrostatic or hydrophobic contacts)
❑ The main problem is the dimension of the conformational space to be explored:
  ▪ Rigid docking: 6D (hard)
  ▪ Flexible docking: 6D + N_fb (impossible!)
❑ Information on the rough location of the binding surface (experimental or predicted) → reduction of the search space

Macromolecular docking - search
❑ Exhaustive search
  ▪ Full search of the conformational space: try every possible relative orientation of the two molecules
  ▪ Computationally very expensive – 6 degrees of freedom for rigid molecules (translations + rotations)
  ▪ Grid approaches

Macromolecular docking - search
❑ Stochastic methods
  ▪ Monte Carlo
  ▪ Genetic algorithms
  ▪ Brownian dynamics
  ▪ ...

Macromolecular docking - scoring
❑ Scoring functions
  ▪ Evaluation of a large number of putative solutions generated by the search algorithms
  ▪ Methods often use a two-stage ranking
    1. Approximate and fast-to-compute function – used to eliminate very unlikely solutions
    2. More accurate function – used to select the best among the remaining solutions

Macromolecular docking - scoring
❑ Scoring functions
  ▪ Empirical
  ▪ Knowledge-based
  ▪ Force field-based
  ▪ Clustering-based – the presence of many similar solutions is taken as an indication of correctness (all solutions are clustered, and the size of each cluster is used as a scoring parameter)

Analysis of macromolecular complexes
❑ Binding energy
❑ Macromolecular interface
❑ Interaction hot spots

Binding energy
❑ FastContact
  ▪ http://structure.pitt.edu/servers/fastcontact/
  ▪ Rapidly estimates the electrostatic and desolvation components of the binding free energy between two proteins
  ▪ Additionally, evaluates the van der Waals interactions using CHARMM and reports the contribution of individual residues and pairs of residues to the free energy → highlights the interaction hot spots

Macromolecular interface
❑ The region where two protein chains, or a protein chain and a nucleic acid chain, come into contact
❑ Can be identified by analysis of the 3D structure of the macromolecular complex

Interface analysis
❑ Provides information about basic features of the interactions in macromolecular complexes (e.g., shape complementarity, chemical complementarity, ...)
❑ Provides information about interface residues
❑ The acquired information is useful for a wide range of applications
  ▪ Design of mutants for experimental verification of the interactions
  ▪ Development of drugs targeting macromolecular interactions
  ▪ Understanding the mechanism of molecular recognition
  ▪ Computational prediction of interfaces and complex 3D structures
  ▪ ...

Interface analysis
❑ Most common approaches for the definition of interfaces:
  ▪ Methods based on the distance between interacting residues
  ▪ Methods based on the change in the solvent accessible surface area (ASA) upon complex formation
  ▪ Computational geometry methods (using Voronoi diagrams)
❑ All three approaches provide very similar results

Interface analysis - databases
❑ PDBsum (Pictorial database of 3D structures in the Protein Data Bank)
  ▪ http://www.ebi.ac.uk/pdbsum/
  ▪ Provides numerous structural analyses for all PDB structures and AlphaFold DB (human proteins), including information about protein-protein and protein-nucleic acid interfaces
  ▪ Protein-protein interactions – schematic diagrams of all protein-protein interfaces and corresponding residue-residue interactions
  ▪ Protein-nucleic acid interactions – schematic diagrams of protein-nucleic acid interactions generated by NUCPLOT

Interface analysis - tools
❑ Analyze the interface of a given macromolecular complex
  ▪ PISA (Protein Interfaces, Surfaces and Assemblies)
  ▪ MolSurfer
  ▪ Contact Map WebViewer
  ▪ PIC (Protein Interaction Calculator)
  ▪ …

Interaction hotspots
❑ Hot spots: the residues contributing the most to the binding free energy of the complex
❑ Knowledge of hot spots has important implications for:
  ▪ Understanding the principles of protein interactions (an important step to understand recognition and binding processes)
  ▪ Design of mutants for experimental verification of the interactions
  ▪ Development of drugs targeting macromolecular interactions
  ▪ ...

Interaction hotspots
❑ Hot spots are usually conserved and appear to be clustered in tightly packed regions in the center of the interface
❑ Experimental identification by alanine scanning mutagenesis
  ▪ If mutating a residue to alanine causes a significant drop in binding affinity, the residue is labeled as a hot spot
❑ Experimental identification of hot spots is costly and cumbersome → computational prediction of hot spots can help!

Prediction of hotspots - tools
❑ Most of the available methods are based on the 3D structure of the complex
❑ Knowledge-based methods
  ▪ Combination of several physicochemical features
  ▪ (Evolutionary conservation, ASA, residue propensity, structural location, hydrophobicity, ...)
❑ Energy-based methods
  ▪ Calculation of the change in the binding free energy (ΔΔG_bind) of the complex upon in silico modification of a given residue to alanine

Engineering of protein structures

Overview of mutations
❑ Types
▪ Point mutations – a single nucleotide is changed in DNA (or RNA)
  ▪ Substitutions
    ▪ Single nucleotide polymorphism (SNP – pronounced “snip”)
    ▪ Genetic variation; occurs in > 1 % of the population
    ▪ About 10,000,000 in the human genome
  ▪ Insertions or deletions
    ▪ Codons are triplets (3 nucleotides → 1 amino acid)
    ▪ Potential for frameshift (change in the grouping of codons, resulting in a different translation)
    ▪ Can be very deleterious
▪ Other types (duplications, translocations, inversions, etc.)

Point mutations at protein level
❑ Types of point mutations
▪ Silent (synonymous SNP) – no effect on protein sequence
▪ Missense (non-synonymous SNP) – substitution of an amino acid
▪ Nonsense – introduction of a stop codon -> protein truncation

Databases of mutations
❑ Human Genome Variation Society
▪ http://www.hgvs.org
▪ Lists all the available databases of human mutations by type
❑ Central mutation databases (>20)
▪ Substitutions in all genes
▪ Variability in protein sequences
▪ Data mainly from literature
❑ Locus-specific databases (about 700)
▪ Substitutions in specific genes
▪ Typically manually annotated

Central mutation databases
❑ UniProtKB/Swiss-Prot
▪ http://www.uniprot.org/UniProtKB/
▪ High-quality manually annotated protein entries with partial lists of known sequence variants

Missense mutations
What are they?... How can they affect proteins?
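
The protein-level classification of point mutations (silent, missense, nonsense) follows directly from the genetic code, and can be sketched in a few lines of Python. This is only an illustration: the function name `classify_point_mutation` is hypothetical, the standard genetic code is assumed, and the "stop-lost" label for a mutated stop codon is an addition that the slides do not cover.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_mutation(ref_codon, alt_codon):
    """Classify a single-nucleotide substitution within one codon."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    differences = sum(a != b for a, b in zip(ref_codon, alt_codon))
    if len(ref_codon) != 3 or len(alt_codon) != 3 or differences != 1:
        raise ValueError("expected two codons differing in exactly one base")

    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"      # same amino acid (or stop): no change in the protein
    if alt_aa == "*":
        return "nonsense"    # new stop codon: truncated protein
    if ref_aa == "*":
        return "stop-lost"   # not covered in the slides; included for completeness
    return "missense"        # a different amino acid


examples = {
    "GAA>GAG": classify_point_mutation("GAA", "GAG"),  # Glu -> Glu
    "GAA>GTA": classify_point_mutation("GAA", "GTA"),  # Glu -> Val
    "TAC>TAA": classify_point_mutation("TAC", "TAA"),  # Tyr -> stop
}
```

Insertions and deletions are deliberately left out, because they can shift the reading frame and cannot be judged one codon at a time.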

Missense mutations
❑ Mutations affecting structure
▪ Stability & folding
▪ Aggregation
❑ Mutations affecting function
▪ Binding & catalysis
▪ Transport processes
▪ Protein dynamics
▪ Protein localization

Mutations affecting structure
❑ Major pathogenic consequences of missense mutations
▪ Compromised folding – the protein has modified folds or populates more unfolded states
▪ Decreased stability – the lifetime of the protein is decreased
▪ Increased aggregation

Mutations affecting structure
❑ Molecular basis of mutations affecting folding & stability
▪ Introduced clashes – common for small-to-large mutations in buried residues
▪ Loss of interactions – most pronounced effects related to H-bonds, salt bridges and aromatic interactions

Mutations affecting structure
❑ Molecular basis of mutations affecting folding & stability
▪ Altered conformation of the protein backbone – mutations concerning residues with specific backbone angles (especially glycine and proline)
  NOTE: Glycine – the most flexible amino acid; Proline – the most rigid
▪ Changes in charge/hydrophobicity
  ▪ Introducing a hydrophilic/charged residue into the protein core
  ▪ Introducing a hydrophobic residue onto the protein surface

Mutations affecting structure
❑ Mutations can reduce solubility or increase aggregation
▪ Alterations of surface residues may affect the solubility (ex: reduction of charge)
▪ Hydrophobic mutations can increase protein aggregation
▪ Aggregating proteins usually have a high content of β-structure
❑ Aggregation is modulated by short specific sequences
▪ Aggregation-prone regions (APRs) are sequences of 5-15 hydrophobic residues
▪ They tend to stack and form amyloid fibrils (cross-β spines)
▪ Some mutations can increase the propensity to form such amyloid structures

Mutations affecting function
❑ Effect on binding and catalysis
▪ Binding sites are tuned to bind specific molecules and stabilize transition states
▪ Mutations can disrupt or improve binding and catalysis
❑ Example – drug resistance of HIV-1 protease mutants
▪ Loss of interactions with inhibitors
  ▪ Ile84Val: 3x higher Ki
  ▪ Ile50Val: 37x higher Ki
  (Ile84Val means: isoleucine at position 84 was mutated to valine)

Mutations affecting function
❑ Effect on ligand transport
▪ Pathways are adjusted to permit the transport of specific molecules
▪ Mutations can speed up or disrupt the transport, or allow the transport of different molecules
▪ Example: Leu177Trp => the tunnel becomes almost closed; release of products is 500x slower

Mutations affecting function
❑ Effect on protein dynamics
▪ Dynamics enables proteins to adapt to their binding partners and to interchange between conformations
▪ Mutations can:
  ▪ Make regions more rigid (targeting hinge or very mobile regions, ex.: loops) -> reduced adaptability
  ▪ Increase the flexibility of rigid regions (targeting residues with many contacts in mobile elements) -> increased adaptability
▪ These changes may affect activity, specificity or even recognition

Mutations affecting function
❑ Effect on protein localization
▪ After translation, the protein must be translocated to the appropriate cellular compartment
▪ Translocation can be regulated by short sequences (signal peptides) on the N-terminus, by translocation complexes, chaperones, etc.
▪ Mutations can disrupt or alter the signal, or complex formation -> the protein fails to be transported to the correct subcellular location
  ▪ Missing protein -> inactive reaction pathways or unregulated signaling cascades
  ▪ Mislocalized protein -> active in the wrong cellular compartment, causing harmful effects

Identification of mutable residues
What are these?

Identification of mutable residues
❑ The effect of a mutation on the protein can be predicted directly from the role of the modified residue
❑ Mutation of evolutionarily conserved residues
▪ Residues important for protein function or stability tend to be highly conserved over evolution
▪ Mutation of highly conserved residues -> often leads to destabilization or loss of function
▪ Mutation of highly variable residues -> often neutral

Identification of mutable residues
❑ Mutations affecting stability & folding
▪ Mutation of residues with many contacts or with favorable interaction energy -> often destabilizing or compromising folding
▪ Mutation of residues in the protein core -> often destabilizing
  ▪ Small residue to large -> steric clashes
  ▪ Large to small -> loss of contacts (creation of a void)
  ▪ Polar to non-polar -> loss of H-bond
  ▪ Neutral to charged -> introduction of an isolated charge
▪ Mutation of residues on the protein surface (often neutral)
  ▪ Polar to hydrophobic -> desolvation penalty (destabilizing)
▪ Mutation involving proline or glycine -> altered conformation

Identification of mutable residues
❑ Mutations affecting function
▪ Mutation of residues in binding or active sites -> modified binding or catalysis
▪ Mutation of residues in transport pathways -> modified transport
▪ Mutation of hinge or mobile residues, residues on loops with many contacts -> modified flexibility
▪ Mutation of residues directing protein localization -> mislocalization of proteins

Prediction of effects on structure
❑ Prediction of mutant structures – general workflow
▪ Mutated residue and its surroundings are represented by rotamers from a rotamer library (conformations derived from X-ray structures)
▪ The best set of rotamers is selected by a Monte Carlo approach
▪ Optionally – energy minimization, backbone flexibility
▪ Comparing the structures of the mutant and the native protein -> assessment of the mutational effect (ΔΔG = ΔG_Mut - ΔG_Native)
❑ Available tools
▪ Geometric: PyMOL; WhatIF
▪ Energy-based: FOLDX, Rosetta-ddG
▪ Homology: Swiss Model, MODELLER, etc.

Prediction of pathogenicity
❑ Prediction of the impact of a mutation on protein function
▪ Tools employ machine learning approaches
▪ Trained on functional experimental data
▪ Predictions can be based on sequence only
▪ Qualitative results – i.e. deleterious versus neutral
▪ Primarily intended for prediction of pathogenicity (mutations leading to disease)
❑ Available tools
▪ MutPred, SNAP, PhD-SNP, SIFT, MAPP …
▪ PredictSNP – meta server combining a pipeline of many tools

Rational design of proteins
❑ Protein engineering: sometimes we can use mutagenesis to rationally design proteins according to our needs
❑ Properties that can be modified by mutagenesis
Such as?...

Rational design of proteins
❑ Protein engineering: sometimes we can use mutagenesis to rationally design proteins according to our needs
❑ Properties that can be modified by mutagenesis
▪ Stability
▪ Function
  ▪ Binding site (catalytic activity or substrate specificity)
  ▪ Macromolecular interface
  ▪ Molecular tunnels/channels
▪ Solubility

Rational design: stability
❑ Prediction of stability change upon mutation
▪ Structure of the mutant protein may not be produced
▪ Tools often employ
  ▪ Empirical scoring functions
  ▪ Evolutionary conservation analysis (ex: back-to-consensus)
  ▪ Machine learning approaches
❑ Available tools
▪ Energy-based: Rosetta-ddG, FOLDX
▪ Evolution-based: FireProtASR
▪ Hybrid approaches: FireProt, PROSS

Rational design: function
❑ RosettaDesign
▪ http://rosettadesign.med.unc.edu/
▪ Monte Carlo sampling (random search) to predict the minimum-energy structure of mutants
▪ Predicts free energy changes upon mutations (ΔΔG)
▪ Helps design mutations to optimize the binding site and increase interactions with a ligand/substrate

Rational design: solubility
❑ Aggrescan3D; SoluProt (see lecture 7 - Analysis of protein structures)
❑ SolubiS
▪ https://solubis.switchlab.org/
▪ Identifies stabilizing mutations that reduce the aggregation tendency of a protein
▪ 1) Identifies exposed APRs
▪ 2) Introduces “gatekeeper” residues (P, R, K, D and E) into APRs
▪ 3) Assesses the stability changes of mutations (ΔΔG)
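
The first two steps of the workflow above (find APRs, then propose gatekeeper substitutions) can be illustrated with a toy, sequence-only Python sketch. This is not the SolubiS method. It assumes that an APR is simply a run of at least 5 consecutive residues from a hand-picked hydrophobic set, it ignores solvent exposure (step 1) and it does not compute ΔΔG (step 3). The names `find_aprs` and `propose_gatekeepers` and the example sequence are hypothetical.

```python
import re

# Assumption for this sketch: residues treated as "hydrophobic".
HYDROPHOBIC = "AVILMFWY"
# Gatekeeper residues as listed on the slide.
GATEKEEPERS = "PRKDE"


def find_aprs(sequence, min_len=5):
    """Return (start, end) index pairs of hydrophobic runs of at least min_len.

    Indices are 0-based and end is exclusive, as in Python slicing.
    """
    pattern = re.compile(f"[{HYDROPHOBIC}]{{{min_len},}}")
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]


def propose_gatekeepers(sequence, start, end):
    """Suggest gatekeeper substitutions at the central residue of one APR.

    Mutations are written in the usual one-letter notation, e.g. L8P,
    with 1-based residue numbering.
    """
    center = (start + end - 1) // 2
    wild_type = sequence[center]
    return [
        f"{wild_type}{center + 1}{gatekeeper}"
        for gatekeeper in GATEKEEPERS
        if gatekeeper != wild_type
    ]


# Invented example sequence containing one hydrophobic stretch (VILVAF).
seq = "MKTDEVILVAFGSKRE"
aprs = find_aprs(seq)
candidates = [propose_gatekeepers(seq, start, end) for start, end in aprs]
```

For this sequence the single detected stretch spans residues 6 to 11, and the candidates target Leu8. A realistic pipeline would then rank such candidates by predicted stability change before any of them is tested experimentally.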
