Protein Modeling Techniques PDF
Document Details
Uploaded by FervidShark
Summary
This document provides an overview of protein modeling techniques, including homology modeling, database searches, and model validation. It covers various methods such as sequence alignments, template selection, and different prediction programs. The document also briefly touches upon protein dynamics and related databases.
Full Transcript
Models of structures

Importance of structure
❑ no experimental structure is available for most sequences
  [Figure: number of entries (millions) by year]

Homology modeling
❑ basic principle: structure is more conserved than sequence
  ▪ similar sequences adopt practically identical structures
    (example: haloalkane dehalogenases LinB, PDB-ID 1iz7, and DhaA, PDB-ID 1cqw; sequence identity ~50 %)
  ▪ distantly related sequences still fold into similar structures
    (example: haloalkane dehalogenase LinB, PDB-ID 1iz7, and chloroperoxidase L, PDB-ID 1a88; sequence identity ~15 %)
❑ number of folds in the SCOP database
  [Figure: number of folds per year and in total, by year]
❑ builds an atomic-resolution model of the target protein based on the experimental 3D structure (template) of a homologous protein
❑ the most accurate 3D prediction approach
❑ if no reliable template is available → fold recognition or ab initio prediction
❑ the quality of the model depends on the sequence identity/similarity between the target and template proteins
❑ for a protein of standard length it should be > 25 % identity / > 40 % similarity
  [Figure: safe homology modeling zone vs. twilight zone]

Homology modelling: steps
target sequence → database search → selection of template → sequence alignment → building model framework → loop and side-chain modeling → model optimization → model validation

Database search
❑ standard sequence-similarity searches
  ▪ comparison of the target sequence to all sequences with known 3D structures in the wwPDB database
  ▪ BLAST, FASTA, ...
❑ profile-based searches
  ▪ more sensitive than standard sequence-similarity searches
  ▪ PSI-BLAST, HMMER, HHblits, ...
❑ fold recognition methods
  ▪ applied if no template can be reliably identified by the sequence- or profile-based methods (sequence identity below the recommended 25 %)
  ▪ FUGUE, GenTHREADER, pro-sp3-TASSER, ...

Selection of template
❑ wrong template = wrong model
❑ more than one possible template may be identified → a combination of different criteria is used to select the final template:
  ▪ sequence identity between the template and target protein
  ▪ coverage between the template and query sequences
  ▪ the resolution of the template structure, number of errors
  ▪ the proportion of conserved residues in the region of interest (e.g., binding-site residues)
  ▪ ...
❑ multiple templates can be used to create a combined model

Sequence alignments
❑ reliability of the alignment decreases with decreasing similarity of the target and template sequences
❑ quality of the alignment is crucial: it determines the quality of the final model
❑ the pairwise target-template alignment provided by the database search methods is almost guaranteed to contain errors → more sophisticated methods are needed
  ▪ multiple sequence alignment
  ▪ profile-driven alignments
  ▪ correction of the alignment based on the template structure

Model validation
❑ a finished model contains errors (like any other structure); the number of errors (for a given method) mainly depends on:
  ▪ the percentage of sequence identity between the template and target sequence, e.g., 90 %: accuracy of the model comparable to X-ray structures; 50-90 %: larger local errors; < 25 %: often very large errors
  ▪ the number of errors in the template structure
❑ problems that occur far from the site of interest may be ignored; others should be tackled

Homology modelling: steps
[Figure: workflow from target sequence through database search, selection of template, sequence alignment, building model framework, loop and side-chain modeling, model optimization and model validation, with iteration]

Iteration
❑ portions of the homology modeling process can be iterated to correct identified errors
  ▪ small errors introduced during the optimization → running a shorter molecular dynamics simulation
  ▪ error in a loop → choosing another loop conformation in the loop modeling step
  ▪ large mistakes in the backbone conformation → repeating the whole process with another alignment or even a different template
  ▪ ...

Homology modeling programs
❑ MODELLER
  ▪ http://salilab.org/modeller/
  ▪ models are built by satisfying spatial restraints on the Cα-Cα bond lengths and angles, the dihedral angles of the side-chains, and van der Waals interactions
  ▪ restraints are calculated from the template structures
  ▪ available as a web server at different sites, e.g., as part of the ModWeb workflow https://modbase.compbio.ucsf.edu/modweb/, the GeneSilico server https://genesilico.pl/toolkit/unimod?method=Modeller or the Bioinformatics toolkit http://toolkit.lmb.uni-muenchen.de/modeller
❑ SWISS-MODEL
  ▪ http://swissmodel.expasy.org/
  ▪ fully automated protein structure homology modeling server

Model validation
❑ mostly the same principles as used for the validation of experimental structures
❑ always check both model and template
  ▪ the model cannot improve on the template in regions where the template is "bad"
❑ checks of normality
  ▪ inside/outside distributions of polar and apolar residues
  ▪ bad contacts
  ▪ evaluation of atom/residue environment
❑ energy-based checks
  ▪ side-chain clashes
  ▪ bond lengths and angles

Model validation programs
❑ QMEAN
  ▪ https://swissmodel.expasy.org/qmean/
  ▪ composite scoring function for the quality estimation of protein structure models; evaluates torsion angles, solvation and non-bonded interactions, and the agreement between predicted and calculated secondary structure and solvent accessibility
❑ Verify3D
❑ ANOLEA
❑ PROCHECK
❑ WHATCHECK
❑ PROSA II
❑ …

Fold recognition (Threading)
❑ predicts the fold of a protein by fitting its sequence into a structural database and selecting the best-fitting fold
❑ provides a rough approximation of the overall topology of the native structure → does not generate fully refined atomic models for the query sequence
❑ can be used when no suitable template structures are available for homology modeling
❑ fails if the correct protein fold does not exist in the database
❑ high rates of false positives
❑ pairwise energy-based methods (threading): the protein sequence is searched for in a structural database to find the best-matching structural fold using energy-based criteria
  1. alignment of the query sequence with each structural fold in the fold library (essentially performed at the sequence profile level)
  2. building a crude model for the target sequence (replacing aligned residues in the template structure with the corresponding residues in the query)
  3. calculating the energy of the raw model
     [Figure: pairwise energy (kcal/mol) as a function of Cβ-Cβ distance for Glu-Asp and Glu-Arg pairs with l > 10, where l is the distance in sequence; such potentials can be calculated from collections of known structures (density normalization required)]
  4. ranking of the models based on the energetics: the lowest-energy fold represents the structurally most compatible fold

Ab initio prediction
❑ attempts to generate a structure by using physicochemical principles only
❑ used when neither homology modeling nor fold recognition can be applied
❑ searches for the structure at the global free-energy minimum
❑ so far still limited success in getting correct structures

Ab initio prediction programs
❑ Rosetta
  ▪ http://www.rosettacommons.org/
  ▪ software suite for predicting and designing protein structures, protein folding mechanisms, and protein-protein interactions

"Hybrid" 3D structure prediction programs
❑ I-TASSER
  ▪ http://zhanglab.ccmb.med.umich.edu/I-TASSER/
  ▪ combines homology modeling, threading and ab initio predictions
  ▪ No. 1 server for protein structure prediction in previous CASP experiments
❑ Robetta
  ▪ http://robetta.bakerlab.org/
  ▪ combines homology modeling and ab initio predictions
  ▪ implements the ROSETTA software

Assessment of prediction methods
❑ CASP (Critical Assessment of techniques for protein Structure Prediction)
  ▪ http://predictioncenter.org/
  ▪ biennial international contest (held every two years) providing objective evaluation of the performance of individual prediction methods
  ▪ evaluation based on a large number of blind predictions
    - contestants are given protein sequences whose structures have been solved, but not yet published
    - results of the predictions are compared with the newly determined structure
  ▪ competition in several categories
❑ CAMEO (Continuous Automated Model EvaluatiOn)
  ▪ https://www.cameo3d.org/
  ▪ weekly assessment of new structures in the PDB
  ▪ registered prediction servers are sent weekly requests on not-so-easy new structures in the weekly PDB pre-release
  ▪ multiple scores considered, normalized average (lDDT) reported
  ▪ categories:
    ▪ 3D: prediction of the 3D coordinates of a protein from sequence
    ▪ QE: model quality estimation, i.e., assessment of quality measures reported by participant servers on real 3D structures

Databases of protein models
❑ ModBase
  ▪ http://modbase.compbio.ucsf.edu/modbase-cgi/index.cgi
  ▪ database of annotated protein models generated by an automated pipeline that includes the MODELLER program
  ▪ contains ~38 million models for ~6.5 million unique sequences

Protein folding, stability and dynamics

Levinthal's paradox
❑ Cyrus Levinthal
  ▪ 1968: impossibility of random folding
  ▪ random folding
    ▪ 100-residue protein (average sized)
    ▪ 3 conformations per residue (in reality many more)
    ▪ 0.1 ps sampling time per conformation (in reality much longer)
    ▪ folding time = 3^100 × 10^-13 s ≈ 5 × 10^34 s ≈ 1 634 251 397 552 039 990 billion years
❑ Experimental folding rates
  ▪ 1 ms to 10 min

Anfinsen's thermodynamic hypothesis
❑ Christian Anfinsen
  ▪ 1973: protein folding in vitro
  ▪ refolding of ribonuclease
❑ Findings
  ▪ the native structure of a protein is the thermodynamically stable structure
  ▪ folding depends only on the amino acid sequence and on the conditions of the solution, and not on the kinetic folding route

Mechanisms of protein folding
❑ Nucleation-growth (propagation) model
  ▪ continuous growth of tertiary structure from an initial nucleus of local secondary structure
  ▪ it did not account for folding intermediates -> model dismissed
❑ Framework model
  ▪ secondary structure folds first -> coalescence of secondary structural units to the native protein
❑ Hydrophobic collapse model
  ▪ compaction of the protein -> folding in a confined volume -> narrowing the conformational search to the native state
❑ Nucleation-condensation model
  ▪ concerted & cooperative secondary and tertiary structure formation
  ▪ the transition state resembles a distorted form of the native structure
  ▪ the least distorted part is called the folding nucleus or molten globule

Energetics of protein folding
❑ Free energy of folding (ΔG_fold = ΔH - T·ΔS)
  ▪ protein more structured -> ΔS↓: unfavorable
  ▪ solvent less structured -> ΔS↑: favorable
  ▪ hydrophobic interactions are the driving "force"
  ▪ more non-covalent interactions -> ΔH↓: favorable

Basics of protein stability
❑ Tertiary structure of protein
  ▪ sum of non-covalent weak interactions vs conformational entropy
  ▪ folded protein = thermodynamic compromise
  ▪ folded protein is only marginally more stable than unfolded (10-80 kJ/mol)
  ▪ weak interactions are frequently disrupted
  ▪ denaturation: disrupted bonds are replaced by bonds with solvent
  ▪ dynamics: disrupted bonds are reformed between protein atoms

Introduction to protein dynamics
❑ Origin of dynamics: disruption of weak interactions by
  ▪ thermal kinetic energy (k_B·T)
  ▪ binding interactions (ligands or other proteins): induced fit
❑ Protein atoms fluctuate around their average positions
  ▪ in the tightly packed interior: movement restricted
  ▪ near the surface: movement promoted by solvent movements
  ▪ -> proteins considered as "semi-liquids"

Time scales of protein motions
❑ Time scales are governed by the local environment
  ▪ interior: motions coupled due to packing restraints
  ▪ surface: no coupling of motions
❑ Example: aromatic ring flipping
  ▪ can occur on the ps time scale, but is often observed on the ms time scale
  ▪ aromatic residues -> hydrophobic -> inside protein -> tightly packed
  ▪ -> low probability of synchronized movement of surrounding atoms
  ▪ -> prolonged time scale

NMR spectroscopy
❑ Ensemble of possible low-energy conformations
❑ Directly shows possible amplitudes of motion
❑ Limited applicability to larger proteins
❑ Does not describe
  ▪ very fast motions & transition states
  ▪ time scales & energetics of motions

High resolution X-ray crystallography
❑ Average low-energy structure; more conformations:
  ▪ in one structure only if both are separated by a barrier
  ▪ in multiple structures
❑ Crystalline state
  ▪ non-native contacts
  ▪ artificially lower amplitudes of motions
❑ Range of fluctuations: B-factors
❑ Does not describe
  ▪ very flexible regions
  ▪ collectiveness of motions
  ▪ time scales & energetics of motions

Normal mode analysis
❑ Principle
  ▪ motion of the system as harmonic vibration around a local minimum
  ▪ coarse-grained model, residues connected with springs
❑ Small number of low-frequency normal modes
  ▪ shows directionality, collectiveness and sequence of global motions
❑ Does not describe
  ▪ local movements
  ▪ amplitudes & time scales
  ▪ energetics of motions

Molecular dynamics
❑ Principle
  ▪ physical description of interactions within the system (force field)
  ▪ Newton's laws of motion
❑ Provides information on energetics, amplitudes, and time scales of local motions on the atomic level
❑ Does not describe
  ▪ slower large-scale motions (> ms)

Databases of dynamics
❑ Molecular Dynamics Extended Library (MoDEL)
❑ Dynameomics
❑ Molecular Movements Database (MolMovDB)
❑ ProMode-Elastic
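
The molecular dynamics principle stated earlier (a force field plus Newton's laws of motion) can be illustrated with a toy example. The sketch below is not taken from the slides: it assumes a single particle on a harmonic spring as a stand-in for a real force field, arbitrary units, and the velocity Verlet integrator, which is one common choice among several. The names `harmonic_force`, `velocity_verlet` and `total_energy` are hypothetical helpers introduced only for this illustration.

```python
import math

def harmonic_force(x, k=1.0):
    """Toy 'force field' (assumption): one harmonic spring, F = -k * x."""
    return -k * x

def velocity_verlet(x, v, mass, dt, n_steps, force=harmonic_force):
    """Integrate Newton's equation of motion (F = m * a) step by step."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory

def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the toy system."""
    return 0.5 * mass * v * v + 0.5 * k * x * x

# Start displaced by 1.0 with zero velocity; arbitrary units throughout.
dt, n_steps = 0.01, 1000
traj = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=dt, n_steps=n_steps)

e_start = total_energy(*traj[0])
e_end = total_energy(*traj[-1])
x_exact = math.cos(dt * n_steps)  # analytic solution for mass = k = 1

print(f"energy drift: {abs(e_end - e_start):.2e}")
print(f"position error vs. analytic solution: {abs(traj[-1][0] - x_exact):.2e}")
```

A real simulation replaces the single spring with a full force field over all atoms and needs a time step on the order of femtoseconds, which is consistent with the note above that slower large-scale motions (> ms) are out of reach.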