Lecture 1 - Protein Structure PDF
Imperial College London
Mike Sternberg
Summary
This document is a lecture on protein structure. The lecture covers primary, secondary, tertiary, and quaternary protein structures. It discusses amino acid residues, protein folding, and important concepts in bioinformatics.
Full Transcript
Bioinformatics – Mike Sternberg

Table of Contents
Lecture 1 – Protein Structure
Lecture 2 – Protein Sequence Analysis
Lecture 3 – Protein Structure Prediction
Lecture 4 – Structural Bioinformatics

Lecture 1 – Protein Structure

Hierarchy of Protein Structure: Primary -> Secondary -> Tertiary -> Quaternary

Protein Backbone

Chirality of Amino Acid Residue
The Cα is chiral: look down the bond from the H to the Cα and spell CO-R-N clockwise for the L-form.

Amino Acid Residues
- Glycine is much more flexible than, for example, alanine, as it has an H rather than a side chain.
- Proline has less backbone flexibility than any other amino acid because its side chain is covalently bonded to the amide nitrogen. Proline therefore imparts rigidity to the chain and has fewer degrees of freedom for rotation.
- Cysteine forms the only covalent crosslink, the disulphide bridge (the linked pair of cysteines is called cystine), which is highly conserved.

Protein Primary Structure
Definition: a polymer of amino acid residues whose chemical formula is termed the amino acid sequence. It consists of a fixed main chain with variable side chains. Every amino acid residue has a hand (usually the L-form), as there are generally four different chemical groups attached to the alpha carbon (Cα), which makes it chiral.

Non-Bonded Interactions
- Ionic bond
- Van der Waals interactions
- Hydrogen bond

Thermodynamics of Protein Folding
ΔG = ΔH − TΔS
Where:
- ΔG: free energy of folding
- ΔH: enthalpy (e.g. electrostatics and packing)
- T: temperature
- ΔS: entropy (systems favour disorder)
Folding is favourable when ΔG is negative.

Packing and Hydrophobic Interactions
- All atoms prefer to pack as touching hard spheres; this packing is described by van der Waals interactions.
- Groups of CH atoms often have little charge and are termed hydrophobic/non-polar.
- It is energetically favourable for hydrophobic groups to pack together to avoid contact with solvent. This hydrophobic effect is the main effect favouring the folded protein.

Hydrophobic Effect
Bulk water can adopt many conformations while making hydrogen bonds, so its disorder is high. It is unfavourable for water to pack against non-polar residues, because this restricts the orientations available to the water next to the non-polar residue. Adding non-polar residues therefore freezes out degrees of freedom of the water and is entropically unfavourable.

Electrostatic Interactions
- +ve polar charge: on the H of the main-chain NH group, and on the H of NH groups of some side chains.
- -ve polar charge: on the O of the main-chain CO group, and on the O of CO groups of some side chains.
A favourable +…- interaction between partial charges is called a hydrogen bond; between fully charged side chains it is called a salt bridge.

Energetics of Electrostatic Effects
The formation of electrostatic interactions is at best only marginally favourable in the folded protein.
Unfolded chain, with two hydrogen bonds between protein and water -> Folded chain, with one intra-protein hydrogen bond and one hydrogen bond between water molecules.
The folded state has slightly more net stability, even though the number of hydrogen bonds is the same.
One almost never finds unpaired main-chain NH and CO groups or unpaired charged side-chain atoms buried in a folded protein. This constrains the possible shapes that the chain can fold into and makes evolutionary change of a non-polar residue to a charged residue unlikely. Proteins fold so as to avoid unpaired groups, and they form salt bridges to avoid an uncompensated buried charge.

Entropic Effect
It is energetically unfavourable to restrict the conformation of an unfolded chain.
The unfolded chain can tumble between many conformations and has many degrees of freedom, while the folded chain is restricted to local fluctuations and thus has fewer degrees of freedom, which is entropically less favourable.

Protein Secondary Structure
Definition: Local conformation of the main chain (it does not have to be repetitive). There are two common types of repetitive structure, the alpha helix and the beta sheet (built from beta strands), and also a characteristic non-repetitive local structure, the beta turn.

Alpha Helix
The alpha helix is the most common structural arrangement in the secondary structure of proteins. It is also the most extreme type of local structure, and it is the local structure that is most easily predicted from a sequence of amino acids. The alpha helix has a right-handed helical conformation in which every backbone N−H group hydrogen bonds to the backbone C=O group of the amino acid that is four residues earlier in the protein sequence.

β-Sheets
Beta sheets consist of beta strands (β-strands) connected laterally by at least two or three backbone hydrogen bonds, forming a generally twisted, pleated sheet. Because peptide chains have a directionality conferred by their N-terminus and C-terminus, β-strands too can be said to be directional. Adjacent β-strands can form hydrogen bonds in antiparallel, parallel, or mixed arrangements.
In an antiparallel arrangement, successive β-strands alternate direction, so that the N-terminus of one strand is adjacent to the C-terminus of the next. This arrangement produces the strongest inter-strand stability because it allows the inter-strand hydrogen bonds between carbonyls and amines to be planar, which is their preferred orientation.
In a parallel arrangement, the N-termini of successive strands are all oriented in the same direction. This orientation may be slightly less stable because it introduces nonplanarity in the inter-strand hydrogen bonding pattern.
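As a small illustration of the alpha-helix hydrogen-bonding pattern described above (the backbone N−H of a residue bonds to the C=O four residues earlier), the following Python sketch lists the donor and acceptor residue pairs for an idealised helix. The function name helix_hbond_pairs and the residue numbering are assumptions made for illustration only; real helices often deviate from this ideal pattern, particularly at their ends.

```python
def helix_hbond_pairs(start, end):
    """List (donor, acceptor) residue numbers for an idealised alpha helix.

    The helix is assumed to span residues start..end inclusive.
    The N-H of residue i donates to the C=O of residue i - 4,
    so the first four residues of the helix have no acceptor inside it.
    """
    if end < start:
        raise ValueError("end must not be smaller than start")
    return [(i, i - 4) for i in range(start + 4, end + 1)]


# Hypothetical helix spanning residues 1 to 10.
for donor, acceptor in helix_hbond_pairs(1, 10):
    print(f"N-H of residue {donor} -> C=O of residue {acceptor}")
```

A helix shorter than five residues yields no pairs under this rule, which is consistent with the pattern needing a four-residue offset.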
β-Turns
These cause a change in direction of the polypeptide chain. There are four categories: Types I-IV. Type I is the most common, because it most resembles an alpha helix. Type II beta turns, on the other hand, often occur in association with beta sheets as part of beta-links.

Energetics
Alpha helices and beta sheets are not primarily stabilised by hydrogen bonds, as electrostatic effects are not the driving force for protein folding. These periodic (regularly repeating) structures are the best way to bury hydrophobic residues without burying the uncompensated partial charges of the main-chain NH and CO groups.

Dihedral Angles
A Ramachandran plot is a way to visualise the energetically allowed regions for the backbone dihedral angles ψ against φ of amino acid residues in a protein structure. In a protein chain three dihedral angles are defined:
- ω (omega) is the angle in the chain Cα − C' − N − Cα
- φ (phi) is the angle in the chain C' − N − Cα − C'
- ψ (psi) is the angle in the chain N − Cα − C' − N
The side-chain dihedral angles are designated χn (chi-n). They tend to cluster near 180°, 60°, and −60°, which are called the trans, gauche−, and gauche+ conformations. The stability of certain side-chain dihedral angles is affected by the values of φ and ψ. The allowed positions of the side chains are called side-chain rotamers.

Solvent Accessible Area
Solvent accessible surface area (SASA, also abbreviated ASA) is the surface area of a biomolecule that is accessible to a solvent. ASA is usually reported in square ångströms (Å²). ASA is typically calculated using the 'rolling ball' algorithm developed by Shrake & Rupley in 1973. This algorithm uses a sphere (of solvent) of a particular radius to 'probe' the surface of the molecule.
Relative solvent accessible area = Aobs/Ai
- Aobs = total calculated solvent accessible area for a residue
- Ai = solvent accessible area for residue type i
If the residue is buried, the relative area is close to 0. If the residue is highly exposed, the relative area is close to 1.
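The probing idea described above can be sketched in a few lines of Python. This is a minimal illustration of the point-sampling approach usually attributed to Shrake & Rupley, not a reference implementation: test points are placed on a sphere of radius (atom radius + probe radius) around each atom, and the accessible area is the fraction of points that do not fall inside any neighbouring atom's expanded sphere. The function names (sphere_points, shrake_rupley, relative_accessibility), the 1.4 Å probe radius (a commonly used value for water, assumed here), the number of test points, and the atomic radii in the usage lines are all assumptions for illustration.

```python
import math


def sphere_points(n):
    """Return n roughly evenly spread points on the unit sphere (golden spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n       # runs from just below +1 to just above -1
        r = math.sqrt(1.0 - z * z)          # radius of the circle at height z
        theta = golden_angle * k
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def shrake_rupley(atoms, probe_radius=1.4, n_points=200):
    """Estimate the solvent accessible area of each atom.

    atoms is a list of (x, y, z, radius) tuples; the result is one area per atom,
    in the square of whatever length unit the input uses.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe_radius           # radius traced by the probe centre
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            buried = False
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                cutoff = rj + probe_radius
                if (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < cutoff ** 2:
                    buried = True           # the probe would clash with atom j here
                    break
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas


def relative_accessibility(a_obs, a_ref):
    """Relative solvent accessible area Aobs/Ai for one residue."""
    return a_obs / a_ref


# Two hypothetical atoms 3 units apart, each with an illustrative radius of 1.8.
print(shrake_rupley([(0.0, 0.0, 0.0, 1.8), (3.0, 0.0, 0.0, 1.8)]))
```

Summing the per-atom areas over the atoms of a residue gives Aobs, which can then be divided by a reference area Ai for that residue type. Using more test points gives a smoother estimate at a higher computational cost.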
Tertiary Structure
Definition: The three-dimensional structure of a single chain. The structure is revealed at near-atomic resolution by X-ray crystallography, NMR and electron microscopy.
- The core contains hydrophobic residues, and charged atoms there are nearly always stabilised by electrostatic interactions.
- The surface contains charged atoms interacting with water, and a substantial number of non-polar atoms even though this is not favourable. This is because proteins have to interact with other proteins and so need sticky patches.
NOTE: Proteins have marginal stability (about 10 kcal/mol). Proteins cannot be too stable, as they need to be degraded and recycled.

Quaternary Structure
Definition: Arrangement of different chains, generally symmetric.

There are three categories of proteins:
- Transmembrane
- Globular (generally water soluble): enzymes and antibodies
- Fibrous: elongated and generally not soluble (e.g. silk, muscle and collagen)

Fold/Topology
Definition: The sequential arrangement of chain sections, particularly alpha helices and beta strands.
NOTE: Larger proteins fold into domains.

Protein Domains
Often a protein sequence is formed from parts known as domains, where each domain belongs to a different homologous family. Domains are generally a distinct structural unit and a distinct evolutionary unit. Domains can be classified into different fold classes:
- α/α: mainly packing of alpha helices
- β/β: mainly one or more beta sheets
- α/β: roughly alternating alpha and beta, with the beta sheet tending to be parallel
- α+β: mixed alpha and beta
- coil: mainly small proteins (