Tertiary Structure of Proteins

Summary

This document provides information on the tertiary structure of proteins, focusing on the packing of secondary structural elements to achieve stability and complexity. It discusses the hydrophobic core and the interactions of proteins with water. This knowledge is valuable for understanding protein function and structure.

Full Transcript


Topic 3: Tertiary structure

Tertiary structure
Individual secondary structure elements are generally not stable enough to persist in solution, and are too small and simple for most functional roles. Proteins rely on packing secondary structural elements onto one another to achieve stability and complexity. Combining multiple structural elements allows extended protein surfaces to be built, which can then evolve the flexibility, binding complementarity and overall complexity required for most protein functions.
Figure: lysozyme (PDB 1HEW).

“Perhaps the most remarkable features of the molecule are its complexity and its lack of symmetry. The arrangement seems to be almost totally lacking in the kind of regularities which one instinctively anticipates.” John Kendrew in 1958, after determining the structure of myoglobin (the first protein structure).

Typical organization of proteins
In a typical structure, the majority of amino acids are found in either α-helices or β-strands. These elements generally pack tightly together, with a hydrophobic core formed between them. Most proteins have 2-5 “layers” of secondary structure elements (e.g. β-β; α-β-α; α-α; α-β-β-α). About a third of all residues are typically in turns or loops.

Organization of proteins: surface exposure
Helices are generally at least partially surface-exposed in soluble proteins, whereas β-strands can be fully buried. Loops and turns (connecting secondary structure elements) are almost always surface-exposed. This reflects the need for water to hydrogen bond to unpaired backbone groups. There are, however, proteins that form exceptions to any and all of these rules.
Figure: lysozyme (PDB 1HEW).

Hydrophobic residues dominate the protein interior
Non-polar residues are generally packed together in the core of the protein (orange spheres); this hydrophobic core maximizes the hydrophobic effect. Polar residues cover most (but not all) of the solvent-exposed surface of the protein (blue sticks), where they interact with water and help solubilize the protein. Relatively few polar residues are buried; those that are almost always hydrogen bond with the backbone.
Figure: hydrophobic core and hydrophilic surface residues of lysozyme (PDB 1HEW).

The solvent environment drives protein structure
Proteins can neither fold nor function in a vacuum, and their behaviour cannot be understood independently of the medium they are dissolved in. The solvent properties of water are dominated by its polar character, its strong tendency to form hydrogen bonds and its relatively ordered structure. Water makes strong interactions with polar residues on the protein’s surface.

Water forms an integral part of a protein’s structure
Water molecules are bound all over the surface of a protein. Some water molecules are also found deeply buried in the structure; these are essentially always found in the same positions in the molecule and are an integral part of the structure.
Figure: structural waters in lysozyme (surface-associated and deeply buried H2O).
These water molecules are consistently found in these positions.

Protein interiors are tightly packed
Holes in proteins can only exist where atoms are excluded from certain regions of space, so empty spaces in packed protein cores are entropically destabilizing. Tight packing also maximizes enthalpic interactions. The interiors of natural proteins typically pack as tightly as is geometrically achievable: the interior of a protein is a 3D jigsaw puzzle assembled from a set of rounded pieces. Where substantial holes exist, they are generally filled with water or some other small molecule.
Figure: slice through lysozyme (Trp28 labelled).

Secondary structure side-chain geometry biases the way elements pack
Secondary structural elements have a well-defined geometry that positions the side chains in a regular pattern. For example, β-sheets produce ridges of side chains with well-defined spacing. When one such regular element is packed against another, there tend to be certain “logical” ways to fit the side chains together, e.g. ridge into groove. These preferred packing arrangements bias protein structures towards orienting their secondary structural elements in certain preferred ways.

β-sheet surfaces have ridges of side chains
In both parallel and antiparallel β-sheets, side chains line up to form parallel ridges, with shallow grooves between them. The details of ridge/groove size, shape and properties depend on which amino acids are present and how they are oriented. β-sheets pack on one another at a slight angle to align these grooves, and helices tend to pack at an angle that aligns their residue ridges with these grooves.

α-helix side chains form ridges and grooves that guide their packing
Seen from the side, α-helices form a series of ridges with grooves between them. These grooves run at 25° and 45° from the helix axis. Helices generally pack so that the “ridge” of one helix inserts into the “groove” of another (B&T Fig. 3.11).

E.g. helices prefer to pack at set angles because of their side-chain ridges
Packing is optimized when helices cross at either a 50° angle (25° + 25°) or a 20° angle (45° - 25°). Most α-helices pack against one another at one of these two angles (B&T Fig. 3.12).

e.g. Globins
Globins are a diverse group of α-helical proteins. Their structure is built by wrapping eight α-helices around a heme cofactor. Where the helices pack against one another, they generally cross at ~20° or ~50° angles. Individual side chains adjust their positions to maximize their fit, and the sequence has also evolved to optimize the fit between elements.
Figure: myoglobin (PDB 1A6M), with helix crossing angles of 20° and 50° marked.

“Typical” average-sized, monomeric proteins
Proteins show massive diversity in their organization. Most proteins, however, share the overall organization of their structure with many other proteins, including very distantly related ones. These protein folds acquire specific names that form part of the basic vocabulary of structural biology.
Figure: Src kinase, myoglobin, carbonic anhydrase, aminopeptidase.

Topology diagrams
A topology diagram is a flat representation of the organization of a protein’s secondary structure. β-strands are shown as arrows, and adjacent arrows are part of the same sheet. Helices are generally shown as rectangular boxes, and loops as lines connecting the secondary structure units. Only the arrangement of elements is significant, not their lengths: topology diagrams show a protein’s overall architecture in 2D (B&T Fig. 4.21).

Multi-spanning TM protein topology
For transmembrane helical proteins, topology diagrams are used to indicate the location of extracellular and intracellular loops, the location of modifications, etc.
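Because each transmembrane helix crosses the bilayer exactly once, the loops of a helical TM protein alternate between the two sides of the membrane. That alternation is what lets loop locations be worked out along the sequence. Below is a minimal Python sketch under an idealized assumption: every TM segment is a full membrane-spanning helix, and re-entrant loops or half-helices are ignored. `loop_sides` is a hypothetical helper. The terminus locations in the two examples are the ones commonly reported for these proteins.

```python
from typing import List


def loop_sides(n_tm_helices: int, n_terminus: str = "in") -> List[str]:
    """Side ('in' = cytoplasmic, 'out' = extracellular) of each non-TM segment.

    Index 0 is the N-terminus, indices 1..n-1 are the connecting loops,
    and index n is the C-terminus. Idealized model: every TM segment
    fully crosses the membrane once.
    """
    if n_terminus not in ("in", "out"):
        raise ValueError("n_terminus must be 'in' or 'out'")
    flip = {"in": "out", "out": "in"}
    sides = [n_terminus]
    for _ in range(n_tm_helices):
        # Each TM helix crosses the bilayer once, so the next segment is on the other side.
        sides.append(flip[sides[-1]])
    return sides


# Lac permease: 12 TM helices, N-terminus cytoplasmic.
lacy = loop_sides(12, "in")
print(lacy[0], lacy[-1])   # in in

# Bacteriorhodopsin: 7 TM helices, N-terminus extracellular.
br = loop_sides(7, "out")
print(br[0], br[-1])       # out in
```

An even number of TM helices puts both termini on the same side of the membrane, while an odd number puts them on opposite sides.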
Topology is used in a slightly different sense here, as the membrane imposes an organizational context that is not relevant for soluble proteins. Loop localization is often deduced without a 3D structure.

The number of folds is finite
Protein structures are classified into different folds (essentially, distinct organizations of secondary structure elements). The vast majority of new structures can be classified into known folds: no new folds in a decade! 1375 folds are known from experimental structures. Current estimates are that no more than 1700 folds occur in nature, a small fraction of the folds that are theoretically possible.
Figure: number of CATH folds in the PDB (blue: new folds).

Classifying and describing 3D structures
Protein structures can show varying degrees of resemblance that can reflect shared ancestry, or simply “make intuitive sense”. CATH is a classification scheme that organizes protein structures within a simple hierarchy of levels of similarity (http://www.cathdb.info). CATH classification is done partly by computer programs and partly by hand. It is based on the domains within a protein: a single protein can have multiple domains that fall into very different groups. High-level groupings don’t reflect evolutionary relationships, but do give us an intuitive way to organize protein structural diversity.

CATH divides proteins into four major classes
Class 1: mainly α-helical. Class 2: mainly β-stranded. Class 3: α + β. Class 4: few secondary structural elements.

Each class is subdivided into distinct architectures
The architectures convey the overall logic of how the protein is put together. Architectures are united by shared similarity in the relative positioning of groups of different secondary structure elements.
Figure: example architectures: roll, super roll, αβ horseshoe, 2-layer sandwich, 3-layer (αβα) sandwich, 3-layer (ββα) sandwich.
The number and organization of connections between different secondary structural elements can vary considerably within
an architecture. 40 distinct architectures are recognized.
Figure: 3-layer (αβα) sandwich; 4-layer sandwich.

Architectures are classified into folds (topologies)
A fold comprises a defined set of secondary structural elements, in a specific architecture, connected in a certain order. For example, in a given topology, a given β-sheet will have β-strands with a defined direction (parallel or antiparallel to their neighbours), occurring in the same order. Some secondary structural elements can be added or removed while still maintaining the same core fold. Different folds are necessarily unrelated, whereas proteins of the same fold are often related, though perhaps very distantly.
Figure: four examples (glutaredoxin, mannitol EII, aminopeptidase, extracellular endonuclease) drawn from the 180 folds classified as 3-layer (αβα) sandwich domains.

Topologies are subdivided into homology groups
Proteins within a homology group are demonstrably related. Generally, proteins within a group will have considerable similarity in their function, and their sequences will be similar. Note that the examples all have parallel β-sheets with similar organization, though the exact number of strands and helices differs.
Figure: three examples of homology groups (formyltransferase, NBD of ferredoxin NDR, AF062-like) drawn from the ~200 Rossmann fold domains (3-layer (αβα) sandwich).

Helical bundles
Helical bundles are built from several helices packed in an approximately co-linear fashion. Residues on the inside form a tightly packed hydrophobic core (green) that stabilizes the bundle. Note that all the helices cross at ~20°. A four-helix bundle makes for a stable, well-behaved small protein and is a common structure; larger helical bundles are also found (B&T Fig. 3.6).

Different topological variations
Different patterns of connections between the helices lead to different folds. In some cases, additional loops connect the ends. A uniting feature is that all the helices are roughly (anti)parallel.

Multi-spanning membrane proteins are built as helical bundles
Multi-spanning
TM proteins have two or more TM helices, which are generally organized into bundles. Having two separate bundles with a space between them is a common architecture for transporters: the two bundles can pivot around a ligand bound in the middle of the membrane, allowing alternate access from either side of the membrane (e.g. Lac permease, with two connected 6-helix bundles).
Figure: bacteriorhodopsin (7 TM) and Lac permease (12 TM).

β-barrels
β-barrels are extended antiparallel β-sheets that curve around in a circle; the first strand hydrogen bonds with the last, closing the barrel. These commonly have 8 strands, but can have as few as five or over 20 strands (Fig. 5.1).

β-sandwiches
β-sandwiches resemble β-barrels that have been flattened out: they have two smaller β-sheets packed back-to-back rather than a single barrel. The distinction is not always clear-cut, as the H-bond connections between strands at the edges of the two sheets can be present but incomplete.

β-barrels/sandwiches vary in topology
The difference lies in the connections between strands. You can distinguish the topologies by counting the number of connections between non-adjacent strands. Up-and-down, jelly roll and Greek key topologies: B&T Fig.
5.2, 5.18, 5.14, 5.15.

β-barrels commonly form membrane proteins
β-barrel proteins have no unpaired main-chain hydrogen-bonding groups, because the edges of the β-sheet connect back onto each other. They can therefore occur as membrane proteins, and are found as such in bacterial outer membranes and mitochondria. These proteins have a simple up-down topology that simplifies membrane insertion.

Porin structure
The body of the barrel is made of long β-strands arranged in an antiparallel fashion. There is always an even number of strands, as the strands are inserted into the membrane pairwise. The strands never cross: the topology is always a simple up-down one. Connecting loops are mostly very short, often β-turns, and the N- and C-termini are always periplasmic and adjacent. A long extracellular loop (orange) inserted into the barrel helps define the pore size, shape and properties.
Figure: porin, with the extracellular side, periplasm, N- and C-termini and a bound Ca2+ marked.

OMPs have a hydrophobic band similar to that seen in helical membrane proteins
The region of the protein exposed to the inside of the membrane (the ~25 Å membrane-spanning region) is dominated by hydrophobic residues, with a band of aromatic residues at the solution/membrane interface.
Figure: outer membrane protein coloured by residue type (basic, acidic, aromatic, hydrophobic, hydrophilic, main-chain atoms), from the extracellular side to the periplasm.

β-barrel extremes
At one extreme is the 5-stranded barrel (common). At the other is LptD, an outer membrane protein that introduces LPS into the outer membrane: its 26-stranded barrel wraps around a second protein, LptE (PDB 4N4R).

β-propellers
β-propellers are built up from a simple up-down four-stranded motif. These “blades” are packed around a central axis, forming a propeller; the number of blades can vary from three to eight. Note that the last blade is characteristically completed by a strand from the very N-terminus. The active site is at one end of the central axis (the “axle”), and is formed primarily by long connecting loops. B&T Figs.
5.6-5.9.

α/β proteins
α/β proteins are most often built up from copies of a β-α motif. The β-strands typically form an extended β-sheet, with the α-helices packing onto the sheet. Different variants differ primarily in the β-sheet topology, which in turn places the α-helices on different faces of the sheet (the right-handed rule: β-α-β crossover connections are almost always right-handed). α/β proteins are the largest class, with the greatest topological diversity.

The αβ-barrel
αβ-barrels are also built up from repeats of a β-α motif. The β-strands form a parallel β-sheet that wraps around to form a barrel, and the α-helices pack on the outside of the barrel. By far the most common variant is the 8-stranded barrel.

Packing in the αβ-barrel core
The β-strands in the barrel are angled so that alternate strands have residues pointing inwards at the same height. The hydrophobic core in the center of the barrel is formed by two layers of four residues each, contributed by alternate strands. A third layer, of hydrophilic residues at the top, forms the base of the substrate-binding pocket (B&T Fig. 4.3).
Figure: glycolate oxidase (PDB 1GOX).

The active site of αβ-barrels
Most αβ-barrel proteins are enzymes. The active site is always located at the C-terminal end of the β-strands, and is built up from the top-most layer of β-strand residues plus the loops connecting to the α-helices (B&T Fig. 4.8).

The Rossmann fold
The Rossmann fold is an example of an α/β/α sandwich, built of repeated β-α motifs. It incorporates a characteristic switch point in the middle of the parallel β-sheet: before the switch point the helices are all on one face of the sheet, while after it they pack on the opposite face.

Topological switches
In α/β proteins, there is often a point from which the β-strands break from a linear progression (e.g.
321⇣45). Helices on either side of this break sit on opposite faces of the sheet (due to the right-handed rule). This in turn creates a crevice at the C-terminal end of the strands, and the active site is often located in this crevice (B&T Fig. 4.13).

Repeat proteins
Perhaps the simplest way to build long proteins (e.g. for use as scaffolds or spacers) is to repeat a simple structural motif that packs predictably onto additional copies of the same motif. Each repeat in these proteins has conserved sequence elements that play conserved structural roles, and these motifs can readily be extended by sequence duplication.
Figure: Pseudomonas syringae ice-nucleating protein.

Helix-helix packing in coiled-coils
α-helices average 3.6 residues per turn, so packing two straight helices side by side results in a pattern of interactions that almost, but not exactly, repeats every seven residues. Wrapping the two helices around one another with a slight left-handed twist produces a repeat of exactly seven (2 × 3.5) residues. This allows (variants of) a seven-residue repeat sequence to be used over and over to create long fibers with a 140 Å super-helical repeat (B&T Figs. 3.1 & 3.2).

Coiled-coils show a heptad-repeat organization
Coiled-coil domains show a characteristic heptad repeat, with residues labelled “a” to “g”. In adjacent helices, a interacts with a′ and d with d′ in the center of the complex. “d” is small and hydrophobic, generally leucine.

“Knobs into holes” packing
Four residues in three turns of one helix form a “hole” that a mid-sized hydrophobic residue, or “knob”, from the other helix can fit nicely into. Note that this is a variant of ridges-into-grooves packing, but extended by the helices wrapping around one another. Francis Crick predicted in 1953 that coiled-coils would be a stable structural motif, five years before the first protein X-ray structure (myoglobin, 1958)! B&T Fig.
3.5.

Coiled-coils = leucine zippers
In coiled-coils the “knobs” are most commonly leucine residues, hence the “leucine zipper” name given to the short coiled-coils used to dimerize some transcription factors.
Figure: leucine zipper (PDB 1DZM), with a view rotated by 90°.

Electrostatic stabilization in coiled-coils
The hydrophobic core of coiled-coils is relatively small, so stabilization by electrostatic interactions is unusually important. Residues in the “e” and “g” positions are often charged and complementary, and the pattern of charged residues can help specify which helices should coil with which (homo- and heterodimers).

Coiled-coils play diverse roles
Coiled-coils are useful because a relatively small motif can drive protein interactions; for example, short coiled-coils are used to dimerize transcription factors. Coiled-coil variants with three or four helices also occur. Coiled-coils can also be extended indefinitely to make very long, rigid spacers, and are very common in cytoskeletal and fibrous molecules.

Coiled-coils as spacers
Coiled-coils can space domains from one another, or from the membrane. For example, ROCK kinase has a 100 nm long coiled-coil separating its kinase domain from a membrane-binding domain, so ROCK can only modify actin that is positioned at a specific distance from the membrane. Long coiled-coils (up to 3000 a.a.) are used to connect vesicles to transport proteins or to tether them to target membranes.

Helical repeat proteins
A common way to build long, extended structures is to use motifs of 2-3 helices that pack lengthwise onto one another. These motifs generally comprise ~30-50 residues each, and have a core of conserved hydrophobic residues in strategic positions that stabilize the structure. Common examples include HEAT, Armadillo and Ankyrin repeat proteins.
Figure: repeats adding up to an extended structure (Ankyrin, PDB 1BK5).
HEAT repeats
HEAT repeats are built from a simple helix-turn-helix motif of about 33 residues, characterized by a pattern of conserved hydrophobic residues that pack between the helices. Successive helices pack at ~20° to form an extended rod that tends to curve inward towards the “B” helix.

HEAT repeats: clathrin
HEAT repeats are used to build the long arms of clathrin triskelions (which have internal 3-fold symmetry). These are assembled around vesicles using variations of icosahedral symmetry, and larger assemblies can be built by varying the symmetry.
Figure: clathrin triskelion.

Ankyrin repeats in mechanosensory channels
Mechanosensory channels are ion channels that respond to physical touch. The membrane-embedded portion is similar to voltage-gated K+ channels, while the N-terminus has 29 ankyrin repeats. These form extended lever arms that spiral round one another and transmit force to the channel, triggering gating (P. Jin et al., Nature 1-5 (2017), doi:10.1038/nature22981).

α/β horseshoe fold
These proteins are characterized by a repeating ~30-residue motif with seven conserved leucine residues. The fold resembles an extended αβ-barrel, except that it is open: one side of the β-sheet is buried under helices, while the other is exposed to solvent. Because helices are bulkier than strands, these proteins curve strongly.

Protein metrics (1)
Cα-Cα distances in the trans conformation are ~3.8 Å. In a fully extended state, a protein stretches ~3.5 Å per residue (as the chain zig-zags); e.g. a 100 a.a. flexible protein linker can stretch at most 350 Å (35 nm). An α-helix is more compact, stretching around 1.5 Å per residue.
Figure: 3.8 Å Cα-Cα distance, 3.5 Å per residue (extended chain), 1.5 Å per residue (α-helix).

Protein metrics (2)
A small protein (100 a.a.) forms a roughly spherical body about 3 nm in diameter, and a largish one (1000 a.a.) about 7 nm; for oligomers, this includes all subunits. Because volume scales with the number of amino acids, eight (2³) times the number of amino acids doubles the diameter.
Figure: myoglobin, FabZ and a gas vesicle, for scale.
The largest structured cellular objects (e.g.
large viruses, vaults, gas vesicles) can reach ~1000 nm (1 µm) in length.

Protein metrics (3)
Translated, functional polypeptides encoded in the human genome range from 6 a.a. to 30,000 a.a. long. The average human protein is 375 a.a. long, and few proteins are larger than 1,000 a.a. Bacteria have even fewer large proteins, and their average protein is about 100 a.a. smaller.
Figure: human protein lengths, log scale.

Small proteins (a.k.a. peptides)
Peptides are simply small proteins, typically defined as being less than 50 amino acids long. Peptides as short as six amino acids have been shown to be translated and functional. Peptides are often cut from longer precursor proteins by proteolysis (e.g. most human peptide hormones). Short peptides can act as hormones, toxins, regulators or binders (to proteins or nucleic acids). They generally act by binding to other molecules (or membranes) and modifying their properties; they are too small to take on more complex roles, e.g. as enzymes. Peptides can lack a substantial hydrophobic core, and may resort to unusual stabilization strategies.
Figure: insulin (PDB 1B17).

Peptide example #1: fish ice-binding protein (IBP)
Many poikilothermic (cold-blooded) organisms that endure subzero temperatures produce ice-binding proteins (IBPs). These bind the surface of ice crystals and prevent them from growing and rupturing cells. IBPs come in a wide variety of 3D structures. Shown is the 36 a.a. Antarctic fish IBP, which forms a single α-helix with lots of Ala (a good helix stabilizer!); its ice-binding face is hydrophobic.
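The protein metrics rules of thumb can be turned into a few lines of arithmetic and applied to the 36-residue IBP helix. This is a minimal Python sketch. The constants are the approximate values quoted in the text. Modelling a folded protein as a uniform sphere whose volume is proportional to residue count is a simplifying assumption. The function names are hypothetical.

```python
RISE_EXTENDED_A = 3.5   # Å per residue, fully extended zig-zag chain (value from the text)
RISE_HELIX_A = 1.5      # Å per residue along an α-helix axis (value from the text)
REF_RESIDUES = 100      # reference globule of ~100 residues ...
REF_DIAMETER_NM = 3.0   # ... is ~3 nm across (value from the text)


def max_extended_length_nm(n_residues: int) -> float:
    """Upper bound on the end-to-end span of a fully stretched chain."""
    return n_residues * RISE_EXTENDED_A / 10.0  # 10 Å = 1 nm


def helix_length_nm(n_residues: int) -> float:
    """Length of a single continuous α-helix."""
    return n_residues * RISE_HELIX_A / 10.0


def globule_diameter_nm(n_residues: int) -> float:
    """Approximate diameter of a compact, roughly spherical protein.

    Assumes volume is proportional to residue count, so the diameter
    scales with the cube root: 8x the residues gives 2x the diameter.
    """
    return REF_DIAMETER_NM * (n_residues / REF_RESIDUES) ** (1.0 / 3.0)


print(f"IBP as one helix:      {helix_length_nm(36):.1f} nm")         # 5.4 nm
print(f"IBP fully extended:    {max_extended_length_nm(36):.1f} nm")  # 12.6 nm
print(f"1000-residue globule:  {globule_diameter_nm(1000):.1f} nm")   # 6.5 nm
```

The cube-root estimate gives ~6.5 nm for 1000 residues, in line with the "about 7 nm" quoted for a largish protein.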
The structure is stabilized by H-bonds and by capping at both the N- and C-termini: chemical modification of the N-terminal amino and C-terminal carboxylate groups prevents these charged groups from destabilizing the helix dipole. The ice-binding face forms a pattern of methyl groups that matches the crystal lattice of ice.

Peptide example #2: conotoxin
Conotoxin is a 25 amino acid peptide from Conus magus that blocks the neuronal calcium channels of its prey. It has no secondary structure, no hydrophobic core and only 3 hydrogen bonds between main-chain atoms; the structure is held together by three disulfide bonds and little else. Small secreted proteins (including many hormones) may rely heavily on disulfide bonds for stability.
Figure: conotoxin (PDB 1DW4).

Larger proteins are generally built from multiple domains
A domain is defined as a part of a polypeptide chain that forms an independent, compact unit, and is generally ~50-300 a.a. long. Domains can be discontinuous, or inserted into the middle of other domains, and are often able to fold independently if expressed separately. Domains often mediate specific functions (oligomerization, DNA binding, catalysis, etc.). Some protein functions, e.g. signal transduction, evolve by mixing and matching domains.
Figure: pyruvate kinase has four domains, including a β-sandwich domain inserted into an α/β-barrel domain (PDB 2VGB; B&T Fig. 4.5).

Large proteins present challenges
Long polypeptide chains are problematic in that they take a long time to translate and mature, giving a slow response to changing conditions. They also have a higher risk of including a translation error or folding failure, requiring the whole molecule to be remade. Some proteins are unavoidably large because they need strongly attached domains that are flexibly linked, e.g. big signaling scaffolds and titin. In general, large, complicated protein machines are preferentially built by having multiple smaller proteins associate into a complex oligomer.
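A simple model makes the higher translation-error risk of long chains quantitative. If each residue is misincorporated independently with probability p, a chain of n residues comes out error-free with probability (1 - p)^n. This is a minimal Python sketch. The per-residue error rate below is a hypothetical value chosen only for illustration, not a measured one, and `fraction_error_free` is a hypothetical name.

```python
def fraction_error_free(n_residues: int, error_rate: float) -> float:
    """Probability that an n-residue chain is translated with no misincorporation.

    Simplifying assumption: errors occur independently at a constant
    per-residue rate p, so the probability is (1 - p) ** n.
    """
    if not 0.0 <= error_rate < 1.0:
        raise ValueError("error_rate must be in [0, 1)")
    return (1.0 - error_rate) ** n_residues


# Hypothetical per-residue error rate, for illustration only.
ERROR_RATE = 1e-4

# 375 a.a. = average human protein; 30,000 a.a. = upper end of human protein lengths (from the text).
for n in (100, 375, 1000, 30000):
    print(f"{n:>6} residues: {fraction_error_free(n, ERROR_RATE):.1%} error-free")
```

With this illustrative rate, about 96% of 375-residue chains are error-free, compared with only about 5% of 30,000-residue chains. The yield falls off exponentially with length.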
