Primary Structure of Proteins PDF
Garrett & Grisham
Summary
This document provides an overview of the primary structure of proteins. It details concepts such as peptide bond formation, the structural characteristics of the polypeptide backbone, and various methods for determining protein sequences.
Full Transcript
Helices, which sometimes appear as decorative or utilitarian motifs in manmade structures, are a common structural theme in biological macromolecules: proteins, nucleic acids, and even polysaccharides. (Garrett & Grisham, 2024)

I. PRIMARY STRUCTURE OF PROTEINS

A. Fundamental Structural Pattern
- proteins: unbranched polymers of amino acids (AAs) linked head to tail, from carboxyl group to amino group, through the formation of covalent peptide bonds (amide linkages)
- peptide formation is the creation of an amide bond between the carboxyl group of one AA and the amino group of another AA
- peptide bond formation results in the release of H2O
- the peptide “backbone” of a protein consists of the repeated sequence -N-Cα-Co-
  - N represents the amide nitrogen
  - Cα is the α-carbon atom of an AA in the polymer chain
  - Co is the carbonyl carbon of the AA
- the carbonyl oxygen and the amide hydrogen are trans to each other
- this conformation is favored energetically because it results in less steric hindrance between nonbonded atoms in neighboring AAs
- the polypeptide chain is inherently asymmetric since the α-carbon atom of each AA is a chiral center (except Gly)

The Peptide Bond has Partial Double-Bond Character
- in the usual representation, rotation may occur about any covalent bond in the polypeptide backbone because all 3 kinds of bonds (N-Cα, Cα-Co, and the Co-N peptide bond) are single bonds
- in this representation, the Co and N atoms of the peptide grouping are both in planar sp2 hybridization
- the Co and O atoms are linked by a π bond, leaving the N with a lone pair of e- in a 2p orbital
- another resonance form for the peptide bond: the Co and N atoms participate in a π bond, leaving a lone e- pair on the oxygen
- this form prevents free rotation about the Co-N peptide bond because it becomes a double bond
- the real nature of the peptide bond: partial double-bond character
- the peptide bond has about 40% double-bond character

The Polypeptide Backbone is Relatively Polar
- peptide bond resonance causes the peptide backbone to be relatively polar
- in the double-bonded resonance form, the amide N is in a protonated or
positively charged form
- in this resonance form, the carbonyl oxygen is a negatively charged atom
- in actuality, the hybrid state of the partially double-bonded peptide arrangement gives a net positive charge of 0.28 on the amide N and an equivalent net negative charge of 0.28 on the carbonyl O
- the presence of these partial charges means that the peptide bond has a permanent dipole
- the peptide backbone is relatively unreactive chemically, and protons are gained or lost by the peptide groups only at extreme pH conditions

Peptides can be classified according to how many AAs they contain
- peptide: short polymer of AAs
- AA residue: each AA unit in the chain
- dipeptides, tripeptides, tetrapeptides
- oligopeptides: >12 [...]

1. if the protein contains more than 1 polypeptide chain, the chains are separated and purified
2. intrachain S-S cross-bridges between Cys residues in the polypeptide chain are cleaved (if disulfides are interchain linkages, step 2 precedes step 1)
3. N-terminal and C-terminal residues are identified
4. each polypeptide chain is cleaved into smaller fragments, and the AA composition and sequence of each fragment are determined
5. step 4 is repeated, using a different cleavage procedure to generate a different and overlapping set of peptide fragments
6. the overall AA sequence of the protein is reconstructed from the sequences in overlapping fragments

Step 1. Separation of Polypeptide Chains
- a heteromultimeric protein is dissociated into its component polypeptide chains, which are separated from one another and sequenced individually
- dissociation into component polypeptide chains can be achieved with:
  - pH extremes
  - 8 M urea
  - 6 M guanidinium hydrochloride
  - high salt concentrations
- these treatments disrupt polar interactions such as H bonds both within the protein molecule and between the protein and the aqueous solvent
- once dissociated, the individual polypeptides can be isolated from one another on the basis of differences in size and/or charge

Step 2.
Cleavage of Disulfide Bridges
- occasionally, heteromultimers are linked together by interchain S-S bridges; these crosslinks must be cleaved before dissociation and isolation of the individual chains
- the cleavages must be carried out so that neither the original nor new S-S links form
- oxidation of a disulfide by performic acid → 2 equivalents of cysteic acid
  - cysteic acid side chains are ionized SO3- groups; electrostatic repulsion prevents S-S recombination
- sulfhydryl compounds [2-mercaptoethanol or dithiothreitol (DTT)] readily reduce S-S bridges to regenerate 2 Cys-SH side chains
  - these SH groups can recombine to re-form either the original disulfide link or new disulfide links
  - S-S reduction must therefore be followed by treatment with alkylating agents (iodoacetate or 3-bromopropylamine), which modify the SH groups and block disulfide bridge formation

Step 3. A. N-terminal Analysis
- identifies the AA residing at the N-terminal end of a protein
- Edman degradation: sequential identification of a series of residues beginning at the N-terminus
- in weakly basic solutions, phenylisothiocyanate, or Edman reagent (phenyl-N=C=S), combines with the free amino terminus of a protein; the derivatized N-terminal residue can then be excised from the end of the polypeptide chain and recovered as a PTH (phenylthiohydantoin) derivative
- chromatographic methods can identify this PTH derivative
- the rest of the polypeptide chain remains intact and can be subjected to further rounds of Edman degradation to identify successive AA residues in the chain
- the carboxyl terminus of the polypeptide under analysis is coupled to an insoluble matrix, allowing the polypeptide to be easily recovered by filtration or centrifugation following each round of the Edman reaction
- successive cycles of the Edman reaction can reveal further information about the sequence
- automated instruments (Edman sequenators) carry out repeated rounds of the Edman procedure
- 6 ng of polypeptide can yield 15 residues of sequence information

Step 3. B.
C-terminal Analysis
- enzymatic approach: carboxypeptidases cleave AA residues from the C-termini of polypeptides in a successive fashion
  - Carboxypeptidase A (bovine pancreas): hydrolyzes the C-terminal peptide bond of all residues except P, D, E, R, K
  - Carboxypeptidase B (hog pancreas): effective when R or K are the C-terminal residues
  - Carboxypeptidase Y (yeast): acts on any C-terminal residue

Steps 4 & 5. Fragmentation of the Polypeptide Chain
- goal: produce fragments useful for sequence analysis
  - enzymatic cleavage
  - specific or nonspecific chemical means (partial acid hydrolysis)
- proteolytic enzymes offer an advantage: many hydrolyze only specific peptide bonds, and this specificity immediately gives information about the peptide products
- fragments produced upon cleavage should be small enough to yield their sequences through end-group analysis and Edman degradation, but not so small that an overabundance of products must be resolved before analysis

A. Trypsin
- the most commonly used reagent for specific proteolysis
- cleaves on the C-side of R or K

B. Chymotrypsin
- cleaves peptide bonds formed by the carboxyl groups of the aromatic AAs (F, Y, W) and, to a lesser extent, L

C. Other Endopeptidases
- endopeptidases cleave peptide bonds within the interior of a polypeptide chain
- clostripain: R residues
- endopeptidase Lys-C: K residues
- staphylococcal protease: D and E residues

D. Chemical Methods
I. CNBr: M residues
1. nucleophilic attack of the Met S atom on the -CN carbon atom, with displacement of Br
2. nucleophilic attack by the M carbonyl oxygen atom on the R group; the cyclic derivative is unstable in aqueous solution
3. hydrolysis cleaves the M peptide bond
- C-terminal homoserine residues occur where M residues once were
- CNBr acts upon Met residues: the nucleophilic S atom of Met reacts with CNBr, yielding a sulfonium ion that undergoes a rapid intramolecular rearrangement to form a cyclic iminolactone
- water readily hydrolyzes this iminolactone, cleaving the polypeptide and generating peptide fragments with C-terminal homoserine lactone residues at the former Met positions
II. N-G bonds: hydroxylamine (NH2OH) at pH 9
III. D-P bonds: selective hydrolysis under mildly acidic conditions
- cleavage products generated by these procedures must be isolated and individually sequenced to accumulate the information necessary to reconstruct the protein’s complete AA sequence

Step 6. Reconstruction of the Overall AA Sequence
- sequences obtained for the sets of fragments derived from 2 or more cleavage procedures are compared to find overlaps that establish continuity of the overall AA sequence of the polypeptide chain
- peptides generated from specific fragmentation of the polypeptide can be aligned to reveal the overall AA sequence
- such comparisons are useful in eliminating errors and validating the accuracy of the sequences determined for the individual fragments

Summary of the sequence analysis of catrocollastatin-C, a 23.6 kDa protein found in the venom of the western diamondback rattlesnake Crotalus atrox. The overall AA sequence (216 amino acid residues long) for catrocollastatin-C as deduced from the overlapping sequences of peptide fragments is shown on the lines headed CAT-C. The other lines report the various sequences used to obtain the overlaps (Shimokawa, K., et al., 1997. Sequence and biological activity of catrocollastatin-C: A disintegrin-like/cysteine-rich two-domain protein from Crotalus atrox venom. Archives of Biochemistry and Biophysics 343:35–43).
- N-term: Edman degradation of the intact protein in an automated Edman sequenator
- M: proteolytic fragments generated by CNBr cleavage, followed by Edman sequencing of the individual fragments (numbers denote fragments M1 through M5)
- K: proteolytic fragments from endopeptidase Lys-C cleavage, followed by Edman sequencing (only fragments K3 through K6 are shown)
- E: proteolytic fragments from Staphylococcus protease digestion of catrocollastatin sequenced in the Edman sequenator (only E13 through E15 are shown)

The AA Sequence of a Protein can be determined by Mass Spectrometry
- mass spectrometry technology has evolved rapidly over the past decades and now dominates sequence determination as well as the new field of global protein analysis (proteomics)
- mass spectrometers exploit the difference in the mass-to-charge (m/z) ratio of ionized atoms or molecules to separate them from each other
- the m/z ratio of a molecule can be used to acquire chemical and structural information
- molecules can be fragmented in distinctive ways in mass spectrometers; the fragments that arise provide quite specific structural information about the molecule

Basic operation of a mass spectrometer
1. evaporate and ionize molecules in a vacuum, creating gas-phase ions
2. separate the ions in space and/or time based on their m/z ratios
3. measure the amount of ions with specific m/z ratios
- proteins, nucleic acids, and carbohydrates decompose upon heating, so they cannot simply be evaporated by heat
- the 2 most prominent MS modes for protein analysis:
  - Electrospray Ionization (ESI-MS)
  - Matrix-Assisted Laser Desorption Ionization-Time of Flight (MALDI-TOF MS)

Electrospray Ionization (ESI-MS)
3 principal steps in ESI-MS:
1. small, highly charged droplets are formed by electrostatic dispersion of a protein solution through a glass capillary subjected to a high electric field
2. protein ions are desorbed from the droplets into the gas phase (assisted by evaporation of the droplets in a stream of hot N2 gas)
3.
the protein ions are separated in a mass spectrometer and identified according to their m/z ratios
- a 20-kD protein molecule will pick up 10 to 30 positive charges
- the MS spectrum of this protein reveals all of the differently charged species as a series of sharp peaks whose consecutive m/z values differ by the charge and mass of a single proton
- decreasing m/z values: increasing number of charges per molecule, z
- figure: electrospray ionization mass spectrum of the protein aerolysin K

Matrix-Assisted Laser Desorption Ionization-Time of Flight (MALDI-TOF MS)
- the protein is mixed with a chemical matrix that includes a light-absorbing substance excitable by a laser
- a laser pulse excites the chemical matrix, creating a microplasma that transfers the energy to protein molecules in the sample, ionizing them and ejecting them into the gas phase
- among the products are protein molecules that have picked up a single proton; these positively charged species can be selected by the MS for mass analysis
- MALDI-TOF MS is very sensitive and very accurate: attomole (10^-18 mole) quantities of a particular molecule can be detected at accuracies better than 0.001 AMU (0.001 daltons)

Sequence Databases contain the AA Sequences of Millions of Different Proteins
- the 1st protein sequence databases were compiled by protein chemists using chemical sequencing methods
- today, protein sequence information is derived from translating the nucleotide sequences of genes into codons and, thus, AA sequences
- sequencing the order of nucleotides in cloned genes is a more rapid, efficient, and informative process than determining the AA sequences of proteins by chemical methods

Electronic databases containing continuously updated sequence information
- SWISS-PROT protein sequence database on the ExPASy (Expert Protein Analysis System) Molecular Biology server
- PIR (Protein Identification Resource Protein Sequence Database) at http://pir.georgetown.edu

Mass spectrometric proteomics data
- PRIDE (PRoteomics
IDentification database) at http://www.ebi.ac.uk/pride/
- PeptideAtlas at http://peptideatlas.org
- Global Proteome Machine at http://gpmdb.thegpm.org

Protein information from genomic sequences
- GenBank, accessible via the National Center for Biotechnology Information (NCBI) website located at http://www.ncbi.nlm.nih.gov

Protein structural information
- Worldwide Protein Data Bank (wwPDB), an ensemble of repositories composed of:
  - the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB at http://www.rcsb.org/pdb)
  - Protein Data Bank Europe (PDBe)
  - Protein Data Bank Japan (PDBj)
- BioMagResBank (BMRB), which specifically houses NMR-derived structural information

D. The Nature of Amino Acid Sequences
- proteins have unique AA sequences; uniqueness of sequence ultimately gives each protein its own particular personality
- the number of possible AA sequences in a protein is astronomically large
- the probability that 2 proteins will, by chance, have similar AA sequences is negligible
- sequence similarities between proteins therefore imply evolutionary relatedness
- AA composition: frequencies of the various AAs in proteins for all the proteins in the SWISS-PROT protein knowledge base. These data are derived from the AA composition of more than 600,000 different proteins. The range is from leucine at 9.66% to tryptophan at 1.09% of all residues.

Homologous Proteins from Different Organisms Have Homologous AA Sequences
- homologous proteins share a significant degree of sequence similarity and structural resemblance
- they perform the same function in different organisms
- e.g.
the oxygen transport protein Hb serves a similar role and has a similar structure in all vertebrates
- homologous proteins have polypeptide chains nearly identical in length
- the degree to which their sequences share identity provides a direct measure of the evolutionary relationship between the species from which they are derived

Homologous proteins can be further subdivided
1. orthologous proteins: proteins from different species that have homologous AA sequences (and similar function); they arose from a common ancestral gene during evolution
2. paralogous proteins: proteins found within a single species that have homologous AA sequences; they arose through gene duplication, e.g. the α- and β-globin chains of Hb

Computer Programs can Align Sequences and discover Homology between Proteins
- if 2 proteins share homology, it can be revealed through alignment of their sequences using powerful computer programs
- a given AA sequence is used to query the databases for proteins with similar sequences

BLAST (Basic Local Alignment Search Tool)
- compares nucleotide or protein sequences to sequence databases
- commonly used program for rapid searching of sequence databases
- detects local as well as global alignments where sequences are in close agreement
- detects regions of similarity shared between otherwise unrelated proteins
- sequence similarities between proteins are an important clue to the function of uncharacterized proteins and are useful in assigning related proteins to protein families
- calculates the statistical significance of matches

One way to measure similarity is to use a matrix that assigns scores for all possible substitutions of one AA for another
- BLOSUM62 (Blocks Substitution Matrix): the substitution matrix most often used with BLAST
- BLOSUM62 assigns a probability score for each position in an alignment based on the frequency with which that substitution occurs in the consensus sequences of related proteins
- it scores each position on
the basis of observed frequencies of different AA substitutions within blocks of local alignments in related proteins
- the BLOSUM62 substitution matrix provides scores for all possible exchanges of one AA with another (Henikoff, S., and Henikoff, J. G., 1992. Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences, USA 89:10915–10919)
- the scores are derived using sequences sharing no more than 62% identity
- substitution scores range from -4 (lowest probability of substitution) to 11 (highest probability of substitution)
- to look up the value corresponding to the substitution of an Asn by a Trp, or vice versa, find the intersection of the “N” column with the “W” row
- the value -4 means that the substitution of N for W, or vice versa, is not very likely
- the substitution of V for I (score: 3), or vice versa, is very likely
- AAs whose side chains have unique qualities (such as C, H, P, or W) have high BLOSUM62 scores for matching themselves, because replacing them with any other AA may change the protein significantly
- AAs that are similar (such as R and K; or D and E; or A, V, L, and I) have low scores for matching themselves, since one can replace the other with less likelihood of serious change to the protein structure

Cytochrome c: an example of Orthology
- cytochrome c: e- transport protein found in the mitochondria of all eukaryotic organisms
- there are 28 positions in the polypeptide chain where the same AA residues are always found (invariant residues)
- these invariant residues serve roles crucial to the biological function of this protein; substitutions of other AAs at these positions cannot be tolerated
- the number of AA differences between 2 cyt c sequences is proportional to the phylogenetic difference between the species from which they are derived
  - cyt c in humans and in chimpanzees is identical
  - human and sheep cyt c differ at 10 residues
  - the human cyt c sequence has 14 variant residues from a reptile sequence (rattlesnake), 18 from a fish (carp), 29 from a mollusc (snail), 31 from an
insect (moth), and more than 40 from yeast or higher plants (cauliflower)

Phylogenetic tree for cyt c
- phylogenetic tree: a diagram illustrating the evolutionary relationships among a group of organisms
- the tips of the branches are occupied by contemporary species whose sequences have been determined; extant species are located only at the tips of branches
- the tree has been deduced by computer analysis of these sequences to find the minimum number of mutational changes connecting the branches
- the numbers along the branches give the AA changes between a species and a hypothetical progenitor
(Creighton, T. E., 1983. Proteins: Structure and Molecular Properties. San Francisco: W. H. Freeman)

Related Proteins Share a Common Evolutionary Origin
- proteins with related functions often show a high degree of sequence similarity, suggesting a common ancestry for these proteins

Oxygen-Binding Heme Proteins
- Mb: oxygen-binding heme protein of muscle; a single polypeptide chain of 153 AA residues
- Hb: oxygen transport protein of erythrocytes; a tetramer composed of 2 α-chains (141 residues each) and 2 β-chains (146 residues each)
- myoglobin, α-globin, and β-globin are globin paralogs that share a strong degree of sequence and structural homology
- the AA sequences of the α- and β-globin chains have 64 of their approximately 140 residues in common
- Mb and the α-globin chain have 38 AA sequence identities
- the ability to bind O2 via a heme prosthetic group is retained by all three of these polypeptides
- the evolutionary tree of the globins is inferred from the homology between the AA sequences of the α-globin, β-globin, and Mb chains
  - duplication of an ancestral globin gene allowed the divergence of the Mb and ancestral Hb genes
  - another gene duplication event subsequently gave rise to ancestral α and β forms
- gene duplication is an important evolutionary force in creating diversity

Different Proteins May Share a Common Ancestry
- hen egg white lysozyme and human milk
α-lactalbumin have different biological activities but are identical at 48 positions
- although both act in reactions involving carbohydrates, their functions show little similarity
- their tertiary (3°) structures are strikingly similar
- lysozyme: 129 residues; hydrolyzes the polysaccharide wall of bacterial cells
- α-lactalbumin: 123 residues; regulates lactose synthesis in the mammary gland

A Mutant Protein is a Protein with a slightly different AA Sequence
- sequence variants can be found for a protein; variants are a consequence of mutations in a gene that have arisen naturally within the population
- gene mutations lead to mutant forms of the protein in which the AA sequence is altered at one or more positions
- mutant forms span a range:
  - “neutral”: functional properties of the protein are unaffected by the AA substitution
  - nonfunctional (if loss of function is not lethal to the individual)
  - a range of aberrations between these two extremes
- the severity of the effects on function depends on the nature of the AA substitution and its role in the protein
- a variety of effects on the Hb molecule are seen in Hb mutants, including alterations in:
  - oxygen affinity
  - heme affinity
  - stability
  - solubility
  - subunit interactions between the α-globin and β-globin polypeptide chains
- some variants show no apparent changes, whereas others, such as HbS (sickle-cell hemoglobin), result in serious illness
- this indicates that some AA changes are relatively unimportant, whereas others drastically alter one or more functions of a protein

What is the Proteome and what does it tell us?
- the full genetic potential of a cell (what it is capable of doing) is contained within its genome, but not all genes are expressed at any moment in time
- those genes that are being expressed are defined by the transcriptome
- what the cell is doing is not directly determined by the transcriptome; a more accurate reflection of what a cell is doing at any moment in time is found in the proteome
- the proteome is much more complex than the genome
  - there are only 20,000 or so protein-coding genes in the human genome
  - estimates suggest that there are hundreds of thousands of different proteins, perhaps even a million or more
- this discrepancy exists because one gene may give rise to a large number of protein products through a range of processes:
  - post-translational modification
  - alternative RNA splicing
  - RNA editing

The Proteome Is Large and Dynamic
- proteins are synthesized, processed, delivered to appropriate subcellular compartments, assembled into complexes, and degraded
- different cells have different sets of proteins
- the same cell has different proteins at different times
- the lifetimes of different proteins vary from minutes to years
- defining the proteome requires techniques that can unambiguously identify each protein and determine how much of it is present
- bacterial cells contain about 2 million protein molecules per 10^-15 L (the volume of an E. coli cell)
- cells from bacteria to yeast to mammals seem to be rather uniform in containing 2 to 4 million protein molecules per 10^-15 L
- a liver cell is about 8000 times larger than an E. coli cell, so it would contain as many as 30 billion protein molecules
- these numbers reflect the total number of protein molecules per cell; in actuality, in human cells, the number of molecules of each protein can vary from as low as 10 or so copies per cell to as much as 100 billion
- thus, the concentrations of different proteins in a human cell can vary over ten orders of magnitude
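
The cell-size arithmetic in the last few points can be checked with a short calculation. The sketch below is illustrative only: it takes the figures quoted in these notes (2 to 4 million protein molecules per E. coli-sized volume, a liver cell about 8000 times larger, and per-protein copy numbers from about 10 to 100 billion) as given, and the function and constant names are hypothetical, chosen just for this example.

```python
import math

# Figures quoted in the notes (assumed, not independently measured here)
PROTEINS_PER_ECOLI_VOLUME = (2e6, 4e6)  # molecules per E. coli-sized volume (low, high)
LIVER_TO_ECOLI_VOLUME_RATIO = 8000      # liver cell volume relative to an E. coli cell


def proteins_per_cell(volume_ratio, density_range=PROTEINS_PER_ECOLI_VOLUME):
    """Scale the per-volume protein count by how much larger the cell is."""
    low, high = density_range
    return low * volume_ratio, high * volume_ratio


def orders_of_magnitude(min_copies, max_copies):
    """Number of powers of ten separating the rarest and most abundant protein."""
    return math.log10(max_copies / min_copies)


low, high = proteins_per_cell(LIVER_TO_ECOLI_VOLUME_RATIO)
print(f"liver cell: {low:.1e} to {high:.1e} protein molecules")
print(f"copy-number span: {orders_of_magnitude(10, 100e9):.0f} orders of magnitude")
```

With these inputs the upper estimate is about 32 billion molecules, consistent with the "as many as 30 billion" figure, and the ratio of 100 billion to 10 copies corresponds to ten orders of magnitude.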