3 Amino Acids, Peptides, and Proteins PDF
Summary
This document details the structure, function, and properties of amino acids, peptides, and proteins. It explores the common structural features of the 20 amino acids and explains how they join to form peptides and proteins. The document also covers various aspects of protein structure and function in the context of cellular processes, including enzyme functions and their importance for biochemistry.
Full Transcript
CHAPTER 3 AMINO ACIDS, PEPTIDES, AND PROTEINS

3.1 Amino Acids
3.2 Peptides and Proteins
3.3 Working with Proteins
3.4 The Structure of Proteins: Primary Structure

Proteins mediate virtually every process that takes place in a cell, exhibiting an almost endless diversity of functions. To explore the molecular mechanism of a biological process, a biochemist almost inevitably studies one or more proteins. Proteins are the most abundant biological macromolecules, occurring in all cells and all parts of cells. Proteins also occur in great variety; thousands of different kinds may be found in a single cell. Proteins are the molecular instruments through which genetic information is expressed: they are the important final products of the information pathways discussed in Part III of this book.

Cells produce proteins with strikingly different properties and activities by joining a common set of 20 amino acids in many different combinations and sequences. From these building blocks, different organisms can make such widely diverse products as enzymes, hormones, antibodies, transporters, light-harvesting complexes in plants, the flagella of bacteria, muscle fibers, feathers, spider webs, rhinoceros horn, antibiotics, and myriad other substances that have distinct biological functions (Fig. 3-1). Among these protein products, the enzymes are the most varied and specialized. As the catalysts of almost all cellular reactions, enzymes are one of the keys to understanding the chemistry of life, and thus they provide a focal point for any course in biochemistry.

FIGURE 3-1 Some functions of proteins. (a) The light produced by fireflies is the result of a reaction involving luciferin and ATP, catalyzed by the enzyme luciferase (see Box 13-1). (b) Erythrocytes contain large amounts of the oxygen-transporting protein hemoglobin. (c) The protein keratin, formed by all vertebrates, is the chief structural component of hair, scales, horn, wool, nails, and feathers.
The black rhinoceros has been driven close to extinction in the wild because of the belief prevalent in some parts of the world that a powder derived from its horn has aphrodisiac properties. In reality, the chemical properties of powdered rhinoceros horn are no different from those of powdered bovine hooves or human fingernails.

Protein structure and function are the topics of this and the next three chapters. Here, we begin with a description of the fundamental chemical properties of amino acids, peptides, and proteins. We also consider how a biochemist works with proteins. The material is organized around four principles:

In every living organism, proteins are constructed from a common set of 20 amino acids. Each amino acid has a side chain with distinctive chemical properties. Amino acids may be regarded as the alphabet in which the language of protein structure is written.

In proteins, amino acids are joined in characteristic linear sequences through a common amide linkage, the peptide bond. The amino acid sequence of a protein constitutes its primary structure, a first level we will introduce within the broader complexities of protein structure.

For study, individual proteins can be separated from the thousands of other proteins present in a cell, based on differences in their chemical and functional properties arising from their distinct amino acid sequences. As proteins are central to biochemistry, the purification of individual proteins for study is a quintessential biochemical endeavor.

Shaped by evolution, amino acid sequences are a key resource for understanding the function of individual proteins and for tracing broader functional and evolutionary relationships.

3.1 Amino Acids

Proteins are polymers of amino acids, with each amino acid residue joined to its neighbor by a specific type of covalent bond. (The term “residue” reflects the loss of the elements of water when one amino acid is joined to another.)
Proteins can be broken down (hydrolyzed) to their constituent amino acids by a variety of methods, and the earliest studies of proteins naturally focused on the free amino acids derived from them. Twenty different amino acids are commonly found in proteins. The first to be discovered was asparagine, in 1806. The last of the 20 to be found, threonine, was not identified until 1938. All the amino acids have trivial or common names, in some cases derived from the source from which they were first isolated. Asparagine was first found in asparagus, and glutamate in wheat gluten; tyrosine was first isolated from cheese (its name is derived from the Greek tyros, “cheese”); glycine (Greek glykos, “sweet”) was so named because of its sweet taste.

Learning the names, structures, and chemical properties of the 20 common amino acids found in proteins is one of the key memorization trials of every beginning biochemistry student. The necessity rapidly becomes apparent in succeeding chapters. It is impossible to discuss protein structure, protein function, ligand-binding sites, enzyme active sites, and most other biochemical topics without this foundation. The amino acids are part of the biochemistry vocabulary.

Amino Acids Share Common Structural Features

All 20 of the common amino acids are α-amino acids. They have a carboxyl group and an amino group bonded to the same carbon atom (the α carbon) (Fig. 3-2). They differ from each other in their side chains, or R groups, which vary in structure, size, and electric charge, and which influence the solubility of the amino acids in water. In addition to these 20 amino acids, there are many less common ones. Some are residues modified after a protein has been synthesized, others are amino acids present in living organisms but not as constituents of proteins, and two are special cases found in just a few proteins.
The common amino acids of proteins have been assigned three-letter abbreviations and one-letter symbols (see Table 3-1), which are used as shorthand to indicate the composition and sequence of amino acids polymerized in proteins.

FIGURE 3-2 General structure of an amino acid. This structure is common to all but one of the α-amino acids. (Proline, a cyclic amino acid, is the exception.) The R group, or side chain (purple), attached to the α carbon (gray) is different in each amino acid.

KEY CONVENTION

The three-letter code is easily understood, the abbreviations generally consisting of the first three letters of the amino acid name. The one-letter code was devised by Margaret Oakley Dayhoff (1925–1983), considered by many to be the founder of the field of bioinformatics. The one-letter code reflects an attempt to reduce the size of the data files (in an era of limited computer memory) used to describe amino acid sequences. It was designed to be easily memorized, and understanding its origin can help students do just that. For six amino acids (CHIMSV), the first letter of the amino acid name is unique and thus is used as the symbol. For five others (AGLPT), the first letter of the name is not unique but is assigned to the amino acid that is most common in proteins (for example, leucine is more common than lysine). For another four, the letter used is phonetically suggestive (RFYW: aRginine, Fenylalanine, tYrosine, tWiptophan). The rest were harder to assign. Four (DNEQ) were assigned letters found within or suggested by their names (asparDic, asparagiNe, glutamEke, Q-tamine). That left lysine. Only a few letters were left, and K was chosen because it was the closest to L.

For all the common amino acids except glycine, the α carbon is bonded to four different groups: a carboxyl group, an amino group, an R group, and a hydrogen atom (Fig. 3-2; in glycine, the R group is another hydrogen atom). The α-carbon atom is thus a chiral center (p. 61).
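The three-letter and one-letter conventions described in the Key Convention above amount to a simple lookup table. The following is a minimal sketch in Python, assuming the symbol assignments listed in Table 3-1; the function names `to_one_letter` and `to_three_letter` are hypothetical helpers introduced only for illustration.

```python
# Three-letter abbreviations and one-letter symbols of the 20 common
# amino acids, as listed in Table 3-1.
THREE_TO_ONE = {
    "Gly": "G", "Ala": "A", "Pro": "P", "Val": "V", "Leu": "L",
    "Ile": "I", "Met": "M", "Phe": "F", "Tyr": "Y", "Trp": "W",
    "Ser": "S", "Thr": "T", "Cys": "C", "Asn": "N", "Gln": "Q",
    "Lys": "K", "His": "H", "Arg": "R", "Asp": "D", "Glu": "E",
}

# Reverse lookup: one-letter symbol -> three-letter abbreviation.
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(residues):
    """Convert a list of three-letter abbreviations to a one-letter string."""
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues)


def to_three_letter(sequence):
    """Convert a one-letter string to hyphen-joined three-letter abbreviations."""
    return "-".join(ONE_TO_THREE[s.upper()] for s in sequence)


if __name__ == "__main__":
    print(to_one_letter(["Ser", "Gly", "Tyr", "Ala", "Leu"]))
    print(to_three_letter("SGYAL"))
```

An unknown abbreviation raises a `KeyError` in this sketch, which seems preferable to silently skipping a residue.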
Because of the tetrahedral arrangement of the bonding orbitals around the α-carbon atom, the four different groups can occupy two unique spatial arrangements, and thus amino acids have two possible stereoisomers. Since they are nonsuperposable mirror images of each other (Fig. 3-3), the two forms represent a class of stereoisomers called enantiomers (see Fig. 1-21). All molecules with a chiral center are also optically active; that is, they rotate the plane of plane-polarized light (see Box 1-2).

FIGURE 3-3 Stereoisomerism in α-amino acids. (a) The two stereoisomers of alanine, L- and D-alanine, are nonsuperposable mirror images of each other (enantiomers). (b, c) Two different conventions for showing the configurations in space of stereoisomers. In perspective formulas (b), the solid wedge-shaped bonds project out of the plane of the paper, the dashed bonds behind it. In projection formulas (c), the horizontal bonds are assumed to project out of the plane of the paper, the vertical bonds behind. However, projection formulas are often used casually and are not always intended to portray a specific stereochemical configuration. See Figure 3-4 for an explanation of the D, L system for specifying absolute configuration.

KEY CONVENTION

Two conventions are used to identify the carbons in an amino acid, a practice that can be confusing. The additional carbons in an R group are commonly designated β, γ, δ, ε, and so forth, proceeding out from the α carbon. For most other organic molecules, carbon atoms are simply numbered from one end, giving highest priority (C-1) to the carbon with the substituent containing the atom of highest atomic number. Within this latter convention, the carboxyl carbon of an amino acid would be C-1 and the α carbon would be C-2. In cases such as amino acids with heterocyclic R groups (e.g., histidine), where the Greek lettering system is ambiguous, the numbering system is used.
For branched amino acid side chains, equivalent carbons are given numbers after the Greek letters. Leucine thus has δ1 and δ2 carbons (see the structure in Fig. 3-5).

Special nomenclature has been developed to specify the absolute configuration of the four substituents of asymmetric carbon atoms. The absolute configurations of simple sugars and amino acids are specified by the D, L system (Fig. 3-4), based on the absolute configuration of the three-carbon sugar glyceraldehyde, a convention proposed by Emil Fischer in 1891. (Fischer knew what groups surrounded the asymmetric carbon of glyceraldehyde but had to guess at their absolute configuration; he guessed right, as was later confirmed by x-ray diffraction analysis.) For all chiral compounds, stereoisomers having a configuration related to that of L-glyceraldehyde are designated L, and stereoisomers related to D-glyceraldehyde are designated D. The functional groups of L-alanine are matched with those of L-glyceraldehyde by aligning those that can be interconverted by simple, one-step chemical reactions. Thus the carboxyl group of L-alanine occupies the same position about the chiral carbon as does the aldehyde group of L-glyceraldehyde, because an aldehyde is readily converted to a carboxyl group via a one-step oxidation.

Historically, the similar l and d designations were used for levorotatory (rotating plane-polarized light to the left) and dextrorotatory (rotating light to the right). However, not all L-amino acids are levorotatory, and the convention shown in Figure 3-4 was needed to avoid potential ambiguities about absolute configuration. By Fischer’s convention, L and D refer only to the absolute configuration of the four substituents around the chiral carbon, not to optical properties of the molecule.

FIGURE 3-4 Steric relationship of the stereoisomers of alanine to the absolute configuration of L- and D-glyceraldehyde.
In these perspective formulas, the carbons are lined up vertically, with the chiral atom in the center. The carbons in these molecules are numbered beginning with the terminal aldehyde or carboxyl carbon (red), 1 to 3 from top to bottom as shown. When presented in this way, the R group of the amino acid (in this case the methyl group of alanine) is always below the α carbon. L-Amino acids are those with the α-amino group on the left, and D-amino acids have the α-amino group on the right.

Another system of specifying configuration around a chiral center is the RS system, which is used in the systematic nomenclature of organic chemistry and describes more precisely the configuration of molecules with more than one chiral center (p. 17).

The Amino Acid Residues in Proteins Are L Stereoisomers

Nearly all biological compounds with a chiral center occur naturally in only one stereoisomeric form, either D or L. The amino acid residues in protein molecules are almost all L stereoisomers, with less than 1% being found in the D configuration. The rare D-amino acid residues generally have a precise structural purpose, and they are introduced to a protein by enzyme-catalyzed reactions that occur after the proteins are synthesized on a ribosome.

It is remarkable that virtually all amino acid residues in proteins are L stereoisomers. When chiral compounds are formed by ordinary chemical reactions, the result is a racemic mixture of D and L isomers, which are difficult for a chemist to distinguish and separate. But to a living system, D and L isomers are as different as the right hand and the left. The formation of stable, repeating substructures in proteins (Chapter 4) requires that their constituent amino acids be of one stereochemical series. Cells are able to specifically synthesize the L isomers of amino acids because the active sites of enzymes are asymmetric, causing the reactions they catalyze to be stereospecific.
Amino Acids Can Be Classified by R Group

Knowledge of the chemical properties of the common amino acids is central to an understanding of biochemistry. The topic can be simplified by grouping the amino acids into five main classes based on the properties of their R groups (Table 3-1), particularly their polarity, or tendency to interact with water at biological pH (near pH 7.0). The polarity of the R groups varies widely, from nonpolar and hydrophobic (water-insoluble) to highly polar and hydrophilic (water-soluble). A few amino acids, especially glycine, histidine, and cysteine, are somewhat difficult to characterize or do not fit perfectly in any one group. They are assigned to particular groupings based on considered judgments rather than absolutes.

TABLE 3-1 Properties and Conventions Associated with the Common Amino Acids Found in Proteins

                                                 pKa values
Amino acid        Abbreviation/symbol   Mr(a)   pK1 (-COOH)   pK2 (-NH3+)   pKR (R group)   pI

Nonpolar, aliphatic R groups
Glycine           Gly  G                 75     2.34           9.60                          5.97
Alanine           Ala  A                 89     2.34           9.69                          6.01
Proline           Pro  P                115     1.99          10.96                          6.48
Valine            Val  V                117     2.32           9.62                          5.97
Leucine           Leu  L                131     2.36           9.60                          5.98
Isoleucine        Ile  I                131     2.36           9.68                          6.02
Methionine        Met  M                149     2.28           9.21                          5.74

Aromatic R groups
Phenylalanine     Phe  F                165     1.83           9.13                          5.48
Tyrosine          Tyr  Y                181     2.20           9.11         10.07            5.66
Tryptophan        Trp  W                204     2.38           9.39                          5.89

Polar, uncharged R groups
Serine            Ser  S                105     2.21           9.15                          5.68
Threonine         Thr  T                119     2.11           9.62                          5.87
Cysteine(e)       Cys  C                121     1.96          10.28          8.18            5.07
Asparagine        Asn  N                132     2.02           8.80                          5.41
Glutamine         Gln  Q                146     2.17           9.13                          5.65

Positively charged R groups
Lysine            Lys  K                146     2.18           8.95         10.53            9.74
Histidine         His  H                155     1.82           9.17          6.00            7.59
Arginine          Arg  R                174     2.17           9.04         12.48           10.76

Negatively charged R groups
Aspartate         Asp  D                133     1.88           9.60          3.65            2.77
Glutamate         Glu  E                147     2.19           9.67          4.25            3.22

(a) Mr values reflect the structures as shown in Figure 3-5. The elements of water (Mr 18) are deleted when the amino acid is incorporated into a polypeptide.
(b) A scale combining hydrophobicity and hydrophilicity of R groups. The values reflect the free energy (ΔG) of transfer of the amino acid side chain from a hydrophobic env ... favorable (ΔG < 0; negative value in the index) for charged or polar amino acid side chains, and it is unfavorable (ΔG > 0; positive value in the index) for amino acids wit ... chains. See Chapter 11. Source: Data from J. Kyte and R. F. Doolittle, J. Mol. Biol. 157:105, 1982.
(c) The first value in each row is the average occurrence in more than 1,150 proteins. Source: Data from R. F. Doolittle, in Prediction of Protein Structure and the Principles of Pr ... p. 599, Plenum Press, 1989. The second and third values are, respectively, from the complete proteomes of nine mesophilic bacterial species and seven thermophilic bacte ... commonly encountered temperatures, whereas thermophiles grow at elevated temperatures up to and beyond the boiling point of water. The decline in glutamine occurre ... tendency of this amino acid to deaminate at high temperatures. Source: Data from A. C. Singer and D. A. Hickey, Gene 317:39, 2003.
(d) As originally composed, the hydropathy index takes into account the frequency with which an amino acid residue appears on the surface of a protein. As proline often ap ... lower score than its chain of methylene groups would suggest.
(e) Cysteine is generally classified as polar, despite having a positive hydropathy index. This reflects the ability of the sulfhydryl group to act as a weak acid and to form a wea ... nitrogen.

The structures of the 20 common amino acids are shown in Figure 3-5, and some of their properties are listed in Table 3-1. Within each class there are gradations of polarity, size, and shape of the R groups.

FIGURE 3-5 The 20 common amino acids of proteins. The structural formulas show the state of ionization that would predominate at pH 7.0. The unshaded portions are those that are common to all the amino acids; the shaded portions are the R groups.
Although the R group of histidine is shown uncharged, its pKa (see Table 3-1) is such that a small but significant fraction of these groups are positively charged at pH 7.0. The protonated form of histidine is shown above the graph in Figure 3-12b.

Nonpolar, Aliphatic R Groups

The R groups in this class of amino acids are nonpolar and hydrophobic. The side chains of alanine, valine, leucine, and isoleucine tend to cluster together within proteins, stabilizing protein structure through the hydrophobic effect. Glycine has the simplest structure. Although it is most easily grouped with the nonpolar amino acids, its very small side chain makes no real contribution to interactions driven by the hydrophobic effect. Methionine, one of the two sulfur-containing amino acids, has a slightly nonpolar thioether group in its side chain. Proline has an aliphatic side chain with a distinctive cyclic structure. The secondary amino (imino) group of proline residues is held in a rigid conformation that reduces the structural flexibility of polypeptide regions containing proline.

Aromatic R Groups

Phenylalanine, tyrosine, and tryptophan, with their aromatic side chains, are relatively nonpolar (hydrophobic). All can contribute to the hydrophobic effect. The hydroxyl group of tyrosine can form hydrogen bonds, and it is an important functional group in some enzymes. Tyrosine and tryptophan are significantly more polar than phenylalanine because of the tyrosine hydroxyl group and the nitrogen of the tryptophan indole ring.

Tryptophan and tyrosine, and to a much lesser extent phenylalanine, absorb ultraviolet light (Fig. 3-6; see also Box 3-1). This accounts for the characteristic strong absorbance of light by most proteins at a wavelength of 280 nm, a property exploited by researchers in the characterization of proteins.

FIGURE 3-6 Absorption of ultraviolet light by aromatic amino acids.
Comparison of the light absorption spectra of the aromatic amino acids tryptophan, tyrosine, and phenylalanine at pH 6.0. The amino acids are present in equimolar amounts (10⁻³ M) under identical conditions. The measured absorbance of tryptophan is more than four times that of tyrosine at a wavelength of 280 nm. Note that the maximum light absorption for both tryptophan and tyrosine occurs near 280 nm. Light absorption by phenylalanine generally contributes little to the spectroscopic properties of proteins.

BOX 3-1 METHODS
Absorption of Light by Molecules: The Lambert-Beer Law

A wide range of biomolecules absorb light at characteristic wavelengths, just as tryptophan absorbs light at 280 nm (see Fig. 3-6). Measurement of light absorption by a spectrophotometer is used to detect and identify molecules and to measure their concentration in solution. The fraction of the incident light absorbed by a solution at a given wavelength is related to the thickness of the absorbing layer (path length) and the concentration of the absorbing species (Fig. 1). These two relationships are combined into the Lambert-Beer law,

$$\log \frac{I_0}{I} = \varepsilon c l$$

where $I_0$ is the intensity of the incident light, $I$ is the intensity of the transmitted light, the ratio $I/I_0$ (the inverse of the ratio in the equation) is the transmittance, $\varepsilon$ is the molar extinction coefficient (in units of liters per mole-centimeter), $c$ is the concentration of the absorbing species (in moles per liter), and $l$ is the path length of the light-absorbing sample (in centimeters). The Lambert-Beer law assumes that the incident light is parallel and monochromatic (of a single wavelength) and that the solvent and solute molecules are randomly oriented. The expression $\log (I_0/I)$ is called the absorbance, designated $A$.

FIGURE 1 The principal components of a spectrophotometer. A light source emits light along a broad spectrum, then the monochromator selects and transmits light of a particular wavelength.
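The Lambert-Beer relationship defined in this box lends itself to a short numerical illustration. The sketch below, in Python, computes absorbance from incident and transmitted intensities and then solves for concentration. The function names are hypothetical, and the extinction coefficient used for tryptophan at 280 nm is an assumed round value chosen for illustration, not a figure taken from this chapter.

```python
import math


def absorbance(incident, transmitted):
    """Absorbance A = log10(I0 / I)."""
    return math.log10(incident / transmitted)


def transmittance(a):
    """Transmittance I / I0 = 10^(-A), the inverse of the ratio in the law."""
    return 10 ** (-a)


def concentration(a, epsilon, path_cm=1.0):
    """Solve A = epsilon * c * l for c (mol/L).

    epsilon is in L mol^-1 cm^-1 and path_cm is the path length in cm.
    """
    return a / (epsilon * path_cm)


if __name__ == "__main__":
    # Assumed, illustrative extinction coefficient (not from the text).
    EPSILON_TRP_280 = 5600.0  # L mol^-1 cm^-1

    a = absorbance(incident=100.0, transmitted=10.0)  # 90% of the light absorbed
    print("A =", a)
    print("T =", transmittance(a))
    print("c =", concentration(0.56, EPSILON_TRP_280), "mol/L")
```

With a fixed path length, doubling the concentration doubles the absorbance, whereas the transmitted intensity falls by a constant factor for each added increment of path length.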
The monochromatic light passes through the sample in a cuvette of path length $l$. The absorbance of the sample, $\log (I_0/I)$, is proportional to the concentration of the absorbing species. The transmitted light is measured by a detector.

It is important to note that each successive millimeter of path length of absorbing solution in a 1.0 cm cell absorbs not a constant amount but a constant fraction of the light that is incident upon it. However, with an absorbing layer of fixed path length, the absorbance, $A$, is directly proportional to the concentration of the absorbing solute.

The molar extinction coefficient varies with the nature of the absorbing compound, the solvent, and the wavelength, and also with pH if the light-absorbing species is in equilibrium with an ionization state that has different absorbance properties.

Polar, Uncharged R Groups

The R groups of these amino acids are more soluble in water, or more hydrophilic, than those of the nonpolar amino acids because they contain functional groups that form hydrogen bonds with water. This class of amino acids includes serine, threonine, cysteine, asparagine, and glutamine. The polarity of serine and threonine is contributed by their hydroxyl groups, and the polarity of asparagine and glutamine is contributed by their amide groups. Cysteine is an outlier here because its polarity, contributed by its sulfhydryl group, is quite modest. Cysteine is a weak acid and can make weak hydrogen bonds with oxygen or nitrogen.

Asparagine and glutamine are the amides of two other amino acids also found in proteins (aspartate and glutamate, respectively), to which asparagine and glutamine are easily hydrolyzed by acid or base. Cysteine is readily oxidized to form a covalently linked dimeric amino acid called cystine, in which two cysteine molecules or residues are joined by a disulfide bond (Fig. 3-7). The disulfide-linked residues are strongly hydrophobic (nonpolar).
Disulfide bonds play a special role in the structures of many proteins by forming covalent links between parts of a polypeptide molecule or between two different polypeptide chains.

FIGURE 3-7 Reversible formation of a disulfide bond by the oxidation of two molecules of cysteine. Disulfide bonds between Cys residues stabilize the structures of many proteins.

Positively Charged (Basic) R Groups

The most hydrophilic R groups are those that are either positively charged or negatively charged. The amino acids in which the R groups have significant positive charge at pH 7.0 are lysine, which has a second primary amino group at the ε position on its aliphatic chain; arginine, which has a positively charged guanidinium group; and histidine, which has an aromatic imidazole group. As the only common amino acid having an ionizable side chain with pKa near neutrality, histidine may be positively charged (protonated form) or uncharged at pH 7.0. His residues facilitate many enzyme-catalyzed reactions by serving as proton donors/acceptors.

Negatively Charged (Acidic) R Groups

The two amino acids having R groups with a net negative charge at pH 7.0 are aspartate and glutamate, each of which has a second carboxyl group.

Uncommon Amino Acids Also Have Important Functions

In addition to the 20 common amino acids, proteins may contain residues created by modification of common residues already incorporated into a polypeptide, that is, through postsynthetic modification (Fig. 3-8a). Among these uncommon amino acids are 4-hydroxyproline, a derivative of proline found in the fibrous protein collagen, and γ-carboxyglutamate, found in the blood-clotting protein prothrombin and in certain other proteins that bind Ca²⁺ as part of their biological function. More complex is desmosine, a derivative of four Lys residues, which is found in the fibrous protein elastin.

FIGURE 3-8 Uncommon amino acids. (a) Some uncommon amino acids found in proteins.
Most are derived from common amino acids. (Note the use of either numbers or Greek letters in the names of these structures to identify the altered carbon atoms.) Extra functional groups added by modification reactions are shown in red. Desmosine is formed from four Lys residues (the carbon backbones are shaded in light red). Selenocysteine and pyrrolysine are exceptions: these amino acids are added during normal protein synthesis through a highly specialized expansion of the standard genetic code. Both are found in very small numbers of proteins. (b) Reversible amino acid modifications involved in regulation of protein activity. Phosphorylation is the most common type of regulatory modification. (c) Ornithine and citrulline, which are not found in proteins, are intermediates in the biosynthesis of arginine and in the urea cycle.

Selenocysteine and pyrrolysine are special cases. These rare amino acid residues are not created through a postsynthetic modification. Instead, they are introduced during protein synthesis through an unusual adaptation of the genetic code, which we describe in Chapter 27. Selenocysteine contains selenium rather than the sulfur of cysteine. Actually derived from serine, selenocysteine is a constituent of just a few known proteins. Pyrrolysine is found in a few proteins in several methanogenic (methane-producing) archaea and in one known bacterium; it plays a role in methane biosynthesis.

Some amino acid residues in a protein may be modified transiently to alter the protein’s function. The addition of phosphoryl, methyl, acetyl, adenylyl, ADP-ribosyl, or other groups to particular amino acid residues can increase or decrease a protein’s activity (Fig. 3-8b). Phosphorylation is a particularly common regulatory modification. Covalent modification as a protein regulatory strategy is discussed in more detail in Chapter 6.

Some 300 additional amino acids have been found in cells.
They have a variety of functions, but not all are constituents of proteins. Ornithine and citrulline (Fig. 3-8c) deserve special note because they are key intermediates (metabolites) in the biosynthesis of arginine (Chapter 22) and in the urea cycle (Chapter 18).

Amino Acids Can Act as Acids and Bases

The amino and carboxyl groups of amino acids, along with the ionizable R groups of some amino acids, function as weak acids and bases. When an amino acid lacking an ionizable R group is dissolved in water at neutral pH, the α-amino and carboxyl groups create a dipolar ion, or zwitterion (German for “hybrid ion”), which can act as either an acid or a base (Fig. 3-9). Substances having this dual (acid-base) nature are amphoteric and are often called ampholytes (from “amphoteric electrolytes”). A simple monoamino monocarboxylic α-amino acid, such as alanine, is a diprotic acid when fully protonated; it has two groups, the -COOH group and the -NH₃⁺ group, that can yield protons:

⁺H₃N-CHR-COOH ⇌ ⁺H₃N-CHR-COO⁻ + H⁺
⁺H₃N-CHR-COO⁻ ⇌ H₂N-CHR-COO⁻ + H⁺

FIGURE 3-9 Nonionic and zwitterionic forms of amino acids. The nonionic form does not occur in significant amounts in aqueous solutions. The zwitterion predominates at neutral pH. A zwitterion can act as either an acid (proton donor) or a base (proton acceptor).

Acid-base titration involves the gradual addition or removal of protons (Chapter 2). Figure 3-10 shows the titration curve of the diprotic form of glycine. The two ionizable groups of glycine, the carboxyl group and the amino group, are titrated with a strong base such as NaOH. The plot has two distinct stages, corresponding to deprotonation of two different groups on glycine. Each of the two stages resembles in shape the titration curve of a monoprotic acid, such as acetic acid (see Fig. 2-16), and can be analyzed in the same way. At very low pH, the predominant ionic species of glycine is the fully protonated form, ⁺H₃N-CH₂-COOH.

FIGURE 3-10 Titration of amino acids. The titration curve of 0.1 M glycine at 25 °C.
The ionic species predominating at key points in the titration are shown above the graph. The shaded boxes, centered at about pK1 = 2.34 and pK2 = 9.60, indicate the regions of greatest buffering power. Note that 1 equivalent of OH⁻ = 0.1 M NaOH added. The pI occurs at the arithmetic mean between the two pKa values, and it corresponds to the inflection point in the titration.

In the first stage of the titration, the -COOH group of glycine (with its lower pKa) loses its proton. At the midpoint of this stage, equimolar concentrations of the proton-donor (⁺H₃N-CH₂-COOH) and the proton-acceptor (⁺H₃N-CH₂-COO⁻) species are present. As in the titration of any weak acid, a point of inflection is reached at this midpoint where the pH is equal to the pKa of the protonated group that is being titrated (see Fig. 2-17). For glycine, the pH at the midpoint is 2.34; thus its -COOH group has a pKa (labeled pK1 in Fig. 3-10) of 2.34. (Recall from Chapter 2 that pH and pKa are simply convenient notations for proton concentration and the equilibrium constant for ionization, respectively. The pKa is a measure of the tendency of a group to give up a proton, with that tendency decreasing 10-fold as the pKa increases by one unit.) As the titration of glycine proceeds, another point of inflection is reached at pH 5.97; at this point, removal of the first proton is essentially complete and removal of the second has just begun. At this pH, glycine is present largely as the dipolar ion (zwitterion) ⁺H₃N-CH₂-COO⁻. We shall return to the significance of this inflection point in the titration curve (labeled pI in Fig. 3-10) shortly.

The second stage of the titration corresponds to the removal of a proton from the -NH₃⁺ group of glycine. The pH at the midpoint of this stage is 9.60, equal to the pKa (labeled pK2 in Fig. 3-10) for the -NH₃⁺ group.
The titration is essentially complete at a pH of about 12, at which point the predominant form of glycine is H₂N-CH₂-COO⁻.

From the titration curve of glycine we can derive several important pieces of information. First, it gives a quantitative measure of the pKa of each of the two ionizing groups: 2.34 for the -COOH group and 9.60 for the -NH₃⁺ group. Note that the carboxyl group of glycine is over 100 times more acidic (more easily ionized) than the carboxyl group of acetic acid, which, as we saw in Chapter 2, has a pKa of 4.76, about average for a carboxyl group attached to an otherwise unsubstituted aliphatic hydrocarbon. (The pKa difference of 2.42 units corresponds to a roughly 260-fold difference in the ionization constant.) The perturbed pKa of glycine is caused primarily by the nearby positively charged amino group on the α-carbon atom, an electronegative group that tends to pull electrons toward it (a process called electron withdrawal), as described in Figure 3-11. The opposite charges on the resulting zwitterion are also somewhat stabilizing. Similarly, the pKa of the amino group in glycine is perturbed downward relative to the average pKa of an amino group. This effect is due largely to electron withdrawal by the electronegative oxygen atoms in the carboxyl groups, increasing the tendency of the amino group to give up a proton. Hence, the α-amino group has a pKa that is lower than that of an aliphatic amine such as methylamine (Fig. 3-11). In short, the pKa of any functional group is greatly affected by its chemical environment, a phenomenon sometimes exploited in the active sites of enzymes to promote exquisitely adapted reaction mechanisms that depend on the perturbed pKa values of proton donor/acceptor groups of specific residues.

FIGURE 3-11 Effect of the chemical environment on pKa. The pKa values for the ionizable groups in glycine are lower than those for simple, methyl-substituted amino and carboxyl groups. These downward perturbations of pKa are due to intramolecular interactions.
Similar effects can be caused by chemical groups that happen to be positioned nearby, for example, in the active site of an enzyme.

The second piece of information provided by the titration curve of glycine is that this amino acid has two regions of buffering power. One of these is the relatively flat portion of the curve, extending for approximately 1 pH unit on either side of the first pKa of 2.34, indicating that glycine is a good buffer near this pH. The other buffering zone is centered around pH 9.60. (Note that glycine is not a good buffer at the pH of intracellular fluid or blood, about 7.4.) Within the buffering ranges of glycine, the Henderson-Hasselbalch equation (p. 60) can be used to calculate the proportions of proton-donor and proton-acceptor species of glycine required to make a buffer at a given pH.

A final important piece of information derived from the titration curve of an amino acid is the relationship between its net charge and the pH of the solution. At pH 5.97, the point of inflection between the two stages in its titration curve, glycine is present predominantly as its dipolar form, fully ionized but with no net electric charge (Fig. 3-10). The characteristic pH at which the net electric charge is zero is called the isoelectric point or isoelectric pH, designated pI. For glycine, which has no ionizable group in its side chain, the isoelectric point is simply the arithmetic mean of the two pKa values:

$$\mathrm{pI} = \frac{1}{2}(\mathrm{p}K_1 + \mathrm{p}K_2) = \frac{1}{2}(2.34 + 9.60) = 5.97$$

As is evident in Figure 3-10, glycine has a net negative charge at any pH above its pI and thus will move toward the positive electrode (the anode) when placed in an electric field. At any pH below its pI, glycine has a net positive charge and will move toward the negative electrode (the cathode). The farther the pH of a glycine solution is from its isoelectric point, the greater the net electric charge of the population of glycine molecules.
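The relationship between pH and the average net charge of glycine can be sketched numerically. The following is a minimal illustration, not part of the original text: it assumes the two ionizations are independent and applies the Henderson-Hasselbalch relationship to each group, using the pKa values quoted above (2.34 and 9.60). The function names are hypothetical and chosen only for this example.

```python
def fraction_deprotonated(pH, pKa):
    """Henderson-Hasselbalch: fraction of a group in its proton-acceptor form."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


def glycine_net_charge(pH, pK1=2.34, pK2=9.60):
    """Average net charge of glycine, treating the two groups independently."""
    # Carboxyl group: charge 0 when protonated, -1 when deprotonated.
    carboxyl = -fraction_deprotonated(pH, pK1)
    # Amino group: charge +1 when protonated, 0 when deprotonated.
    amino = 1.0 - fraction_deprotonated(pH, pK2)
    return carboxyl + amino


def isoelectric_point(pK1, pK2):
    """pI of an amino acid with no ionizable R group: mean of the two pKa values."""
    return (pK1 + pK2) / 2.0


if __name__ == "__main__":
    print(f"pI = {isoelectric_point(2.34, 9.60):.2f}")
    for pH in (1.0, 2.34, 5.97, 9.60, 12.0):
        print(f"pH {pH:5.2f}: net charge {glycine_net_charge(pH):+.2f}")
```

Under these assumptions the computed charge is positive below the pI, essentially zero at the pI, and negative above it, in line with the titration curve.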
At pH 1.0, for example, glycine exists almost entirely as the form ⁺H₃N-CH₂-COOH with a net positive charge of 1.0. At pH 2.34, where there is an equal mixture of ⁺H₃N-CH₂-COOH and ⁺H₃N-CH₂-COO⁻, the average or net positive charge is 0.5. The sign and the magnitude of the net charge of any amino acid at any pH can be predicted in the same way.

Amino Acids Differ in Their Acid-Base Properties

The shared properties of many amino acids permit some simplifying generalizations about their acid-base behaviors. First, all amino acids with a single α-amino group, a single α-carboxyl group, and an R group that does not ionize have titration curves resembling that of glycine (Fig. 3-10). These amino acids have very similar, although not identical, pKa values: pKa of the -COOH group in the range of 1.8 to 2.4, and pKa of the -NH₃⁺ group in the range of 8.8 to 11.0 (Table 3-1). The differences in these pKa values reflect the chemical environments imposed by their R groups.

Second, amino acids with an ionizable R group have more complex titration curves, with three stages corresponding to the three possible ionization steps; thus, they have three pKa values. The additional stage for the titration of the ionizable R group merges to some extent with that for the titration of the α-carboxyl group, the titration of the α-amino group, or both. The titration curves for two amino acids of this type, glutamate and histidine, are shown in Figure 3-12. The isoelectric points reflect the nature of the ionizing R groups that are present. For example, glutamate has a pI of 3.22, considerably lower than that of glycine. This is due to the presence of two carboxyl groups, which, at the average of their pKa values (3.22), contribute a net charge of −1 that balances the +1 contributed by the amino group.
Similarly, the pI of histidine, with two groups that are positively charged when protonated, is 7.59 (the average of the pKa values of the amino and imidazole groups), much higher than that of glycine.

FIGURE 3-12 Titration curves for (a) glutamate and (b) histidine. The pKa of the R group is designated here as pKR. In both cases, the presence of three ionizable groups renders the titration curve more complex. Note that for glutamate, the pI is approximately the arithmetic mean of the pKa of the two groups that are negatively charged. There is a net charge of 0 (the pI) when these two groups contribute a net charge of −1 (one protonated, the other not) to exactly balance the +1 charge of the protonated α-amino group. Similarly, the pI for histidine is approximately the arithmetic mean of the pKa of the two groups that are positively charged when protonated.

Finally, in an aqueous environment, only histidine has an R group (pKa = 6.0) providing significant buffering power near the neutral pH usually found in the intracellular and extracellular fluids of most animals and bacteria (Table 3-1).

SUMMARY 3.1 Amino Acids

The 20 amino acids commonly found as residues in proteins contain an α-carboxyl group, an α-amino group, and a distinctive R group substituted on the α-carbon atom.
The α-carbon atom of all amino acids except glycine is asymmetric, and thus amino acids can exist in at least two stereoisomeric forms.
Only the L stereoisomers of amino acids, with a configuration related to the absolute configuration of the reference molecule L-glyceraldehyde, are found in proteins.
Amino acids can be classified into five types on the basis of the polarity and charge (at pH 7) of their R groups.
Other, less common amino acids also occur, either as constituents of proteins (usually through modification of common amino acid residues after protein synthesis) or as free metabolites.
Amino acids vary in their acid-base properties and have characteristic titration curves. Monoamino monocarboxylic amino acids (with nonionizable R groups) are diprotic acids (⁺H₃NCH(R)COOH) at low pH and exist in several different ionic forms as the pH is increased. Amino acids with ionizable R groups have additional ionic species, depending on the pH of the medium and the pKa of the R group.

3.2 Peptides and Proteins

We now turn to polymers of amino acids, the peptides and proteins. Biologically occurring polypeptides range in size from small, consisting of two or three linked amino acid residues, to very large, consisting of thousands of residues.

Peptides Are Chains of Amino Acids

Two amino acid molecules can be covalently joined through a substituted amide linkage, termed a peptide bond, to yield a dipeptide. Such a linkage is formed by removal of the elements of water (dehydration): a hydroxyl moiety from the α-carboxyl group of one amino acid and a hydrogen atom from the α-amino group of another (Fig. 3-13). The joined amino acids are referred to as residues, the part left over after the elements of water are removed. Peptide bond formation is an example of a condensation reaction, a common class of reactions in living cells. The reverse reaction, bond breakage involving water, is an example of hydrolytic cleavage or hydrolysis. Under standard biochemical conditions, the equilibrium for the reaction shown in Figure 3-13 favors the hydrolysis of the dipeptide into amino acids. To make the condensation reaction thermodynamically more favorable, the carboxyl group must be chemically modified or activated so that the hydroxyl group can be more readily eliminated. A chemical approach to this problem is outlined later in this chapter. The biological approach to peptide bond formation is a major topic of Chapter 27.

FIGURE 3-13 Formation of a peptide bond by condensation.
The α-amino group of one amino acid (with the R2 group) acts as a nucleophile to displace the hydroxyl group of another amino acid (with the R1 group), forming a peptide bond (shaded in light red). Amino groups are good nucleophiles, but the hydroxyl group is a poor leaving group and is not readily displaced. At physiological pH, the reaction shown here does not occur to any appreciable extent.

Three amino acids can be joined by two peptide bonds to form a tripeptide; similarly, four amino acids can be linked to form a tetrapeptide, five to form a pentapeptide, and so forth. When a few amino acids are joined in this fashion, the structure is called an oligopeptide. When many amino acids are joined, the product is called a polypeptide. Proteins may have thousands of amino acid residues. Although the terms “protein” and “polypeptide” are sometimes used interchangeably, molecules referred to as polypeptides generally have molecular weights below 10,000, and those called proteins have higher molecular weights.

Figure 3-14 shows the structure of a pentapeptide. In a peptide, the amino acid residue at the end with a free α-amino group is the amino-terminal (or N-terminal) residue; the residue at the other end, which has a free carboxyl group, is the carboxyl-terminal (C-terminal) residue.

FIGURE 3-14 The pentapeptide serylglycyltyrosylalanylleucine, Ser–Gly–Tyr–Ala–Leu, or SGYAL. Peptides are named beginning with the amino-terminal residue, which by convention is placed at the left. The peptide bonds are shaded in light red; the R groups are in red.

KEY CONVENTION When an amino acid sequence of a peptide, a polypeptide, or a protein is displayed, the amino-terminal end is placed on the left and the carboxyl-terminal end is placed on the right. The sequence is read from left to right, beginning with the amino-terminal end.

Although hydrolysis of a peptide bond is an exergonic reaction, it occurs only slowly because it has a high activation energy (p. 25).
As a result, the peptide bonds in proteins are quite stable, with an average half-life (t1/2) of about 7 years under most intracellular conditions.

Peptides Can Be Distinguished by Their Ionization Behavior

Peptides contain only one free α-amino group and one free α-carboxyl group, at opposite ends of the chain (Fig. 3-15). These groups ionize as they do in free amino acids. The α-amino and α-carboxyl groups of all nonterminal amino acids are covalently joined in the peptide bonds. They can no longer ionize and thus do not contribute to the total acid-base behavior of peptides. Ionizable R groups in a peptide (Table 3-1) also contribute to the overall acid-base properties of the molecule (Fig. 3-15).

FIGURE 3-15 Alanylglutamylglycyllysine. This tetrapeptide has one free α-amino group, one free α-carboxyl group, and two ionizable R groups. The groups ionized at pH 7.0 are in red.

Like free amino acids, peptides have characteristic titration curves and a characteristic isoelectric pH (pI) at which the net charge is zero and they do not move in an electric field. These properties are exploited in some of the techniques used to separate peptides and proteins, as we describe later in the chapter. When an amino acid becomes a residue in a peptide, its chemical environment is altered, and the pKa value for an ionizable R group can change somewhat. The pKa values for R groups listed in Table 3-1 can be a useful guide to the pH range in which a given group will ionize, but they cannot be strictly applied when an amino acid becomes part of a peptide.

Biologically Active Peptides and Polypeptides Occur in a Vast Range of Sizes and Compositions

No generalizations can be made about the molecular weights of biologically active peptides and proteins in relation to their functions. Naturally occurring peptides range in length from two to many thousands of amino acid residues. Even the smallest peptides can have biologically important effects.
Consider the commercially synthesized dipeptide L-aspartyl-L-phenylalanine methyl ester, the artificial sweetener better known as aspartame or NutraSweet.

Many small peptides exert their effects at very low concentrations. For example, a number of vertebrate hormones (Chapter 23) are small peptides. These include oxytocin (nine amino acid residues), which is secreted by the posterior pituitary gland and stimulates uterine contractions, and thyrotropin-releasing factor (three residues), which is formed in the hypothalamus and stimulates the release of another hormone, thyrotropin, from the anterior pituitary gland. Some extremely toxic mushroom poisons, such as amanitin, are also small peptides, as are many antibiotics.

How long are the polypeptide chains in proteins? As Table 3-2 shows, lengths vary considerably. Human cytochrome c has 104 amino acid residues linked in a single chain; bovine chymotrypsinogen has 245 residues. At the extreme is titin, a constituent of vertebrate muscle, which has nearly 27,000 amino acid residues and a molecular weight of about 3,000,000. The vast majority of naturally occurring proteins are much smaller than this, containing fewer than 2,000 amino acid residues.

TABLE 3-2 Molecular Data on Some Proteins

Protein                          Molecular weight   Number of residues   Number of polypeptide chains
Cytochrome c (human)             12,400             104                  1
Myoglobin (equine heart)         16,700             153                  1
Chymotrypsin (bovine pancreas)   25,200             241                  3
Hemoglobin (human)               64,500             574                  4
Hexokinase (yeast)               107,900            972                  2
RNA polymerase (E. coli)         450,000            4,158                5
Glutamine synthetase (E. coli)   619,000            5,628                12
Titin (human)                    2,993,000          26,926               1

Some proteins consist of a single polypeptide chain, but others, called multisubunit proteins, have two or more polypeptides associated noncovalently (Table 3-2). The individual polypeptide chains in a multisubunit protein may be identical or different.
If at least two are identical, the protein is said to be oligomeric, and the identical units (consisting of one or more polypeptide chains) are referred to as protomers. Hemoglobin, for example, has four polypeptide subunits: two identical α chains and two identical β chains, all four held together by noncovalent interactions. Each α subunit is paired in an identical way with a β subunit within the structure of this multisubunit protein, so that hemoglobin can be considered either a tetramer of four polypeptide subunits or a dimer of αβ protomers.

A few proteins contain two or more polypeptide chains linked covalently. For example, the two polypeptide chains of insulin are linked by disulfide bonds. In such cases, the individual polypeptides are not considered subunits; instead they are commonly referred to simply as chains.

The amino acid composition of proteins is also highly variable. The 20 common amino acids almost never occur in equal amounts in a protein. Some amino acids may occur only once or not at all in a given type of protein; others may occur in large numbers. Table 3-3 shows the amino acid composition of bovine cytochrome c and chymotrypsinogen, the inactive precursor of the digestive enzyme chymotrypsin. These two proteins, with very different functions, also differ significantly in the relative numbers of each kind of amino acid residue.
TABLE 3-3 Amino Acid Composition of Two Proteins

              Bovine cytochrome c                       Bovine chymotrypsinogen
Amino acid    Residues per molecule   % of total(a)     Residues per molecule   % of total(a)
Ala           6                       6                 22                      9
Arg           2                       2                 4                       1.6
Asn           5                       5                 14                      5.7
Asp           3                       3                 9                       3.7
Cys           2                       2                 10                      4
Gln           3                       3                 10                      4
Glu           9                       9                 5                       2
Gly           14                      13                23                      9.4
His           3                       3                 2                       0.8
Ile           6                       6                 10                      4
Leu           6                       6                 19                      7.8
Lys           18                      17                14                      5.7
Met           2                       2                 2                       0.8
Phe           4                       4                 6                       2.4
Pro           4                       4                 9                       3.7
Ser           1                       1                 28                      11.4
Thr           8                       8                 23                      9.4
Trp           1                       1                 8                       3.3
Tyr           4                       4                 4                       1.6
Val           3                       3                 23                      9.4
Total         104                     102(a)            245                     99.7(a)

Note: In some common analyses, such as acid hydrolysis, Asp and Asn are not readily distinguished from each other and are together designated Asx (or B). Similarly, when Glu and Gln cannot be distinguished, they are together designated Glx (or Z). In addition, Trp is destroyed by acid hydrolysis. Additional procedures must be employed to obtain an accurate assessment of complete amino acid content.
(a) Percentages do not total to 100%, due to rounding.

We can estimate the number of amino acid residues in a simple protein containing no other chemical constituents by dividing its molecular weight by 110. Although the average molecular weight of the 20 common amino acids is about 138, the smaller amino acids predominate in most proteins. If we take into account the proportions in which the various amino acids occur in an average protein (Table 3-1; the averages are determined by surveying the amino acid compositions of more than 1,000 different proteins), the average molecular weight of protein amino acids is nearer to 128. Because a molecule of water (Mr 18) is removed to create each peptide bond, the average molecular weight of an amino acid residue in a protein is about 128 − 18 = 110.

Some Proteins Contain Chemical Groups Other Than Amino Acids

Many proteins (for example, the enzymes ribonuclease A and chymotrypsin) contain only amino acid residues and no other chemical constituents.
However, some proteins contain permanently associated chemical components in addition to amino acids; these are called conjugated proteins. The non-amino acid part of a conjugated protein is usually called its prosthetic group. Conjugated proteins are classified on the basis of the chemical nature of their prosthetic groups (Table 3-4); for example, lipoproteins contain lipids, glycoproteins contain sugar groups, and metalloproteins contain a specific metal. Some proteins contain more than one prosthetic group. Usually the prosthetic group plays an important role in the protein’s biological function.

TABLE 3-4 Conjugated Proteins

Class              Prosthetic group        Example
Lipoproteins       Lipids                  β1-Lipoprotein of blood (Fig. 17-2)
Glycoproteins      Carbohydrates           Immunoglobulin G (Fig. 5-20)
Phosphoproteins    Phosphate groups        Glycogen phosphorylase (Fig. 6-39)
Hemoproteins       Heme (iron porphyrin)   Hemoglobin (Figs 5-8 to 5-11)
Flavoproteins      Flavin nucleotides      Succinate dehydrogenase (Fig. 19-9)
Metalloproteins    Iron                    Ferritin (Box 16-1)
                   Zinc                    Alcohol dehydrogenase (Fig. 14-12)
                   Calcium                 Calmodulin (Fig. 12-17)
                   Molybdenum              Dinitrogenase (Fig. 22-3)
                   Copper                  Complex IV (Fig. 19-12)

SUMMARY 3.2 Peptides and Proteins

Amino acids can be joined covalently through peptide bonds to form peptides and proteins. Cells generally contain thousands of different proteins, each with a different biological activity.
The ionization behavior of peptides reflects their ionizable side chains as well as the terminal α-amino and α-carboxyl groups.
Proteins can be very long polypeptide chains of 100 to several thousand amino acid residues. However, some naturally occurring peptides have only a few amino acid residues. Some proteins are composed of several noncovalently associated polypeptide chains, called subunits.
Simple proteins yield only amino acids on hydrolysis; conjugated proteins contain in addition some other component, such as a metal or organic prosthetic group.
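The rule of thumb from this section (divide a protein's molecular weight by 110 to estimate its residue count) can be tried against a few entries from Table 3-2. This is a rough sketch for illustration only; the names `AVERAGE_RESIDUE_MASS`, `estimate_residue_count`, and `EXAMPLES` are hypothetical, and the estimate is expected to deviate for proteins that carry prosthetic groups or have unusual compositions.

```python
# Average mass of a residue in a protein: about 128 (weighted average
# amino acid) minus 18 (water lost per peptide bond), as described in the text.
AVERAGE_RESIDUE_MASS = 128 - 18  # = 110


def estimate_residue_count(molecular_weight):
    """Estimate the number of residues in a simple protein from its molecular weight."""
    return round(molecular_weight / AVERAGE_RESIDUE_MASS)


# (molecular weight, actual number of residues), taken from Table 3-2.
EXAMPLES = {
    "Cytochrome c (human)": (12_400, 104),
    "Myoglobin (equine heart)": (16_700, 153),
    "Titin (human)": (2_993_000, 26_926),
}

if __name__ == "__main__":
    for name, (mw, actual) in EXAMPLES.items():
        estimate = estimate_residue_count(mw)
        print(f"{name}: estimated {estimate}, actual {actual}")
```

The estimates land within several percent of the tabulated residue counts, which is about as much as an average-based shortcut can be expected to deliver.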
3.3 Working with Proteins

Biochemists’ understanding of protein structure and function has been derived from the study of many individual proteins. To study a protein in detail, the researcher must be able to separate it from other proteins in pure form and must have the techniques to determine its properties. The necessary methods come from protein chemistry, a discipline as old as biochemistry itself and one that retains a central position in biochemical research.

Proteins Can Be Separated and Purified

A pure preparation is usually essential before a protein’s properties and activities can be determined. Given that cells contain thousands of different kinds of proteins, how can one protein be purified? Methods for separating proteins take advantage of properties that vary from one protein to the next, including size, charge, and binding properties. The advent of genetic engineering approaches has provided new and simpler paths for protein purification. The latter methods, described in Chapter 9, often artificially modify the protein being purified, adding a few or many amino acid residues to one or both ends. In many cases, the modifications alter protein function. Isolation of unaltered native proteins requires removal of the modification or a reliance on methods described here.

The source of a protein is generally tissue or microbial cells. The first step in any protein purification procedure is to break open these cells, releasing their proteins into a solution called a crude extract. If necessary, differential centrifugation can be used to prepare subcellular fractions or to isolate specific organelles (see Fig. 1-7).

Once the extract or organelle preparation is ready, various methods are available for purifying one or more of the proteins it contains. Commonly, the extract is subjected to treatments that separate the proteins into different fractions based on a property such as size or charge; the process is referred to as fractionation.
Early fractionation steps in a purification utilize differences in protein solubility, which is a complex function of pH, temperature, salt concentration, and other factors. The solubility of proteins is lowered in the presence of some salts, an effect called “salting out.” Ammonium sulfate ((NH₄)₂SO₄) is particularly effective for selectively precipitating some proteins while leaving others in solution. Low-speed centrifugation is then used to remove the precipitated proteins from those remaining in solution.

A solution containing the protein of interest usually must be further altered before subsequent purification steps are possible. For example, dialysis is a procedure that separates proteins from small solutes by taking advantage of the proteins’ larger size. The partially purified extract is placed in a bag or tube made of a semipermeable membrane, which is suspended in a much larger volume of buffered solution of appropriate ionic strength. The membrane allows the exchange of salt and buffer but not proteins. Thus dialysis retains large proteins within the membranous bag or tube while allowing the concentration of other solutes in the protein preparation to change until they come into equilibrium with the solution outside the membrane. Dialysis might be used, for example, to remove ammonium sulfate from the protein preparation.

The most efficient methods for fractionating proteins make use of column chromatography, which takes advantage of differences in protein charge, size, binding affinity, and other properties (Fig. 3-16). A porous solid material with appropriate chemical properties (the stationary phase) is held in a column, and a buffered solution (the mobile phase) migrates through it. The protein, dissolved in the same buffered solution that was used to establish the mobile phase, is layered on the top of the column. The protein then percolates through the solid matrix as an ever-expanding band within the larger mobile phase.
Individual proteins migrate faster or more slowly through the column, depending on their properties.

FIGURE 3-16 Column chromatography. The standard elements of a chromatographic column include a solid, porous material (matrix) supported inside a column, generally made of plastic or glass. A solution, the mobile phase, flows through the matrix, the stationary phase. The solution that passes out of the column at the bottom (the effluent) is constantly replaced by solution supplied from a reservoir at the top. The protein solution to be separated is layered on top of the column and allowed to percolate into the solid matrix. Additional solution is added on top. The protein solution forms a band within the mobile phase that is initially the depth of the protein solution applied to the column. As proteins migrate through the column (shown here at five different times), they are retarded to different degrees by their different interactions with the matrix material. The overall protein band thus widens as it moves through the column. Individual types of proteins (such as A, B, and C, shown in blue, red, and green) gradually separate from each other, forming bands within the broader protein band. Separation improves (i.e., resolution increases) as the length of the column increases. However, each individual protein band also broadens with time due to diffusional spreading, a process that decreases resolution. In this example, protein A is well separated from B and C, but diffusional spreading prevents complete separation of B and C under these conditions. Protein C is being detected and its presence recorded as it is eluted from the column.

Ion-exchange chromatography exploits differences in the sign and magnitude of the net electric charge of proteins at a given pH (Fig. 3-17a).
The column matrix is a synthetic polymer (resin) containing bound charged groups; those with bound anionic groups are called cation exchangers, and those with bound cationic groups are called anion exchangers. The affinity of each protein for the charged groups on the column is affected by the pH (which determines the ionization state of the molecule) and the concentration of competing free salt ions in the surrounding solution. Separation can be optimized by gradually changing the pH and/or salt concentration of the mobile phase in order to create a pH or salt gradient. In cation-exchange chromatography, proteins with a net positive charge migrate through the matrix more slowly than those with a net negative charge, because the migration of the former is retarded more by interaction with the stationary phase.

FIGURE 3-17 Three chromatographic methods used in protein purification. (a) Ion-exchange chromatography exploits differences in the sign and magnitude of the net electric charges of proteins at a given pH. (b) Size-exclusion chromatography, also called gel filtration, separates proteins according to size. (c) Affinity chromatography separates proteins by their binding specificities. Further details of these methods are given in the text.

As the protein-containing solution exits a column, successive portions (fractions) of this effluent are collected in test tubes. Each fraction can be tested for the presence of the protein of interest as well as other properties, such as ionic strength or total protein concentration. All fractions positive for the protein of interest can be combined as the product of this chromatographic step of the protein purification.

WORKED EXAMPLE 3-1 Ion Exchange of Peptides

A biochemist wants to separate two peptides by ion-exchange chromatography.
At the pH of the mobile phase to be used on the column, one peptide (A) has a pI of 5.1, due to the presence of more Glu and Asp residues than Arg, Lys, and His residues, and has a net negative charge at neutral pH. Peptide B has a pI of 7.8, reflecting a plurality of positively charged amino acid residues at neutral pH. At neutral pH, which peptide would elute first from a cation-exchange resin? Which would elute first from an anion-exchange resin?

SOLUTION: A cation-exchange resin has negative charges and binds positively charged molecules, retarding their progress through the column. Peptide B, with its higher pI and net positive charge, will interact more strongly than peptide A with the cation-exchange resin. Thus, peptide A will elute first. On the anion-exchange resin, peptide B will elute first. Peptide A, having a relatively low pI and a net negative charge, will be retarded by its interaction with the positively charged resin.

Figure 3-17 shows two variations of column chromatography in addition to ion exchange. Size-exclusion chromatography, also called gel filtration (Fig. 3-17b), separates proteins according to size. In this method, large proteins emerge from the column sooner than small ones, a somewhat counterintuitive result. The solid phase consists of cross-linked polymer beads with engineered pores or cavities of a particular size. Large proteins cannot enter the cavities and so take a shorter (and more rapid) path through the column, around the beads. Small proteins enter the cavities and are slowed by their more labyrinthine path through the column. Size-exclusion chromatography can also be used to approximate the size of a protein being purified, using methods similar to those described in Figure 3-19.

Affinity chromatography is based on binding affinity (Fig. 3-17c). The beads in the column have a covalently attached chemical group called a ligand, that is, a group or molecule that binds to a macromolecule such as a protein.
When a protein mixture is added to the column, any protein with affinity for this ligand binds to the beads, and its migration through the matrix is retarded. For example, if the biological function of a protein involves binding to ATP, then attaching a molecule that resembles ATP to the beads in the column creates an affinity matrix that can help purify the protein. Proteins that do not bind to ATP flow more rapidly through the column. Bound proteins are then eluted by a solution containing either a high concentration of salt or a free ligand (in this case, ATP or an analog of ATP). Salt weakens the binding of the protein to the immobilized ligand, interfering with ionic interactions. Free ligand competes with the ligand attached to the beads, releasing the protein from the matrix; the protein product that elutes from the column is often bound to the ligand used to elute it.

Protein purification protocols often use genetic engineering to fuse additional amino acids or peptides (tags) to the target protein. Affinity chromatography can be used to bind this tag, achieving a large increase in purity in a single step (see Fig. 9-11). In many cases, the tag can be subsequently removed, fully restoring the function of the native protein.

Chromatographic methods are typically enhanced by the use of HPLC, or high-performance liquid chromatography. HPLC makes use of high-pressure pumps that speed the movement of the protein molecules down the column; it also uses higher-quality chromatographic materials that can withstand the crushing force of the pressurized flow. By reducing the transit time on the column, HPLC can limit diffusional spreading of protein bands and thus can greatly improve resolution.

Choosing the approach to purification of a protein that has not previously been isolated is guided both by established precedents and by common sense.
In most cases, several different methods must be used sequentially to purify a protein completely, each separating proteins on the basis of different properties. The choice of methods is somewhat empirical, and many strategies may be tried before the most effective one is found. Researchers can often minimize trial and error by basing the new procedure on purification techniques developed for similar proteins. Common sense dictates that inexpensive procedures such as salting out be used first, when the total volume and the number of contaminants are greatest. As each purification step is completed, the sample size generally becomes smaller (Table 3-5), making it feasible to use more sophisticated (and expensive) chromatographic procedures at later stages.

A purification table documents the success of each step in a purification protocol. In the hypothetical purification shown in Table 3-5, the ratio of the final specific activity (15,000 units/mg) to the starting specific activity (10 units/mg) gives the purification factor (1,500). The percentage of the total activity at the last step (45,000 units) relative to the total activity in the starting material (100,000 units) gives the yield from the purification procedure (45%).

TABLE 3-5 A Hypothetical Purification Table for an Enzyme

Procedure or step                         Fraction volume (mL)   Total protein (mg)   Activity (units)   Specific activity (units/mg)
1. Crude cellular extract                 1,400                  10,000               100,000            10
2. Precipitation with ammonium sulfate    280                    3,000                96,000             32
3. Ion-exchange chromatography            90                     400                  80,000             200
4. Size-exclusion chromatography          80                     100                  60,000             600
5. Affinity chromatography                6                      3                    45,000             15,000

Note: All data represent the status of the sample after the designated procedure has been carried out. “Activity” and “specific activity” are defined on page 90.
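The bookkeeping behind a purification table can be reproduced in a few lines. The sketch below is illustrative only: it takes the hypothetical values from Table 3-5 and recomputes specific activity, purification factor, and yield as defined in the text. The `Step` class and the function names are assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One row of a purification table (values after the step is completed)."""
    name: str
    volume_ml: float
    total_protein_mg: float
    activity_units: float

    @property
    def specific_activity(self):
        """Units of activity per milligram of total protein."""
        return self.activity_units / self.total_protein_mg


# Hypothetical data copied from Table 3-5.
STEPS = [
    Step("Crude cellular extract", 1400, 10_000, 100_000),
    Step("Precipitation with ammonium sulfate", 280, 3_000, 96_000),
    Step("Ion-exchange chromatography", 90, 400, 80_000),
    Step("Size-exclusion chromatography", 80, 100, 60_000),
    Step("Affinity chromatography", 6, 3, 45_000),
]


def purification_factor(step, start):
    """Ratio of a step's specific activity to that of the starting material."""
    return step.specific_activity / start.specific_activity


def percent_yield(step, start):
    """Total activity retained at a step, as a percentage of the starting activity."""
    return 100.0 * step.activity_units / start.activity_units


if __name__ == "__main__":
    crude = STEPS[0]
    for step in STEPS:
        print(
            f"{step.name:38s} "
            f"specific activity {step.specific_activity:>8.0f} units/mg, "
            f"purification {purification_factor(step, crude):>6.1f}x, "
            f"yield {percent_yield(step, crude):>5.1f}%"
        )
```

Tracking both numbers at every step, rather than only at the end, makes it easier to see which step buys the most purity and which one costs the most activity.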
Proteins Can Be Separated and Characterized by Electrophoresis

Protein purification is usually complemented by electrophoresis, an analytical process that allows researchers to visualize and characterize proteins as they are purified. This method does not itself contribute to purification, as electrophoresis often adversely affects the structure and thus the function of proteins. However, it allows a biochemist to rapidly estimate the number of different proteins in a mixture and the degree of purity of a particular protein preparation. Also, electrophoresis can be used to determine such crucial properties of a protein as its isoelectric point and approximate molecular weight.

Electrophoresis of proteins is generally carried out in gels made up of the cross-linked polymer polyacrylamide (Fig. 3-18). The polyacrylamide gel acts as a molecular sieve, slowing the migration of proteins approximately in proportion to their charge-to-mass ratio. Migration may also be affected by protein shape. In electrophoresis, the force moving the macromolecule is the electrical potential, E. The electrophoretic mobility, μ, of a molecule is the ratio of its velocity, V, to the electrical potential. Electrophoretic mobility is also equal to the net charge, Z, of the molecule divided by the frictional coefficient, f, which reflects in part a protein’s shape. Thus,

$\mu = \frac{V}{E} = \frac{Z}{f}$

The migration of a protein in a gel during electrophoresis is therefore a function of its size and its shape.

FIGURE 3-18 Electrophoresis. (a) Different samples are loaded in wells or depressions at the top of the SDS polyacrylamide gel. The proteins move into the gel when an electric field is applied. The gel minimizes convection currents caused by small temperature gradients, as well as protein movements other than those induced by the electric field.
(b) Proteins can be visualized after electrophoresis by treating the gel with a stain such as Coomassie blue, which binds to the proteins but not to the gel itself. Each band on the gel represents a different protein (or protein subunit); smaller proteins move through the gel more rapidly than larger proteins and therefore are found nearer the bottom of the gel. This gel illustrates purification of the RecA protein of Escherichia coli. The gene for the RecA protein was cloned so that its expression (synthesis of the protein) could be controlled. The first lane shows a set of standard proteins (of known Mr), serving as molecular weight markers. The second and third lanes show, respectively, proteins from E. coli cells before and after synthesis of RecA protein was induced. The fourth lane shows the proteins in a crude cellular extract. Subsequent lanes (left to right) show the proteins that are present after successive purification steps. Although the protein looks pure in lane 6, two more steps are needed to remove minor contaminants not evident on the gel. The purified protein is a single polypeptide chain (Mr ~ 38,000), as seen in the rightmost lane.

The electrophoretic method commonly employed for estimation of purity and molecular weight makes use of the detergent sodium dodecyl sulfate (SDS) (“dodecyl” denoting a 12-carbon chain). A protein will bind about 1.4 times its weight of SDS, nearly one molecule of SDS for each amino acid residue. The sulfate moieties of the bound SDS contribute a large net negative charge, rendering the intrinsic charge of the protein insignificant and conferring on each protein a similar charge-to-mass ratio. In addition, SDS binding partially unfolds proteins, such that most SDS-bound proteins assume a similar rodlike shape. Electrophoresis in the presence of SDS therefore separates proteins almost exclusively on the basis of mass (molecular weight), with smaller polypeptides migrating more rapidly.
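Because SDS electrophoresis separates polypeptides almost exclusively by mass, the migration of marker proteins of known Mr can serve as a calibration for an unknown band. The Python sketch below fits a straight line to log Mr versus relative migration by ordinary least squares and reads off an approximate Mr. The marker values and migration distances are hypothetical numbers chosen for illustration, not data from a real gel, and the function names (`fit_line`, `estimate_mr`) are likewise assumptions.

```python
import math

# HYPOTHETICAL marker set: (Mr, relative migration), where relative migration
# is the distance traveled by the band divided by that of the dye front.
MARKERS = [
    (97_000, 0.15),
    (66_000, 0.28),
    (45_000, 0.42),
    (31_000, 0.55),
    (21_500, 0.68),
    (14_400, 0.82),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mr(rel_migration, markers):
    """Estimate Mr of an unknown band from a log10(Mr) standard curve."""
    xs = [rf for _, rf in markers]
    ys = [math.log10(mr) for mr, _ in markers]
    slope, intercept = fit_line(xs, ys)
    return 10 ** (slope * rel_migration + intercept)


if __name__ == "__main__":
    unknown_rf = 0.48  # hypothetical relative migration of the unknown band
    print(f"Estimated Mr: {estimate_mr(unknown_rf, MARKERS):,.0f}")
```

The estimate is only as good as the linear range of the markers; a band that migrates outside the span of the standards would be extrapolated rather than interpolated, and should be treated with more caution.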
After electrophoresis, the proteins are visualized by adding a dye such as Coomassie blue, which binds to proteins but not to the gel itself (Fig. 3-18b). Thus, a researcher can monitor the progress of a protein purification procedure as the number of protein bands visible on the gel decreases after each new fractionation step. When compared with the positions to which proteins of known molecular weight migrate in the gel, the position of an unidentified protein can provide a good approximation of its molecular weight (Fig. 3-19). If the protein has two or more different subunits, generally the subunits are separated by the SDS treatment, and a separate band appears for each.

FIGURE 3-19 Estimating the molecular weight of a protein. The electrophoretic mobility of a protein on an SDS polyacrylamide gel is related to its molecular weight, Mr. (a) Standard proteins of known molecular weight are subjected to electrophoresis (lane 1). These marker proteins can be used to estimate the molecular weight of an unknown protein (lane 2). (b) A plot of log Mr of the marker proteins versus relative migration during electrophoresis is linear, which allows the molecular weight of the unknown protein to be read from the graph. (In similar fashion, a set of standard proteins with reproducible retention times on a size-exclusion column can be used to create a standard curve of retention time versus log Mr. The retention time of an unknown substance on the column can be compared with this standard curve to obtain an approximate Mr.)

Isoelectric focusing is a procedure used to determine the isoelectric point (pI) of a protein (Fig. 3-20). A pH gradient is established by allowing a mixture of low molecular weight organic acids and bases (ampholytes; p. 77) to distribute themselves in an electric field generated across the gel. When a protein mixture is applied, each protein migrates until it reaches the pH that matches its pI.
Proteins with different isoelectric points are thus distributed differently throughout the gel.

FIGURE 3-20 Isoelectric focusing. This technique separates proteins according to their isoelectric points. A protein mixture is placed on a gel strip containing an immobilized pH gradient. With an applied electric field, proteins enter the gel and migrate until each reaches a pH that is equivalent to its pI. Remember that when pH = pI, the net charge of a protein is zero.

Combining isoelectric focusing and SDS electrophoresis sequentially in a process called two-dimensional electrophoresis permits the resolution of complex mixtures of proteins (Fig. 3-21). This is a more sensitive analytical method than either electrophoretic method alone. Two-dimensional electrophoresis separates proteins of identical molecular weight that differ in pI, or proteins with similar pI values but different molecular weights.

FIGURE 3-21 Two-dimensional electrophoresis. Proteins are first separated by isoelectric focusing in a thin strip gel. The gel is then laid horizontally on a second, slab-shaped gel, and the proteins are separated by SDS polyacrylamide gel electrophoresis. Horizontal separation reflects differences in pI; vertical separation reflects differences in molecular weight. The original protein complement is thus spread in two dimensions. Thousands of cellular proteins can be resolved using this technique. Individual protein spots can be cut out of the gel and identified by mass spectrometry (see Figs 3-28 and 3-29).

Unseparated Proteins Are Detected and Quantified Based on Their Functions

To purify a protein, it is essential to have a way of detecting and quantifying that protein in the presence of many other proteins at each stage of the procedure. A common target of purification is one or another of the class of proteins called enzymes (Chapter 6). Each enzyme catalyzes a particular reaction that converts one biomolecule (the substrate) to another (the product).
The amount of the protein in a given solution or tissue extract can be measured, or assayed, in terms of the catalytic effect the enzyme produces, that is, the increase in the rate at which its substrate is converted to reaction products when the enzyme is present. For this purpose the researcher must know (1) the overall equation of the reaction catalyzed, (2) an analytical procedure for determining the disappearance of the substrate or the appearance of a reaction product, (3) whether the enzyme requires cofactors such as metal ions or coenzymes, (4) the dependence of the enzyme activity on substrate concentration, (5) the optimum pH, and (6) a temperature zone in which the enzyme is stable and has high activity. Enzymes are usually assayed at their optimum pH and at some convenient temperature within the range 25 to 38 °C. Also, very high substrate concentrations are generally used so that the initial reaction rate, measured experimentally, is proportional to enzyme concentration (Chapter 6).

By international agreement, 1.0 unit of enzyme activity for most enzymes is defined as the amount of enzyme causing transformation of 1.0 μmol of substrate to product per minute at 25 °C under optimal conditions of measurement (for many enzymes, this definition is inconvenient, and a unit may be defined differently). The term activity refers to the total units of enzyme in a solution. The specific activity is the number of enzyme units per milligram of total protein (Fig. 3-22). The specific activity is a measure of enzyme purity: it increases during purification of an enzyme and becomes maximal and constant when the enzyme is pure (Table 3-5).

FIGURE 3-22 Activity versus specific activity. The difference between these terms can be illustrated by considering two flasks containing marbles. The flasks contain the same number of red marbles, but different numbers of marbles of other colors.
If the marbles represent proteins, both flasks contain the same activity of the protein, represented by the red marbles. The second flask, however, has the higher specific activity, because red marbles represent a higher fraction of the total.

After each purification step, the activity of the preparation (in units of enzyme activity) is assayed, the total amount of protein is determined independently, and the ratio of the two gives the specific activity. Activity and total protein generally decrease with each step. Activity decreases because there is always some loss due to inactivation or nonideal interactions with chromatographic materials or other molecules in the solution. Total protein decreases because the objective is to remove as much unwanted or nonspecific protein as possible. In a successful step, the loss of nonspecific protein is much greater than the loss of activity; therefore, specific activity increases even as total activity falls. The data are assembled in a purification table similar to Table 3-5. A protein is generally considered pure when further purification steps fail to increase specific activity and when only a single protein species can be detected (for example, by electrophoresis in the presence of SDS).

For proteins that are not enzymes, other quantification methods are required. Transport proteins can be assayed by their binding to the molecule they transport, and hormones and toxins by the biological effect they produce; for example, growth hormones will stimulate the growth of certain cultured cells. Some structural proteins represent such a large fraction of a tissue mass that they can be readily extracted and purified without a functional assay. The approaches are as varied as the proteins themselves.

SUMMARY 3.3 Working with Proteins

Proteins are separated and purified on the basis of differences in their properties.
Proteins can be selectively precipitated by changes in pH or temperature, and particularly by the addition of certain salts. A wide range of chromatographic procedures makes use of differences in size, binding affinities, charge, and other properties. These include ion-exchange, size-exclusion, affinity, and high-performance liquid chromatography.

Electrophoresis separates proteins on the basis of mass or charge for analytical purposes. SDS gel electrophoresis and isoelectric focusing can be used separately or in combination for higher resolution.

All purification procedures require a method for quantifying or assaying the protein of interest in the presence of other proteins. Purification can be monitored by assaying specific activity.

3.4 The Structure of Proteins: Primary Structure

Purification of a protein is usually only a prelude to a detailed biochemical dissection of its structure and function. What is it that makes one protein an enzyme, another a hormone, another a structural protein, and still another an antibody? How do they differ chemically? The most obvious distinctions are structural, and to protein structure we now turn.

We can describe the structure of large molecules such as proteins at several levels of complexity, arranged in a kind of conceptual hierarchy. Four levels of protein structure are commonly defined (Fig. 3-23). A description of all covalent bonds (mainly peptide bonds and disulfide bonds) linking amino acid residues in a polypeptide chain is its primary structure. The most important element of primary structure is the sequence of amino acid residues. Secondary structure refers to particularly stable arrangements of amino acid residues giving rise to recurring structural patterns. Tertiary structure describes all aspects of the three-dimensional folding of a polypeptide. When a protein has two or more polypeptide subunits, their arrangement in space is referred to as quaternary structure.
Our exploration of proteins will eventually include complex protein machines consisting of dozens to thousands of subunits. Primary structure is the focus of the remainder of this chapter; we discuss the higher levels of structure in Chapter 4.

FIGURE 3-23 Levels of structure in proteins. The primary structure consists of a sequence of amino acids linked together by peptide bonds, and it includes any disulfide bonds. The resulting polypeptide can be arranged into units of secondary structure, such as an α helix. The helix is a part of the tertiary structure of the folded polypeptide, which is itself one of the subunits that make up the quaternary structure of the multisubunit protein, in this case hemoglobin. [Data from PDB ID 1HGA, R. Liddington et al., J. Mol. Biol. 228:551, 1992.]

Primary structure now becomes our focus. We first consider empirical clues that amino acid sequence and protein function are closely linked, then describe how amino acid sequence is determined; finally, we outline the many uses to which this information can be put.

The Function of a Protein Depends on Its Amino Acid Sequence

The bacterium Escherichia coli produces more than 3,000 different proteins; a human has ~20,000 genes that may produce over a million different proteins (through genetic processes discussed in Part III of this text). In both species, each type of protein has a unique amino acid sequence that confers a particular three-dimensional structure. This structure in turn confers a unique function.

Amino acid sequences are important elements of the broader realm of biological information. They are a major functional expression of information stored in DNA in the form of genes. The sequences are not at all random. Each protein has a distinctive number and sequence of amino acid residues.
As we shall see in Chapter 4, the primary structure of a protein determines how it folds up into its unique three-dimensional structure, and this in turn determines the function of the protein.

Some simple observations illustrate the functional importance of primary structure, or the amino acid sequence of a protein. First, as we have already noted, proteins with different functions always have different amino acid sequences. Second, thousands of human genetic diseases have been traced to the production of proteins with less activity or altered activity. The alteration can range from a single change in the amino acid sequence (as in sickle cell disease, described in Chapter 5) to deletion of a larger portion of the polypeptide chain (as in most cases of Duchenne muscular dystrophy: a large deletion in the gene encoding the protein dystrophin leads to production of a shortened, inactive protein). Finally, on comparing functionally similar proteins from different species, we find that these proteins often have similar amino acid sequences. Thus, a close link between protein primary structure and function is evident.

The amino acid sequence for a particular protein is not absolutely fixed, or invariant. Virtually all of the proteins in humans are polymorphic, having amino acid sequence variants in the human population. Many human proteins are polymorphic even within an individual, with amino acid variations occurring due to processes that will be described in Part III of this text. Some of these variations have little or no effect on the function of the protein; others may affect function dramatically. Furthermore, proteins that carry out a broadly similar function in distantly related species can differ greatly in overall size and amino acid sequence.
Although the amino acid sequence in some regions of the primary structure might vary considerably without affecting biological function, most proteins contain crucial regions that are essential to their function and thus have sequences that are conserved. The fraction of the overall sequence that is critical varies from protein to protein, complicating the task of relating sequence to three-dimensional structure, and structure to function. Before we can consider this problem further, however, we must examine how sequence information is obtained.

In 1953, Frederick Sanger worked out the sequence of amino acid residues in the polypeptide chains of the hormone insulin (Fig. 3-24), surprising many researchers who had long thought that determining the amino acid sequence of a polypeptide would be a hopelessly difficult task. The elucidation of DNA structure in that same year by Watson and Crick telegraphed a likely relationship between DNA and protein sequences. Barely a decade after these discoveries, the genetic code relating the nucleotide sequence of DNA to the amino acid sequence of protein molecules was elucidated (Chapter 27).

FIGURE 3-24 Amino acid sequence of bovine insulin. The two polypeptide chains are joined by disulfide cross-linkages (yellow). The A chain of insulin is identical in human, pig, dog, rabbit, and sperm whale insulins. The B chains of the cow, pig, dog, goat, and horse are identical.

The amino acid sequences of proteins are now most often derived indirectly from the DNA sequences in genome databases. However, an array of techniques derived from traditional methods of polypeptide sequencing made important contributions to the broader field of protein chemistry. The method used by Sanger to sequence insulin is based on the classical method for direct chemical sequencing of proteins from the amino terminus, the two-step Edman degradation developed by Pehr Edman.
Protein Structure Is Studied Using Methods That Exploit Protein Chemistry

The sequence of a protein can be predicted from the sequence of the gene encoding it, which is usually available in genomic databases. Direct sequencing can also be provided by mass spectrometry. Many methods used in traditional protein sequencing protocols remain valuable for labeling proteins or breaking them into parts for functional and structural analysis. For example, the amino-terminal α-amino group of a protein can be labeled with 1-fluoro-2,4-dinitrobenzene (FDNB), dansyl chloride, or dabsyl chloride (Fig. 3-25). These reagents also label the ε-amino group of lysine residues. Disulfide bonds within a polypeptide or between polypeptide subunits can be broken irreversibly (Fig. 3-26).

FIGURE 3-25 Modification of the α-amino group at the amino terminus. The reaction is a nucleophilic displacement of the halide ion as shown for (a) FDNB and (b) dansyl chloride. The ε-amino group of lysine will also be labeled. Dansyl chloride and (c) dabsyl chloride, another labeling reagent, have useful absorbance and/or fluorescent properties at visible wavelengths.

FIGURE 3-26 Breaking disulfide bonds in proteins. Two common methods are illustrated. Oxidation of a cystine residue with performic acid produces two cysteic acid residues. Reduction by dithiothreitol (or β-mercaptoethanol) to form Cys residues must be followed by further modification of the reactive -SH groups to prevent re-formation of the disulfide bond. Carboxymethylation by iodoacetate serves this purpose.

Frederick Sanger, 1918–2013

Enzymes called proteases catalyze the hydrolytic cleavage of peptide bonds and provide the most common method to break a protein into parts. Some proteases cleave only the peptide bond adjacent to particular amino acid residues (Table 3-6) and thus fragment a polypeptide chain in a predictabl