Chapter 3: Proteins
CHAPTER 3
Proteins

IN THIS CHAPTER
The Atomic Structure of Proteins
Protein Function

When we look at a cell through a microscope or analyze its electrical or biochemical activity, we are, in essence, observing proteins. Proteins constitute most of a cell’s dry mass. They are not only the cell’s building blocks; they also execute the majority of the cell’s functions. Proteins that are enzymes provide the intricate molecular surfaces inside a cell that catalyze its many chemical reactions. Proteins embedded in the plasma membrane form channels and pumps that control the passage of small molecules into and out of the cell. Other proteins carry messages from one cell to another or act as signal integrators that relay sets of signals inward from the plasma membrane to the cell nucleus. Yet others serve as tiny molecular machines with moving parts: kinesin, for example, propels organelles through the cytoplasm; topoisomerase can untangle knotted DNA molecules. Other specialized proteins act as antibodies, toxins, hormones, antifreeze molecules, elastic fibers, ropes, or sources of luminescence. Before we can hope to understand how genes work, how muscles contract, how nerves conduct electricity, how embryos develop, or how our bodies function, we must attain a deep understanding of proteins.

THE ATOMIC STRUCTURE OF PROTEINS

From a chemical point of view, proteins are by far the most structurally complex and functionally sophisticated molecules known. This is perhaps not surprising, once we realize that the structure and chemistry of each protein have been developed and fine-tuned over billions of years of evolutionary history. The theoretical calculations of population geneticists reveal that, over evolutionary time periods, a surprisingly small selective advantage is enough to cause a randomly altered protein sequence to spread through a population of organisms. Yet, even to experts, the remarkable versatility of proteins can seem truly amazing.
In this section, we consider how the location of each amino acid in a protein’s long string of amino acids determines its three-dimensional shape. Later in the chapter, we use this understanding of protein structure at the atomic level to describe how the precise shape of each protein molecule determines its function in a cell.

The Structure of a Protein Is Specified by Its Amino Acid Sequence

There are 20 different types of amino acids in proteins that are encoded directly in an organism’s DNA, each with different chemical properties. Every protein molecule consists of a long unbranched chain of these amino acids, each linked to its neighbor through a covalent peptide bond (Figure 3–1A). Proteins are therefore also known as polypeptides. Each type of protein has a unique sequence of amino acids, and there are many thousands of different proteins in a cell.

The repeating sequence of atoms along the core of the polypeptide chain is referred to as the polypeptide backbone. Attached to this repetitive backbone are those portions of the amino acids that are not involved in making a peptide bond; these are the 20 different amino acid side chains that give each amino acid its unique properties (Figure 3–1B). Some of these side chains are nonpolar and hydrophobic (“water-fearing”), others are negatively or positively charged, some can readily form covalent bonds, and so on. Panel 3–1 (pp. 118–119) shows their atomic structures, and Figure 3–2 lists their abbreviations.

Figure 3–1 The components of a protein. (A) Formation of a peptide bond. This covalent bond forms when the carbon atom of the carboxyl group of one amino acid (such as glycine) shares electrons with the nitrogen atom from the amino group of a second amino acid (such as alanine). As indicated, a molecule of water is eliminated in this condensation reaction (see Figure 2–9). In the molecular model of the resulting dipeptide, glycylalanine, carbon atoms are black, nitrogen blue, oxygen red, and hydrogen white. (B) A two-dimensional representation of a short section of polypeptide backbone with its attached side chains (here, those of histidine, aspartic acid, leucine, and tyrosine). Each type of protein differs in its sequence and number of amino acids; it is the sequence of the chemically different side chains that makes each protein distinct. The two ends of a polypeptide chain are chemically different: the end carrying the free amino group (NH2, which takes up a proton at neutral pH to become NH3+) is the amino terminus, or N-terminus, and the end carrying the free carboxyl group (COOH, which loses a proton at neutral pH to become COO–) is the carboxyl terminus, or C-terminus. Note that, for simplicity, in many figures in this textbook, NH2 and COOH are used to denote these termini, instead of their actual ionized forms. The amino acid sequence of a protein is always presented in the N-to-C direction, reading from left to right.
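The condensation reaction of Figure 3–1 and the N-to-C writing convention can be made concrete with a short Python sketch. It assumes, purely for this example, that a sequence is written as hyphen-separated three-letter codes with the N-terminus first; it converts that to one-letter codes using the abbreviations of Figure 3–2, and computes the molecular formula of the peptide by adding the formulas of the free amino acids and subtracting one water molecule per peptide bond. Like the figures, it counts atoms in the uncharged NH2/COOH forms. The helper names and the small formula table (standard formulas for five amino acids only) are illustrative, not part of any library.

```python
from collections import Counter

# Three-letter to one-letter codes, as listed in Figure 3-2.
ONE_LETTER = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Molecular formulas of a few free amino acids in their uncharged form.
# Only the residues used below are included; extend the table as needed.
FREE_AMINO_ACID = {
    "Gly": Counter(C=2, H=5, N=1, O=2),
    "Ala": Counter(C=3, H=7, N=1, O=2),
    "His": Counter(C=6, H=9, N=3, O=2),
    "Cys": Counter(C=3, H=7, N=1, O=2, S=1),
    "Val": Counter(C=5, H=11, N=1, O=2),
}
WATER = Counter(H=2, O=1)


def to_one_letter(sequence: str) -> str:
    """'His-Cys-Val' (N-terminus written first) -> 'HCV'."""
    return "".join(ONE_LETTER[code] for code in sequence.split("-"))


def peptide_formula(sequence: str) -> str:
    """Formula of the uncharged peptide: sum the amino acids, then
    remove one water molecule for each of the n - 1 peptide bonds."""
    residues = sequence.split("-")
    total = Counter()
    for code in residues:
        total += FREE_AMINO_ACID[code]
    for _ in range(len(residues) - 1):
        total -= WATER  # condensation: each peptide bond releases H2O
    # Hill order: C first, then H, then the remaining elements alphabetically.
    order = ["C", "H"] + sorted(e for e in total if e not in ("C", "H"))
    return "".join(e + (str(total[e]) if total[e] > 1 else "")
                   for e in order if total[e])


print(to_one_letter("His-Cys-Val"))    # HCV
print(peptide_formula("Gly-Ala"))      # C5H10N2O3 (glycylalanine)
print(peptide_formula("His-Cys-Val"))  # C14H23N5O4S
```

A chain of n residues has n - 1 peptide bonds, which is why the water subtraction runs one time fewer than the number of residues.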
POLAR AMINO ACIDS
Amino acid       Abbreviations   Side chain
Aspartic acid    Asp   D         acidic (negative charge)
Glutamic acid    Glu   E         acidic (negative charge)
Arginine         Arg   R         basic (positive charge)
Lysine           Lys   K         basic (positive charge)
Histidine        His   H         basic (positive charge)
Asparagine       Asn   N         uncharged polar
Glutamine        Gln   Q         uncharged polar
Serine           Ser   S         uncharged polar
Threonine        Thr   T         uncharged polar
Tyrosine         Tyr   Y         uncharged polar

NONPOLAR AMINO ACIDS
Amino acid       Abbreviations   Side chain
Alanine          Ala   A         nonpolar
Glycine          Gly   G         nonpolar
Valine           Val   V         nonpolar
Leucine          Leu   L         nonpolar
Isoleucine       Ile   I         nonpolar
Proline          Pro   P         nonpolar
Phenylalanine    Phe   F         nonpolar
Methionine       Met   M         nonpolar
Tryptophan       Trp   W         nonpolar
Cysteine         Cys   C         nonpolar

Figure 3–2 The 20 amino acids commonly found in proteins. Each amino acid has a three-letter and a one-letter abbreviation. There are equal numbers of polar and nonpolar side chains; however, some side chains listed here as polar are large enough to have some nonpolar properties (for example, Thr, Tyr, Arg, Lys). For atomic structures, see Panel 3–1 (pp. 118–119).

As discussed in Chapter 2, atoms behave almost as if they were hard spheres with a definite radius (their van der Waals radius). Other constraints limit the possible bond angles in a polypeptide chain, and this, plus the requirement that no two atoms overlap, severely restricts the possible three-dimensional arrangements (or conformations) of proteins. As illustrated in Figure 3–3, these steric restrictions (which include a delocalization of electrons in the peptide bond that makes that linkage planar) confine the energy minima for the bond angles in polypeptides to a narrow range. But a long flexible chain such as a protein can still fold in an enormous number of different ways. The folding of a protein chain is determined by many different sets of weak noncovalent bonds that form between one part of the chain and another. These involve atoms in the polypeptide backbone, as well as atoms in the amino acid side chains. There are three types of these weak bonds: hydrogen bonds, electrostatic attractions, and van der Waals attractions, as explained in Chapter 2 (see p. 51). Individual noncovalent bonds are 30–300 times weaker than the typical covalent bonds that create biological molecules. But many weak bonds acting in parallel can hold two regions of a polypeptide chain tightly together.

Figure 3–3 Steric limitations on the bond angles in a polypeptide chain. (A) Each amino acid contributes three bonds (red) to the backbone of the chain. Because it has a partial double-bond character, the peptide bond is planar (gray shading) and does not permit free rotation. By contrast, rotation can occur about the Cα–C bond, whose angle of rotation is called psi (Ψ), and about the N–Cα bond, whose angle of rotation is called phi (ϕ). By convention, an R group is often used to denote an amino acid side chain (purple circles). (B) The conformation of the main-chain atoms in a protein is determined by one pair of ϕ and Ψ angles for each amino acid; because of steric restrictions, most of the possible pairs of ϕ and Ψ angles do not occur. In this so-called Ramachandran plot, each dot represents an observed pair of angles in a protein; both the phi and psi axes run from –180° to +180°. The three differently shaded clusters of dots reflect three different secondary structures repeatedly found in proteins: the beta sheet, the right-handed alpha helix, and the left-handed helix. Most prominent are the alpha helix and the beta sheet, as will be described in the text. (B, from J. Richardson, Adv. Prot. Chem. 34:174–175, 1981. With permission from Elsevier.)
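The Ramachandran plot of Figure 3–3B treats each residue as a point (ϕ, Ψ) on a grid whose edges wrap around at ±180°. The Python sketch below illustrates that geometry by assigning an angle pair to the nearest of three reference points, one per cluster. The reference angles are assumptions, not values from this chapter: commonly quoted idealized values of about (-57°, -47°) for the right-handed α helix, (-139°, +135°) for an antiparallel β strand, and (+57°, +47°) for the left-handed helix. It is a toy nearest-point classifier, not a real secondary-structure assignment method, which would also use hydrogen-bonding patterns.

```python
from __future__ import annotations

import math

# Idealized (phi, psi) reference points in degrees. These are commonly quoted
# textbook values, assumed here for illustration; they are not from this chapter.
REFERENCE_REGIONS = {
    "right-handed alpha helix": (-57.0, -47.0),
    "beta strand": (-139.0, 135.0),
    "left-handed helix": (57.0, 47.0),
}


def angular_difference(a: float, b: float) -> float:
    """Smallest separation between two angles, allowing for the +/-180 wrap."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_region(phi: float, psi: float) -> str:
    """Name of the reference region closest to (phi, psi)."""
    def distance(center: tuple[float, float]) -> float:
        c_phi, c_psi = center
        return math.hypot(angular_difference(phi, c_phi),
                          angular_difference(psi, c_psi))

    return min(REFERENCE_REGIONS, key=lambda name: distance(REFERENCE_REGIONS[name]))


print(nearest_region(-60, -45))    # right-handed alpha helix
print(nearest_region(-120, 130))   # beta strand
# Angles wrap around: (170, -170) lies next to the beta region across the plot edges.
print(nearest_region(170, -170))   # beta strand
```

The wrap-around distance matters because ϕ = +170° and ϕ = -170° are only 20° apart, even though they sit at opposite edges of the plot.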
It is the combined strength of large numbers of noncovalent bonds that stabilizes each protein’s folded shape (Figure 3–4).

A fourth weak force, a hydrophobic clustering force, also has a central role in determining the shape of a protein. As described in Chapter 2, hydrophobic molecules, including the nonpolar side chains of particular amino acids, tend to be forced together in an aqueous environment in order to minimize their disruptive effect on the hydrogen-bonded network of water molecules (see Panel 2–2, pp. 96–97). Therefore, an important factor governing the folding of any protein is the distribution of its polar and nonpolar amino acids. The nonpolar (hydrophobic) side chains in a protein, belonging to such amino acids as phenylalanine, leucine, valine, and tryptophan, tend to cluster in the interior of the molecule (just as hydrophobic oil droplets coalesce in water to form one large droplet). This enables these side chains to avoid contact with the water that surrounds them inside a cell. In contrast, polar groups, such as those belonging to arginine, glutamine, and histidine, tend to arrange themselves near the outside of the molecule, where they can form hydrogen bonds with water and with other polar molecules (Figure 3–5). Any polar amino acids that are left buried within the protein are usually hydrogen-bonded to other polar amino acids or to the polypeptide backbone.

PANEL 3–1: The 20 Amino Acids Found in Proteins

FAMILIES OF AMINO ACIDS
The common amino acids are grouped according to whether their side chains are acidic, basic, uncharged polar, or nonpolar. These 20 amino acids are given both three-letter and one-letter abbreviations. Thus: alanine = Ala = A.

BASIC SIDE CHAINS
lysine (Lys, or K)
arginine (Arg, or R): this group is very basic because its positive charge is stabilized by resonance (see Panel 2–1).
histidine (His, or H): these nitrogens have a relatively weak affinity for an H+ and are only partly positive at neutral pH.

THE AMINO ACID
The general formula of an amino acid is H2N–CH(R)–COOH: an amino group and a carboxyl group attached to the α-carbon atom, which also carries a hydrogen atom and a side chain, R. R is commonly one of 20 different side chains. At pH 7, both the amino and carboxyl groups are ionized: +H3N–CH(R)–COO–.

OPTICAL ISOMERS
The α-carbon atom is asymmetric, allowing for two mirror-image (or stereo-) isomers, L and D. Proteins contain exclusively L-amino acids.

PEPTIDE BONDS
In proteins, amino acids are joined together by an amide linkage, called a peptide bond. The four atoms involved in each peptide bond form a rigid planar unit (red box). There is no rotation around the C–N bond. By contrast, the two single bonds on either side of each α-carbon allow rapid rotation, so that long chains of amino acids are very flexible.
Proteins are long polymers of amino acids linked by peptide bonds, and they are always written with the N-terminus toward the left. Peptides are shorter, usually fewer than 50 amino acids long. The tripeptide drawn in the panel, read from its amino terminus (N-terminus) to its carboxyl terminus (C-terminus), has the sequence histidine–cysteine–valine.
ACIDIC SIDE CHAINS
aspartic acid (Asp, or D)
glutamic acid (Glu, or E)

NONPOLAR SIDE CHAINS
alanine (Ala, or A)
valine (Val, or V)
leucine (Leu, or L)
isoleucine (Ile, or I)
proline (Pro, or P): actually an imino acid
phenylalanine (Phe, or F)
methionine (Met, or M)
tryptophan (Trp, or W)
glycine (Gly, or G)
cysteine (Cys, or C): a disulfide bond (–CH2–S–S–CH2–, red) can form between two cysteine side chains in proteins.

UNCHARGED POLAR SIDE CHAINS
asparagine (Asn, or N) and glutamine (Gln, or Q): although the amide N is not charged at neutral pH, it is polar.
serine (Ser, or S), threonine (Thr, or T), and tyrosine (Tyr, or Y): the –OH group is polar.

Figure 3–4 Three types of noncovalent bonds help proteins fold: electrostatic attractions, hydrogen bonds, and van der Waals attractions, shown here between side chains that include those of glutamic acid, lysine, valine, and alanine. Although a single one of these bonds is quite weak, many of them often act together to create a strong bonding arrangement, as in the example shown. As in the previous figure, R is used as a general designation for an amino acid side chain.

Figure 3–5 How a protein folds into a compact conformation.
The polar amino acid side chains tend to lie on the outside of the protein, where they can interact with water; the nonpolar amino acid side chains are buried on the inside, forming a tightly packed hydrophobic core of atoms that are hidden from water. In this highly schematic drawing, the protein contains only 17 amino acids; actual proteins are generally much larger.

Proteins Fold into a Conformation of Lowest Energy

As a result of all of these interactions, most proteins have a particular three-dimensional structure, which is determined by the order of the amino acids in a protein’s chain. The final folded structure, or conformation, of any polypeptide chain is generally the one that minimizes its free energy. Biologists have studied protein folding in a test tube using highly purified proteins. Treatment with certain solvents, which disrupt the noncovalent interactions holding the folded chain together, unfolds, or denatures, a protein. This treatment converts the protein into a flexible polypeptide chain that has lost its natural shape. When the denaturing solvent is removed, the protein often refolds spontaneously, or renatures, into its original conformation. This indicates that the amino acid sequence contains all of the information needed for specifying the three-dimensional shape of a protein, a critical point for understanding cell biology.

Most proteins fold up into a single stable conformation. However, this conformation is very dynamic, experiencing constant fluctuations caused by thermal energy. In addition, a protein’s conformation can change when the protein interacts with other molecules in the cell. This change in shape is often crucial to the function of the protein, as we explain in detail later.

Although a protein chain can fold into its correct conformation without outside help, special proteins called molecular chaperones often assist in protein folding (see Chapter 6).
Molecular chaperones bind to partly folded polypeptide chains and help them progress along the most energetically favorable folding pathway. In the crowded conditions of the cytoplasm, chaperones are required to prevent the temporarily exposed hydrophobic regions in newly synthesized protein chains from associating with each other to form protein aggregates. However, the final three-dimensional shape of the protein is still specified by its amino acid sequence: chaperones simply make reaching the folded state more reliable.

The α Helix and the β Sheet Are Common Folding Motifs

When we compare the three-dimensional structures of many different protein molecules, it becomes clear that, although the overall conformation of each protein is unique, two regular folding patterns are often found within them. Both patterns were discovered about 70 years ago from studies of hair and silk. The first folding pattern to be described, called the α helix, was found in the protein α-keratin, which forms the filaments in hair. Within a year of the discovery of the α helix, a second folded structure, called a β sheet, was found in the protein fibroin, the major constituent of silk. These two patterns are common because they result from hydrogen-bonding between the N–H and C=O groups in the polypeptide backbone, without involving the side chains of the amino acids. Thus, although some amino acid side chains are incompatible with them, many different amino acid sequences can form them. In each case, the protein chain adopts a regular, repeating conformation. Figure 3–6 illustrates the detailed structures of these two important conformations, which in ribbon models of proteins are represented by a helical ribbon and by a set of aligned arrows, respectively.

The cores of many proteins contain extensive regions of β sheet.
As shown in Figure 3–7, these β sheets can form either from neighboring segments of the polypeptide backbone that run in the same orientation (parallel chains) or from a polypeptide backbone that folds back and forth upon itself, with each section of the chain running in the direction opposite to that of its immediate neighbors (antiparallel chains). Both types of β sheet produce a very rigid structure, held together by hydrogen bonds that connect the peptide bonds in neighboring chains (see Figure 3–6C).

Figure 3–6 The regular conformation of the polypeptide backbone in the α helix and the β sheet. The α helix (alpha helix) is shown in (A) and (B). The N–H of every peptide bond is hydrogen-bonded to the C=O of a neighboring peptide bond located four peptide bonds away in the same chain. Note that all of the N–H groups point up in this diagram and that all of the C=O groups point down (toward the C-terminus); this gives a polarity to the helix, with the C-terminus having a partial negative and the N-terminus a partial positive charge (Movie 3.1). The β sheet (beta sheet) is shown in (C) and (D). In this example, adjacent peptide chains run in opposite (antiparallel) directions. Hydrogen-bonding between peptide bonds in different strands holds the individual polypeptide chains (strands) together in a β sheet, and the amino acid side chains in each strand alternately project above and below the plane of the sheet. By convention, when arrows are used to represent a β sheet, the arrowheads point toward the C-terminus (Movie 3.2). The drawings carry scale markings of 0.54 nm for the α helix and 0.7 nm for the β sheet. (A) and (C) show all the atoms in the polypeptide backbone, but the amino acid side chains are truncated and denoted by R.
(It has long been a convention to use R in this way.) In contrast, (B) and (D) show only the carbon and nitrogen backbone atoms.

Figure 3–7 Two types of β sheet structures. (A) An antiparallel β sheet (see Figure 3–6C). (B) A parallel β sheet. Both of these structures are common in proteins.

An α helix is generated when a single polypeptide chain twists around on itself to form a rigid cylinder. A hydrogen bond forms between every fourth peptide bond, linking the C=O of one peptide bond to the N–H of another (see Figure 3–6A). This gives rise to a regular helix with a complete turn every 3.6 amino acids.

Regions of α helix are abundant in proteins located in cell membranes, such as transport proteins and receptors. As we discuss in Chapter 10, those portions of a transmembrane protein that cross the lipid bilayer usually cross as α helices composed largely of amino acids with nonpolar side chains. The polypeptide backbone, which is hydrophilic, is hydrogen-bonded to itself in the α helix and shielded from the hydrophobic lipid environment of the membrane by its protruding nonpolar side chains (see Figure 10–19).

In other proteins, α helices can wrap around each other to form a particularly stable structure, known as a coiled-coil. This structure can form when the two (or in some cases, three or four) α helices have most of their nonpolar (hydrophobic) side chains on one side, so that they can twist around each other with these side chains facing inward (Figure 3–8). Long rodlike coiled-coils provide the structural framework for many elongated proteins. Examples are α-keratin, which forms the intracellular fibers that reinforce the outer layer of the skin and its appendages, and the myosin molecules responsible for muscle contraction.
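The helix geometry described above reduces to simple arithmetic. With 3.6 amino acids per turn, each residue advances 360°/3.6 = 100° around the helix axis; taking 0.54 nm as the rise per full turn (the value marked on the α helix in Figure 3–6, and the standard helical pitch), each residue also rises 0.54/3.6 = 0.15 nm along the axis. Seven residues therefore turn through 700°, 20° short of two full turns, which is why the “a” and “d” positions of a sevenfold (heptad) repeat line up on one face of the helix as a stripe that winds slowly around it (Figure 3–8). The Python sketch below assumes these idealized values and the polar/nonpolar grouping of Figure 3–2; the function names and the example sequence are hypothetical.

```python
from __future__ import annotations

# Idealized alpha-helix parameters (assumed standard values).
RESIDUES_PER_TURN = 3.6
PITCH_NM = 0.54  # rise along the axis for one full turn

DEGREES_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN     # about 100 degrees
RISE_PER_RESIDUE_NM = PITCH_NM / RESIDUES_PER_TURN  # about 0.15 nm

# Nonpolar amino acids (one-letter codes), as grouped in Figure 3-2.
NONPOLAR = set("AGVLIPFMWC")


def helical_wheel_angle(i: int) -> float:
    """Angle in degrees (0 to 360) of residue i around the helix axis."""
    return (i * DEGREES_PER_RESIDUE) % 360.0


def helix_hbond_pairs(n: int) -> list[tuple[int, int]]:
    """(i, i + 4) pairs: C=O of residue i bonds to N-H of residue i + 4."""
    return [(i, i + 4) for i in range(n - 4)]


def heptad_positions(sequence: str) -> str:
    """Label residues a-g in a repeating heptad, starting at 'a'."""
    return "".join("abcdefg"[i % 7] for i in range(len(sequence)))


def has_hydrophobic_stripe(sequence: str) -> bool:
    """True if every residue at heptad positions 'a' and 'd' is nonpolar."""
    labels = heptad_positions(sequence)
    return all(res in NONPOLAR
               for res, pos in zip(sequence, labels) if pos in "ad")


example = "LKALEEK" * 3  # hypothetical sequence: three copies of one heptad
print(round(DEGREES_PER_RESIDUE, 6), round(RISE_PER_RESIDUE_NM, 6))  # 100.0 0.15
print(round(7 * DEGREES_PER_RESIDUE - 720.0, 6))  # -20.0: drift per heptad
print([round(helical_wheel_angle(i), 6) for i in range(5)])  # [0.0, 100.0, 200.0, 300.0, 40.0]
print(helix_hbond_pairs(6))             # [(0, 4), (1, 5)]
print(heptad_positions(example)[:7])    # abcdefg
print(has_hydrophobic_stripe(example))  # True
```

Positions “a” and “d” are three residues apart (300°, or 60° back around the helix), and “d” to the next “a” is four residues (400°, or 40° forward), so the two positions stay within a narrow angular band, which is the stripe in Figure 3–8A.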
Figure 3–8 A coiled-coil. (A) A single α helix, with successive amino acid side chains labeled in a sevenfold sequence, “abcdefg” (from top to bottom). Amino acids “a” and “d” in such a sequence lie close together on the cylinder surface, forming a “stripe” (green) that winds slowly around the α helix. Proteins that form coiled-coils typically have nonpolar amino acids at positions “a” and “d.” Consequently, as shown in (B), the two α helices can wrap around each other with the nonpolar side chains of one α helix interacting with the nonpolar side chains of the other, which minimizes the exposure of these hydrophobic side chains to the aqueous environment. (C) The atomic structure of a coiled-coil determined by x-ray crystallography. The α-helical backbone is shown in red and the nonpolar side chains in green, while the more hydrophilic amino acid side chains, shown in gray, are left exposed to the aqueous environment (Movie 3.3). Coiled-coils can also form from three α helices. (PDB code: 3NMD.)

Four Levels of Organization Are Considered to Contribute to Protein Structure

Scientists have found it useful to define four levels of organization that successively generate the structure of a protein. The first level is the protein’s amino acid sequence, which is known as its primary structure; this sequence is unique for each protein, as determined by the gene that encodes that protein. At the next level, those stretches of the polypeptide chain that form α helices and β sheets constitute the protein’s secondary structure. The full three-dimensional organization of a polypeptide chain, including its α helices, β sheets, and the many twists and turns that form between its N- and C-termini, is referred to as the protein’s tertiary structure.
And finally, if a protein molecule is formed as a complex of more than one polypeptide chain, its complete conformation is designated as its quaternary structure.

Because even a small protein molecule is built from thousands of atoms linked together by precisely oriented covalent and noncovalent bonds, biologists are aided in visualizing these extremely complicated structures by computer-based three-dimensional displays. The student resource site that accompanies this book contains computer-generated images of selected proteins, which can be displayed and rotated on the screen in a variety of formats (Movie 3.4).

Figure 3–9 Four representations that are commonly used to describe the structure of a protein. Constructed from a string of 100 amino acids, the SH2 domain is part of many different proteins. Here, its structure is displayed as (A) a polypeptide backbone model, (B) a ribbon model, (C) a wire model that includes the amino acid side chains, and (D) a space-filling model (Movie 3.4). Each image is colored in a way that allows the polypeptide chain to be followed from its N-terminus (purple) to its C-terminus (red). (PDB code: 1SHA.)

Protein Domains Are the Modular Units from Which Larger Proteins Are Built

Proteins come in a wide variety of shapes, and most are between 50 and 2000 amino acids long. Large proteins usually consist of a set of smaller protein domains that are joined together. A domain is a structural unit that folds more or less independently, being formed from perhaps 40 to 350 contiguous amino acids, and it is a modular unit from which larger proteins are constructed.

To display a protein structure in three dimensions, several different representations are conventionally used, each of which emphasizes distinct features. As an example, Figure 3–9 presents four representations of an important protein structure called the SH2 domain.
The SH2 domain is present in many different proteins in eukaryotic cells, where it responds to cell signals to cause selected protein molecules to bind to each other, thereby altering cell behavior (see Chapter 15). Contributing to the tertiary structure of this domain are two α helices and a three-stranded, antiparallel β sheet, which are its critical secondary structure elements (see Figure 3–9B).

Figure 3–10 presents ribbon models of three differently organized protein domains. As these examples illustrate, the central core of a domain can be constructed from α helices, from β sheets, or from various combinations of these two fundamental folding elements.

Figure 3–10 Ribbon models of three different protein domains. (A) Cytochrome b562, a single-domain protein involved in electron transport in mitochondria. This protein is composed almost entirely of α helices. (B) The NAD-binding domain of the enzyme lactate dehydrogenase, which is composed of a mixture of α helices and parallel β sheets. (C) The variable domain of an immunoglobulin (antibody) light chain, composed of a sandwich of two antiparallel β sheets. In these examples, the α helices are shown in green, while strands organized as β sheets are denoted by red arrows. Note how the polypeptide chain generally traverses back and forth across the entire domain, making sharp turns only at the protein surface (Movie 3.5). It is the protruding loop regions (yellow) that often form the binding sites for other molecules.

The different domains of a protein are often associated with different functions. Figure 3–11 shows an example, the Src protein kinase, which functions in signaling pathways inside vertebrate cells (Src is pronounced “sarc”).
This protein is considered to have three domains: its SH2 and SH3 domains have regulatory roles, responding to signals that turn the kinase on and off, while its C-terminal domain is responsible for the kinase catalytic activity. Later in the chapter, we shall return to this protein to explain how proteins can form molecular switches that transmit information throughout cells.

Figure 3–11 A protein formed from multiple domains. In the Src protein shown, a C-terminal domain with two lobes (yellow and orange) forms the core protein kinase enzyme, while its SH2 and SH3 domains perform regulatory functions. Note that both the SH2 and SH3 domains derive their names from this protein, being abbreviations for “Src homology 2” and “Src homology 3,” respectively. (A) A ribbon model, with ATP substrate in red. (B) A space-filling model, with ATP substrate in red. Note that the site that binds ATP is positioned at the interface of the two lobes that form the kinase domain. The human genome encodes about 300 different SH3 domains and 120 SH2 domains. The structure of the SH2 domain was illustrated in Figure 3–9. (PDB code: 2SRC.)

Figure 3–12 A folded protein molecule exists as an ensemble of closely related substructures, or conformers, as displayed here for ubiquitin. (A) A ribbon model that displays the structure of ubiquitin. Ubiquitin is a small protein widely used in cells, often being covalently attached to larger proteins, as described in Chapters 6 and 15. (B) In this diagram, a set of backbone conformations determined for ubiquitin has been overlaid to reveal regions that rapidly transition between different substructures. Superimposed on these structures are the rates of motion of the protein’s atoms, as observed in NMR residual dipolar coupling experiments.
A color code has been used to indicate the magnitude of these rates, which are largest for red, with orange and yellow also being high. (A, PDB code 1UBI; B, from O.F. Lange et al., Science 320:1471–1475, 2008. With permission from AAAS.)

Proteins Also Contain Unstructured Regions

The smallest protein molecules contain only a single domain, whereas larger proteins can contain several dozen domains, often connected to each other by short, relatively unstructured lengths of polypeptide chain that can act as flexible hinges between domains. The ubiquity of such intrinsically disordered sequences, which continually bend and flex due to thermal buffeting, became appreciated only after bioinformatics methods were developed that could recognize them from their amino acid sequences. Current estimates suggest that a third of all eukaryotic proteins also possess longer intrinsically disordered regions (IDRs), more than 30 amino acids in length, in their polypeptide chains. These intrinsically disordered regions can be very long, and they have important functions in cells, as discussed later in this chapter.

All Protein Structures Are Dynamic, Interconverting Rapidly Between an Ensemble of Closely Related Conformations Because of Thermal Energy

Even though a protein has folded into a conformation of lowest free energy, this conformation is always being subjected to thermal bombardment from the Brownian motions of the many molecules that constantly collide with it. Thus the atoms in the protein are always moving, which causes neighboring regions of the protein to oscillate in concerted ways. These motions can now be precisely traced using special NMR techniques, as illustrated in Figure 3–12 for the small protein ubiquitin. From recent studies combining many types of analyses, we know that protein function exploits these rapid fluctuations, as when a loop on the surface of a protein flips out to expose a binding site for a second molecule.
In fact, the function of a protein is generally dependent on that protein’s dynamic character, as we explain later when we discuss protein function in detail.

Function Has Selected for a Tiny Fraction of the Many Possible Polypeptide Chains

Because each of the 20 amino acids is chemically distinct and each can, in principle, occur at any position in a protein chain, there are 20 × 20 × 20 × 20 = 160,000 different possible polypeptide chains four amino acids long, or $20^{n}$ different possible polypeptide chains $n$ amino acids long. For a typical protein length of about 300 amino acids, a cell could theoretically make more than $10^{390}$ ($20^{300}$) different polypeptide chains. This is such an enormous number that to produce just one molecule of each kind would require many more atoms than exist in the universe.

Only a very small fraction of this vast set of conceivable polypeptide chains would adopt a stable three-dimensional conformation—by some estimates, less than one in a billion. And yet the majority of proteins present in cells do adopt unique and stable conformations. How is this possible? The answer lies in natural selection. A protein with an unpredictably variable structure and biochemical activity is unlikely to help the survival of a cell that contains it. Such proteins would therefore have been eliminated by natural selection through the enormously long trial-and-error process that underlies biological evolution.

Because evolution has selected for protein function in living organisms, present-day proteins have chemical properties that enable the protein to perform a particular catalytic or structural function in the cell. Proteins are so precisely built that the change of even a few atoms in one amino acid can sometimes disrupt the structure of the whole molecule so severely that all function is lost.
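The counting argument in this section can be checked with exact integer arithmetic, which Python provides natively. This is a minimal sketch; the figure of roughly $10^{80}$ atoms in the observable universe is a commonly quoted order-of-magnitude estimate that the text does not give, and it is marked as an assumption in the code.

```python
import math

N_AMINO_ACIDS = 20
ATOMS_IN_UNIVERSE = 10 ** 80  # assumption: commonly quoted rough estimate, not from the text

# Chains four residues long: 20 x 20 x 20 x 20.
tetrapeptides = N_AMINO_ACIDS ** 4
print(tetrapeptides)  # 160000

# Chains of a typical length of 300 residues: 20**300, held as an exact integer.
chains_300 = N_AMINO_ACIDS ** 300
print(round(300 * math.log10(N_AMINO_ACIDS), 2))  # 390.31, so 20**300 is about 2 x 10**390
print(len(str(chains_300)))  # 391 decimal digits
print(chains_300 > 10 ** 390, chains_300 > ATOMS_IN_UNIVERSE)  # True True

# Keeping only one chain in a billion still leaves a 382-digit number.
print(len(str(chains_300 // 10 ** 9)))  # 382
```

Because Python integers have arbitrary precision, the 391-digit value of $20^{300}$ is computed exactly rather than approximated.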
As discussed later in this chapter, when certain rare protein misfolding accidents occur, the results can be disastrous for the organisms that contain them.

Proteins Can Be Classified into Many Families

Once a protein had evolved that folded up into a stable conformation with useful properties, its structure was often modified during evolution to enable it to perform new functions. As we will discuss in Chapter 4, this process has been greatly accelerated by genetic mechanisms that duplicate genes accidentally, which allows gene copies to evolve independently to perform new functions. Because this type of event occurred frequently in the past, present-day proteins can be grouped into protein families, each family member having an amino acid sequence and a three-dimensional conformation that resemble those of the other family members.

Consider, for example, the serine proteases, a large family of protein-cleaving (proteolytic) enzymes that includes the digestive enzymes chymotrypsin, trypsin, and elastase, as well as several proteases involved in blood clotting. When the protease portions of any two of these enzymes are compared, parts of their amino acid sequences are found to match. The similarity of their three-dimensional conformations is even more striking: most of the detailed twists and turns in their polypeptide chains, which are several hundred amino acids long, are virtually identical (Figure 3–13). The many different serine proteases nevertheless have distinct enzymatic activities, each cleaving different proteins or the peptide bonds between different types of amino acids. Each therefore performs a distinct function in an organism.

The story we have told for the serine proteases could be repeated for hundreds of other protein families. In general, the structure of the different members of a protein family has been more highly conserved than has the amino acid sequence.
In many cases, the amino acid sequences have diverged so far that we cannot be certain of a family relationship between two proteins without determining their three-dimensional structures. The yeast α2 protein and the Drosophila engrailed protein, for example, are both transcription regulatory proteins in the homeodomain family (discussed in Chapter 7). Because they are identical in only 17 of the 60 amino acids of their homeodomain, their relationship became certain only by comparing their three-dimensional structures (Figure 3–14). Many similar examples show that two proteins with more than 25% identity in their amino acid sequences usually share the same overall structure.

Figure 3–13 A comparison of the conformations of two serine proteases. The backbone conformations of elastase and chymotrypsin are shown. Although only those amino acids in the polypeptide chain shaded in green are the same in the two proteins, the two conformations are very similar nearly everywhere. The active site of each enzyme is circled in red; this is where the peptide bonds of the proteins that serve as substrates are bound and cleaved by hydrolysis. The serine proteases derive their name from the amino acid serine, whose side chain is part of the active site of each enzyme and directly participates in the cleavage reaction. The two dots on the right side of the chymotrypsin molecule mark the new ends created when this enzyme cuts its own backbone.

The various members of a large protein family often have distinct functions. Mutation is a random process. Some of the amino acid changes that make family members different were selected in the course of evolution because they resulted in useful changes in biological activity; these give the individual family members the different functional properties they have today. Other amino acid changes were effectively “neutral,” having neither a beneficial nor a damaging effect on the basic structure and function of the protein. In addition, because mutation is random, there must also have been many deleterious changes that altered the three-dimensional structure of these proteins sufficiently to make them useless. Such faulty proteins would have been readily lost during evolution.

Figure 3–14 A comparison of a class of DNA-binding domains, called homeodomains, in a pair of proteins from two organisms separated by more than a billion years of evolution. (A) A ribbon model of the structure common to both proteins. (B) A trace of the α-carbon positions. The three-dimensional structures shown were determined by x-ray crystallography for the yeast α2 protein (green) and the Drosophila engrailed protein (red). (C) A comparison of amino acid sequences for the region of the proteins shown in A and B. Black dots mark sites with identical amino acids. Green shading has been used to mark the three α helices shown in A. Orange dots indicate the position of a three-amino-acid insert in the α2 protein. (Adapted from C. Wolberger et al., Cell 67:517–528, 1991.)

Protein families are readily recognized when the genome of any organism is sequenced; for example, the determination of the DNA sequence for the entire human genome has revealed that we contain about 20,000 protein-coding genes. Through sequence comparisons, we can assign the products of more than half of our protein-coding genes to known protein structures belonging to more than 500 different protein families.
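Sequence comparisons like those mentioned here rest on measures such as percent identity between two aligned sequences. A minimal sketch follows. It assumes the sequences are already aligned (producing the alignment is a separate problem not covered here) and skips gap columns, which is only one of several conventions in use. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two sequences that have already been aligned.

    Columns in which either sequence has a gap are skipped; this is one of
    several conventions in use, and others give somewhat different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # an insertion in one sequence has no partner to compare
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment (not real protein sequences): 6 of 9 columns match.
print(round(percent_identity("GHRFTK-ENV", "GQRFSKAENL"), 1))  # 66.7

# The text's homeodomain comparison: 17 identical positions out of 60.
print(round(100 * 17 / 60, 1))  # 28.3
```

By this measure, the 17-of-60 identity quoted for the α2 and engrailed homeodomains works out to about 28%, close to the 25% level above which, as the text notes, related proteins usually share the same overall structure.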
Most of the proteins in each family have evolved to perform somewhat different functions, as for the enzymes elastase and chymotrypsin illustrated previously in Figure 3–13. These family members are sometimes called paralogs to distinguish them from orthologs—those evolutionarily related proteins that have the same function in different organisms (such as the mouse elastase and human elastase enzymes).

The current database of known protein sequences contains more than 100 million entries, and it is growing very rapidly as more and more genomes are sequenced—revealing huge numbers of new genes that encode proteins. The encoded polypeptides range widely in size, from 6 amino acids to a gigantic protein of 34,000 amino acids (titin, a structural protein in muscle). As described in Chapters 8 and 9, because of the powerful techniques of x-ray crystallography, nuclear magnetic resonance (NMR), and cryo-electron microscopy, we now know the three-dimensional shapes, or conformations, of more than 100,000 of these proteins. By carefully comparing the conformations of these proteins, structural biologists (that is, experts on the structure of biological molecules) have concluded that there are a limited number of ways in which protein domains usually fold up in nature—estimated to be about 2000, if we consider all organisms. For most of these so-called protein folds, representative structures have been determined.

Protein comparisons are important because related structures often imply related functions. Many years of experimentation can be saved by discovering that a new protein has an amino acid sequence similarity with a protein of known function. Such sequence relationships, for example, first indicated that certain genes that cause mammalian cells to become cancerous encode protein kinases (discussed in Chapter 20).
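The size range quoted here can be turned into approximate masses. The sketch assumes the common rule of thumb of about 110 daltons per amino acid residue, which ignores sequence composition and any modifications, so the numbers are rough estimates rather than measured masses.

```python
AVERAGE_RESIDUE_MASS_DA = 110  # assumption: rule-of-thumb mass of one residue in a chain


def approx_mass_kda(n_residues: int) -> float:
    """Rough polypeptide mass in kilodaltons, estimated from length alone."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000


for label, length in [("smallest encoded polypeptide", 6),
                      ("typical protein", 300),
                      ("titin", 34_000)]:
    print(f"{label}: {length:,} residues, about {approx_mass_kda(length):,.1f} kDa")
```

On this estimate titin comes out at a few megadaltons, roughly a hundred times the mass of a typical 300-residue protein.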
Some Protein Domains Are Found in Many Different Proteins

As previously stated, most proteins are composed of a series of protein domains in which different regions of the polypeptide chain fold independently to form compact structures. Such multidomain proteins are believed to have originated from the accidental joining of the DNA sequences that encode each domain, creating a new gene. In an evolutionary process called domain shuffling, many large proteins have evolved through the joining of preexisting domains in new combinations (Figure 3–15). Novel binding surfaces have often been created at the juxtaposition of domains, and many of the functional sites where proteins bind to small molecules are found to be located there.

A subset of protein domains has been especially mobile during evolution; these seem to have particularly versatile structures and are sometimes referred to as protein modules. The structure of one such module, the SH2 domain, was featured in Figure 3–9. Three other abundant protein domains are illustrated in Figure 3–16.

Each of these three domains has a stable core structure formed from strands of β sheets, from which less-ordered loops of polypeptide chain protrude. The loops are ideally situated to form binding sites for other molecules, as most clearly demonstrated for the immunoglobulin fold, which forms the basis for antibody molecules. Such β sheet–based domains may have achieved their evolutionary success because they provide a convenient framework for the generation of new binding sites for ligands, requiring only small changes to their protruding loops (see Figure 3–40).

A second feature of these protein domains that explains their utility is the ease with which they can be integrated into other proteins. Two of the three domains illustrated in Figure 3–16 have their N- and C-terminal ends at opposite poles of the domain. When the DNA encoding such a domain undergoes tandem duplication, which is not unusual in the evolution of genomes (discussed in Chapter 4), the duplicated domains with this in-line arrangement can be readily linked in series to form extended structures—either with themselves or with other in-line domains (Figure 3–17). Stiff extended structures composed of a series of domains are especially common in extracellular matrix molecules and in the extracellular portions of cell-surface receptor proteins. Other frequently used domains, including the SH2 domain (Figure 3–9) and the kringle domain (Figure 3–16), are of a plug-in type, with their N- and C-termini close together.

Figure 3–15 Domain shuffling. An extensive shuffling of blocks of protein sequence (protein domains) has occurred during protein evolution. Those portions of a protein denoted by the same shape and color in this diagram are evolutionarily related. Serine proteases such as chymotrypsin are formed from two domains (brown). In the three other proteases shown, which are highly regulated and more specialized, these two protease domains are connected to one or more domains that are similar to domains found in epidermal growth factor (EGF; green), to a calcium-binding protein (yellow), or to a kringle domain (blue). Chymotrypsin is illustrated in Figure 3–13. (Proteins shown: EGF, chymotrypsin, urokinase, factor IX, and plasminogen.)

Figure 3–16 The three-dimensional structures of three commonly used protein domains: an immunoglobulin module, a fibronectin type 3 module, and a kringle module (scale bar, 1 nm). In these ribbon diagrams, β-sheet strands are shown as arrows, and the N- and C-termini are indicated by red spheres. Many more such “protein modules” exist in nature. (Adapted from D.J. Leahy et al., Science 258:987–991, 1992. With permission from AAAS.)
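The difference between in-line and plug-in domains can be sketched as a small data model. In this hypothetical representation a protein is just an ordered list of domains; fibronectin type 3 is treated as in-line (the text implies this, since it is one of the two Figure 3–16 domains other than the plug-in kringle), and the host domains are placeholders. Flattening a protein to a list means that insertion into a loop is only approximated.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Domain:
    name: str
    in_line: bool  # True: N- and C-termini at opposite poles; False: plug-in type


def tandem_repeat(domain: Domain, copies: int) -> List[Domain]:
    """Link copies of an in-line domain end to end, as after tandem duplication."""
    if not domain.in_line:
        # Plug-in domains have their termini close together, so this sketch
        # does not string them into a series.
        raise ValueError(f"{domain.name} is a plug-in domain")
    return [domain] * copies


def insert_plug_in(host: List[Domain], after: int, domain: Domain) -> List[Domain]:
    """Insert a plug-in domain after host[after].

    Simplification: a protein is flattened to an ordered list of domains, so
    an insertion into a loop is approximated as a placement between elements.
    """
    if domain.in_line:
        raise ValueError(f"{domain.name} is an in-line domain")
    return host[: after + 1] + [domain] + host[after + 1:]


FN3 = Domain("fibronectin type 3", in_line=True)
KRINGLE = Domain("kringle", in_line=False)

array = tandem_repeat(FN3, 4)  # four FN3 domains in series, as in Figure 3-17
host = [Domain("host part 1 (hypothetical)", True),
        Domain("host part 2 (hypothetical)", True)]
shuffled = insert_plug_in(host, 0, KRINGLE)
print([d.name for d in array])
print([d.name for d in shuffled])
```

Keeping the termini geometry as a single flag is the design choice here: it is the one property the text uses to decide whether a domain is strung in series or tucked into a loop.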
After genomic rearrangements, such plug-in domains are usually accommodated as an insertion into a loop region of a second protein. A comparison of the relative frequency of domain utilization in different eukaryotes reveals that for many common domains, such as protein kinases, this frequency is similar in organisms as diverse as yeast, plants, worms, flies, and humans. But there are some notable exceptions, such as the major histocompatibility complex (MHC) antigen-recognition domain (see Figure 24–36) that is present in 57 copies in humans, but absent in the other four organisms just mentioned. Domains such as these have specialized functions that are not shared with the other eukaryotes; they are assumed to have been strongly selected for during recent evolution to produce the multiple copies observed.

Figure 3–17 An extended structure formed from a series of protein domains. Four fibronectin type 3 domains (see Figure 3–16) from the extracellular matrix molecule fibronectin are illustrated in (A) ribbon and (B) space-filling models. (Adapted from D.J. Leahy et al., Cell 84:155–164, 1996.)

The Human Genome Encodes a Complex Set of Proteins, Revealing That Much Remains Unknown

The result of sequencing the human genome has been surprising, because it reveals that our chromosomes contain only about 20,000 protein-coding genes. On the basis of this number alone, we would appear to be no more complex than the tiny mustard weed, Arabidopsis, and only about 1.3-fold more complex than a nematode worm. The genome sequences also reveal that vertebrates have inherited nearly all of their protein domains from invertebrates—with only 7% of identified human domains being vertebrate specific.

Each of our proteins is on average more complicated, however (Figure 3–18). Domain shuffling during vertebrate evolution has given rise to many novel combinations of protein domains, with the result that there are nearly twice as many combinations of domains found in human proteins as in a worm or a fly. This extra variety in our proteins greatly increases the range of protein–protein interactions possible, but how it contributes to making us human is not known.

Figure 3–18 Domains in a group of evolutionarily related proteins that have a similar function. In general, there is a tendency for the proteins in more complex organisms, such as humans, to contain additional domains compared to a less complex organism such as yeast—as is the case for the DNA-binding protein compared here. (Domains shown: yeast, Ep1 PHD PHD Ep2; worm, Ep1 PHD PHD Ep2 Br; human, Znf Ep1 PHD PHD Ep2 Br BMB.)

The complexity of living organisms is staggering, and it is quite sobering to note that we currently lack even the tiniest hint of what the function might be for more than 10,000 of the proteins that have been identified through examining the human genome. There are certainly enormous challenges ahead for the next generation of cell biologists, with no shortage of fascinating mysteries to solve.

Protein Molecules Often Contain More Than One Polypeptide Chain

The same weak noncovalent bonds that enable a protein chain to fold into a specific conformation also allow proteins to bind to each other to produce larger structures in the cell. Any region of a protein’s surface that can interact with another molecule through sets of noncovalent bonds is called a binding site. A protein can contain binding sites for various large and small molecules. If a binding site recognizes the surface of a second protein, the tight binding of two folded polypeptide chains at this site creates a larger protein molecule with a precisely defined geometry. Each polypeptide chain in such a protein is called a protein subunit, and the precise way that these subunits are arranged creates the protein’s quaternary structure, as introduced previously.

In the simplest case, two identical, folded polypeptide chains form a symmetrical complex of two protein subunits (called a dimer) that is held together by interactions between two identical binding sites (Figure 3–19A). Symmetrical protein complexes that are formed from more than two copies of the same polypeptide chain are also commonly found in cells (Figure 3–19B). Many other proteins contain two or more types of polypeptide chains. Hemoglobin, the protein that carries oxygen in red blood cells, contains two identical α-globin subunits and two identical β-globin subunits, symmetrically arranged (Figure 3–20). Such multisubunit proteins can be very large (Movie 3.6).

Figure 3–19 Many protein molecules contain multiple copies of the same protein subunit. (A) A symmetrical dimer. The CAP protein, a bacterial transcription regulatory protein, is a complex of two identical polypeptide chains. (B) A symmetrical homotetramer. The enzyme neuraminidase exists as a ring of four identical polypeptide chains. For both A and B, a small schematic below the structure emphasizes how the repeated use of the same binding interaction forms the structure. In A, the use of the same binding site on each monomer (represented by brown and green ovals) causes the formation of a symmetrical dimer. In B, a pair of nonidentical binding sites (represented by orange circles and blue squares) causes the formation of a symmetrical tetramer.
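The subunit compositions described here can be written down as simple counts of chain types. This sketch (all names are illustrative) distinguishes assemblies built from one kind of chain, such as CAP and neuraminidase, from hemoglobin with its two kinds, and reproduces the arithmetic in the hemoglobin figure caption: one oxygen-binding heme per globin chain.

```python
from collections import Counter

# Subunit compositions described in the text: chain type -> number of copies.
hemoglobin = Counter({"alpha-globin": 2, "beta-globin": 2})
cap_protein = Counter({"CAP": 2})
neuraminidase = Counter({"neuraminidase": 4})


def describe(composition: Counter) -> str:
    """Classify an assembly as built from one chain type (homo) or several (hetero)."""
    n_chains = sum(composition.values())
    kind = "homo" if len(composition) == 1 else "hetero"
    return f"{kind}-oligomer of {n_chains} chains"


def oxygen_capacity(globin_composition: Counter) -> int:
    """O2 molecules carried, given one oxygen-binding heme per globin chain."""
    return sum(globin_composition.values())


for name, comp in [("CAP", cap_protein), ("neuraminidase", neuraminidase),
                   ("hemoglobin", hemoglobin)]:
    print(name, describe(comp))
print("hemoglobin carries", oxygen_capacity(hemoglobin), "O2")  # 4
```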
Some Globular Proteins Form Long Helical Filaments

The proteins that we have discussed so far are globular proteins, in which the polypeptide chain folds up into a compact shape like a ball with an irregular surface. Some of these protein molecules can nevertheless assemble to form filaments that may span the entire length of a cell. Most simply, a long chain of identical protein molecules can be constructed if each molecule has a binding site complementary to another region of the surface of the same molecule (Figure 3–21). An actin filament, for example, is a long helical structure produced from many molecules of the protein actin (Figure 3–22). Actin is a globular protein that is very abundant in eukaryotic cells, where it forms one of the major filament systems of the cytoskeleton (discussed in Chapter 16).

We will encounter many helical structures in this book. Why is a helix such a common structure in biology?

Figure 3–20 Hemoglobin is a protein formed as a symmetrical assembly using two each of two different subunits. This abundant, oxygen-carrying protein in red blood cells contains two copies of α-globin (green) and two copies of β-globin (blue). Each of these four polypeptide chains contains a heme molecule (red), which is the site that binds oxygen (O₂). Thus, each molecule of hemoglobin carries four molecules of oxygen. (PDB code: 2DHB.)

Figure 3–21 Protein assemblies. (A) A protein with just one binding site can form a dimer with another identical protein. (B) Identical proteins with two different binding sites often form a long helical filament. (C) If the two binding sites are disposed appropriately in relation to each other, the protein subunits may form a closed ring instead of a helix. (For an example of A, see Figure 3–19A; for an example of B, see Figure 3–22; for an example of C, see Figure 14–32.)
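The idea that each subunit sits in the same relation to its neighbor, a fixed rotation plus a fixed translation along an axis, can be sketched numerically. With a nonzero rise the subunit centers trace a helix (as in Figure 3–21B); with zero rise and a twist that divides 360° evenly they close into a ring (Figure 3–21C). The function names, the choice of the z axis, and the handedness test (the sign of one step's turn, seen from +z, multiplied by its climb) are assumptions of this sketch.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def place_subunits(n: int, twist_deg: float, rise: float, radius: float = 1.0) -> List[Point]:
    """Place n subunit centers, each rotated by twist_deg about the z axis and
    raised by rise along it relative to the previous subunit."""
    points = []
    for k in range(n):
        theta = math.radians(twist_deg * k)
        points.append((radius * math.cos(theta), radius * math.sin(theta), rise * k))
    return points


def handedness(points: List[Point]) -> str:
    """Classify an assembly wound about the z axis from its first two subunits.

    A twist of exactly 0 or 180 degrees cannot be classified from subunit
    centers alone, so it is reported as ambiguous, as is a zero rise.
    """
    (x1, y1, z1), (x2, y2, z2) = points[0], points[1]
    turn = x1 * y2 - y1 * x2  # positive for a counterclockwise step seen from +z
    climb = z2 - z1
    if abs(turn) < 1e-9 or abs(climb) < 1e-9:
        return "not a helix (or ambiguous)"
    return "right-handed" if turn * climb > 0 else "left-handed"


helix = place_subunits(12, twist_deg=60, rise=0.5)  # six subunits per turn
upside_down = [(x, -y, -z) for x, y, z in helix]    # turned 180 degrees about the x axis
mirrored = [(x, y, -z) for x, y, z in helix]        # reflected through the xy plane
ring = place_subunits(7, twist_deg=60, rise=0.0)    # zero rise: subunit 6 returns to subunit 0

print(handedness(helix), handedness(upside_down), handedness(mirrored))
# right-handed right-handed left-handed
print(handedness(ring), math.dist(ring[6], ring[0]) < 1e-9)
```

The printed labels agree with the rule stated in the text: turning a helix upside down leaves its handedness unchanged, while reflecting it reverses it.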
As we have seen, biological structures are often formed by linking similar subunits into long, repetitive chains. If all the subunits are identical, the neighboring subunits in the chain can often fit together in only one way, adjusting their relative positions to minimize the free energy of the contact between them. As a result, each subunit is positioned in exactly the same way in relation to the next, so that subunit 3 fits onto subunit 2 in the same way that subunit 2 fits onto subunit 1, and so on. Because it is very rare for subunits to join up in a straight line, this arrangement generally results in a helix—a regular structure that resembles a spiral staircase, as illustrated in Figure 3–23. Depending on the twist of the staircase, a helix is said to be either right-handed or left-handed (see Figure 3–23E). Handedness is not affected by turning the helix upside down, but it is reversed if the helix is reflected in a mirror.

The observation that helices occur commonly in biological structures holds true whether the subunits are small molecules linked together by covalent bonds (for example, the amino acids in an α helix) or large protein molecules that are linked by noncovalent forces (for example, the actin molecules in actin filaments). This is not surprising. A helix is an unexceptional structure, and it is generated simply by placing many similar subunits next to each other, each in the same strictly repeated relationship to the one before; that is, with a fixed rotation followed by a fixed translation along the helix axis.

Protein Molecules Can Have Elongated, Fibrous Shapes

Enzymes tend to be globular proteins: even though many are large and complicated, with multiple subunits, most have an overall rounded shape. In Figure 3–22, we saw that a globular protein can associate to form long filaments. But some functions require that an individual protein molecule span a large distance.
These fibrous proteins generally have a relatively simple, elongated three-dimensional structure. One large family of intracellular fibrous proteins consists of α-keratin, introduced when we described the α helix. Keratin filaments are extremely stable and are the main component in long-lived structures such as hair, horn, and nails. An α-keratin molecule is a dimer of two identical subunits, with the long α helices of each subunit forming a coiled-coil (see Figure 3–8). The coiled-coil regions are capped at each end by globular domains containing binding sites. This enables this type of protein to assemble into ropelike intermediate filaments—an important component of the cytoskeleton that creates the cell’s internal structural framework (see Figure 16–62).

Figure 3–22 Globular actin monomers assemble to produce an actin filament. (A) Transmission electron micrographs of negatively stained actin filaments. (B) The helical arrangement of actin molecules in an actin filament. (A, courtesy of Roger Craig.)

Figure 3–23 Some properties of a helix. (A–D) A helix forms when a series of subunits (here represented by rectangular bricks) bind to each other in a regular way. At the top, each of these helices is viewed from directly above the helix and seen to have two (A), three (B), and six (C and D) subunits per helical turn. Note that the helix in D has a wider path than that in C but the same number of subunits per turn. (E) As discussed in the text, a helix can be either right-handed or left-handed. As a reference, it is useful to remember that standard metal screws, which insert when turned clockwise, are right-handed. Note that a helix retains the same handedness when it is turned upside down.

Fibrous proteins are especially abundant outside the cell, where they are a main component of the gel-like extracellular matrix that helps to bind collections of cells together to form tissues. Cells secrete extracellular matrix proteins into their surroundings, where they often assemble into sheets or long fibrils. Collagen is the most abundant of these proteins in animal tissues. A collagen molecule consists of three long polypeptide chains, each containing the nonpolar amino acid glycine at every third position. This regular structure allows the chains to wind around one another to generate a long, regular triple helix (Figure 3–24). Many collagen molecules then bind to one another side-by-side and end-to-end to create long overlapping arrays—thereby generating the extremely tough collagen fibrils that give connective tissues their tensile strength, as described in Chapter 19.

Figure 3–24 The fibrous protein collagen. (Labels: short section of collagen fibril, 50 nm scale; collagen molecule, 300 nm × 1.5 nm; collagen triple helix, 1.5 nm.) The collagen molecule is a triple helix formed by three extended protein chains that wrap around one another (bottom). In the extracellular space, many rodlike collagen molecules become covalently linked together through their lysine side chains to form collagen fibrils (top) that have the tensile strength of steel. The striping on the collagen fibril is caused by the regular repeating arrangement of the collagen molecules within the fibril.

Covalent Cross-Linkages Stabilize Extracellular Proteins

Many protein molecules are either attached to the outside of a cell’s plasma membrane or secreted to form part of the extracellular matrix. All such proteins are directly exposed to extracellular conditions. To help maintain their structures, the polypeptide chains in such proteins are often stabilized by covalent cross-linkages. These linkages can either tie together two amino acids in the same protein or join together many polypeptide chains in a large protein complex—as for the collagen fibrils just described.

A variety of such cross-links exist, but the most common are covalent sulfur–sulfur bonds. These disulfide bonds (also called S–S bonds) form as cells prepare newly synthesized proteins for export. As described in Chapter 12, their formation is catalyzed in the endoplasmic reticulum by an enzyme that links together the –SH groups of two cysteine side chains that are adjacent in the folded protein (Figure 3–25). Disulfide bonds do not change the conformation of a protein but instead act as atomic staples to reinforce its most favored conformation. For example, lysozyme—an enzyme in tears that dissolves bacterial cell walls—retains its antibacterial activity for a long time because it is stabilized by such cross-linkages.

Figure 3–25 Disulfide bonds. (Diagram: oxidation joins the –SH groups of cysteine side chains into interchain or intrachain disulfide bonds; reduction breaks them.) Covalent disulfide bonds form between adjacent cysteine side chains. These cross-linkages can join either two parts of the same polypeptide chain or two different polypeptide chains. Because the energy required to break one covalent bond is much larger than the energy required to break even a whole set of noncovalent bonds (see Table 2–1, p. 51), a disulfide bond can have a major stabilizing effect on a protein (Movie 3.7).

Disulfide bonds generally fail to form in the cytosol, where a high concentration of reducing agents converts S–S bonds back to cysteine –SH groups. Apparently, proteins do not require this type of reinforcement in the relatively mild environment inside the cell.
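Disulfide bonds form only between cysteines whose side chains are adjacent in the folded protein. A geometric stand-in for that requirement, not a model of the enzyme that actually catalyzes the reaction, is to pair cysteine sulfur atoms that lie within bonding distance. The sketch assumes an S–S bond length of about 2.05 Å plus an arbitrary tolerance, and its coordinates and residue labels are hypothetical.

```python
import itertools
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]

SS_BOND_LENGTH = 2.05  # approximate sulfur-sulfur bond length, in angstroms
TOLERANCE = 0.3        # hypothetical slack added for this illustration


def candidate_disulfides(sulfur_xyz: Dict[str, Point]) -> List[Tuple[str, str]]:
    """Pair cysteine sulfur atoms lying within bonding distance of each other.

    Candidate pairs are accepted greedily from shortest to longest, so each
    cysteine takes part in at most one disulfide bond.
    """
    cutoff = SS_BOND_LENGTH + TOLERANCE
    close = []
    for a, b in itertools.combinations(sulfur_xyz, 2):
        d = math.dist(sulfur_xyz[a], sulfur_xyz[b])
        if d <= cutoff:
            close.append((d, a, b))
    close.sort()
    used = set()
    bonds = []
    for _, a, b in close:
        if a not in used and b not in used:
            bonds.append((a, b))
            used.update((a, b))
    return bonds


# Hypothetical sulfur positions (angstroms) in a folded chain; not real data.
cysteines = {
    "cys_a": (0.0, 0.0, 0.0),
    "cys_b": (2.05, 0.0, 0.0),
    "cys_c": (10.0, 0.0, 0.0),
    "cys_d": (10.0, 2.0, 0.0),
    "cys_e": (20.0, 0.0, 0.0),
}
print(candidate_disulfides(cysteines))  # [('cys_c', 'cys_d'), ('cys_a', 'cys_b')]
```

The isolated cysteine (cys_e) stays unpaired, mirroring the point that only cysteines brought together by the fold can be stapled.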
Protein Molecules Often Serve as Subunits for the Assembly of Large Structures

The same principles that enable a protein molecule to associate with itself to form rings or a long filament also operate to generate structures that are formed from a set of different macromolecules, such as enzyme complexes, ribosomes, viruses, and membranes. These much larger objects are not made as single, giant, covalently linked molecules. Instead they are formed by the noncovalent assembly of many separately manufactured molecules, which serve as the subunits of the final structure. The use of smaller subunits to build larger structures has several advantages:

1. A large structure built from one or a few repeating smaller subunits requires only a small amount of genetic information.
2. Both assembly and disassembly can be readily controlled reversible processes, because the subunits associate through multiple bonds of relatively low energy.
3. Errors in the synthesis of the structure can be more easily avoided, because correction mechanisms can operate during the course of assembly to exclude malformed subunits.

To focus on a well-studied example, we can consider how a virus forms from a mixture of proteins and nucleic acids. Some protein subunits are found to assemble into flat sheets in which the subunits are arranged in hexagonal patterns, but with a slight change in the geometry of the individual subunits, a hexagonal sheet can be converted into a tube (Figure 3–26) or, with more changes, into a hollow sphere.

Figure 3–26 Single protein subunits form protein assemblies that feature multiple protein–protein contacts. Hexagonally packed globular protein subunits are shown here forming either flat sheets or tubes. Such large structures are not considered to be single “molecules.” Instead, like the actin filament described previously, they are viewed as assemblies formed of many different molecules.
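Rolling a hexagonal sheet into a tube joins subunits at one edge to those at the opposite edge, which adds contacts. The sketch below counts nearest-neighbor contacts in a small hypothetical patch of hexagonally packed subunits, first as a flat sheet and then with its edges joined; the lattice size and the axial-coordinate scheme are illustrative choices, and the counts are contacts, not bond energies.

```python
from typing import List, Set, Tuple

Site = Tuple[int, int]

# Three of the six neighbor directions on a hexagonal lattice (axial coordinates).
# Checking only this half counts each subunit-subunit contact exactly once.
HALF_NEIGHBORS: List[Site] = [(1, 0), (0, 1), (-1, 1)]


def count_contacts(width: int, height: int, wrap: bool) -> int:
    """Count nearest-neighbor contacts in a width x height hexagonal patch.

    With wrap=True the last column is joined back to the first, as when a flat
    sheet is rolled into a tube. Wrapping assumes width >= 3 so that no pair of
    subunits is counted twice.
    """
    sites: Set[Site] = {(q, r) for q in range(width) for r in range(height)}
    contacts = 0
    for q, r in sites:
        for dq, dr in HALF_NEIGHBORS:
            nq, nr = q + dq, r + dr
            if wrap:
                nq %= width
            if (nq, nr) in sites:
                contacts += 1
    return contacts


sheet = count_contacts(6, 4, wrap=False)
tube = count_contacts(6, 4, wrap=True)
print(sheet, tube)  # 53 60: closing the sheet adds contacts along the seam
```

The same 24 subunits make 7 more contacts once the sheet is closed, which is the kind of gain the text attributes to closed structures.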
Protein tubes and spheres that bind specific RNA and DNA molecules in their interior form the coats of viruses. The formation of closed structures, such as rings, tubes, or spheres, provides additional stability because it increases the number of noncovalent bonds betwe