Control of Gene Expression PDF
Document Details
Uploaded by Deleted User
Kyle Andrei Dematera
Summary
This document provides an overview of gene expression, highlighting the role of RNA and proteins, and illustrating examples of cell differentiation using a side-by-side comparison of neuron and liver cells and experiments on frog cells. It focuses on the synthesis and accumulation of different RNA and protein molecules, and the dynamic nature of gene expression, emphasizing variations in the expression of proteins and mRNAs among diverse cell types.
Full Transcript
The Different Cell Types of a Multicellular Organism Contain the Same DNA

The different cell types in a multicellular organism differ dramatically in structure and function. For example, if we compare a mammalian neuron with a liver cell, the differences are so extreme that it is difficult to imagine that the two cells contain the same genome.

Figure 1. Side-by-side comparison between a neuron and liver cell.

For this reason, and because cell differentiation often seemed irreversible, biologists originally suspected that genes might be selectively lost when a cell differentiates. Cell differentiation occurs when unspecialized cells become specialized to carry out distinct functions.

Figure 2. Stem cell differentiation.

We now know, however, that cell differentiation generally occurs without changes in the nucleotide sequence of a cell’s genome. The cell types in a multicellular organism become different from one another because they synthesize and accumulate different sets of RNA and protein molecules. The initial evidence that they do this without altering the sequence of their DNA came from a classic set of experiments in frogs.

Figure 3. Transplanted nuclei from a frog (from the paper by J. B. Gurdon).

When the nucleus of a fully differentiated frog cell is injected into a frog egg whose nucleus has been removed, the injected donor nucleus is capable of directing the recipient egg to produce a normal tadpole. Because this tadpole contains a full range of differentiated cells, each of which derived its DNA sequence from the nucleus of the original donor skin cell, that differentiated cell cannot have lost any important DNA sequences.

Different Cell Types Synthesize Different Sets of RNAs and Proteins

As a first step in understanding cell differentiation, we would like to know how many differences there are between any one cell type and another.
Although we still do not have an exact answer for each cell type, we can make several general statements.

1. Many processes are common to all cells, and any two cells in a single organism therefore have many gene products in common. These include the structural proteins of chromosomes, RNA and DNA polymerases, DNA repair enzymes, ribosomal proteins and RNAs, the enzymes that catalyze the central reactions of metabolism, and many of the proteins that form the cytoskeleton, such as actin.

2. Some RNAs and proteins are abundant in the specialized cells in which they function and cannot be detected elsewhere, even by sensitive tests. Hemoglobin, for example, is expressed specifically in red blood cells, where it carries oxygen, whereas the enzyme tyrosine aminotransferase (which breaks down tyrosine in food) is expressed in the liver but not in most other tissues.

3. Analyses of RNAs reveal that, at any one time, a typical human cell expresses 30–60% of its approximately 25,000 genes at some meaningful level. There are about 20,000 protein-coding genes and an estimated 5000 noncoding RNA genes in humans. When the patterns of RNA expression in different human cell lines are compared, the level of expression of almost every gene is found to vary from one cell type to another. A few of these differences are striking, but most are much more subtle.

4. Although there are striking differences in the protein-coding RNAs (mRNAs) in specialized cell types, they underestimate the full range of differences in the final pattern of protein production. There are many steps after RNA production at which gene expression can be regulated.
The differences in gene expression between cell types are therefore most fully revealed through methods that directly display the levels of proteins, along with their post-translational modifications.

The Spectrum of mRNAs Present in a Cell Can Be Used to Accurately Identify the Cell Type

We have seen that each cell type produces a characteristic set of mRNAs. Therefore, if all the mRNAs present in a cell are known, the cell type can be unambiguously identified, using prior knowledge from cell lines or analyses of tissues. This approach is made possible by the ability to determine the nucleotide sequence of all the mRNAs produced by a single cell. Because human cells have approximately 20,000 mRNA-producing genes, this strategy provides very fine resolution of the differences among our individual cells. In general, the mRNA approach agrees well with the traditional categorization of cell types that is based on staining and microscopy, but the mRNA strategy has also revealed that many cells that “look” the same can differ significantly in their mRNA content and therefore in their function. The ability to determine the mRNA content of individual cells also provides a new appreciation for how cells present in a tissue (liver, for example) differ according to their positions in the tissue.

External Signals Can Cause a Cell to Change the Expression of Its Genes

Although the specialized cells in a multicellular organism have characteristic patterns of gene expression, each cell is capable of altering its pattern of gene expression in response to extracellular cues. If a liver cell is exposed to a glucocorticoid hormone, for example, the production of a set of proteins is dramatically increased. Once the hormone is no longer present, the production of these proteins drops back to its normal, unstimulated level. Other cell types respond to glucocorticoids differently.
Fat cells, for example, reduce the production of tyrosine aminotransferase, while some other cell types do not respond to glucocorticoids at all. These examples illustrate a general feature of cell specialization: different cell types can respond very differently to the same extracellular signal. Other features of the gene expression pattern do not change and give each cell type its permanently distinctive character.

Gene Expression Can Be Regulated at Many of the Steps in the Pathway from DNA to RNA to Protein

There are many steps in the pathway leading from DNA to protein, and all of them can in principle be regulated. A cell can control the proteins it makes by:

1. Controlling when and how often a given gene is transcribed (transcriptional control)
2. Controlling the splicing and processing of RNA transcripts (RNA-processing control)
3. Selecting which completed mRNAs are exported from the nucleus to the cytosol and determining where in the cytosol they are localized (RNA transport and localization control)
4. Selecting which mRNAs in the cytoplasm are translated by ribosomes (translational control)
5. Selectively destabilizing certain mRNA molecules in the cytoplasm (mRNA degradation control)
6. Selectively degrading specific protein molecules (protein degradation control)
7. Activating, inactivating, or localizing specific protein molecules (protein activity control)

Figure 4. Steps involved in the control of gene expression.

For a cell to determine which of its thousands of genes to transcribe, it must rely on a group of proteins, found in all species on Earth, called transcription regulators. Transcription regulators are proteins that recognize specific sequences of DNA (typically 5–12 nucleotide pairs in length) that are often called cis-regulatory sequences, because they must be on the same chromosome (that is, in cis) as the genes they control. Transcription regulators bind to cis-regulatory sequences, which are dispersed throughout genomes, and this binding puts into motion a series of reactions that ultimately specify which genes are to be transcribed and at what rate. Approximately 10% of the protein-coding genes of most organisms are devoted to transcription regulators, making them one of the largest classes of proteins in the cell. A given transcription regulator typically recognizes a specific cis-regulatory sequence that is different from those recognized by the other transcription regulators in the cell.

The transcription of each gene is, in turn, controlled by its unique collection of cis-regulatory DNA sequences, which thus constitute a crucial part of the information coded in genomes. These sequences typically lie near the gene, often in the intergenic region directly upstream from the transcription start point of the gene. Although a few genes are controlled by a single cis-regulatory sequence that is recognized by a single transcription regulator, the majority have complex arrangements of cis-regulatory sequences, each of which is recognized by a different transcription regulator. It is therefore the positions, identity, and arrangement of cis-regulatory sequences that ultimately determine the time and place that each gene is transcribed.

The Sequence of Nucleotides in the DNA Double Helix Can Be Read by Proteins

The DNA in a chromosome consists of a very long double helix that has both a major and a minor groove. Transcription regulators must recognize short, specific cis-regulatory sequences within this structure. When these proteins were first discovered in the 1960s, it was thought that they might require direct access to the interior of the double helix to distinguish between one DNA sequence and another, analogous to complementary base-pairing.
It is now clear, however, that no such access is needed. The outside edges of each base pair display distinctive patterns of hydrogen-bond donors, hydrogen-bond acceptors, and hydrophobic patches in both the major and minor grooves, allowing each base to be distinguished from the other three.

Figure 5. A space-filling model of the double-helical structure of DNA.

Figure 5. Recognition of the base pairs from their edges. The four possible configurations of base pairs are shown, with potential hydrogen-bond donors indicated in blue, potential hydrogen-bond acceptors in red, and hydrogen bonds of the base pairs themselves as a series of short, parallel red lines. Methyl groups, which form hydrophobic protuberances, are shown in yellow, and hydrogen atoms that are attached to carbons, and are therefore unavailable for hydrogen-bonding, are white.

Because the major groove is wider and displays more molecular features than does the minor groove, nearly all transcription regulators make the majority of their contacts with the major groove.

Transcription Regulators Contain Structural Motifs That Can Read DNA Sequences

Molecular recognition in biology generally relies on an exact fit between the surfaces of two molecules, and the study of transcription regulators provides some of the clearest examples of this principle. Thus, a transcription regulator recognizes its specific cis-regulatory sequence because the protein’s surface is complementary to the surface features of the double helix that displays that sequence. Each transcription regulator makes a series of contacts with the DNA, involving hydrogen bonds, ionic bonds, and hydrophobic interactions. Although each individual contact is weak, the 20 or so contacts that are typically formed at the protein–DNA interface add together to ensure that the interaction is both highly specific and very strong. In fact, DNA–protein interactions include some of the tightest and most specific molecular interactions known in biology.
Although each example of protein–DNA recognition is unique in detail, x-ray crystallographic and nuclear magnetic resonance (NMR) spectroscopic studies of hundreds of transcription regulators reveal that many contain one or another of a small set of DNA-binding structural motifs. These motifs use either α helices or β sheets to bind to the major groove of DNA, with amino acid side chains that extend from these motifs making their specific DNA contacts. Thus, a given structural motif can be used to recognize many different cis-regulatory sequences depending on the specific side chains that extend from it.

Helix-turn-Helix proteins

Figure 6. Helix-turn-Helix proteins.

Originally identified in bacterial transcription regulators, this motif has since been found in many hundreds of DNA-binding proteins from eukaryotes, bacteria, and archaea. It is constructed from two α helices (blue and red) connected by a short, extended chain of amino acids, which constitutes the “turn”. The two helices are held at a fixed angle, primarily through interactions between the two helices. The more C-terminal helix (in red) is called the recognition helix because it fits into the major groove of DNA; its amino acid side chains, which differ from protein to protein, play an important part in recognizing the specific DNA sequence to which the protein binds. All of the proteins shown here bind DNA as dimers in which the two copies of the recognition helix (in red) are separated by exactly one turn of the DNA helix (3.4 nm); thus, both recognition helices of the dimer can fit into the major groove of DNA.

Leucine zipper protein

The leucine zipper motif is so named because of the way the two α helices, one from each monomer, are joined together to form a short coiled-coil.
These proteins bind DNA as dimers in which the two long α helices are held together by interactions between hydrophobic amino acid side chains (often on leucines) that extend from one side of each helix. Just beyond the dimerization interface, the two α helices separate from each other to form a Y-shaped structure, which allows their side chains to contact the major groove of DNA. The dimer thus grips the double helix like a clothespin on a clothesline.

Figure 7. Leucine zipper protein.

Homeodomain proteins

Figure 8. Homeodomain proteins.

Not long after the first transcription regulators were discovered in bacteria, genetic analyses of the fruit fly Drosophila led to the characterization of an important class of genes, the homeotic selector genes, that play a critical part in orchestrating fly development. It was later shown that these genes coded for transcription regulators that bound DNA through a structural motif named the homeodomain. (A) The homeodomain is folded into three α helices, which are packed tightly together by hydrophobic interactions. The part containing helices 2 and 3 closely resembles the bacterial helix–turn–helix motif. (B) The recognition helix (helix 3, red) forms important contacts with the major groove of DNA. The asparagine (Asn) of helix 3, for example, contacts an adenine. A flexible arm attached to helix 1 forms contacts with nucleotide pairs in the minor groove.

Figure 9. Transcription regulator binding to a specific DNA sequence. On the left, a single contact is shown between a transcription regulator and DNA; such contacts allow the protein to “read” the DNA sequence from the outside of the DNA double helix.
On the right, the complete set of contacts between a transcription regulator (a member of the homeodomain family; see Panel 7–1) and its cis-regulatory sequence is shown. The DNA-binding portion of the protein is 60 amino acids long, and the amino acids that directly contact DNA are numbered beginning with the amino terminus. Although the interactions in the major groove are the most important, the protein also contacts both the minor groove and phosphates in the sugar-phosphate DNA backbone.

β-sheet DNA recognition proteins

Figure 10. β-sheet DNA recognition proteins.

In other DNA-binding motifs, α helices are the primary means of DNA sequence recognition. In another group, a two-stranded β sheet with amino acid side chains extending from the sheet toward the DNA reads the information on the surface of the major groove. The β-sheet motif can be used to recognize many different DNA sequences; the exact DNA sequence recognized depends on the sequence of amino acids that make up the β sheet. Shown is a transcription regulator that binds two molecules of S-adenosyl methionine (red). On the left is a dimer of the protein; on the right is a simplified diagram showing just the two-stranded β sheet bound to the major groove of DNA. S-adenosyl methionine is needed for this protein to bind DNA. Thus, the small molecule regulates the activity of the DNA-binding protein.

Zinc Finger proteins

Figure 11. Zinc Finger proteins.

This group of DNA-binding motifs includes one or more zinc atoms as structural components. All such zinc-coordinated DNA-binding motifs are called zinc fingers, referring to their appearance in early schematic drawings (left).
The type of zinc finger shown has a simple structure, in which the zinc atom holds an α helix and a β sheet together (middle). This type of zinc finger is often found in clusters with the α helix of each finger contacting the major groove of the DNA, forming a nearly continuous stretch of α helices along that groove. In this way, a strong and specific DNA–protein interaction is built up through a repeating basic structural unit. Three such fingers are shown on the right.

Helix-Loop-Helix proteins

Figure 12. Helix-Loop-Helix proteins.

The helix–loop–helix motif consists of a short α helix connected by a loop (red) to a second, longer α helix. The flexibility of the loop allows one helix to fold back and pack against the other, thereby forming the dimerization surface. As shown, this two-helix structure binds both to DNA and to the two-helix structure of a second protein to create either a homodimer or a heterodimer. Two α helices that extend from the dimerization interface make specific contacts with the major groove of DNA.

Dimerization of Transcription Regulators Increases Their Affinity and Specificity for DNA

Figure 13. Transcription regulator.

A monomer of a typical transcription regulator recognizes about 4–8 nucleotide pairs of DNA. These proteins do not bind tightly to a single DNA sequence and reject all others; rather, each regulator recognizes a range of closely related sequences, with the affinity of the protein for the DNA varying according to how closely the DNA matches its optimal sequence. For this reason, the cis-regulatory sequence for a regulator is often depicted by a “logo” that displays the range of sequences recognized by that transcription regulator.
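The idea behind such a logo can be sketched in a few lines of Python. This is only an illustrative sketch: the aligned binding sites below are made up, the function names are hypothetical, and the scoring (2 bits minus the Shannon entropy of each position, with no small-sample correction) is one common convention for logos rather than anything specified in this text.

```python
import math
from collections import Counter

def position_frequencies(sites):
    """For each position, return the fraction of sites carrying A, C, G, or T."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all binding sites must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        freqs.append({base: counts.get(base, 0) / len(sites) for base in "ACGT"})
    return freqs

def information_content(column):
    """Information in bits at one position: 2 minus the Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy

# Hypothetical aligned binding sites for one regulator (illustration only).
sites = ["TGACTA", "TGACTC", "TGAGTA", "TGACTA"]
freqs = position_frequencies(sites)
bits = [information_content(col) for col in freqs]

for i, (col, b) in enumerate(zip(freqs, bits), start=1):
    print(f"position {i}: {col}  information = {b:.2f} bits")
```

Positions where every site carries the same base score the full 2 bits and would be drawn as a tall letter in a logo; positions where the regulator tolerates several bases score lower and would be drawn as a shorter stack.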
The DNA sequence recognized by a monomer does not usually contain sufficient information to be picked out from the background of such sequences that would occur at random across the genome. For example, an exact six-nucleotide DNA sequence would be expected to occur by chance approximately once every 4096 nucleotides (4⁶), or roughly a thousand times in a bacterial genome of 4.6 × 10⁶ nucleotide pairs. For such a genome, not to mention a mammalian genome of 3 × 10⁹ nucleotide pairs, this is insufficient information to accurately control the transcription of individual genes. Additional contributions to DNA-binding specificity must therefore be present.

Many transcription regulators form dimers, with both monomers making nearly identical contacts with DNA. This arrangement doubles the length of the cis-regulatory sequence recognized and greatly increases both the affinity and the specificity of transcription regulator binding. Because the DNA sequence recognized by the protein has increased from approximately 6 nucleotide pairs to 12 nucleotide pairs, there are many fewer random occurrences of matching sequences (an exact 12-nucleotide sequence is expected about once every 4¹² ≈ 17 million nucleotides). In many cases, heterodimers can form between two different transcription regulators, and this configuration also increases both affinity and specificity by expanding the DNA sequence recognized. Some transcription regulators can form heterodimers with more than one partner protein; in this way, the same transcription regulator can be “reused” to create several distinct DNA-binding specificities.

Many Transcription Regulators Bind Cooperatively to DNA

In the simplest case, the collection of noncovalent bonds that holds dimers or heterodimers together is so extensive that these structures form obligatorily and virtually never fall apart.
In this case, the unit of binding is the dimer or heterodimer, and the binding curve for the transcription regulator (the fraction of DNA bound as a function of protein concentration) has a standard exponential shape.

Figure 14. Occupancy of a cis-regulatory sequence by a transcription regulator.

In many cases, however, the dimers and heterodimers are held together very weakly; they exist predominantly as monomers in solution, and yet dimers are observed on the appropriate DNA sequence. In this case, the proteins are said to bind to DNA cooperatively, and the curve describing their binding is S-shaped. Cooperative binding means that, over a range of concentrations of the transcription regulator, binding is more of an all-or-none phenomenon than for noncooperative binding; that is, at most protein concentrations, the cis-regulatory sequence is either nearly empty or nearly fully occupied and is rarely somewhere in between.

Nucleosome Structure Promotes Cooperative Binding of Transcription Regulators

As we have just seen, cooperative binding of transcription regulators to DNA often occurs because the proteins involved have only a weak affinity for each other. However, there is a second, indirect mechanism for cooperative binding in eukaryotes, one that arises from the nucleosome structure of their chromosomes. In general, transcription regulators bind to DNA in nucleosomes with lower affinity than they do to naked DNA. There are two reasons for this difference.

Figure 15. How nucleosomes affect the binding of transcription regulators.

First, the surface of the cis-regulatory sequence recognized by the transcription regulator may be facing inward on the nucleosome, toward the histone core, and therefore not be readily available to the regulatory protein.
Second, even if the face of the cis-regulatory sequence is exposed on the outside of the nucleosome, many transcription regulators subtly alter the conformation of the DNA when they bind, and these changes are generally opposed by the tight wrapping of the DNA around the histone core. For example, many transcription regulators induce a bend or kink in the DNA when they bind.

We saw in Chapter 4 that nucleosome remodeling can alter the structure of the nucleosome, allowing transcription regulators access to the DNA. Even without remodeling, however, transcription regulators can still gain limited access to DNA in a nucleosome. The DNA at the end of a nucleosome “breathes,” transiently exposing the DNA and allowing regulators to bind. This breathing occurs at a much lower rate in the middle of the nucleosome; therefore, the positions where the DNA exits the nucleosome are much easier to occupy than those in the middle of the nucleosome.

These properties of the nucleosome promote cooperative DNA binding by transcription regulators. If a transcription regulator seizes a “window of opportunity” provided by nucleosome breathing, it can enter the nucleosome by binding to the exposed DNA and prevent the DNA from tightly rewrapping around the nucleosome core. When this happens, the affinity of a second transcription regulator for a nearby cis-regulatory sequence can be increased simply by this loosening of the DNA from the histone core. If the two transcription regulators also interact with each other (as described earlier), the cooperative effect can be even greater. In some cases, the combined action of the regulatory proteins can eventually displace the histone core of the nucleosome altogether.
Many transcription regulators, when their affinities for DNA and their concentrations are sufficiently high, can take advantage of nucleosome breathing and thereby “invade” nucleosomes.

Although nucleosomes generally inhibit the DNA binding of transcription regulators, some regulators, if their cis-regulatory sequences are exposed on the nucleosome surface, can bind with nearly the same affinity as they do on naked DNA, occupying their binding sites while the DNA is still tightly wrapped around the histone core. Transcription regulators with this property are sometimes called pioneer factors, because they are often the first proteins to bind DNA when a previously silent gene becomes transcriptionally active. Although their binding typically destabilizes the nucleosome, pioneer factors probably exert their major effects by attracting additional proteins that alter chromatin structure, such as nucleosome remodeling complexes. If one transcription regulator binds its cis-regulatory sequence on a nucleosome and attracts a chromatin remodeling complex, the localized action of the remodeling complex can allow a second transcription regulator to efficiently bind nearby.

Figure 16. Two cooperating transcription regulators, Oct4 (green) and Sox2 (blue), bound to a nucleosome.
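The contrast between noncooperative and cooperative occupancy of a cis-regulatory sequence, described in the preceding sections, can be illustrated numerically. The sketch below is an assumption-laden toy model, not something taken from this text: it uses the Hill equation, a common phenomenological description of binding, and the half-saturation concentration (1.0, arbitrary units) and Hill coefficient (4) are hypothetical values chosen only to make the difference visible.

```python
def occupancy(conc, k_half, hill_n=1.0):
    """Fraction of sites occupied under a Hill model: c^n / (K^n + c^n).

    hill_n = 1 gives a simple noncooperative saturation curve;
    hill_n > 1 gives a steeper, S-shaped (cooperative) curve.
    """
    if conc < 0 or k_half <= 0:
        raise ValueError("conc must be >= 0 and k_half must be > 0")
    cn = conc ** hill_n
    return cn / (k_half ** hill_n + cn)

# Hypothetical regulator concentrations, in units of the half-saturation value.
concs = [0.1, 0.5, 1.0, 2.0, 10.0]
noncoop = [occupancy(c, k_half=1.0, hill_n=1.0) for c in concs]
coop = [occupancy(c, k_half=1.0, hill_n=4.0) for c in concs]

for c, a, b in zip(concs, noncoop, coop):
    print(f"conc = {c:5.1f}   noncooperative = {a:.3f}   cooperative = {b:.3f}")
```

In this toy model both curves reach half occupancy at the same concentration, but the cooperative curve sits close to zero below that point and close to one above it, which is the all-or-none behavior described for cooperative binding.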
DNA-Binding by Transcription Regulators Is Dynamic

Thus far, we have treated transcription regulators as static: we have considered them as either bound to DNA or free in solution. But in reality, the situation is highly dynamic, with transcription regulator molecules in constant motion, rapidly binding and dissociating from DNA. In most cases, a given transcription regulator molecule stays on its cis-regulatory sequence for only a short time, but it is rapidly replaced by other molecules of the same regulator. Thus, when we consider a cis-regulatory sequence being fully bound by its matching transcription regulator, this state is an average, over time, of many individual association and dissociation events.

By attaching a transcription regulator to a bright fluorescent tag, it is possible to follow single regulator molecules in live cells, as they diffuse randomly within the nucleus, bind to their cis-regulatory sequences, and then dissociate from them. In these single-molecule tracking experiments, different states for the regulator can be distinguished on the basis of the tagged protein’s mobility over short time periods. A high-mobility regulator state is observed for the free protein diffusing in the nucleoplasm. At the other extreme, a very low-mobility state is attributed to the regulator bound to DNA, inasmuch as its restrained motions are similar to those of a histone molecule that has been labeled in the same way.
Whereas a histone remains stably bound in a nucleosome, transcription regulators remain in a low-mobility, DNA-bound state only transiently. Individual regulator molecules are observed to leave their DNA-bound state at a wide variety of rates: some molecules persist for only a fraction of a second, while others remain for minutes. Any protein, such as a transcription regulator, that binds tightly to a specific set of DNA sequences will also bind, albeit much more weakly, to any DNA sequence. This weak binding is useful because it allows a regulator to search for its target by “scanning” the DNA in the vicinity of the initial chromosomal site that it binds. Most such regulators will fail to find a matching cis-regulatory DNA sequence, and it is these that are thought to dissociate within seconds. The minority that persist for minutes are likely to have engaged with a matching cis-regulatory sequence. But because even these regulators do not remain on DNA for long periods, they need to be constantly replaced by another such molecule.

Figure 17. Tracking single molecules of a transcription regulator in the nucleus of a living cell.

The Tryptophan Repressor Switches Genes Off

The genome of the bacterium Escherichia coli consists of a single, circular DNA molecule of about 4.6 × 10⁶ nucleotide pairs that encodes approximately 4300 proteins. Only a fraction of these proteins are made at any one time. One set of genes regulated in this way is the tryptophan operon, a cluster of genes transcribed from a single promoter that encodes the enzymes needed to synthesize tryptophan. When tryptophan concentrations are low, the operon is transcribed; the resulting mRNA is translated to produce a full set of biosynthetic enzymes, which work in tandem to synthesize tryptophan from much simpler molecules. When tryptophan is abundant, however (for example, when the bacterium is in the gut of a mammal that has just eaten a protein-rich meal), the amino acid is imported into the cell and shuts down production of the enzymes, which are no longer needed. Within the operon’s promoter is a cis-regulatory sequence that is recognized by a transcription regulator.
When this regulator binds to this sequence, it blocks access of RNA polymerase to the promoter, thereby preventing transcription of the operon (and thus production of the tryptophan-producing enzymes).

Figure 18. Genes can be switched off by repressor proteins.

The transcription regulator is known as the tryptophan repressor, and its cis-regulatory sequence is called the tryptophan operator. These components are controlled in a simple way: the repressor can bind to DNA only if it has also bound several molecules of tryptophan. The tryptophan repressor is an allosteric protein, and the binding of tryptophan causes a subtle change in its three-dimensional structure so that the protein can bind tightly to the operator sequence. Whenever the concentration of free tryptophan in the bacterium drops, tryptophan dissociates from the repressor, the repressor no longer binds to DNA, and the tryptophan operon is transcribed. The repressor is thus a simple device that switches production of a set of biosynthetic enzymes on and off according to the availability of the end product of the pathway that the enzymes catalyze.

The tryptophan repressor protein itself is always present in the cell. The gene that encodes it is continually transcribed at a low level, so that a small amount of the repressor protein is always being made. Thus the bacterium can respond very rapidly to a rise or fall in tryptophan concentration.

Repressors Turn Genes Off and Activators Turn Them On

The tryptophan repressor, as its name suggests, is a transcription repressor protein: in its active form, it switches genes off, or represses them. Some bacterial transcription regulators do the opposite: they switch genes on, or activate them.
These transcription activator proteins work on promoters that, in contrast to the promoter for the tryptophan operon, are only marginally able to bind and position RNA polymerase on their own. However, these poorly functioning promoters can be made fully functional by activator proteins that bind to nearby cis-regulatory sequences and contact the RNA polymerase to help it initiate transcription. Figure 19. Genes can be switched on by activator proteins. DNA-bound activator proteins can increase the rate of transcription initiation as much as 1000-fold, a value consistent with a relatively weak and nonspecific interaction between the transcription regulator and RNA polymerase. For example, a 1000-fold change in the affinity of RNA polymerase for its promoter corresponds to a change in ∆G of ∼18 kJ/mole, which could be accounted for by just a few weak, noncovalent bonds. Thus, many activator proteins work simply by providing a few favorable interactions that help to attract RNA polymerase to the promoter. To provide this assistance, however, the activator protein must be bound to its cis-regulatory sequence, and this sequence must be positioned precisely so that these favorable interactions can occur with an RNA polymerase molecule at its promoter. Like the tryptophan repressor, activator proteins often have to interact with a second molecule to be able to bind DNA. For example, the bacterial activator protein CAP has to bind cyclic AMP (cAMP) before it can bind to DNA. Genes activated by CAP are switched on in response to an increase in intracellular cAMP concentration, which rises when glucose, the bacterium’s preferred carbon source, is no longer available. CAP then drives the production of enzymes that allow the bacterium to digest other sugars.

Both an Activator and a Repressor Control the Lac Operon

The activity of a single bacterial promoter is often controlled by several different transcription regulators. The Lac operon in E.
coli, for example, is controlled by both the Lac repressor and the CAP activator just discussed. The Lac operon encodes proteins required to import and digest the disaccharide lactose, a key nutrient in milk. In the absence of glucose (the cell’s favorite energy source), the bacterium makes cAMP, which activates CAP to switch on genes that allow the cell to utilize alternative sources of carbon, including lactose. Figure 20. How the Lac operon is controlled by two transcription regulators, causing it to be expressed only when needed. It would be wasteful, however, for CAP to induce expression of the Lac operon if lactose itself were not present. Thus the Lac repressor shuts off the operon in the absence of lactose. This arrangement enables the control region of the Lac operon to integrate two different signals so that the operon is highly expressed only when two conditions are met: glucose must be absent and lactose must be present. This genetic circuit thus behaves much like a switch that carries out a logic operation in a computer. When lactose is present AND glucose is absent, the cell executes the appropriate program: in this case, transcription of the genes that permit the uptake and utilization of lactose. All transcription regulators, whether they are repressors or activators, must be bound to DNA to exert their effects. In this way, each regulatory protein acts selectively, controlling only those genes that bear a cis-regulatory sequence recognized by it. The logic of the Lac operon first attracted the attention of biologists more than 60 years ago. The way it works was uncovered by a combination of genetics and biochemistry, providing some of the first insights into how transcription is controlled in any organism.
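The two-signal logic of the Lac operon described above can be written out as a small truth table. The sketch below is only an illustration of that logic, not a model of the real kinetics: it treats each signal as a simple on/off value, and the function names are made up for this example.

```python
def cap_active(glucose_present: bool) -> bool:
    """CAP binds DNA only when cAMP is high, which happens when glucose is absent."""
    camp_high = not glucose_present
    return camp_high


def lac_repressor_on_dna(lactose_present: bool) -> bool:
    """The Lac repressor shuts off the operon in the absence of lactose."""
    return not lactose_present


def lac_operon_highly_expressed(glucose_present: bool, lactose_present: bool) -> bool:
    """High expression needs the activator bound AND the repressor released."""
    return cap_active(glucose_present) and not lac_repressor_on_dna(lactose_present)


if __name__ == "__main__":
    # Print the full truth table for the two inputs.
    for glucose in (True, False):
        for lactose in (True, False):
            state = "ON" if lac_operon_highly_expressed(glucose, lactose) else "off"
            print(f"glucose={glucose!s:5} lactose={lactose!s:5} -> operon {state}")
```

Only one of the four input combinations (glucose absent, lactose present) switches the operon fully on, which is the AND behavior the text describes. Real cells show graded rather than all-or-none responses, so the Boolean values here are a deliberate simplification.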
DNA Looping Can Occur During Bacterial Gene Regulation

Some proteins (for example, the CAP protein) can act either as a repressor or an activator, depending on the exact placement of a binding site relative to the promoter: if this site overlaps the promoter, CAP binding can prevent the assembly of RNA polymerase at the promoter, thus serving as a repressor. Most bacteria have small, compact genomes, and the cis-regulatory sequences that control the transcription of a gene are typically located very near to the start point of transcription. But there are some exceptions to this generalization: cis-regulatory sequences can be located hundreds and even thousands of nucleotide pairs from the bacterial genes they control. In these cases, the intervening DNA loops out, allowing a transcription regulator bound at a distant site along the DNA to contact RNA polymerase. Figure 20. Transcriptional activation by DNA looping in bacteria. Here, the DNA is serving as a tether, enormously increasing the probability that the regulator will collide with a promoter-bound polymerase, compared with the situation where the regulator is free in solution. Although it is the exception in bacteria, DNA looping is thought to occur in the regulation of nearly every eukaryotic gene. It has been proposed that the compact, simple genetic switches found in bacteria evolved in response to a severe competition for growth that put strong selective pressure on bacteria to maintain small genome sizes. In contrast, there appears to have been little selective pressure to “streamline” the genomes of multicellular organisms.

Complex Switches Control Gene Transcription in Eukaryotes

When compared to the situation in bacteria, transcription regulation in eukaryotes involves many more proteins and much longer stretches of DNA, and it often seems bewilderingly complex.
Yet many of the same principles apply. As in bacteria, the time and place that each gene is to be transcribed are specified by its cis-regulatory sequences, which are “read” by the transcription regulators that bind to them. Once bound to DNA, positive transcription regulators (activators) help RNA polymerase to begin transcribing genes, and negative regulators (repressors) block this from happening. But in bacteria, most of the interactions between DNA-bound transcription regulators and RNA polymerases (whether they activate or repress transcription) are direct; that is, they contact each other. In contrast, these interactions are almost always indirect in eukaryotes: many intermediate proteins, including the histones and a large protein complex known as Mediator, act between DNA-bound transcription regulators and RNA polymerase. Moreover, in multicellular organisms, it is common for dozens of transcription regulators to control a single gene and for cis-regulatory sequences to be spread over tens of thousands of nucleotide pairs. DNA looping allows the DNA-bound regulatory proteins to interact with each other and ultimately to control RNA polymerase at the promoter. Finally, because nearly all of the DNA in eukaryotic organisms is organized in nucleosomes and higher-order chromatin structures, transcription initiation in eukaryotes must overcome this inherent block.

A Eukaryotic Gene Control Region Includes Many cis-Regulatory Sequences

In eukaryotes, RNA polymerase II transcribes all the protein-coding genes and many noncoding RNA genes. This polymerase requires five general transcription factors, in contrast to bacterial RNA polymerase, which needs only a single general transcription factor (the σ subunit).
Because the many cis-regulatory sequences that control the expression of a typical gene are often spread over long stretches of DNA, we use the term gene control region to describe the whole expanse of DNA involved in regulating and initiating transcription of a eukaryotic gene. This includes the promoter, where the general transcription factors and the polymerase assemble, plus all of the cis-regulatory sequences to which transcription regulators bind to control the rate of the gene activation processes at the promoter. Figure 21. Transcription is controlled by gene control regions. In animals and plants, it is not unusual to find the regulatory sequences of a gene dotted over stretches of DNA as large as 100,000 nucleotide pairs. For now, we can regard much of this DNA as “spacer” sequences that transcription regulators do not directly recognize. In contrast to the small number of general transcription factors, which are abundant proteins that assemble on the promoters of all genes transcribed by RNA polymerase II, there are thousands of different transcription regulators devoted to turning individual genes on and off. As we have seen, each eukaryotic gene is usually transcribed individually.
Not surprisingly, the regulation of each eukaryotic gene is different in detail from that of every other gene, and it is difficult to formulate simple rules for gene regulation that apply in every case.

Eukaryotic Transcription Regulators Work in Groups

In bacteria, we saw that proteins such as the tryptophan repressor, the Lac repressor, and the CAP protein bind to DNA on their own and directly affect RNA polymerase at the promoter. Eukaryotic transcription regulators, in contrast, usually assemble together in groups at their cis-regulatory sequences. In some especially complex gene control regions, tens and even hundreds of such proteins may coassemble on DNA. In addition, a broad class of multisubunit proteins termed coactivators and co-repressors join with them. Typically, these coactivators and co-repressors do not recognize specific DNA sequences themselves; they are brought to those sequences by specific interactions with the DNA-bound transcription regulators. As their names imply, coactivators are typically involved in activating transcription and co-repressors in repressing it. Figure 22. Transcription is controlled by gene control regions. As shown, an individual transcription regulator can often participate in more than one type of regulatory complex. A protein might function, for example, in one case as part of a complex that activates transcription and in another case as part of a complex that represses transcription. Thus, individual eukaryotic transcription regulators function as regulatory parts that are used to build complexes whose function depends on the final assembly of all of the individual components. Each eukaryotic gene is therefore regulated by a “committee” of proteins, all of which must be present to express the gene at its proper level.
Often the protein-protein interactions between transcription regulators and between regulators and coactivators are too weak for them to assemble in solution; however, the appropriate combination of cis-regulatory sequences can “crystallize” the assembly of these complexes on DNA. In very large and complex gene control regions, this assembly may be accompanied by a phase transition to form a biomolecular condensate, which holds all the components together even more efficiently by keeping them in rough proximity even when individual proteins dissociate from DNA.

Activator Proteins Promote the Assembly of RNA Polymerase at the Start Point of Transcription

The cis-regulatory sequences to which eukaryotic transcription activator proteins bind were originally called enhancers because their presence “enhanced” the rate of transcription initiation. Once bound to DNA, how do assemblies of activator proteins increase the rate of transcription initiation? At most genes, several mechanisms work in concert. Their ultimate function is to attract and position RNA polymerase II at the promoter and to release it so that transcription can begin. Some activator proteins bind directly to one or more of the general transcription factors, accelerating their assembly on a promoter that has been brought in proximity (through DNA looping) to that activator. Most transcription activators, however, attract coactivators that then perform the biochemical tasks needed to initiate transcription. Figure 21. Transcription is controlled by gene control regions. One of the most prevalent coactivators is the large Mediator protein complex, composed of more than 30 subunits. About the same size as RNA polymerase itself, Mediator serves as a bridge between DNA-bound transcription activators, RNA polymerase, and the general transcription factors, facilitating their assembly at the promoter.
Eukaryotic Transcription Activators Direct the Modification of Local Chromatin Structure

The eukaryotic general transcription factors and RNA polymerase are unable, on their own, to assemble on a promoter that is packaged in nucleosomes. Thus, in addition to directing the assembly of the transcription machinery at the promoter, eukaryotic transcription activators, once bound to their cis-regulatory sequences, promote transcription by triggering changes to the chromatin structure of the promoters, rendering the underlying DNA more accessible. The enzymes that alter chromatin structure are usually carried as subunits of coactivators, which are typically multiprotein complexes, with different subunits carrying out different functions. For example, such a coactivator might carry one subunit that associates with specific DNA-bound transcription regulators, another that associates with one of the general transcription factors, and several more that alter chromatin structure in different ways. The most important ways of locally altering chromatin are through covalent histone modifications, nucleosome remodeling, nucleosome removal, and histone replacement. Figure 23. Transcription is controlled by gene control regions. Eukaryotic transcription activators use all four of these mechanisms: thus they attract coactivators that include histone modification enzymes, ATP-dependent chromatin remodeling complexes, and histone chaperones. These proteins often act cooperatively to alter the chromatin structure of promoters, providing greater access to the DNA. Often a series of individual events, ultimately directed by transcription regulators, must occur before RNA polymerase can be assembled onto a promoter, with details that depend on the gene being regulated.
As illustrated, a series of specific histone tail modifications is triggered by a transcription activator; these modifications then attract additional proteins to the promoter, including both a chromatin remodeling complex and a general transcription factor. Those proteins can in turn recruit additional proteins to the promoter, while also destabilizing adjacent nucleosomes. Because the local chromatin changes directed by one transcription regulator often allow the binding of additional proteins (both directly and indirectly, as just described), a cascade of events typically takes place on the control regions of eukaryotic genes to regulate their transcription. As RNA polymerase II transcribes through a gene, a different type of chromatin modification occurs. The histones just ahead of the polymerase are acetylated by enzymes carried by the polymerase, removed by histone chaperones, and deposited behind the moving polymerase. Figure 24. Successive histone modifications during transcription initiation. These histones are then rapidly deacetylated and methylated, also by complexes that are carried by the polymerase, leaving behind nucleosomes that are especially resistant to transcription. This remarkable process seems to prevent spurious transcription reinitiation behind a moving polymerase, which, in essence, must clear a path through chromatin as it transcribes.

Some Transcription Activators Work by Releasing Paused RNA Polymerase

Figure 25. Different transcription regulators can act at different steps. Thus far, we have emphasized how transcription regulators, once bound to DNA, can assemble multiple components and stimulate transcription initiation. But for some genes, a key regulatory step occurs after this point.
In the most common of these cases, the RNA polymerase halts after transcribing about 50 nucleotides of RNA, and further elongation requires a new transcription activator to bind to the gene’s control region. The release of a paused RNA polymerase can occur in several ways. In some cases, the new activator brings in a chromatin remodeling complex that removes a nucleosome block to the elongating RNA polymerase. In other cases, the activator communicates with RNA polymerase (typically through a coactivator), signaling it to forge ahead. In still other cases, the key step in gene activation is the delayed loading of elongation factors onto RNA polymerase, directed by DNA-bound transcription activators. Once loaded, these factors allow the polymerase to move through blocks imposed by chromatin structure to begin transcribing the gene effectively. Having RNA polymerase already poised on a promoter in the beginning stages of transcription bypasses the step of assembling many components at the promoter, which is often slow. This mechanism is therefore thought to allow cells to begin transcribing a gene in rapid response to an extracellular signal.

Transcription Activators Work Synergistically

In general, where several factors work together to enhance a reaction rate, the joint effect is not merely the sum of the enhancements that each factor alone contributes, but the product of them. If, for example, factor A lowers the free-energy barrier for a reaction by a certain amount and thereby speeds up the reaction 100-fold, and factor B, by acting on that reaction, does likewise, then A and B acting in parallel can lower the energy barrier by a double amount and speed up the reaction 10,000-fold.
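The arithmetic behind this multiplicative effect can be checked with a short calculation. The sketch below assumes the standard exponential relation between a free-energy change and a fold change in rate or affinity, fold = exp(∆G / RT), and assumes a temperature of 310 K (about 37 °C); neither the temperature nor the function names come from the text.

```python
import math

R = 8.314   # gas constant, J/(mol*K)
T = 310.0   # assumed temperature in kelvin (about 37 degrees C)


def delta_g_for_fold(fold: float) -> float:
    """Free-energy change (kJ/mol) that corresponds to a given fold change."""
    return R * T * math.log(fold) / 1000.0


def fold_for_delta_g(delta_g_kj: float) -> float:
    """Fold change produced by a given free-energy change (kJ/mol)."""
    return math.exp(delta_g_kj * 1000.0 / (R * T))


# Factors A and B each give a 100-fold speed-up on their own.
dg_a = delta_g_for_fold(100.0)
dg_b = delta_g_for_fold(100.0)

# Acting together, their free-energy contributions add,
# so the fold changes multiply rather than add.
combined_fold = fold_for_delta_g(dg_a + dg_b)

if __name__ == "__main__":
    print(f"each factor alone: {dg_a:.1f} kJ/mol for a 100-fold change")
    print(f"both together:     {combined_fold:.0f}-fold")
    print(f"1000-fold change:  {delta_g_for_fold(1000.0):.1f} kJ/mol")
```

Because free-energy contributions add inside the exponent, the fold changes multiply. Under the same assumptions, a 1000-fold change works out to roughly 18 kJ/mole, in line with the value quoted earlier for activator proteins.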
Even if A and B work simply by attracting the same protein, the affinity of that protein for the reaction site increases multiplicatively. Figure 26. Transcriptional synergy. Thus, transcription activators often exhibit transcriptional synergy, where several DNA-bound activator proteins working together produce a transcription rate that is much higher than the sum of their transcription rates working alone. As a result, the rate of transcription of a gene ultimately depends on the spectrum of regulatory proteins that are bound upstream and downstream of its transcription start site, along with the coactivator proteins they bring to the DNA.

Condensate Formation Likely Increases the Efficiency of Transcription Initiation

We have discussed in broad, conceptual terms the many different types of proteins that must assemble for transcription of a typical gene to begin. For especially complex gene control regions, such as those of key human genes that orchestrate development, several hundred individual subunits are involved and, as they begin to assemble on DNA, they become involved in networks that create phase transitions, forming small biomolecular condensates. As described in Chapter 3, such condensates hold their proteins in loose proximity, such that, when one dissociates from the assembly, it can be retained nearby by a network of fluctuating weak interactions. Consistent with this idea, many transcription regulators, coactivators, and co-repressors contain the type of low-complexity, unstructured regions that help to drive condensate formation. How might this aid transcription? At least some of these transcription condensates contain additional copies of key proteins, including the Mediator complex. Figure 27. Condensate formation at the transcription control region of the Nanog gene in a mouse embryonic stem cell.
The presence of these extra copies in the same condensate is proposed to make transcription initiation an efficient but highly dynamic process, with proteins within the condensate rapidly exchanging on and off DNA.

Eukaryotic Transcription Repressors Can Inhibit Transcription in Several Ways

Although the “default” state of eukaryotic DNA packaged into nucleosomes is resistant to transcription, eukaryotes nonetheless use transcription regulators to repress the transcription of individual genes. These transcription repressors can rapidly turn off a gene that is being actively transcribed, and they can depress the rate of transcription even below that of the very low default value. Like the transcription activators discussed earlier, transcription repressors often work on a gene-by-gene basis. But unlike the bacterial repressors discussed earlier in this chapter, eukaryotic repressors do not directly compete with the RNA polymerase for access to the DNA. Figure 28. Six of the ways in which eukaryotic repressor proteins can operate. The different mechanisms of repression have different consequences for the ease with which a repressed gene can be reactivated. For most of the strategies, the repressed state is relatively easy to reverse rapidly, for example, by simply inactivating the repressor. But the last mechanism, a directed methylation of specific histone amino acids that creates an unusually highly condensed form of chromatin known as heterochromatin, is self-reinforcing and can propagate even when the initiating signal is no longer present. As discussed in Chapter 4, chromatin that is marked by H3K9me3 (trimethylation of the lysine at position 9 of histone H3) appears to be the most difficult to transcribe. Typically located around centromeres and repeated DNA sequences such as inactive transposons, this type of heterochromatin strongly suppresses both genetic recombination and transcription.
A different histone H3 modification (H3K27me3) is associated with a second form of heterochromatin that is also resistant to transcription. Although apparently easier to activate than the H3K9me3 form, this form of chromatin is also self-propagating and can persist across cell divisions, after the initiating signal has disappeared. These two types of heterochromatin are used to tightly repress genes active in early development, presumably to make sure that these genes are not expressed in the mature organism. Tight, heritable gene repression is especially important to animals and plants whose growth depends on elaborate and complex developmental programs. Misexpression of a single gene at a critical time can have disastrous consequences for the individual. For this reason, many of the genes encoding the most important developmental regulatory proteins are kept tightly repressed, often by multiple mechanisms.

Insulator DNA Sequences Prevent Eukaryotic Transcription Regulators from Influencing Distant Genes

We have seen that all genes have control regions, which dictate at which times, under what conditions, and in what tissues the gene will be expressed. We have also seen that eukaryotic transcription regulators can act across very long stretches of DNA, with the intervening DNA looped out. How, then, are control regions of different genes kept from interfering with one another? For example, what keeps a transcription regulator bound on the control region of one gene from looping in the wrong direction and inappropriately influencing the transcription of an adjacent gene? And, if complex regulatory regions form biomolecular condensates, what keeps all of the control regions from forming a giant condensate where the regulatory information would become scrambled? To avoid such cross-talk between control regions, several types of DNA elements compartmentalize the genome into discrete regulatory domains.
In Chapter 4, we discussed barrier sequences that prevent the spread of heterochromatin into genes that need to be expressed. A second type of DNA element, called an insulator, prevents cis-regulatory sequences from running amok and activating inappropriate genes. Figure 29. Schematic diagram summarizing the properties of insulators and barrier sequences. As we saw in Chapter 4, insulator sequences function by forming loops of chromatin, an effect mediated by specialized proteins that recognize them. The loops are thought to keep a gene and its control region in rough proximity and help to prevent the control region from “spilling over” to adjacent genes. More generally, the distribution of insulators and barrier sequences in a genome helps to divide it into independent domains of gene regulation and chromatin structure. The distribution of the more than 10,000 loops on the collection of mammalian chromosomes can change as cells differentiate or as they respond to changes in their environment. In addition, these loops formed by insulators are not static; rather, they undergo a continual process of loop extrusion and release that is driven by cohesin protein rings. It has been proposed that the extrusion process itself helps to juxtapose enhancers with their matching promoters by sliding them past one another, while helping to break up inappropriate enhancer–promoter connections by physically separating them. Although chromosomes are dynamically organized into domains that discourage control regions from acting indiscriminately, there are special circumstances where a control region located on one chromosome has been found to deliberately activate a gene located on a different chromosome. Although there is much we do not understand about this mechanism, it reflects the extreme versatility of transcription regulation strategies.
Although all cells must be able to switch genes on and off in response to changes in their environments, the cells of multicellular organisms have evolved this capacity to an extreme degree. In particular, once a cell in a multicellular organism becomes committed to differentiate into a specific cell type, the cell maintains this choice through many subsequent cell generations, which means that it remembers the changes in gene expression involved in the choice. This phenomenon of cell memory is a prerequisite for the creation of organized tissues and for the maintenance of stably differentiated cell types. In contrast, other changes in gene expression in eukaryotes, as well as most such changes in bacteria, are only transient. The tryptophan repressor, for example, switches off the tryptophan genes in bacteria only in the presence of tryptophan; as soon as tryptophan is removed from the medium, the genes are switched back on, and the descendants of the cell will have no memory that their ancestors had been exposed to tryptophan.

Complex Genetic Switches That Regulate Drosophila Development Are Built Up from Smaller Modules

The expression of the Drosophila Even-skipped (Eve) gene plays an important part in the development of the Drosophila embryo. If this gene is inactivated by mutation, many parts of the embryo fail to form, and the embryo dies early in development. At the stage of development when Eve begins to be expressed, the embryo is a single giant cell containing multiple nuclei in a common cytoplasm.
This cytoplasm contains a mixture of transcription regulators that are distributed unevenly along the length of the embryo, thus providing positional information that distinguishes one part of the embryo from another. Figure 30. The nonuniform distribution of transcription regulators in an early Drosophila embryo. Although the nuclei are initially identical, they rapidly begin to express different genes because they are exposed to different transcription regulators: the nuclei near the anterior end of the developing embryo are exposed to a set of transcription regulators that is different from the set present at the middle and that present at the posterior end of the embryo. The regulatory DNA sequences that control the Eve gene have evolved to “read” the concentrations of transcription regulators at each position along the length of the embryo, so as to cause the Eve gene to be expressed in seven precisely positioned stripes, each initially five to six nuclei wide. Figure 31. Experiment demonstrating the modular construction of the Eve gene regulatory region. The control region of the Eve gene is very large (approximately 20,000 nucleotide pairs). It is formed from a series of relatively simple regulatory modules, each of which contains multiple cis-regulatory sequences and is responsible for specifying a particular stripe of Eve expression along the embryo. This modular organization of the Eve gene control region was revealed by experiments in which a particular regulatory module (say, that specifying stripe 2) is removed from its normal setting upstream of the Eve gene, placed in front of a reporter gene, and reintroduced into the Drosophila genome.
When developing embryos derived from flies carrying this genetic construct are examined, the reporter gene is found to be expressed in precisely the position of stripe 2 but not in the other normal stripe positions. Similar experiments reveal the existence of other regulatory modules, which specify other stripes.

The Drosophila Eve Gene Is Regulated by Combinatorial Controls

Figure 31. The Eve stripe 2 unit. A detailed study of the stripe 2 regulatory module has provided insights into how it reads and interprets positional information. The module contains recognition sequences for two transcription regulators that activate Eve transcription (Bicoid and Hunchback) and for two that repress it (Krüppel and Giant). The relative concentrations of these four proteins determine whether the protein complexes that form at the stripe 2 module activate transcription of the Eve gene. Figure 32 shows the distributions of the four transcription regulators across the region of a Drosophila embryo where stripe 2 forms. It is thought that either of the two repressor proteins, when bound to the DNA, will turn off the stripe 2 module, whereas both Bicoid and Hunchback must bind for this module’s maximal activation. This simple regulatory scheme suffices to turn on the stripe 2 module (and therefore the expression of the Eve gene) only in those nuclei located where the levels of both Bicoid and Hunchback are high and both Krüppel and Giant are absent, a combination that occurs in only one region of the early embryo. Figure 32. Distribution of the transcription regulators responsible for ensuring that Eve is expressed in stripe 2. The stripe 2 element is autonomous, inasmuch as it specifies stripe 2 when isolated from its normal context. The other stripe regulatory modules are thought to be constructed similarly, reading positional information provided by other combinations of transcription regulators.
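The stripe 2 rule described above (both activators high, neither repressor present) can be expressed as a small decision function. The sketch below is purely illustrative: the concentration values, the two thresholds, and all names are invented for the example and are not measurements from the embryo.

```python
from dataclasses import dataclass

# Hypothetical thresholds on an arbitrary 0-to-1 concentration scale.
ACTIVATOR_MIN = 0.5   # level above which an activator counts as "high"
REPRESSOR_MAX = 0.1   # level above which a repressor counts as "present"


@dataclass
class Nucleus:
    label: str
    bicoid: float
    hunchback: float
    kruppel: float
    giant: float


def stripe2_module_on(n: Nucleus) -> bool:
    """Module is on only if both activators are high and neither repressor is present."""
    activators_high = n.bicoid >= ACTIVATOR_MIN and n.hunchback >= ACTIVATOR_MIN
    repressor_present = n.kruppel > REPRESSOR_MAX or n.giant > REPRESSOR_MAX
    return activators_high and not repressor_present


# Three made-up nuclei that differ in which regulators they see.
nuclei = [
    Nucleus("activators high, Giant present", 0.9, 0.9, 0.0, 0.8),
    Nucleus("activators high, no repressors", 0.7, 0.8, 0.0, 0.0),
    Nucleus("Kruppel present, Bicoid low", 0.4, 0.6, 0.7, 0.0),
]

if __name__ == "__main__":
    for n in nuclei:
        print(f"{n.label}: Eve stripe 2 module {'ON' if stripe2_module_on(n) else 'off'}")
```

Only the middle nucleus satisfies the rule. A hard threshold is a simplification; the text notes only that the relative concentrations of the four proteins determine the outcome.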
The entire Eve gene control region binds more than 20 different transcription regulators. Seven combinations of regulators, one for each stripe, specify Eve expression, while many other combinations (all those found in the interstripe regions of the embryo) keep all the stripe elements silent. A large and complex control region is thereby built from a series of smaller modules, each of which consists of a unique arrangement of short cis-regulatory sequences recognized by specific transcription regulators. The Eve gene itself encodes a transcription regulator, which, after its pattern of expression is set up in seven stripes, controls the expression of other Drosophila genes. As development proceeds, the embryo is thus subdivided into finer and finer regions that eventually give rise to the different body parts of the adult fly. Eve exemplifies the complexity of transcription control regions in plants and animals. As this example shows, control regions can respond to many different inputs, integrate this information, and produce a complex spatial and temporal output as development proceeds. However, exactly how all these mechanisms work together to produce the final output is understood only in broad outline. Figure 33. The integration of multiple inputs at a promoter. Transcription Regulators Are Brought into Play by Extracellular Signals In embryos of most other organisms and in all adults, individual nuclei are in separate cells, and extracellular information (including positional cues) must be passed across the plasma membrane so as to generate signals in the cytosol that cause different transcription regulators to become active in different cell types. Figure 34. Some ways in which the activity of transcription regulators is controlled inside eukaryotic cells. As in the fly example discussed earlier, mammalian enhancers are modular.
An example is the control region responsible for regulating the α-globin gene, which codes for one of the subunits of hemoglobin. Here, five different modules are spread out over about 25,000 nucleotide pairs. Each of the five modules, when experimentally separated from the other four, can act as an independent enhancer to specify production of α-globin; but they do so only in erythroid cells, the precursors to red blood cells, because only erythroid cells express the appropriate transcription regulators. Figure 35. Modular structure of the control region for the mouse α-globin gene. Red blood cells, which contain high concentrations of hemoglobin, are unusual in that they lack DNA and rely on their precursor cells to synthesize this protein. Combinatorial Gene Control Creates Many Different Cell Types We have seen that transcription regulators usually act in combination to control the expression of an individual gene. It is also generally true that each transcription regulator in an organism contributes to the control of many genes. This illustration shows how combinatorial gene control makes it possible to generate a great deal of biological complexity even with relatively few transcription regulators. The importance of a combination of transcription regulators for the specification of cell types is most easily demonstrated by their ability, when expressed artificially in a specific combination, to convert one type of cell to another. Figure 36. The importance of combinatorial gene control for development. For example, the artificial expression of three neuron-specific transcription regulators in liver cells can convert the liver cells into functional nerve cells. In some cases, expression of even a single transcription regulator is sufficient to convert one cell type to another: when the gene encoding the transcription regulator MyoD is artificially introduced into fibroblasts cultured from skin connective tissue, the fibroblasts form muscle-like cells.
Figure 37. A small set of transcription regulators can convert one differentiated cell type into another. As discussed in Chapter 22, fibroblasts, which are derived from the same broad class of embryonic cells as muscle cells, have already accumulated many of the other necessary transcription regulators required for the combinatorial control of the muscle-specific genes, and the addition of MyoD completes the unique combination required to direct the cells to become muscle. Figure 38. Expression of the Drosophila Eyeless gene in precursor cells of the fly leg triggers the development of an eye on the leg. An even more striking example is seen by artificially expressing, early in development, a single Drosophila transcription regulator (Eyeless) in groups of cells that would normally go on to form leg parts. Here, this abnormal gene expression change causes eye-like structures to develop in the legs. Specialized Cell Types Can Be Experimentally Reprogrammed to Become Pluripotent Stem Cells Artificial manipulation of transcription regulators can also coax various differentiated cells to de-differentiate into pluripotent stem cells that are capable of giving rise to the different cell types in the body. Thus, when three specific transcription regulators are artificially expressed in cultured mouse fibroblasts, a number of cells become induced pluripotent stem cells (iPS cells)—cells that look and behave like the pluripotent embryonic stem (ES) cells that are derived from embryos. This approach has been adapted to produce iPS cells from a variety of specialized cell types, including cells taken from humans. Such human iPS cells can then be directed to generate a population of differentiated cells for use in the study or treatment of disease. Figure 39. A combination of transcription regulators can induce a differentiated cell to de-differentiate into a pluripotent cell. 
Combinations of Master Transcription Regulators Specify Cell Types by Controlling the Expression of Many Genes As we saw in the introduction to this chapter, different cell types of multicellular organisms differ enormously in the proteins and RNAs they express. For example, only muscle cells express special types of actin and myosin that form the contractile apparatus, while nerve cells must make and assemble all the proteins needed to form dendrites and synapses. We have seen that these patterns of cell-type-specific expression are orchestrated by a combination of so-called master transcription regulators. In many cases, these proteins bind directly to cis-regulatory sequences of the genes particular to that cell type. The specification of a particular cell type typically involves changes in the expression of several thousand genes. Genes whose protein products are required in the cell type are expressed at high levels, while those not needed are typically down-regulated. Figure 40. A portion of the transcription network specifying embryonic stem cells. Specialized Cells Must Rapidly Turn Some Genes On and Off Although they generally maintain their identities, specialized cells must constantly respond to changes in their environment. Among the most important changes are signals from other cells that coordinate the behavior of the whole organism. Here, we consider how specialized cell types rapidly and decisively switch groups of genes on and off in response to their environment.
Even though control of gene expression is combinatorial, the effect of a single transcription regulator can still be decisive in switching any particular gene on or off, simply by completing the combination needed to maximally activate or repress that gene. An example is the rapid control of gene expression by the human glucocorticoid receptor protein. To bind to its cis-regulatory sequences in the genome, this transcription regulator must first form a complex with a molecule of a glucocorticoid steroid hormone. The body releases this hormone during times of starvation and intense physical activity, and among its other activities, it stimulates liver cells to increase the production of glucose from amino acids and other small molecules. Although the genes involved in this response all have different and complex control regions, their maximal expression depends on the binding of the hormone–glucocorticoid receptor complex to its cis-regulatory sequence, which is present in the control region of each gene. When the body has recovered and the hormone is no longer present, the expression of each of these genes drops to its normal level in the liver. In this way, a single transcription regulator can rapidly control the expression of many different genes. The effects of the glucocorticoid receptor are not confined to cells of the liver. In other cell types, activation of this transcription regulator by hormone also causes changes in the expression levels of many genes; the genes affected, however, are usually different from those affected in liver cells. Figure 41. A single transcription regulator can coordinate the expression of many different genes. Differentiated Cells Maintain Their Identity Once a cell has become differentiated into a particular cell type, it will generally remain differentiated, and all its progeny cells will remain that same cell type.
Some highly specialized cells, including skeletal muscle cells and neurons, never divide again once they have differentiated; that is, they are terminally differentiated. But many other differentiated cells (such as fibroblasts, smooth muscle cells, and liver cells) will divide many times in the life of an individual. When they do, these specialized cell types give rise only to cells like themselves: smooth muscle cells do not give rise to liver cells, nor liver cells to fibroblasts. Cells have several ways of ensuring that their daughters “remember” what kind of cells they are. One of the simplest and most important is through a positive feedback loop, where a master cell-type transcription regulator activates transcription of its own gene, in addition to that of the other cell-type-specific genes needed to maintain the cell type. Each time a cell divides, the regulator is distributed to both daughter cells, where it continues to stimulate the positive feedback loop, making more of itself and of the cell-type proteins it controls after each division. Positive feedback is crucial for establishing “self-sustaining” circuits of gene expression that allow a cell to commit to a particular fate and then to transmit that information to its progeny. Positive feedback loops formed by transcription regulators are probably the most prevalent way of ensuring that daughter cells remember what kind of cells they are meant to be, and they are found in all species on Earth. Figure 42. A positive feedback loop can create cell memory. Transcription Circuits Allow the Cell to Carry Out Logic Operations An analysis of gene regulatory circuits reveals that certain simple types of arrangements (called network motifs) are found over and over again in cells from widely different species. Figure 43. Common types of network motifs in transcription circuits.
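The memory created by a positive feedback loop can be illustrated with a toy simulation. The sketch below is one possible model, not one given in the text: it assumes cooperative self-activation of a regulator (a Hill-type term with exponent 2), first-order loss of the regulator (standing in for both degradation and dilution at cell division), and a transient external signal. The function name and every parameter value are hypothetical.

```python
def simulate(signal_on, total_time=30.0, dt=0.01,
             beta=1.0, k=0.4, decay=1.0, signal_strength=0.5):
    """Euler integration of a self-activating regulator with concentration x.

    dx/dt = external + beta * x^2 / (k^2 + x^2) - decay * x

    The external term is nonzero only for 5 <= t < 10, and only when
    signal_on is True. All parameter values are illustrative assumptions.
    """
    steps = int(round(total_time / dt))
    x = 0.0  # the regulator starts switched off
    for i in range(steps):
        t = i * dt
        external = signal_strength if (signal_on and 5.0 <= t < 10.0) else 0.0
        self_activation = beta * x**2 / (k**2 + x**2)
        x += dt * (external + self_activation - decay * x)
    return x


final_with_pulse = simulate(signal_on=True)
final_without_pulse = simulate(signal_on=False)
print(f"regulator level long after a transient signal: {final_with_pulse:.3f}")
print(f"regulator level with no signal at all:         {final_without_pulse:.3f}")
```

With these parameters the model has two stable states, off (x = 0) and on (x = 0.8). The transient signal pushes the regulator past the threshold between them, and the loop then holds the regulator at the high level long after the signal has gone, which is the sense in which the circuit "remembers" the signal.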
The different types of behavior produced by a feedback loop will depend on the details of the system; for example, how tightly the transcription regulator binds to its cis-regulatory sequence, its rate of synthesis, and its rate of decay. With two or more transcription regulators, the possible range of circuit behaviors becomes more complex. Another common circuit arrangement is called a feed-forward loop; such a loop can serve as a filter, responding to input signals that are prolonged but disregarding those that are brief. The simple types of devices just illustrated are often found joined together, creating exceedingly complex circuits. Each cell in a developing multicellular organism is equipped with similarly complex control machinery, and it must, in effect, use its intricate system of interlocking transcription switches to “compute” how it should behave at each time point in response to the many different past and present inputs received. Figure 44. How a feed-forward loop can measure the duration of a signal. Patterns of DNA Methylation Can Be Inherited When Vertebrate Cells Divide In vertebrate cells, the methylation of cytosine provides one mechanism through which gene expression patterns can be passed on to progeny cells. The methylated form of cytosine, 5-methylcytosine (5-methyl C), has the same relation to cytosine that thymine has to uracil, and the modification likewise has no effect on base-pairing. Figure 45. Formation of 5-methylcytosine occurs by methylation of a cytosine base in the DNA double helix. DNA methylation in vertebrate DNA occurs on cytosine (C) nucleotides largely in the sequence CG, which is base-paired to exactly the same sequence (in opposite orientation) on the other strand of the DNA helix.
As a result, the pattern of DNA methylation on the parent DNA strand serves as a template for the methylation of the daughter DNA strand, causing this pattern to be inherited directly after DNA replication. Methylation patterns are dynamic during mammalian development. Shortly after fertilization, there is a genome-wide wave of demethylation, when the vast majority of methyl groups are lost from the DNA. Figure 46. How DNA methylation patterns are faithfully inherited.
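The templated inheritance of CG methylation described in this section can be sketched as a small bookkeeping exercise. The code below is an illustrative abstraction only: the example sequence, the data layout (one pair of flags per CG site), and the function names are hypothetical, and the copying step is idealized as error-free.

```python
from typing import Dict, List, Tuple

# For each CG site: (C on the top strand methylated, C on the bottom strand methylated)
SiteState = Tuple[bool, bool]
Duplex = Dict[int, SiteState]


def find_cg_sites(top_strand: str) -> List[int]:
    """Return the index of the C in every CG dinucleotide on the top strand."""
    return [i for i in range(len(top_strand) - 1) if top_strand[i:i + 2] == "CG"]


def replicate(parent: Duplex) -> Tuple[Duplex, Duplex]:
    """Each daughter keeps one parental strand; the new strand starts unmethylated."""
    keeps_top = {pos: (top, False) for pos, (top, _bottom) in parent.items()}
    keeps_bottom = {pos: (False, bottom) for pos, (_top, bottom) in parent.items()}
    return keeps_top, keeps_bottom


def maintain(duplex: Duplex) -> Duplex:
    """Methylate the new strand wherever the old strand is methylated.

    Sites with a mark on one strand become marked on both; sites with no
    mark on either strand stay unmethylated.
    """
    return {pos: (top or bottom, top or bottom)
            for pos, (top, bottom) in duplex.items()}


sequence = "ACGTTCGACGA"                 # hypothetical top strand
sites = find_cg_sites(sequence)          # CG sites at indices 1, 5 and 8
methylated = {1, 8}                      # assumed parental pattern
parent = {pos: (pos in methylated, pos in methylated) for pos in sites}

daughter_a, daughter_b = (maintain(d) for d in replicate(parent))
print(daughter_a == parent, daughter_b == parent)
```

Because each CG site pairs with a CG on the opposite strand, the half-methylated sites produced by replication carry enough information to restore the parental pattern in both daughters, while unmethylated sites stay unmethylated.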