Chapter 18 Regulation of Gene Expression

Bacterial cells that can conserve resources and energy have a selective advantage over cells that are unable to do so. Thus, natural selection has favored bacteria that express only the genes whose products are needed by the cell. Consider, for instance, an individual Escherichia coli cell living in a human colon, dependent for its nutrients on the whimsical eating habits of its host. If the environment is lacking in the amino acid tryptophan, which the bacterium needs to survive, the cell responds by activating a metabolic pathway that makes tryptophan from another compound. If the human host later eats a tryptophan-rich meal, the bacterial cell stops producing tryptophan, thus avoiding wasting resources to produce a substance that is readily available from the surrounding solution. A cell can control such a metabolic pathway on two levels. First, cells can adjust the activity of enzymes already present. This is a fairly rapid physiological response, which relies on the sensitivity of many enzymes to chemical cues that increase or decrease their catalytic activity. In the tryptophan pathway, the activity of the first enzyme is inhibited by the pathway’s end product, tryptophan. Thus, if tryptophan accumulates in a cell, it shuts down the synthesis of more tryptophan by inhibiting enzyme activity. Such feedback inhibition, typical of anabolic pathways, allows a cell to adapt to short-term fluctuations in the supply of a substance it needs. Second, cells can adjust the production level of certain enzymes via a genetic mechanism; that is, they can regulate the expression of the genes encoding the enzymes. If, in our example, the environment provides all the tryptophan the cell needs, the cell stops making the enzymes that catalyze the synthesis of tryptophan. In this case, the control of enzyme production occurs at the level of transcription, the synthesis of messenger RNA from the genes that code for these enzymes.
Regulation of the tryptophan synthesis pathway is just one example of how bacteria tune their metabolism to changing environments. Many genes of the bacterial genome are switched on or off by changes in the metabolic status of the cell; some genes are regulated singly and others as groups of related genes. One basic mechanism for this type of regulation of groups of genes in bacteria, described as the operon model, was discovered in 1961 by François Jacob and Jacques Monod at the Pasteur Institute in Paris. Let’s see what an operon is and how it works, using the control of tryptophan synthesis as our first example. E. coli synthesizes tryptophan from a precursor molecule in a multistep pathway. Each reaction in the pathway is catalyzed by a specific enzyme, and the five genes that code for the subunits of these enzymes are clustered together on the bacterial chromosome. A single promoter serves all five genes, which together constitute a transcription unit. Thus, transcription gives rise to one long mRNA molecule that codes for the five polypeptides making up the enzymes in the tryptophan pathway. The cell can translate this one mRNA into five separate polypeptides because the mRNA is punctuated with start and stop codons that signal where the coding sequence for each polypeptide begins and ends. A key advantage of grouping genes of related function into one transcription unit is that a single “on-off switch” can control the whole cluster of functionally related genes; in other words, these genes are coordinately controlled. When an E. coli cell must make tryptophan for itself because its surrounding environment lacks this amino acid, all the enzymes for the metabolic pathway are synthesized at the same time. The on-off switch is a segment of DNA called an operator. Both its location and name suit its function: Positioned within the promoter or, in some cases, between the promoter and the enzyme-coding genes, the operator controls the access of RNA polymerase to the genes.
Together, the operator, the promoter, and the genes they control—the entire stretch of DNA required for enzyme production for the tryptophan pathway—constitute an operon. The trp operon is one of many operons in the E. coli genome. If the operator is the operon’s switch for controlling transcription, how does this switch work? By itself, the trp operon is turned on; that is, RNA polymerase can bind to the promoter and transcribe the genes of the operon. The trp operon can be switched off by a protein that is called the trp repressor. A repressor binds to the operator, preventing RNA polymerase from transcribing the genes, often by preventing RNA polymerase from binding. A repressor protein is specific for the operator of a particular operon. For example, the trp repressor, which switches off the trp operon by binding to the trp operator, has no effect on other operons in the E. coli genome. A repressor protein is encoded by a regulatory gene—in this case, a gene called trpR; trpR is located some distance from the trp operon and has its own promoter. Regulatory genes are among the bacterial genes that are expressed continuously, although at a low rate, and a few trp repressor molecules are always present in E. coli cells. Why, then, is the trp operon not switched off permanently? First, the binding of repressors to operators is reversible. An operator alternates between two states: one with the repressor bound and one without the repressor bound. The relative duration of the repressor-bound state is higher when more active repressor molecules are present. Second, the trp repressor, like most regulatory proteins, is an allosteric protein, with two alternative shapes: active and inactive. The trp repressor is synthesized in the inactive form, which has little affinity for the trp operator. 
Only when a tryptophan molecule binds to the trp repressor at an allosteric site does the repressor protein change to the active form that can attach to the operator, turning the operon off. Tryptophan functions in this system as a corepressor, a small molecule that cooperates with a repressor protein to switch an operon off. As tryptophan accumulates, more tryptophan molecules associate with trp repressor molecules, which can then bind to the trp operator and shut down production of the tryptophan pathway enzymes. If the cell’s tryptophan level drops, many fewer trp repressor proteins would have tryptophan bound, rendering them inactive; they would dissociate from the operator, allowing transcription of the operon’s genes to resume. The trp operon is one example of how gene expression can respond to changes in the cell’s internal and external environment. The trp operon is said to be a repressible operon because its transcription is usually on but can be inhibited when a specific small molecule binds allosterically to a regulatory protein. In contrast, an inducible operon is usually off but can be stimulated to be on when a specific small molecule interacts with a different regulatory protein. The classic example of an inducible operon is the lac operon. The disaccharide lactose is available to E. coli when the bacterium is in contact with any dairy product. Lactose metabolism by E. coli begins with hydrolysis of the disaccharide into its component monosaccharides, a reaction catalyzed by the enzyme β-galactosidase. Only a few molecules of this enzyme are present in an E. coli cell growing in the absence of lactose. If lactose is added to the bacterium’s environment, however, the number of β-galactosidase molecules in the cell increases 1,000-fold within about 15 minutes. How can a cell ramp up enzyme production this quickly?
The gene for β-galactosidase is part of the lac operon, which includes two other genes coding for enzymes that function in the use of lactose. The entire transcription unit is under the command of one main operator and promoter. The regulatory gene, lacI, located outside the lac operon, codes for an allosteric repressor protein that can switch off the lac operon by binding to the lac operator. So far, this sounds just like regulation of the trp operon, but there is one important difference. Recall that the trp repressor protein is inactive by itself and requires tryptophan as a corepressor in order to bind to the operator. The lac repressor, in contrast, is active by itself, binding to the operator and switching the lac operon off. In this case, a specific small molecule, called an inducer, inactivates the repressor. For the lac operon, the inducer is allolactose, an isomer of lactose formed in small amounts from lactose that enters the cell. In the absence of lactose, the lac repressor is in its active shape and binds to the operator; thus, the genes of the lac operon are silenced. If lactose is added to the cell’s surroundings, allolactose binds to the lac repressor and alters its shape so the repressor can no longer bind to the operator. Without the lac repressor bound, the lac operon is transcribed into mRNA, and the enzymes for using lactose are made. In the context of gene regulation, the enzymes of the lactose pathway are referred to as inducible enzymes because their synthesis is induced by a chemical signal. Analogously, the enzymes for tryptophan synthesis are said to be repressible. Repressible enzymes generally function in anabolic pathways, which synthesize essential end products from raw materials. By suspending production of an end product when it is already present in sufficient quantity, the cell can allocate its organic precursors and energy for other uses.
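The contrast between the repressible trp operon and the inducible lac operon can be summarized as two small logic functions. The following Python sketch is only an illustration: the function names are made up for this example, and the model is deliberately all-or-nothing, ignoring the reversible binding of repressors and any other regulators.

```python
def trp_operon_transcribed(tryptophan_present: bool) -> bool:
    """Repressible operon: the repressor is inactive alone and needs a corepressor."""
    # Tryptophan (the corepressor) converts the trp repressor to its active shape.
    repressor_active = tryptophan_present
    # An active repressor binds the operator and blocks transcription.
    return not repressor_active


def lac_operon_transcribed(allolactose_present: bool) -> bool:
    """Inducible operon: the repressor is active alone and is inactivated by an inducer."""
    # Allolactose (the inducer) converts the lac repressor to its inactive shape.
    repressor_active = not allolactose_present
    return not repressor_active


if __name__ == "__main__":
    for signal in (False, True):
        print(f"small molecule present={signal}: "
              f"trp operon on={trp_operon_transcribed(signal)}, "
              f"lac operon on={lac_operon_transcribed(signal)}")
```

In both functions the operon is on exactly when the repressor is inactive; the two operons differ only in whether the small molecule activates or inactivates the repressor.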
In contrast, inducible enzymes usually function in catabolic pathways, which break down a nutrient to simpler molecules. By producing the appropriate enzymes only when the nutrient is available, the cell avoids wasting energy and precursors making proteins that are not needed. Regulation of both the trp and lac operons involves the negative control of genes because the operons are switched off by the active form of their respective repressor protein. It may be easier to see this for the trp operon, but it is also true for the lac operon. In the case of the lac operon, allolactose induces enzyme synthesis not by directly activating the lac operon, but by freeing it from the negative effect of the repressor. Gene regulation is said to be positive only when a regulatory protein interacts directly with the genome to increase transcription. When glucose and lactose are both present in its environment, E. coli preferentially uses glucose. The enzymes for glucose breakdown in glycolysis are continually present. Only when lactose is present and glucose is in short supply does E. coli use lactose as an energy source, and only then does it synthesize appreciable quantities of the enzymes for lactose breakdown. How does the E. coli cell sense the glucose concentration and relay this information to the lac operon? Again, the mechanism depends on the interaction of an allosteric regulatory protein with a small organic molecule, in this case cyclic AMP (cAMP), which accumulates when glucose is scarce. The regulatory protein, called cAMP receptor protein (CRP), is an activator, a protein that binds to DNA and stimulates transcription of a gene. When cAMP binds to this regulatory protein, CRP assumes its active shape and can attach to a specific site at the upstream end of the lac promoter. This attachment increases the affinity of RNA polymerase for the lac promoter, which is actually rather low even when no lac repressor is bound to the operator.
By facilitating the binding of RNA polymerase to the promoter and thereby increasing the rate of transcription of the lac operon, the attachment of CRP to the promoter directly stimulates gene expression. Therefore, this mechanism qualifies as positive regulation. If the amount of glucose in the cell increases, the cAMP concentration falls, and without cAMP, CRP detaches from the lac operon. Because CRP is inactive, RNA polymerase binds less efficiently to the promoter, and transcription of the lac operon proceeds only at a low level, even when lactose is present. Thus, the lac operon is under dual control: negative control by the lac repressor and positive control by CRP. Whether or not transcription occurs is controlled by allolactose: Without allolactose, the lac repressor is active and the operon is off; with allolactose, the lac repressor is inactive and the operon is on. The rate of transcription is controlled by whether CRP has cAMP bound to it: With bound cAMP, the rate is high; without it, the rate is low. It is as though the operon has both an on-off switch and a volume control. In addition to regulating the lac operon, CRP helps regulate other operons that encode enzymes used in catabolic pathways. All told, it may affect the expression of more than 100 genes in E. coli. When glucose is plentiful and CRP is inactive, the synthesis of enzymes that catabolize compounds other than glucose generally slows down. The ability to catabolize other compounds, such as lactose, enables a cell deprived of glucose to survive. The compounds present in any given cell at a certain moment determine which operons are switched on, the result of simple interactions of activator and repressor proteins with the promoters of the genes in question. All organisms, whether prokaryotes or eukaryotes, must regulate which genes are expressed at any given time.
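The dual control of the lac operon described above, an on-off switch set by allolactose and a volume control set by cAMP-bound CRP, can be written as a short decision function. This Python sketch is a simplification for illustration: the function name and the three output labels ("off", "low", "high") are assumptions of this example, and real transcription rates vary continuously.

```python
def lac_transcription_level(lactose_present: bool, glucose_present: bool) -> str:
    """Return a qualitative transcription level for the lac operon."""
    # On-off switch: without lactose there is no allolactose,
    # so the lac repressor stays active and bound to the operator.
    repressor_bound = not lactose_present
    if repressor_bound:
        return "off"

    # Volume control: cAMP accumulates only when glucose is scarce,
    # and CRP is active only with cAMP bound.
    camp_high = not glucose_present
    crp_active = camp_high
    return "high" if crp_active else "low"


if __name__ == "__main__":
    for lactose in (False, True):
        for glucose in (False, True):
            level = lac_transcription_level(lactose, glucose)
            print(f"lactose={lactose}, glucose={glucose}: {level}")
```

Only the combination of lactose present and glucose scarce yields a high level, which matches the observation that E. coli makes appreciable quantities of the lactose enzymes only under that condition.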
Both unicellular organisms and the cells of multicellular organisms continually turn genes on and off in response to signals from their external and internal environments. Regulation of gene expression is also essential for cell specialization in multicellular organisms, which are made up of different types of cells. To perform its own distinct role, each cell type must maintain a specific program of gene expression in which certain genes are expressed and others are not. A typical human cell might express about a third to a half of its protein-coding genes at any given time. Highly differentiated cells, such as muscle or nerve cells, express a smaller fraction of their genes. Almost all the cells in a multicellular organism contain an identical genome. A subset of genes is expressed in each cell type; some of these—about 35%—are “housekeeping” genes, expressed by many cell types, while others are unique to that cell type. The uniquely expressed genes allow these cells to carry out their specific function. The differences between cell types, therefore, are due not to different genes being present, but to differential gene expression, the expression of different genes by cells with the same genome. The function of any cell depends on its expressing the appropriate set of genes. The transcription factors of a cell must locate the right genes at the right time, a task like finding a needle in a haystack. Abnormal gene expression can cause serious imbalances and diseases, including cancer. When the structure of DNA was determined in 1953, an understanding of the mechanisms that control gene expression in eukaryotes seemed almost hopelessly out of reach. Since then, advances in DNA technology have enabled molecular biologists to uncover many details of eukaryotic gene regulation. 
In all organisms, gene expression is commonly controlled at transcription; regulation at this stage often occurs in response to signals coming from outside the cell, such as hormones or other signaling molecules. For this reason, the term gene expression is often equated with transcription for both bacteria and eukaryotes. While this may be the case for bacteria, the greater complexity of eukaryotic cell structure and function provides opportunities for regulating gene expression at many stages besides transcription. Recall that the DNA of eukaryotic cells is packaged with proteins in an elaborate complex known as chromatin, the basic unit of which is the nucleosome. The structural organization of chromatin not only packs a cell’s DNA into a compact form that fits inside the nucleus, but also helps regulate gene expression in several ways. Genes within heterochromatin, which is more densely arranged than euchromatin, are usually not expressed. In euchromatin, whether or not a gene is transcribed is affected by the location of nucleosomes along a gene’s promoter and also the sites where the DNA attaches to the protein scaffolding of the chromosome. Chromatin structure and gene expression can be influenced by chemical modifications of both the histone proteins of the nucleosomes around which DNA is wrapped and the nucleotides that make up that DNA. Here we examine the effects of these modifications, which are catalyzed by specific enzymes. Chemical modifications to histones, found in all eukaryotic organisms, play a direct role in the regulation of gene transcription. The N-terminus of each histone protein in a nucleosome protrudes outward from the nucleosome. These so-called histone tails are accessible to various modifying enzymes that catalyze the addition or removal of specific chemical groups, such as acetyl, methyl, and phosphate groups. 
Generally, histone acetylation (the addition of an acetyl group to an amino acid in a histone tail) appears to promote transcription by opening up chromatin structure, while the addition of methyl groups to histones can lead to the condensation of chromatin and reduced transcription. Often, the addition of a particular chemical group may create a new binding site for enzymes that further modify chromatin structure. Rather than modifying histone proteins, a different set of enzymes can methylate the DNA itself on certain bases, usually cytosine. Such DNA methylation occurs in most plants, animals, and fungi. Long stretches of inactive DNA, such as that of inactivated mammalian X chromosomes, are generally more methylated than regions of actively transcribed DNA. On a smaller scale, the DNA of individual genes is usually more heavily methylated in cells in which those genes are not expressed. Removal of the extra methyl groups can turn on some of these genes. Once methylated, genes usually stay that way through successive cell divisions in a given individual. At DNA sites where one strand is already methylated, enzymes methylate the correct daughter strand after each round of DNA replication. Methylation patterns are thus passed on to daughter cells, and cells forming specialized tissues keep a chemical record of what occurred during embryonic development. A methylation pattern maintained in this way also accounts for genomic imprinting in mammals, where methylation permanently regulates expression of either the maternal or paternal allele of particular genes at the start of development. DNA methylation and histone modification are believed to be coordinated in their regulation. The chromatin modifications that we just discussed do not change the DNA sequence, yet they still may be passed along to future generations of cells.
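How a methylation pattern survives DNA replication can be illustrated with a toy model. In this Python sketch, which is an assumption-laden simplification and not a description of any particular enzyme, each site in a double helix is a pair of flags (one per strand). Replication gives each daughter one parental strand plus a new, unmethylated strand, and a maintenance step then methylates the new strand wherever the parental strand is methylated. All names are hypothetical.

```python
from typing import List, Tuple

# One site = (strand_a_methylated, strand_b_methylated)
Site = Tuple[bool, bool]


def replicate(duplex: List[Site]) -> Tuple[List[Site], List[Site]]:
    """Semiconservative replication: each daughter keeps one parental strand
    and pairs it with a newly synthesized, unmethylated strand."""
    daughter_1 = [(a, False) for a, _ in duplex]
    daughter_2 = [(False, b) for _, b in duplex]
    return daughter_1, daughter_2


def maintain_methylation(duplex: List[Site]) -> List[Site]:
    """Methylate the new strand at every site where the other strand is methylated."""
    return [(a or b, a or b) for a, b in duplex]


if __name__ == "__main__":
    parent = [(True, True), (False, False), (True, True)]
    for daughter in replicate(parent):
        restored = maintain_methylation(daughter)
        print("after replication:", daughter)
        print("after maintenance:", restored, "same as parent:", restored == parent)
```

In this model, sites that were unmethylated in the parent stay unmethylated, so both daughters end up with the parental pattern.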
Inheritance of traits transmitted by mechanisms not involving the nucleotide sequence itself is called epigenetic inheritance, the study of which is called epigenetics. Whereas mutations in the DNA are permanent changes, modifications to the chromatin can be reversed. For example, DNA methylation patterns are largely erased during gamete formation and reestablished during embryonic development. Furthermore, they are changeable, thus responding more rapidly to environmental conditions. Research on epigenetics has skyrocketed over the past 20 years. The importance of epigenetic information in regulating gene expression is now widely accepted. One key study by Robert Waterland and Randy Jirtle at Duke University, published in 2003, used a mouse mutant whose genome had been altered so that a gene called agouti that determines coat color, normally expressed only briefly during fur formation, was instead expressed throughout development. This overexpression resulted in yellow mice rather than the usual brownish color. Previous work had shown that merely supplementing the diet of pregnant mothers with methyl group–containing compounds could shift the range of coat colors of the offspring back to normal. Waterland and Jirtle reproduced this result, analyzing the state of methylation of the DNA. They showed that the extent of the color shift correlated with the level of DNA methylation. In other words, feeding methyl groups to the mothers at a key time during gestation led to a change in gene expression in the offspring’s phenotype. Further studies showed that the effects were even observed in the next generation—the “grandpups” of the original female mouse. A similar epigenetic effect due to changes in methylation occurs in humans as well. Near the end of World War II, during the winter of 1944–45, Dutch railway workers went on strike to try to prevent the Nazis from bringing in more troops. In retaliation, the Nazis blocked all deliveries of food. 
Over 20,000 Dutch died in the “Dutch Hunger Winter”. Over time, doctors found that the offspring of women in early pregnancy at that time experienced adverse health effects as adults: higher rates of obesity, high triglyceride and cholesterol levels, type 2 diabetes, and schizophrenia. Furthermore, the affected individuals had a 10% higher mortality rate after the age of 68 than their siblings who were in utero when food was readily available. A collaboration between labs in the Netherlands and the United States compared those adults with their siblings and published their findings in 2018. Statistical analysis enabled the researchers to conclude that differences between the siblings in DNA methylation of certain genes caused these long-term adverse medical conditions—examples of epigenetic inheritance. Epigenetic variations might help explain cases where one identical twin acquires a genetically based disease, such as schizophrenia, but the other does not, despite their identical genomes. Alterations in normal patterns of DNA methylation are seen in some cancers, where they are associated with inappropriate gene expression. Evidently, enzymes that modify chromatin structure are integral parts of the eukaryotic cell’s machinery for regulating transcription. Chromatin-modifying enzymes provide initial control of gene expression by making a region of DNA either more or less able to bind the transcription machinery. Once the chromatin of a gene is optimally modified for expression, the initiation of transcription is the next major step at which gene expression is regulated. As in bacteria, the regulation of transcription initiation in eukaryotes involves proteins that bind to DNA and either facilitate or inhibit binding of RNA polymerase. The process is more complicated in eukaryotes, however. Before looking at how eukaryotic cells control their transcription, let’s review the structure of a eukaryotic gene. 
Recall that a cluster of proteins called a transcription initiation complex assembles on the promoter sequence at the “upstream” end of the gene. One of these proteins, RNA polymerase II, then proceeds to transcribe the gene, synthesizing a primary RNA transcript (pre-mRNA). RNA processing includes enzymatic addition of a 5′ cap and a poly-A tail, as well as splicing out of introns, to yield a mature mRNA. Associated with most eukaryotic genes are multiple control elements, segments of noncoding DNA that serve as binding sites for the proteins called transcription factors, which bind to the control elements and regulate transcription. Control elements on the DNA and the transcription factors that bind to them are critical to the precise regulation of gene expression seen in different cell types. There are two types of transcription factors: General transcription factors act at the promoter of all genes, while some genes require specific transcription factors that bind to control elements that may be close to or farther away from the promoter. To initiate transcription, eukaryotic RNA polymerase II requires the assistance of transcription factors. Some transcription factors are essential for the transcription of all protein-coding genes; therefore, they are often called general transcription factors. A few general transcription factors bind to a DNA sequence, such as the TATA box in most promoters, but many bind to proteins, including other transcription factors as well as RNA polymerase II. Protein-protein interactions are crucial to the initiation of eukaryotic transcription. Only when the complete initiation complex has assembled can the polymerase begin to move along the DNA template strand and transcribe. Some genes are expressed all the time, but others are not; instead, they are regulated.
For these regulated genes, the interaction of general transcription factors and RNA polymerase II with a promoter usually leads to only a low rate of initiation and the production of few RNA transcripts. In eukaryotes, high levels of transcription of these particular genes at the appropriate time and place depend on the interaction of control elements with another set of proteins, which can be thought of as specific transcription factors. Some control elements, named proximal control elements, are located close to the promoter. The more distant distal control elements, groupings of which are called enhancers, may be thousands of nucleotides upstream or downstream of a gene or even within an intron. A given gene may have multiple enhancers, each active at a different time, cell type, or location in the organism. Each enhancer, however, is generally associated with only that gene and no other. In eukaryotes, the rate of gene expression can be strongly increased or decreased by the binding of specific transcription factors, either activators or repressors, to the control elements of enhancers. Hundreds of transcription activators have been discovered in eukaryotes. Researchers have identified two types of structural domains that are commonly found in a large number of transcription activators: a DNA-binding domain (a part of the protein’s three-dimensional structure that binds to DNA) and one or more activation domains. Activation domains bind other regulatory proteins or components of the transcription machinery, facilitating a series of protein-protein interactions that result in enhanced transcription of a given gene. Protein-mediated bending of the DNA brings the bound activators into contact with a group of mediator proteins, which in turn interact with general transcription factors at the promoter. These protein-protein interactions help assemble and position the initiation complex on the promoter.
One of the studies supporting this model shows that proteins regulating one of the mouse globin genes contact both the gene’s promoter and an enhancer located about 50,000 nucleotides upstream. Protein interactions allow these two DNA regions to come together in a very specific fashion, in spite of the many nucleotide pairs between them. Specific transcription factors that function as repressors can inhibit gene expression in several different ways. Some repressors bind directly to control element DNA, blocking activator binding. Other repressors interfere with the activator itself so it can’t bind the DNA. In addition to influencing transcription directly, some activators and repressors indirectly affect chromatin structure. Studies using yeast and mammalian cells show that some activators recruit proteins that acetylate histones near the promoters of specific genes, thus promoting transcription. Similarly, some repressors recruit proteins that remove acetyl groups from histones, leading to reduced transcription, a phenomenon referred to as silencing. Indeed, recruitment of chromatin-modifying proteins seems to be the most common mechanism of repression in eukaryotic cells. In eukaryotes, the precise control of transcription depends largely on the binding of activators to DNA control elements. Considering that many genes must be regulated in a typical animal or plant cell, the number of different nucleotide sequences in control elements is surprisingly small. A dozen or so short nucleotide sequences appear again and again in the control elements for different genes. On average, each enhancer is composed of about ten control elements, each binding only one or two specific transcription factors. It is the particular combination of control elements in an enhancer associated with a gene, rather than a single unique control element, that is important in regulating transcription of the gene. 
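The idea that a combination of control elements, rather than any single element, determines whether a gene is strongly transcribed can be sketched with sets. In the following Python illustration every name is hypothetical (the genes, the control elements CE1 to CE5, and the two cell types), and it assumes for simplicity that each control element has exactly one matching activator and that all of them must be present for high-level transcription.

```python
from typing import Dict, FrozenSet

# Hypothetical enhancers: each gene is associated with a particular
# combination of control elements (names are illustrative only).
ENHANCER_ELEMENTS: Dict[str, FrozenSet[str]] = {
    "gene_A": frozenset({"CE1", "CE2", "CE3"}),
    "gene_B": frozenset({"CE1", "CE4", "CE5"}),
}

# Hypothetical cell types, each listing the control elements for which
# the cell contains a matching activator.
ACTIVATORS_AVAILABLE: Dict[str, FrozenSet[str]] = {
    "cell_type_X": frozenset({"CE1", "CE2", "CE3"}),
    "cell_type_Y": frozenset({"CE1", "CE4", "CE5"}),
}


def highly_transcribed(gene: str, cell_type: str) -> bool:
    """True only if every control element in the gene's enhancer has its activator."""
    return ENHANCER_ELEMENTS[gene] <= ACTIVATORS_AVAILABLE[cell_type]


def possible_combinations(n_sequences: int) -> int:
    """Count non-empty subsets, assuming each sequence is simply present or absent."""
    return 2 ** n_sequences - 1


if __name__ == "__main__":
    for gene in ENHANCER_ELEMENTS:
        for cell in ACTIVATORS_AVAILABLE:
            print(gene, "in", cell, "->", highly_transcribed(gene, cell))
    print("subsets of 12 sequences:", possible_combinations(12))
```

Both genes share CE1, yet each is strongly transcribed in only one cell type, because only that cell type supplies the full set of activators. Under the present-or-absent assumption, a dozen sequences already allow 4,095 distinct non-empty combinations.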
Even with only a dozen control element sequences available, many combinations are possible. Each combination of control elements can activate transcription only when the appropriate transcription activators are present, which may occur at a precise time during development or in a particular cell type. This can occur because each cell type contains a different group of transcription activators. How does the eukaryotic cell deal with a group of genes of related function that need to be turned on or off at the same time? Earlier in this chapter, you learned that in bacteria, such coordinately controlled genes are often clustered into an operon, which is regulated by a single promoter and transcribed into a single mRNA molecule. Thus, the genes are expressed together, and the encoded proteins are produced at the same time. With a few exceptions, operons that work in this way have not been found in eukaryotic cells. Eukaryotic genes that are co-expressed, such as genes coding for the enzymes of a metabolic pathway, are typically scattered over different chromosomes. Here, coordinate gene expression depends on every gene of a dispersed group having a specific combination of control elements. Transcription activators in the nucleus that recognize the control elements bind to them, promoting simultaneous transcription of the genes, no matter where they are in the genome. Coordinate control of dispersed genes in a eukaryotic cell often occurs in response to chemical signals from outside the cell. A steroid hormone, for example, enters a cell and binds to a specific intracellular receptor protein, forming a hormone-receptor complex that serves as a transcription activator. Every gene whose transcription is stimulated by a given steroid hormone, regardless of its chromosomal location, has a control element recognized by that hormone-receptor complex. 
This is how estrogen activates a group of genes that stimulate cell division in uterine cells, preparing the uterus for pregnancy. Many signaling molecules, such as nonsteroid hormones and growth factors, bind to receptors on a cell’s surface and never actually enter the cell. Such molecules can control gene expression indirectly by triggering signal transduction pathways that activate particular transcription factors. Coordinate regulation in such pathways is the same as for steroid hormones: Genes with the same sets of control elements are activated by the same chemical signals. Because this system for coordinating gene regulation is so widespread, biologists think that it probably arose early in evolutionary history. Each chromosome in the interphase nucleus of animal cells occupies a distinct territory. Recently, chromosome conformation capture techniques have been developed that allow researchers to cross-link and identify regions of chromosomes associating with each other during interphase. These studies reveal two organizational details: First, the territory of each chromosome is divided into regions of chromatin loops, called topologically associating domains (TADs), within which chromatin sites associate mainly with each other. Second, loops of chromatin, each likely a TAD, extend from individual chromosomal territories into specific sites in the nucleus. Different loops from the same chromosome and loops from other chromosomes may congregate in such sites, some of which are rich in RNA polymerases and other transcription-associated proteins. Like a recreation center that draws members from many different neighborhoods, these so-called transcription factories are thought to be areas specialized for a common function. The old view that the nuclear contents are like a bowl of amorphous chromosomal spaghetti has given way to a new model of a nucleus with a defined architecture and regulated movements of chromatin.
Several lines of evidence suggest that unexpressed genes are located in the outer edges of the nucleus, while those that are being expressed are found in its interior region. Relocation of particular genes from their chromosomal territories to transcription factories in the interior may be part of the process of readying genes for transcription. In 2017, a consortium of researchers funded by the National Institutes of Health began investigating genome organization over time and its relationship to genome function. Transcription alone does not constitute gene expression. The expression of a protein-coding gene is ultimately measured by the amount of functional protein a cell makes, and much happens between the synthesis of the RNA transcript and the activity of the protein in the cell. Many regulatory mechanisms operate at the various stages after transcription. These mechanisms allow a cell to rapidly fine-tune gene expression in response to environmental changes without altering its transcription patterns. Here we explore how cells regulate gene expression after transcription. RNA processing in the nucleus and the export of mature RNA to the cytoplasm provide opportunities for regulating gene expression not available in prokaryotes. One example of regulation at the RNA-processing level is alternative RNA splicing, in which different mRNA molecules are produced from the same primary transcript, depending on which RNA segments are treated as exons and which as introns. Regulatory proteins specific to a cell type control intron/exon choices by binding to regulatory sequences within the primary transcript. Other genes code for many more products. For instance, researchers have found a Drosophila gene with enough alternatively spliced exons to generate about 19,000 membrane proteins that have different extracellular domains. At least 17,500 of the alternative mRNAs are actually synthesized. 
Each developing nerve cell in the fly appears to synthesize a different form of the protein, which acts as a unique identifier on the cell surface and helps prevent excessive overlap of nerve cells during development of the nervous system. Alternative RNA splicing can significantly expand the repertoire of a eukaryotic genome. In fact, alternative splicing was proposed as one explanation for the surprisingly low number of human genes counted when the human genome was sequenced. The number of human genes was found to be similar to that of a soil worm, a mustard plant, or a sea anemone. This discovery prompted questions about what, if not the number of genes, accounts for the more complex form and structure of humans. More than 90% of human protein-coding genes likely undergo alternative splicing. Thus, the extent of alternative splicing greatly multiplies the number of possible human proteins, which may be better correlated with complexity of form than the number of genes. Translation is another stage where gene expression is regulated, most commonly at the initiation stage. For some mRNAs, the initiation of translation can be blocked by regulatory proteins that bind to specific sequences or structures within the untranslated region at the 5′ or 3′ end, preventing the attachment of ribosomes. Alternatively, translation of all the mRNAs in a cell may be regulated simultaneously. In a eukaryotic cell, such “global” control usually involves the activation or inactivation of one or more protein factors required to initiate translation. This mechanism plays a role in starting translation of mRNAs that are stored in eggs. Just after fertilization, translation is triggered by the sudden activation of translation initiation factors. The response is a burst of synthesis of the proteins encoded by the stored mRNAs. Some plants and algae store mRNAs during periods of darkness; light then triggers the reactivation of the translational apparatus. 
The life span of mRNA molecules in the cytoplasm is important in determining the pattern of protein synthesis in a cell. Bacterial mRNA molecules typically are degraded by enzymes within a few minutes. This short life span of mRNAs is one reason bacteria can change their patterns of protein synthesis so quickly in response to environmental changes. In contrast, mRNAs in multicellular eukaryotes typically survive for hours, days, or even weeks. For instance, the mRNAs for the hemoglobin polypeptides in developing red blood cells are unusually stable, and these long-lived mRNAs are translated repeatedly in red blood cells. Nucleotide sequences that affect how long an mRNA remains intact are often found in the untranslated region at the 3′ end of the molecule. In one experiment, researchers transferred such a sequence from the short-lived mRNA for a growth factor to the 3′ end of a normally stable globin mRNA. The globin mRNA was quickly degraded. Other mechanisms that degrade or block expression of mRNA molecules have come to light. They involve a group of recently discovered RNA molecules that regulate gene expression at several levels, as you’ll see shortly. The final opportunities for controlling gene expression occur after translation. Often, eukaryotic polypeptides must be processed to yield functional protein molecules. For instance, cleavage of the initial insulin polypeptide forms the active hormone. In addition, many proteins undergo chemical modifications that make them functional. Regulatory proteins are commonly activated or inactivated by the reversible addition of phosphate groups, and proteins destined for the surface of animal cells acquire sugars. Cell-surface proteins and many others must also be transported to target destinations in the cell in order to function. Regulation might occur at any of the steps involved in modifying or transporting a protein. 
Finally, the length of time each protein functions in the cell is strictly regulated by selective degradation. Many proteins, such as the cyclins involved in regulating the cell cycle, must be relatively short-lived if the cell is to function appropriately. To mark a protein for destruction, the cell commonly attaches molecules of a small protein called ubiquitin to the protein. Giant protein complexes called proteasomes then recognize the ubiquitin-tagged proteins and degrade them. Genome sequencing has revealed that protein-coding DNA accounts for only 1.5% of the human genome and a similarly small percentage of the genomes of many other multicellular eukaryotes. A very small fraction of the non-protein-coding DNA consists of genes for RNAs such as ribosomal RNA and transfer RNA. Until recently, scientists assumed that most of the remaining DNA was not transcribed, thinking that since it didn’t specify proteins or the few known types of RNA, such DNA didn’t contain meaningful genetic information; in fact, it was called “junk DNA.” However, some genomic studies have cast doubt on this description. For example, a massive study showed that roughly 75% of the human genome is transcribed at some point in any given cell. Introns account for only a fraction of this transcribed RNA, most of which is untranslated. It remains unclear how much of the transcribed RNA is functional, but at least some of the genome is transcribed into non-protein-coding RNAs (noncoding RNAs, or ncRNAs), including a variety of small RNAs. Researchers are uncovering more evidence of the biological roles of these ncRNAs every day. These discoveries have revealed a large and diverse population of RNA molecules in the cell that play crucial roles in regulating gene expression and have gone largely unnoticed until recently. The longstanding view that mRNAs are the most important RNAs because they code for proteins needs revision. 
This represents a major shift in thinking by biologists, one that you are witnessing as students entering this field of study. Regulation by both small and large ncRNAs occurs at several points in the pathway of gene expression, including mRNA translation and chromatin modification. We’ll examine two types of small ncRNAs. Their importance was acknowledged when their discovery became the focus of the 2006 Nobel Prize in Physiology or Medicine, awarded for work completed only eight years earlier. Since 1993, a number of research studies have uncovered microRNAs (miRNAs): small, single-stranded RNA molecules capable of binding to complementary sequences in mRNA molecules. A longer RNA precursor is processed by cellular enzymes into an miRNA, a single-stranded RNA of about 22 nucleotides that forms a complex with one or more proteins. The miRNA allows the complex to bind to any mRNA molecule with at least seven or eight nucleotides of complementary sequence. The miRNA-protein complex then degrades the target mRNA or, less often, simply blocks its translation. There are approximately 1,500 genes for miRNAs in the human genome, and biologists estimate that expression of at least one-half of all human genes may be regulated by miRNAs, a remarkable figure given that the existence of miRNAs was unknown until the early 1990s. Another class of small noncoding RNAs, similar in size and function to miRNAs, is called small interfering RNAs (siRNAs). Both miRNAs and siRNAs can associate with the same proteins, producing similar results. In fact, if siRNA precursor RNA molecules are injected into a cell, the cell’s machinery can process them into siRNAs that turn off expression of genes with related sequences, similarly to how miRNAs function. The distinction between miRNAs and siRNAs is based on subtle differences in the structure of their precursors, which in both cases are RNA molecules that are mostly double-stranded. 
The blocking of gene expression by siRNAs, referred to as RNA interference (RNAi), is used in the laboratory as a means of disabling specific genes to investigate their function. Given that the cellular RNAi pathway can process double-stranded RNAs into homing devices that lead to destruction of related RNAs, some scientists think that this pathway may have evolved as a natural defense against infection by viruses that have double-stranded RNA genomes. However, the fact that RNAi can also affect the expression of nonviral cellular genes may reflect a different evolutionary origin for the RNAi pathway. While this section has focused on ncRNAs in eukaryotes, small ncRNAs are also used by bacteria as a defense system, called the CRISPR-Cas9 system, against viruses that infect them. The use of ncRNAs thus evolved long ago, but we don’t yet know how bacterial ncRNAs are related to those of eukaryotes. In addition to regulating mRNAs, small noncoding RNAs can cause remodeling of chromatin structure. In the S phase of the cell cycle, for example, the centromeric regions of DNA must be loosened for chromosomal replication and then re-condensed into heterochromatin in preparation for mitosis. In some yeasts, siRNAs produced by the yeast cells from the centromeric DNA are required to re-form the heterochromatin at the centromeres. Exactly how the process starts is still debated, but biologists agree on the general idea: The siRNA system in yeast interacts with other, larger noncoding RNAs and with chromatin-modifying enzymes to condense the centromere chromatin into heterochromatin. In most mammalian cells, siRNAs have not been found, and the mechanism for centromere DNA condensation is not yet understood. However, it may also turn out to involve small noncoding RNAs. A recently discovered class of small ncRNAs called piwi-interacting RNAs, or piRNAs, also induces formation of heterochromatin, blocking expression of some parasitic DNA elements in the genome known as transposons. 
Usually 24–31 nucleotides in length, piRNAs are processed from a longer, single-stranded RNA precursor. They play an indispensable role in the germ cells of many animal species, where they appear to help reestablish appropriate methylation patterns in the genome during gamete formation. Researchers have also found a relatively large number of long noncoding RNAs (lncRNAs), ranging from 200 to hundreds of thousands of nucleotides in length, that are expressed at significant levels in specific cell types at particular times. The functional significance of these lncRNAs has been debated, but in 2017 a large international research consortium published an atlas of almost 28,000 such RNAs; their analysis supported the idea that almost 20,000 were functional and that some were associated with specific diseases. One lncRNA, long known to be functional, is responsible for X chromosome inactivation, which prevents expression of genes located on one of the X chromosomes in most female mammals. In this case, lncRNAs (transcripts of the XIST gene located on the chromosome to be inactivated) bind back to and coat that chromosome. This binding leads to condensation of the entire chromosome into heterochromatin. The examples just described involve chromatin remodeling in large regions of the chromosome. Because chromatin structure affects transcription and thus gene expression, RNA-based regulation of chromatin structure is sure to play an important role in gene regulation. Additionally, some experimental evidence supports the idea of an alternate role for lncRNAs in which they can act as a scaffold, bringing DNA, proteins, and other RNAs together into complexes. These associations may act either to condense chromatin or, in some cases, to help bring the enhancer of a gene together with mediator proteins and the gene’s promoter, activating gene expression in a more direct fashion. 
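The target recognition rule described above, in which an miRNA-protein complex can bind any mRNA carrying at least seven or eight nucleotides of complementary sequence, can be sketched as a simple string search. This is a deliberately simplified illustration: both sequences are invented, only standard A-U and G-C pairing is considered, the complementary stretch is required to be contiguous, and the function name `find_target_sites` is hypothetical rather than part of any bioinformatics library.

```python
# Standard RNA base pairing only (A-U, G-C); other pairings are ignored here.
COMPLEMENT = str.maketrans("AUGC", "UACG")


def reverse_complement(rna):
    """Return the sequence that would base-pair with `rna`, written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def find_target_sites(mirna, mrna, min_match=7):
    """Return sorted 0-based mRNA positions where a stretch of at least
    `min_match` consecutive miRNA nucleotides could base-pair."""
    sites = set()
    for i in range(len(mirna) - min_match + 1):
        # The mRNA site that pairs with this window of the miRNA.
        site = reverse_complement(mirna[i:i + min_match])
        start = mrna.find(site)
        while start != -1:
            sites.add(start)
            start = mrna.find(site, start + 1)
    return sorted(sites)


if __name__ == "__main__":
    # Invented sequences, written 5' to 3'. The miRNA is 22 nucleotides long.
    mirna = "ACGUUGCAUGGCAUCGAUUCGA"
    mrna = "GGGAAAGCAACGUCCC"
    print(find_target_sites(mirna, mrna))
```

Because only seven or eight matching nucleotides are required in this model, a single short miRNA can in principle match sites in many different mRNAs, which is consistent with the estimate that a large share of human genes may be regulated by miRNAs.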
Given the extensive functions of noncoding RNAs, it is not surprising that many of the ncRNAs characterized thus far play important roles in embryonic development—the topic we turn to in the next section. Embryonic development is perhaps the ultimate example of precisely regulated gene expression. In the embryonic development of multicellular organisms, a fertilized egg gives rise to cells of many different types, each with a different structure and corresponding function. Typically, cells are organized into tissues, tissues into organs, organs into organ systems, and organ systems into the whole organism. Thus, any developmental program must produce cells of different types that form higher-level structures arranged in a particular way in three dimensions. This remarkable transformation results from three interrelated processes: cell division, cell differentiation, and morphogenesis. Through a succession of mitotic cell divisions, the zygote gives rise to a large number of cells. Cell division alone, however, would merely produce a great ball of identical cells, nothing like a tadpole. During embryonic development, cells not only increase in number, but also undergo cell differentiation, the process by which cells become specialized in structure and function. Moreover, the different kinds of cells are not randomly distributed but are organized into tissues and organs in a particular three-dimensional arrangement. The physical processes that give an organism its shape constitute morphogenesis, the development of the form of an organism and its structures. All three processes are rooted in cellular behavior. Even morphogenesis, the shaping of the organism, can be traced back to changes in the shape, motility, and other characteristics of the cells that make up various regions of the embryo. As you have seen, the activities of a cell depend on the genes it expresses and the proteins it produces. 
Almost all cells in an organism have the same genome; therefore, differential gene expression results from the genes being regulated differently in each cell type. Each of these fully differentiated cells has a particular mix of specific transcription factor activators that turn on the collection of genes whose products are required in the cell. The fact that both cells arose through a series of mitoses from a common fertilized egg inevitably leads to a question: How do different sets of activators come to be present in the two cells? It turns out that materials placed into the egg by maternal cells set up a sequential program of gene regulation that is carried out as embryonic cells divide, and this program coordinates cell differentiation during embryonic development. To understand how this works, we will consider two basic developmental processes. First, we’ll explore how cells that arise from early embryonic mitoses develop the differences that start each cell along its own differentiation pathway. Second, we’ll see how cellular differentiation leads to one particular cell type, using muscle development as an example. What generates the first differences among cells in an early embryo? And what controls the differentiation of all the various cell types as development proceeds? You can probably deduce the answer: The specific genes expressed in any particular cell of a developing organism determine its path. Two sources of information, used to varying extents in different species, “tell” a cell which genes to express at any given time during embryonic development. One important source of information early in development is the egg’s cytoplasm, which contains both RNA and proteins encoded by the mother’s DNA. The cytoplasm of an unfertilized egg is not homogeneous. 
Messenger RNAs, proteins, other substances, and organelles are distributed unevenly in the unfertilized egg, and this unevenness has a profound impact on the development of the future embryo in many species. Maternal substances in the egg that influence the course of early development are called cytoplasmic determinants. After fertilization, early mitotic divisions distribute the zygote’s cytoplasm into separate cells. The nuclei of these cells may thus be exposed to different cytoplasmic determinants, depending on which portions of the zygotic cytoplasm a cell received. The combination of cytoplasmic determinants in a cell helps determine its developmental fate by regulating expression of the cell’s genes during the course of cell differentiation. The other major source of developmental information, which becomes increasingly important as the number of embryonic cells increases, is the environment around a particular cell. Most influential are the signals conveyed to an embryonic cell from other embryonic cells in the vicinity, including contact with cell-surface molecules on neighboring cells and the binding of growth factors secreted by neighboring cells. Such signals cause changes in the target cells, a process called induction. The molecules that transmit these signals within the target cell are cell-surface receptors and other signaling pathway proteins. In general, the signal sends a cell down a specific developmental path by causing changes in its gene expression that lead to observable cellular changes. Thus, interactions between embryonic cells help induce differentiation into the many specialized cell types making up a new organism. The earliest changes that set a cell on its path to specialization are subtle ones, showing up only at the molecular level. 
Before biologists knew much about the molecular changes occurring in embryos, they coined the term determination to refer to the point at which an embryonic cell is irreversibly committed to becoming a particular cell type. Once it has undergone determination, an embryonic cell can be experimentally placed in another location in the embryo and it will still differentiate into the cell type that is its normal fate. Differentiation, then, is the process by which a cell attains its determined fate. As the tissues and organs of an embryo develop and their cells differentiate, the cells become more noticeably different in structure and function. Today we understand determination in terms of molecular changes that result in observable cell differentiation, marked by the expression of genes for tissue-specific proteins. Such proteins are found only in a specific cell type and give the cell its characteristic structure and function. The first sign of differentiation is the appearance of mRNAs for tissue-specific proteins. Later, differentiation is observable with a microscope as changes in cellular structure. On the molecular level, different sets of genes are sequentially expressed in a regulated manner as new cells arise when their precursors divide. Multiple steps in gene expression may be regulated during differentiation, transcription being the most common. In the fully differentiated cell, transcription remains the principal regulatory point for maintaining appropriate gene expression. Differentiated cells are specialists at making tissue-specific proteins. For example, as a result of transcriptional regulation, liver cells specialize in making albumin, and lens cells specialize in making crystallin. Skeletal muscle cells in vertebrates are another instructive example. Each of these cells is a long fiber containing many nuclei within a single plasma membrane. 
Skeletal muscle cells have high concentrations of muscle-specific versions of the contractile proteins myosin and actin, as well as membrane receptor proteins that detect signals from nerve cells. Muscle cells develop from embryonic precursor cells that have the potential to develop into a number of cell types, including cartilage cells and fat cells, but particular conditions commit them to becoming muscle cells. Although the committed cells appear unchanged under the microscope, determination has occurred, and they are now a cell type called myoblasts. Eventually, myoblasts start to churn out large amounts of muscle-specific proteins and fuse to form mature, elongated, multinucleate skeletal muscle cells. Researchers have worked out what happens at the molecular level during muscle cell determination. In a series of experiments, they isolated different genes, caused each to be expressed in a separate embryonic precursor cell, and then looked for differentiation into myoblasts and muscle cells. In this way, they identified several so-called “master regulatory genes” whose protein products commit the cells to becoming skeletal muscle cells. Thus, in the case of muscle cells, the molecular basis of determination is the expression of one or more of these master regulatory genes. To understand more about how determination occurs in muscle cell differentiation, let’s focus on the master regulatory gene called myoD. The myoD gene deserves its designation as a master regulatory gene. Researchers have shown that the MyoD protein it encodes is capable of changing some kinds of fully differentiated nonmuscle cells, such as fat cells and liver cells, into muscle cells. Why doesn’t MyoD work on all kinds of cells? One likely explanation is that activation of muscle-specific genes is not solely dependent on MyoD but requires a particular combination of regulatory proteins, some of which are lacking in cells that do not respond to MyoD. 
What is the molecular basis for muscle cell differentiation? The MyoD protein is a transcription factor that binds to specific control elements in the enhancers of various target genes and stimulates their expression. Some target genes for MyoD encode still other muscle-specific transcription factors. MyoD also stimulates expression of the myoD gene itself, an example of positive feedback that perpetuates MyoD’s effect in maintaining the cell’s differentiated state. Presumably, all the genes activated by MyoD have enhancer control elements recognized by MyoD and are thus coordinately controlled. Finally, the secondary transcription factors activate the genes for proteins such as myosin and actin that confer the unique properties of skeletal muscle cells. The determination and differentiation of other kinds of tissues may play out in a similar fashion. Experimental results support the idea that master regulatory proteins like MyoD might function by opening the chromatin in certain regions. This allows access to transcription machinery for activation of the next set of cell-type-specific genes. We have now seen how different programs of gene expression that are activated in the fertilized egg can result in differentiated cells and tissues. But for the tissues to function effectively in the organism as a whole, the organism’s body plan—its overall three-dimensional arrangement—must be established and superimposed on the differentiation process. Let’s look at the molecular basis for the establishment of the body plan, using the well-studied fruit fly Drosophila melanogaster as an example. Cytoplasmic determinants and inductive signals both contribute to spatially organizing the tissues and organs of an organism in their characteristic places. This developmental process is referred to as pattern formation. 
Just as the locations of the front, back, and sides of a new building are determined before construction begins, pattern formation in animals begins in the early embryo, when the major axes of an animal are established. In a bilaterally symmetrical animal, the relative positions of head and tail, right and left sides, and back and front—the three major body axes—are set up before the organs appear. The molecular cues that control pattern formation, collectively called positional information, are provided by cytoplasmic determinants and inductive signals. These cues tell a cell its location relative to the body axes and to neighboring cells, and determine how the cell and its descendants will respond to future molecular signals. During the early 20th century, classical embryologists made detailed anatomical observations of embryonic development in a number of species and performed experiments in which they manipulated embryonic tissues. Although this research laid the groundwork for understanding the mechanisms of development, it did not reveal the specific molecules that guide development or determine how patterns are established. In the 1940s, scientists began using the genetic approach—the study of mutants—to investigate Drosophila development. That approach has had spectacular success. These studies have established that genes control development and have led to an understanding of the key roles that specific molecules play in defining position and directing differentiation. By combining anatomical, genetic, and biochemical approaches to the study of Drosophila development, researchers have discovered developmental principles common to many other species, including humans. Fruit flies and other arthropods have a modular construction, an ordered series of segments. These segments make up the body’s three major parts: the head, the thorax and the abdomen. 
Like other bilaterally symmetrical animals, Drosophila has an anterior-posterior axis, a dorsal-ventral axis, and a right-left axis. In Drosophila, cytoplasmic determinants that are localized in the unfertilized egg provide positional information for the placement of anterior-posterior and dorsal-ventral axes even before fertilization. We’ll focus here on the molecules involved in establishing the anterior-posterior axis as a case in point. The Drosophila egg develops in one of the female’s ovaries, next to the nurse cells, which supply the egg with nutrients, mRNAs, and other substances needed for development. The egg and nurse cells are surrounded by follicle cells, which make the eggshell. After fertilization and laying of the egg, embryonic development results in the formation of a segmented larva, which goes through three larval stages. Edward B. Lewis was a visionary American biologist who, in the 1940s, first showed the value of the genetic approach to studying embryonic development in Drosophila. Lewis studied bizarre mutant flies with developmental defects that led to extra wings or legs in the wrong place. He located the mutations on the fly’s genetic map, thus connecting the developmental abnormalities to specific genes. This research supplied the first concrete evidence that genes somehow direct the developmental processes studied by embryologists. The genes Lewis discovered, called homeotic genes, are regulatory genes that control pattern formation in the fly. Further insight into pattern formation during early embryonic development did not come for another 30 years, when two researchers in Germany, Christiane Nüsslein-Volhard and Eric Wieschaus, set out to identify all the genes that affect segment formation in Drosophila. The project was daunting for three reasons. The first was the sheer number of Drosophila genes, now known to total about 14,000. 
The genes affecting segmentation might be just a few needles in a haystack or might be so numerous and varied that the scientists would be unable to make sense of them. Second, mutations affecting a process as fundamental as segmentation would surely be embryonic lethals, mutations with phenotypes causing death at the embryonic or larval stage. Because organisms with embryonic lethal mutations never reproduce, they cannot be bred for study. The researchers dealt with this problem by looking for recessive mutations, which can be propagated in heterozygous flies that act as genetic carriers. Third, cytoplasmic determinants in the egg were known to play a role in axis formation, so the researchers knew they would have to study the mother’s genes as well as those of the embryo. It is the mother’s genes that we will discuss further as we focus on how the anterior-posterior body axis is set up in the developing egg. Nüsslein-Volhard and Wieschaus began their search for segmentation genes by exposing flies to mutagenic agents and scanning their descendants for dead embryos or larvae with abnormal segmentation or other defects. For example, to find genes that might set up the anterior-posterior axis, they looked for embryos or larvae with abnormal ends, such as two heads or two tails, predicting that such abnormalities would arise from mutations in maternal genes required for correctly setting up the offspring’s head or tail end. Using this approach, Nüsslein-Volhard and Wieschaus eventually identified about 1,200 genes essential for pattern formation during embryonic development. Of these, about 120 were essential for normal segmentation patterns. Next, the researchers were able to group these segmentation genes by general function and to isolate many of them for further study. The result was a detailed molecular understanding of the early steps in pattern formation in Drosophila. 
When the results of Nüsslein-Volhard and Wieschaus were combined with Lewis’s earlier work, a coherent picture of Drosophila development emerged. In recognition of their discoveries, the three researchers were awarded a Nobel Prize in 1995. Let’s consider a specific example of the genes that Nüsslein-Volhard, Wieschaus, and co-workers found. As we mentioned earlier, cytoplasmic determinants in the egg are the substances that initially establish the axes of the Drosophila body. These substances are encoded by genes of the mother, fittingly called maternal effect genes. A gene classified as a maternal effect gene is one that, when mutant in the mother, results in a mutant phenotype in the offspring, regardless of the offspring’s own genotype. In fruit fly development, the mRNA or protein products of maternal effect genes are placed in the egg while it is still in the mother’s ovary. When the mother has a mutation in such a gene, she makes a defective gene product, and her eggs are abnormal; when these eggs are fertilized, they fail to develop properly. Because maternal effect genes control the orientation of the egg and consequently that of the fly, they are also called egg-polarity genes. Two groups of these genes set up the anterior-posterior and dorsal-ventral axes of the embryo. Like mutations in segmentation genes, mutations in maternal effect genes are generally embryonic lethals. To see how maternal effect genes determine the body axes of the offspring, we will focus on one such gene, called bicoid, a term meaning “two-tailed.” An embryo or larva whose mother has two mutant bicoid alleles lacks the front half of its body and has posterior structures at both ends. This phenotype suggested to Nüsslein-Volhard and her colleagues that the product of the mother’s bicoid gene is essential for setting up the anterior end of the fly and might be concentrated at the future anterior end of the embryo. 
This hypothesis is an example of the morphogen gradient hypothesis first proposed by embryologists a century ago, in which gradients of substances called morphogens establish an embryo’s axes and other features of its form. DNA technology and other modern biochemical methods enabled the researchers to test whether the bicoid product, a protein called Bicoid, is in fact a morphogen that determines the anterior end of the fly. First, they asked whether the location of the mRNA and protein products of this gene in the egg was consistent with the hypothesis. They found that bicoid mRNA is highly concentrated at the extreme anterior end of the mature egg. After the egg is fertilized, the mRNA is translated into protein. The Bicoid protein then diffuses from the anterior end toward the posterior, resulting in a gradient of protein within the early embryo, most highly concentrated at the anterior end. These results are consistent with the hypothesis that Bicoid protein specifies the fly’s anterior end. To test this more specifically, scientists injected pure bicoid mRNA into various regions of early embryos. The protein that resulted from its translation caused anterior structures to form at the injection sites. The bicoid research was groundbreaking for several reasons. First, it led to the identification of a specific protein required for some of the earliest steps in pattern formation. It thus helped us understand how different regions of the egg can give rise to cells that go down different developmental pathways. Second, it increased our understanding of the mother’s critical role in the initial phases of embryonic development. Third, the principle that a gradient of morphogens can determine polarity and position has proved to be a key developmental concept for a number of species, just as early embryologists had hypothesized. Maternal mRNAs are crucial during development of many species. 
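As a rough illustration of how a gradient can carry positional information, the sketch below models the relative Bicoid level as decaying exponentially with distance from the anterior end and assigns a coarse fate by comparing the local level with thresholds. The exponential form, the decay length, the threshold values, the three fate labels, and all function names are illustrative assumptions, not measurements or terms from the experiments described above.

```python
import math

def bicoid_concentration(position, c0=1.0, decay_length=0.2):
    """Relative Bicoid level at a position along the anterior-posterior axis.

    position: 0.0 = anterior tip, 1.0 = posterior tip (fraction of egg length).
    c0 and decay_length are hypothetical values chosen for illustration.
    """
    return c0 * math.exp(-position / decay_length)

def regional_fate(concentration, head_threshold=0.5, thorax_threshold=0.1):
    """Map a concentration to a coarse fate using assumed thresholds."""
    if concentration >= head_threshold:
        return "head"
    if concentration >= thorax_threshold:
        return "thorax"
    return "abdomen"

def fates_along_axis(n_points=11, c0=1.0):
    """Sample the axis at evenly spaced points and return the fate at each."""
    positions = [i / (n_points - 1) for i in range(n_points)]
    return [regional_fate(bicoid_concentration(x, c0=c0)) for x in positions]

if __name__ == "__main__":
    print(fates_along_axis())        # embryo from a normal mother
    print(fates_along_axis(c0=0.0))  # embryo from a mother with no functional bicoid
```

With these assumed values, the first call gives head fates nearest the anterior tip, thorax fates next, and abdomen fates elsewhere. The second gives no anterior fates anywhere, loosely mirroring the loss of anterior structures in embryos whose mothers have two mutant bicoid alleles.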
In Drosophila, gradients of specific proteins encoded by maternal mRNAs not only determine the posterior and anterior ends but also establish the dorsal-ventral axis. As the fly embryo grows, it reaches a point when the embryonic program of gene expression takes over, and the maternal mRNAs must be destroyed. Later, positional information encoded by the embryo’s genes, operating on an ever finer scale, establishes a specific number of correctly oriented segments and triggers the formation of each segment’s characteristic structures. The gene whose malfunction produces misplaced structures, such as legs in place of antennae, does not encode any antenna protein, however. Instead, it encodes a transcription factor that regulates other genes. The observation that a change in gene regulation during development could lead to such a fantastic change in body form prompted some scientists to consider whether these types of mutations could contribute to evolution by generating novel body shapes. In this section, we have seen how a carefully orchestrated program of sequential gene regulation controls the transformation of a fertilized egg into a multicellular organism. The program is carefully balanced between turning on the genes for differentiation in the right place and turning off other genes. Even when an organism is fully developed, gene expression is regulated in a similarly fine-tuned manner. In the final section of the chapter, we’ll consider how fine this tuning is by looking at how specific changes in expression of just a few genes can lead to the development of cancer.
Now that we have discussed the molecular basis of gene expression and its regulation, we can look at cancer more closely. The gene regulation systems that go wrong during cancer turn out to be the very same systems that play important roles in embryonic development, the immune response, and many other biological processes.
Thus, research into the molecular basis of cancer has both benefited from and informed many other fields of biology. The genes that normally regulate cell growth and division during the cell cycle include genes for growth factors, their receptors, and the intracellular molecules of signaling pathways. Mutations that alter any of these genes in somatic cells can lead to cancer. The agent of such change can be random spontaneous mutation. Many cancer-causing mutations likely also result from environmental influences: chemical carcinogens like tobacco, X-rays and other high-energy radiation, and some viruses. Cancer research uncovered cancer-causing genes called oncogenes in certain types of viruses. Later, related versions of viral oncogenes were found in the genomes of humans and other animals. The normal versions of the cellular genes, called proto-oncogenes, code for proteins that stimulate normal cell growth and division. How might a proto-oncogene, a gene that has an essential function in normal cells, become an oncogene, a cancer-causing gene? In general, an oncogene arises from a genetic change that leads to an increase either in the amount of the proto-oncogene’s protein product or in the intrinsic activity of each protein molecule. The genetic changes that convert proto-oncogenes to oncogenes fall into four main categories: epigenetic changes, translocations, gene amplification, and point mutations. First, alterations in epigenetic modifications that can lead to abnormal chromatin condensation in a cell are often found in tumor cells. If a mutation in a gene for a chromatin-modifying enzyme leads to loosening of chromatin in a region that is normally not being expressed, a proto-oncogene in that region could be expressed at abnormally high levels. For example, a gene for one such enzyme has been shown to be mutated in 20% of tumor cells analyzed. 
Second, cancer cells are frequently found to contain chromosomes that have broken and rejoined incorrectly, translocating fragments from one chromosome to another. If a translocated proto-oncogene ends up near an especially active promoter, its transcription may increase, making it an oncogene. The third main type of genetic change, amplification, increases the number of copies of the proto-oncogene in the cell through repeated gene duplication. Fourth, a point mutation in either the promoter or an enhancer that controls a proto-oncogene could cause an increase in its expression. Alternatively, a point mutation in the coding sequence of the proto-oncogene could change the gene’s product to a protein that is more active or more resistant to degradation than the normal protein. Any of these four mechanisms can lead to abnormal stimulation of the cell cycle and put the cell on the path to becoming a cancer cell. In addition to genes whose products normally promote cell division, cells contain genes whose normal products inhibit cell division. Such genes are called tumor-suppressor genes since the proteins they encode help prevent uncontrolled cell growth. Any mutation that decreases the normal activity of a tumor-suppressor protein may contribute to the onset of cancer, in effect stimulating growth through the absence of suppression. The protein products of tumor-suppressor genes have various functions. Some repair damaged DNA, which prevents the cell from accumulating cancer-causing mutations. Other tumor-suppressor proteins control adhesion of cells to each other or to the extracellular matrix; proper cell anchorage is crucial in normal tissues and is often absent in cancers. Still other tumor-suppressor proteins are components of cell-signaling pathways that inhibit the cell cycle. Let’s consider how protein components of cell-signaling pathways function in normal cells and what goes wrong with their function in cancer cells. 
We will focus on the products of two key genes, the ras proto-oncogene and the p53 tumor-suppressor gene. Mutations in ras occur in about 30% of human cancers and mutations in p53 in more than 50%. The Ras protein, encoded by the ras gene, is a G protein that relays a signal from a growth factor receptor on the plasma membrane to a cascade of protein kinases. The cellular response at the end of the pathway is the synthesis of a protein that stimulates the cell cycle. Normally, such a pathway will not operate unless triggered by the appropriate growth factor. But certain mutations in the ras gene can lead to production of a hyperactive Ras protein that triggers the kinase cascade even in the absence of growth factor, resulting in increased cell division. In fact, hyperactive versions or excess amounts of any of the pathway’s components can have the same outcome: excessive cell division.
A second kind of signaling pathway has the opposite effect: it leads to the synthesis of a protein that suppresses the cell cycle. In this case, the signal is damage to the cell’s DNA, perhaps as the result of exposure to ultraviolet light. Operation of this signaling pathway blocks the cell cycle until the damage has been repaired. Otherwise, the damage might contribute to tumor formation by causing mutations or chromosomal abnormalities. Thus, the genes for the components of the pathway act as tumor-suppressor genes. The p53 gene, named for the apparent molecular weight of its protein product, is a tumor-suppressor gene. The protein it encodes is a specific transcription factor that promotes the synthesis of cell cycle–inhibiting proteins. That is why a mutation that knocks out the p53 gene, or a mutation in a gene required to activate the p53 protein, can lead to excessive cell growth and cancer. The p53 gene has been called the “guardian angel of the genome.” Once the p53 protein is activated (say, by the ATM protein, a protein kinase, after DNA damage), p53 activates several other genes, such as p21. The p21 protein halts the cell cycle by binding to cyclin-dependent kinases, allowing time for the cell to repair the DNA. 
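The logic of the two pathways described above can be summarized in a small Boolean sketch. It is a deliberate simplification: each pathway is reduced to true/false states, the rule that combines them is an assumption made for illustration, and the function and parameter names are hypothetical labels rather than terms from the text.

```python
def ras_pathway_stimulates_division(growth_factor_present, ras_hyperactive=False):
    """Cell cycle-stimulating pathway.

    Normal Ras relays a signal only when a growth factor is present;
    a hyperactive Ras triggers the kinase cascade regardless.
    """
    return growth_factor_present or ras_hyperactive

def p53_pathway_blocks_cycle(dna_damaged, p53_functional=True):
    """Cell cycle-inhibiting pathway.

    DNA damage activates p53, which turns on genes such as p21 that halt
    the cell cycle. Without functional p53, the block does not occur.
    """
    return dna_damaged and p53_functional

def cell_divides(growth_factor_present, dna_damaged,
                 ras_hyperactive=False, p53_functional=True):
    """Toy rule (an assumption): divide if stimulated and not blocked."""
    stimulated = ras_pathway_stimulates_division(growth_factor_present, ras_hyperactive)
    blocked = p53_pathway_blocks_cycle(dna_damaged, p53_functional)
    return stimulated and not blocked

if __name__ == "__main__":
    # Normal cell: divides only with growth factor and undamaged DNA.
    print(cell_divides(growth_factor_present=True, dna_damaged=False))
    print(cell_divides(growth_factor_present=True, dna_damaged=True))
    # Hyperactive Ras: divides even without growth factor.
    print(cell_divides(growth_factor_present=False, dna_damaged=False, ras_hyperactive=True))
    # Defective p53: damaged DNA no longer stops division.
    print(cell_divides(growth_factor_present=True, dna_damaged=True, p53_functional=False))
```

In this toy model, either defect alone removes one safeguard; a cell carrying both divides whether or not a growth factor is present and whether or not its DNA is damaged.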
Researchers recently showed that p53 also activates expression of a group of miRNAs that inhibit the cell cycle. The p53 protein can also turn on genes directly involved in DNA repair. If DNA damage is irreparable, p53 activates “suicide” genes, whose protein products bring about programmed cell death. Thus, p53 acts in several ways to prevent a cell from passing on mutations due to DNA damage. If mutations do accumulate and the cell survives through many divisions, as is more likely if the p53 tumor-suppressor gene is defective or missing, cancer may ensue. The many functions of p53 suggest a complex picture of regulation in normal cells, one that we do not yet fully understand. A recent study of elephants may underscore the protective role of p53. The incidence of cancer among elephants in zoo-based studies has been estimated at about 3%, compared with closer to 30% for humans. Genome sequencing revealed that elephants have 20 copies of the p53 gene, compared to one copy in humans, other mammals, and even manatees, elephants’ closest living relatives. There are undoubtedly other underlying reasons, but the correlation between low cancer rate and extra copies of the p53 gene bears further investigation. Recent studies have also shown that DNA methylation and histone modification patterns in normal cells differ from those in cancer cells and that miRNAs probably participate in cancer development. There is still a lot to learn, and you and your classmates may be the ones to make important discoveries about cancer biology.
More than one somatic mutation or epigenetic change is generally needed to produce all the changes characteristic of a full-fledged cancer cell. This may help explain why the incidence of cancer increases greatly with age. If cancer results from an accumulation of mutations that occur throughout life, then the longer we live, the more likely we are to develop cancer. 
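A back-of-the-envelope calculation shows why a multistep requirement makes risk climb steeply with age. The sketch assumes, purely for illustration, that each required change arises independently with the same small probability per year in a given cell lineage. The yearly rate and the number of required changes used here are made-up values, not figures from the text, and the function name is hypothetical.

```python
def prob_all_changes_by_age(age_years, n_changes, rate_per_year):
    """Probability that every one of n independent changes has occurred by a given age.

    Each change is assumed to occur in any given year with probability
    rate_per_year, independently of the others (an illustrative assumption).
    """
    p_single = 1.0 - (1.0 - rate_per_year) ** age_years
    return p_single ** n_changes

if __name__ == "__main__":
    for age in (20, 40, 80):
        one_change = prob_all_changes_by_age(age, n_changes=1, rate_per_year=0.001)
        six_changes = prob_all_changes_by_age(age, n_changes=6, rate_per_year=0.001)
        print(age, one_change, six_changes)
```

Under these assumptions, doubling the age from 20 to 40 roughly doubles the chance that a single change has occurred but raises the chance that all six have occurred about sixtyfold, which is the qualitative pattern the accumulation model predicts.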
The model of a multistep path to cancer is well supported by studies of one of the best-understood types of human cancer: colorectal cancer, which affects the colon and/or rectum. About 140,000 new cases of colorectal cancer are diagnosed each year in the United States, and the disease causes 50,000 deaths per year. Like most cancers, colorectal cancer develops gradually. The first sign is often a polyp, a small, benign growth in the colon lining. The cells of the polyp look normal, although they divide unusually frequently. The tumor grows and may eventually become malignant, invading other tissues. The development of a malignant tumor is paralleled by a gradual accumulation of mutations that convert proto-oncogenes to oncogenes and knock out tumor-suppressor genes. A ras oncogene and a mutated p53 tumor-suppressor gene are often involved. About half a dozen changes must occur at the DNA level for a cell to become fully cancerous. These changes usually include the appearance of at least one active oncogene and the mutation or loss of several tumor-suppressor genes. Furthermore, since mutant tumor-suppressor alleles are usually recessive, in most cases mutations must knock out both alleles in a cell’s genome to block tumor suppression. Since we understand the progression of this type of cancer, routine screenings are recommended to identify and remove any suspicious polyps. The colorectal cancer mortality rate has been declining for the past 20 years due to increased screening and improved treatments. Treatments for other cancers have improved as well. Advances in the sequencing of DNA and mRNA allow medical researchers to compare the genes expressed by different types of tumors and by the same type in different people. These comparisons have led to personalized treatments based on the molecular characteristics of a person’s tumor. 
When researchers compared gene expression in normal breast cells and cells from breast cancers, they found that the genes showing the most significant differences in expression encoded signal receptors. Breast cancer is the second most common form of cancer in the United States, and the first among women. Each year, this cancer strikes over 230,000 women (and some men) in the United States and kills 40,000 (450,000 worldwide). A major problem with understanding breast cancer is its heterogeneity: Tumors differ in significant ways. Identifying differences between types of breast cancer is expected to improve treatment and decrease the mortality rate. In 2012, the Cancer Genome Atlas Network, sponsored by the National Institutes of Health, published the results of a multi-team effort that used a genomics approach to profile subtypes of breast cancer based on their molecular signatures. Four major types of breast cancer were identified. It is now routine to screen for the presence of particular signaling receptors in any breast cancer tumors, and individuals with breast cancer, along with their physicians, can now make more informed decisions about their treatments. The fact that multiple genetic changes are required to produce a cancer cell helps explain the observation that cancers can run in families. An individual inheriting an oncogene or a mutant allele of a tumor-suppressor gene is one step closer to accumulating the necessary mutations for cancer to develop than is an individual without any such mutations. Geneticists are working to identify inherited cancer alleles so that predisposition to certain cancers can be detected early in life. About 15% of colorectal cancers, for example, involve inherited mutations. One syndrome, called hereditary nonpolyposis colon cancer (HNPCC), increases an individual’s lifetime risk of colon cancer to 50–70%. 
HNPCC, also known as Lynch syndrome, is caused by an autosomal dominant allele of any one of a group of DNA repair genes, underscoring the importance of DNA repair systems. This syndrome is responsible for 2–5% of colon cancers. Other inherited mutations that cause colon cancer affect the tumor-suppressor gene called adenomatous polyposis coli, or APC. This gene has multiple functions in the cell, including regulation of cell migration and adhesion. Even in patients with no family history of the disease, the APC gene is mutated in 60% of colorectal cancers. In these individuals, new mutations must have occurred in both APC alleles before the gene’s function is lost. Currently, only 15% of colorectal cancers are associated with known inherited mutations, so researchers continue to try to identify “markers” that could predict the risk of developing this type of cancer. Given the prevalence and significance of breast cancer, it is not surprising that it was one of the first cancers for which the role of inheritance was investigated. It turns out that for 5–10% of patients with breast cancer, there is evidence of a strong inherited predisposition. Geneticist Mary-Claire King began working on this problem in the mid-1970s. After 16 years of research, she convincingly demonstrated that mutations in one gene—BRCA1—were associated with increased susceptibility to breast cancer, a finding that flew in the face of medical opinion at the time. Mutations in that gene or a gene called BRCA2 are found in at least half of inherited breast cancers, and tests using DNA sequencing can detect these mutations. A woman who inherits one mutant BRCA1 allele has a 60% probability of developing breast cancer before the age of 50, compared with only a 2% probability for an individual homozygous for the normal allele. BRCA1 and BRCA2 are considered tumor-suppressor genes because their wild-type alleles protect against breast cancer and their mutant alleles are recessive. 
The BRCA1 and BRCA2 proteins both appear to function in the cell’s DNA damage repair pathway. More is known about BRCA2: Along with another protein, it helps repair breaks that occur in both strands of DNA, a function crucial for maintaining undamaged DNA. Because DNA breakage can contribute to cancer, it makes sense that the risk of cancer can be lowered by minimizing exposure to DNA-damaging agents, such as the ultraviolet radiation in sunlight and chemicals found in cigarette smoke. Ultimately, such approaches are expected to lower the death rate from cancer. The study of genes associated with cancer, inherited or not, increases our basic understanding of how disruption of normal gene regulation can result in this disease. In addition to the mutations and other genetic alterations described in this section, a number of tumor viruses can cause cancer in various animals, including humans. In fact, one of the earliest breakthroughs in understanding cancer came in 1911, when Peyton Rous, an American pathologist, discovered a virus that causes cancer in chickens. Also, the Epstein-Barr virus, which causes infectious mononucleosis, has been linked to several types of cancer in humans, notably Burkitt’s lymphoma. Papillomaviruses cause cancer of the cervix, and a virus called HTLV-1 causes a type of adult leukemia. Viruses play a role in about 15% of the cases of human cancer. Viruses may at first seem very different from mutations as a cause of cancer. However, we now know that viruses can interfere with gene regulation in several ways if they integrate their genetic material into the DNA of a cell. Viral integration may donate an oncogene to the cell, disrupt a tumor-suppressor gene, or convert a proto-oncogene to an oncogene. Some viruses produce proteins that inactivate p53 and other tumor-suppressor proteins, making the cell more prone to becoming cancerous.
