Harper's Biochemistry Chapter 35 PDF - DNA Organization, Replication, & Repair

CHAPTER 35
DNA Organization, Replication, & Repair
P. Anthony Weil, PhD

SECTION VII Structure, Function, & Replication of Informational Macromolecules

OBJECTIVES

After studying this chapter, you should be able to:

Appreciate that the roughly 3 × 10⁹ base pairs of DNA that compose the haploid genome of humans are divided uniquely between 23 linear DNA units, the chromosomes. Humans, being diploid, have 23 pairs of these linear chromosomes: 22 pairs of autosomes and two sex chromosomes.

Understand that human genomic DNA, if extended end-to-end, would be meters in length, yet still fits within the nucleus of the cell, an organelle that is only microns (μ; 10⁻⁶ meters) in diameter. Such condensation in DNA length, in part, is induced following its association with the highly positively charged histone proteins resulting in the formation of a unique DNA-histone complex termed the nucleosome. Nucleosomes have DNA wrapped around the surface of an octamer of histones.

Explain that strings of nucleosomes form along the linear sequence of genomic DNA to form chromatin, which itself can be more tightly packaged and condensed; this ultimately leads to the formation of the chromosomes.

Appreciate that while the chromosomes are the macroscopic functional units for DNA transcription, replication, recombination, gene assortment, and cellular division, it is DNA function at the level of the individual nucleotides that composes regulatory sequences linked to specific genes that are essential for life.

Describe the steps, phase of the cell cycle, and the molecules responsible for the replication, repair, and recombination of DNA, and understand the negative effects that errors in any of these processes can have on cellular, and thus, organismal integrity and health.

BIOMEDICAL IMPORTANCE*

The genetic information in the DNA of a chromosome can be transmitted by exact replication or it can be exchanged by a number of processes, including recombination, transposition, and gene conversion. These processes provide a means of ensuring adaptability and diversity for the organism, but when they go awry, can also result in disease. A number of enzyme systems are involved in DNA replication, alteration, and repair. Mutations are due to a change in the base sequence of DNA and may result from the faulty replication, transposition, or repair of DNA and occur with a frequency of about one in every 10⁶ cell divisions. Abnormalities in gene products (either in RNA, protein function, or amount) can be the result of mutations that occur in transcribed protein coding, and nonprotein coding DNA, or nontranscribed regulatory-region DNA. A mutation in a germ cell is transmitted to offspring (so-called vertical transmission of hereditary disease). A number of factors, including viruses, chemicals, ultraviolet light, and ionizing radiation can all increase the rate of mutation. Mutations often affect somatic cells and so are passed on to successive generations of cells, but only within an organism (ie, horizontally). It is becoming apparent that a number of diseases, and perhaps most cancers, are due to the combined effects of vertical transmission of mutations as well as horizontal transmission of induced mutations and the impact thereof on cellular function.

*So far as possible, the discussion in this chapter and in Chapters 36, 37, and 38 will focus on mammalian organisms, which are, of course, among the higher eukaryotes. At times, it will be necessary to refer to observations in prokaryotic organisms such as the bacterium Escherichia coli and its viruses, or less complex eukaryotic model systems such as the fruit fly Drosophila, the nematode roundworm Caenorhabditis elegans, or the brewer's yeast Saccharomyces cerevisiae. However, in such cases the information will be of a kind that can be readily extrapolated to mammalian organisms.
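The objectives state that human genomic DNA, extended end-to-end, would be meters in length. A rough check of that claim is sketched below in Python. It multiplies the haploid base-pair count given in the text by the axial rise per base pair of B-form DNA, taken here as about 0.34 nm. That value is a standard textbook figure and an assumption of this sketch, since the chapter itself does not state it; the constant and function names are illustrative only.

```python
# Back-of-the-envelope estimate of how long human genomic DNA would be
# if it were fully extended, and of the average DNA content per chromosome.

HAPLOID_GENOME_BP = 3e9            # from the chapter: roughly 3 x 10^9 bp
CHROMOSOMES_PER_HAPLOID_SET = 23   # from the chapter
RISE_NM_PER_BP = 0.34              # ASSUMPTION: B-DNA rise per bp (not stated in the chapter)


def extended_length_m(base_pairs, rise_nm_per_bp=RISE_NM_PER_BP):
    """Return the end-to-end length in meters of fully extended duplex DNA."""
    return base_pairs * rise_nm_per_bp * 1e-9  # convert nm to m


haploid_length_m = extended_length_m(HAPLOID_GENOME_BP)
diploid_length_m = 2 * haploid_length_m  # a diploid cell carries two haploid sets
average_bp_per_chromosome = HAPLOID_GENOME_BP / CHROMOSOMES_PER_HAPLOID_SET

print(f"Haploid genome, extended: {haploid_length_m:.2f} m")
print(f"Diploid genome, extended: {diploid_length_m:.2f} m")
print(f"Average bp per chromosome: {average_bp_per_chromosome:.2e}")
```

Under these assumptions the haploid genome comes to roughly 1 m of DNA and the diploid complement to roughly 2 m, against a nucleus only microns across. The average of about 1.3 × 10⁸ bp per chromosome agrees with the per-chromatid figure quoted later in this chapter.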
CHROMATIN IS THE CHROMOSOMAL MATERIAL IN THE NUCLEI OF CELLS OF EUKARYOTIC ORGANISMS

Chromatin consists of very long double-stranded DNA (dsDNA) molecules and a nearly equal mass of small basic proteins termed histones as well as a smaller amount of nonhistone proteins (most of which are acidic and larger than histones) and a small quantity of RNA. The nonhistone proteins include enzymes involved in DNA replication and repair, and the proteins involved in RNA synthesis, processing, and transport to the cytoplasm as well as an array of proteins that regulate these various processes. The dsDNA helix in each chromosome has a length that is thousands of times the diameter of the cell nucleus. One purpose of the molecules that comprise chromatin, particularly the histones, is to condense the DNA; however, it is important to note that the histones also integrally participate in gene regulation (see Chapters 36, 38, and 42); indeed, histones contribute importantly to all DNA-directed molecular transactions. Electron microscopic studies of chromatin have demonstrated dense spherical particles called nucleosomes, which are approximately 10 nm in diameter and connected by DNA filaments (Figure 35–1). Nucleosomes are composed of DNA wound around an octameric complex of histone molecules. Biochemical, biophysical, and X-ray crystallography data all corroborate this model of nucleosome structure.

FIGURE 35–1 Electron micrograph of chromatin showing individual nucleosomes (white, ball-shaped) attached to strands of DNA (thin, gray line); see also Figure 35–2. (Reproduced with permission from Shao Z. Probing Nanometer Structures with Atomic Force Microscopy. News Physiol Sci. 1999;14:142-149.)

Histones Are the Most Abundant Chromatin Proteins

Histones are a small family of closely related basic proteins. H1 histones are the ones least tightly bound to chromatin (Figures 35–1, 35–2, and 35–3) and are, therefore, easily removed with a salt solution, after which chromatin becomes more soluble. The organizational unit of this soluble chromatin is the nucleosome. Nucleosomes contain four major types of histones: H2A, H2B, H3, and H4. The sequence and structures of all four histones, H2A, H2B, H3, and H4, the so-called core histones that form the nucleosome, have been highly conserved between species, although variants of these four histones exist in various organisms, and are used for specialized purposes. The extreme conservation of core histone sequences implies that the overall functions of histones are identical in all eukaryotes, and that the entire molecule is involved quite specifically in carrying out these functions. The carboxyl terminal two-thirds of the histone molecules are hydrophobic, while their amino terminal thirds are particularly rich in basic amino acids. These four core histones are subject to at least six major, or frequent, types of covalent modification or posttranslational modifications (PTMs): acetylation, methylation, phosphorylation, ADP-ribosylation, monoubiquitylation, and sumoylation. These histone modifications play important roles in chromatin structure and function, as illustrated in Table 35–1. Chapters 38 and 42 provide a more detailed discussion of the role of histone PTMs in cellular biology.

FIGURE 35–2 Model for the structure of the nucleosome. DNA is wrapped around the surface of a protein cylinder consisting of two each of histones H2A, H2B, H3, and H4 that form the histone octamer. The ~145 bp of DNA, consisting of 1.75 superhelical turns, are in contact with the histone octamer. The position of histone H1, when it is present, is indicated by the dashed outline at the bottom of the figure. Note that histone H1 interacts with DNA as it enters and exits the nucleosome.

TABLE 35–1 Possible Roles of Posttranslationally Modified Histones

1. Acetylation of histones H3 and H4 is associated with the activation or inactivation of gene transcription.
2. Acetylation of core histones is associated with chromosomal assembly during DNA replication.
3. Phosphorylation of histone H1 is associated with the condensation of chromosomes during the replication cycle.
4. ADP-ribosylation of histones is associated with DNA repair.
5. Methylation of histones is correlated with activation and repression of gene transcription.
6. Monoubiquitylation is associated with gene activation, repression, and heterochromatic gene silencing.
7. Sumoylation of histones (SUMO; small ubiquitin-related modifier) is associated with transcription repression.
8. Replacement of H2A with alternative H2AZ within nucleosomes is associated with transcriptional activation.
9. Alternative acylations of histones (propionylation, butyrylation, crotonylation, succinylation, malonylation and 2-hydroxyisobutyrylation) that likely link histone modifications with intracellular metabolism. These newly discovered modifications of histones correlate with gene activity.

The histones interact with each other in very specific ways. H3 and H4 form a tetramer containing two molecules of each (H3–H4)₂, while H2A and H2B form dimers (H2A–H2B). Under physiologic conditions, these histone oligomers associate to form the histone octamer of the composition (H3–H4)₂–(H2A–H2B)₂.

The Nucleosome Contains Histone & DNA

When the histone octamer is mixed with purified dsDNA under appropriate ionic conditions, the same x-ray diffraction pattern is formed as that observed in freshly isolated chromatin. Biochemical and electron microscopic studies confirm the existence of reconstituted nucleosomes in such in vitro-generated preparations. Furthermore, the reconstitution of nucleosomes from DNA and histones H2A, H2B, H3, and H4 is independent of the organismal or cellular origin of the various components. This result argues that nucleosome formation is an ancient and evolutionarily conserved, fundamental cellular process. Moreover, neither the histone H1, nor the nonhistone proteins are necessary for the reconstitution of the nucleosome core.

In the nucleosome, the DNA is supercoiled in a left-handed helix over the surface of the disk-shaped histone octamer (see Figure 35–2). The majority of core histone proteins interact with the DNA on the inside of the supercoil without protruding, although the amino terminal tails of all the histones are thought to extend outside of this structure and are available for regulatory PTMs (see Table 35–1).

The (H3–H4)₂ tetramer itself can confer nucleosome-like properties on DNA and thus has a central role in the formation of the nucleosome. The addition of two H2A–H2B dimers stabilizes the primary particle and firmly binds two additional half-turns of DNA previously bound only loosely to the (H3–H4)₂. Thus, 1.75 superhelical turns of DNA are wrapped around the surface of the histone octamer, protecting 145 to 150 bp of DNA and forming the nucleosome core particle (see Figure 35–2). In chromatin, core particles are separated by a roughly 30-bp region of DNA termed "linker." Most of the DNA is in a repeating series of these structures, giving chromatin a repeating "beads-on-a-string" appearance when examined by electron microscopy (see Figure 35–1).

In vivo the assembly of nucleosomes is mediated by one of several nuclear chromatin assembly factors whose functions are facilitated by histone chaperones, a group of proteins that exhibit high affinity for binding histones. As the nucleosome is assembled, histones are released from the histone chaperones. Nucleosomes appear to exhibit preference for certain regions on specific DNA molecules, but the basis for this nonrandom distribution, termed phasing, is not yet completely understood. Phasing is likely related both to the relative physical flexibility of particular nucleotide sequences to accommodate the regions of kinking within the nucleosomal supercoil, as well as the presence of other DNA-bound factors that limit the sites of nucleosome deposition.

HIGHER-ORDER STRUCTURES PROVIDE FOR THE COMPACTION OF CHROMATIN

Electron microscopy of chromatin reveals two higher orders of structure, the 10-nm fibril and the 30-nm chromatin fiber, beyond that of the nucleosome itself. The disk-like nucleosome structure has a 10-nm diameter and a height of 5 nm. The 10-nm fibril consists of nucleosomes arranged with their edges separated by a small distance (30 bp of linker DNA) with their flat faces parallel to the fibril axis (see Figure 35–3). The 10-nm fibril is probably further supercoiled with six or seven nucleosomes per turn to form the 30-nm chromatin fiber (see Figure 35–3). Each turn of the supercoil is relatively flat, and the faces of the nucleosomes of successive turns would be nearly parallel to each other. H1 histones appear to stabilize the 30-nm fiber, but their position and that of the variable length linker DNA are not clear. It is probable that nucleosomes can form a variety of packed structures. In order to form a mitotic chromosome, the 30-nm fiber must be compacted in length another 100-fold (see following discussion).

FIGURE 35–3 Extent of DNA packaging in metaphase chromosomes (top) to naked duplex DNA (bottom). Chromosomal DNA is packaged and organized at several levels as shown (see Table 35–2). Each phase of condensation or compaction and organization (bottom to top) decreases overall DNA accessibility to an extent that the DNA sequences in metaphase chromosomes are likely almost totally transcriptionally inert. In toto, these five levels of DNA compaction result in nearly a 10⁴-fold linear decrease in end-to-end DNA length. Complete condensation and decondensation of the linear DNA in chromosomes occur in the space of just a few hours during the normal replicative cell cycle (see Figure 35–20). Levels labeled in the figure, top to bottom: metaphase chromosome (1400 nm); condensed loops (700 nm); nuclear-scaffold associated form with non-condensed loops, the Topologically Associated Domains or TADs (300 nm); 30-nm chromatin fibril composed of nucleosomes (30 nm); "beads-on-a-string" 10-nm chromatin fibril (10 nm); naked double-helical DNA (2 nm).

In interphase chromosomes, chromatin fibers appear to be organized into 30,000 to 100,000 bp loops or domains anchored in a scaffolding, or supporting matrix structure within the nucleus, the so-called nuclear matrix. Within these domains, referred to as Topologically Associated Domains, or TADs (see Figure 35–3), some DNA sequences are located nonrandomly. It has been suggested that at least some of the looped domains of chromatin correspond to one or more separate genetic functions, containing both coding and noncoding regions of the cognate gene or genes. This nuclear architecture is dynamic, having important regulatory effects on gene regulation. Indeed, recent data suggest that certain genes or gene regions are mobile within the nucleus, moving obligatorily to discrete loci within the nucleus on activation. Future work in the area of the 3-D organization of nuclear chromatin, and further characterization of regulatory TADs, will determine the exact biological functions and molecular mechanisms responsible (see Chapter 38).

SOME REGIONS OF CHROMATIN ARE "ACTIVE" & OTHERS ARE "INACTIVE"

Generally, every cell of an individual metazoan organism contains the same genetic information. Thus, the differences between cell types within an organism must be explained by differential expression of the common genetic information. Chromatin containing active genes (ie, transcriptionally or potentially transcriptionally active chromatin) has been shown to differ in several important ways from that of inactive regions. The nucleosome structure of active chromatin appears to be altered, sometimes quite extensively, in highly active regions. DNA in active chromatin contains large regions (about 100,000 bases long) that are relatively more sensitive to digestion by a nuclease such as DNase I. DNase I makes single-strand cuts in nearly any segment of DNA due to its low-sequence specificity. However, DNase I will only avidly digest DNA that is not protected, or bound by protein. The sensitivity to DNase I of active chromatin regions reflects only a potential for transcription rather than transcription itself, and in several different cellular systems can be correlated with a relative lack of 5-methyldeoxycytidine (meC; see Figure 32–7) in the DNA, and particular histone variants and/or histone PTMs (phosphorylation, acetylation, etc.; see Table 35–1).

Within the large regions of active chromatin there exist shorter stretches of 100 to 300 nucleotides that exhibit an even greater (another 10-fold) sensitivity to DNase I. These hypersensitive sites probably result from a structural conformation that favors access of the nuclease to the DNA. These regions are often located immediately upstream from the active gene and are the location of interrupted nucleosomal structure caused by the binding of nonhistone regulatory transcription factor proteins (enhancer-binding transcriptional activator proteins; see Chapters 36 and 38). In many cases, it seems that if a gene is capable of being transcribed, it very often has a DNase-hypersensitive site(s) in the chromatin immediately upstream. As noted earlier, nonhistone regulatory proteins involved in transcription control and those involved in maintaining access to the template strand lead to the formation of hypersensitive sites. Such sites often provide the first clue about the presence and location of a transcription control element.

By contrast, transcriptionally inactive chromatin is densely packed during interphase as observed by electron microscopic studies and is referred to as heterochromatin; transcriptionally active chromatin stains less densely and is referred to as euchromatin. Generally, euchromatin is replicated earlier than heterochromatin in the mammalian cell cycle (see following discussion). The chromatin in these regions of inactivity is often high in meC content, and histones therein contain relatively lower levels of certain "activating" covalent modifications and higher levels of "repressing" histone PTMs (see Table 35–1).

There are two types of heterochromatin: constitutive and facultative. Constitutive heterochromatin is always relatively highly condensed (ie, heterochromatic), and thus essentially always inactive. Such constitutive heterochromatin is found in the regions near the chromosomal centromere and at chromosomal ends (telomeres). Facultative heterochromatin is at times condensed, but at other times it is actively transcribed and, thus, uncondensed and appears as euchromatin. Of the two members of the X-chromosome pair in mammalian females, one X chromosome is almost completely inactive transcriptionally and is heterochromatic. However, the heterochromatic X chromosome decondenses during gametogenesis and becomes transcriptionally active during early embryogenesis; thus, it is facultative heterochromatin.

Certain cells of insects, for example, Chironomus and Drosophila, contain giant chromosomes that have been replicated for multiple cycles without separation of daughter chromatids. These copies of DNA line up side by side in precise register and produce a banded chromosome containing regions of condensed chromatin and lighter bands of more extended chromatin. Transcriptionally active regions of these polytene chromosomes are especially decondensed into "puffs" that can be shown to contain the enzymes responsible for transcription and to be the sites of RNA synthesis (Figure 35–4). Using highly sensitive fluorescently labeled hybridization probes, specific gene sequences can be mapped, or "painted," within the nuclei of human cells, even without polytene chromosome formation, using fluorescence in situ hybridization, or FISH techniques (see Chapter 39).

DNA IS ORGANIZED INTO CHROMOSOMES

In preparation for cell duplication via the cyclical process termed mitosis, cellular DNA content is doubled (see Figure 35–20). During one phase of the mitotic cycle termed metaphase, the duplicated chromosomes condense and can readily be visualized. Condensed chromosomes possess a twofold symmetry, with the identical duplicated sister chromatids connected at a chromosomal structure termed the centromere, the relative position of which is characteristic for a given chromosome (Figure 35–5). The centromere is an adenine–thymine (A–T)-rich region containing repeated DNA sequences that range in size from 10² (brewers' yeast) to 10⁶ (mammals) base pairs (bp). Metazoan centromeres are bound by nucleosomes containing the special histone H3 variant protein CENP-A and other specific centromere-binding proteins. This complex, called the kinetochore, provides the anchor for the mitotic spindle, on which chromosomal segregation occurs during mitosis.
FIGURE 35–4 Illustration of the tight correlation between the presence of RNA polymerase II (Table 36–2) and messenger RNA synthesis. A number of genes, labeled A, B (top), and 5C, but not genes at locus (band) BR3 (5C, BR3, bottom) are activated when midge fly Chironomus tentans larvae are subjected to heat shock (39°C for 30 minutes). (A) Distribution of RNA polymerase II in isolated chromosome IV from the salivary gland (groups of light spots at arrows). The enzyme was detected by immunofluorescence using a fluorescently labeled antibody directed against the polymerase. The 5C and BR3 are specific bands of chromosome IV, and the arrows indicate puffs (ie, A, B, 5C). (B) Autoradiogram of a chromosome IV that was incubated in ³H-uridine to label the RNA. Note the correspondence of the immunofluorescence and presence of the radioactive RNA (black dots) (ie, A, B, 5C). Bar = 7 μm. (Reproduced with permission from Sass H. RNA polymerase B in polytene chromosomes: immunofluorescent and autoradiographic analysis during stimulated and repressed RNA synthesis. Cell. 1982;28(2):269-278.)

FIGURE 35–5 The two sister chromatids of mitotic human chromosome 12. The dashed line demarcates the sister chromatids. The location of the A+T-rich centromeric region connecting sister chromatids is indicated, as are two of the four telomeres residing at the very ends of the chromatids that are attached one to the other at the centromere. (Reproduced with permission from Biophoto Associates/Photo Researchers, Inc.)

The ends of each chromosome contain structures called telomeres. Telomeres consist of short TG-rich repeats. Human telomeres have a variable number of repeats of the sequence 5′-TTAGGG-3′, which can extend for several kilobases. Telomerase, a multisubunit RNA template-containing complex related to viral RNA-dependent DNA polymerases (reverse transcriptases), is the enzyme responsible for telomere synthesis, and thus for maintaining the length of the telomere. Since telomere shortening has been associated with both malignant transformation (see Chapter 56) and aging (see Chapter 58), this enzyme has become an attractive target for cancer chemotherapy and drug development (see Chapter 56). Each sister chromatid contains one dsDNA molecule. As schematized in Figure 35–3, during interphase, the packing of the DNA molecule is less dense than it is in the condensed chromosome during metaphase. Metaphase chromosomes are nearly completely transcriptionally inactive.

The human haploid genome consists of about 3 × 10⁹ bp and about 1.7 × 10⁷ nucleosomes. Thus, each of the 23 chromatids in the human haploid genome would contain on the average 1.3 × 10⁸ nucleotides in one dsDNA molecule. Consequently, the length of each DNA molecule must be compressed about 8000-fold to generate the structure of a condensed metaphase chromosome. In metaphase chromosomes, the 30-nm chromatin fibers are also folded into a series of looped domains, the proximal portions of which are anchored to the nuclear matrix, likely through interactions with proteins termed lamins that constitute integral components of the inner nuclear membrane within the nucleus (see Figures 35–3 and 49–4). The packing ratios of each of the orders of DNA structure are summarized in Table 35–2. Though chromosomes are highly compacted, certain transcription proteins have been shown to still be able to access their target DNA sequences. The packaging of nucleoproteins within chromatids is not random, as evidenced by the characteristic patterns observed when chromosomes are stained with specific dyes such as quinacrine or Giemsa stain (Figure 35–6).

TABLE 35–2 The Packing or Compaction Ratios of Each of the Orders of DNA Structure

Chromatin Form                                          Packing Ratio
Naked double-helical DNA                                ~1.0
10-nm fibril of nucleosomes                             7-10
30-nm chromatin fiber of superhelical nucleosomes       40-60
Condensed metaphase chromosome loops                    8000

FIGURE 35–6 A human karyotype (of a man with a normal 46,XY constitution), in which the metaphase chromosomes have been stained by the Giemsa method and aligned according to the Paris Convention. (Reproduced with permission from H Lawce and F Conte.)

From individual to individual within a single species, the pattern of staining (banding) of the entire chromosome complement is highly reproducible; nonetheless, it differs significantly between species, even those closely related. Thus, the packaging of the nucleoproteins in chromosomes of higher eukaryotes must in some way be dependent on species-specific characteristics of the DNA molecules.

A combination of specialized staining techniques and high-resolution microscopy has allowed cytogeneticists to quite precisely map many genes to specific regions of mouse and human chromosomes. With the recent elucidation of the human and mouse genome sequences (among others), it has become clear that many of these visual mapping methods were remarkably accurate.

Coding Regions Are Often Interrupted by Intervening Sequences

The protein coding regions of DNA, the transcripts of which ultimately appear in the cytoplasm as single mRNA molecules, are usually interrupted in the eukaryotic genome by large intervening sequences of nonprotein-coding DNA. Accordingly, the primary transcripts or mRNA precursors (originally termed hnRNA because this species of RNA was quite heterogeneous in size [length] and mostly restricted to the nucleus), contain nonprotein coding intervening sequences of RNA that must be removed in a process that also joins together the appropriate protein-coding segments to form the mature mRNA. Most coding sequences for a single mRNA are interrupted in the genome (and thus in the primary transcript) by at least one, and in some cases as many as 50, noncoding intervening sequences termed introns. In most cases, the introns are much longer than the coding regions termed exons. The processing of the primary transcript, which involves precise removal of introns and splicing of adjacent exons, is described in Chapter 36.

The function of the intervening sequences, or introns, is not totally clear. However, mRNA precursor molecules can be differentially spliced thereby increasing the number of distinct (yet related) proteins produced by a single gene and its corresponding primary mRNA gene transcript. Introns may also serve to separate functional domains (exons) of coding information in a form that permits genetic rearrangement by recombination to occur more rapidly than if all coding regions for a given genetic function were contiguous. Such an enhanced rate of genetic rearrangement of functional domains might allow more rapid evolution of biologic function. In some instances, other protein-coding or noncoding RNAs are localized within the intronic DNA of certain genes (see Chapter 34). The relationships among chromosomal DNA, gene clusters on the chromosome, the exon–intron structure of genes, and the final mRNA product are illustrated in Figure 35–7.

FIGURE 35–7 The relationship between chromosomal DNA and mRNA. The human haploid DNA complement of 3 × 10⁹ bp is unequally distributed between 23 chromosomes (see Figure 35–6). Genes are often clustered on these chromosomes. An average gene is 2 × 10⁴ bp in length, including the regulatory region (red-hatched area), which is often located at the 5′ end of the gene. The regulatory region is shown here as being adjacent to the transcription initiation site (bent arrow). Most eukaryotic genes have alternating exons and introns. In this example, there are nine exons (blue-colored areas) and eight introns (green-colored areas). The introns are removed from the primary transcript by processing reactions, and the exons are ligated together in sequence to form the mature mRNA through a process termed RNA splicing. (nt, nucleotides.) Levels labeled in the figure: chromosome (1–2 × 10³ genes), 1.5 × 10⁸ bp; gene cluster (20 genes), 1.5 × 10⁶ bp; gene, 2 × 10⁴ bp; primary transcript, 8 × 10³ nt; mRNA, 2 × 10³ nt.

THE EXACT FUNCTION OF MUCH OF THE MAMMALIAN GENOME IS NOT WELL UNDERSTOOD

The haploid genome of each human cell consists of 3.3 × 10⁹ bp of DNA subdivided into 23 chromosomes. The entire haploid genome contains sufficient DNA to code for nearly 1.5 million average-sized protein coding genes (ie, ~2200 bp of protein-coding DNA). However, early studies of mutation rates and of the complexities of the genomes of higher organisms suggested that humans have significantly fewer than 100,000 proteins encoded by the ~1% of the human genome that is composed of exonic DNA. Indeed, current estimates based on sequencing of the human genome and the collection of mRNA species produced therefrom suggest there are about 25,000 protein-coding genes in humans. This implies that most genomic DNA is nonprotein coding; that is, its information is never translated into an amino acid sequence of a protein molecule. Certainly, some of the excess DNA sequences serve to regulate the expression of genes during development, differentiation, and adaptation to the environment, either by serving as binding sites for regulatory proteins or by encoding regulatory ncRNAs. Some excess clearly makes up the intervening sequences or introns that split the coding regions of genes, and another portion of the excess appears to be composed of many families of repeated sequences for which clear functions have yet to be defined, though some small RNAs transcribed from these repeats can modulate transcription, either directly by interacting with the transcription machinery or indirectly by affecting the activity of the chromatin template. Interestingly, the ENCODE Project Consortium (see Chapters 10 and 39) has shown that most of the genomic sequence is indeed transcribed in at least some human cell types, albeit at a low level. A large fraction of such transcription appears to generate the lncRNAs (see Chapter 34). Further research will elucidate the role(s) played by such transcripts.

The DNA in a eukaryotic genome can be divided into two broad "sequence classes." These are unique-sequence DNA, or nonrepetitive DNA, and repetitive-sequence DNA. In the haploid genome, unique-sequence DNA generally includes the single copy genes that code for proteins. The repetitive DNA in the haploid genome includes sequences that vary in copy number from 2 to as many as 10⁷ copies per cell.

More Than Half the DNA in Eukaryotic Organisms Is in Unique or Nonrepetitive Sequences

This estimation and genome-wide organization of repetitive sequence DNA was based on a variety of techniques, and most recently on direct genomic DNA sequencing. Similar techniques were used to determine the number of protein-encoding genes. In brewers' yeast (Saccharomyces cerevisiae, a lower eukaryote), about two-thirds of its 6200 genes are expressed, but only approximately one-fifth are required for viability under laboratory growth conditions. In typical tissues in a higher eukaryote (eg, mammalian liver and kidney), between 10,000 and 15,000 genes are actively expressed. Different combinations of genes are expressed in each tissue of course, and how this is accomplished is one of the major unanswered questions in biology.

In Human DNA, at Least 30% of the Genome Consists of Repetitive Sequences

Repetitive-sequence DNA can be broadly classified as moderately repetitive or as highly repetitive. The highly repetitive sequences consist of 5 to 500 base pair lengths repeated many times in tandem. These sequences are often clustered in centromeres and telomeres of the chromosome and some are present in about 1 to 10 million copies per haploid genome. The majority of these sequences are transcriptionally inactive and some of these sequences play a structural role in the chromosome (see Figure 35–5 and Chapter 39).

The moderately repetitive sequences, which are defined as being present in numbers of less than 10⁶ copies per haploid genome, are not clustered but are interspersed with unique sequences. In many cases, these long interspersed repeats are transcribed by RNA polymerase II and the produced RNAs contain 5′-Cap structures (see Figure 34–10) indistinguishable from those on mRNA. Depending on their length, moderately repetitive sequences are classified as long interspersed nuclear elements (LINEs) or short interspersed nuclear elements (SINEs). Both types appear to be retroposons; that is, they arose from movement from one location to another (transposition) through an RNA intermediate by the action of reverse transcriptase that transcribes an RNA template into DNA. Mammalian genomes contain 20,000 to 50,000 copies of the 6 to 7 kbp LINEs. These represent species-specific families of repeat elements. SINEs are shorter (70-300 bp), and there may be more than 100,000 copies per genome. Of the SINEs in the human genome, one family, the Alu family, is present in about 500,000 copies per haploid genome and accounts for ~10% of the human genome. Members of the human Alu family and their closely related analogs in other animals can be transcribed as integral components of mRNA precursors or as discrete RNA molecules, including the well-studied 4.5S RNA and 7S RNA. These particular family members are highly conserved within a species as well as between mammalian species.

[…] of a gene with a disease. Using PCR, a large number of family members can be rapidly screened for a certain microsatellite polymorphism. The association of a specific polymorphism with a gene in affected family members, and the lack of this association in unaffected members, may be the first clue about the genetic basis of a disease.

Trinucleotide sequences that increase in number (microsatellite instability) can cause disease. The unstable (CGG)n repeat sequence (n = the number of repeats; in this case CGG) is associated with the fragile X syndrome. Other trinucleotide repeats that undergo dynamic mutation (usually an increase in repeat numbers) are associated with Huntington chorea (CAG), myotonic dystrophy (CTG), spinobulbar muscular atrophy (CAG), and Kennedy disease (CAG). The advent of next-generation, high-throughput DNA sequencing technologies (see Chapter 39) has dramatically impacted the speed, accuracy, and precision with which scientists and clinicians can analyze human genome structure. Some newly instituted clinical tests involve targeted genomic DNA sequencing prepared either from tissues or serum samples.

ONE PERCENT OF CELLULAR DNA IS IN MITOCHONDRIA

The majority of the polypeptides in mitochondria (about 54 out of 67) are encoded by nuclear genes, while the rest are coded by genes found in mitochondrial (mt) DNA.
Human mitochondria Components of the short-interspersed repeats, including the contain 2 to 10 copies of a small circular ~16 kbp dsDNA mol- members of the Alu family, may be mobile elements, capable ecule that makes up approximately 1% of total cellular DNA. of jumping into and out of various sites within the genome This mtDNA codes for mt-specific ribosomal and transfer (see following discussion). These transposition events can RNAs and for 13 proteins that play key roles in the respira- have disastrous results, as exemplified by the insertion of tory chain (see Chapter 13). The linearized structural map of Alu sequences into a gene, which, when so mutated, causes the human mitochondrial genes is shown in Figure 35–8. Some neurofibromatosis. Additionally, Alu B1 and B2 SINE RNAs of the features of mtDNA are shown in Table 35–3. have been shown to regulate mRNA production at the levels of An important feature of human mitochondrial mtDNA transcription and mRNA splicing. is that—because all mitochondria are contributed by the ovum during zygote formation—it is transmitted by maternal nonmendelian inheritance. Thus, in diseases resulting from Microsatellite Repeat Sequences mutations of mtDNA, an affected mother would in theory One category of repeat sequences exists as both dispersed and pass the disease to all of her children but only her daughters grouped tandem arrays. The sequences consist of 2 to 6 bp would transmit the trait. However, in some cases, deletions in repeated up to 50 times. These microsatellite sequences most mtDNA occur during oogenesis and thus are not inherited commonly are found as dinucleotide repeats of AC on one from the mother. A number of diseases have now been shown strand and TG on the opposite strand, but several other forms to be due to mutations of mtDNA. These include a variety of occur, including CG, AT, and CA. 
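Microsatellite alleles differ in how many tandem copies of a short motif they carry, so a repeat counter is a natural illustration. The following is a minimal sketch, assuming plain DNA strings as input; the function name and the two example alleles are invented for illustration and do not come from the chapter.

```python
import re


def longest_tandem_run(seq: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `seq`."""
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(seq.upper())]
    return max(runs, default=0)


# Two hypothetical alleles of one AC-repeat locus (flanking bases are made up).
allele_a = "GGT" + "AC" * 12 + "TTG"
allele_b = "GGT" + "AC" * 17 + "TTG"

count_a = longest_tandem_run(allele_a, "AC")
count_b = longest_tandem_run(allele_b, "AC")

# Different repeat counts on the two chromosomes mean the locus is heterozygous.
is_heterozygous = count_a != count_b
print(count_a, count_b, is_heterozygous)
```

The same function can be pointed at a trinucleotide motif such as CGG or CAG to count repeats in an expansion-prone region. Real genotyping works from PCR fragment sizes or sequencing reads rather than clean strings, so this is only a conceptual model.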
The AC repeat sequences myopathies, neurologic disorders, and some forms of diabetes occur at 50,000 to 100,000 locations in the genome. At any mellitus. locus, the number of these repeats may vary on the two chro- mosomes, thus providing heterozygosity of the number of copies of a particular microsatellite number in an individual. GENETIC MATERIAL CAN BE This is a heritable trait, and because of their number and the ease of detecting them using the polymerase chain reaction ALTERED & REARRANGED (PCR) (see Chapter 39), such repeats are useful in construct- An alteration in the sequence of purine and pyrimidine bases ing genetic linkage maps. Most genes are associated with one in a gene due to a change—a removal or an insertion—of one or more microsatellite markers, so the relative position of or more bases may result in an altered gene product or altera- genes on chromosomes can be assessed, as can the association tion of gene expression if nonprotein coding DNA is involved. CHAPTER 35 DNA Organization, Replication, & Repair 369 Cys Asn Pro Glu Ser OL Tyr Ala Gln PL Light (L) ND6 Strand 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 13.014.0 15.0 16.569 kb Ser Thr Thr His Arg Gly y Lys Asp Trp f-Met Ile Leu L Val Phe Heavy (H) OH cyt b ND5 ND4/ND4L ND3 COX3 ATPase 6/8 COX2 COX1 ND2 ND1 16S rRNA 12S rRNA Strand PH2 PH1 FIGURE 35–8 Map of human mitochondrial genes. The maps represent the so-called light (L; upper) and heavy (H; lower) strands of the 16,569 base pair linearized mitochondrial (mt) DNA The maps show the mt genes encoding subunits of NADH-coenzyme Q oxidoreductase (ND1 through ND6), cytochrome c oxidase (COX1 through COX3), cytochrome b (cyt b), ATP synthase (ATPase 6 and 8) and the 12S and 16S mt ribosomal rRNAs. Mt transfer RNA (tRNA) encoding genes are denoted by small yellow boxes and the three-letter code indicating the cognate amino acids which they specify during mt translation. 
The origin of heavy-strand (OH) and light-strand (OL) DNA replication, as well as the promoters for the initiation of heavy-strand (PH1 and PH2) and light-strand (PL) gene transcription, are indicated by arrows and letters (see also Table 57–3). Figure generated using Homo sapiens mitochondrion, complete genome; Sequence: NCBI Reference NC_012920.1 and annotations therein.

TABLE 35–3 Major Features of Human Mitochondrial DNA
  Is circular, double-stranded, and composed of heavy (H) and light (L) chains or strands
  Contains 16,569 bp
  Encodes 13 protein subunits of the respiratory chain (of a total of about 67)
    Seven subunits of NADH dehydrogenase (complex I)
    Cytochrome b of complex III
    Three subunits of cytochrome oxidase (complex IV)
    Two subunits of ATP synthase
  Encodes large (16S) and small (12S) mt ribosomal RNAs
  Encodes 22 mt tRNA molecules
  Genetic code differs slightly from the standard code
    UGA (standard stop codon) is read as Trp
    AGA and AGG (standard codons for Arg) are read as stop codons
  Contains very few untranslated sequences
  High mutation rate (5-10 times that of nuclear DNA)
  Comparisons of mtDNA sequences provide evidence about evolutionary origins of primates and other species
Data from Harding AE. Neurological disease and mitochondrial genes. Trends Neurosci. 1991;14(4):132-138.

Such insertions or deletions are termed indels. Indels often result in a mutation whose consequences are discussed in detail in Chapter 37.

Chromosomal Recombination Is One Way of Rearranging Genetic Material

Genetic information can be exchanged between similar or homologous chromosomes. The exchange, or recombination event, occurs primarily during meiosis in mammalian cells and requires alignment of homologous metaphase chromosomes, an alignment that almost always occurs with great exactness. A process of chromosome (chromatid) crossing over occurs as shown in Figure 35–9. This usually results in an equal and reciprocal exchange of genetic information between homologous chromosomes. If the homologous chromosomes possess different alleles (ie, gene/DNA sequence variants) of the same genes, the crossover may produce noticeable and heritable genetic linkage differences. In the rare case where the alignment of homologous chromosomes is not exact, the crossing over, or recombination event, may result in an unequal exchange of information. One chromosome may receive less genetic material and thus a deletion, while the other partner of the chromosome pair receives more genetic material and thus an insertion or duplication. One well-studied example of unequal crossing that occurs in humans involves the genes encoding hemoglobins. Unequal crossing over results in a human hemoglobinopathy designated Lepore and anti-Lepore (Figure 35–10).

FIGURE 35–9 The process of crossing over between homologous metaphase chromosomes to generate recombinant chromosomes. See also Figure 35–12.

FIGURE 35–10 The process of unequal crossover in the region of the mammalian genome that harbors the structural genes encoding hemoglobins and the generation of the unequal recombinant products hemoglobin delta-beta Lepore and beta-delta anti-Lepore. The examples given show the locations of the crossover regions within amino acid coding regions of the indicated genes (ie, β and δ globin genes). (Modified with permission from Clegg JB, Weatherall DJ: β0 Thalassemia: time for a reappraisal? Lancet 1974;304(7873):133-135.)

The farther apart any two genes are on an individual chromosome, the greater the likelihood of a crossover recombination event. This is the basis for genetic mapping methods. Unequal crossover affects tandem arrays of repeated DNAs whether they are related globin genes, as in Figure 35–10, or more abundant repetitive DNA. Unequal crossover through slippage in the base pairing can result in expansion or contraction in the copy number of the repeat family and may contribute to the expansion and fixation of variant members throughout the repeat array.

Some Viruses Chromosomally Integrate Their Genomes in Infected Cells

Some bacterial viruses (bacteriophages) are capable of recombining with the DNA of a bacterial host in such a way that the genetic information of the bacteriophage is incorporated in a linear fashion into the genetic information of the host. This integration, which is a form of recombination, occurs by the mechanism illustrated in Figure 35–11. The backbone of the circularized bacteriophage genome is broken, as is that of the DNA molecule of the host; the appropriate ends are resealed with the proper polarity. The bacteriophage DNA is figuratively straightened out (“linearized”) as it is integrated into the bacterial DNA molecule, frequently a closed circle as well. The site at which the bacteriophage genome integrates or recombines with the bacterial genome is chosen by one of two mechanisms. If the bacteriophage contains a DNA sequence homologous to a sequence in the host DNA molecule, then a recombination event analogous to that occurring between homologous chromosomes can occur. However, some bacteriophages synthesize proteins that bind specific sites on bacterial chromosomes to a nonhomologous site characteristic of the bacteriophage DNA molecule. Integration occurs at the site and is said to be “site specific.”

FIGURE 35–11 The integration of a circular genome from a virus (with genes A, B, and C) into the DNA molecule of a host (with genes 1 and 2) and the consequent ordering of the genes.

Many animal viruses, particularly the oncogenic viruses (either directly or, in the case of RNA viruses such as HIV that causes AIDS, double-stranded DNA copies generated by the action of the viral RNA-dependent DNA polymerase, or reverse transcriptase), can be integrated into chromosomes of the mammalian cell. Integration of the viral DNA into the genome of the infected cells generally is not “site specific” but does display site preferences. Not surprisingly, a subset of such integration events is mutagenic.

Transposition Can Produce Processed Genes

In eukaryotic cells, small DNA elements that clearly are not viruses are capable of transposing themselves in and out of the host genome in ways that affect the function of neighboring DNA sequences. These mobile elements, sometimes called “jumping DNA,” or jumping genes, can carry flanking regions of DNA and, therefore, profoundly affect evolution. As mentioned earlier, the Alu family of moderately repeated DNA sequences has structural characteristics similar to the termini of retroviruses, which would account for the ability of the latter to move into and out of the mammalian genome. Direct evidence for the transposition of other small DNA elements into the human genome has been provided by the discovery of “processed genes” for immunoglobulin molecules, α-globin molecules, and many others.
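Table 35–3 notes that the mitochondrial genetic code reassigns a few codons. The sketch below shows the effect of exactly the three differences the table lists (UGA read as Trp; AGA and AGG read as stop). The codon table is a deliberately tiny subset chosen for the demonstration, and the names `STANDARD_SUBSET`, `MITO_OVERRIDES`, and `translate` are assumptions made for illustration.

```python
# A toy subset of the standard genetic code, just enough for the example.
STANDARD_SUBSET = {
    "AUG": "Met", "UGG": "Trp",
    "AGA": "Arg", "AGG": "Arg",
    "UGA": "Stop", "UAA": "Stop",
}

# Only the differences listed in Table 35-3 are modeled here.
MITO_OVERRIDES = {"UGA": "Trp", "AGA": "Stop", "AGG": "Stop"}
MITO_SUBSET = {**STANDARD_SUBSET, **MITO_OVERRIDES}


def translate(rna: str, table: dict) -> list:
    """Translate codon by codon from position 0 until a stop codon is read."""
    peptide = []
    for i in range(0, len(rna) - 2, 3):
        residue = table[rna[i:i + 3]]
        if residue == "Stop":
            break
        peptide.append(residue)
    return peptide


message = "AUGUGAUGGAGA"  # hypothetical reading frame: AUG UGA UGG AGA

standard_peptide = translate(message, STANDARD_SUBSET)
mito_peptide = translate(message, MITO_SUBSET)
print(standard_peptide, mito_peptide)
```

Under the standard subset, translation halts at UGA after the first residue, whereas under the mitochondrial overrides UGA is read through as Trp and the frame terminates at AGA. A real translator would need the full 64-codon table; the subset will raise a `KeyError` on any codon it does not contain.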
These processed genes consist of DNA sequences identical or nearly identical to those of the messenger RNA for the appropriate gene product. That is, the 5′-nontranslated region, the coding region without intron representation, and the 3′ poly(A) tail are all present contiguously. This particular DNA sequence arrangement must have resulted from the reverse transcription of an appropriately processed messenger RNA molecule from which the intron regions had been removed and the poly(A) tail added. The only recognized mechanism that this reverse transcript could have used to integrate into the genome would have been a transposition event. In fact, these “processed genes” have short terminal repeats at each end, as do known transposed sequences in other organisms. In the absence of their transcription and thus genetic selection for function, many of the processed genes have been randomly altered through evolution so that they now contain nonsense codons that preclude their ability to encode a functional, intact protein even if they could be transcribed (see Chapter 37). Thus, such transposed sequences are referred to as “pseudogenes.”

Gene Conversion Produces Rearrangements

Besides unequal crossover and transposition, a third mechanism can effect rapid changes in the genetic material. Similar sequences on homologous or nonhomologous chromosomes may occasionally pair up and eliminate any mismatched sequences between them. This may lead to the accidental fixation of one variant or another throughout a family of repeated sequences and thereby homogenize the sequences of the members of repetitive DNA families. This process is referred to as gene conversion.

Sister Chromatids Exchange

In diploid eukaryotic organisms such as humans, after cells progress through the DNA synthetic, or S phase of the mitotic cell cycle (see Figure 35–20), they contain a tetraploid content of DNA. This is in the form of sister chromatids of chromosome pairs (see Figure 35–6). Each of these sister chromatids contains identical genetic information since each is a product of the semiconservative replication of the original parent DNA molecule of that chromosome. Crossing over can occur between these genetically identical sister chromatids. Of course, these sister chromatid exchanges (Figure 35–12) have no genetic consequence as long as the exchange is the result of an equal crossover.

FIGURE 35–12 Sister chromatid exchanges between human chromosomes. The exchanges are detectable by Giemsa staining of the chromosomes of cells replicated for two cycles in the presence of bromodeoxyuridine (BrdU; 5-Bromo-2′-deoxyuridine; see Figure 32–13 with a depiction of 5-Iodo-2′-deoxyuridine). The arrows indicate some regions of exchange. DNA synthesized with the thymine analog BrdU appears black in this image. (Reproduced with permission from S Wolff and J Bodycote.)

Immunoglobulin Genes Rearrange

In mammalian cells, some interesting gene rearrangements occur normally during development and differentiation. For example, the VL and CL genes, which encode for the immunoglobulin G (IgG) light-chain variable (VL) and constant (CL) portions of the IgG light chain in a single IgG molecule (see Chapters 38, 52), are widely separated in the germ line DNA. In the DNA of a differentiated IgG-producing (plasma) cell, the same VL and CL genes have been moved physically closer, and linked together in the genome within a single transcription unit. However, even then, this rearrangement of DNA during differentiation does not bring the VL and CL genes into contiguity in the DNA. Instead, the DNA contains an intron of about 1200 bp at or near the junction of the V and C regions. This intron sequence is transcribed into RNA along with the VL and CL exons, and the interspersed, intronic non-IgG sequence information is removed from the RNA during its nuclear processing via mRNA splicing (see Chapters 36 and 38).

DNA SYNTHESIS & REPLICATION ARE RIGIDLY CONTROLLED

The primary function of DNA replication is the provision of progeny with the genetic information possessed by the parent. Thus, the replication of DNA must be complete and carried out in such a way as to maintain genetic stability within the organism and the species. The process of DNA replication is complex and involves many cellular functions and several verification procedures to ensure fidelity in replication. About 30 proteins are involved in the replication of the Escherichia coli chromosome; this process is significantly more complex in eukaryotic organisms.

In all cells, replication can occur only from a single-stranded DNA (ssDNA) template. Therefore, mechanisms must exist to target the site of initiation of replication and to unwind the dsDNA in that region. The replication complex must then form. After replication is complete in an area, the parent and daughter strands must reform dsDNA. In eukaryotic cells, an additional step must occur. The dsDNA must reform the chromatin structure, including nucleosomes that existed prior to the onset of replication. Although this entire process is not completely understood in eukaryotic cells, replication has been quite precisely described in prokaryotic cells, and the general principles are the same in both. The major steps are listed in Table 35–4, illustrated in Figure 35–13, and discussed, in sequence, in the following discussion. A number of proteins, most with specific enzymatic action, are involved in this process (Table 35–5).

TABLE 35–4 Steps Involved in DNA Replication in Eukaryotes
  1. Identification of the origins of replication
  2. ATP hydrolysis-driven removal of nucleosomes and unwinding of dsDNA to provide an ssDNA template
  3. Formation of the replication fork; synthesis of RNA primer
  4. Initiation of DNA synthesis and elongation
  5. Formation of replication bubbles with ligation of the newly synthesized DNA segments
  6. Reconstitution of chromatin structure

FIGURE 35–13 Steps involved in DNA replication. This figure describes DNA replication in an E. coli cell, but the general steps are similar in eukaryotes. A specific interaction of a protein (the dnaA protein) to the origin of replication (oriC) results in local unwinding of DNA at an adjacent A+T-rich region. The DNA in this area is maintained in the single-strand conformation (ssDNA) by single-strand-binding proteins (SSBs). This allows a variety of proteins, including helicase, primase, and DNA polymerase, to bind and to initiate DNA synthesis. The replication fork proceeds as DNA synthesis occurs continuously (long red arrow) on the leading strand and discontinuously (short black arrows) on the lagging strand. The nascent DNA is always synthesized in the 5′ to 3′ direction, as DNA polymerases can add a nucleotide only to the 3′ end of a DNA strand.

The Origin of Replication

At the origin of replication (ori), there is an association of sequence-specific dsDNA-binding proteins with a series of direct repeat DNA sequences. In E. coli, the origin of DNA synthesis termed oriC is bound by the protein dnaA, which forms a complex consisting of 150 to 250 bp of DNA and multimers of this single-strand DNA-binding protein. This binding event leads to the local denaturation and unwinding of an adjacent A+T-rich region of DNA. Functionally similar autonomously replicating sequences (ARS) or replicators have been identified in yeast cells. The ARS contains a somewhat
degenerate 11-bp sequence called the origin replication element (ORE). The ORE binds a set of proteins, analogous to the dnaA protein of E. coli; the group of proteins is collectively called the origin recognition complex (ORC). ORC homologs have been found in all eukaryotes examined. The ORE is located adjacent to an approximately 80-bp A+T-rich sequence that is easy to unwind. This is called the DNA unwinding element (DUE). The DUE is the origin of replication in yeast and is bound by the MCM protein complex.

TABLE 35–5 Classes of Proteins Involved in Replication
  Protein                               Function
  DNA polymerases                       Deoxynucleotide polymerization
  Helicases                             ATP-driven processive unwinding of DNA
  Topoisomerases                        Relieve torsional strain that results from helicase-induced unwinding
  DNA primase                           Initiates synthesis of RNA primers
  Single-strand binding proteins (SSBs) Prevent premature reannealing of ssDNA strands to form dsDNA
  DNA ligase                            Seals the single-strand nick between the nascent chain and Okazaki fragments on lagging strand

Consensus sequences similar in sequence to oriC or ARS in structure have not been precisely defined in mammalian cells. However, several of the proteins that participate in ori recognition and function have been identified in human cells and appear quite similar to their yeast counterparts in both amino acid sequence and function. This fact argues that at the functional level ORE-like elements exist in humans.

Unwinding of DNA

The interaction of proteins with ori defines the start site of replication and provides a short region of ssDNA essential for initiation of synthesis of the nascent DNA strand. This process requires the formation of a number of protein–protein and protein-DNA interactions. A critical step is provided by a DNA helicase that allows for processive unwinding of DNA. This function is provided by a complex of dnaB helicase and the dnaC protein. Single-stranded DNA-binding proteins (SSBs) stabilize this complex once formed.

Formation of the Replication Fork

A replication fork consists of four components that form in the following order: (1) the DNA helicase unwinds a short segment of the parental duplex DNA; (2) SSBs bind to ssDNA and prevent premature reannealing of ssDNA back into dsDNA; (3) a primase initiates synthesis of an RNA molecule that is essential for priming DNA synthesis; and (4) the DNA polymerase initiates nascent, daughter-strand synthesis on the free 3′-OH of the primer.

The DNA polymerase III enzyme (the dnaE gene product in E. coli) binds to template DNA as part of a multiprotein complex that consists of several polymerase accessory factors (β′, γ, δ, δ′, and τ). DNA polymerases only synthesize DNA in the 5′ to 3′ direction, and only one of the several different types of polymerases is involved at the replication fork. Because the DNA strands are antiparallel (see Chapter 34), the polymerase functions asymmetrically. On the leading (forward) strand, the DNA is synthesized continuously. On the lagging (retrograde) strand, the DNA is synthesized in short (1-5 kb; see Figure 35–16) fragments, the so-called Okazaki fragments, so named after the scientist who discovered them. Several Okazaki fragments (up to 1000) must be sequentially synthesized for each replication fork. To ensure that this happens, the helicase acts on the lagging strand to unwind dsDNA in a 5′ to 3′ direction. The helicase associates with the primase to afford the latter proper access to the template. This allows the RNA primer to be made and, in turn, the polymerase to begin replicating the DNA. This is an important reaction sequence since DNA polymerases cannot initiate DNA synthesis de novo. The mobile complex between helicase and primase has been called a primosome. As the synthesis of an Okazaki fragment is completed and the polymerase is released, a new primer has been synthesized. The same polymerase molecule remains associated with the replication fork and proceeds to synthesize the next Okazaki fragment.

The DNA Polymerase Complex

A number of different DNA polymerase molecules engage in DNA replication. These share three important properties: (1) chain elongation, (2) processivity, and (3) proofreading. Chain elongation accounts for the rate (in nucleotides per second; nt/s) at which polymerization (ie, phosphodiester bond formation) occurs. Processivity is an expression of the number of nucleotides added to the nascent chain before the polymerase disengages from the template. The proofreading function identifies copying errors and corrects them. In E. coli, DNA polymerase III (pol III) functions at the replication fork. Of all polymerases, it catalyzes the highest rate of chain elongation and is the most processive. It is capable of polymerizing 0.5 Mb of DNA during one cycle on the leading strand. Pol III is a large (>1 MDa), multisubunit protein complex in E. coli. DNA pol III associates with the two identical β subunits of the DNA sliding “clamp”; this association dramatically increases pol III-DNA complex stability, processivity (100 to >50,000 nucleotides) and rate of chain elongation (20-50 nt/s), generating the high degree of processivity the enzyme exhibits.

Polymerase I (pol I) and II (pol II) are mostly involved in proofreading and DNA repair. Eukaryotic cells have counterparts for each of these enzymes plus a large number of additional DNA polymerases primarily involved in DNA repair. A comparison is shown in Table 35–6.

In mammalian cells, the polymerase is capable of polymerizing at a rate that is somewhat slower than the rate of polymerization of deoxynucleotides by the bacterial DNA polymerase complex. This reduced rate may result from interference by nucleosomes.
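The discontinuous synthesis on the lagging strand, described above under Formation of the Replication Fork, can be pictured as cutting the template into consecutive stretches, each started from its own RNA primer. The sketch below is a toy model under that assumption only; the function name and the example lengths are hypothetical, and it ignores primer removal, gap filling, and ligation.

```python
def okazaki_fragments(template_len: int, fragment_len: int) -> list:
    """Split a lagging-strand template into consecutive fragments.

    Returns half-open (start, end) intervals, in the order in which the
    advancing fork exposes the template. Each interval stands for one
    Okazaki fragment and therefore one RNA primer.
    """
    if template_len < 0 or fragment_len <= 0:
        raise ValueError("template_len must be >= 0 and fragment_len > 0")
    return [
        (start, min(start + fragment_len, template_len))
        for start in range(0, template_len, fragment_len)
    ]


# Hypothetical numbers: 10 kb of lagging-strand template, 2 kb fragments.
fragments = okazaki_fragments(10_000, 2_000)
primers_needed = len(fragments)
print(primers_needed, fragments[0], fragments[-1])
```

The point of the model is only that the number of primers scales with template length divided by fragment length, which is why many fragments must be made sequentially at each fork.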
TABLE 35–6 A Comparison of Prokaryotic & Eukaryotic DNA Polymerases
  E. coli   Eukaryotic   Function
  I         -            Gap filling following DNA replication, repair, and recombination
  II        -            DNA proofreading and repair
  -         β            DNA repair
  -         γ            Mitochondrial DNA synthesis
  III       ε            Processive, leading strand synthesis
  DnaG      α            Primase
  -         δ            Processive, lagging strand synthesis

Initiation & Elongation of DNA Synthesis

The initiation of DNA synthesis (Figure 35–14) requires priming by a short length of RNA, about 10 to 200 nucleotides long. In E. coli primer formation is catalyzed by dnaG (primase); in eukaryotes DNA Pol α synthesizes these RNA primers. Once complete, DNA synthesis commences on this short RNA primer. The priming process involves nucleophilic attack by the 3′-hydroxyl group of the RNA primer on the phosphate of the first entering deoxynucleoside triphosphate (N in Figure 35–14) with the splitting off of pyrophosphate; this transition to DNA synthesis is catalyzed by the appropriate DNA polymerases (DNA pol III in E. coli; DNA pol δ and ε in eukaryotes). The 3′-hydroxyl group of the recently attached deoxyribonucleoside monophosphate is then free to carry out a nucleophilic attack on the next entering deoxyribonucleoside triphosphate (N + 1 in Figure 35–14), again at its α phosphate moiety, with the splitting off of pyrophosphate. Of course, selection of the proper deoxyribonucleotide whose terminal 3′-hydroxyl group is to be attacked is dependent on proper base pairing with the other strand of the DNA molecule according to Watson and Crick base pairing rules (Figure 35–15). When an adenine deoxyribonucleoside monophosphoryl moiety is in the template position, a thymidine triphosphate will interact with the dXTP binding site of the DNA polymerase and its α phosphate will be attacked by the 3′-hydroxyl group of the deoxyribonucleoside monophosphoryl most recently added to the polymer. By this stepwise process, the template dictates which deoxyribonucleoside triphosphate is complementary, and by hydrogen bonding, holds it in place while the 3′-hydroxyl group of the growing strand attacks and incorporates the new nucleotide into the polymer. These segments of DNA attached to an RNA primer component are the Okazaki fragments (Figure 35–16). In both bacteria and mammals, after many Okazaki fragments are generated, the replication complex begins to remove the RNA primers, to fill in the gaps left by their removal with the proper base-paired deoxynucleotide, and then to seal the fragments of newly synthesized DNA by enzymes referred to as DNA ligases.

Replication Exhibits Polarity

As has already been noted, DNA molecules are double stranded and the two strands are antiparallel. The replication of DNA in prokaryotes and eukaryotes occurs on both strands simultaneously. However, an enzyme capable of polymerizing DNA in the 3′ to 5′ direction does not exist in any organism, so that both of the newly replicated DNA strands cannot grow in the same direction simultaneously. Nevertheless, in bacteria the same enzyme does replicate both strands at the same time (in eukaryotes pol ε and pol δ catalyze leading and lagging strand synthesis; see Table 35–6). The single enzyme replicates one strand (“leading strand”) in a continuous manner in the 5′ to 3′ direction, with the same overall forward direction. It replicates the other strand (“lagging strand”) discontinuously while polymerizing the nucleotides in short spurts of 150 to 250 nucleotides, again in the 5′ to 3′ direction, but at the same time it faces toward the back end of the preceding RNA primer rather than toward the unreplicated portion. This process of semidiscontinuous DNA synthesis is shown diagrammatically in Figures 35–13 and 35–16.

Formation of Replication Bubbles

Replication of the circular bacterial chromosome, composed of roughly 5 × 10^6 bp of DNA, proceeds from a single ori. This process is completed in about 30 minutes, a replication rate of 3 × 10^5 bp/min. The entire mammalian genome replicates in approximately 9 hours, the average period required for formation of a tetraploid genome from a diploid genome in a replicating cell. If a mammalian genome (3 × 10^9 bp) replicated at the same rate as bacteria (ie, 3 × 10^5 bp/min) from but a single ori, replication would take over 150 hours! Metazoan organisms get around this problem using two strategies. First, replication is bidirectional. Second, replication proceeds from multiple origins in each chromosome (a total of as many as 100 in humans). Thus, replication occurs in both directions along all of the chromosomes, and both strands are replicated simultaneously. This replication process generates “replication bubbles” (Figure 35–17).

The multiple ori sites that serve as origins for DNA replication in eukaryotes are poorly defined except in a few animal viruses and in yeast. However, it is clear that initiation is regulated both spatially and temporally, since clusters of adjacent sites initiate replication synchronously. Replication firing, or DNA replication initiation at a replicator/ori, is influenced by a number of distinct properties of chromatin structure that are just beginning to be understood. It is clear, however, that there are more replicators and excess ORC than needed to replicate the mammalian genome within the time of a typical S phase. Therefore, mechanisms for controlling the excess ORC-bound replicators must exist. Understanding the control of the formation and firing of replication complexes is one of the major challenges in this field.

During the replication of DNA, there must be a separation of the two strands to allow each to serve as a template
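The "over 150 hours" figure in the discussion of replication bubbles follows directly from the quoted genome size and bacterial rate, and the same arithmetic gives a rough lower bound on how many origins are needed. This is a back-of-the-envelope sketch: it assumes, purely for illustration, that every origin contributes the full bacterial rate of 3 × 10^5 bp/min, and the function name is hypothetical.

```python
import math


def replication_hours(genome_bp: float, rate_bp_per_min: float, n_origins: int = 1) -> float:
    """Hours to copy genome_bp if n_origins each work at rate_bp_per_min."""
    if n_origins < 1:
        raise ValueError("n_origins must be at least 1")
    return genome_bp / (rate_bp_per_min * n_origins) / 60


MAMMALIAN_GENOME_BP = 3e9      # from the text
BACTERIAL_RATE = 3e5           # bp/min, from the text
S_PHASE_HOURS = 9              # approximate replication time quoted in the text

# One origin at the bacterial rate: well over 150 hours.
single_origin_hours = replication_hours(MAMMALIAN_GENOME_BP, BACTERIAL_RATE)

# Smallest number of such origins that would finish within about 9 hours.
min_origins = math.ceil(single_origin_hours / S_PHASE_HOURS)

print(f"{single_origin_hours:.0f} h from one origin; at least {min_origins} origins for {S_PHASE_HOURS} h")
```

Because mammalian forks move more slowly than bacterial ones, as noted earlier in the chapter, the real requirement is considerably higher than this idealized minimum.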
by hydrogen bonding its nucleotide bases to the incoming CHAPTER 35 DNA Organization, Replication, & Repair 375 X1 C O H H H H X2 OH C O O RNA primer P O H H H H X3 OH C O O P O H H H H X4 OH C O O P O H H H H OH OH N C O O O P First entering dNTP O O O– P H H H H O– O O– P OH H O O– X4 C O O P O H H H H N OH C O O P O H H H H H OH N+1 C O O O P Second entering dNTP O O O – P H H – H H O O O– P OH H O O– FIGURE 35–14 The initiation of DNA synthesis upon a primer of RNA and the subsequent attachment of the second deoxyribo- nucleoside triphosphate. Note the blue highlighting of the 2’-H moiety of deoxyribose and the yellow highlighting of the 2′-OH moiety within the RNA primer. 376 SECTION VII Structure, Function, & Replication of Informational Macromolecules FIGURE 35–15 The RNA-primed synthesis of DNA demonstrating the template function of the complementary strand of parental DNA. FIGURE 35–16 The discontinuous polymerization of deoxyribonucleotides on the lagging strand; formation of Okazaki fragments during lagging strand DNA synthesis is illustrated. Okazaki fragments are 100 to 250 nucleotides long in eukaryotes, 1000 to 2000 nucleo- tides in prokaryotes. FIGURE 35–17 The generation of “replication bubbles” during the process of DNA synthesis. The bidirectional replication and the proposed positions of unwinding proteins at the replication forks are depicted. CHAPTER 35 DNA Organization, Replication, & Repair 377 deoxynucleoside triphosphate. The separation of the DNA interspersed in the DNA molecules of all organisms. The swivel strands is promoted by SSBs in E. coli, and a protein termed function is provided by specific enzymes that introduce “nicks” replication protein A (RPA) in eukaryotes. These molecules in one strand of the unwinding double helix, thereby allow- stabilize the single-stranded structure as the replication fork ing the unwinding process to proceed. The nicks are quickly prog
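The single-origin estimate quoted in the text (a 3 × 10⁹ bp genome copied at 3 × 10⁵ bp/min) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name and its parameters are hypothetical, and it assumes that every origin fires at the same moment, that origins are evenly spaced, and that each fork moves at the quoted bacterial rate. None of these assumptions is claimed by the chapter for real mammalian cells.

```python
# Back-of-the-envelope replication times using the figures quoted in the text.
BACTERIAL_RATE_BP_PER_MIN = 3e5  # replication rate quoted for bacteria
MAMMALIAN_GENOME_BP = 3e9        # mammalian genome size quoted in the text
OBSERVED_S_PHASE_HOURS = 9       # approximate mammalian replication time


def replication_hours(genome_bp, rate_bp_per_min, origins=1, bidirectional=False):
    """Idealized hours needed to copy genome_bp.

    Assumes (for illustration only) evenly spaced origins that all fire
    together and forks that each advance at rate_bp_per_min.
    """
    forks = origins * (2 if bidirectional else 1)
    minutes = genome_bp / (rate_bp_per_min * forks)
    return minutes / 60


single_origin = replication_hours(MAMMALIAN_GENOME_BP, BACTERIAL_RATE_BP_PER_MIN)
print(f"Single origin, one direction: {single_origin:.0f} h")  # 167 h

# Fold reduction needed to reach the observed replication time.
fold_needed = single_origin / OBSERVED_S_PHASE_HOURS
print(f"Fold speed-up needed: {fold_needed:.1f}")  # 18.5
```

The result, about 167 hours, agrees with the "over 150 hours" stated in the text. Passing `origins` greater than 1 or `bidirectional=True` shows how the two strategies described above shorten the idealized time in proportion to the number of active forks.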
