Topic 14 Nucleic Acid Structure Lecture Notes 2024 PDF
Uploaded by ResplendentHydrogen
Western University
2024
Summary
These are lecture notes covering nucleic acid structure, focusing on the chemical structure of nucleic acid polymers and the differences between DNA and RNA. The notes also discuss the structure of the double helix and the factors that influence the melting temperature (Tm).
Topic 14 Nucleic Acid Structure
Readings: Sections 3.1, 3.2, 20.5 (Eukaryotic DNA is packaged in nucleosomes)

By the end of this topic, you should be able to:
- Explain the chemical structure of nucleic acid polymers, without memorizing the structures of the nitrogenous bases
- Describe B-DNA and the higher order structures formed by RNA, and the important forces that stabilize these structures
- Predict the impact of changes in sequence or conditions on the melting temperature of a DNA double helix
- Identify chemical and structural similarities and differences between DNA and RNA
- Explain the different levels of structural organization displayed by DNA in eukaryotic cells
- Describe the structure of the nucleosome

The two major types of nucleic acid in living systems are ribonucleic acid (RNA) and deoxyribonucleic acid (DNA). DNA is a crucial molecule, because it holds the genetic material, the information passed from parent(s) to offspring that is required to produce an organism. RNA has several different functions, including acting as the template from which proteins are produced (messenger RNA or mRNA), forming the ribosome (along with proteins) and catalyzing protein synthesis (ribosomal RNA or rRNA), and carrying amino acids to the growing peptide chain during protein synthesis (transfer RNA or tRNA). RNA is also involved in gene regulation and processes like mRNA splicing and telomere maintenance.

RNA and DNA are linear polymers composed of ribonucleotide and deoxyribonucleotide monomers, respectively. Each monomer is composed of three parts: a monosaccharide, a nitrogenous base, and a phosphate (see Guided Tour, Nucleic Acid Structure).

Monosaccharides
Ribonucleotides contain the pentose ribose. Deoxyribonucleotides contain deoxyribose, which is like ribose, except that C2' bears two hydrogen atoms, instead of one hydrogen and one hydroxyl.

Nitrogenous bases
There are two types of nitrogenous base: purines and pyrimidines.
All purines and pyrimidines are planar and relatively hydrophobic. The two purines found in DNA and RNA are adenine (A) and guanine (G). The three pyrimidines are cytosine (C), which is found in both DNA and RNA, thymine (T), which is found in DNA but not RNA, and uracil (U), which is found in RNA but not DNA. Uracil is identical to thymine except that uracil lacks the methyl (-CH3) group. Purines and pyrimidines are joined to the anomeric (1') carbon of ribose or deoxyribose via an N-glycosidic bond. The anomeric carbon is ordinarily in the β configuration when joined to a nitrogenous base. A sugar joined to a nitrogenous base is called a nucleoside. The five common nucleosides are adenosine, guanosine, cytidine, thymidine, and uridine (Table 3.1).

Phosphates
Phosphates are attached to the 5' position of ribose or deoxyribose by a phosphoester linkage. After a phosphate is attached to this position of a nucleoside, a nucleotide is formed (i.e., a nucleotide has all three groups: a sugar, a base, and one or more phosphates). Phosphate groups are negatively charged at cellular pH. The names of nucleotides are usually abbreviated using the letter for the nitrogenous base followed by an indication of the number of phosphates present (e.g., ADP for adenosine diphosphate). Sometimes (but not always) the abbreviation for a deoxyribonucleotide will begin with a small d (e.g., dTTP).

Polynucleotide Structure
Mononucleotides are linked together to form an unbranched polynucleotide chain through 3',5' phosphodiester bonds. That is, the phosphate group attached to the 5' carbon of one pentose forms a phosphoester bond with the hydroxyl group at the 3' position of another pentose (see Guided Tour, Nucleic Acid Structure, and Chapter 3.2). The end of a polynucleotide at which the 5' carbon is “free”, or not attached to another monosaccharide, is called the 5' end; likewise, at the 3' end the 3' carbon is free.
The sequence of the polynucleotide is represented by giving the letter for each nucleotide base in order, e.g., CAAGTG. By convention, polynucleotides are written in the 5' to 3' direction from left to right, unless otherwise indicated. Usually, the context makes it clear whether the polynucleotide is DNA or RNA. All organisms have natural processes to synthesize DNA and RNA with specific sequences (see Topics 15, 18, and 19). Scientists have developed accurate, automated chemical methods to synthesize single-stranded oligonucleotides (or “oligos” for short) of up to about 200 nucleotides. Longer sequences can be obtained by joining strands together. Single-stranded oligos have many uses in molecular biology (e.g., as primers for DNA sequencing or PCR), as outlined in Topic 22.

The double helix
A single strand of DNA is normally not found in the cell. Almost always, two polynucleotide DNA strands associate with each other non-covalently. The strands are antiparallel; that is, they have opposite orientation or chemical "polarity". One strand runs in a 5' to 3' direction, and the other runs 3' to 5':

5' ------------------ 3'
3' ------------------ 5'

The two polynucleotides are wrapped around each other to form a double helix (Fig 3.3). The double helix is normally right-handed, meaning that if you look at the helix along its axis, and move your gaze along one strand from a point close to you to a point farther away, the curve is in a clockwise direction. (Can you relate this to the structure of your right hand?) The base pairs lie on the inside and the phosphates are on the outside. This arrangement minimizes charge repulsion between the phosphates and allows for salt stabilization of the phosphates’ negative charges.

Two types of interactions between the bases stabilize the double helix:

1. Base stacking
The bases are planar aromatic rings that are nearly perpendicular to the helical axis.
Their flat surfaces allow them to stack on top of each other, stabilized by interactions between transient induced dipoles in the rings (known as dispersion forces or London forces). The atoms in each base pair lie at about their optimal van der Waals contact distance from those of the adjacent base pair. Base stacking is not sequence-specific, in that any base can stack on top of any other base. Base stacking interactions are the major contributor to the stability of the double helix.

2. Base pairing
Bases also interact through hydrogen bonds between chemical groups on the edges of their ring structures. These interactions are sequence-specific. In normal double-stranded DNA, A and T base pair with each other and no other base; likewise, G and C base pair exclusively with each other. When the bases are positioned appropriately, the hydrogen bond donors and acceptors of these pairs are perfectly complementary to each other. The AT pair forms two hydrogen bonds, and the GC pair forms three. These base pairing arrangements are called Watson-Crick base pairing, after James Watson and Francis Crick, the scientists who first proposed the correct structure of DNA (though credit should also have been given to Rosalind Franklin).

Because A pairs only with T, and G pairs only with C (and vice versa), the sequence on one strand dictates the sequence on the other (i.e., if you know the sequence of one strand, you know the sequence of both; such strands are said to be complementary strands). Note that each base pair consists of one purine and one pyrimidine base, giving a uniform base pair size, so that the two sugar-phosphate backbones stay the same distance apart. Also, because water is excluded from the interior of the double helix, it cannot compete with the bases for hydrogen bonding positions.

DNA can form different types of double helices, with different geometries. In cells, DNA usually adopts a structure called the “B” form.
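Because each strand dictates the other and the two strands are antiparallel, the partner of a strand written 5' to 3' is obtained by complementing every base and then reading the result backwards, so that the partner is also written 5' to 3'. A minimal Python sketch of this rule follows; it assumes a DNA sequence made only of the letters A, C, G and T, and the function name `reverse_complement` is our own choice, not something from the notes.

```python
# Watson-Crick partners in double-stranded DNA: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the complementary DNA strand.

    Input and output are both written 5'->3' (left to right), following the
    usual convention. The reversal is needed because the strands are antiparallel.
    """
    strand = strand.upper()
    unknown = set(strand) - set(PAIR)
    if unknown:
        raise ValueError(f"not a DNA base: {sorted(unknown)}")
    # Complement each base, then read the partner strand from its own 5' end.
    return "".join(PAIR[base] for base in reversed(strand))


if __name__ == "__main__":
    top = "CAAGTG"  # example sequence used in the notes
    print(f"5'-{top}-3'")
    print(f"5'-{reverse_complement(top)}-3'")  # prints 5'-CACTTG-3'
```

Applying the function twice returns the original sequence, which is another way of saying that each strand fully specifies the other.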
Features of the B-DNA double helix (Fig 3.3) are:
- it is narrow, about 2 nm (20 Å) wide
- it can be very long; human chromosome 1 consists of 245 million base pairs
- the distance between consecutive bases is about 0.34 nm (so chromosome 1 would be about 8.3 cm long if it were stretched out in a straight double helix)

Because of the positions at which the bases are attached to the sugar-phosphate backbone and the arrangement of hydrogen bonding donors and acceptors, the two grooves between the sugar-phosphate backbones are not equal in size. The wider groove is called the major groove, and the narrower one is the minor groove. Proteins use the chemical groups that project from the bases into the grooves to recognize the sequence of double-stranded DNA.

RNA
Differences between DNA and RNA chemical structure:
1. The sugar in RNA is ribose instead of deoxyribose. The presence of a hydroxyl group at the 2' position of ribose makes the phosphodiester bonds of RNA susceptible to hydrolysis by base (OH⁻). Unlike DNA, RNA is degraded into mononucleotides in basic solution.
2. RNA contains no thymine but has uracil instead. These bases are very similar, differing only by a methyl group. Thus, uracil can form a Watson-Crick base pair with adenine, just as thymine can.
3. RNA molecules are much shorter than DNA molecules (usually no longer than a few thousand nucleotides, though some RNAs have been found that are over 50 kilobases in length).
4. RNA is synthesized as a single-stranded molecule. When not base-paired with another molecule, an RNA strand folds into configurations that allow it to base pair with itself (Fig 22.2). Often different sections of one RNA strand interact with each other to form a right-handed, antiparallel double helix, but other structures can also arise, and base pairing frequently deviates from the Watson-Crick scheme (Fig 21.30). The three-dimensional structures of RNA molecules can be quite complex.
Like protein structure, RNA structure is sequence-dependent; this is less true for DNA, in which any sequence forms the same essential structure.
5. The cell is much more likely to chemically modify nitrogenous bases in RNA than in DNA (Fig 21.29). Over 100 different types of modified RNA bases have been identified, making up 1-2% of bases in tRNA and about 0.5% of bases in rRNA. These modified bases can influence the secondary and tertiary structure of RNA.

Hybridization of nucleic acids
Although we often think of double-stranded DNA in the context of an organism’s genome, any two nucleic acid strands (DNA or RNA, though for simplicity I’ll discuss only DNA in this section) will associate to form a helical structure (or hybridize) in solution if they are sufficiently complementary. This process is reversible, such that DNA strands can be separated (denatured or melted) and brought back together (renatured) by changing the conditions. Denaturation of DNA is often brought about by heating the sample. The melting temperature (Tm) is the temperature at which 50% of a specific double-stranded DNA has become single-stranded.

What determines the Tm?
Intrinsic factors
1. A:T / G:C ratio. The higher the GC content, the higher the Tm. GC base pairs are more stable mainly because of increased base-stacking interactions, but also because they have three hydrogen bonds (versus two in AT base pairs).
2. Length. Shorter double helices melt at lower temperatures.
3. Degree of complementarity. Imperfect matches have lower Tm than perfect matches.
Extrinsic factors
1. Salt concentration. Because counter-ions neutralize the negative charge of the phosphate backbone and reduce repulsion between strands, increasing the salt concentration raises the Tm.
2. Organic solvent concentration. Compounds such as dimethylformamide or isopropanol make the solvent less polar and weaken base stacking interactions, lowering the Tm.
3. Hydrogen-bonding compounds. Formamide and urea decrease Tm by weakening hydrogen bonding between the bases.
4. pH. Extremes of pH change the protonation states of the bases, reducing the Tm. pH changes within the range of 5 to 9 do not alter Tm.

For fragments of up to about 13 base pairs, the approximate melting temperature is the sum of 2°C for each A:T plus 4°C for each G:C base pair. More complicated equations exist to estimate the Tm of longer sequences, considering both sequence and environmental conditions.

Genome packaging (Chapter 20.5 – Eukaryotic DNA is packaged in nucleosomes)
The DNA of one human cell would be very long if the double helices were extended in a straight line. The cell must arrange the genome in a compact form so that it will fit within the nucleus. At the same time, the DNA must be available for transcription into RNA. This implies a high degree of organization in DNA packing. DNA is organized with the help of proteins; the complex of DNA and its organizing proteins is called chromatin.

In eukaryotes, DNA is associated with small proteins called histones, each of which has fewer than 200 amino acids. Because histones have many lysine and arginine residues, they are positively charged at physiological pH and ideally suited to interact with the negatively charged phosphate groups of DNA. Histones are highly conserved among eukaryotes, reflecting their important function. Bacteria organize their genomes differently than eukaryotes, and we will not discuss bacterial genome packaging in this course.

The DNA double helix wraps around a cluster of eight histone proteins: two copies each of histone H2A, histone H2B, histone H3 and histone H4 (Figure 20.31). The DNA makes about 1.7 turns around the histone core, a length of 147 base pairs. The wrapped DNA and histone core proteins are collectively called a nucleosome core particle.
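Two pieces of arithmetic in this topic can be checked with a few lines of Python: the short-duplex melting-temperature rule from the hybridization section (2°C per A:T plus 4°C per G:C, often called the Wallace rule) and the conversion from base pairs to length using the ~0.34 nm rise per base pair of B-DNA. The sketch below applies the length conversion to human chromosome 1 (245 million bp) and to the 147 bp wrapped around one nucleosome core. The function names are our own, and both functions give rough estimates only; the Tm rule ignores salt, solvent, pH and mismatches, and is meant only for duplexes of up to about 13 bp.

```python
RISE_NM_PER_BP = 0.34  # approximate distance between consecutive base pairs in B-DNA


def wallace_tm(strand: str) -> int:
    """Rough Tm in degrees C of a short, perfectly matched DNA duplex.

    Pass either strand; the count is the same for both. 2 degrees per A:T pair
    plus 4 degrees per G:C pair. Only meaningful up to about 13 base pairs.
    """
    s = strand.upper()
    at_pairs = s.count("A") + s.count("T")
    gc_pairs = s.count("G") + s.count("C")
    if at_pairs + gc_pairs != len(s):
        raise ValueError("sequence must contain only A, C, G and T")
    return 2 * at_pairs + 4 * gc_pairs


def b_dna_length_nm(base_pairs: int) -> float:
    """Length in nm of a straight B-DNA double helix with this many base pairs."""
    return base_pairs * RISE_NM_PER_BP


if __name__ == "__main__":
    print(wallace_tm("CAAGTG"))  # 3 A:T + 3 G:C -> 6 + 12 = 18
    chr1_cm = b_dna_length_nm(245_000_000) / 1e7  # 1 cm = 1e7 nm
    print(f"chromosome 1: {chr1_cm:.1f} cm")  # prints chromosome 1: 8.3 cm
    print(f"one nucleosome core: {b_dna_length_nm(147):.0f} nm")  # prints 50 nm
```

The 147 bp in one core particle would span roughly 50 nm if stretched straight, which gives a sense of how tightly the DNA is wound around the histone octamer.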
The N-terminal regions of the histones remain accessible to post-translational modification within the nucleosome core particle; the importance of these modifications will be discussed in Topic 19. A linker region of DNA, ranging in length from a few base pairs to about 80 base pairs, joins the core particles. The nucleosome core particle plus the linker is properly called the nucleosome, although the word nucleosome is often used to refer to the nucleosome core particle only. Nucleosomes compact the DNA by a factor of about three.

The nucleosomes can pack into a coil called a 30-nm chromatin fibre (Figure 20.33). Exactly how nucleosomes are arranged in this fibre is still controversial. The 30-nm fibre is stabilized by histone H1. In a 30-nm chromatin fibre, the length of the DNA has been compacted by a factor of about 100. It is thought that the 30-nm chromatin fibre forms large loops of 30 000 to 200 000 base pairs (30-200 kbp) that are anchored to the chromosomal scaffold, a group of chromatin proteins that do not belong to the histone family. The loops are packed together to form the chromosome, which is about 1400 nm thick in its fully condensed form, just before cell division (Fig 3.1). Details are still being discovered about how the cell is able to access individual sections of DNA when needed, despite all this packing. Controlling the density and positioning of nucleosome core particles is crucial for proper regulation of transcription, as will be discussed in Topic 19.

Topic 14 Review Questions
WileyPLUS questions for Chapter 3 – 5, 7, 23, 25, 29, 33, 37
WileyPLUS question for Chapter 20 – 77

14-1. Which statement is false?
a) Nucleosomes are present in eukaryotic chromosomes, but not in bacterial chromosomes.
b) Each nucleosome contains two molecules each of histone H2A, H2B, H3, and H4.
c) A nucleosome core particle contains a core of histone with DNA wrapped around it approximately 1.7 times.
d) Nucleosomes are aided in their formation by the high proportion of acidic amino acids in histone proteins.
e) Nucleosome formation compacts the DNA into approximately one-third of its original length.

Topic 14 Review Question Answers
Note: Answers to textbook review questions are in your textbook.
14-1. D

Topic 15 Replication of DNA
Readings: Sections 20.1, 20.2

By the end of this topic, you should be able to:
- Explain the process of replication initiation, including the protein and DNA elements involved
- Explain the process of DNA priming, synthesis, and proofreading, including the protein and nucleic acid elements involved
- Describe how common inhibitors of viral DNA replication function
- Explain the process of DNA ligation
- Draw a replication fork and a replication bubble, and describe the processes occurring at each
- Describe the arrangement of DNA at telomeres, and explain the roles of these structures

Starting from a single cell, humans must generate many billions of copies of their DNA throughout their lifetimes. DNA must be copied, or replicated, with great accuracy. During replication of DNA, each strand of a double helix serves as a template for synthesis of a new DNA strand (see the textbook’s Guided Tour of DNA replication). The enzyme DNA polymerase (Fig 20.7 and 20.9) makes the new strand complementary to the template, according to the Watson-Crick base pairing rules. For example, when A is on the template strand, T is added to the strand being synthesized. In eukaryotes, DNA polymerases are complex multi-subunit enzymes.

Replication of DNA is said to be semi-conservative, meaning that each of the two double helices resulting from replication contains one strand from the original double helix (Fig 20.1). This was demonstrated in an elegant experiment by Meselson and Stahl, who labeled the DNA of bacterial cells with heavy nitrogen (¹⁵N), allowed the cells to keep replicating in medium containing ordinary nitrogen (¹⁴N), and used density-gradient centrifugation to “weigh” the bacteria’s double-stranded and single-stranded DNA after various generations had passed.
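Semi-conservative replication makes a clear prediction for the Meselson and Stahl experiment. The idealized sketch below models each double helix as a pair of strand labels ("H" for an original ¹⁵N strand, "L" for a strand made in ¹⁴N medium), assumes every helix is copied exactly once per generation, and counts how many helices would band as heavy (H/H), hybrid (H/L) or light (L/L). The function names and labels are our own; this illustrates the model, not the original data.

```python
from collections import Counter


def replicate(duplexes):
    """Semi-conservative replication: each old strand is paired with a new light strand."""
    daughters = []
    for strand_a, strand_b in duplexes:
        daughters.append((strand_a, "L"))  # old strand A keeps a newly made partner
        daughters.append((strand_b, "L"))  # old strand B keeps a newly made partner
    return daughters


def density_classes(duplexes):
    """Count helices by buoyant density: heavy (H/H), hybrid (H/L) or light (L/L)."""
    names = {0: "light", 1: "hybrid", 2: "heavy"}
    return Counter(names[duplex.count("H")] for duplex in duplexes)


if __name__ == "__main__":
    population = [("H", "H")]  # start: one fully 15N-labelled double helix
    for generation in range(4):
        print(generation, dict(density_classes(population)))
        population = replicate(population)
```

In this model, all DNA is hybrid after one generation; after that, the number of hybrid helices stays at two while light helices accumulate, which is the banding pattern expected for semi-conservative replication.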
Initiation
Replication starts at specialized DNA sequences called replication origins. The bacterium E. coli, which has a relatively small amount of DNA on a single circular chromosome, has a single origin. Eukaryotes have more DNA, usually on multiple chromosomes, and thus have multiple origins. Replication origins are rich in AT base pairs, which are generally easier to pull apart than GC base pairs.

In both E. coli and eukaryotes, initiator proteins bind to the sequences of replication origins. In E. coli, the initiator proteins pull apart the two strands of DNA at the origin, after which helicase (Fig 20.3) binds. Helicase consumes ATP to unwind DNA during the rest of the replication process. In eukaryotes, inactive helicase is part of the pre-initiation complex that binds to each replication origin; upon activation, helicase separates the strands at each origin. The region in which the DNA has become single-stranded is called a replication bubble. Single-strand binding protein (Fig 20.4) associates with unwound regions and prevents them from re-forming base pairs. By the end of the initiation process, all proteins required for DNA replication have bound to the two points in the replication bubble where double-stranded DNA becomes single-stranded. These places are called replication forks.

Priming
DNA polymerase cannot start synthesizing a DNA strand using only the template strand; it can only add nucleotides to the 3' end of an existing strand. The enzyme primase provides this starting point by synthesizing short (~10 nucleotides long) RNA primers in the 5' to 3' direction on the separated DNA strands (see the figures in the section “DNA polymerase faces two problems” and the Guided Tour, DNA Replication).

DNA Synthesis
After primer synthesis, a sliding clamp protein is loaded onto the primer-template complex, forming a ring-shaped structure (Fig 20.8) that surrounds the nucleic acid strands.
The sliding clamp will not spontaneously dissociate from double-stranded DNA, but (as its name suggests) it can easily slide along its length. The DNA polymerase that will synthesize most of the new DNA (also called the “replicative” DNA polymerase) then binds to the sliding clamp protein. In E. coli, the replicative DNA polymerase is called DNA polymerase III. This enzyme, like all DNA polymerases, synthesizes DNA in the 5' to 3' direction, extending the primer from its 3' end. At each position on the single-stranded template DNA, it adds the nucleotide with the complementary base, taken from the deoxyribonucleoside triphosphates (dNTPs) in the surrounding solution. Mononucleotides are incorporated into the new DNA strand, and pyrophosphate is the by-product (Fig 20.5). As with hydrolysis of ATP to AMP and PPi, this process is very energetically favourable.

DNA polymerase is said to be a “processive” enzyme, because it undergoes many rounds of catalysis before dissociating from the template. Even so, in the absence of the sliding clamp, DNA polymerase would frequently dissociate from the template strand, making replication less efficient. So the sliding clamp not only recruits DNA polymerase to the template, but also keeps DNA polymerase bound to the template, greatly improving its processivity.

Synthesis occurs in both directions from the replication origin along the two templates, at replication forks. The complex of proteins that carries out replication at a replication fork is called a replisome. Within each replisome, helicase unwinds the DNA, exposing single-stranded regions that can be used as templates for DNA synthesis. Replication on one of the template strands proceeds very naturally: DNA polymerase III travels with the replication fork toward the 5' end of the template, synthesizing complementary DNA in the 5' to 3' direction. The new strand made on this template is called the leading strand, and synthesis on this strand is said to be continuous.
DNA synthesis on the leading strand can continue for long distances, usually about 5 × 10⁵ nucleotides, using only one primer. Because double-stranded DNA is antiparallel, the new strand made on the other template strand (called the lagging strand) must be synthesized in the direction opposite to that in which the replication fork is moving (Fig 20.6). On the lagging strand, synthesis occurs discontinuously to form Okazaki fragments, which are connected later by DNA ligase (see below). In E. coli, Okazaki fragments are 500 to 2000 nucleotides long, but in humans they are only about one-tenth this length. Because synthesis must be restarted for every Okazaki fragment, the lagging strand needs a new primer for each fragment, whereas the leading strand needs only one.

Replication on the leading and lagging strands occurs concurrently, but the exact events that occur in the replisome are still under investigation. The traditional “trombone” model is shown in Fig 20.6. In this model, two DNA polymerase complexes associate with the helicase throughout replication, each one synthesizing DNA using a different template strand. During synthesis on the lagging strand, a loop composed of single-stranded and double-stranded DNA extends from the replisome, growing larger as the Okazaki fragment is lengthened. The increase in size of this loop resembles the lengthening of the tube of a slide trombone. When one Okazaki fragment is finished, the lagging strand polymerase (still associated with helicase) lets go of the template strand and then reassociates with it at the point where the primer for the next Okazaki fragment has been made.

Recent studies have cast doubt on several aspects of the trombone model. On the lagging strand, DNA polymerase may frequently dissociate from the helicase during synthesis, eliminating the “trombone loop”. It appears that three DNA polymerase complexes can associate with the helicase simultaneously.
Having a third DNA polymerase bound to the replisome could make initiation of Okazaki fragment synthesis more efficient; if the lagging strand polymerase dissociates from the helicase, another polymerase would be present to immediately start synthesis of the next Okazaki fragment.

Proofreading
Although human DNA polymerases are very accurate, every now and then they make mistakes. Errors can take the form of adding an incorrect base (a substitution), adding one or more extra bases (an insertion) or not adding enough bases (a deletion). In addition to its polymerization activity, DNA polymerase has 3' to 5' exonuclease activity (Fig 20.10) in a distinct active site, which can remove mononucleotides from the 3' end of a DNA strand. This exonuclease activity is used to proofread the newly synthesized DNA strand and remove any erroneously incorporated nucleotides. After any incorrect nucleotide(s) have been removed, synthesis resumes from the point of error. Proofreading increases the accuracy of DNA replication to about 1 error every 10⁶ to 10⁷ bases in humans.

DNA Ligation
When DNA polymerase III completes an Okazaki fragment, it encounters the RNA primer of the previously synthesized fragment (Fig 20.11 and 20.12). When this happens in bacteria, DNA polymerase III dissociates from the template, and DNA polymerase I, which is also used in repair processes, binds. Besides synthesizing DNA, DNA polymerase I has 5' to 3' exonuclease activity, which is used to degrade the RNA primer. As it degrades the primer, DNA polymerase I continues to synthesize DNA behind it, so the nick moves along the strand, a process called “nick translation”. When all the RNA has been replaced, DNA polymerase I dissociates from the double helix, and the nick in the newly synthesized DNA is sealed by DNA ligase. Bacterial DNA ligase requires NAD⁺ during this process, but the DNA ligase of many other organisms (including humans) hydrolyzes ATP to AMP and PPi.
Ligase requires the substrate DNA strands to have a 3' hydroxyl and a 5' phosphate.

Okazaki fragment maturation in some other organisms (such as humans) is done a bit differently than is described above. The replicating DNA polymerase does not dissociate from the template, and the RNA primer is removed by another enzyme (e.g., RNase H, as in Fig 20.11, or a flap endonuclease that removes single-stranded pieces of RNA that have been pushed aside by DNA polymerase).

Telomeres
Telomeres are protein-DNA structures found at the ends of chromosomes. In humans, the DNA consists of a 2000 to 10 000 bp double-stranded region containing many repeats of a six-nucleotide sequence (TTAGGG), followed by a 100- to 300-nucleotide 3' overhang that loops back and displaces an earlier section of the same strand (Fig 20.15). Many specific proteins are associated with this “T-loop” DNA structure. Telomeres enable the cell to distinguish between the end of the chromosome and a double-strand break in the middle of the chromosome.

Telomeres also help with another problem, that of replicating the lagging strand at the ends of chromosomes. The cell is unable to synthesize DNA complementary to the very end of the lagging-strand template, because DNA polymerase needs a primer: once the last RNA primer has been removed, there is no upstream 3' end from which the gap can be filled. Therefore, chromosomes get shorter each time the DNA is replicated. Telomeres provide “extra” DNA at the ends of chromosomes, so that coding DNA is not lost as the ends shorten. Chromosome shortening is believed to be part of a biological clock that determines the number of divisions a cell can undergo before it stops dividing, and may contribute to the aging process.

Telomeres are synthesized and lengthened by a special DNA polymerase called telomerase (Fig 20.14 and 20.15) that uses an associated RNA strand as a template. In humans, this enzyme is expressed during embryo development and in germ-line cells, but expression stops in most somatic cells shortly after birth. Abnormal expression of telomerase contributes to making cancer cells immortal.
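The telomere description can be illustrated with two small calculations. The first function counts tandem copies of the human TTAGGG repeat at the 3' end of a strand written 5' to 3'; the second estimates how many rounds of replication a telomere of a given length could buffer. The notes do not say how much DNA is lost per round, so `bp_lost_per_division` is a hypothetical input, and the value of 100 bp used below is a placeholder chosen only to show the arithmetic. The sample sequence is also made up. All names here are our own.

```python
TELOMERE_REPEAT = "TTAGGG"  # human telomeric repeat, as given in the notes


def terminal_repeat_count(strand: str, repeat: str = TELOMERE_REPEAT) -> int:
    """Count consecutive copies of `repeat` at the 3' (right-hand) end of a 5'->3' strand."""
    count = 0
    end = len(strand)
    while end >= len(repeat) and strand[end - len(repeat):end] == repeat:
        count += 1
        end -= len(repeat)
    return count


def divisions_until_depleted(telomere_bp: int, bp_lost_per_division: int) -> int:
    """Complete replication rounds before the telomeric DNA would be used up.

    bp_lost_per_division is a hypothetical parameter; the notes only state that
    chromosomes get shorter each time the DNA is replicated.
    """
    if bp_lost_per_division <= 0:
        raise ValueError("bp_lost_per_division must be positive")
    return telomere_bp // bp_lost_per_division


if __name__ == "__main__":
    toy_end = "GATTACA" + TELOMERE_REPEAT * 5  # made-up sequence, not real genomic DNA
    print(terminal_repeat_count(toy_end))  # prints 5
    # Telomere lengths from the notes (2000 to 10 000 bp); the 100 bp loss is hypothetical.
    for length in (2000, 10_000):
        print(length, divisions_until_depleted(length, bp_lost_per_division=100))
```

The point of the second function is qualitative: a longer telomere buffers more rounds of replication before coding DNA would be at risk, which is why telomerase activity matters for cells that keep dividing.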
Antiviral Nucleoside Analogues (Box 20.A)
[Figure: chemical structures of AZT (thymine base) and acyclovir (guanine base)]

Azidothymidine (AZT), also called zidovudine or Retrovir, is an antiviral agent used to combat human immunodeficiency virus (HIV). HIV is a retrovirus, a small single-stranded RNA virus. Once in the cell, the virus’s RNA is transcribed to DNA in a process called reverse transcription (the enzyme that does this is called reverse transcriptase). AZT is taken orally and converted to the triphosphate form by the patient’s cells. It can be used as a substrate in place of dTTP during DNA synthesis, becoming incorporated into the newly synthesized DNA strand. Because it has an azide group (-N3) instead of a hydroxyl group at its 3' carbon, the next nucleotide cannot be added to the strand and DNA synthesis stops, preventing formation of new virus particles. AZT triphosphate inhibits reverse transcriptase 100 times more effectively than it inhibits human DNA polymerase, because the viral enzyme, but not the human enzyme, has a higher affinity for AZT triphosphate than for dTTP.

Acyclovir is a selective inhibitor of the herpes virus DNA polymerase. It also causes chain termination when incorporated into a growing DNA strand.

Topic 15 Review Questions
WileyPLUS questions for Chapter 20 – 1, 3, 5, 7, 19, 29, 33

15-1. In a DNA strand that is being synthesized, which end is growing?
a) the 3' end
b) the 5' end
c) both ends

15-2. On the figure of a replication bubble (above):
1. Indicate where the origin of replication was located (use O).
2. Label the leading-strand template and the lagging-strand template of the right-hand fork [R] as X and Y, respectively.
3. Indicate by arrows the direction in which the newly made DNA strands (indicated by dark lines) were synthesized.
4. Number the Okazaki fragments on each strand 1, 2, and 3 in the order in which they were synthesized.
5. Indicate where the most recent DNA synthesis has occurred (use S).
6. Indicate the direction of movement of the replication forks with arrows.

Topic 15 Review Question Answers
15-1. A
15-2.

Topic 16 DNA Repair
Readings: Section 20.3

By the end of this topic, you should be able to:
- Describe and differentiate between the different kinds of commonly observed DNA damage
- Explain the fundamental mechanisms of the major DNA repair systems
- Predict which repair mechanism will be used to repair a particular instance of DNA damage
- Predict the consequences of DNA damage that is not repaired before the next round of replication
- Describe the role of translesion DNA polymerases as they relate to DNA damage

To preserve genetic information, the cell must guard against unintended changes in the sequence or structure of DNA, which I will refer to as “DNA damage”. The main types of DNA damage are described below.
1. Copying mistakes. Despite the proofreading activity of DNA polymerase (see Topic 15), mistakes occur during DNA replication, at a rate of about 1 per 10⁷ nucleotides in humans. The result is insertion or deletion of bases, or mismatched base pairs, which would lead to a change in the DNA sequence, or mutation, in one of the daughter double helices after the next round of DNA replication.
2. Depurination. The N-glycosidic bond can spontaneously hydrolyze, resulting in the loss of an entire adenine or guanine base. At the resulting abasic site, the sugar-phosphate backbone remains intact. Abasic sites block replication by the normal replicative DNA polymerase. When this DNA polymerase stalls, one of several translesion DNA polymerases is recruited to the site. Translesion DNA polymerases are able to synthesize DNA past the site of damage, but because the template has no base, they are likely to either skip that position or introduce a mutation in the newly synthesized strand. After a translesion polymerase has synthesized DNA past the site of damage, the normal replicative DNA polymerase resumes DNA synthesis.
Note that translesion DNA polymerases do not repair damage. They merely allow replication to continue past sites of damage on the template strand.
3. Deamination. The amine group of a nitrogenous base, most commonly cytosine, is changed to a carbonyl. This occurs spontaneously and may lead to mutation. Upon deamination, cytosine is converted to uracil, which pairs with adenine rather than guanine.
4. Pyrimidine dimers. The double bonds in adjacent pyrimidines, most commonly two thymines, react to form a four-membered ring structure. This reaction is usually caused by ultraviolet radiation. Translesion DNA polymerases are required to replicate DNA past pyrimidine dimers, but they are more error-prone than normal replicative DNA polymerases, increasing the risk of mutation.
5. Other base modifications. Ionizing radiation, such as X-rays or gamma rays, causes a variety of modifications to all four bases. Some chemicals called mutagens react with DNA bases, leading to changes in base-pairing properties or stalled replication. An example of a mutagen is the superoxide radical, O2•⁻, the first intermediate formed during the reduction of oxygen to water (see Complex IV, Topic 11). Another group of mutagens is the polycyclic aromatic hydrocarbons.
6. Strand breaks. Ionizing radiation, reactive oxygen species, or mechanical stress can break the sugar-phosphate backbone, either of one strand (a single-strand break) or of both strands (a double-strand break). Neither the normal replicative DNA polymerase nor translesion polymerases can synthesize DNA past a break in the template. Double-strand breaks can cause chromosomal abnormalities or result in cell death.

DNA Repair Systems
A single alteration in a DNA molecule, if left unrepaired, could interfere with the replication process and may cause the death of the cell. DNA damage can also cause mutations, which can alter the sequence of expressed proteins and impair their function. For these reasons, cells have developed several DNA repair systems.
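To make the consequence of an unrepaired deamination concrete (damage type 3 above, and the objective on predicting what happens when damage is not repaired before the next round of replication), the sketch below follows a single base pair through two rounds of semi-conservative replication. It assumes, for simplicity, that every template base is copied by its Watson-Crick partner and that uracil on a template is read like thymine, so it templates A. A base pair is written as (top strand base, bottom strand base); the names `PARTNER` and `replicate_pair` are our own.

```python
# Base that DNA polymerase inserts opposite each template base.
# Uracil (from deaminated cytosine) is read like thymine, so it templates A.
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def replicate_pair(top: str, bottom: str):
    """Copy one base pair semi-conservatively and return the two daughter pairs."""
    return [(top, PARTNER[top]), (PARTNER[bottom], bottom)]


if __name__ == "__main__":
    damaged = ("U", "G")  # a C:G pair whose C has been deaminated to U, left unrepaired
    first_round = replicate_pair(*damaged)
    print("after round 1:", first_round)  # prints [('U', 'A'), ('C', 'G')]
    second_round = [d for pair in first_round for d in replicate_pair(*pair)]
    print("after round 2:", second_round)
```

After the second round, one daughter helix carries a T:A pair where the original DNA had C:G, so a permanent C:G to T:A change has been introduced. Repair systems that remove uracil from DNA before replication prevent this.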
It is not as important to repair damage to RNA molecules, since RNA exists transiently and is continually being turned over.
1. Proofreading during DNA replication. See Topic 15.
2. Mismatch repair. Mismatch repair enzymes fix mistakes made by DNA polymerase that escaped correction by proofreading, with a success rate of 99%. After the protein MutS (Fig 20.16) has detected a site of mismatch, the cell must determine which strand is the newly synthesized (and hence erroneous) strand. In most bacteria, the DNA is scanned in both directions until a methylated base is found (most bacteria add methyl groups to bases within specific sequences some time after replication). The new strand has not yet had time to become methylated, so the unmethylated strand is the newly synthesized strand. An endonuclease nicks the backbone of the newly synthesized strand, and then an exonuclease removes the nucleotides from the nick to the mismatch site (this can be hundreds of nucleotides away). A replicative DNA polymerase then fills in the gap created by the exonuclease. In eukaryotes, the method of determining the newly synthesized strand is still under investigation. The new strand could be identified by intentional nicks (single-strand breaks) in the new strand, or by interactions of the mismatch repair proteins with the replicative sliding clamp protein. However the newly synthesized strand is recognized, it is removed by an exonuclease and resynthesized, as in bacteria.
3. Direct repair. A few common types of damaged bases can be repaired by enzymes specific to those bases. For example, an enzyme exists to remove the methyl group from O⁶-methylguanine; this enzyme will not repair any other type of DNA damage.
4. Base excision repair. Abasic sites, single-strand breaks, and modified bases (including deaminated bases) that are not fixed by direct repair are repaired by base excision repair (BER, Fig 20.17). There are two types of BER, called short-patch and long-patch repair.
In both types, an endonuclease cuts the backbone at the site of damage, if necessary. Short-patch BER involves synthesis of only one new nucleotide. Modifications that the short-patch BER proteins cannot fix are handled by long-patch BER, in which 2-10 new nucleotides are synthesized. In both types of BER, the undamaged strand serves as the template for new DNA synthesis. Single-stranded DNA on the damaged strand that has been displaced by DNA synthesis is cleaved by an endonuclease, and the resulting nick is sealed by DNA ligase.
5. Nucleotide excision repair. Pyrimidine dimers and base modifications that distort the helical structure of DNA are repaired by nucleotide excision repair (NER, Fig 20.20). In this pathway, an endonuclease cuts the backbone of the damaged strand on either side of the site of damage. The distance between the nicks varies from organism to organism (in humans, it is 29 bases). The damaged section is removed, and new DNA is synthesized to replace it, using the undamaged strand as a template. People with a defect in NER suffer from xeroderma pigmentosum. In this condition, exposure to UV light causes pyrimidine dimers to accumulate, greatly increasing the risk of skin lesions and skin cancer.
6. Non-homologous end-joining. This is the more common of the two methods used to repair a double-strand break. First, the ends of the DNA are trimmed by a nuclease and may be extended by a polymerase (Fig 20.22). Then, the ends are ligated together to form intact double-stranded DNA. Non-homologous end-joining changes the DNA sequence relative to the sequence that existed before the damage, but it accomplishes the repair without requiring any other DNA as a template.
7. Homologous recombination. This is the second method used to repair a double-strand break. Proteins bind to the site of damage and recruit DNA (from the other copy of the chromosome) that is complementary to the DNA surrounding the break (Fig 20.23).
The complementary DNA is used as a template to synthesize new DNA to restore the strand to its original condition. Homologous recombination is most convenient shortly after replication, when another copy of the chromosome is nearby.
Topic 16 Review Questions
WileyPLUS question from Chapter 3 – 55
WileyPLUS question from Chapter 20 – 53
16-1 Mismatch repair of DNA:
a) is carried out solely by the replication DNA polymerase
b) requires an undamaged template strand
c) preferentially repairs the leading strand to match the lagging strand
d) makes replication 100 000 times more accurate
e) is defective in people with the condition xeroderma pigmentosum
Topic 16 Review Question Answers
16-1 B
Topic 17 revised 2024 Molecular Basis of Cancer
Readings: Chapter 20.4
By the end of this topic you should be able to:
Define the properties of cancer cells.
Indicate how we know cancer is a genetic disease.
Outline the properties and roles of cancer-causing genes.
Describe ways in which biochemistry and molecular biology are leading to new cures for cancer.
Molecular Basis of Cancer
The study of the causes of cancer has been the focal point of a tremendous amount of research. These efforts have uncovered many key aspects of cell biology. In addition, the complexity of the disease has demanded that biologists delve into more basic studies of cellular processes such as transcription, replication, DNA repair and cell signaling. In this regard, basic research has played a principal role in uncovering the molecular basis of cancer.
Cancer is a disease characterized by the rapid proliferation of an abnormal cell derived from one of the organism's own cells. Cancers can be benign (they do not spread) or malignant (the cells have the ability to invade surrounding tissues) (Fig. 20.25). Depending on their cell of origin, cancers are quite different diseases, yet they all originate from the alteration of a cell's ability to properly control its growth and differentiation.
In fact, cancer can often be traced to a single cell that has undergone an inheritable change that causes it to lose growth control. In this regard cancer is a genetic disease, a point highlighted by the fact that many cancer-causing agents, such as radiation and chemical mutagens, damage DNA. Since cancer is a genetic disease, susceptibility to certain forms can be inherited. These include some forms of skin cancer, colon cancer, breast cancer and others.
Tumor Progression
In many cases, pinpointing the origins and pathways resulting in disease is not simple, because a single mutation is not sufficient to cause disease. A slow accumulation of specific mutations can eventually result in disease; it has been estimated that 10 or more mutations may be required. This need to accumulate multiple mutations is one of the reasons why cancer is more prevalent in older individuals. It is for the same reason that individuals with deficiencies in DNA repair processes are more susceptible to cancer.
Tumor progression involves successive rounds of mutation and selection for specific properties that will enhance or otherwise alter cell growth. At each stage a progenitor cell acquires an additional mutation that gives it the ability to divide at a greater rate or grow in places that it otherwise would not. In many cases one of the acquired mutations is in a gene required for DNA repair, further accelerating the process.
Properties of Cancerous Cells
Cancer cells do not respond to signals that normally control cell division.
Cancer cells are not sensitive to normal pathways of cellular differentiation or programmed cell death (apoptosis).
Cancer cells are genetically unstable.
Malignant cancer cells can escape their normal environment and proliferate at foreign sites (metastasize).
Cancer Prevention
Certain environmental agents are known to cause cancer. For example, UV radiation is the principal cause of skin cancer.
Smoking is also a major cause of cancer. Combined with its direct link to lung and heart disease, it is very clear that the elimination of smoking would be the single most important step in reducing cancer and improving the general health of our society. It is estimated that 30% of all cancers are due to smoking.
Diet also seems to play a key role in cancer risk. Some common foods can contain cancer-causing agents; for example, aflatoxin, a mold toxin, is a potent carcinogen and can be found in contaminated peanuts. Other foods (generally fruits and vegetables) seem to act as cancer inhibitors, perhaps by acting as free radical scavengers and thus reducing DNA damage.
Viral infections can have direct links to cancer. The human papilloma virus was shown to cause cervical cancer, and human papilloma virus vaccines prevent infection and thus prevent cervical cancer. In another example, individuals with a hepatitis B infection are more prone to liver cancer.
Other agents can promote cancer without directly giving rise to DNA damage. These tumor-promoting agents often act by stimulating cell proliferation. Enhanced cell numbers can increase the probability that a mutant cell will appear and evolve into a cancerous cell.
Cancer Causing Genes
There are two types of cancer-causing genes.
Oncogenes – genes whose presence in an aberrant form causes cancer. They can be aberrant in the localization of the protein product, in the activity of the product, or by having too much of the product expressed.
Tumor Suppressors – genes whose absence causes cancer. These genes are often normally involved in DNA repair, the control of cell growth/differentiation, or cell death (e.g. p53, Figures 20.26 and 20.27). Loss of function (activity) or lack of expression can give rise to cancer.
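The point that tumor-promoting agents raise risk simply by increasing cell numbers can be made concrete with a simple probability model. This is a sketch under an assumption introduced here for illustration: each cell division independently has the same small probability p of producing a particular mutant cell. The value of p used below is hypothetical, not a measured rate.

```python
import math


def prob_at_least_one_mutant(p_per_division: float, divisions: int) -> float:
    """Return 1 - (1 - p)**N: chance that N independent divisions yield >= 1 mutant."""
    if not 0.0 <= p_per_division < 1.0:
        raise ValueError("p_per_division must be in [0, 1)")
    # log1p/expm1 keep the result accurate when p is very small.
    return -math.expm1(divisions * math.log1p(-p_per_division))


P_HYPOTHETICAL = 1e-6  # illustrative per-division probability, not a measured value

for n in (10_000, 100_000, 1_000_000, 10_000_000):
    risk = prob_at_least_one_mutant(P_HYPOTHETICAL, n)
    print(f"{n:>12,d} divisions -> P(at least one mutant) = {risk:.3f}")
```

Because the chance of at least one mutant is 1 - (1 - p)^N, stimulating proliferation increases N and therefore the risk, even though p (the DNA-damaging part of the story) is unchanged.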
Functions of the Cancer-Causing Genes
As alluded to above, most of the cancer-causing genes code for components of pathways that regulate how cells respond to signals for cell division and/or differentiation, are involved in DNA repair, or are involved in programmed cell death. Some of these genes are common to multiple types of cancer because they are involved in control pathways common to many cell types; others, because of a more specific function, are found primarily in one type of cancer. A full discussion of the roles of these factors would require details of cellular biochemistry that are beyond the scope of this course; however, the cancer-causing genes can be roughly divided into the following classes:
1. Growth factor receptors that are insensitive to normal signals and/or are constitutively active.
2. Enzymes in cell signaling cascades that are inappropriately active.
3. Molecules that directly control cell division and serve as normal “check point controls”.
4. Molecules that are normally involved in programmed cell death or in determining cellular longevity.
5. Molecules that are involved in repairing DNA damage or have a role in reducing DNA damage in response to normal cellular oxidative stress.
6. Factors that are involved in expression of other genes. Since cancer can be caused simply by the inappropriate expression of certain genes, many cancer-causing genes are transcription factors.
Treatment of Cancer
Cancer cells are very similar to ‘normal’ cells in a human patient, which makes treating the disease challenging. Cancer treatments must have a way of identifying and targeting only mutated cells when they are administered. Based on the above sections you should be able to think of some key differences between ‘normal’ cells and cancer cells:
Cancer cells are rapidly proliferating when compared to most ‘normal’ cells.
This cancer cell phenotype has an underlying molecular basis; that is, the cells are expressing proteins in an atypical way, either at an unusual time or in unusual amounts. This is the main difference between ‘normal’ cells and cancer cells that must be targeted for treatment. Early cancer treatments tended to target the rapid growth phenotype of cancer cells. Newer cancer treatments are often designed to target the specific molecular differences that are found in cancer cells but not in ‘normal’ cells.
Cancer is also difficult to treat because each instance of cancer arises from a different combination of mutations. This means that even when multiple patients have cancer in the same tissue (e.g. breast cancer), the same treatment targeting a specific molecular cause may not work effectively for all of those cancers.
Topic 17 Review Questions
Textbook Chapter 20: Q 61, 63, 65, 67, 69
Topic 18 revised 2024 Bacterial Transcription (TRANSCRIPTION and GENE REGULATION)
Readings: Chapter 3.3 and 21
By the end of Topic 18 - Bacterial Transcription, you should be able to:
Describe the components needed for transcription in bacterial cells.
Describe mechanisms for transcriptional activation and repression in bacteria.
Identify the structural characteristics of some bacterial DNA-binding proteins.
Using examples, describe how genes are regulated in bacteria.
STEPS IN CONTROLLING GENE EXPRESSION
The Central Dogma in its simplest form is (see Chapter 3.3):
DNA --transcription--> RNA --translation--> Protein
The Central Dogma defines transcription (copying of DNA into RNA) and translation (decoding of an RNA to synthesize protein) as the two key processes in gene expression. The RNA intermediary allows a step of amplification: whereas each gene is present in the DNA as a fixed number of copies, the RNAs for different messages can be produced at dramatically different levels. In addition, RNA, unlike DNA, is unstable in the cellular environment.
It can be rapidly degraded, providing a mechanism to turn genes off. As we will see later, RNA also provides additional opportunities for regulation.
TRANSCRIPTION
Key Definitions
1. The DNA sequence required for transcriptional initiation of a gene is called the promoter. The promoter includes the sequences that are recognized by 1) RNA polymerase and 2) any gene-specific regulatory factors.
2. The DNA sequence required for transcriptional termination is called the terminator.
3. Transcription is catalyzed by the enzyme RNA polymerase.
4. Gene-specific regulatory proteins (or transcription factors) are the key molecules in the differential transcription of genes.
Bacterial transcription is very similar to eukaryotic transcription in the basic process and the structure/function of the molecules involved. Because it is somewhat simpler, we will initially focus on bacterial transcription.
RNA Polymerase (Chapter 21.2 and Figures 21.13, 21.14)
The core bacterial RNA polymerase is a large enzyme that contains 5 subunits: 2 alpha subunits and single beta, beta' and omega subunits. The core enzyme will synthesize RNA from ends or nicks in DNA templates, but it cannot recognize promoter sequences. The RNA polymerase holoenzyme will transcribe RNA specifically from promoters. In addition to the above subunits, the holoenzyme contains a sigma subunit. Sigma enhances recognition of promoter structures and decreases binding of RNA polymerase (RNAP) to non-promoter DNA.
Steps in the initiation of transcription (Chapter 21.1)
1. RNA polymerase recognizes and binds to the promoter DNA. This is called the closed complex. Promoter recognition is via the sigma factor, which makes base-specific contacts with the promoter sequence.
2. Polymerase unwinds the DNA strands at the transcriptional start site. This complex of polymerase and unwound promoter DNA is called the open complex.
3. The first NTP is brought to the template. No primer is required, and base-pairing rules apply.
4. Using NTPs (A, G, C, U) as substrates, chain elongation begins and proceeds in a 5' to 3' direction. Phosphodiester bonds are formed and pyrophosphate is released.
5. After the incorporation of 5-10 nucleotides, sigma falls off.
6. The transcription bubble moves downstream, with the template DNA reannealing behind it.
7. Chain elongation continues until a terminator is reached and the polymerase falls off.
Structure of Bacterial Promoters (See Figure 21.5)
There are recognition sites on bacterial DNA that signal the recruitment of RNA polymerase holoenzyme. Relative to the transcription start site at +1, these are centered at approximately -35 and -10. The consensus sequences for the -10 and -35 regions are:
-10 consensus: TATAAT
-35 consensus: TTGACA
(Note: When written like this, by convention the sequence shown is the coding strand in the 5' to 3' direction.)
The -10 and -35 sequences, when properly spaced, are sufficient to recruit the holoenzyme to the promoter. As outlined below, other transcription factors play critical roles in regulation. Note that the -10 and -35 sequences as written above are consensus sequences; not all promoters have exactly the same sequences.
How is Transcription Differentially Regulated?
There are two key reasons why some bacterial promoters are transcribed more than others.
1. Strength of the basic promoter elements. Not all -10 and -35 sequences are equally active; they bind RNA polymerase holoenzyme with different affinities.
2. Gene-specific regulatory proteins bind specific DNA sequences that are found in one or more promoters and serve to activate or repress transcription. Examples: the Lac repressor, as the name suggests, is an example of a repressor. The Catabolite Activator Protein (CAP) regulates transcription of the Lac operon and other genes involved in carbon metabolism; CAP is an example of an activator protein.
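To illustrate what a consensus sequence means, and why promoters differ in strength, a promoter's -35 and -10 hexamers can be compared position by position with the consensus. This is a minimal sketch: the number of matching positions is used here only as an assumed, rough proxy for promoter strength (the spacing between the boxes and other factors also matter), and the three promoter names and sequences are hypothetical.

```python
CONSENSUS_35 = "TTGACA"  # -35 consensus, coding strand 5'->3'
CONSENSUS_10 = "TATAAT"  # -10 consensus, coding strand 5'->3'


def matches(hexamer: str, consensus: str) -> int:
    """Count positions where a promoter hexamer agrees with the consensus."""
    if len(hexamer) != len(consensus):
        raise ValueError("hexamer and consensus must have the same length")
    return sum(a == b for a, b in zip(hexamer.upper(), consensus))


def consensus_score(box_35: str, box_10: str) -> int:
    """Total matches over both boxes, from 0 to 12 (12 = perfect consensus)."""
    return matches(box_35, CONSENSUS_35) + matches(box_10, CONSENSUS_10)


# Hypothetical promoters: (-35 hexamer, -10 hexamer), coding strand 5'->3'.
promoters = {
    "promoter_A": ("TTGACA", "TATAAT"),
    "promoter_B": ("TTTACA", "TATGTT"),
    "promoter_C": ("CCGTCA", "GACAAT"),
}

# Rank from closest to furthest from the consensus.
for name, boxes in sorted(promoters.items(), key=lambda item: -consensus_score(*item[1])):
    print(f"{name}: {consensus_score(*boxes)}/12 positions match the consensus")
```

Under this crude proxy promoter_A ranks first and promoter_C last; the ranking is only a heuristic for the idea that sequences closer to the consensus tend to bind holoenzyme more tightly.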
Positive Control and the Lac Operon (Figure 21.11)
Some gene-specific regulatory proteins can enhance transcription. Increased transcription is generally accomplished by increasing the rate of recruitment of RNA polymerase or the activity of RNA polymerase. Both enhanced recruitment and enhanced activity of polymerase are usually achieved by protein-protein interactions between the activator protein and the enzyme.
The Lac operon contains the genes required for the metabolism of lactose in E. coli. E. coli's expression of the Lac operon is controlled by two signals. The first is a negative regulator, the Lac repressor. In the absence of lactose, the Lac repressor binds the operator in the promoter, inhibiting the action of RNA polymerase. When lactose is present in the growth medium, the Lac repressor binds lactose (more precisely, its isomer allolactose). This binding results in a conformational change in the protein so that it no longer binds the operator sequence in the promoter, allowing transcription.
The second mechanism for regulating the Lac operon involves control by glucose. E. coli, like many cells, preferentially uses glucose as its carbon source. In the presence of glucose, the genes required for the metabolism of other carbon sources are shut off. This is known as catabolite repression. In E. coli, catabolite repression occurs through the induction of genes in the absence of glucose. When the concentration of glucose in the medium is low, the intracellular concentration of cAMP rises. At high concentrations of cellular cAMP, the E. coli Catabolite Activator Protein (CAP) binds cAMP. A conformational change occurs in CAP upon binding cAMP that allows the protein dimer to bind DNA in a site-specific fashion. At the Lac operon, the CAP binding site is upstream (5') of the -35 sequence. Through direct protein-protein interactions with polymerase, CAP stimulates the recruitment of RNA polymerase to the promoter.
The Trp Operon
The genes required for tryptophan biosynthesis in E. coli are transcribed as a single unit from a common promoter. From the single mRNA transcript, 5 proteins are translated. This type of gene structure is called an operon and is common in bacteria but not found in eukaryotes.
RNA polymerase is recruited to the Trp promoter by the -10 and -35 sequences, allowing transcription when tryptophan concentrations in the medium are low. When the tryptophan concentration is high, the Trp repressor binds tryptophan. A conformational change occurs in the repressor's structure such that it can bind the operator site within the promoter. Binding of the repressor sterically blocks access of RNA polymerase to the promoter.
The Trp Repressor
Trp repressor contains 107 amino acid residues. It has a helix-turn-helix motif that is required for DNA binding. Amino acid side chains on helix 5 make base-specific contacts with the major groove of its operator sequence. Trp repressor binds DNA as a dimer.
Topic 18 Review Questions
Textbook Chapter 21: Q 17, 21, 25, 27, 29, 47, 49, 51
Topic 19 revised 2024 Eukaryotic Transcription (TRANSCRIPTION and GENE REGULATION)
Readings: Chapter 3.3 and 21
By the end of Topic 19 - Eukaryotic Transcription, you should be able to:
Compare and contrast bacterial and eukaryotic transcription.
Outline why transcription differs in bacteria and eukaryotes.
Describe the fundamental features of chromatin and how it regulates transcription.
Outline mechanisms that a eukaryotic cell could use to regulate gene expression.
Provide examples of where protein-protein interactions regulate transcription.
Overview: Humans have approximately 21,000 protein-encoding genes. Of these, ~5,000-10,000 are expressed in a tissue at any one time. The ability to induce the proper genes at the proper times in the proper amount is critical in the growth and development of all organisms and for their ability to respond to their environment.
The importance of gene expression becomes evident when we realize that defects in gene regulation often result in disease. Some of the most notable examples come from cancers, where many tumor suppressors (proteins required for the prevention of cancer) and oncogenes (genes that make a cell cancerous) are transcription factors. The importance of gene regulation also suggests potential avenues for cures. For example, inhibiting the gene expression of a pathogen (e.g. viruses) provides a clear and specific pathway for a cure.
Points of Gene Regulation in a EUKARYOTIC Cell
In the pathway from gene to protein there are many potential points of regulation. Several of these are common to bacteria and eukaryotes; others are specific to eukaryotes (steps marked * are NOT found in bacteria). These include:
1. Rate of transcription. Transcription is divided into stages: initiation, elongation, and termination. Since initiation is the first step in the transcription process, it is the principal site of regulation.
2. Rate of RNA processing: the steps for converting a newly transcribed RNA into a molecule that can be translated. For eukaryotes these RNA processing steps include 5' 7-methylguanosine capping*, 3' polyadenylation(*), and RNA splicing of introns* (see next topic, RNA).
3. Rate of transport of mRNA out of the nucleus*.
4. Rate of mRNA degradation.
5. Rate of translational initiation, elongation and termination.
6. Protein processing (sometimes). Examples include proteolytic cleavage to activate a protein (e.g. insulin), phosphorylation (addition of a covalent phosphate group from ATP), glycosylation (addition of one or more sugar moieties), and acetylation (addition of an acetyl group).
7. Protein degradation (loss of the protein, or turnover). The amount of protein present is determined by the balance of its rate of synthesis and its rate of loss.
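Item 7 can be written as a simple rate equation. Assuming, for illustration, a constant synthesis rate k_syn and a loss rate proportional to the amount present (rate constant k_deg), dP/dt = k_syn - k_deg·P, so the amount levels off where synthesis and loss balance, at P = k_syn / k_deg. The rate values below are hypothetical, and the same reasoning applies to mRNA levels (item 4).

```python
def steady_state(k_syn: float, k_deg: float) -> float:
    """Amount at which synthesis and loss balance (dP/dt = 0): P = k_syn / k_deg."""
    return k_syn / k_deg


def simulate_protein(k_syn: float, k_deg: float, p0: float = 0.0,
                     dt: float = 0.01, t_end: float = 50.0) -> float:
    """Integrate dP/dt = k_syn - k_deg * P with simple Euler steps; return P(t_end)."""
    p = p0
    for _ in range(int(round(t_end / dt))):
        p += (k_syn - k_deg * p) * dt
    return p


# Hypothetical rates: 10 molecules made per hour; degradation rate constant 0.5 per hour.
K_SYN, K_DEG = 10.0, 0.5

print("steady state:", steady_state(K_SYN, K_DEG))
print("after 50 h, starting from zero:", round(simulate_protein(K_SYN, K_DEG), 2))
# Doubling the degradation rate constant halves the steady-state amount.
print("faster turnover:", steady_state(K_SYN, 2 * K_DEG))
```

The simulation approaches the same level from any starting amount, which is why a cell can change a protein's abundance by adjusting either its synthesis or its degradation.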
Summary points for Regulated Transcription
Regulation of transcription often occurs through the action of one or more site-specific DNA binding proteins. The mechanisms by which they regulate transcription are similar.
1. The DNA binding proteins recognize internal or environmental signals, similar to allolactose for the lacI repressor in prokaryotes.
2. The regulatory proteins stimulate or repress the promoter binding or activity of RNA polymerase.
3. Activation is through protein-protein interactions, similar to CAP interacting with RNA polymerase in prokaryotes. Eukaryotes have many different gene-specific transcription factors.
Differences in Transcriptional Regulation between Bacteria and Eukaryotes
Although generally very similar, the processes of transcription in bacteria and eukaryotes differ in several ways.
1. Eukaryotes have three different RNA polymerases (bacteria have one):
Pol I for ribosomal RNA (rRNA)
Pol II for messenger RNA (mRNA)
Pol III for tRNA and snRNA
Eukaryotes do not use Rho for termination.
2. Eukaryotes do not have operons. Genes are transcribed as single units (monocistronic).
3. Promoter recognition is through a distinct set of proteins. The role of these transcription factors is similar to that of the bacterial sigma factor, but more complex. One of the factors, the TATA-binding protein (Figure 21.7), is a component of the transcription factor TFIID (Figure 21.8). It binds a basal promoter element, the TATA box, found ~20 base pairs upstream of the transcriptional start site (+1). Refer to Figure 21.6. The increased complexity of eukaryotic initiation greatly increases the range of possible levels of transcription.
4. Regulatory elements are often located many thousands of base pairs from +1. They may be brought into proximity of the promoter by DNA looping (Figure Box 21.A).
5. Combinatorial control: groups of proteins work together to determine the expression of a single gene.
6. Nucleosomes and higher order chromatin structure (Figure 21.4) have a profound effect (both positive and negative) on the access of transcription factors to DNA. Much of eukaryotic transcriptional control involves the regulation of chromatin structure, which occurs in part through the post-translational modification of histones.
Figure 21.9 summarizes many of these points: TBP and promoter recognition, DNA looping, and the coordinated action of multiple factors at eukaryotic promoters.
Transcription elongation and termination
In bacteria, as transcription proceeds, RNA polymerase undergoes a conformational change and the sigma factor dissociates from the protein complex. In eukaryotes, promoter clearance requires the phosphorylation of one of the polymerase subunits, which allows the polymerase to dissociate from the mediator complex (Figures 21.16, 21.17). The RNA polymerase then processively synthesizes the new RNA strand at a rate of 500-5000 nucleotides per minute.
In eukaryotes, the exact mechanism of transcription termination is unclear. It is thought that after RNAP encounters the polyadenylation signal, the polymerase slows down and eventually dissociates from the DNA strand. In bacteria, two different pathways for transcription termination have been described: Rho-dependent and Rho-independent termination (Figure 21.18). Rho is a protein that binds to the growing RNA strand and helps “dislodge” the RNA strand and RNA polymerase from the DNA strand. Rho-independent termination is based on the formation of an RNA hairpin that destabilizes the transcription bubble.
Topic 19 Review Questions
Textbook Chapter 21: Q 3, 19, 53
Topic 20 revised 2024 RNA PROCESSING (IN EUKARYOTES)
Readings: Chapter 21.3
By the end of this topic, you should be able to:
Describe the steps that occur in the maturation of an RNA to form an mRNA and allow it to be translated.
Outline why these steps occur.
Describe alternative splicing and outline models for how it might occur.
Describe how problems in splicing can result in disease.
The primary RNA transcript in eukaryotes must be processed to become a translatable mRNA. In bacteria, RNA does not need to be processed for translation.
1. Capping of the 5' end of the message with 7-methylguanosine (Figure 21.19). Capping is required for RNA export from the nucleus, helps protect the mRNA from degradation, and acts as a translational signal.
2. Polyadenylation: the addition of a long A-tract to the 3' end of the RNA (Figure 21.20). The RNA is first cleaved ~30 bases after an AAUAAA sequence that lies 3' to the coding region. A string of A residues is then added.
Cutting: 5'_______________AAUAAA__(~30 bases)__^cut______3'
PolyA addition: 5'_______________AAUAAA__(~30 bases)__AAAAAAAA(300) 3'
Polyadenylation provides additional stability to the mRNA by reducing the effects of 3' exonucleases, and has a role in the nuclear export of the mRNA and in its translation.
3. Splicing: In eukaryotes, the protein coding sequences of genes can be interrupted by one or more noncoding sequences called introns (Figure 21.23). The coding sequences are called exons (expressed). The origin of introns is unclear. It is not known whether bacteria have lost their introns or eukaryotes gained introns. Their utility, though, is clear. Introns provide the opportunity for differential splicing; that is, the mRNA can be put together in different ways, generating functionally related but distinct gene products. Differential splicing is seen particularly in cases where tissue-specific forms of a protein are created (Figure 21.25).
How are introns removed from the primary RNA? Specific sequences within and flanking the intron target its removal (see Figure 21.2). These are the 5' junction, the branch point and the 3' junction. Loss of these sites results in defects in splicing.
In fact, some of the thalassemias (genetic disorders resulting in anemia) are the result of the loss (or gain) of splice junctions in the globin genes.
The Spliceosome: snRNPs (U1, U2, U4, U5, U6)
Introns are removed as the result of the catalytic activity found in small nuclear ribonucleoprotein particles (snRNPs, or “snurps”). snRNPs are complexes of RNA (small nuclear RNA) and protein (Fig. 21.22). The RNA component has a recognition function, acting through base pairing with sequences on the precursor mRNA. snRNPs are also critical in arranging the ends into position. The assembly of snRNPs that catalyzes splicing is called a spliceosome.
Splicing occurs by 2 transesterification reactions (Fig. 21.24). Cleavage at the exon 1-intron boundary results from the attack of the 5' splice junction by the 2'-OH of the A nucleotide in the branch point. This generates a lariat structure as the result of the unique 2'-phosphodiester bond. In the second reaction, the 3'-OH of exon 1 reacts with the 3' splice junction, cutting out the intron and joining the exons. The lariat is displaced and soon degraded.
mRNA turnover and RNA interference
The concentration of RNA in the cell depends on both RNA synthesis and RNA degradation rates. RNA degradation can start from either the 5' end with decapping or the 3' end with deadenylation of the polyA tail (Figure 21.26). These are both sequence-independent processes. Sequence-specific RNA degradation, on the other hand, is termed RNA interference. RNA interference requires the binding of a second, complementary RNA strand (termed small interfering RNA or microRNA) to the mRNA. With the help of several proteins, the mRNA is then cleaved and degraded (Figure 21.27). The cell is able to use these RNA degradation mechanisms to turn off production of gene products.
RNA modification and secondary structure
rRNA and tRNAs are heavily processed before they are functional.
Many bases are modified, for example by methylation, hydroxylation or deamination (Figure 21.29). Without these modifications, tRNAs and rRNAs are often not functional. Although RNAs are single stranded, they fold back on themselves into complex secondary structures based on base pairing. Examples are tRNAs and the ribosome, which contain large sections of double-stranded RNA (Figures 21.30, 21.31).
RNA Export
In eukaryotic cells, RNA is synthesized in the nucleus and translated in the cytoplasm. Mature mRNAs are transported from the nucleus to the cytoplasm through nuclear pores, highly organized sets of proteins in the nuclear membrane. Transport through nuclear pores is a regulated process that requires the recognition of proteins bound to the poly(A) tail, to the 5' cap and to internal sites on the mRNA.
RNA processing and disease
Malfunction in any part of RNA processing will result in an improperly formed mRNA. The lack of a cap or tail will mean that the mRNA is degraded instead of being exported from the nucleus. This is part of the quality control process for mRNA. Improper splicing of mRNA is not as easily detected in the nucleus, and the resulting splice mutants can potentially cause disease by being translated into mutant proteins. If a mutation introduces a new splice site in an unusual location, like the middle of an exon or intron, the splicing machinery in the nucleus may recognize the mutant site. Use of these mutant sites will result in a mature mRNA transcript that is either missing exon sequence or has included intron sequence. Once this unusual transcript is exported from the nucleus, it will be translated into a mutant protein.
Topic 20 Review Questions
Textbook Chapter 21: Q 57, 59, 61, 65, 69, 77, 83
Topic 21 revised 2024 TRANSLATION
Readings: Chapter 22. Protein synthesis
By the end of this topic, you should be able to:
Describe the properties of the genetic code and provide a rationale for each of these properties.
Describe the structure/function relationships of tRNAs.
Describe the role of aminoacyl-tRNA synthetases in translation.
Describe how the structure of the ribosome relates to its function.
Compare translational initiation and termination in bacteria and eukaryotes, and provide a rationale for any differences.

The decoding of mRNA to produce a protein occurs on the ribosome. This process, translation, is complex and requires many RNA and protein factors.

Genetic Code (See Table 22.1)
The mRNA "spells out" the amino acid sequence in three-letter "words" called codons. Each protein has a specific reading frame, determined by where decoding begins (Fig. 22.1). The code is:
Universal (found in all organisms), although there are exceptions
Nonoverlapping
Commaless: there are no gaps between codons (the introns have already been removed by splicing)
Redundant: 61 codons specify 20 amino acids
Redundancy occurs mostly at the 3rd position of the codon (wobble)
3 stop codons, 1 start codon

Note also the following features of the code. Why might they have evolved?
1. More common amino acids (those found often in proteins) have more codons.
2. Related amino acids have similar codons. For example:
Gln  CAA/G
Glu  GAA/G
Asp  GAC/U

tRNA (Figure 22.2)
Translation requires tRNA (transfer RNA) molecules, which act as the vehicles that bring amino acids to the growing peptide chain. They function in a codon-specific fashion, relying on base-pairing rules. tRNAs are ~80 nucleotides long and have a cloverleaf secondary structure. The anticodon of the tRNA hybridizes with the codon, and the correct amino acid is covalently linked to the 3' end of the tRNA. Wobble arises because, for some tRNAs, accurate base pairing requires matching only at the first two positions of the codon.

Amino Acid Activation: Aminoacyl-tRNA synthetases (Figures 22.3, 22.4; Table 22.2)
Using ATP as the energy source, an aminoacyl-tRNA synthetase couples the carboxyl group of a specific amino acid to the 3' end of a specific tRNA through a high-energy bond.
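The properties of the genetic code listed above can be checked directly against a codon table. The sketch below is a minimal Python illustration, assuming the standard ("universal") code and ignoring the known exceptions; `translate`, `translate_from_first_aug`, and the example mRNA are hypothetical, not from the textbook. It counts sense and stop codons, and shows that the same mRNA gives different proteins depending on where decoding begins. Starting at the first AUG is a simplified version of the scanning used in eukaryotic initiation.

```python
from collections import Counter

# Standard genetic code (assumption: no organism-specific exceptions).
# Codons are enumerated with bases in the order U, C, A, G at positions
# 1, 2 and 3; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna: str, start: int = 0) -> str:
    """Read non-overlapping codons from `start` until a stop codon or the end."""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon: end of the reading frame
            break
        protein.append(amino_acid)
    return "".join(protein)


def translate_from_first_aug(mrna: str) -> str:
    """Simplified scanning: begin decoding at the first AUG found 5' -> 3'."""
    start = mrna.find("AUG")
    return "" if start == -1 else translate(mrna, start)


stop_codons = [codon for codon, aa in CODE.items() if aa == "*"]
codons_per_aa = Counter(aa for aa in CODE.values() if aa != "*")
print(len(CODE) - len(stop_codons), "sense codons;", len(stop_codons), "stop codons")
print(len(codons_per_aa), "amino acids; Leu has", codons_per_aa["L"],
      "codons, Trp has", codons_per_aa["W"])

mrna = "GGCAUGGCUUGGUAAGC"  # hypothetical example sequence
print(translate(mrna, 0), translate(mrna, 1), translate_from_first_aug(mrna))
```

Here the last line prints GMAW, AWLGK and MAW: shifting the start by one base changes every codon, which is why the reading frame matters. The six codons for leucine versus one for tryptophan fit the observation that common amino acids tend to have more codons.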
Amino acid activation by aminoacyl-tRNA synthetases is important for:
1. Providing an energy source for later peptide bond formation.
2. Providing specificity, by matching the correct amino acid to the specific tRNA.
3. Proofreading: some synthetases have proofreading activity, ensuring that only the correct amino acid is ligated to the tRNA.

Ribosomes (Chapter 22.2, Figures 22.5, 22.6, 22.7, Table 22.4)
Protein synthesis occurs on ribosomes, large multimeric complexes of protein and RNA. Ribosomes have two subunits, large and small, each composed of both RNA (rRNA) and protein. In eukaryotes, the small subunit contains 33 proteins and 1 RNA, whereas the large subunit contains 49 proteins and 3 RNAs. The small subunit matches tRNAs to the codons; the large subunit catalyzes the formation of peptide bonds. There are three tRNA-binding sites on the ribosome: the A site (aminoacyl-tRNA site), the P site (peptidyl-tRNA site), and the E site (exit site). Two of these are occupied at any one time. The mRNA is bound near the A and P sites and is decoded in the 5' to 3' direction, one codon at a time.

Initiation of Translation (Figures 22.11, 22.12)
Initiation is the key step in deciding whether an mRNA will be translated. Translation begins at an AUG codon with a special initiator tRNA that carries methionine (Met). In eukaryotes, the initiator tRNA is loaded onto the small subunit together with initiation factors (proteins). The loaded small subunit recognizes the 5' cap of the mRNA (not to be confused with CAP, the bacterial transcription factor involved in catabolite repression) and moves in the 5' to 3' direction until it finds an AUG codon, usually the first one. The initiation factors then dissociate, allowing the large subunit to bind the small subunit. In this process, the initiator tRNA is positioned in the P site.

In bacteria, multiple open reading frames (ORFs) are often found in a single mRNA, so more than one protein can be translated from the same message.
The ribosome can therefore initiate translation at internal AUGs. Specificity comes from a ribosome binding site (the Shine-Dalgarno sequence) located just 5' of each functional AUG, which efficiently recruits the ribosome.

Elongation (Figure 22.15)
Assume that the ribosome is already translating an mRNA. A tRNA in the P site is linked to the growing polypeptide. An aminoacyl-tRNA enters the A site, pairing with the codon there according to base-pairing rules. The energy of the ester bond linking the polypeptide to the tRNA in the P site is used to form a peptide bond between the amino group of the amino acid in the A site and the carboxyl group of the C-terminal residue in the P site. The reaction is catalyzed by the peptidyl transferase activity of the ribosome. Peptide bond formation is coupled to a conformational change in the ribosome that effectively shifts the large subunit forward relative to the small subunit. In turn, the tRNAs that were in the P and A sites shift to the E and P sites, respectively. In the final step of the cycle, the small subunit moves downstream by precisely one codon (3 bases), placing a new codon in the A site and releasing the tRNA from the E site.

The next incoming aminoacyl-tRNA is delivered to the ribosome by elongation factor EF-Tu (in bacteria; for the eukaryotic counterparts, refer to Table 22.5). EF-Tu releases the tRNA to the ribosome only if the anticodon of the tRNA matches the codon of the mRNA (Figures 22.15, 22.16, 22.17, 22.19).

Termination of Translation (Figs. 22.20, 22.21)
Stop codons signal the end of translation. Release factors associate with the ribosome when any one of the three stop codons reaches the A site. These factors cause the peptidyl transferase activity to add a water molecule to the end of the chain, hydrolyzing the bond between the polypeptide and the P-site tRNA and releasing the completed protein.

Antibiotics and Translation (see Box 22.B)
The majority of known antibiotics used to treat bacterial infections block translation.
The complexity and importance of translation make it a prime target for disruption. A notable exception is the penicillin/ampicillin family of antibiotics, which instead block bacterial cell wall synthesis.

Posttranslational events (Chapter 22.4)
After translation, many proteins fold into their correct secondary and tertiary structures on their own. Others require the help of chaperones, proteins that assist folding. In some cases, chaperones are associated with the ribosome and fold the protein "on the go" (Figure 22.24). In other cases, an improperly folded protein is unfolded and refolded by a chaperone complex (Figures 22.25 and 22.26).

After translation and folding are complete, proteins have to be brought to their cellular destinations. In eukaryotes, short targeting sequences within the protein (such as signal peptides) direct it to destinations such as the plasma membrane, the mitochondria, the endoplasmic reticulum, or the nucleus (e.g., histones). For example, the signal peptides of proteins entering the endoplasmic reticulum are recognized by the signal recognition particle (Figure 22.27).

In addition to being folded and targeted, many proteins (most, in fact) undergo chemical modifications. These modifications can switch an enzyme's activity from "ON" to "OFF" or vice versa. Common modifications are phosphorylation, methylation, and glycosylation. Other modifications involve attaching small proteins, such as SUMO or ubiquitin, to existing proteins.
Topic 21 Review Questions
Textbook Chapter 22: Q 1, 5, 9, 21, 41, 45, 47, 57, 69, 73, 83, 91

Topic 22 revised 2024
Recombinant DNA technology
Readings: Sections 20.6, 4.6, 12 (Box 12.A)
By the end of this topic, you should be able to:
Describe the following tools of recombinant DNA technology and give examples of how each can be applied: synthetic oligonucleotides, gel electrophoresis, nucleic acid hybridization, Southern and Northern blotting, cDNA, restriction enzymes, DNA ligase, plasmids, polymerase chain reaction, DNA sequencing, transgenic organisms, CRISPR/Cas9, RNA profiling, DNA fingerprinting.
Outline general strategies for cloning and expressing a prokaryotic or eukaryotic gene in E. coli, and for creating transgenic organisms.

Recombinant DNA technology refers to the techniques by which DNA fragments from different sources are joined together to make new DNA molecules.

Historical Perspective
Humans have manipulated genes for millennia through selective breeding of plants and domesticated animals. Although this process is effective, it is relatively slow and is limited to organisms that can be bred with one another. As our understanding of the structure and function of nucleic acids grew, we learned how the genetic composition of an organism influences its phenotype. However, different DNA molecules have very similar physical properties, because their structures are relatively uniform and they are composed of different combinations of only four monomers. Therefore, traditional biochemical approaches were not well suited to separating, analyzing, and manipulating individual genes. The advent of recombinant DNA technologies in the early 1970s revolutionized our ability to analyze and modify DNA, so that today manipulating genes in the lab is relatively easy.

The impact of these advances has been enormous! Recombinant DNA technology has allowed the rapid expansion of knowledge in all areas of biochemistry and cell biology.
We now know the complete genome sequences of thousands of species, which, through computational and experimental approaches, gives us tremendous insight into the biochemical processes that support life. We can analyze similarities and differences among the genomes of various organisms, introduce genes from one species into another, design specific changes in genes (and the resulting proteins) using site-directed mutagenesis, overexpress proteins that would otherwise be scarce, and create genetically modified organisms for study and for practical purposes. Recombinant DNA technology is the driving force behind modern biotechnology and synthetic biology. The benefits to humanity are evident in diverse areas, including medicine (drug discovery, vaccines, genetic screening, gene therapy), agriculture (vitamin-enhanced crops), the environment (microbes for bioremediation), and forensics (DNA fingerprinting). Modern recombinant DNA technology allows us to design and create organisms with novel properties; it is an exciting time to be a biochemist or molecular biologist!

The tools of recombinant DNA technology
Many processes in modern biotechnology require the production of large amounts of specific proteins or DNA molecules. Manipulating DNA to achieve a desired outcome requires specialized techniques that have been developed over the last several decades. In class we will work through an example of how these techniques can be applied to a specific gene of interest; these notes explain the basics of the methods without outlining every aspect of the case study. Note that we often talk about particular DNA or RNA molecules as if they were single molecules, but keep in mind that in recombinant DNA technology applications, millions or billions of copies of those molecules are present in solution.

Synthetic oligonucleotides
Accurate, automated synthesis of single-stranded oligonucleotides ("oligos" for short) of up to about 200 nucleotides is routine.
Longer sequences can be obtained by joining oligos together. Single-stranded oligos are useful as hybridization probes and as primers for DNA sequencing, PCR, and mutagenesis.

Gel electrophoresis of DNA
DNA fragments are most easily separated according to size by gel electrophoresis, using the polysaccharide agarose (or sometimes polyacrylamide) as the separating matrix (Fig 20.34). Because the sugar-phosphate backbone is negatively charged, DNA exposed to an electric field migrates toward the positive pole. The porous gel matrix hinders this movement; small DNA molecules pass through the pores more easily and therefore migrate faster. Because DNA has a uniform mass-to-charge ratio, separation depends only on the sieving effect of the gel, which is determined by the size of the DNA. Gel electrophoresis is most commonly used as an analytical tool to learn about the composition of a sample, but it can also be used preparatively to obtain pure samples of desired DNA fragments.

Two routine methods are used to visualize DNA on a gel non-specifically. DNA itself is colourless, but fluorescent dyes such as ethidium bromide (EtBr) intercalate between the bases and reveal the location of DNA when the gel is exposed to ultraviolet light. The fluorescence is proportional to the amount of bound dye, which is proportional to the mass of DNA in the band. For small amounts of DNA, staining with dyes may not be sensitive enough, and radioactively labelling the DNA may be necessary. One of the most common radiolabelling methods is to transfer 32P to the 5' ends of the DNA using the enzyme polynucleotide kinase and ATP labelled with 32P at its outermost (gamma) phosphate. After the enzyme transfers the radiolabel to the ends of the DNA, the DNA can be detected by exposing the gel to X-ray film (autoradiography).
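In practice, the size of an unknown band is usually estimated by comparing it with a lane of marker fragments of known size. The sketch below assumes the common approximation that migration distance is roughly linear in log10(size) over the resolving range of the gel; it is not exact, especially for very large or very small fragments. The ladder values and the helper names `fit_standard_curve` and `estimate_size` are hypothetical.

```python
import math


def fit_standard_curve(ladder):
    """Least-squares line through (distance, log10 size) for a marker lane.

    `ladder` is a list of (size_bp, distance_mm) pairs. Assumes migration
    distance is roughly linear in log10(size) over the gel's resolving range.
    """
    xs = [distance for _, distance in ladder]
    ys = [math.log10(size) for size, _ in ladder]
    n = len(ladder)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_size(distance_mm, slope, intercept):
    """Convert a band's migration distance back into an estimated size (bp)."""
    return 10 ** (slope * distance_mm + intercept)


# Hypothetical marker lane: larger fragments travel shorter distances.
ladder = [(10000, 20.0), (5000, 26.0), (1000, 40.0), (500, 46.0), (100, 60.0)]
slope, intercept = fit_standard_curve(ladder)
print(f"band at 30 mm is roughly {estimate_size(30.0, slope, intercept):.0f} bp")
```

The fitted slope is negative, which simply restates the point above: the smaller the fragment, the farther it migrates.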
Hybridization of nucleic acids
Sometimes you want to detect a specific nucleic acid sequence after separation by electrophoresis. To do this, after running the gel you can transfer the nucleic acids to a nitrocellulose or nylon membrane (a process called "blotting"). You then add a radiolabelled oligonucleotide (a "probe") complementary to the sequence you want to detect. Under appropriate conditions, the probe will base-pair with ("hybridize to" or "anneal to") the target sequence. (Recall from Topic 14 that various intrinsic and extrinsic factors influence how readily two nucleic acid strands hybridize.) After washing away excess probe, you can use autoradiography to determine whether any probe bound to the blot.

Blots are named according to the type of molecule on the membrane and the type of probe. The first blot developed detected DNA using a DNA probe and was named a Southern blot after its inventor, Ed Southern. When the technique was extended to detecting RNA with a DNA probe, the blot was called a Northern blot, as a play on words.

cDNA
cDNA is a complementary DNA copy of an mRNA molecule. It is made by the enzyme reverse transcriptase (found in retroviruses, a group of RNA viruses), which synthesizes DNA on the mRNA template starting from a primer complementary to the poly(A) tail common to nearly all eukaryotic mRNAs. cDNAs generated from mRNA templates are useful for expressing eukaryotic proteins in prokaryotic cells because, unlike genomic DNA, cDNA lacks introns. DNA polymerase can then make double-stranded DNA from the initial single-stranded cDNA; primers for this synthesis can be provided by fragments left after partial digestion of the original RNA, or by the 3' end of the cDNA folding back on itself at a short self-complementary region.

Polymerase chain reaction (PCR)
PCR is a technique to "amplify" (produce large amounts of) a specific DNA fragment.
It is possible to amplify a fragment even if only one copy of it is present in the DNA source. In PCR, a DNA polymerase repeatedly extends two oligonucleotide primers that are complementary to the regions flanking the target sequence, resulting in an exponential increase in the number of copies (Fig 20.37). Note that to design the primers, you must know the sequence of the regions to which they will anneal.

Steps in PCR (Figure 20.37):
1. Denature the DNA template by heating to ~95 °C.
2. Anneal the primers to the template by lowering the temperature to 50-60 °C. The annealing temperature is usually about 5 °C below the Tm of the primer with the lower Tm.
3. Extend the primers at the optimal temperature of your DNA polymerase (often 72 °C). Note that you must include the four dNTPs as substrates for DNA synthesis!
4. Repeat steps 1 through 3 about 30 times.

Technical advances have made PCR simple to perform. Initially, DNA polymerase had to be added to the reaction every cycle, because the high temperature required to denature the DNA also denatured the enzyme. Thanks to the discovery of thermostable polymerases that remain active at elevated temperatures, the enzyme now needs to be added only at the start of the reaction. The first known thermostable polymerase (Taq polymerase) was isolated from Thermus aquaticus, a bacterium that inhabits hot springs. In addition, automated thermocycling machines have simplified temperature cycling.

In principle, the number of copies of the target sequence should double with each round of PCR, so that a single copy of the template would give about 10^6 copies after 20 rounds and about 10^9 copies after 30 rounds.
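The primer-design and copy-number reasoning above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how primer-design software works: Tm is estimated with the Wallace rule of thumb (2 °C per A or T, 4 °C per G or C), which is only a rough guide for short oligos, and real design also considers GC content, hairpins, and primer dimers. The template and the helpers `wallace_tm`, `design_primers`, and `copies` are hypothetical. The reverse primer is the reverse complement of the 3' end of the top strand, because it anneals to that strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, written 5' -> 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(primer: str) -> int:
    """Rough Tm in deg C for short oligos: 2 per A/T plus 4 per G/C."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def design_primers(template: str, length: int = 20):
    """Forward primer = 5' end of the top strand; reverse primer = reverse
    complement of its 3' end. Annealing temperature = lower Tm - 5 deg C."""
    forward = template[:length]
    reverse = reverse_complement(template[-length:])
    annealing = min(wallace_tm(forward), wallace_tm(reverse)) - 5
    return forward, reverse, annealing


def copies(cycles: int, start: int = 1, efficiency: float = 1.0) -> float:
    """Copies after `cycles` rounds; efficiency 1.0 means perfect doubling."""
    return start * (1 + efficiency) ** cycles


template = "GCGTACGATCGATCGTAGCTAGGATCCGATTACAGGCATGCTAGCTAGGCATCGAT"  # hypothetical
fwd, rev, ta = design_primers(template)
print(fwd, rev, f"anneal at ~{ta} C")
print(f"ideal: {copies(20):.2e} after 20 cycles, {copies(30):.2e} after 30")
print(f"at 90% efficiency: {copies(30, efficiency=0.9):.2e} after 30")
```

With perfect doubling the last two lines print 1.05e+06 and 1.07e+09, the 10^6 and 10^9 figures above. A hypothetical 90% per-cycle efficiency gives about 2.30e+08 after 30 cycles, which illustrates why the ideal amplification is rarely reached.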
In practice, this degree of amplification is not typically achieved, for reasons including competition between primers and template strands during the annealing step, and loss of activity of even thermostable DNA polymerases over repeated heating steps.

PCR can be used to introduce specific, intentional changes into a DNA sequence (site-directed mutagenesis). For a primer to be extended during PCR, only its 3' end needs to anneal to the template; the 5' end may contain one or more mismatches, or may even have little complementarity to the template. As amplification proceeds, the number of PCR products containing the primer's sequence increases exponentially, while the number of products containing the original sequence increases only linearly. The final PCR product therefore contains the altered sequence from the primer in far greater abundance than the original sequence present in the template.

The amount of a specific template DNA in a sample can be estimated using real-time PCR (also called quantitative PCR, or qPCR). In this method, amplification of the template causes increased fluorescence of chemical probes. The fluorescence is measured in real time and is proportional to the amount of amplified product; the more template present at the start, the earlier in the run the fluorescence rises (Figure 20.38). This method can be applied to cDNA samples to estimate how much of an mRNA is expressed under certain conditions (as an alternative to a Northern blot). It can also be used to estimate the amount of viral or bacterial DNA in a sample.

Some applications of PCR
1. Amplifying DNA from samples in which DNA is scarce (e.g., archaeological specimens, crime scene samples)
2. Obtaining a genomic or cDNA clone (a clone is an identical copy of something, whether a DNA sequence or an organism identical to its parent)
3. Identifying viral or bacterial DNA in medical samples
4. Quantifying cDNA or pathogen DNA levels
5. Diagnosing genetic diseases
6. DNA fingerprinting (see section below)

Restriction enzymes (Figure 20.34, Table 20.1)
In the early 1960s, it was noted that when bacteriophages (bacterial viruses) are transferred from one bacterial strain to another, their infectivity, or titre, can vary. For example, a phage isolated from E. coli strain C had a thousand-fold lower titre when used to infect E. coli strain K. The phage is said to be "restricted" by the second host. Restriction is due to degradation of the phage DNA. The enzymes responsible, known as restriction endonucleases or restriction enzymes, were first identified in the early 1970s, one of the most important discoveries in the development of recombinant DNA technology. Type II restriction enzymes are DNA-binding proteins that make double-stranded breaks at specific sequences, allowing DNA fragments to be cut precisely. This was a great advance over the non-specific physical methods previously used to break DNA. About 3000 restriction enzymes recognizing over 230 unique sites have been identified.

Restriction enzymes are named systematically. For example, the third restriction enzyme isolated from Haemophilus influenzae strain Rd is known as HindIII (H for the genus, in for the species, d for the strain, and III for the order of discovery). In general, type II restriction enzymes recognize and cut palindromic sequences, most commonly of four, six, or eight base pairs (Table 20.1). (Palindromic sequences read the same 5' to 3' on both strands; see the examples below. Note that this definition of palindromic differs from how the term is used in language.) For example, the restriction enzyme EcoRI recognizes the 6-bp sequence on the left and cuts between G and A on each strand to generate the double-strand break shown on the right:

5' GAATTC 3'        5' G     AATTC 3'
3' CTTAAG 5'        3' CTTAA     G 5'

Note that at each cut end, the 5' end of one strand extends beyond the 3' end of the other; these are called 5' overhangs.
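Because a palindromic site reads the same 5' to 3' on both strands, an enzyme that cuts one strand k bases into the site also cuts the other strand k bases from its own 5' end. For a site of length L, that leaves an overhang of L - 2k bases: positive means a 5' overhang, negative a 3' overhang, and zero a blunt end. The sketch below is a minimal Python illustration of that reasoning for the three example enzymes in these notes (EcoRI, SacI, SmaI); the helper names, the `cut_after` parameter, and the test DNA are hypothetical, and `digest` returns top-strand fragments only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if the site reads the same 5' -> 3' on both strands."""
    return site == reverse_complement(site)


def end_type(site: str, cut_after: int) -> str:
    """Classify the ends made by an enzyme that cuts a palindromic site
    `cut_after` bases from the 5' end of each strand."""
    overhang = len(site) - 2 * cut_after
    if overhang > 0:
        return f"5' overhang of {overhang} nt"
    if overhang < 0:
        return f"3' overhang of {-overhang} nt"
    return "blunt"


def digest(dna: str, site: str, cut_after: int) -> list[str]:
    """Top-strand fragments after cutting every occurrence of a palindromic site."""
    cuts, start = [], dna.find(site)
    while start != -1:
        cuts.append(start + cut_after)
        start = dna.find(site, start + 1)
    bounds = [0] + cuts + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]


for name, site, cut_after in [("EcoRI", "GAATTC", 1),
                              ("SacI", "GAGCTC", 5),
                              ("SmaI", "CCCGGG", 3)]:
    print(name, site, is_palindromic(site), end_type(site, cut_after))

print(digest("AAGAATTCTTTTGAATTCGG", "GAATTC", 1))
```

This prints a 4-nt 5' overhang for EcoRI, a 4-nt 3' overhang for SacI, and blunt ends for SmaI, and the digest yields ['AAG', 'AATTCTTTTG', 'AATTCGG']; the AATT at the start of each downstream top-strand piece is the single-stranded overhang.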
Because these single-stranded regions are available for base pairing with complementary DNA strands, they are referred to as "sticky ends". As another example, the enzyme SacI cuts the sequence GAGCTC (by convention, DNA is written 5' to 3' and assumed to be double-stranded) between T and C on each strand to give:

5' GAGCT     C 3'
3' C     TCGAG 5'

These ends are also sticky, but with 3' overhangs. They are not complementary to EcoRI ends, but they are complementary to other SacI ends. Some enzymes leave blunt ends with no overhang rather than sticky ends. For example, SmaI cuts the sequence CCCGGG to give:

5' CCC     GGG 3'
3' GGG     CCC 5'

(Note that you are not responsible for memorizing specific restriction enzymes or their recognition sequences for the exam.)

DNA Ligase
Whereas restriction enzymes act as scissors, the enzyme DNA ligase (usually isolated from bacteriophage T4) acts as glue (Figure 20.35). If a phosphate is present on the 5' ends, it will join (ligate) compatible sticky ends, analogously to its role in DNA replication and repair. It will also ligate blunt ends, though less efficiently. T4 DNA ligase hydrolyzes ATP as its energy source during ligation. The activity of DNA ligase is critical in the construction of recombinant molecules.

Plasmids
To introduce a gene of interest into a microorganism such as E. coli, you need a vector: a carrier molecule that can be introduced into, selected for, and propagated in the host organism. Most commonly the vector is a plasmid, generally a relatively small (~3000 bp) circular double-stranded DNA that replicates independently of the host chromosome (Fig 20.36). The copy number per cell varies with the plasmid but can be on the order of 100 copies per cell. The key features that make a plasmid useful for cloning are:
1. One or more restriction sites at which DNA of interest can be inserted.
Commonly, plasmids have been engineered to contain a multiple cloning site (or polylinker), a region in which many restriction enzyme recognition sites are clustered together.
2. A selectable marker to allow identification of bacteria that contain the plasmid. The most common selectable markers used in bacteria are genes encoding resistance to an antibiotic such as ampicillin, tetracycline, kanamycin, or chloramphenicol. When cells are plated on growth medium containing the appropriate antibiotic, cells that don't contain the plasmid die, whereas cells containing the plasmid are protected by expression of the resistance gene.
3. An origin of replication, allowing the plasmid to replicate independently of the host chromosome.

Inserting a DNA fragment of interest into a plasmid provides a simple way to generate many copies of the fragment, because bacteria can easily be grown in large amounts; as they proliferate, they copy the plasmids they contain. The plasmid also provides DNA of known sequence flanking the DNA of interest, which is useful for PCR and DNA sequencing.

Gene cloning
Once you have an amplified copy of a gene of interest (e.g., by PCR amplification of a cDNA), you can clone it into a plasmid vector. The steps are:
1. Digest the gene (the "insert") and the plasmid (the "vector") with the same restriction enzymes (or enzymes giving the same sticky ends). Note that restriction sites can be built into the PCR primers to enable digestion of the gene.
2. Incubate the cut insert and plasmid together with T4 DNA ligase and ATP.
3. Transform the ligated DNA into E. coli. Transformation is the process by which a plasmid is introduced into a bacterial cell.
4. Plate the transformed bacteria on agar plates containing the appropriate antibiotic.
5. Isolate plasmid DNA from individual transformants.
6. Test for the presence of the desired insert by analyzing the isolated plasmids by restriction mapping, blotting, or DNA sequencing.

Restriction Maps
One quick way to check whether an insert of the correct size has been incorporated into a plasmid is to determine the sizes of the fragments obtained after digesting the plasmid with restriction enzymes. The fragments are separated by gel electrophoresis, and the observed pattern is compared with the pattern expected if the insert had been correctly incorporated. Restriction maps are helpful as a quick screen of plasmids, but they cannot tell you for certain whether the insert was incorporated exactly as intended, or whether DNA polymerase made a mistake during PCR amplification. Therefore, after identifying plasmids that appear correct, you normally verify the insert by DNA sequencing.

DNA Sequencing
Definitive characterization of a DNA molecule requires determining its sequence. Until the mid-1970s, DNA was extremely difficult to sequence. Then Frederick Sanger developed dideoxy, or chain-termination, sequencing, an elegant and easily automated method that was rapidly adopted and was eventually used to produce the first human genome sequence. In the procedure, four parallel reactions are set up in which DNA polymerase extends an oligonucleotide primer using the DNA of interest as the template. Each reaction contains, in addition to the regular dNTPs required for DNA synthesis, one chain-terminating 2',3'-dideoxynucleoside triphosphate (ddNTP): ddATP, ddGTP, ddCTP, or ddTTP. Each ddNTP is tagged with a different fluorescent label. In each reaction, the ddNTP is present at a low concentration relative to its dNTP counterpart, so that a ddNTP is incorporated (randomly) about once every 500 bases. Because the ddNTP lacks a 3' hydroxyl, the new strand cannot be extended further once it has been incorporated.
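The logic of chain termination can be simulated on a short template. The sketch below is a minimal Python illustration under loudly stated assumptions: the primer is taken to anneal at the template's 3' end (so each new strand is the reverse complement of the template), the template is hypothetical, and the ddNTP incorporation probability `p_dd = 0.1` is chosen only so that a 24-nt template is fully covered (the text's ratio is about one per 500 bases). Each of the four reactions contains one ddNTP; pooling the products and sorting them by length plays the role of the gel, and the label on each product's last base spells out the new strand.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def sanger_reaction(template: str, dd_base: str, n_strands: int = 2000,
                    p_dd: float = 0.1, seed: int = 1) -> set[int]:
    """Lengths of the products in one reaction containing a single ddNTP.

    Assumes the primer anneals at the template's 3' end, so each new strand
    is the reverse complement of `template`. Wherever the new strand needs
    `dd_base`, the ddNTP is incorporated with probability `p_dd` and the
    chain stops (no 3' OH); otherwise the normal dNTP is added.
    """
    rng = random.Random(seed)
    new_strand = reverse_complement(template)
    lengths = set()
    for _ in range(n_strands):
        for i, base in enumerate(new_strand):
            if base == dd_base and rng.random() < p_dd:
                lengths.add(i + 1)  # product length counted from the primer
                break
    return lengths


def read_sequence(template: str) -> str:
    """Run the four reactions, order all products by length (as the gel
    does), and read the ddNTP label on the end of each one."""
    products = {}
    for dd_base in "ACGT":
        for length in sanger_reaction(template, dd_base):
            products[length] = dd_base
    return "".join(products[length] for length in sorted(products))


template = "TACGGATCCAAGCTTGAATTCGCA"  # hypothetical template, written 5' -> 3'
read = read_sequence(template)
print(read)                      # the newly synthesized strand, 5' -> 3'
print(reverse_complement(read))  # converted back to the template strand
```

Note that the read is the complement of the template strand, so it has to be reverse-complemented to recover the template; the simulation also shows why single-base resolution is essential, since neighbouring products differ in length by exactly one nucleotide.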
Note that each sequencing reaction contains many billions of template strands; some are copied only a short distance before a ddNTP is incorporated, and others are copied for longer. The result in each reaction is a mixture of DNA fragments of different lengths that all end in A, G, C, or T (depending on which ddNTP was present). The fragments from the four reactions are separated by denaturing polyacrylamide gel electrophoresis at a resolution that distinguishes fragments differing in length by a single base. The fluorescent tag detected on the fragments of each length indicates which base was incorporated at that position, and therefore which complementary base is present on the template strand.

Dideoxy sequencing can produce sequences (reads) of over 1000 bases per run and is very effective for analyzing a small number of individual samples. However, it requires prior amplification of the target sequence (by PCR or bacterial cloning) and time-consuming separation of fragments by gel electrophoresis, which limits the number of samples that can be analyzed in parallel.

In the mid-2000s, faster and less expensive "second-generation" sequencing methods were developed. For large-scale genomic applications, Illumina sequencing (Figs 20.39, 20.40) is now dominant. Illumina sequencing is generally applied when a heterogeneous mixture of DNA fragments is to be sequenced (for example, genomic DNA that has been randomly sheared into millions of fragments). DNA of known sequence is added to the ends of the fragments. The fragments are bound to a surface through these known sequences, and each fragment is amplified in place by a PCR-like process to produce bound clusters of DNA of identical sequence. Primers are then annealed to the fragments, and the following cycle of steps is repeated (Figs 20.39, 20.40):
1. DNA polymerase and fluorescently tagged dNTPs are added.
DNA polymerase adds the appropriate dNTP to the end of the primer. The 3' OH of each dNTP is blocked, so that the polymerase adds only one nucleotide.
2. The free dNTPs are washed away.
3. The fluorescence of the added nucleotide is detected.
4. The fluorescent tag and the block on the 3' OH are removed, leaving the 3' end available for the next base.

Illumina sequencing generates reads only a few hundred bases long, but the process is massively parallel, generating millions or billions of reads simultaneously with real-time detection. Also, cloning is not necessary.

Second-generation sequencing can also be used to profile RNA expression in a given cell type. cDNAs are produced from the mRNAs of the cells or tissue of interest and then sequenced. The number of reads detected for a given cDNA is proportional to the level at which the corresponding mRNA is expressed. This application is called RNA sequencing, or RNA-seq.

Another method, nanopore sequencing, involves passing a single strand of DNA through a protein pore that spans a membrane (Fig. 20.41). An electrical potential is applied across the membrane, and the current changes in characteristic ways as specific bases pass through the pore, allowing the sequence to be read. This technique has a higher error rate than other sequencing methods, but it allows sequencing of very long DNA strands and can detect chemically modified bases.

Expression of cloned genes
Often the end product of interest is not the gene itself but a protein. The cloned gene tells us the sequence of the protein and provides the starting point for producing the protein in abundance. "Overexpressed" proteins are used:
1. In medicine, as pharmaceuticals (e.g., insulin, growth hormone, or vaccines)
2. Industrially (e.g., proteases in laundry detergents)
3. In research, to study protein structure, function, and mechanism

Recombinant DNA technology allows proteins to be expressed at high levels through the use of strong promoters, under controlled conditions through the use of regulated promoters, and in organisms other than the native source to facilita