BIOL 3110 Test 2 PDF
Uploaded by ZippyPelican
Summary
This document discusses DNA replication in E. coli, including DNA polymerase I and III, components, and DNA replication processes.
Midterm 2 - Nov 12, 1-2:30 pm in CLH I (lectures 9-17 recap)

E. coli DNA polymerase I vs polymerase III:
○ Kornberg purified the first DNA polymerase from E. coli capable of synthesizing DNA in vitro and named it DNA pol I.
○ However, E. coli DNA pol I mutants are viable; therefore, DNA pol I cannot be the main replicative polymerase.
○ Instead, the main replicative polymerase in E. coli is DNA polymerase III.

E. coli DNA polymerase III holoenzyme:
○ A multi-subunit complex with 17 subunits
○ Contains 4 sub-assemblies

Components of the DNA pol III holoenzyme
○ E. coli β-clamp
  - Interacts with the α (polymerase) subunit of the pol III holoenzyme
  - Assembles into a dimer with a 35 Å diameter hole in the middle
  - Confers extended processivity on the DNA pol III holoenzyme: it can synthesize at least 1.5 x 10^5 bases of the leading strand without dissociating
  - Also increases the rate of DNA synthesis (750-1000 nts/s)
○ The clamp is needed to keep DNA pol III on DNA
  - Most polymerases synthesize only a short stretch of DNA before falling off
  - The β-ring dimer allows the DNA pol III holoenzyme to stay on DNA and be highly processive
  - The clamp guides the holoenzyme and slides along duplex DNA, but not ssDNA
  - The β-rings are assembled onto DNA by the clamp loaders, and assembly/disassembly of the β-rings requires ATP as energy

DNA replication and related processes:
Cycle of loading and unloading DNA polymerase and clamp protein:
○ On the leading strand, the moving DNA polymerase is tightly bound to the clamp and the two remain associated for a long time.
○ On the lagging strand, each time the polymerase reaches the 5′ end of the previous Okazaki fragment, the polymerase is released. This polymerase molecule then associates with a new clamp assembled on the RNA primer of the next Okazaki fragment.

DNA primase synthesizes RNA primers
○ DNA primase synthesizes RNA primers to prime synthesis of Okazaki fragments.
○ Primase is highly error-prone; however, synthesis of RNA primers does not require high fidelity since the primers are eventually removed.

Removal of RNA primers
○ After synthesis of the Okazaki fragments, the RNA primers must be removed and replaced by DNA.
○ This step is mediated by RNase H and DNA pol I.
○ Ligase is needed to join Okazaki fragments.

Termination of E. coli replication
○ The E. coli replication forks proceed bidirectionally until they run into each other or hit the termination (Ter) sites.
○ Ter is a short consensus DNA sequence (~23 bp) where the Tus (terminus utilization substance) protein binds.
○ Tus is a 36 kD protein that binds DNA as a monomer. It has asymmetric domains, and its asymmetric binding to Ter enables arrest of replication fork progression in a directional manner.
○ When DnaB helicase runs into Tus-Ter from the permissive side, it displaces Tus from Ter.
○ When DnaB helicase runs into Tus-Ter from the non-permissive side, it is trapped and locked.
○ Eventually the replication fork machinery disassembles (starting with removal of the helicase) at the Tus-Ter sites.
○ The Tus-Ter design allows only one-way passage of the moving replication fork.
Summary: DNA replication forks disassemble when they encounter non-permissive Tus-bound Ter sites. Whether a Ter/Tus site is permissive or non-permissive depends on the direction of the replication fork. Helicase is stopped by Tus if it approaches from the non-permissive side.

Fidelity of E. coli replication
○ Multiple factors help lower the error rate during DNA replication.
○ Genetics has been very useful for identifying components important for maintaining DNA replication fidelity:
  - Random mutagenesis is used to isolate strains of E. coli (and other organisms) that have higher (mutator) or lower (anti-mutator) mutation rates.
  - The mutated genes are then identified and the causes of these phenotypes studied.
○ The first proofreading step is carried out by the DNA polymerase and occurs before the new nucleotide is added to the growing chain.
  A) The correct nucleotide, whose geometry fits the complementary base, has higher affinity for the moving polymerase than an incorrect nucleotide.
  B) The active site of the polymerase only accommodates base pairs of the "right" size.
  C) After nucleotide binding (but before formation of the covalent bond), the polymerase undergoes an induced conformational change (closing of the "fingers"), and any incorrectly bound nucleotide is more likely to be rejected.
○ Second proofreading step: mediated by the 3′→5′ exonuclease activity of the polymerase, which actively removes misincorporated nucleotides.
○ Final proofreading method: the strand-directed mismatch repair mechanism further lowers error rates in DNA replication.
○ MutS detects distortion of the DNA helix and binds to a mismatched base pair.
○ MutL binds the MutS-DNA complex and activates MutH at a nearby hemimethylated GATC site to introduce a nick on the unmethylated strand of the DNA duplex.
○ MutS/MutL scan the nearby DNA for a nick. Once one is detected, MutL recruits helicase to separate the DNA strands and exonucleases to degrade the nicked strand all the way back past the mismatch.
○ DNA polymerase (pol III) fills in the gap using the methylated strand as template.
Example of a biochemical assay for studying mismatch repair: incubate a plasmid carrying a mismatch within an EcoRI site in extracts from wild-type or mutant E. coli strains, then assay for repair based on sensitivity to EcoRI cleavage.

Summary
○ Polymerase selectivity
  1. Correct base pairing
  2. Size selection of the correct nucleotide at the catalytic site of the polymerase
  3. Quality control by the O-helix before phosphodiester bond formation
○ 3′ to 5′ exonuclease proofreading
○ Mismatch repair mechanism

Chapter 10

Initiation / elongation steps of replication
DNA polymerase is responsible for elongation of the newly synthesized DNA strand.
All polymerases have similar structural features and functions.
Mg2+ ions (on the palm of the polymerase) help orient incoming nucleotides.
Fingers play a role in checking correct base pairing, i.e. fidelity.
The palm of a replicative polymerase has the ideal shape and size (~22 Å x 30 Å) for B-form DNA.

E. coli DNA replication
The last few steps of lagging strand synthesis
○ 1. Removal of the RNA primer
○ 2. Filling in the gap
○ 3. Ligation of Okazaki fragments

Eukaryotic DNA replication
Eukaryotic cells have many origins of replication.
In yeast, origins are well defined and are also called ARS (autonomously replicating sequences).
In higher eukaryotes, origins are not well defined and are difficult to identify.
In higher eukaryotes, it is clear that not all potential origins of replication fire at the same time during replication.
Regulation of origin firing is complex and still actively investigated by research labs.

Eukaryotic DNA replication
○ 1. The origin recognition complex (ORC) binds to DNA and provides a site on the chromosome where additional replication factors can associate.
○ 2. Pre-replicative complex formation involves the association of the Mcm2-7 complex with DNA at ORC.
○ 3. Mcm2-7 proteins provide helicase activity for DNA synthesis, and loading of these proteins confers competence on the origin to fire in S phase.
○ 4. Onset of DNA synthesis requires the action of two protein kinases (cyclin-dependent kinase (CDK) and Cdc7), which trigger the association of additional proteins with the origin. During initiation, DNA polymerases are also recruited and DNA synthesis starts.
○ 5. During replication, Mcm2-7 proteins move away from the origin and further assembly of pre-replicative complexes is blocked. This ensures that origins can fire only a single time per cell cycle.

Eukaryotic DNA replication
Cycles of switching between DNA Pol α and Pol δ on the lagging strand:
  1. Pol α synthesizes an RNA primer (10-12 nts) and then ~30 nts of DNA
  2. RFC displaces Pol α and recruits PCNA with Pol δ
  3. PCNA clamps Pol δ on DNA
  4. Pol δ elongates the Okazaki fragment

Eukaryotic DNA replication
○ DNA polymerase switching and processing of an Okazaki fragment on the lagging strand
  A. As the DNA helicase promotes unwinding at the replication fork, DNA pol ε with RFC and PCNA synthesizes DNA on the leading strand. DNA pol α initiates synthesis on the lagging strand by generating an RNA primer followed by a short segment of DNA. Then RFC and PCNA load a second DNA polymerase (δ) to continue synthesis of the Okazaki fragment.
  B. As DNA pol δ approaches the downstream Okazaki fragment, cleavage by RNase H1 removes the initiator RNA primer, leaving a single 5′-ribonucleotide. Then FEN1/RTH1 removes the last 5′-ribonucleotide. The resulting nick is sealed by DNA ligase.

Prokaryotic and eukaryotic DNA replication proteins
○ Eukaryotic DNA replication is even more complicated given the presence of chromatin and nucleosomes.
○ Chromatin must be disassembled ahead of the replication fork, and nucleosomes re-assembled after DNA replication.
○ PCNA directly binds to chromatin remodeling complexes such as CAF-1.

Fidelity of DNA replication in eukaryotes
○ Basically the same as in prokaryotes; however, there is a larger selection of polymerases with different error rates.
Eukaryotic mismatch repair
○ Again similar to the prokaryotic system, except not methyl-directed
○ Initiated by a nick on one strand; the nicked strand is the repaired strand

End replication problem
○ Chromosomes of eukaryotic cells are linear, so the DNA replication machinery cannot replicate the very end of the lagging strand (Watson, Olovnikov, 1970s).

How to solve the end replication problem
○ Most eukaryotic chromosomes end in direct repeat sequences called telomeres.
○ All telomeres have a G-rich strand that ends as a single-strand overhang.
○ In humans, the repeating 6-mer is TTAGGG (aka T2AG3).
○ All chromosomes have the same telomeric sequences, but different chromosomes may have telomeres of different lengths.
○ Telomere-specific probes can be used in FISH to visualize telomeres and to measure telomere lengths.
In human somatic cells, telomeres shorten with age, so in somatic cells there is no mechanism to overcome the end replication problem. This gradual loss of telomeres is also known as telomere erosion.
○ Chromosomes with critically short telomeres tend to form end-to-end fusions to protect the ends → genomic instability!
○ Telomere shortening is linked to replicative aging.
○ Normal cells have a finite lifespan → Hayflick limit: most cells stop growing and senesce. Small populations continue to grow until they hit a crisis.

Fluorescence in Situ Hybridization (FISH)
○ "In situ" means in their natural positions within a chromosome.
○ FISH can be applied to visualize specific genes on chromosomes, or to detect localized RNAs within the cell.

Telomere length shortening
○ Telomere length can be measured by Southern blot or by fluorescence microscopy analyses.
○ Telomere shortening correlates with chronological aging.
○ The 3′ overhang of telomeres needs to be protected; otherwise it will be recognized as damaged DNA by repair mechanisms.
○ Overhangs can fold into T-loop/D-loop or G-quartet structures, and are also protected by a variety of telomere-binding proteins.
○ However, some cells, such as germ cells or stem cells, have stable telomere lengths.
○ Also, cancer cells can bypass crisis and grow indefinitely.

Telomeric ends have to be protected
○ A large variety of proteins bind to the telomeres of different organisms.
○ Some of these proteins bind to the single-strand overhang of the telomere (e.g. POT1), but others bind to the double-strand portion of the telomere. Some are even interspersed between nucleosomes.
○ In mammals, these protective proteins (TRF1, TRF2, POT1, TPP1, RAP1 and TIN2) form a complex called Shelterin.
○ In mammals, the double-strand part of telomeric DNA is also organized in tightly packed nucleosomes with shorter repeat lengths (compared to bulk nucleosomes), and these also exhibit hallmarks of heterochromatin.

Recap: Fidelity of E. coli replication
○ Polymerase selectivity
  1. Correct base pairing
  2. Size selection of the correct nucleotide at the catalytic site of the polymerase
  3. Quality control by the O-helix before phosphodiester bond formation
  4. 3′ to 5′ exonuclease proofreading
  5. Mismatch repair mechanism

Okazaki fragment maturation
In eukaryotes, the RNA primer is removed by RNase H1 and FEN1 (5′→3′ exonuclease), and the gap is filled by DNA pol δ.

Comparison of prokaryotic vs eukaryotic DNA replication:
Replication fork of eukaryotes
Nucleosomes are recycled during replication: old nucleosomes are roughly evenly divided and recycled onto the two daughter duplexes. New nucleosomes are also assembled at gaps.

Eukaryotic mismatch repair:
The general process is similar to prokaryotic mismatch repair; however, it does not scan for methylation to determine which strand is the template. Instead, it scans for nicks on newly synthesized strands and chews back from the nicks.
End replication problem:
Chromosomes of eukaryotic cells are linear, so the DNA replication machinery cannot replicate the very end of the lagging strand.
Eukaryotic chromosomes end in direct repeat sequences called telomeres.

Chapter 11

Eukaryotic DNA replication
Telomere length shortening
○ Telomere length can be measured by Southern blot or by fluorescence microscopy analyses.
○ Telomere shortening correlates with chronological aging.

Telomeric ends have to be protected
○ The 3′ overhang of telomeres needs to be protected; otherwise it will be recognized as damaged DNA by repair mechanisms.
○ The overhang can fold into T-loop/D-loop or G-quartet structures, and is also protected by a variety of telomere-binding proteins.
○ A large variety of proteins bind to the telomeres of different organisms.
○ Some of these proteins bind to the single-strand overhang of the telomere (e.g. POT1), but others bind to the double-strand portion of the telomere. Some are even interspersed between nucleosomes.
○ In mammals, these protective proteins (TRF1, TRF2, POT1, TPP1, RAP1 and TIN2) form a complex called Shelterin.
○ In mammals, the double-strand part of telomeric DNA is also organized in tightly packed nucleosomes with shorter repeat lengths (compared to bulk nucleosomes), and these also exhibit hallmarks of heterochromatin.

How to solve the end replication problem?
○ Chromosomes with critically short telomeres tend to form end-to-end fusions to protect the ends → genomic instability!
○ Telomere shortening is linked to replicative aging.
○ Normal cells have a finite lifespan → Hayflick limit: most cells stop growing and senesce. Small populations continue to grow until they hit crisis.
○ However, some cells, such as germ cells or stem cells, have stable telomere lengths.
○ Also, cancer cells can bypass crisis and grow indefinitely.
○ Cells such as stem cells or germ cells have a special enzyme called telomerase to regenerate telomeres.
○ Greider and Blackburn (mid 80s) set out to biochemically purify telomerase from Tetrahymena → this organism turns out to have unique properties that made it an excellent source of telomerase.
○ Tetrahymena is the perfect organism for biochemical purification of telomerase activity because, during its vegetative development, it generates a functional nucleus (macronucleus) from the genomic nucleus (micronucleus). This process involves not only amplification of the DNA content (each gene is amplified ~40X), but also the addition of numerous telomeres to the ends of each amplified chromosome.
○ Greider and Blackburn devised an in vitro assay to measure telomerase activity, which helped them purify the components of telomerase.
○ They found that telomerase has two distinct components, because the activity is sensitive to both RNase and protease:
  - An RNA component
  - A protein component
○ They cloned the gene responsible for the RNA component, but identification of the protein component took another 10 years (done by other groups).
○ Greider and Blackburn discovered that telomerase is made up of a protein catalytic enzyme (known as TERT) and an RNA component (TER) → this discovery led to the Nobel Prize in 2009.
○ The RNA component has partial complementarity to the G-rich strand overhang → TERT (the protein polymerase component) then uses the RNA component as template for elongation of the G-rich strand.
○ Telomerase is a type of reverse transcriptase (it copies RNA to make DNA).
○ Blackburn and colleagues proved that the RNA component of telomerase serves as the template for telomeres in Tetrahymena by mutating the gene that is transcribed into the RNA component (TER RNA).
○ A similar experiment in human cancer cells caused the cells to die.

To solve the end replication problem:
○ Telomerase specifically extends the G-rich strand of telomeres, and the C-rich strand is filled in by the regular DNA replication machinery.
○ An unknown mechanism maintains a single-stranded overhang on the G-rich strand at the end of chromosomes.

Too much telomerase is not a good thing:
○ In humans, the RNA component is transcribed in all cells, but the protein component (TERT) is only expressed in germ/stem cells.
○ Introduction of the hTERT gene into mortal cells not only increased their telomere lengths, but also allowed them to proliferate indefinitely ("immortalized" cell line).

Is re-activating telomerase a good thing?
○ Mice engineered to lack telomerase age prematurely.
○ Re-introduction of telomerase into these mice can reverse the premature aging phenotype.

Telomerase and aging
○ Myth linking red wine drinking and longevity
○ Resveratrol was thought to be the key molecule in red wine responsible for this

Resveratrol - enhancer of telomerase?
○ Some scientists claimed that resveratrol activates an enzyme called SirT1, known to regulate telomerase activity and telomere length; however, this was later found to be due to an artifact of the biochemical assay → still very controversial
○ These and other myths have spawned a huge but bogus resveratrol- (and telomerase-) based industry.
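
The link between telomere erosion, the Hayflick limit, and telomerase described above can be illustrated with a small simulation. This is only a sketch: the starting length, the loss per division, the critical length, and the function name are hypothetical values chosen for illustration, not figures from the lecture.

```python
def divisions_until_senescence(start_bp, loss_per_division_bp, critical_bp,
                               telomerase_gain_bp=0, max_divisions=1000):
    """Count divisions before the telomere would drop below a critical length.

    All parameters are illustrative assumptions. Returns None if the cell
    is still dividing after max_divisions (an "immortalized" cell).
    """
    length = start_bp
    divisions = 0
    while divisions < max_divisions:
        # Each division loses sequence to the end replication problem;
        # telomerase (if expressed) adds some of it back.
        next_length = length - loss_per_division_bp + telomerase_gain_bp
        if next_length < critical_bp:
            return divisions  # cell stops dividing (senescence)
        length = next_length
        divisions += 1
    return None


# Somatic-like cell: no telomerase, so the telomere erodes to the limit.
somatic = divisions_until_senescence(10_000, 100, 4_000)
# hTERT-expressing cell: the gain offsets the loss, so no limit is reached.
immortal = divisions_until_senescence(10_000, 100, 4_000, telomerase_gain_bp=100)
```

With these made-up numbers the somatic-like cell stops after a finite number of divisions, while the cell whose telomerase gain matches the loss never reaches the critical length, mirroring the hTERT immortalization result in the notes.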
Methods in Molecular Biology - useful enzymes:

Phosphatases and nucleases
○ Phosphatases hydrolyze an ester bond to remove a phosphate group
○ Nucleases hydrolyze the ester bond in the phosphodiester linkage between nucleotides
  - They degrade nucleic acids
  - Endonucleases hydrolyze internal bonds
  - Exonucleases chew from the ends
  - There are also RNases that specifically target RNAs
  - RNase H targets the RNA strand of an RNA/DNA hybrid

Restriction endonucleases
○ AKA restriction enzymes, derived from bacteria and archaea
○ Highly DNA sequence-specific endonucleases that serve to protect bacteria from foreign DNA
○ The most commonly used REs belong to the Type II category
○ Type II enzymes recognize and cleave within the same DNA sequence, whereas Type I and Type III enzymes recognize specific sequences but cut at a distal site (Type III → 20-30 bp from the recognition site; Type I → up to 1 kb away)
○ For Type II enzymes, the recognition/cleavage sites are typically 4-8 bp long and palindromic
○ They also generally require Mg2+ as a co-factor
○ Enzymes that have the same recognition sequence are called isoschizomers, e.g. HpaII (C/CGG) and MspI (C/CGG)
○ Enzymes that have the same recognition sequence but cut differently are called neoschizomers, e.g. AatII (GACGT/C) and ZraI (GAC/GTC)
○ Some enzymes are sensitive to DNA methylation whereas others are not, e.g. HpaII cannot cleave CCGG when the second C is methylated, whereas MspI cleaves the sequence regardless of methylation status
○ Some enzymes have different recognition sequences but leave "compatible" overhangs → very useful for cloning different pieces of DNA together

Restriction mapping
○ Method used to map an unknown segment of DNA by digesting with restriction enzymes and identifying the locations of the RE cleavage sites

Recap:
Connection between telomere erosion/shortening and aging
Telomeres in somatic cells (e.g. leukocytes) shorten over an organism's life span (measured by Terminal Restriction Fragment assay or FISH-quantification assay).
Telomere hypothesis of cellular aging

Protection of telomeric ends
Exposure of telomeric ends can trigger DNA damage/repair checkpoints. Therefore, telomeric ends must be protected.
Protection is mediated by formation of T- and D-loops and also by a variety of proteins bound to telomeric sequences.

Telomere shortening and reactivation of telomerase
Telomere erosion correlates with cellular (replicative) aging and with an increasing number of cell divisions/population doublings.

Tetrahymena as an enriched source of telomerase
Tetrahymena = single-cell protozoan (like Paramecium).
The macronucleus contains almost 10,000 individual linear mini-chromosomes, each with telomeres → requires highly active telomerase to make all these telomeres.
Perfect source for biochemical purification of telomerase.

Telomerase activity:
Greider and Blackburn were the first to identify telomerase activity in Tetrahymena extracts.
They showed that the activity has both RNA and protein components (it is sensitive to RNase as well as protease digestion).
RNA component = TER
Protein component = TERT
Note: human equivalents = hTERT and hTR
TER = long RNA that folds into a complex secondary structure and binds the TERT polymerase enzyme.
Within TER, there is a region complementary to the G-rich strand that is used as template for G-rich strand synthesis.
Telomerase is a reverse transcriptase that specifically extends the G-rich strand.
Telomerase activity is made up of a protein component (TERT) = polymerase, and an RNA component (TER) = template.
G-rich strand synthesis involves multiple rounds of DNA elongation and translocation of the RNA template/enzyme.
The C-rich strand is synthesized by the regular cellular replication machinery using the extended G-rich strand as template.
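
The elongation/translocation cycle above can be sketched as repeated reverse transcription of a short RNA template. This is a simplified illustration: the six-nucleotide template `CCCUAA` is a hypothetical stand-in chosen because it is complementary to the human TTAGGG repeat (the actual template region of TER is not given in these notes), and the function names are made up for the example.

```python
# RNA base -> complementary DNA base (reverse transcription).
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna_template_5to3):
    """Return the DNA copy (written 5'->3') of an RNA template (written 5'->3').

    DNA is synthesized 5'->3' while the template is read 3'->5',
    so the template is traversed in reverse.
    """
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_template_5to3))


def extend_g_strand(g_strand, rna_template_5to3, rounds):
    """Append one templated repeat per elongation/translocation round."""
    repeat = reverse_transcribe(rna_template_5to3)
    for _ in range(rounds):
        g_strand += repeat  # elongation; translocation re-positions the template
    return g_strand


# Hypothetical simplified template, complementary to TTAGGG.
extended = extend_g_strand("TTAGGG", "CCCUAA", rounds=2)
```

Only the G-rich strand grows in this sketch; in the cell, the C-rich strand would then be filled in by the regular replication machinery, as stated above.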
Chapter 12

Methods in Molecular Biology
Cloning
○ To clone means to make identical copies, so cloning refers to making copies of, or amplifying, a DNA fragment or gene of interest
○ Paul Berg (in his 1972 paper) was the first to combine genes from different organisms, which resulted in the formation of recombinant DNA
○ He used restriction enzymes to cut open DNA from SV40 (monkey virus) and lambda bacteriophage (bacterial virus) and engineered a "cut-and-splice" method, joining these pieces of DNA through the sticky ends generated by the REs and by the use of DNA ligase

Reagents important for cloning
○ 1. Enzymes
  A. restriction endonucleases
  B. DNA ligase
  C. phosphatase / kinase
○ 2. Vectors
  - There are also expression vectors that are specifically engineered to express (transcribe and translate) the gene of interest in the appropriate host organism
  - An expression vector contains a bacterial or mammalian cell promoter for transcribing the gene of interest cloned into the MCS
  - For a gene to be translated into protein, it must contain an ATG start codon for translation initiation
○ 3. Antibiotics
  - Used in combination with the specific antibiotic resistance gene present on the vector of choice
  - Used for selection, i.e. to inhibit growth of E. coli that do not contain the plasmid of interest
  - E.g. ampicillin, kanamycin
  - Amp = β-lactam antibiotic with an amino group side chain attached to the penicillin structure
  - Penicillin derivative that stops bacterial cell wall synthesis by blocking peptidoglycan cross-linking
  - Mode of resistance: the β-lactamase encoded by the bla gene cleaves the β-lactam ring of Amp

Tricks of cloning
○ 1. Phosphatase treatment of vector
  - Useful for preventing religation of the vector
○ 2. Blue-white screening
  - The multiple cloning site lies within the lacZ gene, which encodes β-galactosidase. This enzyme cleaves X-gal to form an intense blue precipitate, so colonies whose lacZ is disrupted by an insert stay white.
○ 3. a) Fill in overhangs to produce blunt ends:
  - Can only fill in 5′ overhangs
  - Need to use a polymerase (e.g. T4 DNA polymerase) + nucleotides for fill-in reactions
○ 3. b) Chew back overhangs to produce blunt ends
  - Usually done when one wants to convert a 3′ overhang to a blunt end
  - Use an enzyme such as Klenow fragment, which has 3′→5′ exonuclease activity
○ 4. Adding adaptors or linkers:
  - For adding sticky ends to blunt ends
  - Adaptors are very similar to linkers, except one end is sticky and the other end is blunt
  - Prevents multiple adaptors from ligating to one another

Recap
Telomerase function
Telomerase extends both leading and lagging strands
Extends the G-rich strand in both cases

Immortalizing cells by hTERT expression
Expression of hTERT in somatic cells is sufficient to immortalize those cells:
○ Increases and maintains slightly longer telomere lengths
○ Bypasses senescence or crisis

Reagents important for cloning
1. Enzymes:
○ Phosphatases, kinases
○ Nucleases
  - Endonucleases, exonucleases, restriction endonucleases
○ Restriction enzymes → useful for:
  - Restriction enzyme mapping
  - Cloning of DNA fragments
  - Need REs, vectors, antibiotics for selection, screening methods
○ Ligase → joins DNA fragments together

Chapter 13

Methods in Molecular Biology
How to test if a clone contains the desired insert
○ 1. Harvest DNA and do restriction enzyme mapping
  - For example, after blue-white screening, pick white colonies and grow cultures from single colonies (to get a pure population derived from a single cell)
  - Isolate plasmid DNA from the bacterial cultures and cut with known REs that flank the insert fragment or cut within the insert sequence
  - Run the digested fragments on an agarose gel to determine the restriction map
○ 2. Harvest DNA and sequence the DNA directly
  - DNA sequencing gives the exact sequence of a DNA fragment
  - For synthesis-based methods, a short oligo acts as a primer for synthesis of new DNA that can then be detected or analyzed.
  - Note that synthesis can only proceed in the 5′ to 3′ direction
  - By choosing primers that anneal to unique sequences at one or the other end of the MCS (e.g. T7 or SP6 sequences), one can sequence into the inserted sequence to verify the identity of the insert DNA and to confirm that no mutations were generated in the insert

DNA sequencing technology
○ The original sequencing method (Maxam-Gilbert sequencing) uses chemicals to cleave ssDNA, and assembles the DNA sequence from the cleavage patterns of different chemicals that cut after specific nts
○ Fred Sanger developed an alternative method in 1977 called dideoxy sequencing, based on dideoxy nucleotides (ddNTPs) that cause chain termination in an in vitro DNA synthesis reaction
○ Sanger sequencing
  - Perform 4 parallel reactions, each using one of the 4 ddNTPs
  - Need high-resolution gels to resolve each bp
  - Usually use 0.4 mm thick polyacrylamide gels
  - Direction of reading sequence: 5′ to 3′ of the DNA strand

Fluorescence-based Sanger sequencing
Advances in DNA sequencing technology
○ Two modifications have aided the automation and scaling up of the sequencing procedure:
  1. Incorporation of a fluorescent label for each ddNTP allows a single reaction to be run, which is read as the strands are hit with a laser and pass by an optical scanner
  2. Replacement of slab polyacrylamide gels by capillary gels that are long and very thin. This type of gel dissipates heat much more efficiently and allows higher-voltage runs, which in turn reduces the time required to resolve the DNA strands

Sequencing the human genome
○ Read length: 700-900 base pairs
○ Genome size: 3 billion bp
○ Sequencing the human genome is extremely important because it provides the reference genome that all subsequent genome-sequencing-based studies map back to.
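
The logic of the four-reaction Sanger method above can be sketched in a few lines: each ddNTP reaction yields fragments that end wherever that base occurs in the newly synthesized strand, and reading fragment lengths from shortest to longest recovers the sequence 5′ to 3′. This is a toy model under idealized assumptions (every position terminates at least once, fragment lengths are resolved perfectly); the function names and the example sequence are made up for illustration.

```python
def sanger_fragments(new_strand):
    """Map each ddNTP reaction (lane) to the lengths of its terminated fragments.

    new_strand is the newly synthesized strand, written 5'->3'.
    A fragment of length n ends with the base at position n.
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes):
    """Read the sequence 5'->3' by ordering fragments from shortest to longest."""
    base_by_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_by_length[n] for n in sorted(base_by_length))


lanes = sanger_fragments("GATTACA")  # hypothetical example sequence
recovered = read_gel(lanes)
```

In the fluorescence-based version described above, the four lanes collapse into one reaction because the dye colour, rather than the lane, identifies the terminating base.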
Next generation DNA sequencing technology
○ 1. Pyrosequencing = one of two common "Next Gen" sequencing methods:
  - Takes advantage of the stoichiometric release of pyrophosphate during the dNTP incorporation step of DNA synthesis to determine the DNA sequence by quantifying the amount of pyrophosphate released
  - Automated sequencing method measuring the release of pyrophosphate after each dNTP incorporation step of DNA synthesis
  - Add one dNTP at a time, measure the amount of PPi released, degrade all ATP and dNTP using apyrase, repeat the cycle
○ 2. Reversible terminator sequencing
  - The DNA templates are copied base by base using the four nucleotides (ACGT), which are fluorescently labeled and reversibly terminated. After each synthesis step, the clusters are excited by a laser, which causes fluorescence of the last incorporated base. After that, the fluorescent label and the blocking group are removed, allowing the addition of the next base. The fluorescence signal after each incorporation step is captured by a built-in camera, producing images of the flow cell.

Recap
Tricks of cloning
Phosphatase treats the ends of DNA fragments to prevent undesired ligations.
How to make compatible ends for DNA fragments to ligate together:
○ create blunt ends
○ add linkers/adaptors

Polymerase chain reaction (PCR)
Typical cycling steps: rely on a thermostable DNA polymerase (optimal at 72 °C)
PCR amplifies DNA sequences exponentially.
○ Ex. Plateau effect

Chapter 14 & 15

Methods in molecular biology
Applications of PCR
○ 1. Amplify or clone desired DNA sequences
  - E.g. cloning out a specific gene from genomic DNA
○ 2. Quantification of specific DNA or RNA in a mixture of DNA or RNA pools.
  - For quantification of RNA, the RNA must first be reverse transcribed to cDNA

Real-time polymerase chain reaction (PCR)
○ Real-time PCR employs methods to detect the ongoing production of amplicons in the reaction vessel, thus allowing "real-time" monitoring of the different phases of the PCR reaction.
○ Indirect detection of PCR products is often based on fluorescent dsDNA-binding dyes, such as SYBR Green. SYBR Green shows little to no fluorescence when it is free in solution, but fluoresces strongly when it binds to dsDNA. Therefore, the overall fluorescent signal from a reaction is proportional to the amount of dsDNA present and increases as the target is amplified.
○ Analysis of the amplification curves allows samples to be quantified via a standard curve, or used to calculate relative expression levels between samples.
○ Real-time PCR focuses on the exponential phase because it provides the most precise and accurate data for quantitation.
○ Within the exponential phase, the real-time PCR instrument calculates two values:
  - The threshold line is the level of detection at which a reaction reaches a fluorescent intensity above background.
  - The PCR cycle at which the sample reaches this level is called the cycle threshold, Ct.

Sequence-specific method for qPCR
○ Fluorescent reporter probes, such as TaqMan probes, use sequence-specific RNA or DNA probes to specifically quantify products that contain the probe sequence. Therefore, they significantly increase the specificity of detection and allow quantification even in the presence of other non-specific DNA amplification.
○ They also allow for multiplexing, i.e. assaying several genes in the same reaction using separate probes that carry different coloured labels.
○ A typical probe contains a fluorescent reporter (R) at one end and a quencher of fluorescence (Q) at the other end.
○ The close proximity of R and Q prevents detection of fluorescence.
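
The Ct-based quantification described in the real-time PCR section can be sketched as follows. This is a minimal illustration, assuming perfect doubling of product every cycle (an efficiency of 2.0, which real reactions only approximate); the function names, the threshold, and the fluorescence readings are hypothetical.

```python
def cycle_threshold(fluorescence_by_cycle, threshold):
    """Return the first cycle (1-based) whose signal reaches the threshold."""
    for cycle, signal in enumerate(fluorescence_by_cycle, start=1):
        if signal >= threshold:
            return cycle
    return None  # threshold never reached


def relative_amount(ct_sample, ct_reference, efficiency=2.0):
    """Starting template in the sample relative to the reference.

    Assumes the amount of product multiplies by `efficiency` each cycle,
    so reaching the threshold one cycle earlier means `efficiency` times
    more starting template.
    """
    return efficiency ** (ct_reference - ct_sample)


# Made-up readings that double every cycle during the exponential phase.
readings = [1, 2, 4, 8, 16, 32]
ct = cycle_threshold(readings, threshold=10)
# A sample crossing the threshold 3 cycles before the reference.
fold = relative_amount(ct_sample=20, ct_reference=23)
```

A lower Ct therefore means more starting template; for absolute numbers, Ct values would instead be read off a standard curve, as noted above.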
DNA microarrays ○ Another DNA hybridization-based technique Useful for identifying and quantifying unknown mixtures of DNA in a sample. ○ A microarray is a collection of DNA oligonucleotides, each corresponding to a unique DNA sequence, anchored onto a solid surface (e.g. glass). ○ Thousands of spots can be arrayed in precise order on a microarray. ○ Unknown DNA sequences can then be hybridized onto DNA microarrays, and their identities determined based on the spots they specifically bind to. ○ Gene expression microarrays can identify and determine the relative amounts of cDNAs generated from mRNAs harvested from different cells. ○ Key principle: DNA complementarity and hybridization to identify DNAs/cDNAs in a mixture. Gene expression microarray ○ Microarray analysis is based on DNA complementarity and hybridization ○ A microarray can assay the expression of many genes all at the same time ○ cDNA microarray analysis can also be used to compare the expression of genes between samples harvested from different cells ○ Determine the ratio of fluorescence of the two samples to determine relative expression RNA sequencing ○ Takes advantage of the rapidly advancing “next generation” sequencing technologies ○ Used for measuring RNA levels (gene expression analysis) – new technology replacing microarrays ○ E.g. useful for identifying the transcriptome of different cells/tissues Genome organization Genome size comparison ○ Prokaryotic genomes are small: the E. coli genome is only 4639 kb (~ 4.6 Mb). ○ Eukaryotic genomes are much larger and highly variable in size: ranging from 10 Mb to 100,000 Mb! ○ The number of genes in a eukaryotic genome also doesn’t correlate with the complexity of the organism, nor with the genome size.
○ C-value paradox (or C-value enigma) → the amount of haploid DNA in an organism does not correlate with evolutionary complexity ○ Moreover, gene density (number of genes per unit length of DNA) drops for more complex organisms such as humans, and is more variable in eukaryotes ○ Prokaryotes and single-cell organisms pack more genes per Mb of DNA. Comparing genomes by Cot analysis ○ As early as the 1960s, researchers used Cot analyses to compare genomes of different organisms. ○ Cot analysis is based on measuring the kinetics of DNA renaturation after heat denaturation – e.g., how long it takes to re-anneal entire genomes ○ Cot value = DNA concentration (Co, moles per liter) x renaturation time (t, in seconds) x a buffer factor based on cation (salt) concentration. ○ The rate at which a particular sequence will reassociate is proportional to the number of times it is found in the genome. ○ The overall rate of DNA re-association therefore depends on the size of the genome and on the amount of repeated sequences in it: larger genomes take longer to re-anneal, whereas repeated sequences re-anneal faster ○ Cot curves for eukaryotic genomes (e.g. human genome) are not simple sigmoidal shapes. ○ They can be separated into 3 main sections: 1. Highly repetitive DNA (simple repeats): 10^5 to 10^6 copies per genome. Highly repetitive DNA is the first to re-anneal because of its high abundance and low sequence complexity (often simple tandem repeats, e.g. [AAAAT]n). Highly repetitive DNA is often found around centromeres and at the ends of chromosomes (telomeres). These regions are also often structurally condensed in the form of heterochromatin (as opposed to euchromatin, which is less condensed chromatin). Highly repetitive DNA is also known as satellite DNA, named for the satellite bands it forms on CsCl gradients. Highly repetitive DNA (complex repeats): Transposable elements (TE) are also highly repeated sequences in the human genome.
They are also sometimes classified as middle repetitive DNA, probably because they are not simple tandem repeats like the satellite DNAs (therefore take longer to re-anneal?). They are interspersed throughout the genome and amplify via an RNA intermediate (hence also referred to as retrotransposons or retroposons). Most abundant TEs in the human genome = LINEs (Long INterspersed Nuclear Elements) and SINEs (Short INterspersed Nuclear Elements) → these are retroposons since they lack retroviral LTRs. The Alu element is the single most abundant TE in the human genome (estimated to be > 1 x 10^6 copies per genome, comprising ~ 10% of the human genome). It belongs to the SINE family, and each element is about 280 bp long, has a dimeric structure, and contains RNA pol III promoter sequences. 2. Middle repetitive DNA: 10s to 1000s of copies per genome. Examples of middle repetitive DNA include gene families that encode highly abundant RNAs such as tRNA and rRNA. The 18S, 5.8S, and 28S rRNAs are produced by post-transcriptional processing of a 45S precursor transcript expressed from clusters of repeated genes. These genes, collectively known as rDNA, are clustered as tandem arrays present on the short arms of 5 chromosomes and form the nucleolar organizing regions. 3. Single copy genes: unique DNA sequences or up to 10 copies per genome. “Single-copy” sequences are dispersed throughout the euchromatin of the genome. This category also includes some small gene families, such as globin genes, that may have multiple related but non-identical family members. This category encompasses ~ 50% of the human genome; however, only ~ 1.5% of the genome (closer to 1.1% by some estimates) is protein-coding sequence. The C-value paradox is not so much a “paradox” nowadays (replaced by the term “C-value enigma” [enigma defined as a puzzle] instead).
○ Typical sigmoidal shape of Cot curves: For simple organisms, the relative position of the Cot curve depends on genome size, i.e. the bigger the genome, the longer it takes for all DNA to reanneal. ○ Brief example protocol: 1. Shear the DNA to a size of about 400 bp. 2. Denature the DNA by heating to 100 °C. 3. Slowly cool and take samples at different time intervals. 4. Determine the % single-stranded DNA at each time point. ○ The shape of a "Cot" curve for a given species is a function of two factors: 1. the size or complexity of the genome; 2. the amount of repetitive DNA within the genome Recap: Quantitative PCR (qPCR) Real time PCR using Taqman probes ○ Use an additional sequence-specific probe that hybridizes to the PCR template during each round of PCR amplification/DNA synthesis ○ Specifically monitor sequence-specific products and also allow for multiplexing of PCR reactions (i.e. probe multiple PCR products in a single PCR reaction) Gene families (e.g. rRNA genes) The clustered, tandem organization of rDNA allows for coordinated transcription of these genes. Actively transcribing rRNA genes can be visualized by electron microscopy, first done in the 1960s by Oscar Miller (technique now known as Miller spreads) Chapter 16 Genome organization Factors that can account for the C-value paradox / enigma ○ 1. Large amounts of repetitive sequences (e.g. up to 50% in the human genome). ○ 2. ncRNAs – exact number of ncRNAs still unknown → estimate about 10,000 - 12,000 in the human genome? ○ 3. Many genes in more complex eukaryotes have introns (non protein-coding and spliced out during post-transcriptional processing). E.g. the human Titin gene has the largest number of exons/introns (363 exons).
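The Cot analysis described in these notes can be illustrated with a short numerical sketch. It assumes ideal second-order renaturation kinetics, where the fraction of DNA still single-stranded is 1 / (1 + Cot / Cot½). That equation and the three-component genome below are illustrative assumptions, not values from the lecture.

```python
def cot_value(c0_mol_per_l, seconds, buffer_factor=1.0):
    """Cot = DNA concentration x renaturation time x buffer (salt) factor."""
    return c0_mol_per_l * seconds * buffer_factor


def fraction_single_stranded(cot, cot_half):
    """Ideal second-order kinetics: half of the DNA is reannealed at cot_half."""
    return 1.0 / (1.0 + cot / cot_half)


def mixed_genome_fraction_ss(cot, components):
    """Sum over kinetic classes given as (genome_fraction, cot_half) pairs."""
    return sum(frac * fraction_single_stranded(cot, half)
               for frac, half in components)


if __name__ == "__main__":
    # Hypothetical eukaryotic genome with three kinetic classes.
    genome = [
        (0.10, 1e-3),  # highly repetitive: reanneals first (lowest Cot 1/2)
        (0.30, 1.0),   # middle repetitive
        (0.60, 1e3),   # single copy: reanneals last (highest Cot 1/2)
    ]
    for exponent in range(-4, 6, 2):
        cot = 10.0 ** exponent
        remaining = mixed_genome_fraction_ss(cot, genome)
        print(f"Cot = 1e{exponent:+d}: {remaining:.3f} still single-stranded")
```

Plotting the printed values against log(Cot) gives a stepped curve rather than a single sigmoid, which is the multi-section shape the notes describe for eukaryotic genomes.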
Packaging and organization of the prokaryotic genome ○ E. coli is the best studied prokaryote and the model organism of choice for prokaryotic research ○ Because E. coli has a single closed circular genome, it was often assumed that all bacteria have the same genome organization ○ In fact, bacteria can have circular, linear or multipartite genomes ○ How is the genome of E. coli packaged into the bacterial cell? ○ The E. coli genome is not just naked DNA, but is packaged into a structure called the nucleoid. ○ In addition, there are small circular DNAs (plasmids) that carry non-essential genes, such as antibiotic resistance genes, and are “free floating” inside the bacterium. Note: plasmids are NOT part of the E. coli genome. ○ First, being a closed circular genome, the E. coli genome is supercoiled, which results in more compact dimensions. ○ Second, multiple proteins have been discovered to fold and condense prokaryotic DNA. For example, E. coli DNA is wrapped around HU proteins, which are the most abundant proteins in the nucleoid. These DNA-protein complexes, together with Topoisomerase I and DNA gyrase, generate and maintain supercoiling of the genome. ○ In addition, the supercoiled DNA/HU complexes form loops that radially extend from a central protein core. Organization of genes in prokaryotic genome ○ Because of size constraints, prokaryotes are highly efficient in terms of genome organization. ○ Very little space is left between prokaryotic genes. As a result, noncoding sequences account for only ~ 12% of the prokaryotic genome on average. ○ Most prokaryotic genomes are organized into polycistronic operons, or clusters of several coding regions linked to a single promoter. ○ Discontinuous genes are virtually absent in prokaryotic genomes (i.e. no introns). ○ There are also very few repetitive sequences within the genome. Genome compaction and folding - the problem ○ The human genome has ~ 6.6 x 10^9 bp of DNA.
○ For B-form DNA, the distance between base pairs is ~ 0.34 x 10^-9 m ○ Therefore, the total length of DNA in a human cell is ~ 2 m! ○ All this has to fit into a nucleus of ~ 10 microns in diameter Packaging of the eukaryotic genome Historical figures ○ Walther Flemming coined the term chromatin to describe the substance within the cell nucleus that is readily stained by dyes (~ 1881); had the foresight to say, “Possibly chromatin is identical to nuclein, but if not, it follows that one carries the other.” ○ Albrecht Kossel coined the term “histon” to describe the proteins he found by extracting avian erythrocyte nuclei using diluted acids (1884); first to notice that histones are acid-soluble (… still the best way to extract histones) ○ Emil Heitz defined the term heterochromatin as chromosomal material that remains condensed in interphase nuclei (1928); proposed that “…euchromatin [true chromatin] is genetically active, heterochromatin is genetically passive” ○ Vincent Allfrey found that histones inhibited RNA synthesis in isolated thymus nuclei (1961); proposed that acetylation and methylation of histones may have possible roles in regulating RNA synthesis (1964) ○ Ada and Donald Olins, and Christopher Woodcock, independently observed the “beads on a string” feature of chromatin by EM (1973); each “bead” is what we now call a nucleosome ○ Roger Kornberg, in 1974, proposed that ~ 200 bp of DNA forms a complex with 4 histone pairs (based on EM images, nuclease digestion patterns, X-ray diffraction data, and biochemical purification of chromatin components). The nucleosome ○ The DNA-histone complex is called a nucleosome, and is defined as the repeating unit of chromatin. ○ Each nucleosome contains 147 bp of DNA wrapped in a left-handed manner around 2 copies each of H2A, H2B, H3 and H4. Note that the 200 bp band protected from MNase digestion also includes the linker DNA between nucleosomes.
○ Some nucleosomes at transcriptionally silenced regions of the genome contain a 5th histone called H1, which binds the linker DNA that enters and exits the core nucleosome particle. General properties of histones ○ Histones are small basic (i.e. positively charged) proteins that interact with each other to form an octameric structure called the histone octamer. ○ Histones contain many lysines and arginines, which explains the positively charged nature of these proteins ○ They are some of the most highly conserved proteins across species – so much so that antibodies that recognize human histones will often also recognize histones from other organisms (from yeast to flies to mice to humans). ○ All histones have very similar domain structures: relatively unstructured N-terminal tails and highly structured globular domains (making up the histone folds) in the middle. ○ H2A and H2B have longer C-terminal tails whereas H3 and H4 have very short C-terminal tails. General properties of core histones ○ The histone folds of core histones (made up of the 3 central α-helices of each histone) form hand-shake motifs that allow dimerization of H2A with H2B, and H3 with H4. Histone octamer assembly ○ Two H3-H4 dimers form a tetramer, which forms the central disc of the octamer. Then two H2A-H2B dimers are added to the H3-H4 tetramer (one dimer on top and the other dimer on the bottom) to form the full octamer. Nucleosome assembly ○ The histone octamer binds and wraps ~ 1.7 turns of DNA (~ 147 bp of DNA) in a left-handed manner. The addition of one H1 molecule wraps another 20 bp, resulting in 2 full turns of DNA around the octamer. ○ Note that nucleosome assembly requires the activities of histone chaperones and ATP-dependent chromatin remodeling complexes.
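The packaging problem stated earlier (~ 2 m of DNA in a ~ 10 micron nucleus) and the nucleosome numbers above can be checked with simple arithmetic. This is a rough sketch using only figures quoted in these notes; treating every 200 bp of the genome as one nucleosome is a simplifying assumption, and the function names are hypothetical.

```python
GENOME_BP = 6.6e9            # bp of DNA per human cell (from the notes)
RISE_PER_BP_M = 0.34e-9      # B-form DNA: metres per base pair
NUCLEUS_DIAMETER_M = 10e-6   # nucleus of ~10 microns
REPEAT_BP = 200              # nucleosome repeat: core DNA plus linker
CORE_BP = 147                # DNA wrapped around one histone octamer


def dna_length_m(base_pairs, rise_m=RISE_PER_BP_M):
    """Contour length of B-form DNA in metres."""
    return base_pairs * rise_m


def nucleosome_count(base_pairs, repeat_bp=REPEAT_BP):
    """Upper-bound estimate: one nucleosome per repeat length (assumption)."""
    return round(base_pairs / repeat_bp)


if __name__ == "__main__":
    length = dna_length_m(GENOME_BP)
    count = nucleosome_count(GENOME_BP)
    print(f"total DNA length: {length:.2f} m")
    print(f"length / nucleus diameter: {length / NUCLEUS_DIAMETER_M:,.0f}")
    print(f"approx. nucleosomes per cell: {count:,}")
    print(f"core histone molecules needed: {count * 8:,}")
    print(f"average linker length: {REPEAT_BP - CORE_BP} bp")
```

The ratio of DNA length to nuclear diameter is the reason several successive orders of compaction are needed.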
Recap Genome size comparison Genome = collection of all genes; proteome = collection of all proteins ~ 20,000 protein-coding genes → highly complex proteome How to pack the whole E. coli genome into a 2 by 0.5 micron cell 1. The E. coli genome is supercoiled, which results in more compact dimensions. 2. E. coli DNA is wrapped around HU (and other) proteins to form the nucleoid. These DNA-protein complexes not only compact the genome but, together with Topoisomerase I and DNA gyrase, generate and maintain supercoiling of the DNA. 3. The supercoiled DNA/HU complexes form loops that radially extend from a central protein core. Chapter 17 Packaging of eukaryotic genome First order of DNA compaction ○ Nucleosomes are linked together by linker DNA. The oligonucleosomes form the 11 nm “beads on a string” structure. ○ The wrapping of DNA around nucleosomes compacts DNA by 6 – 8 fold. Second order of DNA compaction ○ The 11 nm fiber is further coiled into a shorter and thicker fiber termed the 30 nm fiber. ○ H1 is needed to stabilize this higher order structure. ○ The 30 nm fiber compacts DNA by ~ 50 - 100 fold. ○ The 30 nm fiber is a coiled structure that has ~ 6 nucleosomes per turn. ○ In vitro studies showed that addition of H1 and increased ionic strength of the surrounding buffer both promote the formation of the 30 nm fiber. ○ There are several models of how the 30 nm fiber is folded. All models are theoretical and still under investigation. In fact, whether the 30 nm fiber exists in vivo has been much debated recently. Higher orders of DNA compaction ○ Higher order structures beyond the 30 nm fiber are poorly understood and subject to much speculation. ○ For example, it is thought that the solenoid 30 nm fibers further loop to form 300 nm domains that are anchored to a protein scaffold or nuclear matrix. ○ Evidence for the higher order structures is mostly based on EM pictures.
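To get a feel for the compaction figures above, the fold values can be applied to the ~ 2 m of DNA per cell. This is a back-of-envelope sketch: the fold ranges come from these notes, while the idea that each 200 bp nucleosome repeat occupies ~ 11 nm of fiber length is an illustrative assumption, used only to show where a 6 – 8 fold figure could come from.

```python
RISE_PER_BP_NM = 0.34   # B-form DNA: nanometres per base pair
TOTAL_DNA_M = 2.0       # approximate DNA length per human cell

# Fold-compaction ranges quoted in the notes.
COMPACTION_FOLD = {
    "11 nm fiber": (6, 8),
    "30 nm fiber": (50, 100),
}


def compacted_length_m(length_m, fold):
    """Length remaining after compacting DNA by the given fold."""
    return length_m / fold


def packing_ratio(bp_per_unit, unit_length_nm):
    """Extended DNA length divided by the fiber length it occupies."""
    return bp_per_unit * RISE_PER_BP_NM / unit_length_nm


if __name__ == "__main__":
    for level, (low, high) in COMPACTION_FOLD.items():
        longest = compacted_length_m(TOTAL_DNA_M, low)
        shortest = compacted_length_m(TOTAL_DNA_M, high)
        print(f"{level}: {shortest:.3f} to {longest:.3f} m of fiber")
    # Assumption: one 200 bp repeat (68 nm of DNA) per ~11 nm bead.
    print(f"estimated 11 nm packing ratio: {packing_ratio(200, 11.0):.1f}")
```

Even at the 30 nm level the fiber is still centimetres long, far more than a ~ 10 micron nucleus, so further looping and folding is required.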
Metaphase chromosome is the highest compacted state of chromatin ○ Chromosomes are condensed during mitosis to facilitate chromosome segregation during cell division. ○ DNA has to compact 10,000 fold to form mitotic chromosomes. Metaphase chromosomes are useful diagnostic tools for cytogeneticists ○ Cytogenetics is the study of chromosomes and their abnormalities. ○ Karyotype is the number and appearance of chromosomes isolated from eukaryotic cells. ○ Since the 1960s, various staining protocols have been developed to produce reproducible patterns of dark and light bands along the length of each chromosome. These banding patterns become the barcodes with which cytogeneticists can easily identify chromosomes and detect subtle deletions, inversions, insertions, translocations and more complex rearrangements. ○ The molecular basis of the banding patterns is unknown. Human karyotypes ○ Karyotyping is a test to examine chromosomes in a sample of cells – can help identify genetic problems as the cause of a disorder or disease. ○ Aneuploidy = aberrations in the normal chromosome number in an organism ○ Spectral karyotyping (SKY) utilizes fluorescent probes to “paint” human chromosomes → allows easier identification of chromosomes and easier detection of abnormalities such as translocations. Technique: Fluorescence In Situ Hybridization (FISH) ○ The fluorescent labeling of SKY is based on the FISH technique. ○ “In situ” means in their natural positions within a chromosome. ○ FISH can be applied to visualize specific genes on chromosomes, or, in the case of SKY, probes can be generated to label entire chromosomes. ○ How to generate fluorescent probes for SKY? FACS = Fluorescence activated cell sorting Chromatin states in interphase cells ○ In interphase cells, genomic chromatin is mostly at the 11 nm or 30 nm compaction states.
○ Chromatin is also classified as euchromatin (less condensed chromatin) or heterochromatin (condensed chromatin). Properties of euchromatin ○ Active genes are located in euchromatin. ○ Active genes are 3 – 10 times more sensitive to nucleases such as DNase I. ○ Classic experiment from Weintraub’s lab in 1980: Active genes are located in euchromatin ○ Used Cot analyses to measure hybridization kinetics of radioactive probes corresponding to either globin gene (active) or ovalbumin gene (inactive) Euchromatin and heterochromatin ○ Euchromatin (also referred to as “open” chromatin) is less condensed and more accessible to nuclear factors. ○ Heterochromatin (also referred to as “closed” chromatin) is more condensed and less accessible to nuclear factors. Chromatin states in interphase cells ○ Euchromatin is more sensitive to DNase I digestion (more accessible to nucleases). ○ Heterochromatin is less accessible to nucleases, and can be further classified as: constitutive heterochromatin (cHC), facultative heterochromatin (fHC)