Molecular Biology I: Nucleic Acid Metabolism Lecture 07 PDF

MOLECULAR BIOLOGY I: NUCLEIC ACID METABOLISM
SC/BIOL 3110, 2024 S1
Lecture 07: Molecular biology techniques; Genome organization

Announcements:
1. The second midterm is scheduled for Jun 4th, one week from today. Midterm 2 will cover material from the last part of Lecture 04 to the middle of Lecture 08 (the exact cut-off will be announced on Thursday). The format will be the same as Midterm 1.
2. I will go over Midterm 1 at the beginning of Thursday's lecture. Note that the review will not be recorded, so you should attend in person.

Recap of last lecture: "Tricks" of cloning (how to increase the chance of success): Phosphatase-treat the ends of DNA fragments to prevent undesired ligations. To make compatible ends so that DNA fragments can ligate together, create blunt ends or add linkers/adaptors ➔ adding linkers/adaptors is also very useful for adding known primer sequences to DNA fragments for genome sequencing.

Recap of last lecture: DNA sequencing methods: Sanger sequencing.

Methods in Molecular Biology: Advances in DNA sequencing technology: Two modifications have aided in the automation and scaling up of the sequencing procedure:
1. Incorporation of a distinct fluorescent label for each ddNTP allows a single reaction to be run; the products are read as the strands pass an optical scanner and are hit with a laser.
2. Replacement of slab polyacrylamide gels by capillary gels that are long and very thin.
This type of gel dissipates heat much more efficiently and allows higher-voltage runs, which in turn reduces the time required to resolve the DNA strands.

Methods in Molecular Biology: Sequencing the human genome (Human Genome Project) (~10X coverage).

Recap of last lecture: Human Genome Project: a publicly funded effort involving 25 institutions, versus an effort funded and executed by a private company. Note: the human genome was sequenced using Sanger sequencing.

Methods in Molecular Biology: Milestones of genome sequencing: Sequencing the human genome is extremely important because it provides the reference genome that all subsequent genome-sequencing-based studies map back to, i.e. you compare what you sequenced to the reference genome to identify what you have sequenced.

Methods in Molecular Biology: "Next generation" DNA sequencing technology:
1. Pyrosequencing is one of two common "Next Gen" sequencing methods. It is an automated method that takes advantage of the stoichiometric release of pyrophosphate (PPi) during the dNTP incorporation step of DNA synthesis: the DNA sequence is determined by quantifying the amount of PPi released after each incorporation step. Cycle: add one dNTP at a time, measure the amount of PPi released, degrade all remaining ATP and dNTP using apyrase, and repeat. (Figure: pyrophosphate + APS is converted to ATP, which drives the light-producing luciferase reaction; this is why leftover ATP must also be degraded between cycles.)
2. Reversible terminator sequencing. Note: all 4 types of chain-terminating nucleotides are added at the same time in each round of synthesis. Each cluster = amplified clones of an original template DNA; different clusters = different DNA templates. The DNA templates are copied base by base using the four nucleotides (A, C, G, T), which are fluorescently labeled and reversibly terminated.
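The contrast between the two chemistries above (one dNTP type per flow in pyrosequencing, versus all four reversibly terminated nucleotides per cycle) can be illustrated with a toy Python sketch. It assumes an idealised signal that is exactly proportional to the number of bases added and a hypothetical dispensation order "TACG"; the function names are illustrative only and not part of any instrument software.

```python
from itertools import cycle


def pyro_flowgram(new_strand, dispensation_order="TACG", max_flows=200):
    """Toy pyrosequencing read-out.

    new_strand: bases the polymerase will add (complementary to the template).
    One dNTP type is dispensed per flow; the light signal is taken to be
    proportional to the number of bases incorporated, so a run such as "GGG"
    gives roughly 3x the signal of a single G in one flow.
    """
    flows = []
    pos = 0
    for flow_no, dntp in enumerate(cycle(dispensation_order)):
        if pos >= len(new_strand) or flow_no >= max_flows:
            break
        run = 0
        while pos < len(new_strand) and new_strand[pos] == dntp:
            run += 1
            pos += 1
        flows.append((dntp, run))  # apyrase then clears leftover dNTP/ATP
    return flows


def decode_flowgram(flows):
    """Rebuild the sequence: each flow contributes dNTP x signal height."""
    return "".join(dntp * signal for dntp, signal in flows)


def reversible_terminator_calls(new_strand, n_cycles):
    """Toy reversible-terminator read-out: exactly one base per cycle.

    All four labelled terminators are present each cycle, but the 3' block
    stops extension after one addition until label and block are removed,
    so each cycle's image reports a single base, homopolymer or not.
    """
    return [new_strand[i] for i in range(min(n_cycles, len(new_strand)))]


read = "TTAGGGC"
flows = pyro_flowgram(read)
print(flows)  # [('T', 2), ('A', 1), ('C', 0), ('G', 3), ('T', 0), ('A', 0), ('C', 1)]
print(decode_flowgram(flows))  # TTAGGGC
print(reversible_terminator_calls(read, n_cycles=len(read)))
```

Here the GGG run is read in a single pyrosequencing flow as a ~3x signal, while the terminator chemistry spends one cycle per G. In real data, estimating long homopolymer run lengths from signal height is a well-known weakness of pyrosequencing.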
After each synthesis step, the clusters are excited by a laser, which causes the last incorporated base to fluoresce. The fluorescent label and the blocking group are then removed, allowing the addition of the next base. The fluorescence signal after each incorporation step is captured by a built-in camera, producing images of the flow cell.

Methods in Molecular Biology: Polymerase Chain Reaction (PCR): Amplification of a DNA segment.
Requirements: oligonucleotide primers that flank the sequence of interest; a DNA template (a few ng); a thermostable DNA polymerase (Taq); dNTPs; Mg2+; an automated thermocycler.
A repetitive three-step process: 1. denature (melt) (94-97 °C) ➔ 2. anneal (42-55 °C) ➔ 3. extend (72 °C). (Figure: the first cycles produce "long products".)

Sequential rounds of amplification: the long strands (copied from the original template, with only one end defined by a primer) accumulate only linearly, whereas the short strands (both ends defined by the primers) are amplified exponentially.

Typical cycling steps (figure): the annealing temperature needs to be optimized.

Applications of PCR:
1. Amplify or clone desired DNA sequences ➔ e.g. cloning a specific gene out of genomic DNA.
2. Quantification of specific DNA or RNA in a mixture of DNA or RNA pools. For quantification of RNA, the RNA must first be reverse transcribed into cDNA.

PCR amplifies DNA sequences exponentially. In theory it should yield $2^n$ products, where n = number of cycles; however, in practice the reaction slows and levels off (the plateau effect), e.g. as primers and dNTPs are used up. Therefore, conventional PCR is NOT quantitative, especially at high cycle numbers when the reaction has reached the plateau.

Methods in Molecular Biology: Real-time Polymerase Chain Reaction (PCR): Real-time PCR employs methods to detect the ongoing production of amplicons in the reaction vessel, thus allowing "real time" monitoring of the different phases of the PCR reaction.
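A toy simulation can tie together the ideal $2^n$ growth, the plateau effect, and the real-time monitoring idea above, and it previews the threshold, Ct and $2^{\Delta C_t}$ quantities defined on the following slides. The logistic slow-down, the threshold value and the starting copy numbers are all hypothetical choices for illustration; the sketch also assumes fluorescence is simply proportional to copy number (as for a dsDNA-binding dye).

```python
import math


def amplification_curve(start_copies, n_cycles=40, efficiency=1.0, max_copies=1e12):
    """Copies of product after each cycle (index 0 = before cycling).

    Toy model (an assumption, not a mechanistic one): the per-cycle gain
    shrinks as product approaches max_copies, standing in for reagent
    depletion. Early cycles therefore grow ~(1 + efficiency)**n, i.e. 2**n
    when efficiency = 1, and late cycles flatten into the plateau.
    """
    copies = [float(start_copies)]
    for _ in range(n_cycles):
        n = copies[-1]
        copies.append(n + n * efficiency * (1.0 - n / max_copies))
    return copies


def cycle_threshold(copies, threshold):
    """Fractional cycle (Ct) at which the curve first reaches threshold.

    Interpolates linearly in log(copies) between the two bracketing cycles;
    returns None if the threshold is never reached.
    """
    if copies[0] >= threshold:
        return 0.0
    for cycle_no in range(1, len(copies)):
        lo, hi = copies[cycle_no - 1], copies[cycle_no]
        if lo < threshold <= hi:
            frac = (math.log(threshold) - math.log(lo)) / (math.log(hi) - math.log(lo))
            return cycle_no - 1 + frac
    return None


def fold_difference(ct_a, ct_b, efficiency=1.0):
    """Starting amount of sample B relative to A: (1 + E)**(Ct_A - Ct_B).

    With E = 1 (perfect doubling) this is the 2**dCt rule.
    """
    return (1.0 + efficiency) ** (ct_a - ct_b)


# Two hypothetical samples; B starts with 8x more template than A.
curve_a = amplification_curve(1e3)
curve_b = amplification_curve(8e3)
threshold = 1e9  # set well inside the exponential phase of both curves
ct_a = cycle_threshold(curve_a, threshold)
ct_b = cycle_threshold(curve_b, threshold)
print(f"Ct A = {ct_a:.2f}, Ct B = {ct_b:.2f}, dCt = {ct_a - ct_b:.2f}")
print(f"fold difference = {fold_difference(ct_a, ct_b):.1f}")
print(f"final copies A = {curve_a[-1]:.3g} (plateau at max_copies)")
```

Because B starts with 8x more template, it crosses the threshold about 3 cycles earlier, and $2^3 = 8$ recovers the ratio. In this idealised model the two curves are essentially shifted copies of each other; real reactions vary in efficiency and plateau height, which is one reason the threshold is placed in the exponential phase.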
Indirect detection of PCR products is often based on fluorescent dsDNA-binding dyes, such as SYBR Green. SYBR Green emits little to no fluorescence when it is free in solution, but fluoresces strongly when it binds dsDNA. Therefore, the overall fluorescent signal from a reaction is proportional to the amount of dsDNA present and increases as the target is amplified. Analysis of the amplification curves allows samples to be quantified via a standard curve, or used to calculate relative expression levels between samples.

Real-time PCR focuses on the exponential phase because it provides the most precise and accurate data for quantitation. Within the exponential phase, the real-time PCR instrument calculates two values:
1. The threshold line: the level of detection at which a reaction reaches a fluorescence intensity above background.
2. The cycle threshold, Ct: the PCR cycle at which the sample reaches this level.
Ct values are inversely related to the amount of target nucleic acid in the sample: the more starting template, the fewer cycles are needed to cross the threshold (in the exponential phase, Ct decreases linearly with the log of the starting amount).
The difference in Ct between two samples (ΔCt) can be used to calculate the molar ratio (relative amounts) of the DNA template in the original samples: fold difference = $2^{\Delta C_t}$, which assumes that the product doubles every cycle.

Methods in Molecular Biology: Sequence-specific method for qPCR: Fluorescent reporter probes, such as TaqMan probes, use sequence-specific RNA or DNA probes to specifically quantify products that contain the probe sequence.
Therefore, they significantly increase the specificity of detection and allow quantification even in the presence of other, non-specific DNA amplification. They also allow multiplexing, i.e. assaying several genes in the same reaction using separate probes that carry differently coloured labels. A typical probe carries a fluorescent reporter (R) at one end and a fluorescence quencher (Q) at the other end. The close proximity of R and Q prevents detection of fluorescence. However, the probe is broken down as Taq polymerase, which has 5' to 3' exonuclease activity, passes through it during extension; the freed R is no longer quenched, so its fluorescence can be detected. Fluorescence is proportional to the amount of R released, which is proportional to the amount of amplified product.

Methods in Molecular Biology: DNA microarrays: Another DNA hybridization-based technique, useful for identifying and quantifying unknown mixtures of DNA in a sample. A microarray is a collection of DNA oligonucleotides, each corresponding to a unique DNA sequence, anchored onto a solid surface (e.g. glass). Thousands of spots can be arrayed in precise order on a microarray. Unknown DNA sequences can then be hybridized onto the microarray, and their identities determined from the spots they specifically bind to. Gene expression microarrays can identify, and determine the relative amounts of, cDNAs generated from mRNAs harvested from different cells.

Analogy: a lecture room with pre-assigned seats. Each seat is programmed based on the class list, and only the assigned student can sit in the designated seat. Also, each entering student is given a fluorescent light that can be detected by CCTV ➔ by tracking the fluorescent seating pattern, one can identify the attendees (i.e. who is in class for a given lecture).
Technical challenge: how to restrict seating so that only the correct student can sit in the assigned seat? ➔ For DNA, apply stringent hybridization conditions.

Key principle: DNA complementarity and hybridization are used to identify DNAs/cDNAs in a mixture. Stringency depends on salt concentration and temperature. * Want destabilizing (high-stringency) conditions, e.g. low salt and high temperature, so that only DNA with the most base pairing remains bound.

Methods in Molecular Biology: Gene expression microarray: Microarray analysis is based on DNA complementarity and hybridization, and can assay the expression of many genes at the same time. Oligo-dT is typically used to specifically select/purify mRNAs (via their poly(A) tails) from total RNA.
cDNA microarray analysis can also be used to compare differential expression of genes in RNA harvested from different cells: the ratio of fluorescence between the two samples gives the relative expression. Spot legend (colours not recoverable from the extracted text): expressed in control cells only; expressed in non-control cells only; expressed in both cell types; expressed in neither cell type.

Methods in Molecular Biology: RNA sequencing (RNA-seq): Takes advantage of the rapidly advancing "next generation" sequencing technologies. Used for measuring RNA levels (gene expression analysis); a newer technology that is replacing microarrays. (Figure: typical, simplified RNA-seq workflow.)

Genome organization in prokaryotes and eukaryotes

Genome organization: Genome size comparison: Prokaryotic genomes are small: the E. coli genome is only 4639 kb (~4.6 Mb). Eukaryotic genomes are much larger and highly variable in size, ranging from 10 Mb to 100,000 Mb!
C-value = amount of DNA per haploid genome. The total amount of DNA in a (diploid) human cell = $6.6 \times 10^9$ bp; however, the C-value for humans = $3.3 \times 10^9$ bp. (Figure: genome size ranges for eukaryotes and prokaryotes; 1 Mb = $1 \times 10^6$ bp.)
The human genome has a C-value of ~$3.3 \times 10^9$ bp, whereas the genomes of some flowering plants reach ~$10^{11}$ bp!
C-value paradox (or C-value enigma) ➔ the amount of haploid DNA in an organism does not correlate with its evolutionary complexity.

Genome organization: Genome size comparison: The number of genes in a eukaryotic genome also does not correlate with the complexity of the organism, nor with genome size. Prokaryotes and single-celled organisms pack more genes per Mb of DNA.

Organism                  Genome size (Mb)    ~ # of genes per Mb
Mycoplasma genitalium     ~ 0.6               862
(name not recovered)      ~ 12.5              502
(name not recovered)      ~ 115               174
(name not recovered)      ~ 100               191
(name not recovered)      ~ 1.8               968
(name not recovered)      ~ 122               115
(name not recovered)      ~ 2.2               981
Human                     ~ 3300              6.4

What else, besides genes, makes up the human genome?
Note: numbers are approximate, since gene counts and genome sizes are still constantly being updated.

Genome organization: Comparing genomes by Cot analysis: As early as the 1960s, researchers used Cot analyses to compare the genomes of different organisms. Cot analysis is based on measuring the kinetics of DNA renaturation after heat denaturation, i.e. how long it takes to re-anneal entire genomes.
Brief example protocol:
1. Shear the DNA to a size of about 400 bp.
2. Denature the DNA by heating to 100 °C.
3. Slowly cool and take samples at different time intervals.
4. Determine the % single-stranded DNA at each time point.
The shape of a "Cot" curve for a given species is a function of two factors: 1. the size or complexity of the genome; and 2. the amount of repetitive DNA within the genome.

Cot value = DNA concentration ($C_0$, moles of nucleotide per liter) x renaturation time (t, in seconds) x a buffer factor based on cation concentration, i.e. $C_0 t = C_0 \times t \times (\text{buffer factor})$. The rate at which a particular sequence reassociates is proportional to the number of times it is found in the genome.
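The Cot definition above can be made concrete with an idealised model. Under the textbook assumption of ideal second-order kinetics, the single-stranded fraction of one kinetic component falls as $1/(1 + C_0 t / C_0 t_{1/2})$, and a eukaryotic curve can be approximated as a sum of such components. The component fractions and $C_0 t_{1/2}$ values below are hypothetical, chosen only to show the shapes.

```python
def cot_value(conc_mol_nt_per_l, time_s, buffer_factor=1.0):
    """Cot (mol nucleotide x s / L) = C0 x t x cation buffer factor."""
    return conc_mol_nt_per_l * time_s * buffer_factor


def fraction_single_stranded(cot, components):
    """Idealised second-order renaturation, summed over kinetic components.

    components: (fraction_of_genome, cot_half) pairs, where cot_half is the
    Cot at which that component is half re-annealed. Each component is
    assumed to follow fraction / (1 + Cot / cot_half).
    """
    return sum(f / (1.0 + cot / cot_half) for f, cot_half in components)


# Hypothetical repeat-free genomes: 10x more sequence -> 10x larger Cot1/2
small_genome = [(1.0, 1.0)]
large_genome = [(1.0, 10.0)]

# Hypothetical eukaryote: highly repetitive, middle repetitive, single copy
eukaryote = [(0.10, 1e-3), (0.40, 1.0), (0.50, 1e3)]

print(cot_value(1e-4, 1e4))  # 1e-4 mol/L held for 1e4 s, buffer factor 1
for exponent in range(-4, 6):
    cot = 10.0 ** exponent
    print(f"Cot=1e{exponent:+d}  small={fraction_single_stranded(cot, small_genome):.2f}  "
          f"large={fraction_single_stranded(cot, large_genome):.2f}  "
          f"eukaryote={fraction_single_stranded(cot, eukaryote):.2f}")
```

On a log Cot axis each single component traces a sigmoidal curve, and the 10x larger genome is shifted 10x to the right; the hypothetical eukaryote instead shows three steps, with its highly repetitive fraction re-annealing first.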
Typical sigmoidal shape of Cot curves (plotted against log Cot): for simple organisms, the relative position of the Cot curve is proportional to genome size, i.e. the bigger the genome, the longer it takes for all of the DNA to re-anneal.

Genome organization: Comparing genomes by Cot analysis: Cot curves for eukaryotic genomes (e.g. the human genome) are not simple sigmoidal shapes. They can be separated into 3 main sections:
Highly repetitive DNA: $10^5$ to $10^6$ copies per genome.
Middle repetitive DNA: 10s to 1000s of copies per genome.
Single-copy genes: unique DNA sequences, or up to 10 copies per genome.
The estimated relative distribution of these categories within different genomes is constantly being revised. Current estimates suggest that close to 50% of the human genome is made up of repetitive sequences (highly + middle repetitive).

Genome organization: Highly repetitive DNA (simple repeats): Highly repetitive DNA sequences are the first to re-anneal because of their high abundance and low sequence complexity (often simple tandem repeats, e.g. [AAAAT]n). Highly repetitive DNA is often found around centromeres and at the ends of chromosomes (telomeres). These regions are also often structurally condensed in the form of heterochromatin (as opposed to euchromatin, which is less condensed chromatin). (Figure: cartoon depiction of a typical metaphase chromosome.)

Repetitive DNA is found in satellite DNA, named for its banding pattern on CsCl gradients: total genomic DNA separated on a CsCl gradient bands according to its buoyant density, and satellite DNA forms a band distinct from bulk genomic DNA because its buoyant density differs (e.g. it is more AT-rich).
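Why an AT-rich satellite bands apart from bulk DNA can be estimated with the standard empirical relation between GC content and CsCl buoyant density, $\rho = 1.660 + 0.098 \times (\text{GC fraction})$ g/cm³ (Schildkraut, Marmur and Doty, 1962). The repeat uses the [AAAAT]n example above; the bulk GC fraction is an assumed, illustrative value.

```python
def gc_fraction(seq):
    """Fraction of G + C in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def buoyant_density(gc):
    """Empirical CsCl buoyant density of double-stranded DNA, in g/cm^3."""
    return 1.660 + 0.098 * gc


satellite = "AAAAT" * 40  # the [AAAAT]n example: a simple, AT-rich repeat
bulk_gc = 0.42            # assumption: illustrative GC fraction for bulk DNA

print(f"satellite: {buoyant_density(gc_fraction(satellite)):.3f} g/cm^3")  # 1.660
print(f"bulk DNA:  {buoyant_density(bulk_gc):.3f} g/cm^3")                 # 1.701
```

The lighter, AT-rich repeat therefore sits above the main band in the gradient, which is the "satellite" band.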
Satellite DNA comes in 3 general types:
1. Satellite (or alpha-satellite) DNA: long arrays (up to hundreds of kb) of tandem repeats (up to 100 bp).
2. Mini-satellite DNA: ~1 to 5 kb of repeating units of 15-60 bp.
3. Micro-satellite DNA: small arrays of simple tandem repeats (2-10 bp).

Genome organization: Highly repetitive DNA (complex repeats): Transposable elements (TEs) are also highly repeated sequences in the human genome. They are sometimes classified as middle repetitive DNA, probably because they are not simple tandem repeats like the satellite DNAs (and therefore take longer to re-anneal?). They are interspersed throughout the genome, and most amplify via an RNA intermediate (these are referred to as retrotransposons or retroposons).
The most abundant TEs in the human genome are LINEs (Long INterspersed Nuclear Elements) and SINEs (Short INterspersed Nuclear Elements) ➔ classified as retroposons because they lack retroviral LTRs.
The Alu element is the single most abundant TE in the human genome (estimated at > $1 \times 10^6$ copies per genome, comprising ~10% of the human genome). It belongs to the SINE family; each element is about 280 bp long, has a dimeric structure, and contains RNA pol III promoter sequences.

Estimate of the amount of repetitive DNA in the human genome (circa 2012): it has been estimated that ~50% of the human genome is made up of repeats. (Figure: relative amounts of repetitive DNA for each human chromosome; Cvg = percentage of the genome. Nature Reviews Genetics, 2012, 13: 36-46.)

Genome organization: Middle repetitive DNA: Examples of middle repetitive DNA include gene families that encode highly abundant RNAs such as tRNA and rRNA. The 18S, 5.8S, and 28S rRNAs are produced by post-transcriptional processing of a 45S precursor transcript expressed from clusters of repeated genes. These genes, collectively known as rDNA, are clustered as tandem arrays on the short arms of 5 chromosomes and form the nucleolar organizing regions.
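Micro-satellites (simple tandem repeats of 2-10 bp units, as listed above) are regular enough to find with a regular expression. This is a minimal sketch using Python's standard `re` module; the thresholds (at least 3 copies, units of 2-10 bp) are assumptions for illustration, and real repeat finders typically also tolerate mismatches in imperfect repeats.

```python
import re


def find_tandem_repeats(seq, min_unit=2, max_unit=10, min_copies=3):
    """Return (start, unit, copies) for simple tandem repeats in seq.

    The lazy quantifier tries the shortest unit first, so (CA)x6 is reported
    as unit "CA" rather than "CACA" x 3. Homopolymer runs are skipped because
    their true unit is a single base.
    """
    pattern = re.compile(
        r"(([ACGT]{%d,%d}?)\2{%d,})" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(2)
        if len(set(unit)) == 1:
            continue
        hits.append((match.start(), unit, len(match.group(1)) // len(unit)))
    return hits


print(find_tandem_repeats("GGATATATATCCGTCAGTCAGTCAGTTT"))
# [(2, 'AT', 4), (12, 'GTCA', 3)]
```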
Genome organization: Single-copy sequences: "Single-copy" sequences are dispersed throughout the euchromatin of the genome. This category also includes some small gene families, such as the globin genes, which may have multiple related but non-identical family members. Each adult hemoglobin molecule is made of 2 α-chains and 2 β-chains. The α- and β-globin gene families cluster on human chromosomes 16 and 11, respectively. Different combinations of α- and β-globin genes are expressed at different times during development.

This category encompasses ~50-60% of the human genome; however, only ~1.5% of the genome (closer to 1.1% by some estimates) is protein-coding sequence. What else is in the non-protein-coding parts of the genome? ➔ a new paradox? The C-value paradox is not so much a "paradox" nowadays; the term "C-value enigma" (enigma = a puzzle) is used instead.

Genome organization: Factors that can account for the C-value paradox/enigma:
1. Large amounts of repetitive sequences (e.g. up to 50% of the human genome).
2. ncRNAs: the exact number of ncRNAs is still unknown ➔ estimated at about 10,000-12,000 in the human genome?
3. Many genes in more complex eukaryotes have introns (non-protein-coding sequences that are spliced out during post-transcriptional processing). E.g. the human Titin gene has the largest number of exons (363).
Note: alternative splicing allows a much greater number of proteins to be encoded by a relatively small number of genes ➔ it is estimated that as many as 500,000 distinct protein products are encoded by the ~20,000 genes in the human genome. So perhaps genome complexity should be measured not by the number of genes, but by the proteome?

Genome organization: Comparison of prokaryotes and eukaryotes (table; annotation: "true for E. coli, but not for all prokaryotes").

Genome organization: Packaging and organization of the prokaryotic genome: E. coli is the best-studied prokaryote and the model organism of choice for prokaryotic research; many different strains of E. coli have already been sequenced. Because E. coli has a single closed circular genome, it was often assumed that all bacteria have the same genome organization. However, bacteria can have circular, linear, or multipartite genomes (figure example: the causative agent of Lyme disease has a multipartite genome).

How is the genome of E. coli packaged into the bacterial cell? The E. coli genome is not just naked DNA, but is packaged into a structure called the nucleoid. In addition, small circular DNAs (plasmids) that carry non-essential genes, such as antibiotic resistance genes, are "free floating" inside the bacterium. Note: plasmids are NOT part of the E. coli genome.

Just based on physical dimensions, the fully extended E. coli genome is roughly three orders of magnitude longer than the cell itself, so how is the genome stuffed into such a small space? (Figure: EM picture of a single E. coli genome.)

First, being a closed circular genome, the E. coli genome is supercoiled, which results in more compact dimensions. Second, multiple proteins have been discovered that fold and condense prokaryotic DNA. For example, E. coli DNA is wrapped around HU proteins, which are the most abundant proteins in the nucleoid. These DNA-protein complexes, together with topoisomerase I and DNA gyrase, generate and maintain supercoiling of the genome. In addition, the supercoiled DNA/HU complexes form loops that extend radially from a central protein core. (Figure: artist's vision of the E. coli genome.)

The packaging of the E. coli genome in this way also affects transcription of the genes.
During transcription, small regions of the chromosome can be seen projecting from the nucleoid into the cytoplasm, where they unwind and associate with ribosomes. Because bacterial cells have no nuclear membrane separating the genome from the cytoplasm, transcription and translation are directly coupled in bacteria such as E. coli. When transcription is inhibited, the chromosomal projections retreat back into the nucleoid, suggesting that the act of transcription affects the packaging/unfolding of the bacterial genome.
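The earlier point that the E. coli genome is far longer than the cell can be checked with back-of-the-envelope arithmetic: multiply the genome size by the ~0.34 nm rise per base pair of B-form DNA. The ~2 µm cell length is an assumed typical value, not a figure from the slides.

```python
RISE_NM_PER_BP = 0.34  # rise per base pair of B-form DNA, in nm


def contour_length_mm(genome_bp, rise_nm=RISE_NM_PER_BP):
    """Length of the DNA if fully stretched out, in mm (1 mm = 1e6 nm)."""
    return genome_bp * rise_nm / 1e6


ecoli_bp = 4_639_000   # ~4639 kb, from the genome-size slide
cell_length_um = 2.0   # assumption: a typical E. coli cell is ~2 um long

length_mm = contour_length_mm(ecoli_bp)
ratio = length_mm * 1000 / cell_length_um  # mm -> um, then compare
print(f"{length_mm:.2f} mm of DNA, ~{ratio:.0f}x the cell length")
# -> 1.58 mm of DNA, ~789x the cell length
```

A ratio of several hundred to about a thousand (depending on the cell length assumed) is why supercoiling and nucleoid proteins are needed to fit the chromosome inside the cell.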
