PCR Outline PDF
Document Details
Uploaded by GoldenParallelism
Farmingdale State College
Summary
This document provides an outline of polymerase chain reaction (PCR). It covers different types of PCR, components, steps, and analysis. The outline is suitable for understanding the fundamentals of PCR.
Full Transcript
PCR OUTLINE:
1. Know what the difference between qualitative and quantitative PCR is
- Qualitative (regular) PCR: tells you ONLY if a sequence is present.
- Quantitative (real-time) PCR: tells you if a sequence is present AND how much of it is present.
2. Know what regular/traditional PCR is and its advantages/disadvantages
Regular or traditional Polymerase Chain Reaction (PCR) is a molecular biology technique used to amplify specific DNA sequences. It involves three main steps:
1. Denaturation: the double-stranded DNA is heated to 95°C to separate it into single strands.
2. Annealing: the reaction is cooled to ~55°C so that short DNA primers bind to the target sequence.
3. Extension: the reaction is heated to 72°C and a heat-stable DNA polymerase (like Taq polymerase) synthesizes a new DNA strand by extending the primers.
This cycle is repeated multiple times to exponentially amplify the DNA. Traditional PCR requires gel electrophoresis for analyzing the amplified product.
Regular PCR advantages:
- Speed and ease of use: can be accomplished in a few hours
- Sensitivity: the starting amount of DNA can be very small
- Robustness: amplification is possible from degraded DNA or embedded, fixed samples
Regular PCR disadvantages:
- Often requires some prior sequence knowledge
- Short size of products: amplification of more than a few kb is possible, but not easy to do
- Infidelity of replication: some polymerases have no proofreading
- Sensitivity: sometimes too sensitive (trace contaminating DNA can also be amplified)
a. Know if regular/traditional PCR is quantitative or qualitative
Regular/traditional PCR is qualitative.
3. PCR Reaction
a. Know the components needed for a PCR reaction and the role of each component
1. Template DNA: serves as the blueprint for the amplification process. It provides the specific DNA sequence that needs to be copied.
2. Primers: provide the free 3'-OH that DNA polymerase needs to build from, and dictate where in the genome to amplify.
3.
Taq polymerase: a heat-stable DNA polymerase that carries out the extension step of PCR.
- Synthesizes new DNA strands: Taq polymerase adds nucleotides (dNTPs) to the 3' end of primers, extending them to form strands complementary to the template DNA.
- Heat resistance: it remains functional at the high temperature (around 72°C) used during the extension phase and survives the denaturation step (94-98°C), which would denature most other enzymes.
- Has a hand shape like other DNA polymerases; requires a divalent cation (Mg2+) for function; needs a primer to initiate synthesis from.
4. Nucleotides: dNTPs (A, T, G, C)
b. Know the steps of a PCR reaction and the temperatures needed for each step
1. Denature: heat to 95°C
2. Anneal: cool to ~55°C
3. Extend: heat to 72°C (for Taq)
c. Know why Taq Polymerase is used for PCR
Taq polymerase is heat-stable: it works at the extension temperature (around 72°C) and survives the repeated denaturation steps (94-98°C) that would denature most other enzymes, so fresh polymerase does not have to be added every cycle. Like other DNA polymerases, it adds dNTPs to the 3' end of a primer and requires Mg2+.
d. Know why PCR amplification is exponential
Amplification is exponential because each product made can be used as a template in subsequent cycles.
e. Know why you see a decrease in PCR products after 30 cycles
After about 30 cycles the amount of new product made per cycle decreases because:
- primer concentrations drop as primers are incorporated into PCR products
- dNTP concentrations drop as dNTPs are incorporated into PCR products
- Taq activity is reduced
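A quick way to see points d and e in numbers is a small Python sketch. It assumes an idealized per-cycle efficiency (1.0 means every template is copied, so the product doubles); the function name `copies_after` and the example values are illustrative only and not part of the outline.

```python
def copies_after(cycles, start_copies=1, efficiency=1.0):
    """Idealized PCR yield: each cycle multiplies the product by (1 + efficiency)."""
    return start_copies * (1 + efficiency) ** cycles


# Perfect doubling from a single template molecule.
for n in (1, 2, 10, 30):
    print(n, copies_after(n))

# Same 30 cycles with a lower, assumed efficiency of 80% per cycle.
print(copies_after(30, efficiency=0.8))
```

In a real reaction the efficiency is not constant: it falls in late cycles as primers and dNTPs are used up and Taq loses activity, which is why the curve flattens instead of doubling forever.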
Primer design:
- Length: usually about 20 nt for target sequences in gDNA, less if the target DNA is less complex
i. Know the formula to calculate annealing temperature
Formula to calculate the melting temperature (Tm) of a primer, in °C: Tm = 2(A+T) + 4(G+C), where A, T, G and C are the number of each base in the primer. The annealing temperature is chosen from the primers' Tm (typically a few degrees below it).
ii. Know what should be avoided when designing primers
- Base composition: tandem repeats should be avoided. Both primers should have the same Tm.
- Secondary structure: avoid sequences prone to secondary structures such as hairpins.
- 3' end: complementarity between the last two bases of the primers is to be avoided, as this will give primer dimers.
iii. Know what primer dimers are
Primer dimers are unintended by-products in a PCR reaction that occur when primers bind to each other instead of to the template DNA. This can happen due to:
1. Complementary sequences within the primers: if two primers have regions that are complementary, they can anneal to each other.
2. Amplification of the primer-primer complex: Taq polymerase may extend these annealed primers, producing a short, non-specific product.
At room temperature PCR reactions undergo nonspecific amplification:
- Primers bind to DNA non-specifically
- Primer dimers form when primers anneal to each other and amplify themselves (not sequences from your template DNA)
g. Know what hot start methods are and why hot start is needed
Hot start prevents non-specific amplification: it keeps the polymerase from working until the reaction is activated by the first heat step.
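The two primer design checks that lend themselves to arithmetic, the Tm formula and the 3'-end complementarity rule, can be sketched in Python. This is a minimal illustration, not a primer design tool: the function names and the example primers are made up, and the 2(A+T) + 4(G+C) rule (often called the Wallace rule) is only a rough estimate for short primers.

```python
def wallace_tm(primer):
    """Tm in degrees C from the 2(A+T) + 4(G+C) rule."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


COMPLEMENT = str.maketrans("ACGT", "TGCA")


def three_prime_ends_pair(primer_a, primer_b, n=2):
    """True if the last n bases of the two primers can base-pair with each other.

    Both primers are written 5' to 3'. Pairing is antiparallel, so the 3' end
    of one primer is compared with the reverse complement of the other's.
    """
    end_a = primer_a.upper()[-n:]
    end_b = primer_b.upper()[-n:]
    return end_a == end_b.translate(COMPLEMENT)[::-1]


# Hypothetical 20-nt primer with 5 of each base.
print(wallace_tm("ATGCATGCATGCATGCATGC"))  # 60

# Hypothetical pair whose 3' ends (AG and CT) are complementary: dimer risk.
print(three_prime_ends_pair("AAAAAG", "TTTTCT"))  # True
```

A fuller check would also compare the Tm of the forward and reverse primers and scan for hairpins and tandem repeats, which this sketch leaves out.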
Know how the two different hot start methods work
Binding Protein Hot Start Method:
- Single-stranded binding proteins bind to the single-stranded primers
- Primers are not able to anneal to each other or to template DNA
- The denaturation step inactivates the binding proteins, and the primers are now available for priming specific DNA
Antibody Hot Start Method:
- DNA polymerase is bound by an antibody
- Polymerase cannot function until the antibody is removed
- The first denaturation step (heating) of the PCR reaction causes the antibody to denature
- Prevents DNA polymerase from extending primers that anneal to each other at low temperatures
h. Know what feature Pfu and Vent polymerases have that Taq doesn't have
- Taq is the most common polymerase but has no proofreading
- Pfu (Pyrococcus furiosus) and Vent (Thermococcus litoralis) have 3' to 5' exonuclease proofreading activity (5 to 15x more faithful than Taq)
- The trade-off is slower amplification speed
I. Know the difference between genomic PCR and Reverse Transcriptase-PCR (RT-PCR)
Genomic PCR vs. Reverse Transcriptase PCR (RT-PCR)
Target material:
- Genomic PCR: amplifies genomic DNA directly.
- RT-PCR: amplifies RNA, which is first converted into complementary DNA (cDNA).
Purpose:
- Genomic PCR: used to study DNA sequences, genetic mutations, or DNA-based pathogens.
- RT-PCR: commonly used to study gene expression by detecting and quantifying mRNA.
First step:
- Genomic PCR: denaturation of double-stranded DNA.
- RT-PCR: RNA is reverse-transcribed into cDNA using the reverse transcriptase enzyme.
Enzyme used:
- Genomic PCR: uses Taq polymerase or other DNA polymerases.
- RT-PCR: uses reverse transcriptase for cDNA synthesis, followed by Taq polymerase.
Applications:
- Genomic PCR: genetic analysis, mutation detection, genotyping, and cloning.
- RT-PCR: gene expression analysis, viral RNA detection (e.g., in COVID-19 testing).
Template:
- Genomic PCR: requires a DNA template.
- RT-PCR: requires an RNA template.
Summary: Genomic PCR focuses on amplifying DNA sequences. RT-PCR is used to convert RNA into DNA and then amplify it, typically to study RNA-level gene expression or detect RNA viruses.
i.
Know what RT-PCR is and what cDNA is
Reverse Transcriptase PCR (RT-PCR) is a technique used to amplify RNA by first converting it into complementary DNA (cDNA) using the enzyme reverse transcriptase.
- DNA content is the same in all cells, but RNA levels change
- By looking at RNA levels you can get a better understanding of what is going on in the cell/organism
- Ex. clinical research (e.g. tumor analysis from a biopsy), understanding mRNA and proteins during developmental stages, in different tissue types, etc.
- DNA polymerase can't amplify RNA, so RNA must be converted to DNA for PCR amplification
- Reverse transcriptase makes complementary DNA (cDNA) from mRNA
- cDNA can be amplified by DNA polymerase
Complementary DNA (cDNA) is single-stranded DNA synthesized from an RNA template using the enzyme reverse transcriptase. It represents the coding sequences of a gene (exons) without introns, as it is derived from processed mRNA.
ii. Know the enzyme that makes cDNA and why cDNA needs to be made
- Reverse transcriptase makes cDNA
- cDNA needs to be made for RT-PCR because RNA cannot be directly amplified by the polymerase chain reaction (PCR)
iii. Know the types of primers that are used for cDNA synthesis and what feature of mRNAs allows for their use
- Many RNA types are in a cell (ex. rRNA, tRNA, mRNA, miRNA)
- In a typical animal cell ~80% of the RNA is rRNA; ~1-5% is mRNA
- mRNA is used to understand gene expression: only mRNA is translated into protein
- Reverse transcriptase is a DNA polymerase that uses ssRNA as template
- Only mRNAs get processed: 5' cap and poly-A tail added, introns removed
- Use a poly-T primer to initiate reverse transcription of mRNAs
  - Complementary to the poly-A tail
  - Poly-A tail is present in ALL mRNAs
- The cDNA made can be amplified by PCR using gene-specific primers
j.
Know the most common way to analyze results from traditional PCR
Agarose gel electrophoresis:
- Need to ensure your DNA amplified
- Run a small amount of the PCR sample on an agarose gel
- Make sure there is product and it is the correct size
- Make sure there is only one band on the gel (one product amplified)
4. Real-time/quantitative PCR (qPCR):
a. Know what qPCR is and how it is similar and different to traditional PCR
Quantitative Polymerase Chain Reaction (qPCR):
- A method that allows one to follow in real time the amplification of a target
- Sometimes referred to as Real-Time PCR (RT-PCR); don't confuse it with the other RT-PCR (reverse transcription PCR)
- The target can be nucleic acids (RNA or DNA); if the target molecule is RNA, reverse transcriptase is needed to convert the RNA into cDNA BEFORE performing qPCR
- A computer detects fluorescence and uses that information to determine the amount of starting sample in the original reaction
Quantitative PCR (qPCR) and traditional PCR are both methods for amplifying DNA, but they differ in their applications and how they measure the amplification process.
Similarities:
- Both amplify DNA through a series of temperature cycles: denaturation, annealing, and extension.
- Both use DNA primers and DNA polymerase to facilitate amplification.
Differences:
1. Quantification:
○ qPCR (also called real-time PCR) quantifies DNA in real time during amplification by measuring fluorescence signals that correlate with the amount of DNA generated.
○ Traditional PCR only detects DNA after the amplification process, typically using gel electrophoresis to visualize the product.
2. Detection Method:
○ qPCR uses fluorescent dyes or probes to monitor the amplification process at each cycle, allowing the measurement of the initial amount of template DNA.
○ Traditional PCR uses end-point analysis, where the presence or absence of amplified DNA is detected after the reaction is completed.
3. Purpose:
○ qPCR is used for quantifying gene expression, viral load, or DNA amounts.
○ Traditional PCR is used for qualitative detection (presence/absence) of DNA or amplification of specific sequences.
b. Know the advantages of using qPCR
- Ability to monitor the progress of individual cycles of amplification as they occur in real time
- Ability to precisely measure the amount of amplicon at each cycle; when compared to a known dilution standard, this allows highly accurate quantification of the amount of starting material in samples
- An increased dynamic range of detection
- Amplification and detection occur in a single tube, eliminating post-PCR manipulations
c. Know what the computer detects to determine sample amount
The computer detects fluorescence emitted from excited fluorophores: fluorescent dye that is bound to amplified DNA or released during amplification.
d. Know the relationship of fluorescence to DNA amount
The fluorescence increase correlates with the increasing amount of DNA in the reaction tube.
e. Know what stage of amplification is used during qPCR for analysis of results
- The exponential phase is used for qPCR analysis, as it provides precise and accurate data for quantitation
- The exponential phase is when product doubles each cycle because reagents are fresh and abundant
i. Know what stage of amplification is used for analysis of traditional PCR results
Traditional PCR results are visualized at the plateau stage.
f. Know what baseline and threshold refer to
- Baseline: signal level during the initial PCR cycles (~cycles 3 to 15), where there is little change in fluorescent signal
- Threshold: level of signal that reflects a statistically significant increase over the calculated baseline signal
g.
Know what the Ct value is and its relationship to starting DNA concentration
- Ct: the cycle number at which the fluorescent signal of the reaction crosses the threshold; it is used to calculate the initial DNA copy number
- The Ct value is inversely related to the starting amount of target: Ct increases as the concentration of starting DNA decreases
- It takes longer for fluorescence to be detected with lower DNA starting amounts, because there are fewer target DNA molecules for the initial amplification
h. Know what RT-qPCR is and when it is used
RT-qPCR is a technique that combines reverse transcription to convert RNA into cDNA, followed by quantitative PCR to measure the amount of cDNA (and thus gene expression) in real time.
- Must create cDNA with reverse transcriptase before qPCR experiments are performed (RT-qPCR)
- Often looking to detect differential gene expression for a single gene (gene of interest, goi) or multiple genes
- To detect changes in gene expression you MUST compare your goi amplification to a reference/housekeeping gene
- Expression from the reference/housekeeping gene is constant and unchanged in ALL cell types under ALL conditions
RT-qPCR is used for:
- gene expression analysis
- RNAi validation
- microarray validation
- pathogen detection
- genetic testing
- disease research, etc.
i. Know what differential gene expression is
Differential gene expression refers to the variation in the expression levels of genes between different conditions, tissues, or time points, indicating how genes are regulated in response to specific factors or stimuli.
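To make the inverse relationship between Ct and starting amount concrete, here is a minimal Python sketch. It assumes a noise-free, baseline-subtracted signal with perfect doubling every cycle; `ideal_curve`, `signal_per_copy` and the threshold of 1.0 are hypothetical choices for illustration, and real instruments use their own baseline and threshold algorithms.

```python
def ct_value(fluorescence, threshold):
    """Fractional cycle at which the signal first crosses the threshold.

    fluorescence[i] is the baseline-subtracted reading after cycle i + 1.
    Returns None if no crossing is found.
    """
    for i in range(1, len(fluorescence)):
        lo, hi = fluorescence[i - 1], fluorescence[i]
        if lo < threshold <= hi:
            # Linear interpolation between cycle i and cycle i + 1.
            return i + (threshold - lo) / (hi - lo)
    return None


def ideal_curve(start_copies, cycles=40, signal_per_copy=1e-9):
    """Hypothetical signal: product doubles every cycle, no plateau, no noise."""
    return [start_copies * 2 ** c * signal_per_copy for c in range(1, cycles + 1)]


high_input = ct_value(ideal_curve(10_000), threshold=1.0)
low_input = ct_value(ideal_curve(1_000), threshold=1.0)
print(high_input, low_input, low_input - high_input)
```

The tube that starts with ten times less template crosses the threshold a little over three cycles later, which is the log2(10) ≈ 3.3 cycle shift expected for perfect doubling (the linear interpolation makes the printed difference only approximate).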
Know the two methods that are used for qPCR analysis
Two methods of analysis:
Standard curve method:
- Better suited if you have only a few genes to test
- Needs a dilution series of known template concentrations for each DNA and primer pair
- Can be used to determine the initial starting amount of the target template in experimental samples and to assess the reaction efficiency
- R2 measures how well the data fit the standard curve (it should be close to 1)
- The slope measures reaction efficiency: a slope of -3.32 corresponds to an efficiency of 100%, and primer efficiency needs to be near 100%
ΔΔCt method:
- Caters to large numbers of DNA samples and genes to be tested
- ΔΔCt is the difference between the ∆Ct values of the treated/experimental sample and the untreated/control sample
- Used to compare Ct values of samples to a control
- Most common use is to determine RNA levels in a cell
- Ct values of samples and control are normalized to an appropriate endogenous housekeeping (reference) gene that has constant expression levels
i. Know how the starting amount of DNA is reported for each method
- Tubes with a lower Ct value have a greater starting amount of sample
- In the standard curve method: compare the Ct values for unknown samples to the results on the standard curve to determine the amount of DNA in the starting tube
- In the ΔΔCt method: compare the Ct (cycle threshold) values of the target gene in the experimental sample and a reference (control) sample
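The standard curve arithmetic can be sketched in a few lines of Python. The dilution series and Ct readings below are made up for illustration, the function names are hypothetical, and the efficiency formula E = 10^(-1/slope) - 1 is the usual convention (it is what makes a slope of -3.32 equal 100%), not something spelled out in the outline.

```python
import math


def fit_standard_curve(quantities, cts):
    """Least-squares line Ct = slope * log10(quantity) + intercept, plus R^2."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, cts))
    ss_tot = sum((y - mean_y) ** 2 for y in cts)
    return slope, intercept, 1 - ss_res / ss_tot


def efficiency_percent(slope):
    """Amplification efficiency implied by the slope; -3.32 gives about 100%."""
    return (10 ** (-1 / slope) - 1) * 100


def quantity_from_ct(ct, slope, intercept):
    """Read an unknown sample's starting amount off the fitted line."""
    return 10 ** ((ct - intercept) / slope)


# Made-up tenfold dilution series (copies per reaction) and Ct readings.
standards = [1e6, 1e5, 1e4, 1e3, 1e2]
cts = [15.0, 18.3, 21.7, 25.0, 28.3]

slope, intercept, r2 = fit_standard_curve(standards, cts)
print(round(slope, 2), round(r2, 4), round(efficiency_percent(slope), 1))
print(quantity_from_ct(20.0, slope, intercept))  # unknown sample with Ct = 20
```

Fitting against log10 of the amount is what turns the dilution series into a straight line; an unknown is then quantified by finding where its Ct falls on that line.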
Know what a standard curve is and its role in qPCR
Standard Curve Method:
- Standard curve: a dilution series of known template concentrations
- Can be used to determine the initial starting amount of the target template in experimental samples and to assess the reaction efficiency
- Slope, y-intercept, and correlation coefficient (R2) values are used to provide information about the reaction performance
  - R2 measures how well the data fit the standard curve
  - The slope measures reaction efficiency: a slope of -3.32 corresponds to an efficiency of 100%
- Make a dilution series for each DNA sample and primer set
- The Ct values for known amounts of DNA are plotted; this should give a linear plot
- Need primer efficiency to be near 100% (calculated from the slope of the line in the standard curve) and R2 close to 1
k. Know what the ΔΔCt method is and when it is used
ΔΔCt Method:
- Relative quantification: expression of a gene of interest in one sample is compared to expression of the same gene in another sample (example: treated sample vs. untreated sample)
- The results are expressed as fold change, increase or decrease, in expression of the treated sample in relation to the untreated sample
- A reference/housekeeping gene is used as a control to normalize for experimental variability; the reference gene should be one that does NOT show expression changes between treated and untreated samples
- ∆∆Ct is the difference between the ∆Ct values of the treated/experimental sample and the untreated/control sample
- Used to compare Ct values of samples to a control
- Most common use is to determine RNA levels in a cell
- Ct values of samples and control are normalized to an appropriate endogenous housekeeping (reference) gene that has constant expression levels
- Ex. look at genes involved in viral response by comparing RNA extracted from uninfected tissue (control) to RNA isolated from tissue infected with a virus (test)
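As a worked example of the ΔΔCt arithmetic, the Python sketch below goes from four Ct values to a fold change. The Ct numbers are invented, the function names are hypothetical, and the conversion fold change = 2^(-ΔΔCt) assumes both the gene of interest and the reference gene amplify at about 100% efficiency.

```python
def delta_delta_ct(goi_treated, ref_treated, goi_control, ref_control):
    """ΔΔCt = ΔCt(treated) - ΔCt(control), where ΔCt = Ct(goi) - Ct(reference)."""
    d_treated = goi_treated - ref_treated
    d_control = goi_control - ref_control
    return d_treated - d_control


def fold_change(ddct, efficiency=1.0):
    """Relative expression, treated vs. control; 2 ** -ΔΔCt at 100% efficiency."""
    return (1 + efficiency) ** (-ddct)


# Invented Ct values: the goi crosses threshold 3 cycles earlier after treatment,
# while the reference gene does not move.
ddct = delta_delta_ct(goi_treated=22.0, ref_treated=18.0,
                      goi_control=25.0, ref_control=18.0)
print(ddct, fold_change(ddct))  # -3.0 8.0
```

A negative ΔΔCt means the gene of interest is more abundant in the treated sample; here three cycles earlier corresponds to an 8-fold increase.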
Know what a reference/housekeeping gene is and why they are necessary for detection of differential gene expression
- The expression level of a reference gene remains consistent under ALL experimental conditions and/or in different tissue types
- A reference gene normalizes possible variations during:
  - Sample prep & handling (e.g. use the same number of cells from the start)
  - RNA isolation (RNA quality and quantity)
  - Reverse transcription efficiency across samples/experiments
  - PCR reaction set up
  - PCR reaction amplification efficiencies
- Reference genes are typically housekeeping genes: ribosome subunits, actin, tubulin, ubiquitin, etc.
Reference genes are necessary for the detection of differential gene expression because they serve as a normalization control to account for variations in the experimental process that are unrelated to the gene of interest.
1. Know what happens to gene expression for reference genes when under different experimental conditions
The expression level of a reference gene (RG) remains the same under ALL experimental conditions and/or tissue types.
l. Know how each fluorescent molecule is used in qPCR
In qPCR (quantitative PCR), different fluorescent molecules or probes are used to monitor the amplification of DNA in real time. These fluorescent molecules help measure the amount of DNA generated during each cycle of amplification, allowing for the quantification of the target DNA or cDNA.
i. Know what dye-based and probe-based refers to
Dye-based detection refers to the use of a fluorescent dye that binds to double-stranded DNA (dsDNA) during amplification. As DNA is synthesized during PCR, the dye binds to the growing double-stranded DNA, and the fluorescence emitted is measured during each cycle.
Probe-based detection uses fluorescent probes that are specific to the target DNA sequence. These probes emit fluorescence upon binding to the target sequence and, in some cases, undergo a cleavage process during amplification, releasing the fluorescence.
1.
Know which dyes are dye-based and which are probe-based
- Dye-based molecules: SYBR Green (non-specific intercalating dye)
- Probe-based molecules: TaqMan and Molecular Beacons (use a sequence-specific DNA probe)
ii. SYBR Green:
1. Know how SYBR green is used as a fluorescent dye in qPCR
- Dye-based: a fluorescent dye that intercalates between DNA bases
- No sequence specificity: binds to ANY dsDNA
- Exhibits low fluorescence when unbound in solution, but starts to fluoresce brightly when associated with double-stranded DNA (dsDNA) and exposed to a suitable wavelength of light
A) DNA is denatured and SYBR Green molecules are free in the reaction mix
B) Primers anneal and SYBR Green molecules bind to any dsDNA
C) DNA polymerase extends the primers along the template and more SYBR Green molecules bind to the product formed, resulting in an exponential increase in the fluorescence level
2. Know what part of each PCR cycle fluorescence is detected
Fluorescence is detected after the elongation phase.
3. Know how it is different from TaqMan and Molecular Beacon probes
There is no DNA probe to give specificity: SYBR Green will bind to ANY dsDNA, including primer-dimers and non-specific PCR products.
4.
Know advantages and disadvantages for SYBR green
Advantages:
- simple PCR primer design
- no complex probe design
- ability to test multiple genes quickly without the need for multiple probes
- lower initial cost than probes
Disadvantages:
- Lack of specificity (no probe): the dye binds to all dsDNA, so even nonspecific products (primer dimers) contribute to the overall fluorescence and reduce the accuracy of quantification
- Can't multiplex reactions: fluorescence signals from different products can't be distinguished when different PCR primers are used in the same reaction mixture
- To examine multiple genes in real-time PCR assays using SYBR Green, it is necessary to set up parallel reaction mixtures with different PCR primer pairs in separate tubes, which can present a source of error in quantification
5. Know what melting analysis is
a. Know what it detects and why it is needed when using SYBR Green
- Need to ensure the fluorescence detected is due to specific amplification of your goi, so perform a melting curve analysis
- Non-specific PCR products (primer-dimers) will denature at different temperatures than the specific amplified goi
- The primer-dimer peak is to the left of (at a lower temperature than) the peak for the specific amplified goi product
iii. Know what FRET is and what fluorophores and quencher molecules are
- FRET: the transfer of energy between two molecules, a fluorophore and a quencher
- Fluorophore: absorbs light energy at one wavelength and re-emits light energy at another, longer wavelength
- Quencher: accepts energy from the fluorophore and then dissipates it without light emission
- When the fluorophore and quencher are together on the probe DNA, the quencher accepts the energy released by the fluorophore and no fluorescence is detected by the qPCR machine
- When the fluorophore is separated from the quencher, its emission can be detected
1. Know which approaches use FRET
Probe-based qPCR: TaqMan probes and Molecular Beacons
2.
Know what probe-based chemistry is and why it is beneficial
- TaqMan uses probe-based chemistry, which requires customized probes complementary to the target DNA sequence to be amplified; this gives greater specificity
- The probe has a fluorescent dye and a quencher next to each other
- During amplification Taq polymerase removes the probe from the DNA with its 5' to 3' exonuclease activity
- The fluorescent dye is moved away from the quencher and fluorescence is now detectable
iv. TaqMan probes
1. Know how TaqMan probes are used in qPCR
A. Primers and probe anneal to the target sequence. The fluorophore is excited by light and passes its energy to the quencher, so no fluorescence is detected
B. DNA polymerase extends the primer and encounters the probe, which is cleaved from the 5' end, releasing the fluorophore. It is no longer quenched and fluorescence can be detected
C. DNA polymerase breaks down the whole probe from the template and completes strand elongation. The machine detects fluorescence
a. Know the role of DNA polymerase (and what domain) in TaqMan
In TaqMan qPCR, the DNA polymerase (specifically Taq polymerase) synthesizes new DNA strands during amplification and uses its 5' to 3' exonuclease activity to cleave the TaqMan probe, separating the fluorescent reporter from the quencher, which enables real-time fluorescence detection. The 5' to 3' exonuclease activity that cleaves the probe resides in the N-terminal 5' nuclease domain of Taq polymerase, which is separate from the polymerase (nucleotidyl transferase) domain that synthesizes DNA.
2. Know what part of the PCR cycle fluorescence is detected
Elongation phase
3. Know advantages and disadvantages
Advantages:
- high specificity
- high signal-to-noise ratio
- multiplexing
Disadvantages:
- cost of probes is high
- experimental design is difficult
v. Molecular Beacon probes
1.
Know how Molecular Beacon probes are used in qPCR
- Also uses probe-based chemistry, so has high specificity
- The ends of the probe carry a 5' reporter (R, fluorescent dye) and a 3' quencher (Q) molecule
- Complementary stem sequences (~4 to 6 nt) at both ends of the single-stranded probe form a hairpin loop by H-bonding
- Having the reporter and quencher next to each other quenches the natural fluorescence emission of the reporter
- The loop is ssDNA complementary to the target sequence
- During the annealing phase the loop sequence can anneal to the target; R is now separated from Q and fluorescence can be detected
Through the PCR cycle:
- The beacon becomes linear during the denaturation stage
- During the annealing stage it either re-forms the hairpin loop OR binds to target DNA
- During elongation the temperature is too high and the beacon is released, so it doesn't interfere with Taq function
- Fluorescence is monitored and reported during each annealing step, when the beacon is bound to its complementary target
- Fluorescence is detected only if the beacon is bound to the target, so the fluorescence detected is proportional to the amount of target in the reaction
2. Know what part of the PCR cycle fluorescence is detected
Annealing step
3. Know advantages and disadvantages
Advantages:
- highly specific (very little background)
- can be used for multiplexing
- molecular beacons are displaced but not destroyed during amplification
Disadvantages:
- difficult to design
- the stem of the hairpin must be strong enough that the molecule will not spontaneously fold into non-hairpin conformations
- the stem of the hairpin must not be too strong, or the beacon may not properly hybridize to the target
vi.
Know what multiplexing is and the advantage of multiplexing
- Multiple targets are amplified in a single reaction tube
- Each target gene is amplified by a different primer pair
- A uniquely-labeled probe distinguishes each PCR product
- Expression levels of several genes are measured quickly
- The control sample is in the same PCR tube
Multiplexing in qPCR allows for the simultaneous detection and quantification of multiple target sequences in a single reaction, improving efficiency, saving resources, and enabling direct comparisons across targets.
m. Know what emulsion PCR is and how it is different from traditional and qPCR
- Clonal bead populations are generated in water-in-oil microreactors
- Each emulsion droplet contains ONE bead and ~ONE dsDNA molecule with adaptors on each end; it also contains enzyme, dNTPs, and primers
- Used in digital PCR and next-generation sequencing library preparation
Steps:
1. Denature: separate dsDNA into ssDNA
2. Annealing: ssDNA attaches to the bead; a primer anneals to the other adaptor sequence
3. Extension: copies the DNA
4. Repeat 30 to 60 times until the bead is covered with the same DNA
Emulsion PCR is different from traditional and qPCR in that it involves amplifying DNA within microscopically isolated droplets (or emulsions) in a water-oil mixture, which ensures that each droplet contains a single DNA molecule, leading to clonal amplification. This contrasts with traditional PCR, where all reactions occur in a single solution, and qPCR, which monitors DNA amplification in real time but typically without such isolation, making emulsion PCR ideal for applications like next-generation sequencing and high-throughput screening.
i. Know the general idea on how emulsion PCR works and its benefits
Emulsion PCR works by partitioning a DNA sample into thousands of tiny, water-in-oil droplets, where each droplet contains a single DNA molecule along with the necessary PCR reagents. This ensures that each droplet acts as an individual PCR reaction chamber, amplifying a single DNA molecule.
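The "one molecule per droplet" picture is an idealization: when template is distributed at random, droplet loading is usually modelled with a Poisson distribution. The Python sketch below shows that model under this assumption; the function names, the mean of 0.1 molecules per droplet and the droplet counts are illustrative only.

```python
import math


def occupancy(mean_per_droplet):
    """Fractions of droplets holding 0, exactly 1, and 2 or more template molecules."""
    empty = math.exp(-mean_per_droplet)
    single = mean_per_droplet * empty
    return empty, single, 1 - empty - single


def mean_from_positive_fraction(positive, total):
    """Average molecules per droplet implied by the share of positive droplets."""
    return -math.log(1 - positive / total)


# Dilute loading: most droplets are empty, but almost none hold two templates.
print(occupancy(0.1))

# Run backwards: 5,000 positive droplets out of 20,000 counted.
print(mean_from_positive_fraction(positive=5_000, total=20_000))
```

Keeping the template dilute is what makes amplification in each occupied droplet effectively clonal, and the second function is the kind of calculation digital PCR uses to turn a count of positive droplets into an absolute amount.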
The resulting amplicons from each droplet are then pooled together for further analysis, often for applications like sequencing.
Benefits:
1. Clonal Amplification: Each droplet contains a single DNA molecule, ensuring that only one copy of a DNA sequence is amplified per droplet, leading to highly specific, clonal amplification.
2. High-throughput: Allows for the parallel amplification of many different DNA molecules simultaneously, which is efficient for large-scale sequencing or screening.
3. Reduced Cross-contamination: The isolation of reactions in separate droplets minimizes the risk of cross-contamination between reactions.
4. Increased Sensitivity: The compartmentalization can improve detection of rare sequences or low-abundance targets.
BRIEF EXPLANATION: Emulsion PCR amplifies DNA in isolated droplets, each containing a single DNA molecule, ensuring clonal amplification. Its benefits include high throughput, reduced cross-contamination, and increased sensitivity for detecting rare sequences.
ii. Know what droplet digital PCR is
- Droplet Digital PCR (ddPCR) improves on qPCR sensitivity
- Template DNA is distributed randomly into ~20,000 droplets
- Each droplet undergoes PCR amplification and analysis separately
- Uses fluorescent dyes; droplets are then individually counted and scored as positive or negative for fluorescence
- Digital PCR does not rely on Ct values but uses Poisson statistics (probability) to determine the absolute template quantity
1. Know why it is better than traditional qPCR
Droplet digital PCR (ddPCR) is better than traditional qPCR because it offers higher precision and sensitivity by partitioning the sample into thousands of individual droplets, allowing for absolute quantification of DNA without the need for standard curves. This results in improved detection of low-abundance targets, reduced variability, and more accurate measurements, especially for rare mutations or low copy number samples.
DNA SEQUENCING OUTLINE:
1.
Know the three different generations of sequencing technology
The three generations of sequencing technology are:
1. First Generation (Sanger Sequencing):
○ Developed in the 1970s by Frederick Sanger, this method is based on chain termination. It involves using dideoxynucleotides (ddNTPs) to terminate DNA strand elongation at specific bases, allowing for the sequence to be determined by electrophoresis. It is highly accurate but labor-intensive and has a high cost per base. Ideal for small-scale sequencing projects like sequencing individual genes.
2. Second Generation (Next-Generation Sequencing, NGS):
○ NGS technologies, like Illumina and Roche 454, massively parallelize sequencing, allowing millions of DNA fragments to be sequenced at once. They use techniques like sequencing-by-synthesis and are faster and more cost-effective than Sanger sequencing, although they require more computational power for data analysis. Suitable for large-scale genomic projects such as whole genome sequencing and transcriptome analysis (RNA-Seq).
3. Third Generation (Long-Read Sequencing):
○ Technologies like Pacific Biosciences (PacBio) and Oxford Nanopore provide long-read sequencing, where DNA strands are read continuously rather than in fragments. These methods offer the advantage of sequencing longer stretches of DNA, which is useful for assembling genomes and detecting structural variants. They also offer faster turnaround times and potentially lower costs per base than second-generation methods. Best for projects requiring long-read sequences to assemble complex genomes, resolve structural variants, or study repetitive regions of DNA.
a. Know when each sequencing type would be used
First Generation (Sanger Sequencing):
- Use Case: Ideal for small-scale sequencing projects like sequencing individual genes or validating findings from larger sequencing studies. It's highly accurate but slower and more expensive for large genomes.
Examples: Targeted gene sequencing, validation of NGS results, and sequencing of small genomes. Second Generation (Next-Generation Sequencing, NGS): Use Case: Suitable for large-scale genomic projects such as whole genome sequencing, transcriptome analysis (RNA-Seq), and metagenomics. NGS is used when high throughput is needed, and cost-efficiency is a priority. Examples: Whole-genome sequencing, exome sequencing, large cohort studies, and microbiome analysis. Third Generation (Long-Read Sequencing): Use Case: Best for projects requiring long-read sequences to assemble complex genomes, resolve structural variants, or study repetitive regions of DNA. These technologies are useful when the need for long, uninterrupted reads outweighs cost or speed concerns. Examples: De novo genome assembly, structural variant detection, and sequencing of complex or repetitive regions (e.g., centromeres, telomeres). b. Know the major differences between each generation (read length, accuracy) First Generation (Sanger Sequencing): Read Length: Typically 500-1,000 base pairs per read. Accuracy: Very high, with error rates usually below 0.1%, making it one of the most accurate methods. Notes: Best for small-scale projects where high accuracy is critical. Second Generation (Next-Generation Sequencing, NGS): Read Length: Short reads, typically 50-300 base pairs. Accuracy: Generally high, but error rates can be higher than Sanger, especially in complex regions (around 1-2%). Notes: Offers high throughput and cost-efficiency but sacrifices some read length and accuracy compared to Sanger. Third Generation (Long-Read Sequencing): Read Length: Can produce much longer reads, ranging from several kilobases to over 100,000 base pairs (depending on the technology). Accuracy: Generally lower accuracy than Sanger and NGS, with error rates ranging from 5-15%, though improvements are ongoing. 
Notes: The long reads are beneficial for assembling genomes and resolving complex regions, despite the trade-off in accuracy. 2. First Generation – Sanger (dideoxy) and cycle sequencing Allows for sequencing of up to 1,000 nt per reaction. Can perform multiple reactions per day. Sanger sequencing is the more popular method – still used today a. Know how Sanger/dideoxy sequencing works DNA polymerase synthesizes the second DNA strand – DNA polymerase always adds new bases to the 3’ end of a primer that is base-paired to the template DNA. – DNA polymerase is modified to eliminate its editing function. Uses chain terminator nucleotides: small amounts of dideoxy nucleotides (ddNTPs), which lack the –OH group on the 3' carbon of the deoxyribose – once DNA polymerase inserts a ddNTP into the growing DNA chain, nothing else can be added to its 3' end. 4 separate reactions: DNA polymerase starts creating the second strand beginning at the primer – Primer is labeled with radioactivity. When DNA polymerase reaches a base for which some ddNTP is present, the chain will either: 1. terminate if the ddNTP is added or 2. continue if the normal dNTP is added – based on the ratio of dNTP to ddNTP in the tube. Fragments of different lengths will form (depending on when the ddNTP was added). Run each of the reactions in a separate lane of a gel b. Know what ddNTPs are and what specifically is missing from them ddNTPs are dideoxy nucleotides and lack the -OH group on the 3’ carbon of the deoxyribose i. Know what happens when a ddNTP is incorporated and how it is used for sequencing When a ddNTP is incorporated during sequencing, it terminates the DNA strand because it lacks a 3' hydroxyl group, preventing further elongation. In Sanger sequencing, ddNTPs are mixed with regular nucleotides (dNTPs); in the automated (cycle sequencing) version, each ddNTP is labeled with a different fluorescent dye. As DNA is synthesized, the incorporation of a ddNTP stops the chain at specific points.
These terminated fragments are then separated by size, and the sequence is determined by reading the positions of the ddNTPs. c. Know all components needed for a Sanger sequencing reaction and their role Needs: 1. Single stranded DNA template- serves as the blueprint for synthesizing the complementary strand. It provides the sequence information that will be read during the sequencing process. 2. A primer- to initiate DNA synthesis 3. DNA polymerase- synthesizes the complementary strand to the single-stranded DNA (ssDNA) template 4. Deoxynucleoside triphosphates (dNTPs) and dideoxynucleoside triphosphates (ddNTPs): dNTPs role- are the building blocks that DNA polymerase uses to extend the complementary strand of the ssDNA template. ddNTPs role- terminating DNA strand elongation i. Know how many tubes are needed for a single sequencing reaction 4 separate reactions, 1 tube (reaction) for each ddNTP. Each reaction has all 4 dNTPs AND a small amount of one of the ddNTPs 1. Know what is the same and different for each tube Same: all 4 dNTPs in every tube. Different: each tube has a different ddNTP d. Polyacrylamide gel Polyacrylamide gel electrophoresis: good resolution of fragments differing by a single nucleotide. Primers are radioactively labeled. See different sizes of each fragment (band) by exposing gel to film (autoradiography). Read the sequence from the bottom up i. For each DNA molecule sequenced 1. Know how the samples are loaded onto a gel Samples are loaded into each well with tracking dye 2. Know how many wells and what is in each well 4 wells; each well holds the reaction for one ddNTP, ex. Well 1 (A), Well 2 (T), etc. ii. Know how a DNA sequence is read from a gel Read from bottom up, fragments get larger as you read up iii. Know approximately the length of sequence that is read per DNA sample For Sanger sequencing- 500-1000bp e. Know why radioactivity is needed To detect the DNA fragments on the gel (by autoradiography) i.
Know what component of the sequencing reaction is radioactively labeled Primers are radioactively labeled f. Know what cycle sequencing is Sanger sequencing run with a PCR-style protocol and fluorescent ddNTPs. Advantages: don’t need a lot of template DNA Disadvantages: Sometimes DNA polymerase incorporates ddNTPs poorly i. Know the similarities and differences between the original Sanger sequencing protocol and cycle sequencing Differences from dideoxy sequencing: Uses dsDNA as starting material; uses PCR protocol for reaction steps; only 1 tube needed; doesn’t need radioactivity – uses fluorescent ddNTPs ii. Know how fluorescence is used in cycle sequencing Fluorescence – ddNTPs chemically synthesized to contain fluorescence – Each ddNTP fluoresces at a different wavelength – Automated sequencers use 4 different fluorescent dyes as tags attached to the dideoxy nucleotides and run all 4 reactions in the same lane of the gel – Run on capillary gel – requires only a tiny amount of sample to be loaded, runs much faster than slab gels, best for high throughput sequencing iii. Know the type of gel used and how many wells the sample is loaded into Capillary gel; 1 well (capillary) per sample, because all four fluorescently labeled terminations run together in the same lane iv. Know how the sequences are determined In cycle sequencing, the sequence is determined through the following steps: 1. Template Preparation: The DNA template is denatured into single strands, and a primer is added for DNA polymerase to start synthesizing the complementary strand. 2. Reaction Setup: The reaction mixture contains dNTPs (for normal extension) and small amounts of labeled ddNTPs (for chain termination). Each ddNTP is labeled with a different fluorescent dye. 3. DNA Synthesis: DNA polymerase synthesizes the complementary strand, randomly incorporating ddNTPs, which cause chain termination at specific bases. 4. Capillary Electrophoresis: The fragments are separated by size in capillaries. The fluorescence emitted by the labeled ddNTPs is detected. 5.
Sequence Determination: The fluorescence signals are used to determine the order of bases (A, T, C, G), and the final sequence is assembled based on the detected colors. g. Know what universal/internal primers are and when and why they are needed Universal primers- match vector sequence; Don’t need to know anything about sequence of DNA to be used in reaction Internal primers- are created as sequence information is generated ;Allows for sequencing large inserts without significant subcloning h. Know what the human genome project was One of the largest scientific endeavors Started in 1990 by DoE and NIH $3 Billion and 15 years Goal was to identify 25K genes and 3 billion bases Used the Sanger (cycle) sequencing method Draft assembly done in 2000, complete genome by 2003, last chromosome published in 2006 Sanger sequencing generates sequences less than 1 kb (1,000 nt) in length – the human genome has ~3,200,000,000 nt!??? – how was it done? Used Hierarchical Shotgun Sequencing – Use Genomic library (in BACs) with large cloned inserts – Fragment each BAC and subclone into another vector Use universal primers to sequence smaller cloned fragments Reassemble the sequences based on sequence overlaps to obtain the original long sequence Public Project Hierarchical shotgun approach Large segments of DNA were cloned via BACs and located along the chromosome BACs were mapped to the human genome Each BAC was fragmented randomly – fragmented and sequenced multiple times to obtain overlapping sequences construct a complete sequence Use sequences for all of the BACs to assemble a complete genome i. 
Know what hierarchical shotgun sequencing is and how it was used to sequence the human genome DNA is randomly sheared into small pieces, which are subcloned into a vector. The genomic library of subfragments is sequenced at random – use a universal primer directing sequencing from within the cloning vector. These sequence reads are then assembled into contigs, and the complete sequence of the clone is generated. Sequencing reactions are performed with a universal primer on a random selection of the clones in the shotgun library 3. Next generation sequencing a. Know what it is and the major differences from Sanger/dideoxy/cycle sequencing i. Know approximately how many different DNA sequences are read per run for next generation compared to Sanger/cycle sequencing 1st Generation: dideoxy/Sanger Single DNA sequenced at a time; 500-1,000 bp read length 2nd Generation: Short-read/Illumina 10 – 50 million DNAs sequenced at once; very low error rate (<1%); read length depends on platform used but is short, typically 50-300 bp 3rd Generation: Long-read TruSeq Synthetic Long-read; Oxford Nanopore sequencing 1 – 150 million DNAs sequenced at once with high error rate (5 – 10%); PacBio single-molecule real-time (SMRT) sequencing 400,000 – 5 million DNAs sequenced at once with lower error rate (<1%); read length depends on platform used and can range from 2,000-100,000 bp ii. Know what sequencing-by-synthesis is and which sequencing approaches use it Sequencing-by-synthesis (SBS) is a method of DNA sequencing where DNA synthesis is monitored in real-time to determine the sequence of nucleotides. During SBS, a DNA polymerase enzyme synthesizes a complementary strand using the template DNA, and each incorporation of a nucleotide is detected as it happens. Sanger/Dideoxy chain termination – 1st Generation Reversible terminator (Illumina) – 2nd Generation Ion Torrent (Life Technologies) – 2nd Generation Zero Mode Waveguide (Pacific Biosciences) - 3rd Generation b.
2nd Generation/Short-read sequencing Illumina - uses reversible terminators, short-reads (25 kb), error rate fluctuates ( 900 kb), high error rate (5 – 15%) i. Know the features the different 2nd generation methods have in common Creates short reads Requires amplification of DNA before sequencing Massively parallel ii. Know general steps most methods use: fragmentation, add adaptors, amplify all of the different fragments of DNA (ePCR or bridge PCR), sequence each amplified product simultaneously (massively parallel) Sequence hundreds of different DNA samples at one time Massively parallel 1. Fragment DNA sample into small pieces and add adaptors on ends 2. Use emulsion PCR or bridge PCR to amplify one DNA molecule numerous times 3. Sequence each amplified product simultaneously DNA is broken into small fragments Adaptors (known sequences) are ligated onto the ends Adaptor sequence is used to prime the amplification Use emulsion PCR to make hundreds of copies of ONE DNA molecule onto ONE bead OR Use bridge PCR to make hundreds of copies of ONE DNA molecule in ONE region of a plate Each bead or cluster is sequenced 1. Know what adapters are and why they are needed Adapters- known sequences; short, synthetic DNA sequences that are ligated to the ends of DNA fragments during sequencing preparation They are needed for primer binding, amplification, multiplexing, and because they enable DNA fragments to bind to the sequencing surface which is necessary for sequencing to occur. Used to prime the amplification iii. Illumina: 1. Know the basic concept of how it works: 1) Library Preparation: break DNA into small (~100 – 200 bp) fragments, add adaptors onto both ends of all DNA fragments 2) Cluster Amplification: uses bridge PCR to amplify many different single DNA molecule many times. Each part of the bridge PCR plate has different DNA sequence so numerous different DNA molecules are sequenced at one time. 
3) Sequencing: each of the four DNA bases is attached to one of four different fluorescent dyes and also has a reversible terminator that only allows addition of one nucleotide per cycle. All dNTPs are added at once. If the nucleotide is added then fluorescence is detected (unincorporated dNTPs are washed away). The reversible terminator and fluorescence are then removed to allow for addition of another round of dNTPs. 4 steps: 1. Library Preparation 2. Cluster Amplification 3. Sequencing 4. Alignment and Data Analysis Uses modified dNTPs (terminator which blocks further polymerization). Similar to Sanger sequencing; only a single base can be added by a polymerase enzyme to each growing DNA copy strand per cycle. Terminator also contains a fluorescent label (detected by camera). Conducted simultaneously on millions of different template molecules (150 – 250 million clusters sequenced simultaneously). All 4 bases are added. The complementary dNTP is added to each template and unincorporated ones are washed away. Images are recorded and terminators are removed (reversible terminators). The next cycle of bases is added. This method uses the basic Sanger idea of “sequencing by synthesis” of the second strand of a DNA molecule. Starting with a primer, new bases are incorporated one at a time, and fluorescent tags are used to determine which base was added – Each nucleotide has a different fluorescent color. The reversible terminator (–OR) blocks the 3’-OH. The next base can only be added when the fluorescent tag and terminator are removed. The cycle is repeated 50-100 times Massively Parallel system: 1st – shear DNA into small pieces and ligate adaptors onto ends – Adaptors provide universal sequences for the primers to bind to 2nd – make many copies of each DNA molecule by bridge PCR 3rd – sequence each cluster of amplified DNA by reversible termination 2.
Know what bridge PCR is and how it works Flow cell surface is coated with single stranded oligonucleotides complementary to sequences of the adapters added to the DNA. ssDNA fragments (with adapters) are spread onto the flow cell and exposed to reagents for polymerase-based extension. Priming occurs as the free/distal end of a ligated fragment "bridges" to a complementary primer on the surface. Repeat steps and get localized amplification of single molecules in millions of unique locations across the flow cell surface BRIEF EXPLANATION: Bridge amplification is a process in Illumina sequencing where DNA fragments are immobilized on a flow cell and amplified to form clusters of identical copies. This amplification ensures high signal intensity and enables parallel sequencing of millions of fragments, improving throughput and accuracy. a. Know why it is needed Bridge amplification is needed in Illumina sequencing to generate dense clusters of identical DNA fragments on the flow cell, enabling high-throughput sequencing. This amplification increases signal strength and improves accuracy, because many identical copies of the same fragment are read together. b. Know where the primers are located and what the DNA has to do to be primed by these primers Primers are located on the flow cell surface and are complementary to the adapter sequences ligated to the ends of the DNA fragments. For DNA to be primed, it must have adapter sequences ligated to its ends, be immobilized on the surface of the flow cell, and bend over ("bridge") so that its free adapter end anneals to a nearby surface primer. 3. Know what reversible terminators are and their role in the reaction 1. Immobilize sequencing templates and primers on a glass plate 2. Add modified dNTPs - if corresponding nucleotide is present primer will be extended by one base only due to terminator 3. Detect fluorophore attached to base only if it’s incorporated (unincorporated nucleotides washed away) 4. Remove fluorescent tag and the 3’-O (terminator) 5.
Wash and repeat steps 2-4 with different modified dNTP Reversible terminators are specialized nucleotides used in some sequencing technologies, such as Illumina sequencing, to control the addition of nucleotides during the sequencing process. 4. Know advantages/disadvantages Advantages: Can generate millions to billions of reads in single run High accuracy Cost effective versatile Disadvantages: Short read lengths Can have bias towards or against regions with extreme GC content Limited long read capability iv. Ion Torrent PGM 1. Know the basic concepts of how it works: 1) Library Preparation: break DNA into small (~100 – 200 bp) fragments, add adaptors onto both ends of all DNA fragments 2) emulsionPCR (ePCR) to amplify single molecules many times onto one bead. 3) Sequencing: each bead is placed in a separate well and all beads are sequenced simultaneously. Beneath the well is an ion-sensitive layer and a sensor. Add one nucleotide at a time (one after another). If the nucleotide added is incorporated into the growing DNA strand a hydrogen ion is released and the change of the pH of the solution is detected by the sensor underneath the well. The number of nucleotides added is determined by the peak intensity. Emulsion PCR is used to make beads with the SAME DNA molecule amplified Each bead is sequenced-by-synthesis – Incorporation of new nucleotide is detected by release of H+ Hydrogen ion is released when a nucleotide is incorporated into a strand of DNA by a polymerase Machine detects change of the pH of the solution When nucleotide is added to DNA by polymerase a hydrogen ion is released Place beads on high-density array of wells – each well has different bead (DNA template) Beneath each well is an ion-sensitive layer and a sensor Sequentially flood chip with 1 nucleotide after another If nucleotide is added, a hydrogen ion is released and pH change of the solution is detected a. 
Know what emulsion PCR is Clonal bead populations are generated in water-in-oil microreactors Each emulsion contains ONE bead and ONE dsDNA molecule with adaptors on each end One adaptor has streptavidin on one side of the ssDNA and anneals to bead, a second adaptor binds to the other end of the ssDNA – primer anneals to this adaptor Also contains, enzyme, dNTPs, and primers Denature - separate dsDNA into ssDNA Annealing - ssDNA attaches to bead; primer anneals to other adaptor sequence Extension copies DNA i. Know how it is different from regular PCR Emulsion PCR involves amplifying DNA fragments in isolated microdroplets, each containing a single DNA molecule, enabling parallel amplification of millions of fragments. In contrast, regular PCR amplifies all DNA templates together in a single reaction mixture without spatial separation. ii. Know why it is needed Emulsion PCR is needed in Ion Torrent PGM (Personal Genome Machine) to generate millions of copies of DNA fragments that can be attached to beads for sequencing. This process allows for the amplification of individual DNA molecules in separate droplets, ensuring that each bead carries a single DNA fragment, which is crucial for high-throughput sequencing and accurate detection of nucleotide incorporation during the sequencing process. 2. Know advantages/disadvantages of Ion Torrent sequencing Advantages: Speed No fluorescent labels Real time sequencing, can give quick results Disadvantages: Shorter read lengths Struggles with accurate base calling in homopolymer regions (stretches of the same base) Limited throughput (number of generated sequences) 3. Know what major error occurs most frequently by this method Inaccurate base calling because it often struggles to accurately detect the length of homopolymer regions leading to misinterpretation of base calls. Insertion/deletion errors- homopolymer regions can cause insertion or deletion errors as well v. 
Know why the error rate is lower for 2nd generation sequencing approaches Longer reads make assembly of sequences easier Have fewer gaps in sequence Different long read sequencing technologies available ONT and PacBio Generate reads > 10 kb without needing to amplify first Sequence each molecule only once Leads to higher error rate Developing techniques to read each sequence more than 1x to improve error rate Often has a tradeoff that read length is not as long Barcoding, Bell-template c. 3rd Generation Pacific Biosciences- read length is limited by longevity of the polymerase average 30-kb polymerase read length library insert sizes amenable to SMRT sequencing range from 250 bp to 50 kbp Oxford Nanopore (ONT)- mostly limited by ability to deliver very high-molecular weight DNA to the nanopore sequencing provides the longest read lengths read length from 500 bp to the current record of 2.3 Mb! The library insert sizes average 10–30-kb ONLY approach that can directly sequence RNA i. Know the features the different 3rd generation methods have in common Creates long reads NO amplification of DNA needed before sequencing Sequence single DNA molecules ii. Pacific Biosciences (PacBio) 1. Know the basic concepts of how it works: Single DNA molecules (no amplification or adaptor ligations are needed) are loaded onto ZMW wells (each ZMW well has a different DNA template). Single Molecule Real Time (SMRT) technology used: All nucleotides are present at the same time. Each nucleotide is attached to one of four different fluorescent dyes. When a nucleotide is being incorporated by the DNA polymerase, the fluorescent dye is brought close to a detector and the base call is made according to the corresponding fluorescence of the dye. During phosphodiester bond formation, the fluorescent tag is cleaved off and diffuses out of the observation area of the ZMW where its fluorescence is no longer observable. 
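The note above that reading each molecule more than once improves the error rate can be illustrated with a small Python sketch. It is only a toy model: it assumes the repeated passes over the same molecule are already aligned and of equal length, and it takes a simple majority vote at each position. The function name `consensus` and the example reads are made up for illustration; real consensus callers also handle insertions, deletions and quality scores.

```python
from collections import Counter


def consensus(passes):
    """Per-position majority vote over equal-length, pre-aligned passes.

    Hypothetical helper for illustration only; not a real basecaller.
    """
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must be pre-aligned to the same length")
    called = []
    for column in zip(*passes):
        # most_common(1) returns the most frequent base in this column
        base, _count = Counter(column).most_common(1)[0]
        called.append(base)
    return "".join(called)


# Five noisy passes over the same (made-up) molecule; each error is
# outvoted because it shows up in only one pass at that position.
passes = ["ACGTTAGC", "ACGTAAGC", "ACCTTAGC", "ACGTTAGC", "ACGTTAGG"]
print(consensus(passes))
```

Because random errors rarely hit the same position in most passes, the vote recovers the underlying sequence; a difference that appears in all passes survives the vote and is treated as real.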
Creates long reads. NO amplification of DNA needed before sequencing – Sequence single DNA molecules. Massively parallel – A SMRT cell contains 150,000 – 1 million ZMWs. Only about ½ of the ZMW wells produce a read – Each ZMW reads a DIFFERENT DNA molecule. One run lasts 0.5–4 h. 1st long read method - 2011. Uses Single Molecule Real-Time (SMRT) technology and does not require DNA amplification. Each chip has zero-mode waveguides (ZMWs) – a 100 nm hole to watch DNA polymerase perform sequencing by synthesis – Phospholinked nucleotides are labeled with colored fluorophores. A single DNA polymerase enzyme is affixed at the bottom of a ZMW with a single molecule of DNA as a template. ZMW: structure that can observe only a single nucleotide of DNA being incorporated by DNA polymerase. Each of the four dNTPs is attached to a different fluorescent dye. While the correct nucleotide is being added the fluorescent tag comes close to the bottom of the ZMW and the base can be identified. Once the correct nucleotide is incorporated the fluorescent tag is cleaved off and diffuses out of the observation area of the ZMW. Can generate continuous long reads (CLR) or high-fidelity (HiFi) reads. CLR sequences a SMRTbell template with a >30 kb DNA insert. Due to the large insert size, the polymerase often only reads the template 1x. HiFi reads are generated by circular consensus sequencing (CCS) of a SMRTbell template with a 10–30 kb DNA insert; the smaller insert size allows the polymerase to make several passes around the SMRTbell template 2. Know advantages/disadvantages Advantages: Long reads (up to 100,000+ bp) High accuracy for long reads (HiFi) Can detect structural variations like insertions or deletions because of long read lengths Disadvantages: Higher cost Fewer reads generated in a given time frame High error rate in raw reads 3.
Know why it has such a high error rate Highly error prone due to the speed of the polymerase and difficulty in calling each nucleotide so quickly – Circularizing DNA reduces error rate 4. Know what changes were made to create high fidelity reads Improved accuracy by using the bell template approach – hairpin adaptors circularize DNA Size of DNA sequenced is lower – Same DNA sequenced multiple times – Sequence differences in ALL reads will be considered real – Obtain consensus sequence a. Know the limitation that comes with the approach Limited to shorter fragments iii. MinION Oxford Nanopore Technology (ONT) 1. Know the basic concepts of how it works: Single DNA (or RNA) molecules are sequenced (no amplification or adaptor ligations are needed). The DNA (or RNA) is loaded onto a flow cell that has two chambers (cis and trans). The flow cells are filled with ionic solutions and each chamber is separated by membrane with a nanopore. Over 2,000 nanopores are present therefore over 2,000 different DNA molecules can be sequenced at one time. Each nucleic acid moves through the pore by a motor protein and current shifts are recorded (each nucleotide has a different current recording). The current recordings are reported as a squiggle plot and the sequence is determined. 
MinION Oxford Nanopore Technology (ONT) can sequence over 2,000 DNA molecules at a time. Multiple sequences can pass through a single nanopore, one after another. Does NOT require a DNA amplification step. Can sequence DNA and RNA. Has variable read lengths. High error rate (over 10%). Flow cell has 2 chambers (cis & trans) filled with ionic solutions. Chambers are separated by a membrane with a nanopore. The MinION has 1 flow cell that contains 512 nanopore channels. A helicase motor protein moves nucleic acids through the pore; only 1 strand passes through the pore. Current shifts are recorded and correspond to different nucleotides; the current is measured by a sensor several thousand times per second and graphically represented in a ‘squiggle plot’. Can generate long or ultra-long reads. Needs high-molecular-weight (HMW) DNA: use a commercially available DNA extraction kit (generates long (10–100 kb) reads) or a traditional protocol (phenol-chloroform extraction generates ultra-long (>100 kb) reads) 2. Know advantages/disadvantages Advantages: Ultra long reads Real time sequencing Portable and scalable Disadvantages: Higher error rate Lower throughput Quality of reads can vary significantly 3. Know how they improved the error rate of this system Ways to lower error rate: – Pass second strand (if dsDNA) after 1st strand sequenced – Barcode samples iv. Know what barcodes are, why they are needed and when they are used Samples from a particular treatment or source can be marked with a unique barcode (or index) so that reads can be assigned to a particular DNA sample/source. Needed to distinguish between different samples or DNA fragments and to lower the error rate. Used when multiplexing multiple samples in a single sequencing run v. Know why the error rate is high for 3rd generation sequencing approaches The error rate is high in third-generation sequencing (e.g., PacBio and Oxford Nanopore) due to several factors: 1.
Long Reads: These technologies generate long reads, which are more prone to errors, especially in regions with repetitive sequences or homopolymers. Longer reads make it harder to maintain accuracy over large stretches of DNA. 2. Detection Mechanisms: Third-generation technologies often rely on single-molecule sequencing and novel detection methods (e.g., measuring ion flow or current changes), which can be less precise than traditional optical methods used in second-generation sequencing. 3. Raw Data Quality: The raw data produced by these technologies can have a higher base calling error rate, although techniques like circular consensus sequencing (CCS) for PacBio can improve accuracy. vi. Know the different uses for next-generation sequencing DNA-seq – De novo sequencing of genomes – Clinical sequencing of individuals – whole bacterial genome sequencing in a single run – metagenomics: sequencing DNA from environmental samples – rare variants in single amplified region (tumors/viral infections) RNA-seq – global transcriptome analysis ChIP-seq – Determines genome-wide location of protein-DNA interactions Bisulfite sequencing – Identify DNA methylation 1. Know what RNA-seq is and what information it provides Used for global transcriptome analysis – replacing microarrays. Next-gen sequencing of ALL cDNAs under different conditions. Each library (condition) is barcoded. Shows non-protein coding RNA transcripts (like microRNAs) that had been missed before. Match reads to genome and calculate # of reads per gene. Normalize the results: RPKM (Reads Per Kilobase of transcript per Million mapped reads) is a common normalization method which attempts to adjust for sequencing depth and gene length. More reads = more RNA sequences and more RNA expression from that gene; no (or fewer) reads = low/no RNA sequences present and no/low expression from gene d. Assembly i.
Know some of the challenges/problems with DNA sequence assembly In principle, assembling a sequence is just a matter of finding overlaps and combining them In practice: most genomes contain multiple copies of many sequences there are random mutations (either naturally occurring cell-to-cell variation or generated by PCR or cloning) there are sequencing errors and misreadings sometimes miscellaneous junk DNA gets sequenced Repeat sequence DNA is very common in eukaryotes, and sequencing highly repeated regions (such as centromeres) remains difficult even now High quality sequencing helps a lot: small variants can be reliably identified. Sequencing errors, bad data, random mutations, etc. were originally dealt with by hand alignment and human judgment. However, this became impractical when dealing with the Human Genome Project. This led to the development of automated methods ex. phred/phrap programs ii. Know the following terms: read, contig, scaffold and consensus sequence Scaffold- made of contigs and gaps Contig- series of overlapping DNA sequences used to make a physical map that reconstructs the original DNA sequence Consensus sequence- a sequence of DNA or RNA that represents the most common nucleotide at each position across a set of aligned sequences. It is derived by comparing multiple sequences from different sources or samples and selecting the nucleotide that appears most frequently at each position, providing a representative or "consensus" version of the original sequences. Read- a short segment of DNA or RNA that is sequenced during a sequencing process. It represents a portion of the genetic material, typically ranging from a few dozen to several hundred base pairs, depending on the sequencing technology used. Reads are later assembled to reconstruct longer DNA sequences, such as entire genomes. 1. Know which are the smallest to biggest in length From smallest to biggest: -Read -Contig -Scaffold -Consensus sequence 2. 
Know the basic concept on how reads from the sequencer are assembled Find the overlapping DNA sequences for each of the reads from the sequencer Assemble into contigs- series of overlapping DNA sequences used to make a physical map that reconstructs the original DNA sequence Use the contigs to make a scaffold- made of contigs and gaps iii. Know what sequence coverage is and its relationship to read length sequence coverage – amount of sequence reads needed to generate a high-quality assembly of the genome; coverage needed differs for different sequencing platforms Longer read lengths can improve sequencing assembly by covering more DNA in a single read, while higher coverage (more reads per base) ensures greater accuracy and reduces errors in sequencing. iv. Know how read-length, coverage and quality affect assembly of sequences Read length, coverage, and quality all play crucial roles in the assembly of sequences: Read length: Longer reads help span repetitive regions and complex areas more effectively, reducing gaps and improving the assembly process. Coverage: Higher coverage (more reads per base) increases the chances of correctly sequencing every part of the genome, reducing errors and ensuring a more complete assembly. Quality: Higher quality reads with fewer errors result in more accurate assemblies, reducing the need for error correction and leading to a more reliable final sequence. v. Know what contig N50 is and what it tells you about the assembly of the sequence Describe ‘completeness’ of genome tells you about the distribution of contig lengths Line up contigs in assembly in order of sequence lengths. Longest contig first, then second longest, etc. Add up lengths of all contigs from the beginning, until you’ve reached the number that makes up 50% of your total assembly length. Length of contig you stopped counting at, is your N50 number Prokaryotic Transcription and Control Outline: 1. Prokaryotic Gene Structure a. 
Know where transcription and translation occur in prokaryotes In prokaryotes transcription and translation both occur in the cytoplasm b. Know what gene expression is Gene expression - when a gene is converted into mRNA (often translated into protein) i. Know what polycistronic expression is Prokaryotes use polycistronic expression Use only ONE promoter for expression of SEVERAL genes ONE mRNA contains multiple genes that are translated into individual proteins Polycistronic expression in prokaryotes refers to the ability of a single mRNA molecule to encode multiple proteins. This occurs because prokaryotic genes are often organized into operons—clusters of genes under the control of a single promoter. c. Know the three phases of transcription Initiation – RNAP binds to the promoter DNA sequence Elongation – RNAP adds rNTPs onto the 3’ end of the growing RNA strand Termination – RNAP stops transcribing RNA and dissociates from the DNA d. E. coli RNAP DNA-dependent RNA polymerase Very large protein complex ~400kD i. Know the subunits that make up the core polymerase and the holoenzyme Holoenzyme: 6 subunits: 2-𝝰, 1-𝛽 , 1-𝛽 ’, 1- ⍵, and 1-𝜎 Core polymerase: 5 subunits (2-𝝰, 1-𝛽, 1-𝛽’, 1-⍵ ) All except 𝜎 subunit The 𝜎 subunit binds weakly and can be dissociated from the core polymerase Core can perform RNA synthesis but lacks specificity Unable to recognize promoter sequences 𝜎 factor stimulates DNA binding in a sequence specific manner by recognizing promoters Core enzyme is less stable (t1/2=< 1 min) than holoenzyme (t1/2 = 30 to 60hrs) 1. 
Know the role of the α, β, β’ and σ subunits
α: two α subunits are present in both the core polymerase and the holoenzyme; they are involved in enzyme assembly and help bind regulatory proteins
- The α C-terminal domain (αCTD) interacts with other proteins or DNA
- αCTD binds the UP element (DNA sequence), increasing transcription 30x
- UP element: DNA upstream of the rRNA core promoter (-40 to -60)
- Positive transcription regulators (cAMP-CAP) also bind the αCTD
β: plays a key role in the catalytic activity of RNA synthesis; binds the nucleotide substrates during transcription
β’: responsible for binding the DNA template; helps form the active site for RNA synthesis
- Involved in the initiation and elongation steps; together with β forms a claw-like structure that grasps DNA
- β’ is critical to phosphodiester bond formation
- The catalytic center is located between the two claws and is marked by a Mg2+ ion
σ: a detachable subunit that helps RNA polymerase recognize and bind promoter regions; provides specificity
The holoenzyme is required for specific initiation, while the core enzyme is responsible for elongation of the RNA transcript.
ii. Know the shape of the RNAP
Crab claw
- Core has 2 α, 1 β, 1 β’ and 1 ω subunit
- β and β’ make up the pincers, which contain the catalytic core
e. Know the different features of a prokaryotic gene
Promoter: DNA sequence used by RNAP to know what DNA sequence/strand to transcribe
5’ UTR (5’ untranslated region/leader): runs from the TSS to the translation start site (important for efficient translation of mRNA); made into RNA, but not translated into protein
Coding region: DNA whose transcript is translated into protein
3’ UTR (3’ untranslated region/terminator): begins after the stop codon and includes signals for termination of transcription (and RNA processing if eukaryote)
i. Know what coding sequence, open reading frame and untranslated regions are
Coding Sequence (CDS): The portion of a gene or mRNA that is translated into a protein. It begins with a start codon (usually AUG) and ends with a stop codon.
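The start-codon-to-stop-codon definition above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name find_cds and the example mRNA are made up for this outline, and the code simply takes the first AUG that is followed by an in-frame stop codon.

```python
# Minimal sketch: locate a start-to-stop coding stretch in an mRNA string.
# The mRNA below is a hypothetical example, not a real gene.
STOP_CODONS = {"UAA", "UAG", "UGA"}

def find_cds(mrna, start_codon="AUG"):
    """Return (start, end) of the first start codon that has an in-frame stop, else None."""
    mrna = mrna.upper()
    start = mrna.find(start_codon)
    while start != -1:
        # Step through the sequence one codon (3 nt) at a time from the start codon.
        for i in range(start + 3, len(mrna) - 2, 3):
            if mrna[i:i + 3] in STOP_CODONS:
                return start, i + 3
        start = mrna.find(start_codon, start + 1)
    return None

mrna = "GGAGGAUGGCUUACUAAGCUU"   # hypothetical: 5' UTR + AUG...UAA + 3' UTR
span = find_cds(mrna)
five_utr = mrna[:span[0]]        # transcribed but not translated
cds = mrna[span[0]:span[1]]      # start codon through stop codon
three_utr = mrna[span[1]:]       # transcribed but not translated
print(five_utr, cds, three_utr)
```

Everything before the returned span plays the role of the 5’ UTR and everything after it the 3’ UTR, which matches the gene features listed above.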
Open Reading Frame (ORF): A continuous stretch of codons in the mRNA that starts with a start codon and ends with a stop codon. An ORF represents a potential coding region within the genetic sequence. Untranslated Regions (UTRs): Segments of mRNA that are not translated into protein but play regulatory roles. These include: 5' UTR: Located upstream of the start codon, helps regulate translation initiation. 3' UTR: Located downstream of the stop codon, involved in mRNA stability, localization, and translation efficiency. ii. Know what upstream and downstream are referring to In molecular biology, upstream and downstream refer to positions on a DNA or RNA molecule relative to a specific point of reference, often the transcription start site (TSS) or the start codon. Upstream: Refers to sequences before (5' direction) the reference point. Typically includes promoters and regulatory elements that control transcription. Example: In a gene, sequences upstream of the TSS influence how and when transcription begins. Downstream: Refers to sequences after (3' direction) the reference point. Includes the coding sequence, terminators, and regulatory regions such as the 3' UTR. Example: In mRNA, downstream sequences after the start codon are translated into protein. iii. Know what the TSS is Transcription start site The transcription start site (TSS) is the specific location on the DNA where RNA polymerase begins synthesizing the RNA transcript. It is the first nucleotide that is transcribed into RNA. Labeled as +1 Marks the beginning of the 5’ untranslated region (5’ UTR) iv. 
Know what a promoter is Promoter: region immediately upstream of the Transcription Start Site (TSS) 2 important promoter sequences – -10 sequence - TATA Box or Pribnow’s Box – -35 sequence Promoter sequences determine which strand is the template strand Promoters - DNA sequences that are bound by the RNA polymerase first RNA Polymerase still needs to build RNA in 5’ to 3’ direction (build off the 3’-OH) Template strand is 3’ – 5’ v. Know what 5’UTR and 3’UTR are and what their functions are 5' UTR: Located upstream of the start codon, helps regulate translation initiation. begins at TSS until the translation start site (important for efficient translation of mRNA) Made into RNA, but not translated into protein 3' UTR: Located downstream of the stop codon, involved in mRNA stability, localization, and translation efficiency. begins after stop codon and includes signals for termination of transcription (and RNA processing if eukaryote) f. Know what template strand and coding strands are Template (antisense) strand - the sequence of DNA that is copied during the synthesis of mRNA – Complementary to mRNA made Coding (sense) strand - opposite strand (strand with a base sequence similar to the mRNA sequence) – Same sequence as mRNA made (except mRNA has U instead of T) i. Know which one the RNA polymerase uses to determine the RNA sequence DNA template strand ii. Know what direction each strand in the DNA is going (5’-->3’ or 3’ --> 5’) Template- 3’->5’ Coding- 5’->3’ iii. Know what determines the direction of transcription Template and coding strand orientation, promoter sequence The direction of transcription is determined by the orientation of the promoter on the DNA and which strand RNA polymerase binds to. Promoter: Contains specific sequences (e.g., -10 and -35 regions in prokaryotes) that signal RNA polymerase where to start transcription and in which direction to move. 
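The template/coding strand relationship from item f can be checked with a short Python sketch. The function names and the 9-base template are hypothetical; the template is written 3'->5' so that reading it left to right follows the polymerase, and the RNA comes out 5'->3'.

```python
# Minimal sketch: RNA polymerase reads the template 3'->5' and builds RNA 5'->3'.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3to5):
    """Template strand written 3'->5'; returns the mRNA written 5'->3'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())

def coding_strand(template_3to5):
    """Coding (sense) strand written 5'->3', complementary to the template."""
    return "".join(DNA_TO_DNA[base] for base in template_3to5.upper())

template = "TACCGATTG"            # hypothetical template strand, 3'->5'
mrna = transcribe(template)       # complementary to the template
coding = coding_strand(template)  # same as the mRNA except T instead of U
print(mrna, coding)
```

The mRNA is complementary to the template and identical to the coding strand once T is swapped for U, which is why the coding strand is the one usually written out for a gene.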
Template Strand: RNA polymerase reads the template strand in the 3' to 5' direction, ensuring RNA is synthesized in the 5' to 3' direction.
2. Prokaryotic Transcription
a. Know the regions commonly found in E. coli promoters
2 important promoter sequences:
- -10 sequence (TATA Box or Pribnow’s Box)
- -35 sequence
i. Know what a consensus sequence is
Consensus sequence: the order of the most frequent residues found at each position in a sequence alignment
A consensus sequence is a sequence of DNA or RNA that represents the most common nucleotides found at each position in a set of related sequences. It reflects the typical or optimal sequence for binding proteins or enzymes, such as RNA polymerase or transcription factors.
ii. Know what promoter strength means and what dictates it
Promoter strength = amount of transcript made
Stronger promoters make more transcript, weaker promoters make less transcript
The number of nucleotides that match the consensus sequence determines the strength of the promoter
b. Transcriptional Initiation
The core polymerase can synthesize RNA, but the sigma factor gives specificity to which DNA is transcribed
Regions of σ: specific regions of the σ factor recognize specific promoter regions
- σ and α subunits recruit the RNA polymerase core enzyme to the promoter
- σ region 2 recognizes the -10 sequence and σ region 4 recognizes the -35 sequence
- αCTD (carboxy-terminal domain of the α subunit) recognizes the UP element (if present), which increases polymerase binding
i. Know what the -10 and -35 sequences and UP element are and their roles in transcription
In prokaryotes, the -10 and -35 sequences and the UP element are important components of the promoter region that help RNA polymerase initiate transcription.
-10 Sequence (Pribnow Box):
Location: About 10 bases upstream of the transcription start site.
Sequence: Typically "TATAAT".
Role: This sequence is recognized by the RNA polymerase sigma (σ) factor, helping to position the enzyme for the start of transcription.
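The ideas of a consensus sequence and of promoter strength as "number of matches to the consensus" can be illustrated with a small Python sketch. The five aligned -10 elements below are invented for illustration, and counting matches is only a rough proxy for strength under the rule stated in this outline, not a quantitative model.

```python
from collections import Counter

def consensus(aligned_sites):
    """Most frequent base at each position of equal-length, aligned sites."""
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*aligned_sites))

def consensus_matches(site, reference):
    """Count positions where a promoter element matches the consensus."""
    if len(site) != len(reference):
        raise ValueError("site and consensus must be the same length")
    return sum(a == b for a, b in zip(site.upper(), reference.upper()))

# Hypothetical aligned -10 elements from five imaginary promoters.
sites = ["TATAAT", "TATGTT", "TACAAT", "GATAAT", "TATACT"]
minus_10 = consensus(sites)
scores = {site: consensus_matches(site, minus_10) for site in sites}
print(minus_10, scores)
```

Under the rule above, the site with 6 of 6 matches would be expected to act as the strongest -10 element and the site with 4 of 6 as the weakest of this set.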
-35 Sequence:
Location: About 35 bases upstream of the transcription start site.
Sequence: Typically "TTGACA".
Role: The sigma factor also recognizes this sequence, aiding RNA polymerase in binding and initiating transcription.
UP Element:
Location: Found upstream of the -35 sequence in some promoters.
Sequence: AT-rich (AAT or a similar motif).
Role: The UP element helps increase RNA polymerase binding affinity, enhancing transcription efficiency, especially for highly expressed genes.
ii. Promoter Recognition
1. Know what proteins and regions bind to each DNA sequence within a promoter (UP element, -10 and -35 sequences)
UP element: αCTD; -10 sequence: σ region 2; -35 sequence: σ region 4
2. Know what the sigma factor is and its role in transcription initiation
Sigma factors (σ) recognize promoter sequences
- σ70 is the primary sigma factor and recognizes most promoters
- The closer the promoter sequence matches the consensus, the stronger the sigma factor binds: more transcript made (higher gene expression)
- σ factor occupies the RNA exit channel that is used for RNA molecules larger than 10 nt
- For an RNA chain to be made longer than 10 nucleotides, σ must be ejected (may take several attempts, which produces abortive transcripts)
- To eject the σ region, it needs to be more weakly associated with the elongating enzyme than it is with the open complex
iii. Know the difference between the closed and open complex
Closed complex: polymerase initially binds to the DNA; DNA is double stranded
Open complex: dsDNA is unwound around the transcription start site (~13 bp)
- Forms the transcription bubble
- Initially many small transcripts are made (abortive synthesis)
- Transcripts are less than 10 nt
iv. 
Know how the open complex is formed
- Two bases in the non-template strand of the -10 element flip out and insert into pockets within the σ protein
- This causes opening of the DNA double helix to reveal the template and non-template strands
- This 'melting' occurs between positions -11 and +2, with respect to the transcription start site
- Promoter melting is the separation of the dsDNA; it does NOT require ATP
1. Know if ATP is needed
Not needed to form the open complex
2. Know what promoter melting is
Promoter melting is the separation of the dsDNA
3. Know what abortive synthesis is and why it happens
During transcription initiation, RNA polymerase produces and releases short RNA transcripts of fewer than 10 nt. This happens because σ occupies the RNA exit channel and must be ejected before the chain can grow past 10 nt, which may take several attempts.
Rho-dependent termination: Rho binds a >40 nt cytosine-rich region of the RNA, the Rho utilization (rut) site
- ATPase activity allows Rho to move along the RNA
- Upon reaching RNAP, Rho unwinds the RNA-DNA hybrid, releasing the RNA
Rho-dependent termination occurs when the Rho protein binds to the RNA transcript and moves toward the RNA polymerase. Once Rho catches up with the polymerase at a pause site, it causes the polymerase to dissociate from the DNA, terminating transcription. This process requires the Rho protein and a specific rut (Rho utilization) site on the RNA.
a. Know what Rho is
Rho: a hexameric protein with ATPase activity that binds ssRNA just prior to termination
- Binds at a >40 nt cytosine-rich region (rut site)
Rho is a protein involved in Rho-dependent termination of transcription in prokaryotes. It binds to the RNA transcript at specific Rho utilization (rut) sites and moves along the RNA towards the RNA polymerase. Once Rho catches up with the polymerase, it causes the release of the RNA transcript, terminating transcription.
b. Know what type of activity the Rho protein has
ATPase activity
c. 
Know what rut sites are Rho utilization site; >40nt cytosine-rich region RUT (Rho utilization) sites are specific sequences in the RNA transcript that are recognized by the Rho protein during Rho-dependent termination of transcription. These sites are typically rich in cytosine (C) and poor in guanine (G), and they serve as the binding site for Rho, which helps terminate transcription by causing RNA polymerase to release the RNA. d. Know what happens when Rho reaches the RNA:DNA hybrid Rho unwinds the RNA-DNA hybrid releasing the RNA 3. Know how Rho-independent (intrinsic) terminators work Termination is signaled by GC-rich inverted repeats(~20nt) followed by 8-10 ‘A’ residues encoded within DNA Inverted repeats form stem/hairpin loop in RNA transcript Causes disruption of elongation complex polyA’s transcribed into polyU sequence in RNA rU-dA hybrid is unstable The hairpin destabilizes the RNA-DNA hybrid physically pulling the RNA out of the polymerase Rho-independent (intrinsic) terminators work by forming a hairpin loop structure in the RNA transcript followed by a series of uracil (U) nucleotides. The hairpin causes RNA polymerase to pause, and the weak U-A base pairs in the RNA-DNA hybrid cause the RNA to dissociate from the DNA template, terminating transcription. a. Know the types of sequences present in them Inverted repeat sequences: These are regions in the RNA that are complementary to each other, allowing the RNA to form a hairpin loop structure. This loop causes RNA polymerase to pause. Poly-U sequence: Following the hairpin, there is a stretch of uracil (U) nucleotides in the RNA. The weak U-A base pairs cause the RNA to dissociate from the DNA template, completing transcription termination. b. Know what type of secondary structure they form Rho-independent (intrinsic) terminators form a hairpin loop secondary structure in the RNA. 
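The two sequence features of an intrinsic terminator (inverted repeats that can pair into a stem, followed by a run of U) can be checked with a short Python sketch. The function names and the example arms are hypothetical, and requiring a perfect stem with at least 8 U's is a simplifying assumption based on the 8-10 residue figure given above.

```python
# Minimal sketch: do two RNA arms pair into a hairpin stem, and is the stem
# followed by a poly-U run? Real terminators tolerate imperfect stems.
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Sequence that would base-pair with `rna` in a hairpin stem."""
    return "".join(RNA_PAIR[base] for base in reversed(rna.upper()))

def looks_like_intrinsic_terminator(stem5, stem3, tail, min_u=8):
    """True if the arms are inverted repeats and the tail starts with a U run."""
    stem_forms = stem3.upper() == reverse_complement(stem5)
    u_run = tail.upper().startswith("U" * min_u)
    return stem_forms and u_run

stem5 = "GCCGCC"                   # hypothetical GC-rich 5' arm
stem3 = reverse_complement(stem5)  # 3' arm that pairs with it
print(stem3, looks_like_intrinsic_terminator(stem5, stem3, "UUUUUUUU"))
```

A GC-rich stem pairs strongly while the rU-dA hybrid behind it pairs weakly, which is the combination the outline describes as pulling the RNA out of the polymerase.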
This structure is formed by inverted repeat sequences in the RNA that base-pair with each other, creating a stable stem-loop structure. The hairpin causes RNA polymerase to pause, and the subsequent poly-U tail causes the RNA to dissociate from the DNA, terminating transcription.
c. Know the role of the A:U hybrid
The A:U hybrid plays a crucial role in Rho-independent termination. After the RNA forms a hairpin loop, RNA polymerase pauses at the terminator. The RNA then contains a stretch of uracil (U) nucleotides that pair with adenine (A) nucleotides on the DNA template. The A:U base pairs are weak, which facilitates the dissociation of the RNA from the DNA, leading to the termination of transcription.
d. Know how secondary structure and A:U hybrid cause termination
In Rho-independent termination, the hairpin loop formed by inverted repeats causes RNA polymerase to pause. The subsequent A:U hybrid (a stretch of uracil in the RNA pairing with adenine in the DNA) is weak, causing the RNA to dissociate from the DNA and terminating transcription.
3. Prokaryotic Control of Gene Expression
a. Know the different ways to regulate gene expression in bacteria
Transcriptional control: regulatory proteins affect the ability of RNAP to bind to or transcribe a particular gene. This is the most common form of control because it provides the cell with the correct amount of gene product at the correct time. It can act:
- At transcription initiation (most common)
- At elongation or termination
- Through regulation from the transcribed RNA itself
Translational control: various proteins may affect the rate of translation, or enzymes may affect the stability of the mRNA transcript
Post-translational control: the translated protein may be modified (for example by phosphorylation), which can change its folding and/or activity
i. 
Know what mechanism is most common and why Transcription regulation most common Provides the cell with correct amount of gene product at the correct time – RNAP can transcribe any gene with a functional promoter – More complex organisms have more complex transcriptional regulation Promoter strength one important level of regulation Other types of regulation target gene regulation: – transcription initiation (most common) – elongation or termination – Regulation from the transcribed RNA itself b. Know what regulatory proteins and regulatory sequences are Regulatory sequences: specific DNA regions where regulatory proteins bind – Types of regulatory proteins: Activators and Repressors Transcription regulators bind specific DNA sequences through interaction with the major groove sequence-specific binding AT and GC base pairs have available H bond donors and acceptors Regulatory proteins are proteins that control the expression of genes by interacting with DNA or other proteins. They can act as activators (increasing gene expression) or repressors (decreasing gene expression) by binding to specific DNA sequences, such as promoters or enhancers, or by interacting with RNA polymerase. These proteins help coordinate cellular processes and respond to environmental signals. i. Know what region of the DNA regulatory proteins typically bind to Regulatory sequences ii. Know that regulatory proteins bind to specific DNA sequences At major groove Regulatory proteins bind to specific DNA sequences in regulatory regions, such as promoters, enhancers, or operators, to control gene expression. By binding to these sequences, they can either activate or repress transcription, influencing how genes are expressed in response to various signals. c. 
Know the definition of operon, inducible, repressible, negative regulation, positive regulation Inducible - gene regulated by presence of substrate Repressible - gene regulated by its product Negative regulation- DNA-binding protein prevents gene expression Positive regulation - DNA-binding protein required for transcription d. Know what activator and repressor proteins are and their overall role in controlling gene expression 2 types of regulatory proteins: activators and repressors – Activator - protein that turns on transcription - DNA binding proteins that recognize specific sequences near the genes they control -Positive regulator - Increases the amount of RNA transcript produced -Activators can interact with target proteins (such as RNAP) when they are close by or far away from each other Cooperative binding – binding of the activator ‘recruits’ the RNAP to the correct DNA region When they are far apart DNA looping brings them closer together Repressor – protein that reduces or stops transcription -Negative regulator -Decreases or eliminates RNA transcript production i. Know what stage of transcription they typically control Transcription initiation e. Know what basal level of transcription and constitutive transcription are Without an activator or repressor, RNA polymerase binds weakly to the promoter and initiates basal levels (low levels) of transcription Makes constitutive transcript Weak binding because promoter elements are imperfect or missing Repressor binding blocks the polymerase from binding to the promoter Basal level of transcription refers to the low, unregulated level of gene expression that occurs in the absence of any specific activators or repressors. It is the default transcription activity of a gene. Constitutive transcription refers to the continuous, unregulated expression of a gene, meaning it is always transcribed at a relatively constant level, regardless of environmental conditions. f. 
Know what catabolic (inducible) and anabolic (repressible) operons are
Inducible (ex. lac operon)
- Generally make enzymes needed for catabolism: the set of metabolic pathways that break down molecules into smaller units that are either oxidized to release energy or used in other anabolic reactions
- Only turned on if substrate is present
- Substrate (what is metabolized) is the inducer of the operon
Repressible (ex. trp operon)
- Generally make enzymes involved in anabolism: the set of metabolic pathways that construct molecules from smaller units
- Feedback inhibition: the end product is the repressor (co-repressor) of the operon
Catabolic = break down; anabolic = build
i. Know which category the lac and trp operons each fall into
Lac operon: inducible (catabolic)
Trp operon: repressible (anabolic)
g. Lac operon
Glucose is the preferred energy source for E. coli (less energy to break down)
Lactose is a sugar that can be used as a food source, but it is ONLY used if glucose is not present
Expression of the lac operon is tightly controlled
Two regulatory (trans-acting) proteins are involved in turning lac operon gene expression on/off:
- CAP (catabolite activator protein)
- Lac repressor
Both are DNA-binding proteins that bind at or near the lac operon promoter
For lac operon expression, there must be activation by cAMP-CAP as well as removal of the lac repressor from the operator
i. Know the role of the lac operon
The lac operon is a genetic system in E. coli that controls the breakdown of lactose into simpler sugars for energy. It consists of three genes (lacZ, lacY, and lacA) that encode enzymes necessary for lactose metabolism. The operon is regulated by the presence or absence of lactose and glucose. When lactose is available, the operon is activated, allowing the enzymes to be produced; when lactose is absent, the operon is turned off to conserve resources.
ii. Know the conditions when the lac operon is turned on/off
Lac operon is turned off (repressed) if:
1. there is no lactose OR
2. 
glucose is present
Lac operon is turned on in:
1. Presence of lactose
2. Low glucose levels
iii. Know the three genes controlled by the operon and their roles
The lac operon encodes 3 genes, whose products transport and break down lactose
- Lactose enters the cell through lactose permease
- Lactose is converted to glucose and galactose by β-galactosidase
- ONE promoter is used by ALL 3 genes: the lac promoter expresses the 3 genes as a single mRNA
The lac operon controls three genes:
1. lacZ: Encodes β-galactosidase, which cleaves lactose (sugar) into galactose and glucose (energy sources)
2. lacY: Encodes lactose permease, a protein that inserts into the cell membrane and transports lactose into the cell
3. lacA: Encodes thiogalactoside transacetylase, which rids the cell of toxic thiogalactosides that also get transported in by the lacY product
iv. Know what cis-acting sequences and trans-acting factors are
cis-acting DNA element: short DNA sequence that acts as a binding site for a protein that has an affinity for that specific sequence
- Ex: operator of lac operon or activator (CAP) site sequences
trans-acting factor: protein that controls expression of a gene at a separate location by binding to a cis-acting DNA element
- Ex: repressor or CAP protein
1. Know examples of each for the lac operon
Cis-acting example: operator of lac operon or activator (CAP) site sequences
Trans-acting example: repressor or CAP protein
v. Repression
1. Know what the operator sequences are and where they are located
Operator: short region of DNA (cis-sequence) partially within the promoter; interacts with a regulatory protein that controls transcription of the operon
- Operator has inverted repeats; each repeat binds one repressor
- Most repressor proteins bind DNA as a dimer at each operator
- Operator DNA overlaps the promoter DNA, so the repressor interferes with transcription initiation by RNAP
a. Know what protein binds to it
Regulatory protein (the lac repressor)
i. 
Know the cellular conditions when the proteins bind
The lac repressor binds the operator when lactose is absent (glucose levels are sensed separately, by CAP)
ii. Know how binding affects transcription and why
Interferes with transcription initiation by RNAP because the operator DNA overlaps the promoter
2. Repressor Protein
- Expressed from the lacI gene (has a constitutive promoter)
- lacI gene is NOT part of the operon
- Protein is made in the active form, meaning it can bind to the operator
- Inactivated by allolactose/lactose
- Has a helix-turn-helix motif (two α helices): the recognition helix fits into the major groove of the DNA; the 2nd helix sits across the major groove and ensures the recognition helix is properly placed
- There is an auxiliary operator upstream of the major operator
- Two repressor subunits (a dimer) bind each operator sequence; the repressor is bound as a tetramer
a. Know if it is positive or negative control of the lac operon
Negative control with repressor protein
Ensures that the lac operon is only expressed when lactose is present AND glucose is absent
- CAP watches glucose levels: binds DNA and ACTIVATES lac operon gene expression in the ABSENCE of glucose
- Lac repressor watches lactose levels: binds DNA and PREVENTS lac operon gene expression in the ABSENCE of lactose
b. Know what gene makes the repressor protein
Expressed from the lacI gene
c. Know where the gene is located (is it part of the operon or not)
lacI is not part of the lac operon
d. Know what type of promoter the repressor gene has
Constitutive promoter: constant, unregulated expression, meaning the gene is transcribed all the time
e. Know if the repressor protein is made in the active or inactive form
Active, meaning it can bind to the operator
f. Know what allolactose is and when it is present
An isomer of lactose (not identical to it); as soon as lactose is transported into the cell, some of it is converted into allolactose
Allolactose is an isomer of lactose that acts as an inducer for the lac operon. It is produced when lactose is metabolized by the enzyme β-galactosidase.
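The two-sensor logic described above (the lac repressor watches lactose, CAP watches glucose) can be written as a small truth table in Python. This is a simplified sketch with a made-up function name: it treats expression as strictly on or off, as the outline does, and ignores the low basal transcription a real cell would still show.

```python
def lac_operon_state(lactose_present, glucose_present):
    """Simplified on/off model of lac operon control (negative + positive control)."""
    # Negative control: without lactose (allolactose) the active repressor sits on the operator.
    repressor_on_operator = not lactose_present
    # Positive control: CAP activates only when glucose is absent.
    cap_activating = not glucose_present
    expressed = (not repressor_on_operator) and cap_activating
    return {
        "repressor_on_operator": repressor_on_operator,
        "cap_activating": cap_activating,
        "expressed": expressed,
    }

# Print the full truth table for the four lactose/glucose combinations.
for lactose in (False, True):
    for glucose in (False, True):
        state = lac_operon_state(lactose, glucose)
        print(f"lactose={lactose}, glucose={glucose} -> expressed={state['expressed']}")
```

Only the lactose-present, glucose-absent row comes out as expressed, matching the "lactose present AND glucose absent" rule.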
Allolactose binds to the lac repressor, causing it to undergo a conformational change and release from the operator region, thus allowing transcription of the lac operon genes. Allolactose is present when lactose is available in the cell. g. Know what happens to the repressor protein when allolactose binds Binding of allolactose (lactose isomer) to the repressor causes conformational change (allosteric change) and the repressor is no longer able to bind to the operator Allosteric change i. Know what an allosteric change is Allosteric change refers to a change in the shape or conformation of a protein that occurs when it binds to a specific molecule (often called an allosteric effector) at a site other than its active site. This change can alter the protein's activity, either activating or inhibiting its function. Allosteric regulation is common in enzymes and regulatory proteins, allowing for more precise control of cellular processes. ii. Know what IPTG is Synthetic inducer of lac operon, molecular mimic of allolactose a synthetic molecule that mimics lactose and is used to induce the lac operon in bacterial cells. It binds to the lac repressor, causing it to release from the operator region, thereby allowing transcription of the genes in the operon. h. Know how allolactose binding to the repressor protein affects expression of the lac operon structural genes When allolactose binds to the lac repressor protein, it causes a conformational change that prevents the repressor from binding to the operator region of the lac operon. This allows RNA polymerase to bind to the promoter and transcribe the operon’s structural genes (lacZ, lacY, and lacA), enabling the cell to metabolize lactose. vi. 
Know what the CAP protein is
Catabolite Activator Protein (CAP): interacts with the αCTD and recruits RNAP to the promoter
- Stabilizes the binding of polymerase to the promoter
- CAP requires cAMP to be able to bind to the CAP site; once bound, it interacts with the α subunit of RNAP and increases the affinity of RNA polymerase for the promoter
- cAMP is only present in the cell when glucose levels are low
- cAMP allosterically activates CAP protein to bind
- RNA polymerase binds the lac promoter poorly in the absence of CAP protein: the -10 and -35 regions of the lac promoter are not optimal and the promoter lacks an UP element
1. Know if it is positive or negative control of the lac operon
Positive control with CAP protein
2. Know where it binds and under what conditions it can bind
Binds to the CAP site only when activated by cAMP
3. Know the role of cAMP in CAP activation
Requi