Lecture 5: Sequencing Technology
University of Massachusetts Amherst
Summary
This lecture provides an overview of sequencing technology, covering Sanger sequencing and next-generation sequencing (NGS). It describes the objectives, methods, and applications of each approach, and walks through the steps of the NGS process.
Objectives:
Distinguish Sanger sequencing from deep sequencing, or next-generation sequencing (NGS)
The advent of PCR has made sequencing much easier
Anything from short fragments to whole genomes can be sequenced
Sequencing gives clues about polymorphisms, mutations, and evolution

Sequencing methods commonly involve the following:
Enzymatic addition of new nucleotides to a strand complementary to the template (the sequence of interest)
Use of DNA polymerase to add the new nucleotides
Identification of the added bases

Sanger Sequencing
In addition to the dNTPs in the reaction mix, ddNTPs (dideoxynucleotides) are used
Each ddNTP is labeled with a fluorescent dye
  o A ddNTP lacks the 3' hydroxyl group, so no further nucleotides can be added and the chain stops
  o This creates "short" DNA fragments of all possible lengths, each stopping at a given nucleotide
  o Only one ddNTP is labeled per reaction
    ▪ All bands/peaks identified in that reaction correspond to fragments that stop at the given ddNTP
    ▪ The bands are separated by size and the sequence is simply "read", starting from the bottom (shortest fragment = basically the size of the primer)

Causes of Bad Sequencing Reads
Not enough template
Bad primer
DNA is not pure

Next Generation Sequencing
Illumina, 454, and nanopore technologies
Used for whole genome sequencing, de novo sequencing, and RNA sequencing
A set of advanced sequencing technologies that allow for the rapid sequencing of large amounts of DNA and RNA. Unlike traditional Sanger sequencing, which sequences one fragment at a time, NGS can generate millions of sequences in parallel, making it a high-throughput and cost-effective approach to genomics.

Multi-Step Process of Next Generation Sequencing

1. DNA Fragmentation
  o Description: The genomic DNA is broken down into smaller, manageable pieces, usually ranging from 200 to 600 base pairs.
  o Methods: Fragmentation can be accomplished by mechanical shearing (sonication or nebulization) or by enzymatic digestion using restriction enzymes.
  o Purpose: Allows for easier handling of the DNA and ensures that the resulting fragments can be amplified and analyzed effectively.

2. Phosphorylation of the DNA Fragments
  o Description: The ends of the fragmented DNA are phosphorylated by adding phosphate groups using a kinase enzyme (such as T4 polynucleotide kinase).
  o Purpose: Phosphorylation is crucial for the subsequent ligation of adapters, as it provides the 5' phosphate group required for the formation of phosphodiester bonds during ligation.

3. Ligation of Adapters
  o Description: Short, double-stranded DNA sequences known as adapters are ligated to both ends of the phosphorylated DNA fragments. Each adapter is designed to be complementary to the PCR primers used in the amplification step.
  o Purpose: Adapters serve multiple purposes:
    ▪ They allow the DNA fragments to bind to the sequencing platform.
    ▪ They contain sequences necessary for amplification during PCR.
    ▪ They can carry unique indices (barcodes) that enable multiplexing, allowing multiple samples to be sequenced in the same run.

4. Size Selection
  o Description: This step isolates DNA fragments within a specific size range. Size selection can be achieved using various methods, including gel electrophoresis or magnetic beads.
  o Purpose: Ensuring that only fragments of the desired size are used improves the quality and accuracy of the sequencing data. It helps to avoid the bias introduced by very small or very large fragments, which could affect amplification and sequencing efficiency.

5. Fragment Capture on Oligo-Coated Slides
  o Description: The prepared DNA library, which now contains the adapter-ligated fragments, is applied to a slide coated with oligonucleotides complementary to the adapter sequences.
  o Purpose: This step allows the fragments to adhere to the surface of the slide, so that each fragment can be amplified in situ. It ensures that the sequencing reactions occur efficiently on a solid support, allowing for high-throughput sequencing.

6. DNA Polymerase Synthesizes the Complementary Strand (Bridge PCR)
  o Description: The attached DNA fragments undergo a process known as Bridge PCR, during which DNA polymerase synthesizes the complementary strand of the immobilized DNA fragments.
  o Mechanism:
    ▪ The bound DNA is denatured, creating single-stranded templates.
    ▪ The free adapter end of each single-stranded template bends over and anneals to a complementary oligonucleotide on the slide, which acts as the primer. This forms the "bridge".
    ▪ DNA polymerase extends the primer, synthesizing a new strand and creating double-stranded DNA.
    ▪ Because the primer is anchored to the slide, the newly synthesized strand also remains attached to the surface, and the cycle can repeat.
  o Purpose: This amplification step increases the quantity of each individual fragment, resulting in clusters of identical DNA molecules (each cluster originating from a single fragment). These clusters ensure that the signal during sequencing is strong enough to be detected.

7. Addition of Sequencing Primers and Obtaining Reads
  o Description: After many cycles of amplification through Bridge PCR, sequencing primers that anneal to the adapter sequences are introduced to the clusters of DNA.
  o Sequencing Process:
    ▪ Each cluster is sequenced by adding a mixture of labeled nucleotides (A, T, C, and G), where each nucleotide carries a distinct fluorescent dye.
    ▪ As each nucleotide is incorporated into the growing complementary strand, a fluorescent signal corresponding to the added nucleotide is emitted.
    ▪ A camera detects these signals, and the sequence of nucleotides is recorded based on the fluorescence emitted at each cycle.
  o Purpose: This step reads the DNA sequence one base at a time, producing high-resolution sequence data for the fragment. Adding one fluorescent nucleotide per cycle makes it possible to monitor the sequencing process in real time and to determine the DNA sequence accurately.

Genome Sequence Assembly
Genome sequence assembly is the process of reconstructing the original genome from short DNA sequences (reads) obtained through sequencing technologies. The assembly process varies significantly depending on whether a reference genome is available.

1. Aligning Reads to a Reference Genome
Easy Process: When a known reference genome is available, aligning sequencing reads to this reference is straightforward.
Method: Bioinformatics tools (e.g., Bowtie, BWA) map the reads directly to the reference sequence, allowing for the identification of variants, structural changes, or regions of interest.
Advantages: This method benefits from the pre-existing genomic information, making the assembly process faster and more accurate.

2. De Novo Assembly Without a Reference Genome
Contig Formation: If no reference genome is available, reads must be assembled by identifying and matching overlapping sequences (ends) to create longer contiguous sequences called contigs.
Process: Software tools (e.g., Velvet, SPAdes) analyze the overlaps between reads and stitch them together, forming contigs that represent segments of the genome.
Challenges with Repeats: The presence of repeat sequences complicates this process.
Repeats cause ambiguous overlaps because reads from different copies of a repeat look identical, so the assembler cannot tell which copy a read came from. This makes it difficult to reconstruct the genome accurately.

Adaptation

1. Single Reads vs. Paired Ends
Single Reads: DNA is sequenced from one end of the fragment, resulting in a single sequence read for each DNA fragment.
Paired-End Reads: DNA fragments are sequenced from both ends. This results in two reads that are a known distance apart, providing additional context about the sequence and its location in the genome.
Advantages of Paired Ends: Paired-end sequencing improves the assembly process by offering better alignment information and resolving ambiguities, especially in repetitive regions.

2. Depth of Coverage
Definition: Depth, also known as coverage, refers to how many times each nucleotide of the genome is represented across the sequencing reads.
Importance: Higher depth increases confidence in the accuracy of the assembled sequences. It helps to identify true variants and ensures that low-quality regions are correctly represented in the assembly.
Implication: Each nucleotide appears multiple times in the reads, providing redundancy that helps to filter out sequencing errors and improve the reliability of the assembled genome.

Examples of Sequencing Experiments and Their Uses

1. Whole Genome Sequencing (WGS)
Description: WGS involves sequencing the entire genomic DNA of an organism.
Uses:
  ▪ Genomic Research: Identifying genetic variants, including SNPs, insertions, and deletions, across an entire genome.
  ▪ Evolutionary Studies: Comparing genomes of different species to study evolutionary relationships.
  ▪ Personalized Medicine: Understanding genetic predispositions to diseases, guiding personalized treatment plans.

2. Exome Sequencing
Description: This targets the exons, or coding regions, of genes, which represent about 1-2% of the genome but contain the majority of known disease-related variants.
Uses:
  ▪ Disease Diagnosis: Identifying mutations associated with genetic disorders, especially in cases where other tests have failed.
  ▪ Cancer Genomics: Discovering mutations in cancer-related genes for targeted therapies.

3. RNA Sequencing (RNA-Seq)
Description: RNA-Seq is used to analyze the transcriptome by sequencing the cDNA generated from RNA.
Uses:
  ▪ Gene Expression Analysis: Measuring the expression levels of genes across different conditions or time points.
  ▪ Alternative Splicing: Identifying different isoforms of genes resulting from alternative splicing events.
  ▪ Non-coding RNA Detection: Studying the role of non-coding RNAs in gene regulation.

4. Targeted Sequencing
Description: This focuses on specific regions of the genome, such as selected genes or panels of genes, using capture techniques.
Uses:
  ▪ Cancer Genotyping: Assessing mutations in cancer-related genes to inform treatment decisions.
  ▪ Hereditary Disease Studies: Analyzing known disease-associated genes in patients to diagnose genetic disorders.

5. Metagenomic Sequencing
Description: This involves sequencing the collective genomes of microbial communities directly from environmental samples.
Uses:
  ▪ Microbial Diversity Studies: Investigating the diversity and function of microbial populations in different environments (e.g., gut microbiome, soil microbiome).
  ▪ Pathogen Discovery: Identifying pathogens in clinical samples without prior culturing.

6. ChIP-Seq (Chromatin Immunoprecipitation Sequencing)
Description: ChIP-Seq is used to analyze protein-DNA interactions by sequencing DNA that is bound by specific proteins (e.g., transcription factors).
Uses:
  ▪ Gene Regulation Studies: Identifying binding sites of transcription factors and understanding regulatory networks.
  ▪ Epigenomic Profiling: Mapping histone modifications to study chromatin states and their influence on gene expression.

7. Bisulfite Sequencing
Description: This technique determines the methylation status of DNA by treating it with sodium bisulfite, which converts unmethylated cytosines to uracil while leaving methylated cytosines unchanged.
Uses:
  ▪ Methylation Analysis: Studying DNA methylation patterns associated with gene regulation and epigenetic changes in development and disease.

8. Single-Cell RNA Sequencing (scRNA-Seq)
Description: This allows for the sequencing of RNA from individual cells, providing insights into cellular heterogeneity.
Uses:
  ▪ Cellular Diversity: Understanding the differences in gene expression between individual cells within a population.
  ▪ Developmental Biology: Tracking cell lineage and differentiation processes during development.

Advantages and Disadvantages of Next Generation Sequencing (NGS) and Sanger Sequencing

Next Generation Sequencing (NGS)

Advantages:
1. High Throughput:
  o Can generate millions of sequences simultaneously, allowing for the analysis of entire genomes or transcriptomes in a single run.
2. Cost-Effective:
  o The cost per base is significantly lower than Sanger sequencing, making large-scale studies economically feasible.
3. Speed:
  o Rapid data generation, with results available in days or weeks, compared to the longer timeframes Sanger sequencing needs for the same amount of sequence.
4. Versatility:
  o Applicable to a wide range of applications, including whole genome sequencing, RNA sequencing, targeted sequencing, and metagenomics.
5. Comprehensive:
  o Provides insights into not just specific genes but entire genomic regions, offering a more holistic view of genetic variation.

Disadvantages:
1. Data Complexity:
  o Generates large volumes of data, requiring significant computational resources and expertise for analysis and interpretation.
2. Shorter Reads:
  o Many NGS platforms produce shorter reads than Sanger sequencing, which can complicate assembly and alignment, especially in repetitive regions.
3. Errors and Bias:
  o Some platforms show higher error rates in certain sequence contexts, particularly homopolymers, necessitating careful validation of findings.
4. Library Preparation:
  o The need for extensive library preparation can introduce biases and may require specialized equipment and expertise.

Sanger Sequencing

Advantages:
1. Accuracy:
  o Generally provides high-quality, accurate sequences, making it a gold standard for validating variants identified by NGS.
2. Longer Reads:
  o Produces longer read lengths (up to 1,000 bases or more), facilitating the sequencing of complex regions and repetitive sequences.
3. Simplicity:
  o The workflow is straightforward, making it easier to perform and requiring less computational power for analysis compared to NGS.
4. Established Method:
  o A well-established technique with extensive history and application, widely accepted in clinical and research settings.

Disadvantages:
1. Low Throughput:
  o Sequences one fragment at a time, leading to lower overall output and making it impractical for large-scale projects.
2. Higher Cost per Base:
  o More expensive per base sequenced compared to NGS, limiting its use for large genomic studies.
3. Time-Consuming:
  o Longer turnaround times, often requiring weeks to complete sequencing for larger projects.
4. Limited Scope:
  o Primarily used for sequencing specific genes or smaller regions, not suitable for whole genome or transcriptome analysis.
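
Worked example: reading a Sanger sequence
The Sanger read-out described in these notes (separate the terminated fragments by size, then read from the shortest upward) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the template is written 3' to 5' so that the new strand is its base-by-base complement, every position after the primer yields exactly one terminated fragment, and the function names are hypothetical rather than taken from any library.

```python
# Illustrative sketch only; names and the toy template are assumptions.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def terminated_fragments(template_3to5, primer_len):
    """Return (length, terminal_base) for every chain-terminated fragment.

    The template is given 3'->5', so the synthesized strand (5'->3') is the
    base-by-base complement. A fragment of a given length ends in the ddNTP
    that was incorporated at that position.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3to5)
    return [
        (length, new_strand[length - 1])
        for length in range(primer_len + 1, len(new_strand) + 1)
    ]


def read_sequence(fragments):
    """Sort fragments by size (shortest first) and read their terminal bases."""
    return "".join(base for _, base in sorted(fragments))


if __name__ == "__main__":
    frags = terminated_fragments("TACGGCTA", primer_len=3)
    # Fragment order does not matter: separation by size restores it.
    print(read_sequence(reversed(frags)))
```

The sort step plays the role of electrophoresis: the order in which fragments were produced is irrelevant, because the sequence is recovered purely from fragment length and terminal base.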
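
Worked example: overlap-based contig formation
The de novo assembly idea in these notes (match overlapping read ends and stitch them into contigs) can be illustrated with a toy greedy merger. This is a minimal sketch, not how Velvet or SPAdes work internally; it assumes error-free reads from a single strand, and the function names and the `min_len` parameter are hypothetical.

```python
# Toy greedy overlap assembler; an illustration, not a production algorithm.

def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [
            c for k, c in enumerate(contigs) if k not in (best_i, best_j)
        ] + [merged]


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

With repeats, several read pairs share equally good overlaps and a greedy choice like this one can join the wrong ends, which is the ambiguity the notes describe and one reason paired-end information helps.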
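
Worked example: depth of coverage
The depth-of-coverage definition in these notes can be made concrete with a short calculation. Average depth is commonly estimated as (number of reads x read length) / genome size, and per-base depth is obtained by counting how many aligned reads cover each position. The sketch below assumes reads are already aligned and described by a 0-based start position and a length; the function names and the example numbers are hypothetical.

```python
# Minimal sketch of coverage calculations; inputs are made-up examples.

def mean_depth(num_reads, read_length, genome_size):
    """Average depth = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def per_base_depth(genome_size, alignments):
    """Count how many reads cover each position of the reference.

    `alignments` is a list of (start, length) pairs with 0-based starts.
    Reads running past the end of the reference are clipped.
    """
    depth = [0] * genome_size
    for start, length in alignments:
        for pos in range(start, min(start + length, genome_size)):
            depth[pos] += 1
    return depth


if __name__ == "__main__":
    # 1,000,000 reads of 150 bases over a 5,000,000-base genome
    print(mean_depth(1_000_000, 150, 5_000_000))
    print(per_base_depth(10, [(0, 4), (2, 4), (8, 5)]))
```

Positions with a depth of 0 in the per-base output are gaps that no read supports, while positions covered several times are where the redundancy described in the notes helps filter out sequencing errors.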