Questions and Answers
Which of the following best describes Sanger sequencing?
Sanger sequencing allows for the simultaneous sequencing of multiple DNA fragments.
False
What is the primary role of ddNTPs in Sanger sequencing?
To terminate the DNA strand extension.
In Sanger sequencing, the DNA polymerization uses dATP, dCTP, dTTP, and dGTP, while the ______ are used to terminate the extension.
Match the following terms with their correct descriptions:
Which of the following technologies is NOT a main player in next-gen sequencing?
Next-generation sequencing is slower and more expensive than Sanger sequencing.
What are the two main types of next-gen sequencing?
Illumina sequencing operates by __________ to determine nucleotide incorporation.
What primary advantage do short read technologies provide?
Match the type of next-gen sequencing with its primary usage:
Which next-gen sequencing technology has the largest market share?
What is the first step in the Hi-C workflow?
Topologically Associating Domains (TADs) are regions of the genome that interact with distant regions.
What technique is used to identify inter-fragment pairings in the Hi-C method?
Hi-C provides insights into how the _____ structure of the genome impacts gene expression.
Match the following applications of Hi-C with their descriptions:
Which step in the Hi-C workflow involves filling in the fragmented DNA with nucleotides?
Hi-C data can provide insights into epigenetic regulation.
What is the purpose of the purification step in the Hi-C workflow?
In the Hi-C workflow, after ligation, the _____ are reversed to analyze the interactions.
What is the overall goal of Illumina Complete Long Reads library preparation?
Illumina Complete Long Reads technology was released recently and is expected to have widespread acceptance immediately.
What process does Illumina use to insert primer binding sites into the genome?
Illumina Complete Long Reads produce pseudo-long reads from overlapping ________.
Match the following components of Illumina Complete Long Reads with their purposes:
Which statement is true regarding the criticism of Illumina Complete Long Reads?
PCR amplification is performed on both landmarked sequences and unmodified genomic DNA samples.
What is the size range of the DNA template fragments targeted by Illumina Complete Long Reads?
The goal of using landmarks in Illumina Complete Long Reads is to ________ short reads from the same fragment.
Study Notes
Bioinformatics Lecture 2 - DNA Sequencing Technologies and Applications
- The lecture covers DNA sequencing technologies and their applications. It progresses from DNA sequencing to genome annotation and expression analysis, and then to marker-trait associations and population analysis.
- A workflow diagram illustrates the sequential steps: sequencing quality control, assembly, DNA mapping, genome annotation, expression analysis, genotyping, and polymorphism discovery.
- Learning outcomes cover four areas:
  - the history of current sequencing technology;
  - its strengths and weaknesses;
  - innovations in sequencing library preparation;
  - how technology choices influence genome sequencing projects.
Sanger Sequencing
- Sanger sequencing is also known as dideoxy sequencing or chain termination sequencing.
- DNA extension occurs through polymerization using dNTPs (dATP, dCTP, dTTP, dGTP) with an existing DNA strand as a template.
- Four separate sequencing reactions are set up, one for each nucleotide. Each contains the template, a radiolabelled primer, DNA polymerase, all four dNTPs, and a small amount of one dideoxy-NTP (ddNTP).
- ddNTPs lack the 3'-hydroxyl group needed to attach the next nucleotide. Each ddNTP that is randomly incorporated into the new strand therefore terminates extension, producing radioactive fragments of varying lengths.
- The fragments are separated by size on a polyacrylamide gel, and the sequence is read from an autoradiogram.
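The chain-termination logic above can be sketched as a toy Python model. It assumes error-free synthesis and that every possible termination product is present in its ddNTP reaction. The function names are hypothetical. For simplicity, the template is written 3'→5', the direction in which the polymerase reads it.

```python
# Toy model of Sanger chain-termination sequencing (assumes error-free synthesis).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def synthesize(template: str) -> str:
    """Return the full-length new strand (template given 3'->5', product 5'->3')."""
    return "".join(COMPLEMENT[base] for base in template)


def termination_products(template: str) -> dict[str, list[int]]:
    """For each ddNTP reaction, list the lengths of the terminated fragments.

    A fragment of length n exists in the reaction whose ddNTP matches base n
    of the new strand, because that ddNTP can be incorporated there.
    """
    new_strand = synthesize(template)
    products: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        products[base].append(position)
    return products


def read_gel(products: dict[str, list[int]]) -> str:
    """Read the sequence by ordering all fragments from shortest to longest,
    as on a gel where the shortest fragments migrate furthest."""
    bands = [(length, base) for base, lengths in products.items() for length in lengths]
    return "".join(base for _, base in sorted(bands))


if __name__ == "__main__":
    lanes = termination_products("TACGGATC")
    print(lanes)            # fragment lengths per ddNTP lane
    print(read_gel(lanes))  # ATGCCTAG
```

Sorting every fragment by length mimics reading the bands up the gel, one base per band.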
Fluorescent Dye-Termination
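- This method replaces radioactive labels with fluorescent dyes attached to the ddNTPs.
- Each of the four ddNTPs carries a dye of a different color.
- Because the terminating base can be identified by its color, all four sequencing reactions can be combined and run in a single lane.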
Capillary Electrophoresis
- Capillary electrophoresis is used to resolve DNA fragments during Sanger sequencing and is a critical part of modern sequencing procedures.
Sanger Sequencing (continued)
- Sanger sequencing remains the gold standard against which other sequencing technologies are measured.
- Read lengths today are typically 800-1000 base pairs.
- Still commonly used for sequencing small regions or DNA fragments.
- Important for its high fidelity and established error models.
Quality Scores
- A quality score measures the confidence in a base-call.
- The scale originated with the Phred base-calling software for dye-terminator fluorescent sequencing.
- The formula is q = −10 log10(p), where p is the probability that the base call is incorrect.
- A score of q = 20 therefore corresponds to p = 0.01, i.e., a 99% probability that the base is called correctly.
- Quality scores are used to trim low-quality ends from sequence reads and to generate consensus sequences from aligned reads.
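The Phred formula and its inverse are short enough to write out directly. The sketch below also includes a hypothetical end-trimming rule: it removes bases from the 3' end of a read until a base at or above a fixed quality threshold is reached. It assumes the quality scores have already been decoded to integers. Real trimming tools generally use more elaborate rules, such as sliding windows.

```python
import math


def phred_from_error(p_error: float) -> float:
    """Phred quality q = -10 * log10(p), where p is the probability the call is wrong."""
    return -10 * math.log10(p_error)


def error_from_phred(q: float) -> float:
    """Invert the Phred formula: p = 10 ** (-q / 10)."""
    return 10 ** (-q / 10)


def trim_low_quality_end(seq: str, quals: list[int], min_q: int = 20) -> str:
    """Hypothetical rule: drop bases from the 3' end while their quality is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end]


if __name__ == "__main__":
    print(phred_from_error(0.01))  # q = 20 means a 1% chance of an incorrect call
    print(error_from_phred(30))    # about 0.001
    print(trim_low_quality_end("ACGTACGT", [35, 35, 30, 28, 22, 15, 10, 8]))  # ACGTA
```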
Next-gen Sequencing Technology
- Next-generation sequencing (NGS) platforms sequence DNA in a highly parallel manner.
- In contrast to Sanger sequencing, which yields one sequence per reaction, NGS produces billions of sequences in a single run.
- Faster and cheaper per base than Sanger sequencing.
- Removes the need to clone DNA in bacterial colonies.
- Library construction is faster than for Sanger sequencing.
- Four main platforms are available (Illumina, MGI, Oxford Nanopore, and Pacific Biosciences), each with its own approach.
Next-gen Sequencing Technology (continued)
- NGS can be divided into short-read and long-read categories.
- Short-read sequencing produces far more reads than long-read sequencing, but each read is shorter.
- Short-read technologies produce a lot of data per dollar spent and are excellent for SNP detection.
- Short reads are primarily used for re-sequencing, genotyping, and gene expression analysis.
- Long-read technologies are better for identifying structural variation in individuals.
Illumina Sequencing
- Illumina is the dominant sequencing platform (over 80% market share).
- Uses a sequencing-by-synthesis approach: nucleotides are added one at a time to a new strand copied from a single-stranded template.
- Each incorporated nucleotide is detected by its fluorescent label, identified by its emission wavelength.
- Polymerization is paused after each nucleotide (reversible terminator chemistry), so the fluorescence in each cycle represents a single base.
- Applications include whole-genome re-sequencing, small RNA identification, RNA-seq, and digital gene expression, among several others.
- Limitations include short read length, error rates that increase toward the end of the read, and a non-random error model.
- Library preparation involves numerous steps, from end repair through adapter ligation to denaturation. Bridge amplification on the flow cell, a key part of the process, then produces clonal clusters for sequencing.
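The cycle-by-cycle logic of sequencing-by-synthesis can be illustrated with a toy model of a single cluster. The dye-to-base assignment below is hypothetical; real instruments use their own dye and channel schemes. The model also ignores phasing and other error sources.

```python
# Toy model of sequencing-by-synthesis for one cluster: each cycle incorporates
# exactly one labelled, reversibly terminated nucleotide, which is then imaged.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Hypothetical dye assignment used only for illustration.
DYE = {"A": "red", "C": "blue", "G": "yellow", "T": "green"}
DYE_TO_BASE = {colour: base for base, colour in DYE.items()}


def run_cycles(template: str, n_cycles: int) -> list[str]:
    """Return the colour observed for the cluster in each imaging cycle."""
    signals = []
    for cycle in range(min(n_cycles, len(template))):
        incorporated = COMPLEMENT[template[cycle]]  # the terminator stops extension after one base
        signals.append(DYE[incorporated])           # image the cluster
        # Before the next cycle the dye and terminator are removed (not modelled here).
    return signals


def base_call(signals: list[str]) -> str:
    """Translate the colour seen in each cycle back into a base."""
    return "".join(DYE_TO_BASE[colour] for colour in signals)


if __name__ == "__main__":
    colours = run_cycles("TACG", 4)
    print(colours)             # ['red', 'green', 'yellow', 'blue']
    print(base_call(colours))  # ATGC
```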
10X Genomics
- A library construction technique for Illumina sequencing
- Uses barcodes to assign sequencing reads to individual starting template molecules
- Used for single-cell RNA sequencing and ATAC-Seq.
- No longer offered for whole-genome sequencing due to patent disputes.
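In barcode-based approaches like this one, reads that share a barcode are assumed to come from the same starting molecule or partition (or, in single-cell work, the same cell). The grouping step can be sketched as follows. The sketch assumes each read is a hypothetical `(barcode, sequence)` pair whose barcode has already been extracted and error-corrected. It does not reproduce 10X Genomics' own software.

```python
from collections import defaultdict


def group_by_barcode(reads: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group read sequences by barcode.

    Each read is assumed to be a (barcode, sequence) tuple with the barcode
    already extracted and corrected; reads sharing a barcode are treated as
    coming from the same starting molecule (or cell).
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for barcode, sequence in reads:
        groups[barcode].append(sequence)
    return dict(groups)


if __name__ == "__main__":
    reads = [("AAAC", "ACGT"), ("GGTT", "TTTT"), ("AAAC", "CCGG")]
    print(group_by_barcode(reads))  # {'AAAC': ['ACGT', 'CCGG'], 'GGTT': ['TTTT']}
```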
Hi-C
- Hi-C is an extension of the Chromosome Conformation Capture (3C) method to capture the three-dimensional organization of genomes.
- Purpose: Understand how chromosomes fold and interact within the nucleus, identify topologically associating domains (TADs), etc.
- Key reference: Lieberman-Aiden et al., Science, 2009.
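- Key steps in the Hi-C workflow are:
  - cross-linking;
  - restriction digestion;
  - filling in the fragment ends with (biotin-labelled) nucleotides;
  - proximity ligation;
  - reversal of the cross-links;
  - purification;
  - paired-end sequencing, followed by data analysis.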
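A central product of Hi-C data analysis is a contact matrix, which counts how often pairs of genomic bins were ligated together. The sketch below builds one for a single chromosome. It assumes the read pairs have already been mapped and filtered, with each pair reduced to two base-pair positions. The function name and inputs are hypothetical, and real pipelines also normalize the matrix.

```python
def contact_matrix(pairs: list[tuple[int, int]], chrom_length: int, bin_size: int) -> list[list[int]]:
    """Count Hi-C read pairs per pair of fixed-size bins on one chromosome.

    `pairs` holds (position_1, position_2) in base pairs for each mapped read pair.
    The matrix is symmetric because a contact between bins i and j is the same
    as a contact between bins j and i.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # round up to cover the chromosome end
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1
    return matrix


if __name__ == "__main__":
    pairs = [(100, 2500), (150, 300), (2100, 2900)]
    for row in contact_matrix(pairs, chrom_length=3000, bin_size=1000):
        print(row)
```

Contacts concentrated in blocks along the diagonal of such a matrix are what TAD-calling methods look for.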
Hi-C Applications
- Hi-C data is helpful in scaffolding contigs during de novo genome assembly.
- Used to identify topologically associating domains (TADs).
- Important for understanding enhancer-promoter interactions, critical in gene regulation.
- Applicable to cancer genomics and epigenetics, helping to explain chromosomal abnormalities and how the 3D structure of the genome affects gene expression.
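Scaffolding with Hi-C relies on the observation that contigs near each other on a chromosome share more Hi-C read pairs than distant ones. The following greedy chaining is a simplified, hypothetical illustration of that idea, not the method of any particular scaffolding tool. It accepts the strongest contig-to-contig links first, allowing each contig at most two neighbours and forbidding cycles. It ignores contig orientation, link normalization, and distance decay, which real scaffolders handle.

```python
def greedy_scaffold(contigs: list[str], links: dict[frozenset, int]) -> list[list[str]]:
    """Chain contigs into unoriented scaffolds using Hi-C link counts.

    `links` maps frozenset({a, b}) to the number of Hi-C read pairs joining
    contigs a and b (self-links are assumed absent). Strongest links are
    accepted first, provided each contig keeps at most two neighbours and no
    cycle is formed.
    """
    parent = {c: c for c in contigs}  # union-find forest used for cycle detection

    def find(c: str) -> str:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    neighbours: dict[str, list[str]] = {c: [] for c in contigs}
    ordered_links = sorted(links.items(), key=lambda item: (-item[1], sorted(item[0])))
    for pair, _count in ordered_links:
        a, b = sorted(pair)
        if len(neighbours[a]) < 2 and len(neighbours[b]) < 2 and find(a) != find(b):
            parent[find(a)] = find(b)
            neighbours[a].append(b)
            neighbours[b].append(a)

    # Every connected component is now a simple path; walk each one from an end.
    scaffolds, seen = [], set()
    for start in contigs:
        if start in seen or len(neighbours[start]) == 2:
            continue
        chain, previous, current = [], None, start
        while current is not None:
            chain.append(current)
            seen.add(current)
            onward = [n for n in neighbours[current] if n != previous]
            previous, current = current, (onward[0] if onward else None)
        scaffolds.append(chain)
    return scaffolds


if __name__ == "__main__":
    links = {
        frozenset({"c1", "c2"}): 50,
        frozenset({"c2", "c3"}): 40,
        frozenset({"c1", "c3"}): 30,  # rejected: would close a cycle
        frozenset({"c3", "c4"}): 5,
    }
    print(greedy_scaffold(["c1", "c2", "c3", "c4"], links))  # [['c1', 'c2', 'c3', 'c4']]
```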
Illumina Complete Long Reads
- Pseudo-long reads are constructed from overlapping short reads.
- Released in 2023; the goal is to identify short reads that come from the same template fragment.
- The result is long sequences that retain highly accurate base calls.
- It has been criticized for its read length and accuracy compared with other long-read sequencing technologies.
- Uses a new tagmentation approach, in which transposable elements randomly insert primer binding sites across the genome.
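Building a pseudo-long read means joining short reads from the same fragment by their overlaps. Illumina's own method uses the landmarks and software not described in these notes. The sketch below swaps in a generic greedy overlap merge to illustrate the idea. It assumes the reads have already been grouped by fragment, are error-free, come from the same strand, and are not contained inside one another. The function names and the `min_len` threshold are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`,
    or 0 if that overlap is shorter than `min_len` (min_len must be >= 1)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_merge(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the longest overlap.

    Returns one sequence if all reads chain together, otherwise the
    remaining unmerged pieces.
    """
    reads = list(reads)
    while len(reads) > 1:
        best = (0, -1, -1)  # (overlap length, index of left read, index of right read)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            break  # no overlaps left; keep the pieces separate
        merged = reads[i] + reads[j][olen:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


if __name__ == "__main__":
    print(greedy_merge(["ACGTTGC", "TTGCATG", "CATGCA"]))  # ['ACGTTGCATGCA']
```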
MGI DNA Nanoball Sequencing
- A newer platform that uses rolling circle amplification of circular templates to generate long single-stranded DNA molecules containing tandem repeats of the template.
- These single molecules (DNA nanoballs) are loaded onto flow cells through interactions between the DNA phosphate backbone and the flow cell surface.
- Uses combinatorial probe anchor synthesis (cPAS) for base calling.
- High throughput, generating a large volume of sequence per run.
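Rolling circle amplification produces tandem repeats because the polymerase keeps travelling around the same circular template. Walking the circle with a modulo index captures this. The sketch is a simplified model: it ignores strand polarity details and polymerase errors, and the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def rolling_circle_amplify(circle: str, product_length: int) -> str:
    """Copy a circular template repeatedly until the product reaches product_length.

    Position i of the product is the complement of circle[i % len(circle)], so
    the output is a run of tandem copies of the template's complement.
    """
    return "".join(COMPLEMENT[circle[i % len(circle)]] for i in range(product_length))


if __name__ == "__main__":
    print(rolling_circle_amplify("ACGT", 10))  # TGCATGCATG: repeats of TGCA
```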
Illumina vs. MGI
- Comparative analysis shows both Illumina and MGI technologies performing similarly; either is well-suited for whole-genome variant analysis.
PacBio Sequencing
- PacBio is another long-read technology that sequences single DNA molecules in real time, without amplification.
- Can generate long reads (up to 100 kb). HiFi reads achieve high accuracy by sequencing the same DNA molecule multiple times to build a consensus sequence.
- Suitable for genome assembly and variant detection.
PacBio - HiFi
- HiFi (high-fidelity) sequencing generates long reads (10-25 kb) with an accuracy of Q30 or higher, i.e., at least 99.9% of bases called correctly.
- Generates consensus sequence from repeated sequencing of the same DNA fragment.
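Why repeated passes raise accuracy can be shown with a per-position majority vote. This is a deliberate simplification: it assumes the passes are already aligned and equal in length, so only substitution errors are handled. PacBio's actual consensus calling must also handle insertions and deletions. The error-probability function further assumes that errors in different passes are independent, and the per-pass error rate used in the example is hypothetical.

```python
from collections import Counter
from math import comb


def majority_consensus(passes: list[str]) -> str:
    """Per-position majority vote across pre-aligned, equal-length passes."""
    if not passes or len({len(p) for p in passes}) != 1:
        raise ValueError("expected one or more pre-aligned passes of equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


def majority_error_probability(per_pass_error: float, n_passes: int) -> float:
    """Probability that more than half of n independent passes are wrong at a base.

    For odd n_passes this is an upper bound on the majority-vote error rate:
    the consensus can only be wrong if most passes are wrong.
    """
    return sum(
        comb(n_passes, k) * per_pass_error**k * (1 - per_pass_error) ** (n_passes - k)
        for k in range(n_passes // 2 + 1, n_passes + 1)
    )


if __name__ == "__main__":
    print(majority_consensus(["ACGTAC", "ACGTTC", "AGGTAC"]))  # ACGTAC
    for n in (1, 3, 5, 9):
        print(n, majority_error_probability(0.1, n))  # falls quickly as n grows
```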
Nanopore Sequencing
- Oxford Nanopore Technologies (ONT)
- Sequences DNA and RNA in real time.
- Detects changes in electrical current as DNA molecules pass through a nanopore.
- Produces long reads of up to hundreds of kilobases.
- Devices range from portable (MinION) to high-throughput (PromethION).
ONT Sequencing (continued)
- Raw ONT data are "squiggles": series of current measurements that reflect how the current is disrupted as a molecule passes through the nanopore.
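A squiggle is roughly a stepwise trace: the current stays near one level while a given stretch of bases sits in the pore, then jumps. The sketch below splits a trace into such "events" using a fixed jump threshold, to illustrate the structure of the raw data. It is not how ONT basecalling works; current basecallers use neural networks on the raw signal. The signal values and threshold are hypothetical.

```python
def segment_squiggle(signal: list[float], jump: float) -> list[float]:
    """Split a current trace into events wherever consecutive samples differ
    by more than `jump`, and return the mean current of each event."""
    if not signal:
        return []
    events, current = [], [signal[0]]
    for prev, sample in zip(signal, signal[1:]):
        if abs(sample - prev) > jump:
            events.append(sum(current) / len(current))  # close the current event
            current = []
        current.append(sample)
    events.append(sum(current) / len(current))
    return events


if __name__ == "__main__":
    trace = [80, 81, 79, 95, 96, 94, 60, 61]  # hypothetical current values
    print(segment_squiggle(trace, jump=5))    # [80.0, 95.0, 60.5]
```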
ONT Applications
- Whole-genome sequencing via de novo assembly of high-quality ONT reads.
- Real-time detection of pathogens (e.g., Ebola virus, SARS-CoV-2)
- Metagenomics
- Structural variant detection.
- Epigenetics: detection of base modifications (e.g., methylation)
- Full-length RNA sequencing.
ONT Pros and Cons
- Advantages: portability, scalability, real-time data, production of ultra-long reads.
- Disadvantages: lower raw read accuracy, higher error rate in homopolymeric regions, higher cost per base compared to short-read sequencing.
Case Study - Cherry Tree Assembly
- Purpose: protect Intellectual Property (IP) for commercialized varieties with unambiguous gene assignment to cherry chromosomes.
- Key constraints: limited funds, a limited quantity of available DNA, and limited time.
- Task: weigh the assembly considerations and choose the best technologies for the job.
Assembly Considerations
- Goal is to assign chromosome locations to unique gene-based markers, requiring high-quality contig assembly for unique regions.
- No need for elaborate genome characterization around repetitive regions.
Technology Choices
- Contigs: high-quality contig sequence is needed, especially in non-repetitive regions. 10X Genomics is the best choice.
- Scaffolds: scaffold generation needs long initial scaffolds, using a combination of 10X Genomics, Hi-C, or PacBio. Nanopore or PacBio is likely too costly without a critical reason to use them.
- Pseudochromosomes: building them requires a genetic map, which could be made from existing genetic markers.
Cherry 10X Genomics Input
- Summary of the cherry 10X Genomics sequencing input and the quality of the raw data.
Kmer Frequency Analysis
- Provides estimates of genome size, unique (non-repetitive) content, and heterozygosity for the cherry sample from which the data were acquired.
- Results are visualized as a graph of k-mer frequencies.
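A simple version of the genome-size estimate divides the total number of k-mers by the depth of the main peak in the k-mer histogram. Low-depth k-mers are ignored because they mostly come from sequencing errors. The sketch below is that simplification only. It counts k-mers on one strand rather than canonically and does not model heterozygosity, which real k-mer profiling tools fit as a separate peak at half the main depth. The function names and the `min_depth` cutoff are hypothetical.

```python
from collections import Counter


def count_kmers(reads: list[str], k: int) -> Counter:
    """Count every k-mer in the reads (forward strand only, for simplicity)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_histogram(counts: Counter) -> Counter:
    """Map each k-mer depth to the number of distinct k-mers seen at that depth."""
    return Counter(counts.values())


def estimate_genome_size(histogram: dict[int, int], min_depth: int = 2) -> float:
    """Genome size ~ total k-mers / depth of the main peak, ignoring error k-mers."""
    kept = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    peak_depth = max(kept, key=kept.get)  # depth with the most distinct k-mers
    total_kmers = sum(depth * n for depth, n in kept.items())
    return total_kmers / peak_depth


if __name__ == "__main__":
    print(count_kmers(["ACGTAC"], 3))
    # Hypothetical histogram: an error peak at depth 1 and a main peak at depth 10.
    print(estimate_genome_size({1: 50, 10: 100, 20: 5}))  # 110.0
```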
10X Genomics Assembly Summary
- Summary metrics for the 10X Genomics sequencing of the cherry genome, including the number of reads, coverage, insert size, GC content, and marker density.
Hi-C Scaffolding 10X Assembly
- Hi-C scaffolding results for the 10X assembly, before and after Hi-C, including the number of contigs, N50, count ≥ N50, and total scaffold length.
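N50 is the length L such that sequences of length L or longer together contain at least half of the total assembly length. The sketch below computes it. It reads the "count ≥ N50" column literally, as the number of sequences at least N50 long. That interpretation is an assumption; some reports instead give L50, the number of sequences needed to reach half the total length.

```python
def n50(lengths: list[int]) -> int:
    """Length of the sequence at which the running total (longest first) reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be a non-empty list of positive integers")


def count_at_least_n50(lengths: list[int]) -> int:
    """Number of sequences whose length is at least N50 (literal reading of 'count >= N50')."""
    threshold = n50(lengths)
    return sum(1 for length in lengths if length >= threshold)


if __name__ == "__main__":
    scaffolds = [100, 80, 50, 30, 20, 10]  # hypothetical lengths; total = 290
    print(n50(scaffolds))                 # 80, since 100 + 80 = 180 >= 145
    print(count_at_least_n50(scaffolds))  # 2
```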
Evaluation of the Hi-C Scaffolds
- Plots and data for evaluating the Hi-C scaffolds by comparing physical positions on the scaffolds with marker positions on the genetic maps.
Hi-C Scaffolding Results (Summary)
- Overview of the Hi-C scaffolding results, comparing key metrics (e.g., number of contigs, N50) before and after Hi-C scaffolding.
Cherry Genetic Map
- Visualization, often as a graph, of the marker order and genetic distances along each chromosome.
Pseudomolecule Construction
- Details, usually in tabular form, of the pseudochromosome assembly. These include the linkage maps used and the lengths of the scaffolds assigned to each chromosome; the linkage maps may come from an existing genetic map.
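Pseudochromosome construction with a genetic map can be illustrated by ordering scaffolds on one linkage group by the genetic positions of their markers. Each scaffold is oriented by whether genetic position increases or decreases along it. The sketch below is a hypothetical, simplified procedure, not the one used in the cherry project. It uses the median marker position in cM for ordering. Scaffolds with a single marker cannot be oriented, so they default to "+".

```python
from statistics import median


def order_and_orient(scaffold_markers: dict[str, list[tuple[int, float]]]) -> list[tuple[str, str]]:
    """Order and orient scaffolds on one linkage group.

    `scaffold_markers` maps each scaffold to (physical_position_bp, genetic_position_cM)
    pairs for the markers it carries. Scaffolds are ordered by median cM; a scaffold
    is '-' if genetic position decreases along its physical coordinates, else '+'.
    """
    placed = []
    for scaffold, markers in scaffold_markers.items():
        markers = sorted(markers)  # sort markers by physical position
        cm_values = [cm for _, cm in markers]
        orientation = "-" if cm_values[-1] < cm_values[0] else "+"
        placed.append((median(cm_values), scaffold, orientation))
    placed.sort()
    return [(scaffold, orientation) for _, scaffold, orientation in placed]


if __name__ == "__main__":
    markers = {
        "s1": [(1000, 30.0), (50000, 35.5)],
        "s2": [(2000, 12.0), (90000, 4.0)],  # cM decreases along s2, so it is flipped
        "s3": [(500, 50.0)],
    }
    print(order_and_orient(markers))  # [('s2', '-'), ('s1', '+'), ('s3', '+')]
```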
Assembly Evaluation
- Summary tables and charts of the BUSCO (Benchmarking Universal Single-Copy Orthologs) evaluation of the completeness and quality of the genome assembly.
Choosing a Technology
- Factors for deciding which sequencing technology to use for a particular project: application, total cost, cost per sample, number of reads, read length, and read quality, as well as availability of the technology.
Description
Test your knowledge on Sanger sequencing and next-generation sequencing technologies. This quiz covers the fundamental principles, advantages, and key terms associated with these sequencing methods. Perfect for students studying molecular biology or genetics.