Genomic Sequencing Strategies and New Technologies
Summary
This document discusses various genomic sequencing strategies, from the foundational Sanger method to Next-Generation Sequencing (NGS) and beyond. It details the steps involved in shotgun sequencing and the different approaches to sequencing entire genomes, including hierarchical and whole-genome shotgun strategies. Different sequencing generations are also compared.
Full Transcript
GENOMIC SEQUENCING STRATEGIES AND NEW SEQUENCING TECHNOLOGIES

Sequencing Technologies
The two basic sequencing approaches, Maxam-Gilbert and Sanger, differ primarily in the way the nested DNA fragments are produced. Maxam-Gilbert sequencing (also called the chemical degradation method) uses chemicals to cleave DNA at specific bases, resulting in fragments of different lengths. A refinement of the Maxam-Gilbert method known as multiplex sequencing enables investigators to analyze about 40 clones on a single DNA sequencing gel. Sanger sequencing (also called the chain termination or dideoxy method) uses an enzymatic procedure to synthesize DNA chains of varying length in four different reactions, stopping DNA replication at positions occupied by one of the four bases, and then determining the resulting fragment lengths.

Sanger vs NGS (Sanger | NGS)
– Sample: clones, PCR products | DNA libraries
– Sample tracking: many samples in 96–384 well plates | few
– Preparation: few steps (including reaction clean-up) | many, mostly complex procedures
– Data collection: samples in plates of 96–384 | samples on slides, 1–16+
– Data: one read per sample | thousands to millions of reads per sample
– Output: one forward and one reverse read | millions of fragments run in parallel
– Common use: clinical, research | clinical, research

When to use Sanger vs NGS
Sanger:
– Sequencing a single gene
– Sequencing a small number of amplicon targets at the lowest cost
– Sequencing up to 96 samples at a time without barcoding
– Microbial identification
– Fragment analysis / high-throughput genotyping
– Microsatellite / STR analysis
– NGS confirmation
NGS:
– Interrogating >100 genes at a time
– Finding novel variants by expanding the number of target sequences in a single run
– Sequencing samples with a low amount of starting material
– Sequencing microbial genomes to subtype pathogens (critical for outbreak research)

So What's Wrong With It?
– The dideoxy method is good only for 500–750 bp per reaction
– Expensive
– Takes a while
– The human genome is about 3 billion bp

Sanger Throughput Limitations
– Must pick 1 colony for every 2 reactions
– Must do 1 DNA prep for every 2 reactions
– Must have 1 PCR tube for each reaction
– Must have 1 gel lane for each reaction
(Figure from The Economist)

Sequencing Technologies
Meeting the Human Genome Project sequencing goals by 2003 required continual improvements in sequencing speed, reliability, and cost. Previously, standard methods were based on separating DNA fragments by gel electrophoresis, which was extremely labor intensive and expensive. Total sequencing output in the community was about 200 Mb for 1998. In January 2003, the DOE Joint Genome Institute alone sequenced 1.5 billion bases in that month. Capillary-based sequencers use multiple tiny (capillary) tubes to run standard electrophoretic separations. These separations are much faster because the tubes dissipate heat well and allow the use of much higher electric fields, completing sequencing in shorter times.

SHOTGUN SEQUENCING
Shotgun sequencing was introduced by Sanger et al. in 1977 for sequencing genomes:
– Obtain random sequence reads from a genome
– Assemble them into contigs on the basis of sequence overlaps
This is straightforward for simple genomes (with no or few repeat sequences): reads containing overlapping sequence are merged. Shotgun sequencing is more challenging for complex (repeat-rich) genomes, for which there are two approaches.

Hierarchical shotgun approach
– Generate an overlapping set of intermediate-sized clones (e.g. bacterial artificial chromosomes with 200 kb inserts) and keep a map of them (mapping E. coli took 2 years)
– Subject each of these clones to shotgun sequencing, and use the map to reconstruct the whole sequence
– Used for S. cerevisiae (yeast), C. elegans (nematode), A. thaliana (mustard weed) and by the International Human Genome Sequencing Consortium (started in 1990, draft made available in 2000)

Whole-genome shotgun (WGS) approach
– Generate sequence reads directly from a whole-genome library
– Use computational techniques to reassemble them in one step
– Used for Drosophila melanogaster (fruit fly) and by Celera Genomics (formed 1998) for the human genome

Shotgun Genome Sequencing
[Figure: copies of a complete genome are broken into fragmented genome chunks. NOT REALLY DONE BY DUCK HUNTERS: fragmentation is by hydroshearing, sonication or enzymatic shearing.]

Assembly, aka "All the King's horses and all the King's men…"
Reads (17–18 bp each):
ATTGTTCCCACAGACCG
CGGCGAAGCATTGTTCC
ACCGTGTTTTCCGACCG
AGCTCGATGCCGGCGAAG
TTGTTCCCACAGACCGTG
TTTCCGACCGAAATGGC
ATGCCGGCGAAGCATTGT
ACAGACCGTGTTTCCCGA
TAATGCGACCTCGATGCC
AAGCATTGTTCCCACAG
TGTTTTCCGACCGAAAT
TGCCGGCGAAGCCTTGT
CCGACCGAAATGGCTCC
Assembly consensus (66 bp):
TAATGCGACCTCGATGCCGGCGAAGCATTGTTCCCACAGACCGTGTTTTCCGACCGAAATGGCTCC
Coverage: the number of reads underlying the consensus at each position

Shotgun Sequencing
Used to sequence whole genomes. Steps:
– DNA is broken up randomly into smaller fragments
– The dideoxy method produces reads
– Overlaps between reads are found and used to reconstruct the sequence

Strand                    Sequence
First shotgun sequence    AGCATGCTGCAGTCATGCT-------
                          -------------------TAGGCTA
Second shotgun sequence   AGCATG--------------------
                          ------CTGCAGTCATGCTTAGGCTA
Reconstruction            AGCATGCTGCAGTCATGCTTAGGCTA

Generations of DNA Sequencing
First generation: Maxam-Gilbert and Sanger

Second generation: pyrosequencing, Illumina, Ion Torrent
– Key characteristics: the use of many clonal templates in parallel and a sequence determination process using enzymatic replication
– Individual, randomly arrayed, clonal DNA templates are generated and sequenced in parallel using microfluidics and imaging
– Cyclic enzymatic sequencing reactions generate a base-specific signal that is captured by imaging
– Base additions can be done using DNA polymerization or ligation, and visualization is by chemiluminescence (pyrosequencing chemistry used in the Roche FLX instrument), fluorescence (in the Illumina and Life Technologies 5500 instruments) or pH changes (in the Ion Torrent instrument)

2.5 generation: SMRT sequencer (Pacific Biosciences)
– Similar characteristics to the 2nd generation, using an enzymatic template replication system, but sequencing individual DNA molecules
– Difference/improvement: the polymerase is positioned at the bottom of the wells of zero-mode waveguides. Template DNA molecules are captured, and base incorporation is monitored by cleavage of the fluorescent dye-linked pyrophosphate within the volume-limited observation window

Third generation: biological nanopore, solid-state nanopore
– Direct reading of individual nucleic acid molecules, without using an enzymatic replication system to identify the sequence
– Nanopores act as constrictions between the two chambers of an electrophoretic system, through which an applied electric field drives ions
– Identification is based on the degree and duration of the current blockage caused by a translocating molecule
– Biological nanopores take advantage of the precise sizes achievable with biological materials. Their disadvantage is that they must be positioned in compatible carrier systems such as lipid bilayers.
– Solid-state nanopores: similar application, but they pose substantial challenges to reproducible manufacturing
– 2 approaches for nucleic acid sequence readout:
(1) exonuclease sequencing: a nanopore is coupled to an exonuclease, which degrades a nucleic acid molecule and drops nucleotide after nucleotide into the nanopore, which identifies which base (T, C, G, A or 5-methyl-C) is passing through
(2) strand sequencing: a single-stranded DNA molecule is translocated through the nanopore
– A difficulty of strand sequencing is deconvolving the signal from the several bases that sit within the detection area of the nanopore at the same time; even so, the approach reached maturity sooner than expected

Fourth generation
– Still being developed
– Aims to determine relationships between cells and their mutational status
– Beneficial for studying DNA that has undergone mutation

THE METHODS
IN VITRO CLONAL AMPLIFICATION
Molecular detection methods are not sensitive enough for single-molecule sequencing, so most approaches use an in vitro cloning step to amplify individual DNA molecules. Emulsion PCR isolates individual DNA molecules along with primer-coated beads in aqueous droplets within an oil phase. The polymerase chain reaction (PCR) then coats each bead with clonal copies of the DNA molecule, after which the beads are immobilized for later sequencing.

Emulsion PCR is used in the methods of Margulies et al. (commercialized by 454 Life Sciences), Shendure and Porreca et al. (also known as "polony sequencing") and SOLiD sequencing (developed by Agencourt, now Applied Biosystems). Another method for in vitro clonal amplification is bridge PCR, where fragments are amplified upon primers attached to a solid surface.
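Emulsion PCR only yields a clonal bead when its droplet captured at most one template molecule. A common way to reason about this, assuming templates are distributed among droplets independently at random (a Poisson model that these notes do not themselves state), is to compute the fractions of empty, clonal and polyclonal droplets for a mean loading λ. The function names and λ values below are hypothetical illustrations, not parameters of any particular platform.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(droplet holds exactly k templates) under the assumed Poisson loading model."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def droplet_occupancy(lam: float) -> dict[str, float]:
    """Split droplets into empty (0 templates), clonal (1) and polyclonal (>= 2).

    `lam` is the assumed mean number of template molecules per droplet.
    """
    empty = poisson_pmf(0, lam)
    clonal = poisson_pmf(1, lam)
    polyclonal = 1.0 - empty - clonal
    return {"empty": empty, "clonal": clonal, "polyclonal": polyclonal}


def mixed_fraction_among_occupied(lam: float) -> float:
    """Share of template-containing droplets whose bead would NOT be clonal."""
    occ = droplet_occupancy(lam)
    return occ["polyclonal"] / (1.0 - occ["empty"])


if __name__ == "__main__":
    for lam in (0.1, 0.3, 1.0):  # hypothetical loading levels
        occ = droplet_occupancy(lam)
        print(
            f"lambda={lam}: empty={occ['empty']:.3f}, clonal={occ['clonal']:.3f}, "
            f"polyclonal={occ['polyclonal']:.3f}, "
            f"mixed among occupied={mixed_fraction_among_occupied(lam):.3f}"
        )
```

Under this model, lowering λ shrinks the share of mixed (polyclonal) beads but leaves more droplets empty, which is the trade-off behind diluting templates before emulsification.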
Emulsion PCR steps:
– Fragmented DNA templates are ligated to adapter sequences
– Each template is encapsulated in an aqueous droplet (micelle) along with a bead covered with complementary adapters, deoxynucleotides (dNTPs), primers and DNA polymerase
– PCR is carried out within the micelle, covering each bead with thousands of copies of the same DNA sequence

[Figures: Bridge PCR; Digital PCR; Comparison of different PCR methods]

Then what? Which sequencing technology to choose?

E.g. Solid-phase bridge amplification
– Fragmented DNA is ligated to adapter sequences and bound to a primer immobilized on a solid support
– The free end can interact with other nearby primers, forming a bridge structure
– PCR is used to create a second strand from the immobilized primers, and unbound DNA is removed

E.g. Solid-phase template walking
– Fragmented DNA is ligated to adapters and bound to a complementary primer attached to a solid support
– PCR is used to generate a second strand
– The now double-stranded template is partially denatured, allowing the free end of the original template to drift and bind to another nearby primer sequence
– Reverse primers are used to initiate strand displacement to generate additional free templates, each of which can bind to a new primer

E.g. Synthetic long-read sequencing platforms
– In the Illumina approach, genomic DNA templates are fragmented into 8–10 kb pieces and partitioned into a microtitre plate (about 3,000 templates in a single well); each fragment is sheared to around 350 bp and barcoded with a single barcode per well. The DNA can then be pooled and sent through standard short-read pipelines.
– In 10X Genomics' emulsion-based approach, the GemCode system can partition arbitrarily large DNA fragments, up to ~100 kb, into micelles (also called "GEMs") along with gel beads containing adapter and barcode sequences.
– The GEMs typically contain ~0.3× copies of the genome and 1 unique barcode out of 750,000
– Within each GEM, the gel bead dissolves and smaller fragments of DNA are amplified from the original large fragments, each with a barcode identifying the source GEM
– After sequencing, the reads are aligned and linked together to form a series of anchored fragments across a span of ~50 kb
– Unlike the Illumina system, this approach does not attempt to obtain full end-to-end coverage of a single DNA fragment. Instead, the reads from a single GEM are dispersed across the original DNA fragment, and the cumulative coverage is derived from multiple GEMs with dispersed, but linked, reads

E.g. DNA nanoball generation
– DNA is fragmented and ligated to the first of four adapter sequences
– The template is amplified, circularized and cleaved with a type II endonuclease
– A second set of adapters is added, followed by amplification, circularization and cleavage
– This process is repeated for the remaining two adapters
– The final product is a circular template with four adapters, each separated by a template sequence
– Library molecules undergo a rolling circle amplification step, generating a large mass of concatemers called DNA nanoballs, which are then deposited on a flow cell

E.g. Pyrosequencing
– In a microtitre plate, bead-based enriched template, primers and an enzyme cocktail are mixed
– In each cycle, a single nucleotide species is added; if it is complementary to the next template base, it is incorporated into the newly synthesized strand, producing a pyrophosphate (PPi) molecule (hence the name "pyro"sequencing)
– ATP sulfurylase uses the PPi to convert adenosine 5′-phosphosulfate (APS) into ATP, the cofactor for the conversion of luciferin to oxyluciferin, which emits light
– Any unincorporated nucleotides are degraded by apyrase
– Each burst of light is detected by a charge-coupled device (CCD) camera

E.g. Ion Torrent
– In a microtitre plate, one bead occupies a single reaction well
– Nucleotide species are added to the wells one at a time; as each base is incorporated, a single H+ ion is generated as a by-product
– The H+ release results in a 0.02 unit change in pH, detected by an integrated complementary metal-oxide semiconductor (CMOS) and ion-sensitive field-effect transistor (ISFET) device
– After each nucleotide species is introduced, the unincorporated nucleotides are washed away and the next species is added

DATA ANALYSIS?
Data Analysis – 2nd Generation and Above
3 major components:
i. Base calling
– Uses software provided by the supplier of the machine
ii. Alignment
– Used when the sample has a reference sequence
– The sample is compared with the reference sequence to determine the most likely position of each read in the genome
– Advantage: sequences with no mismatches to the reference are easy to position
– Disadvantage: a threshold (an acceptable degree of difference) must be set for sequences that differ from the reference (which can be an advantage for determining variability)
iii. Variant calling
– Linked to alignment
– Single nucleotide variants and small indels are easy to call (identify)
– Larger indels and structural variants need the help of specialized software or de novo sequence assembly (which increases computational costs)
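The alignment and variant-calling steps can be illustrated with a deliberately naive sketch. It assumes ungapped, brute-force placement of each read against the reference with a mismatch threshold, then a majority-vote call at each covered position. Production aligners instead use indexed, gapped alignment and quality-aware variant callers. All function names, thresholds (`max_mismatches`, `min_depth`, `min_fraction`) and toy sequences here are hypothetical.

```python
from collections import Counter


def best_alignment(read: str, reference: str, max_mismatches: int):
    """Ungapped placement: slide `read` along `reference` and return the leftmost
    start with the fewest mismatches, or None if every placement exceeds
    `max_mismatches` (the acceptable degree of difference)."""
    best_pos, best_mm = None, max_mismatches + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mm = sum(1 for a, b in zip(read, window) if a != b)
        if mm < best_mm:  # strict '<' keeps the leftmost of tied placements
            best_pos, best_mm = pos, mm
    return best_pos


def call_snvs(reads, reference, max_mismatches=2, min_depth=3, min_fraction=0.8):
    """Pile aligned reads onto the reference and report positions where a
    non-reference base dominates, as (position, ref_base, alt_base, depth)."""
    pileup = [Counter() for _ in reference]
    for read in reads:
        pos = best_alignment(read, reference, max_mismatches)
        if pos is None:
            continue  # too divergent for the threshold: left unplaced
        for offset, base in enumerate(read):
            pileup[pos + offset][base] += 1
    variants = []
    for i, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth == 0 or depth < min_depth:
            continue  # not enough coverage to make a call
        alt, n = counts.most_common(1)[0]
        if alt != reference[i] and n / depth >= min_fraction:
            variants.append((i, reference[i], alt, depth))
    return variants


# Hypothetical toy data: the sample differs from REFERENCE at position 10 (T -> C);
# the last read is too divergent to place with max_mismatches=2.
REFERENCE = "ACGTTGCAAGTCCGATGCAT"
READS = ["TGCAAGCC", "CAAGCCCG", "AGCCCGAT", "TTGCAAGC", "GGGGGGGG"]

if __name__ == "__main__":
    print(call_snvs(READS, REFERENCE))  # expected: [(10, 'T', 'C', 4)]
```

Each placed read carries one mismatch (the variant base), which the threshold tolerates, while the unrelated read is discarded. A repeated segment in the reference would produce tied placements resolved arbitrarily to the leftmost copy, a toy version of why repeats complicate alignment.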