NGS Library Preparation and Sequencing (Repeated) PDF

Summary

This document provides a detailed overview of next-generation sequencing (NGS), focusing on library preparation and sequencing by synthesis (SBS) methods. It also discusses various applications such as RNA-seq and ChIP-seq, highlighting data analysis techniques and typical outcomes. The document covers concepts like cluster generation and multiplex sequencing for greater efficiency.

Full Transcript

NGS LIBRARY PREPARATION

How Does Illumina NGS Work?

Illumina next-generation sequencing uses a fundamentally different approach from the classic Sanger chain-termination method. It relies on sequencing by synthesis (SBS) technology, which tracks the addition of labeled nucleotides as the DNA chain is copied, in a massively parallel fashion. Next-generation sequencing generates large volumes of DNA sequence data, richer and more complete than is achievable with Sanger sequencing. Illumina sequencing systems can deliver data output ranging from 300 kilobases up to multiple terabases in a single run, depending on instrument type and configuration.

Illumina sequencing technology

Sequencing by synthesis (SBS)

A method that detects single bases as they are incorporated into growing DNA strands. DNA molecules and primers are first attached to a slide or flow cell and amplified with polymerase so that local clonal DNA colonies, later termed "DNA clusters", are formed. To determine the sequence, four types of reversible terminator bases (RT-bases) are added and non-incorporated nucleotides are washed away. A camera takes images of the fluorescently labeled nucleotides. Then the dye, along with the terminal 3' blocker, is chemically removed from the DNA, allowing the next cycle to begin. Unlike pyrosequencing, the DNA chains are extended one nucleotide at a time and image acquisition can be performed at a delayed moment, so very large arrays of DNA colonies can be captured by sequential images taken from a single camera.

Library preparation

- Samples consisting of longer fragments are first sheared into a random library of fragments 100-300 base pairs long.
- After fragmentation, the ends of the resulting DNA fragments are repaired and an A-overhang is added at the 3' end of each strand.
- Adaptors, which are necessary for amplification and sequencing, are then ligated to both ends of the DNA fragments.
- These fragments are then size-selected and purified.

Basic steps for sequencing

Tn5-Transposase based library preparation

Multiplex Sequencing Assay

Process a large number of samples with multiplex sequencing on a high-throughput instrument. Sample multiplexing is a useful technique when targeting specific genomic regions or working with smaller genomes. To accomplish this, individual "barcode" sequences are added to each sample so they can be distinguished and sorted during data analysis. Pooling samples greatly increases the number of samples analyzed in a single run, without drastically increasing cost or time.

Multiplex Sequencing Highlights

- Fast, High-Throughput Strategy: Large sample numbers can be simultaneously sequenced during a single experiment
- Cost-Effective Method: Multisample pooling improves productivity by reducing time and reagent use
- High-Quality Data: Accurate maintenance of read length of unknown sequences
- Simplified Analysis: Automatic sample identification with "barcodes" using Illumina data analysis software

Illumina NGS sequencing

1) Sequencing by synthesis (the sequence is copied, one nucleotide at a time)
2) Massive parallelization of the sequencing reaction to increase sequencing capacity

The Illumina methods described here rely on PCR-based amplification of template DNA. They do not read single molecules directly; however, each amplification starts from a single DNA molecule as template, so each lane contains millions of "clusters", each one generated by the amplification of a single molecule.

Stepwise determination of the sequence by iterative cycles of nucleotide extension (= sequencing by synthesis), done in parallel.
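The iterative cycles described above can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: the function name `sequence_by_synthesis` and the cluster sequences are hypothetical, each cluster is treated as a single error-free template, and the "image" is just the list of bases incorporated in that cycle. It is not how instrument software works.

```python
# Toy model of sequencing by synthesis (SBS); not instrument software.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(templates, n_cycles):
    """Return one read per cluster after n_cycles of single-base extension.

    Each template is written in the order the polymerase reads it, so the
    read is its base-by-base complement.
    """
    reads = ["" for _ in templates]
    for cycle in range(n_cycles):
        # One cycle: every cluster incorporates exactly one blocked,
        # labeled nucleotide; unincorporated nucleotides are washed away.
        image = []
        for template in templates:
            if cycle < len(template):
                image.append(COMPLEMENT[template[cycle]])
            else:
                image.append(None)  # template fully copied, no signal
        # "Imaging": the base call for each cluster is read from the image,
        # then dye and 3' block are removed so the next cycle can start.
        for cluster_index, base in enumerate(image):
            if base is not None:
                reads[cluster_index] += base
    return reads


# Hypothetical clusters, all processed in parallel in every cycle.
clusters = ["TACGGA", "GGCATT", "ACGT"]
reads = sequence_by_synthesis(clusters, n_cycles=5)
print(reads)
```

The outer loop runs over cycles and the inner loop over clusters, which mirrors the point made above: read length is set by the number of cycles, while throughput comes from how many clusters are imaged per cycle.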
In each cycle, nucleotides and the required reagents are flowed over the immobilized templates and then washed away (in "flow cells" composed of lanes).

CLUSTER GENERATION

Single DNA fragments are attached to the flow cell by hybridizing to oligos on its surface that are complementary to the ligated adaptors. The DNA molecules are then amplified by so-called bridge amplification, which results in hundreds of millions of unique clusters. Finally, the reverse strands are cleaved and washed away, and the sequencing primer is hybridized to the DNA templates.

Sequencing

During sequencing, the huge number of generated clusters is sequenced simultaneously. The DNA templates are copied base by base using the four nucleotides (A, C, G, T), which are fluorescently labeled and reversibly terminated. After each synthesis step, the clusters are excited by a laser, which causes fluorescence of the last incorporated base. The fluorescent label and the blocking group are then removed, allowing the addition of the next base. The fluorescence signal after each incorporation step is captured by a built-in camera, producing images of the flow cell.

Illumina video on cluster generation and sequencing:
https://www.youtube.com/watch?annotation_id=annotation_1533942809&feature=iv&src_vid=HMyCqWhwB8E&v=fCd6B5HRaZ8

To learn more:
https://www.youtube.com/watch?v=PFwSe09dJX0

Most promising companies:
https://www.elementbiosciences.com/products/aviti
https://www.youtube.com/watch?v=b_cC5wi2OYg
https://www.youtube.com/watch?v=zthkuaQDHzM

DATA ANALYSIS

- RNA-seq
- ChIP-seq/ATAC-seq

RNA-seq in a nutshell

A cell is subjected to a different condition, e.g., a cold shock is applied.

1. Produce sequencing data from the transcriptome of both the "perturbed" and the control state.
2. Match sequencing reads to the same genome or transcriptome.
3. Count how many reads align to the same region for both conditions.
4. Look for differentially expressed genes.

Sequence alignment software

Aligner     Approach          Applications       Availability
BWA-mem     Burrows-Wheeler   DNA, SE, PE, SV    open-source
Bowtie2     Burrows-Wheeler   DNA, SE, PE, SV    open-source
Novoalign   hash-based        DNA, SE, PE        free for academic use
TopHat      Burrows-Wheeler   RNA-seq            open-source
STAR        suffix array      RNA-seq            open-source
GSNAP       hash-based        RNA-seq            open-source

(SE: single-end; PE: paired-end; SV: structural variants.)

RNA-Seq DATA

What is the typical outcome of an RNA-Seq analysis?

DIFFERENTIAL EXPRESSION

By observing differentially expressed genes and transcripts, we can infer the functional characteristics of the different states.

- A matrix containing the read counts for all the annotated genes in the genome
- Apply statistics to check whether the differential expression is statistically significant
- Represent the results of the differential expression in a proper way

What is R?

R is at least three things at the same time:

- A software environment for statistical computing
- A software for analyses, plotting and graphics
- An open-source programming language

R learning levels (all depends on the scope):

- Computational tool for statistics
- Software to perform domain-specific analyses (e.g., genomics)
- Programming language

RNA-Seq Workflow

1. Import and visualize RNA-seq data in R (counts)
2. Perform statistics and determine the Differentially Expressed Genes (DEG) in our dataset
3. Represent the data with different kinds of plots

Step 1: a matrix containing the read counts for the genes of interest or for all the annotated genes in the genome.

Step 2: apply statistics to compare the two sample groups and to check whether the differential expression is statistically significant.

Gene       log2FoldChange   Padj
HSD17B6    -3.3699439       1.82E-135
CAV1        3.13195826      1.42E-117
SLC35A2    -1.3321695       1.42E-117
GPAM        4.081604        3.35E-110
NPR1        3.47604144      1.14E-106
GABRD      -3.5369007       2.31E-102
HSD17B13    5.08149855      2.31E-102
PDE2A       2.90811388      1.61E-96
RDH5        4.40947408      7.06E-95
TK1        -2.8249466       8.83E-95

Step 3: represent the data with plots.

[Figure: volcano plot, CASE vs CONTROL. x-axis: log2 fold change; y-axis: -log10 adjusted p-value. Labeled genes: HSD17B6, CAV1, SLC35A2, GPAM, NPR1, GABRD, HSD17B13, RDH5, PDE2A, TK1.]

HEATMAP

Identifying TFs or Histone modifications through ChIP-seq experiments

[Figure: single-end vs paired-end sequencing. Labels: 300 bp, 50 bp, 50 bp, 250 bp (insert size), mate. Single-end alignment: one read (ACGCT..) against the human genome (3e9 bp), with alternative placements (OR). Paired-end alignment: both mates (ACGCT.. and TCTTA..) against the human genome (3e9 bp).]
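The single-end versus paired-end alignment idea above can be sketched with plain string matching. This is a minimal illustration, not what BWA-mem or Bowtie2 do internally (those use indexed, mismatch-tolerant search). The reference sequence, the reads and the names `reverse_complement` and `align_pair` are all hypothetical, and the sketch assumes the second mate comes from the opposite strand, downstream of the first.

```python
# Toy paired-end placement by exact string matching (illustration only).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def align_pair(reference, read1, read2):
    """Place read1 on the forward strand and its mate read2 on the reverse strand.

    Returns (start1, start2, fragment_span) with 0-based start positions,
    or None if either mate has no exact match.
    """
    start1 = reference.find(read1)
    if start1 < 0:
        return None
    # The mate was read from the opposite strand, so search for its
    # reverse complement at or downstream of read1.
    mate_on_forward = reverse_complement(read2)
    start2 = reference.find(mate_on_forward, start1)
    if start2 < 0:
        return None
    # Span from the first base of read1 to the last base of the mate.
    fragment_span = start2 + len(read2) - start1
    return start1, start2, fragment_span


reference = "TTGACCGTAAGCTAGGCTTACGATCCGATTAGC"  # hypothetical sequence
placement = align_pair(reference, "GACCGT", "GGATCG")
print(placement)
```

The second mate constrains where the first can sit: a short read that matches several places in a 3e9 bp genome becomes far less ambiguous once its mate must also match within a plausible distance.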
(Preceding figure adapted from Park P, Nature Reviews Genetics 2009.)

Data analysis

ChIP-seq analysis workflow

[Figure: ChIP-seq analysis workflow. Park P, Nature Reviews Genetics 2009.]

ChIPseq and RNAseq datasets

THE GENOMIC LANDSCAPES OF INFLAMMATION IN MACROPHAGES

A novel link between a family of metabolic enzymes and epigenome regulation
Russo, Gualdrini et al., Molecular Cell, 2024

Mediator interacts with Acyl-CoA producing enzymes

[Figure: Mediator IP-MS plot, Log2 fold change (Med1 / IgG) against -log10 P-value, with Mediator subunits and 2-ketoacid dehydrogenase components (Pdha1, Pdhb, Pdhx, Dlat, Dld, Ogdh, Dlst, Bckdha, Bckdhb, Dbt) labeled; schematic of the PDH, OGDH and BCKDH complexes (E1, E2, E3 subunits) with their metabolites (pyruvate, acetyl-CoA, α-ketoglutarate, succinyl-CoA, leucine, isoleucine, valine, citrate, oxaloacetate); immunoprecipitation panels from RAW 264.7 and HeLa nuclear extracts (INPUT, IgG, IP; MED1 IP) probed for MED1, MED24, MED16, DLAT, DLST and OGDH; siRNA panel (NT, MED1). Russo, Gualdrini et al., Molecular Cell, 2024.]

[Figure: working model for Mediator function, showing enhancer, promoter, TF, Mediator, Pol II, mRNA, CTCF, cohesin and transcription bursting.]

Fig. 3 | A working model for Mediator function. An enhancer–promoter interaction (loop) is shown on the left, within a larger topologically associating domain formed by CTCF and cohesin. Mediator is bound to one or more transcription factors (TFs) that occupy the enhancer, and the preinitiation complex (PIC) at the promoter is fully assembled and active. Such local architecture of enhancer–promoter chromatin looping could be further stabilized by Mediator-associated cohesin, but this association would be transient (dashed circle) relative to topologically associating domain boundaries (solid circle). Following a brief, direct enhancer–promoter interaction, the enhancer detaches from the promoter (for example, through dissociation of TFs from enhancer DNA); however, if one or more TFs remain bound to Mediator, the complex could remain in an active conformational state. This state could allow continued transcription reinitiation (bursting) from the PIC scaffold complex, provided RNA polymerase II (Pol II) and other PIC factors continue to associate for reinitiation (right). Ultimately, reinitiation may stop (not shown), because of TF–Mediator dissociation, binding of the kinase module to Mediator (which would block Mediator–Pol II interaction) or PIC disassembly. The light blue shading represents a hub or condensate that establishes a high local concentration of PIC components that promotes transcription initiation and bursting. TFIIH, transcription factor IIH.

Adapted from Richter WF, … Nature Reviews Molecular Cell Biology (2022)

Mediator and Acyl-CoA producing enzymes colocalize on chromatin

[Figure: ChIP-seq in RAW 264.7 cells. Overlap diagrams of peak sets (n=19697, n=5132, n=4609) for MED1, DLAT, DLST and OGDH; signal profiles in counts per million reads against distance to peak center (± 1 kb) for FLAG, Empty, DLST and OGDH; genome browser tracks (10 kb scale bars) for MED1, DLAT, DLST, OGDH and Pol II; imaging in HeLa cells (DAPI, MED1, FLAG) with colocalization with MED1 as a percentage of total objects. Russo, Gualdrini et al., Molecular Cell, 2024.]
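One common way to put a number on the chromatin colocalization stated above is to count how many peaks of one factor overlap at least one peak of another. The sketch below is a minimal illustration with invented coordinates; it is not the analysis performed by Russo, Gualdrini et al., and the function name `count_overlapping_peaks` and the peak lists are hypothetical.

```python
# Toy peak-overlap count between two ChIP-seq peak sets (illustration only).
def count_overlapping_peaks(peaks_a, peaks_b):
    """Count peaks in peaks_a that overlap at least one peak in peaks_b.

    Each peak is a (chromosome, start, end) tuple with a half-open
    interval [start, end).
    """
    # Group the second peak set by chromosome so that only peaks on the
    # same chromosome are compared.
    by_chromosome = {}
    for chromosome, start, end in peaks_b:
        by_chromosome.setdefault(chromosome, []).append((start, end))

    shared = 0
    for chromosome, start, end in peaks_a:
        candidates = by_chromosome.get(chromosome, [])
        # Two intervals overlap when each one starts before the other ends.
        if any(start < b_end and b_start < end for b_start, b_end in candidates):
            shared += 1
    return shared


# Invented coordinates standing in for two peak sets.
med1_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
dlst_peaks = [("chr1", 150, 250), ("chr2", 300, 400)]
shared = count_overlapping_peaks(med1_peaks, dlst_peaks)
print(shared, "of", len(med1_peaks), "MED1 peaks overlap a DLST peak")
```

The pairwise comparison is quadratic per chromosome, which is fine for a sketch; for peak sets with thousands of entries, sorted intervals or a dedicated interval tool would be the usual choice.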
