Midterm 2: Next Generation Sequencing Methods

MIDTERM 2
OCTOBER 2ND
Cheaper genome
- The hierarchical approach used a BAC clone library.
- Save money by skipping the hierarchical steps (preparing the BAC clone library, then fragmenting and sequencing each BAC clone).
- Instead use the Celera Genomics method of shotgun sequencing: fragment the entire genome and sequence it. Assembly relies on overlapping ends and is easier when a reference is available.
- Sequence many templates in parallel, compared to Sanger sequencing, which handles one at a time.
- Shrink everything down to one sequencer: NGS.
NEXT GENERATION SEQUENCING METHODS
- 1. Integration: a seamless system that gains efficiency by removing user intervention.
- 2. Parallelization: sequencing millions or billions of templates at once.
- 3. Miniaturization: one sequencer, smaller reagent volumes, faster process, smaller footprint.
Workflow
1. Prepare Library
- Starts with genomic DNA extraction by isolating nuclei (100-500 ng).
- The sample represents many copies of the human genome, one copy from each of many different cells.
a. Fragment the DNA to generate random ends. The goal is overlapping fragments that align with each other for sequence assembly. Aim for 200-400 bp, in comparison to large BAC clones.
b. Adapter ligation. Adapters are short oligonucleotides designed and supplied by the manufacturer. The insert is the target DNA. SP1 and SP2 are sequencing primer binding sites. In Sanger sequencing the primer bound to complementary vector sequence; here the primer binds to the adapter, which allows the insert to be sequenced. P5 and P7 anchor the target DNA to a solid surface. Index sequence: not all NGS libraries have one, but it is essentially a tag that identifies the sample, so many samples can be pooled and sequenced at the same time. All of these are double-stranded oligonucleotides ligated onto both ends of the target DNA.
c. Size selection: we want short DNA fragments; the optimal size is determined by the limitations of the instrument.
- Size can be controlled from the start through the fragmentation technique, its duration, and the instrument. The goal is fragments of uniform size.
d. Amplify (optional), to ensure you have enough material to sequence. PCR also enriches for fragments with an adapter on each end: because ligation is not efficient, only fragments carrying adapters on both the 5' end and the 3' end are amplified, and everything else is effectively removed.
e. The library is now ready to be sequenced.
2. Sequence Library
- In HGS (the hierarchical approach), all of the DNA was cloned into vectors and transformed into bacteria.
- In NGS there is no stored copy of the library: fragment, ligate adapters, and adhere to a solid surface. If we want more, we need to go back to the source and refragment it.
- Massively parallel sequencing.
- Flow cell: a glass slide containing microfluidic channels, which allow reagents to be introduced and flow over the slide. Each channel has small wells (patterned flow cell) that keep DNA templates separate so their signals do not overlap with each other.
- Illumina sequencing is a synchronous method: every cluster is given all four dNTPs at once, so every strand in every cluster grows at the same rate.
FIRST WAY
- Each flow cell can sequence 400 million templates, which is 2 billion sequencing templates.
- 1. The flow cell receives the DNA templates. (At this point the DNA fragments have adapters on their ends and are good to go.)
- 2. Attach the DNA templates to the flow cell (the bottom of the microwells). Many oligonucleotides are attached to the surface: one type is complementary to the P5 adapter and another is complementary to P7. These anchor the template to the surface.
- Each oligo is attached to the surface by its 5' end. Its 3' hydroxyl is free to form a phosphodiester bond with another nucleotide, so the oligo acts as a primer for copying the template.
- DNA polymerase extends from the 3' hydroxyl and copies the template, creating a duplex of the original template and a newly synthesized reverse-complement strand. The new strand is attached to the surface; the original strand is not.
- Next, denature to separate the two strands and flow wash buffer over them. The synthesized strand stays, and the original template is washed away because it was not attached to anything. We are left with the reverse complement attached to the flow cell. This occurs at 100 million locations.
- A single copy of the DNA would not be enough: we rely on fluorescence, and one fluorophore cannot be seen. Hence we amplify by making more copies of each template.
- The attached reverse complement is copied by looping over: it has a P7 sequence on one end and P5 on the other, so its free end anneals to a neighbouring surface oligo. DNA polymerase extends from that oligo's 3' hydroxyl to copy the strand, and the duplex is denatured again.
- After this, both strands are attached to the solid surface as two ssDNA molecules. Each bends over to form a new bridge, and the process continues until there are about 2,000 molecules that are copies of the original template (ISOTHERMAL BRIDGE AMPLIFICATION; the goal is to end with a collection of single-stranded DNA molecules).
- Repeat until there are clonal clusters. Not all strands in a cluster are identical, because half of the molecules are the reverse complement of the other half (sense and antisense).
- This is a problem because it would give mixed signals. We selectively cleave the reverse strands, which is possible because we know which oligo (P5 or P7) anchors each strand. We must get rid of 50% of our DNA to obtain a homogeneous sequence.
- Block the free 3' ends of the oligos on the lawn and of the template strands; otherwise they could loop over and prime further extension.
- The result is amplified clusters. Cluster density depends on how much DNA you began with. Both underclustering and overclustering are problems; overclustering makes it too hard to measure the fluorescence signal of individual clusters.
- Each dot in the image is a cluster derived from one fragment of genomic DNA.
SECOND WAY
Base pairing to metallic beads
EMULSION PCR (an example of miniaturization and parallelization)
- Emulsion: add everything needed for a PCR reaction to a tube (primers, template, dNTPs, polymerase, beads) and then add oil. Oil and water do not mix, so droplets form, each containing everything required for a PCR reaction. One tube can therefore hold millions of simultaneous PCR reactions.
- The principle is identical to pairing the template DNA to the flow cell, but here the template DNA attaches to metallic beads. The beads are covered with oligos complementary to the adapters. The template pairs with an oligo, DNA polymerase extends the oligo to make a copy of the template, the strands denature, the template associates with another oligo, and extension and denaturation repeat until the entire surface of the metallic bead is filled with copies.
Sequencing
a. Sequencing by synthesis (depends on DNA polymerase, as in Sanger sequencing)
b. Sequencing by ligation (depends on DNA ligase, which ligates complementary sequence)
1. Illumina sequencing
- Fluorescence-based method.
- In Sanger sequencing the fluorescent nucleotides were chain terminators that lacked a 3' OH, so synthesis terminated permanently.
- Illumina uses reversible terminators: the 3' OH is blocked but can be unblocked to restart synthesis. Each nucleotide also carries a reporter dye for fluorescence and a cleavage site so the dye can be removed to clear the signal.
1. Why do we sequence from the top down? Complementary strands have to be antiparallel. The template strand is attached at the bottom by its 5' end, so its 3' end is at the top, and synthesis starts at the top. Add the complementary primer, which binds to every template in the flow cell because they all contain the same sequencing primer 1 sequence.
2. The microfluidic channels introduce all four dNTPs at once, each carrying a different fluorescent molecule.
3. Wash away all unincorporated nucleotides. Only one nucleotide is added per strand because the 3' OH on the modified nucleotide is blocked.
4. Image the fluorescent signal, which identifies the nucleotide (cycle 1).
5. Unblock the 3' OH and remove the fluorophore to prepare for the 2nd cycle. Flow the dNTPs through again. The next complementary nucleotide binds and forms a phosphodiester bond; again only one is added because it has a terminator on its hydroxyl. 150 bp is the ideal read length.
6. The images from each cycle are combined into a composite; the first image shows the identity of the first nucleotide added in every cluster.
- Paired-end sequencing: sequencing both ends of the molecule (sense and antisense).
- The oligo lawn contains both types of oligos, so the entire library can be converted to the opposite strand. Denature and wash away the sequenced strand, leaving the original attached strand. Unblock all 3' OH groups so the strands loop over and form bridges, use DNA polymerase to make a copy, and then cleave the original forward strand. This produces the reverse complement of each strand, now attached to the surface of the flow cell. (Unclear how this ends with double; revisit lecture at 54 min.)
- The cleavable dye is on the phosphate group because it gets removed during bond formation.
3. Analyze Data
OCTOBER 9TH
B. Ion Torrent (Thermo Fisher) - asynchronous
- Single nucleotide addition (SNA) method instead of four nucleotides at a time.
- First platform to sequence without fluorescence.
- DNA is sequenced on beads; each bead carries copies of one unique template and sits in its own well. Each well is connected to a sensor that can detect a change in voltage.
- Sequencing by synthesis starts with a primer, because the primer provides the free 3' OH needed for phosphodiester bond formation.
- In Ion Torrent the dNTPs are not modified and not labelled, so only one type of nucleotide can be added at a time in order to tell them apart.
- One nucleotide type is flowed through so that it reaches the wells. If it is complementary to the template, DNA polymerase forms a phosphodiester bond. This releases a proton, which changes the pH of the solution by 0.02 units. It is one proton per molecule, released by many identical molecules.
- The change in pH is picked up as a change in voltage by the semiconductor and ends up on a trace whose peak is proportional to the change in voltage.
- Then wash away anything that was not incorporated.
- If the nucleotide is not incorporated, nothing happens and it washes away. If it is incorporated, a peak forms, and then it washes away. This repeats until the read is long enough for the molecule.
- Nothing prevents two nucleotides from being incorporated in a row if both match. This releases 2 protons and gives a bigger change in voltage. The change in pH is only imperfectly proportional to the number of nucleotides added.
- This is a problem for homopolymeric regions: the number of nucleotides added has to be extrapolated from the graph, which leads to errors.
- Ion Torrent generates reads of 400 bp, which are short reads. Less maintenance and lower cost.
- Compare and contrast Illumina and Ion Torrent.
3rd GENERATION SEQUENCING
What is the problem with short reads?
- Short reads have trouble with long repetitive regions and copy number alterations.
- We need long-read sequencing to generate reads of thousands of base pairs.
- Referred to as real-time sequencing. Used for whole-genome sequencing.
C. Single molecule real-time sequencing, SMRT (PacBio)
- The other two methods rely on linear fragments.
- In this case we create a circular template.
- Create large fragments because we want long reads: up to 50 kb at best, usually 15-20 kb.
- Ligate single-stranded adapters that form a loop (hairpin) at each end of the molecule, joining the forward and reverse strands at both the 3' and 5' ends.
- When denatured, the molecule becomes a circular template containing both strands, like paired-end sequencing on the same molecule.
- For this to work you must use a strand-displacing DNA polymerase (an example of sequencing by synthesis).
- When the polymerase reaches dsDNA, it pushes one strand aside and uses the other as the template strand.
- This is done by attaching a primer and DNA polymerase.
- The polymerase uses the 3' OH on the primer and moves along, replicating. Once it has gone all the way around, it displaces the strand it made and uses the original template again, making another reverse complement. It keeps reusing the original template by displacement.
- This results in a very long raw read: the original strand sequence, then the SMRTbell adapter sequence, then the next pass of sequence, and so on.
- Bioinformatic tools then trim and remove the adapter sequences, separate the forward and reverse reads, and create a circular consensus sequence.
- DNA polymerase makes mistakes, so the accuracy of a single pass is lower, but because the same sequence is read many times you can build an accurate consensus sequence.
- The DNA polymerase is fixed to the well; the template binds and the primer is added. The bottoms of the wells are transparent and the detectors are underneath.
- The nucleotides are modified: no blocking groups, but fluorescent labels on the phosphates. Fluorescence is measured only in the active site of the DNA polymerase, which is fixed to the bottom of the well.
- Fluorescence increases once a nucleotide associates with the active site. DNA polymerase forms the phosphodiester bond, and the phosphates carrying the label are released, which ends the fluorescent signal. The signal then decreases.
- This is called real time because there is no need to intervene during the reaction: no need to unblock a 3' OH or cleave off a dye.
- The process is faster, gives longer reads, etc.
4th GENERATION SEQUENCING
- Eliminates the need to copy the DNA strand and instead reads the nucleotide sequence from the molecule directly.
- Still aiming for long reads; best for repetitive regions and genomic sequencing.
- Advantage of long reads: the longer the reads, the easier the assembly.
D. Nanopore sequencing (MinION, ONT)
- Works by having a membrane with many pores, each with a motor protein (a DNA helicase, an enzyme that unwinds DNA).
- 1. Generate a library. In the other methods this meant pairing templates to a flow cell or beads, or creating a SMRTbell; in this case you add a single-stranded adapter to one end.
- When you denature, instead of a circular template you end up with one linear molecule containing both the forward and reverse sequence.
- When the library is applied to the nanopores, each template associates with a pore. The motor protein unwinds the DNA and feeds the end of the DNA through the pore, which is blocked to a degree that depends on nucleotide structure.
- Apply an electrical current across the pore. The current is modulated according to the structure of the bases passing through.
- Squiggle: the modulation of the current caused by base structure. This is the output of sequencing.
- Machine learning algorithms trained on known sequences decode the squiggle into sequence. This allows ultra-long reads and is much quicker because there is no need for DNA polymerase.
- Works for modifications on bases as well.
- DNA methylation: cytosine methylation (CpG islands in heterochromatin, silencing gene expression and transcription).
- Illumina cannot be used for this because the complementary base is the same either way.
Why bother with NGS if the human genome has already been sequenced?
- Information on health.
- The reference came from a handful of individuals, so it is a composite.
- We lack information on variability that may not be captured in the reference.
- NGS is used a lot in paleontology; we can track evolution, heredity, and ancestry DNA.
- Gives information on the genetic causes of hereditary diseases.
- Disease-causing mutations.
Whole exome sequencing
- Exome: the portion of an individual genome made up of exons, the part of the DNA retained in RNA after splicing.
- 85% of disease-causing variants come from exons.
Two ways to capture only exons from the starting material:
1. Fragment
2. Methods to capture exon sequences
a. In-solution hybridization
- Probes complementary to exons (we know the exon sequences because the genome has already been sequenced).
- The probes are attached to a biotin molecule; add beads that recognize biotin.
- Use the beads to purify or precipitate the probes bound to exon sequences and isolate the entire complex. Then separate the exons from the complex.
- Then use NGS.
b. Array based
- A glass slide with DNA probes bound to its surface, each complementary to an exon sequence. The DNA of interest remains bound while everything else is washed away; elute the DNA from the array and you are left with a collection of exons.
Targeted gene sequencing
- Select the genes you are interested in sequencing.
- Only the selected genes are sequenced (exons, introns, etc.).
- Useful for analyzing material.
- Can start 2 ways:
- Targeted enrichment: design probes that recognize the sequence and then separate it from the rest.
- Amplicon sequencing: design primers for the gene of interest, amplify, and then sequence.
- Newborn screening: mitochondrial, nuclear (425)
MODULE 3
OCTOBER 21ST
- Nucleotide diversity is around 0.5%, 50 million bp of difference.
- Phased diploid genome: the sequences of both homologues.
- Helps identify linked genes, which are lost in consensus sequences.
Mutation: permanent change to the DNA sequence
- Caused by failure of the DNA repair machinery to fix errors.
- The location of a mutation affects how strong its effect is.
Genetic Variation
Variants: mutations that result in alternative forms of DNA
Common: the minor allele (the less common one) has a frequency of at least 1% in the population; the major allele is the version carried by the majority of the population.
- Typically have no effect: if the variant were detrimental it would be eradicated, and if it were beneficial it would become fixed.
Rare: minor allele frequency of less than 1%
Allele: version of a group of nucleotides found at a given region in the genome
- 2 alleles per genetic locus: 1 allele from one parent and 1 allele from the other parent.
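The common/rare distinction above comes down to one calculation. A minimal sketch in Python, assuming a bi-allelic site and genotypes given as two-letter strings such as "AG" (the function names and the input format are illustrative assumptions, not from the lecture):

```python
from collections import Counter

COMMON_MAF_THRESHOLD = 0.01  # the 1% cutoff used in the notes


def minor_allele_frequency(genotypes):
    """Return (minor_allele, frequency) for one locus.

    genotypes: list of 2-character strings, one per diploid individual,
    e.g. ["AA", "AG", "GG"]. Assumes a bi-allelic site (hypothetical input format).
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) < 2:
        return None, 0.0  # only one allele observed: no minor allele
    total = sum(counts.values())  # 2 alleles per individual
    minor, n = min(counts.items(), key=lambda item: item[1])
    return minor, n / total


def classify_variant(maf):
    """Common if the minor allele frequency is at least 1%, otherwise rare."""
    return "common" if maf >= COMMON_MAF_THRESHOLD else "rare"


if __name__ == "__main__":
    cohort = ["AA"] * 90 + ["AG"] * 10  # 200 alleles, 10 of them G
    allele, maf = minor_allele_frequency(cohort)
    print(allele, maf, classify_variant(maf))
```

Each individual contributes two alleles, so the denominator is twice the number of people sampled.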
Genomic variant types
SNPs (single nucleotide polymorphisms)
Why are these the most useful in finding disease-causing mutations?
- Most abundant variant type, even though each one arises infrequently.
- A single nucleotide change, in many different forms.
- Most common variant: 1 per 300 nucleotides.
- Caused by errors in DNA replication (unrepaired errors).
- Found in coding, non-coding, and intergenic regions.
- Most are found in intergenic regions because most DNA codes for nothing; 1.5% codes for proteins. Most have no effect.
- Non-coding regions: long non-coding RNAs; single nucleotide variants in the 5' UTR or 3' UTR, which affect stability and gene expression.
- In a regulatory region (enhancer or promoter), they can affect binding too.
- Coding regions: synonymous and non-synonymous.
- Synonymous: a change that does not change the amino acid.
- Non-synonymous: changes the amino acid.
- Missense: a different amino acid is incorporated, which can influence function or cause the protein to be degraded.
- Nonsense: a premature stop codon appears, so the protein is not translated into a functional gene product.
- Causative: a genetic change that causes the disease phenotype.
- Correlated: the change itself does not cause disease but lies close enough to a disease-causing mutation that the two are inherited together.
- Human germline mutation rate:
- 1.2*10^-8 per site per generation
- About 1 in 100 million nucleotides is substituted per generation.
- Every embryo carries about 30 new DNA changes.
- Most SNPs are bi-allelic.
- There are two options at that position, e.g. A or G; some SNPs may be tri-allelic.
- This is because these mutations are very rare.
- Comparing SNPs tells us about shared ancestry.
- Not all SNPs are disease causing.
- rs17822931: affects ABCC11, which encodes a membrane transporter.
Insertions and deletions (indels)
- Second most common.
- 1 per 10 kb of DNA.
- Vary in length from 1 bp to 10 kbp.
- Caused by errors in replication, repair, or recombination.
- Indels in coding regions:
- Frameshift
- Non-frameshift: multiples of 3
SSRs (simple sequence repeats)
- The majority are not in coding regions.
- Microsatellites.
- 3% of total DNA.
- 1 per 30 kbp.
- Units of 1-6 bp repeated in tandem up to 100 times.
- Different loci can have different repeating units.
- Members of a single family are probably similar.
- Used to trace paternity.
- Huntington's disease:
- Autosomal dominant (only 1 affected parent needed).
- PolyQ (trinucleotide expansion, CAG repeats).
- Caused by a CAG repeat in the HTT gene.
- Under 35 repeats is normal; more than that is not good.
- The repeats keep growing as generations pass, giving earlier onset and more severe Huntington's disease.
Copy number variants
- Smaller ones seem to be more abundant.
- Segments of 1 kb or more found more than once (duplication).
- As important as SNPs, or more.
- Less abundant but affect more nucleotides.
- Can be detected using CGH.
OCTOBER 23
Population level sequencing
1000 Genomes Project
- Goal was to create a deep catalogue of human genetic variation.
- Looking for variants with a frequency of less than 1%.
- Went in with Sanger sequencing.
Exome Aggregation Consortium
- Catalogued genetic variation only in protein-coding regions.
- The advantage of focusing on protein-coding regions is that you do not need to sequence as much.
- This study looked at SNPs.
Genome aggregation
- First one to fill the gap.
- Focused on variants larger than 50 nucleotides.
- Characterized SNPs, deletions, duplications, translocations.
Results
- The study was named gnomAD-SV and was compared to the 1000 Genomes Project and other population-level studies.
- Sequenced 14237 individuals with 32x coverage.
- 46% European, 9.2% EAS, 8.7% Americas, 34.9% African, and 1.2% other.
- Representative of the adult population, with an average age of 49 years, and depleted of severe Mendelian disorders.
- Separated into CNVs (unbalanced changes) and other SVs (balanced structural variants).
- Complex SV: 2 or more other types of structural variant combined.
- Results:
- Deletions are the most common, then insertions and duplications.
- Inversions are detected less frequently; this may be because there are fewer of them or because of an assembly issue.
- 8775 SVs in an African genome; this holds true in the 1000 Genomes data and goes back to the founder effect and the out-of-Africa theory.
- Most SVs are small and fewer are large; the mean size is 331 bp, and most are rare.
- Singletons are ultra rare: one copy in one individual.
- Structural variants are very rare.
- Smaller SVs are the most abundant.
- Most complex SVs include an inversion and/or a duplication.
Information on ethnic backgrounds
- The Americas have the most variability.
- African genomes have the most variant sites per genome.
- Out-of-Africa theory: other populations were established by a smaller group of people.
- Europeans and Asians have similar numbers of variants per genome.
- A large number of variants points to an older population.
- Differs from 4.1 to 5 million.
OCTOBER 28th
Looking at structural variants in chromosomes
- mCNVs and duplications cluster around centromeres and telomeres, where the repetitive regions are.
- The others tend not to occur in repetitive regions.
- Summarise the key findings of the paper.
Why do we care about cataloguing genomic variants?
- Some genomic variants are disease causing.
- Variants occur in junk DNA and in non-repetitive regions.
- Many have no obvious phenotypic effects.
- Used for hereditary genetic disease testing and for finding disease-causing genes.
Finding disease causing genes
Mendel's laws
1. Law of segregation: parental alleles segregate randomly (chromosomes).
2. Law of independent assortment: pairs of alleles segregate independently (alleles).
Crossing over and gene linkage
- Occurs during meiosis I.
- Crossing over occurs between 2 non-sister chromatids of homologous chromosomes.
- After segregation there are 2 chromatids with the parental haplotype and 2 that carry a shuffle of the alleles.
- Hence 50% are parental and 50% are recombinant.
- Sequences in close proximity are typically inherited together, which is an exception to law 2.
- For these to separate, the initial DNA break must occur between the two nearby sequences before recombination.
- Otherwise: 2 germ cells with parental genotypes.
- Recombination frequency:
- Number of recombinants over total progeny.
- The closer two sequences are, the less likely they are to be separated during crossing over.
- Recombination relates to distance (1% recombination = 1 cM).
- Allows us to use SNPs as markers, because they are the easiest to genotype and the best characterized.
- Use them to follow a precise inheritance pattern without knowing where the gene is.
- Find the DNA variants that are most common in a group.
- Then use these to find the genes nearby: SNPs can lead us to the causal genes even though the SNPs themselves do not cause the disease.
- Finding genes responsible for complex traits.
Mendelian disorders
- Mutations in genes, inherited in Mendelian fashion.
- Allow us to study inheritance patterns.
Dissection of complex traits
- Do not follow Mendel's laws because usually more than one gene controls the phenotype.
- Inheritance patterns:
- Incomplete penetrance: mutant genotype but not mutant phenotype (they carry the genotype but may never develop the phenotype).
- Phenocopy: the mutant phenotype is not caused by the genotype, e.g. breast cancer caused not by an inherited genotype but by a sudden mutation.
- Genetic heterogeneity: mutations at more than one locus cause the same phenotype; different genotypes give rise to the same phenotype.
- Polygenic heredity: 2 or more genes influence the expression of a phenotypic trait.
- Polygenic traits are usually phenotypes with a broad spectrum.
- So how do we find the loci responsible for the disease phenotype?
Haplotype association analysis
- Haplotype: haploid genotype.
- On one of the two chromosomes.
- A cluster of variants (group of SNPs): two or more variants that are present on the same chromosome and genetically linked, so they are inherited together.
- We associate haplotypes with disease phenotypes.
- For SNPs to be useful, the genetic distance has to be small, around 1-100 kb, so the haplotype is not broken up.
- Linkage disequilibrium:
- Non-random association of alleles at adjacent loci.
- The alleles are no longer separating randomly.
- Tag SNPs (representative of a haplotype):
- Choose tag SNPs to represent the haplotype.
- Indirect association, looking for correlated or causative variants.
- Region of high linkage disequilibrium: a region of SNPs that tend to stay together.
October 30th
Genome wide association studies
- Used when different mutations cause the same phenotype and pedigree analysis cannot be used.
- Most SNPs are not causative but can be correlated with disease.
- A large-scale, genome-wide way to scan for polymorphic markers in a large population (1k analysis).
- Looking for genetic variants that correspond to a certain disease.
- If one is found, it could be close to a disease-causing mutation, because the two are typically passed on together and are probably in linkage disequilibrium.
- Revolves around SNP genotyping: determine the genotype at as many positions across the genome as possible.
- Ways to do SNP genotyping (the goal is to determine which variant is present at as many places as possible):
- Whole-genome
- Targeted: SNPs of interest, using a narrow approach
- Microarray (most common)
- Can assay millions of markers at once.
- Illumina
- 1. Denature and amplify genomic DNA.
- The material is amplified if there is not enough of it. Most of these methods are based on fluorescence; here fluorescence is used too, but the gDNA itself is not labelled.
- 2. Fragment, purify, and precipitate.
- 3. Hybridize to a BeadChip.
- Add the fragments to an array, where they find and bind complementary sequence. The gDNA is not labelled as it is in CGH (contrast).
- Each bead is coded with a sequence-specific oligonucleotide.
- The oligos are designed to be complementary all the way up to the SNP, and that is where the probe binds.
- Each bead is for a different SNP.
- 4. Wash away what we do not need.
- 5. Detection step: use polymerase to extend the oligo probe.
- Keep in mind that the probe is attached by its 5' end so that its 3' OH can be extended.
- DNA polymerase adds the complementary nucleotide.
- All 4 nucleotides are fluorescently labelled.
- All nucleotides are added at the same time.
- 6. Image the BeadChip; infer the identity at each position.
- 3 possibilities:
- Green: C
- Green and red: G and A
- Red: T
GWAS
- 1. Recruit affected individuals (and unaffected controls for comparison).
- 2. Genotype: which SNP allele at each position.
- 3. Statistics: find SNPs that are more frequent in affected individuals.
- 4. Identify genes that are close to that sequence in the gDNA.
- May use biobanks.
- A biobank accepts, processes, and distributes biospecimens.
- Multistage approach:
- Stage 1: small, with a liberal p value; genotype as many SNPs as possible.
- If it is too stringent we risk missing some SNPs.
- Used a bead array.
- Stage 2: more stringent p value.
- Take the SNPs that were statistically significant and genotype them in a larger population.
- Stage 3: optional, more stringent.
- Manhattan plot:
- Genome-wide view of all SNPs.
- The SNP that had been their positive control was identified as their most significant SNP, with a low p value.
- PAR: proportion of incidence due to exposure.
- The effect of DNA variants on human health.
- SNPs that are found in a coding region.
Precision Medicine
- Adverse drug reactions:
- Caused by variants in gDNA.
- Pharmacogenomics:
- Looking for variants that affect drug response.
- Maybe make treatment plans?
- 1. Blood clotting
- Intrinsic pathway: damage from within. Extrinsic pathway: damage from outside; rapid.
- Both lead to activation of factor X.
- Active factor X cleaves prothrombin to thrombin (a serine protease), which cleaves fibrinogen (soluble) to fibrin (insoluble).
- Vitamin K dependent.
- Warfarin: anticoagulant drug; does not change viscosity but slows down the clotting cascade.
- It inhibits the vitamin K dependent synthesis of active forms.
- Factor X needs vitamin K to turn prothrombin into thrombin.
- Thrombin needs vitamin K to turn fibrinogen into fibrin.
- Warfarin targets the enzyme that aids vitamin K recycling.
- The reduced form of vitamin K is the one that acts as a cofactor.
- The oxidized form has to be reduced again to be used.
- Blocking the supply of reduced vitamin K reduces the efficiency of the clotting cascade.
- Cytochrome P450:
- Superfamily of enzymes.
- Turns prodrugs into active drugs.
- Turns active drugs into inactive forms.
- Warfarin is metabolized by CYP2C9.
- Warfarin has both S and R forms.
- S is the most effective.
- CYP2C9 converts warfarin into hydroxywarfarin, which cannot inhibit vitamin K recycling.
- Differences in response come from variants in CYP2C9:
- *1: wild type (normal response to warfarin)
- *2: variant (amino acid substitution from arginine to cysteine)
- *3: variant; a SNP changes a codon, which changes the sequence and alters activity.
- Both *2 and *3 cause reduced enzyme activity.
- Conversion to inactive hydroxywarfarin is slower, so the active form stays in your system longer.
- *1/*1: normal
- *1/*2: intermediate to poor and slow
- *2 has a 20-40% reduction
- *1/*3: 40-80% reduction
- Poor metabolizers need a lower dose.
- Because the active form stays longer.
- So inhibition of vitamin K recycling lasts longer, which decreases the ability to clot.
- VKORC1:
- Encodes the enzyme that converts oxidized vitamin K to reduced vitamin K (the warfarin target).
- Recycles vitamin K.
- Contains a SNP in the promoter region (G/A).
- The A allele results in decreased transcription of the gene.
- GG: high
- AA: low
NOVEMBER 2ND
Precision Medicine - Treatments and cures
- Protein therapeutics:
- Give recombinant proteins.
- Limitation: how do the proteins enter the target cell?
- Limited to extracellular uses.
- Viral and non-viral delivery:
- Treating loss-of-function genes.
- Limitations: reduction is never 100%.
- Off-target effects.
- RNA modifications:
- Complementary to the RNA; reduces the defective gene product.
- Off-target effects: silencing something important.
- Safety: random insertion.
What if we could alter the genomic DNA sequence instead?
Editing DNA
1. A mechanism that lets you target a specific DNA sequence.
2. A mechanism that lets you cleave or cut the DNA at that site.
Zinc finger motifs
- Fusion proteins: DNA-binding domains in tandem, joined to a nuclease domain.
- Made of the non-specific nuclease domain of the FokI restriction enzyme attached to a series of zinc finger proteins.
- Zinc fingers are found in transcription factors.
- Each zinc finger recognizes 3 nucleotides; each finger has a different amino acid sequence.
- Use these amino acid sequences to build a fusion protein that recognizes a DNA sequence of our choice.
- Summary: to target sequence X, fuse 3 different zinc fingers into a larger protein and fuse the FokI endonuclease to it, so that it binds the DNA and cuts at this position. Zinc finger nucleases bind as dimers because FokI works as a dimer; we need both halves to break dsDNA.
Zinc finger nucleases are a way to produce a precise and localised DNA break.

TALEN: transcription activator-like effector nuclease
- We need two halves of a dimer
- Also uses FokI, which must dimerize to break the DNA
- Instead of zinc fingers, uses TALE repeats, each made of 33-35 amino acids, all the same except two hypervariable residues which differ between repeats
- These two residues allow each repeat to recognize a specific nucleotide
- We have to build a large protein, one repeat per nucleotide, so it can recognize the target sequence

This break must be repaired.

Endogenous repair pathways

NHEJ
- Works through a large protein complex which recognizes the free ends, binds them and makes them compatible; the proteins bring the ends together and a ligase remakes the phosphodiester bond
- Very error prone: insertions or deletions happen at ligation and can cause a frameshift
- These errors have been used to disrupt gene expression
- Most used pathway; operates during the whole cell cycle
- Disrupts gene expression

Homology directed repair
- A large complex recognizes and binds the ends, creates a 3' overhang, searches for its complementary region, forms a Holliday junction and uses the homologous chromosome as a template to fix the break, which can result in gene conversion
- High-fidelity repair, but takes time
- Limited to certain stages of the cell cycle: late S phase and G2 phase, when a homologous template (the sister chromatid) is nearby
- Repairs the defective allele

CRISPR-Cas9
Clustered regularly interspaced short palindromic repeats
- Defense mechanism for bacteria
- 3 types of system: I, II, III
- 2 components: single guide RNA (sgRNA) and Cas9
- Cas9 cuts the DNA
- It has 2 nuclease domains: 1 cleaves the sense strand and 1 cleaves the antisense strand
- The sgRNA takes Cas9 to a specific region, because you can make it complementary to the target sequence
- The sgRNA is engineered
- The 20 nucleotides at its 5' end, called the protospacer region, are complementary to the target sequence
- This determines the binding region
- The protospacer region confers specificity
- The target must be upstream of a PAM, the sequence right beside the protospacer region
- The typical PAM is 5'-NGG-3'
- The 20-nucleotide protospacer binds adjacent to this PAM sequence
- The sgRNA binds upstream of the PAM, on the strand opposite the PAM
- When it binds it displaces the duplex and creates a single-stranded DNA region
- This directs Cas9 to cleave the DNA with its RuvC and HNH domains
- Always cleaves 3 nucleotides in the 5' direction from the PAM, always upstream of the PAM
- Then NHEJ or HDR repairs the break
- A series of hairpins at the end of the sgRNA helps stabilize the complex

Getting into the nucleus: different ways
- Mammalian cells: plasmids (transient method)
- Purify Cas9 and the RNA in vitro and introduce the pre-assembled complex
- Viral particles introduce DNA that codes for Cas9 and the RNA; this integrates into the genome and produces Cas9

Comparing Cas9 to zinc fingers and TALENs
- Depends on DNA-RNA interactions instead of DNA-protein interactions
- DNA-RNA pairing is more precise; protein binding is harder to predict
- Much easier to retarget than the others: just change the first 20 nucleotides of the sgRNA
- For zinc fingers or TALENs you need to work out amino acids and exon counts
- Advantage: we can introduce many different sgRNAs at once
- Multiple gene knockouts: target many genes at once to prevent gene expression
- Exon exchange to correct a DNA mutation, large-scale deletions

Variants
- Nickases: 1 of the 2 nuclease domains has been mutated, so 1 domain is inactivated
- D10A: RuvC is inactivated
- HNH still cuts the strand bound by the guide RNA
- RuvC cuts the strand opposite the guide RNA
- H840A: HNH is inactivated
- Only the strand opposite the guide RNA is cleaved
- Why would we want only one strand cut? We can use nickases as a pair

Pair nicking: we introduce a pair of sgRNAs which bind close to each other, and Cas9 nicks their respective strands. This makes a staggered DNA end with overhangs. It increases specificity by increasing the odds that the DNA break is at the target region: instead of 20 nt being recognized, 40 nt are. The two nicks need to be close together so the DNA repair machinery recognizes them as a double-stranded DNA break. The odds that both guides bind close together in the wrong place are low.
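The targeting rules above (a 20 nt protospacer directly upstream of a 5'-NGG-3' PAM, with the cut 3 nt upstream of the PAM) can be sketched in Python. This is a rough illustration, not a guide design tool: it scans only the strand it is given (a real search would also scan the reverse complement), it ignores off-target scoring, and `find_cas9_sites` and its field names are hypothetical.

```python
import re


def find_cas9_sites(dna: str, protospacer_len: int = 20) -> list[dict]:
    """List candidate Cas9 sites on the given strand, read 5' to 3'.

    A site is a protospacer of `protospacer_len` nt immediately followed
    by an NGG PAM. Only the supplied strand is scanned (assumption).
    """
    dna = dna.upper()
    sites = []
    # The lookahead lets overlapping PAMs be found without consuming characters.
    for match in re.finditer(r"(?=([ACGT]GG))", dna):
        pam_start = match.start()
        if pam_start < protospacer_len:
            continue  # not enough sequence upstream for a full protospacer
        sites.append({
            "protospacer": dna[pam_start - protospacer_len:pam_start],
            "pam": match.group(1),
            "pam_start": pam_start,
            # Cut falls 3 nt upstream of the PAM: between dna[cut_index - 1] and dna[cut_index].
            "cut_index": pam_start - 3,
        })
    return sites


example = "ACGTACGTACGTACGTACGT" + "AGG" + "TT"
for site in find_cas9_sites(example):
    print(site["protospacer"], site["pam"], site["cut_index"])
```

Retargeting Cas9 only means choosing a different 20 nt window next to a PAM, which is why it is so much easier than redesigning a zinc finger or TALE protein.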
- Pair nicking uses the D10A nickase

Off target effects
- Unexpected adverse alteration to the genome
- One drawback of using CRISPR-Cas9
- Hence use paired nicking
- Occur at unintended loci

Nov 6th

How does dead Cas9 (dCas9) work
- Catalytically inactive and unable to cleave DNA
- The guide RNA still binds and brings dCas9 to the target; dCas9 binds there and, with a domain attached, can act as an activator at the target
- We can now fuse an effector domain to dCas9

Base editing

Cytosine base editor
- Cytosine undergoes spontaneous deamination
- It becomes uracil, which triggers the repair machinery to fix it; if it is not fixed, uracil is read as a T and an A is put across from it
- Any spontaneous deamination that is not fixed causes a mutation
- This endogenous system is exploited as a base editing tool
- Want to convert a C-G base pair to a T-A base pair
- Similarities with Cas9
- A deaminase fused into the protein complex is what deaminates the cytosine
- We have to inhibit UNG (uracil N-glycosylase) so the change is kept
- UGI inhibits UNG

Activity window
- The guide RNA binds the strand opposite the PAM
- BE3 has an activity window of positions 4-8
- Cytosines in the window will be edited
- Any cytosine in the activity window will be edited, so there should be only one cytosine in the window

Adenosine base editor
- The opposite conversion
- Deaminated adenosine becomes inosine, which is read as G by DNA polymerase
- No known natural enzyme deaminates adenosine in DNA
- The guide RNA binds the strand opposite the PAM
- The edit occurs in the activity window defined by the sgRNA
- Adenosine becomes deaminated
- The D10A nickase recruits the repair machinery
- The machinery recognizes the mismatched base pair
- It changes the nucleotide on the other strand

Base editors are more reliable than Cas9
- Less chance of off-target effects than CRISPR-Cas9
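The activity window idea can be sketched in Python. Assumptions that are not in the notes: positions are counted from 1 at the 5' (PAM-distal) end of the 20 nt protospacer, every cytosine inside the window is converted, and nothing outside it is touched. Both function names are hypothetical.

```python
def cytosines_in_window(protospacer: str, window: tuple[int, int] = (4, 8)) -> list[int]:
    """Return the 1-based positions of cytosines inside the activity window.

    Assumption: position 1 is the 5' (PAM-distal) end of the protospacer.
    """
    start, end = window
    seq = protospacer.upper()
    return [pos for pos in range(start, end + 1) if seq[pos - 1] == "C"]


def apply_cytosine_base_editor(protospacer: str, window: tuple[int, int] = (4, 8)) -> str:
    """Model the C-to-T outcome for every cytosine in the window (idealized)."""
    bases = list(protospacer.upper())
    for pos in cytosines_in_window(protospacer, window):
        bases[pos - 1] = "T"
    return "".join(bases)


guide = "GACCTAGCATTTGGACCTAA"
print(cytosines_in_window(guide))         # two cytosines fall in the window
print(apply_cytosine_base_editor(guide))  # both are converted, the C at position 3 is not
```

Two cytosines land in the window here, so both would be edited. That is the reason the notes say a good target has only one cytosine in the window.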
