Summary

These notes provide a detailed explanation of DNA sequencing, including its importance in identifying genes, proteins, and mutations. They also cover the sequencing of genomes in various organisms, from viruses to humans, along with a general overview of the sizes of genomes across different species.

Full Transcript

Sheet 4
Dua’ Al-Shrouf, Malak Salameh, Mamoun Ahram

DNA Sequencing
❖ DNA sequencing is the process of determining the exact order of nucleotides in a genome or DNA fragment.
❖ The DNA sequence is read from 5’ to 3’.
❖ Importance:
- Identification of genes and their localization
- Identification of protein structure and function (if we identify a gene we can identify its codons, and translating those codons into amino acids gives us the sequence of the protein. The protein sequence helps us predict the structure (artificial intelligence and bioinformatic tools can predict the structure with high accuracy), and as a result we can predict the function, the localization, which proteins it can interact with, and so on)
- Identification of DNA mutations (we have databases of what a normal human genome should look like, so we can compare the sequence of our unknown DNA, or DNA from someone with a certain disease, to these databases and pinpoint exactly where a mutation occurs and how it is related to a certain disease)
- Genetic variations among individuals/populations in health and disease (by knowing the DNA sequence of an individual we can tell how variable it is in comparison with other individuals)
- Prediction of disease susceptibility and treatment efficiency (we can predict which diseases people may get as they grow older, or whether they are slow metabolizers of certain medications because they carry a genetic variant. We can also determine the best treatment for an individual, and if we know the sequence of a patient’s cancer we can determine which treatment is best for that cancer. So, we can help individuals take care of themselves)
- Evolutionary conservation among organisms (how organisms are related to each other on the evolutionary tree, so we can compare human genes to mouse genes and so on.
This might help us understand our cells and diseases, and how cells function normally or when there’s a mutation, because we use animals as model systems for humans)

DNA sequencing of organism genome
❖ Viruses and prokaryotes first
❖ Determining the sequence of genomes started with simple genomes, like viral and bacterial genomes, because they were small and easy to handle
❖ Then mitochondrial DNA was sequenced, followed by simple eukaryotes
❖ The first eukaryotic genome to be sequenced was that of the yeast Saccharomyces cerevisiae
❖ Then the genomes of multicellular organisms were sequenced, like the nematode Caenorhabditis elegans (a worm)
❖ Along the way the human genome was sequenced, but it hasn’t been completed yet; we still have to determine the final regions or pieces of the Y chromosome (even in science girls are easier to deal with)
❖ Determination of the base sequence of the human genome was initiated in 1990
❖ This table isn’t for you to memorize, but it compares the genome sizes, the number of protein-coding genes, and the number of chromosomes of different organisms
❖ Mbp is mega base pairs, or 10^6 base pairs
❖ Notice that yeast has more chromosomes than the fruit fly even though the fly is more complex
❖ The mouse, dog, and chimpanzee genomes have numbers similar to those of the human genome

Nucleotides per genome
❖ This isn’t for you to memorize
❖ This diagram compares the genomes of different organisms, starting with bacteria and ending with mammals including humans, so you can see how variable genome sizes are in different species. Genome size is sort of related to the complexity of the organism (not 100%), and there’s a lot of variation

DNA synthesis/elongation
❖ In picture (a) we can see the nucleotide structure, which is composed of a deoxyribose (sugar), three phosphate groups, and a base
❖ The nucleotide is the substrate of DNA polymerase
❖ The sugar is deoxy
because it’s missing a hydroxyl group on the second carbon
❖ Note that carbon number 3 (C3) has a hydroxyl group attached to it
❖ While carbon number 5 (C5) carries a triphosphate group
❖ So, whenever DNA is synthesized, this phosphate attaches to the hydroxyl group at the 3’ (three prime) end (so nucleotides are added to the DNA at the 3’ end), forming a phosphodiester bond
❖ So, the phosphodiester bond forms between the hydroxyl group on the third carbon (of the nucleotide at the end of the developing DNA strand) and the phosphate on the fifth carbon (of the incoming nucleotide that will continue the formation of this strand)
(Note: Nucleotide = deoxyribonucleoside triphosphate (dNTP); Nucleoside = nucleotide minus the triphosphate)
❖ Note that energy is needed to form this bond; this energy comes from the release of two phosphate groups (from the triphosphate attached to the 5th carbon of the incoming nucleotide)
❖ The formation of the strand continues in a downward direction (from the 3’ end)
❖ So, in order to synthesize DNA we need a triphosphate nucleotide and a hydroxyl group at the three prime end

1) The basic method of DNA sequencing (radioactivity)
❖ The technique of DNA sequencing was originally based on the use of a dideoxyribonucleotide
❖ The dideoxyribonucleotide is similar to the deoxyribonucleotide, but here both the second and third carbons are missing a hydroxyl group
❖ IMP: This means that whenever a dideoxyribonucleotide is added to the developing DNA, no other nucleotide can be added to the three prime end (3’) and no phosphodiester bond can form, because there’s no hydroxyl group on carbon number 3 (there’s no reactive group where the phosphate of the incoming nucleotide can bind)
❖ So adding this nucleotide stops DNA synthesis at that point (it is the very last nucleotide added to the DNA)
❖ And this is the method used for DNA sequencing

The process
❖ The idea here is that, in
order to sequence a DNA, what we need is a primer, just one primer, because we’re going to sequence just one strand (not both strands)
(ddATP: dideoxyribonucleoside triphosphate A; ddTTP: dideoxyribonucleoside triphosphate T; ddGTP: dideoxyribonucleoside triphosphate G; ddCTP: dideoxyribonucleoside triphosphate C)
❖ We also need a DNA polymerase
❖ We need the primer because DNA polymerase cannot start DNA synthesis from scratch
❖ DNA synthesis is initiated from a primer that has been labeled with a radioisotope (like radioactive phosphorus) (it’s labeled because the incoming nucleotides will be attached to it)
❖ We also need the substrates of DNA polymerase, which are the four deoxynucleotides (A, T, C, G). Four separate reactions are run, each including the deoxynucleotides plus one dideoxynucleotide (either A, C, G, or T)
❖ Incorporation of a dideoxynucleotide stops further DNA synthesis because no 3’ hydroxyl group is available for addition of the next nucleotide
❖ So we’re going to have four reactions in four tubes, and in each tube we’re going to have:
1) The template (the DNA strand that is complementary)
2) DNA polymerase
3) The four substrates (A, T, C, G)
4) A single primer (that has a known sequence)
5) One dideoxynucleotide (for example, in tube 1 add dideoxy G, in tube 2 add dideoxy C, in tube 3 add dideoxy T, and in tube 4 add dideoxy A)
- The idea here is that DNA polymerase will start synthesizing DNA. If it adds a deoxyribonucleotide it can continue, but if it adds a dideoxyribonucleotide the synthesis is terminated, because no nucleotide can be added to the dideoxy three prime end of the DNA (and the type of this dideoxynucleotide tells us the next nucleotide in the sequence)
A helpful video: https://youtu.be/dVRB4CaLizc?si=pmD-jfbwFa6_xIYn
[Figure labels: Primer; Template; Polymerase; ddNTPs; Substrates. The drawn 5’ and 3’ ends are for the developing strand and not for the template]
❖ As we can see here, in tube 1 the dideoxy G was added to the primer
and the synthesis stopped, which tells us that nucleotide G is the first in the sequence after the primer sequence (which is known). Looking at the fourth tube, after a deoxy G was added a dideoxy A was added and the synthesis stopped, which means that nucleotide A is the second in the sequence, and so on
❖ Notice that the tube 4 fragment is one nucleotide longer than the tube 1 fragment, the tube 3 fragment is one nucleotide longer than the tube 4 fragment, and the tube 2 fragment is one nucleotide longer than the tube 3 fragment

Generation of fragments
❖ A series of labeled DNA molecules is generated, each terminated by the dideoxynucleotide in its reaction
❖ Having different fragments means having DNA fragments of different lengths
❖ These DNA fragments differ by one nucleotide, which means we’re going to have a sequence of nucleotides (as we said above)
❖ These fragments of DNA are then separated according to size by gel electrophoresis and detected by exposure of the gel to X-ray film (that is if we use radioactivity, which is not used anymore because it’s risky and there’s a much easier method now)
❖ The size of each fragment is determined by its terminal dideoxynucleotide, so the DNA sequence corresponds to the order of fragments read from the gel
❖ Note: the primer is labeled so that we can see the attached fragment in the gel under X-ray

Example
[Figure labels: Primer; Template]
❖ Here we have a template that we want to sequence. To know the order of the nucleotides we use a primer (that has a known sequence) labeled with radioactive phosphorus. We add to the tube a DNA polymerase and the 4 normal substrates, and then we add to each tube one dideoxynucleotide; for example, here dideoxy G (ddGTP) was added at a low concentration
❖ So now whenever the DNA polymerase sees a C on the template it has two choices:
A) Adding a normal deoxynucleotide (substrate) G (dGTP)
B) Adding a dideoxynucleotide G (ddGTP)
❖ The higher probability here is that it’s going to add a normal substrate (dGTP) because ddGTP exists at a low
concentration
1) So in the pic, let’s say we started with 1000 template molecules. The DNA polymerase saw a C in the template, so it has two choices, and since ddGTP is at a low concentration let’s say the polymerase adds it to 100 molecules. We still have 900 molecules with the normal dGTP, so the DNA polymerase can continue DNA synthesis in these 900 molecules, but synthesis in the other 100 molecules is terminated
- These 100 molecules are represented in the picture; notice that we’re going to have a very short fragment
2) For the other 900 molecules DNA synthesis continues by adding normal deoxynucleotides. Then the DNA polymerase recognizes a C in the template again, and here we also have two choices (probabilities): adding a dGTP or a ddGTP, so synthesis continues or stops respectively
- So, let’s say that for 100 of the 900 molecules a ddGTP is added; the synthesis will be terminated for these 100 molecules
- DNA polymerase continues the synthesis of the rest of the molecules (800) until it reaches another C
3) Reaching another C means that we have to add a G again, and we have two choices. As we know, adding a ddGTP stops the synthesis, so let’s say the synthesis stops for another 100 molecules. Now we have 700 molecules left for the DNA polymerase to synthesize, and so on
❖ Notice that in those three cases we’re going to have DNA fragments of different lengths

Example
❖ We have four tubes here. In each one of them we have the double-stranded DNA (the DNA that we want to sequence), a labeled primer, the substrates, and DNA polymerase
❖ In each one of these tubes, we add a specific ddNTP (the pink colored nucleotides):
- in tube 1: ddATP
- in tube 2: ddTTP
- in tube 3: ddCTP
- in tube 4: ddGTP
❖ We took one of the DNA strands as a template and added our known primer.
The first nucleotide in the template (after the nucleotides paired with the primer) is T, so an A should be added to the developing (complementary) sequence
❖ We can notice that in tubes 2, 3, and 4 synthesis would continue normally because there’s only dATP
❖ But in tube 1 you have both ddATP and dATP, so the DNA polymerase here has two choices. There’s a higher probability that the polymerase adds a dATP, but it adds ddATP to some fragments, and in those synthesis stops with the addition of one A after the primer.
❖ Let’s say a normal substrate was added to the rest of the fragments in tube 1; the synthesis will continue (ATGTC), and then we will face another T in the template.
❖ Here we go back to the two probabilities: if the polymerase adds a dATP synthesis continues, but for some of the DNA fragments a ddATP is added, and that terminates these fragments
❖ Note that in each one of these tubes we’re going to have different fragments of different lengths
❖ The same thing happens in the other tubes: tube 2 contains ddTTP, so all of the resulting fragments end with a T, while in tube 3 the fragments end with ddCTP
❖ Notice that the primer is labeled, which means it gives us a signal, so we separate the DNA fragments using gel electrophoresis (separated according to size)
[Figure labels: The shortest fragment; the 2nd shortest; the 3rd shortest]
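As a rough sketch of the four-tube example (not part of the original sheet), the Python below lists the chain-terminated fragments each tube could contain and then reads them from shortest to longest, the way the gel is read from bottom to top. The function names `tube_fragments` and `read_gel` are made up for illustration. The strand ATGTCAGTCCAG is the newly synthesized sequence that this example's gel gives. The sketch assumes every possible termination product is present and counts fragment lengths after the primer.

```python
def tube_fragments(new_strand, dd_base):
    """Fragments possible in the tube holding dd<dd_base>TP.

    Synthesis can stop at every position where the new strand has dd_base,
    so each fragment is a prefix of the new strand ending in that base.
    """
    return [new_strand[: i + 1]
            for i, base in enumerate(new_strand)
            if base == dd_base]


def read_gel(tubes):
    """Read the gel from bottom (shortest band) to top (longest band).

    The lane (tube) a band sits in tells us its last base.
    """
    bands = [(len(fragment), lane_base)
             for lane_base, fragments in tubes.items()
             for fragment in fragments]
    return "".join(lane_base for _, lane_base in sorted(bands))


# Newly synthesized strand from the example (primer not included).
new_strand = "ATGTCAGTCCAG"

# One tube per dideoxynucleotide, as in the four-tube setup.
tubes = {base: tube_fragments(new_strand, base) for base in "ATCG"}

for base, fragments in tubes.items():
    print(f"tube with dd{base}TP:", fragments)

print("read 5' to 3':", read_gel(tubes))
```

Each fragment length appears in exactly one tube, which is why sorting the bands by size recovers one letter per position.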
❖ The three shortest fragments move really fast in the gel (small size)
[Figure labels: The 2nd longest; the 5th; the 3rd; the longest]
❖ These large fragments aren’t going to move fast
[Figure labels: Large fragments; Small fragments]
❖ The larger the fragment the slower it moves, so the larger fragments are seen at the top and the smaller fragments at the bottom
❖ The largest fragment is at the highest point, while the smallest fragment is at the lowest point
❖ These fragments differ from each other by just one nucleotide (so, after the primer, the smallest fragment at the lowest point has one nucleotide, the second smallest has two, and so on)
❖ The resolution of the gel is really high, so we can separate the DNA fragments based on size even if they differ by one nucleotide
❖ Each column represents a tube, so the fragments seen in the first column are the ones found in tube 1; whenever we see a band there, we know that it ends with a ddATP
❖ So, by going from the bottom (the smallest fragment, which has one nucleotide, a ddNTP) to the top (the largest fragment, which has the longest sequence and ends with a specific ddNTP) we can read the sequence of the newly synthesized DNA
❖ So, for example, the first fragment at the bottom ends with an A because it’s in the first column, so the sequence starts with an A
❖ And the longest fragment (at the highest point) is in the 4th column, so the sequence ends with a G
❖ The bottom represents the 5’ end while the top represents the 3’ end, and in this example the DNA sequence read from the bottom to the top is (ATGTCAGTCCAG)
❖ By knowing the newly synthesized DNA we can work out the sequence of the template (because it’s complementary to the synthesized DNA)

Example
❖ Here we can see different results for a different template; the pic on the left is a gel electrophoresis done for the synthesized DNA
❖ Here again we read from the bottom (the 5’ end) to the top (the 3’ end) to know the sequence
❖ What’s the sequence (for the photo on the
left)? (The Ans: TGCGGGCTTATCGGGTCTAA)
❖ The pic on the right explains how switching the gel (hypothetically) can give us the sequence of the template (by switching the letters, which means switching between the C and G columns and between the A and T columns)
❖ Read the switched gel from the top to the bottom (top = 5’, bottom = 3’)
❖ So, the columns are ATCG from the left, but when we switch them they’ll be TAGC from left to right
❖ Note: the fragment distribution isn’t switched with the letter, so the A in a switched gel will have the fragment distribution of the T in the newly synthesized DNA
❖ The switched gel is a hypothetical gel for the template
❖ You can easily find the template sequence without using the gel by taking the complement of the newly synthesized sequence and reading it in the opposite direction, because the template is the complementary strand of the newly synthesized DNA
❖ Remember that DNA is anti-parallel (the 5’ end of one strand meets the 3’ end of the other strand)
❖ Let’s say that two bands appear at the same level (as we can see in the first picture), which means they have the same size, so they have the same number of nucleotides but differ in the type of the last one (the ddNTP)
(The paternal chromosome: from the father. The maternal chromosome: from the mother)
❖ There doesn’t have to be only one band at a given level; you can have two bands of the same length. Remember our cells are diploid, which means we have two copies of each chromosome.
So whenever we take DNA from an individual, we’re sequencing both chromosomes (the paternal and the maternal)
(Side note: when we started sequencing human genomes, we learned that humans are pretty identical, but we saw specific changes between people in one letter at a time, which we now call a single nucleotide polymorphism)
❖ The DNA polymerase in this case uses each chromosome as a template (we have two templates), and it reads them at the same time
❖ So, it reads the same sequence from both templates (they would be exactly the same) except where there are genetic variants (there are differences in DNA sequences among individuals)
❖ So, at a certain point the DNA polymerase might read a T on one chromosome and at the same time read a G on the other chromosome (genetic variant). These are single nucleotide polymorphisms (individual number 3 in the lower explanation)
❖ After this genetic variant the two sequences go back to being identical until the polymerase faces another genetic variant
(This photo is for you to understand)
❖ So you can have DNA fragments at the same level (same length) migrating together (but in different tubes), and this is a polymorphism or it can be a mutation. This person is heterozygous/a carrier (depending on whether it’s a polymorphism or a mutation)
❖ How can we differentiate between polymorphism and mutations?
If it exists in more than 1% of the population, it is a polymorphism, but if it exists in less than 1% then it’s a mutation
❖ If we sequenced the DNA of a normal individual and saw a band at a certain level in the A column, then sequenced the DNA of another individual with a certain disease and saw a band at the same level but in the T column, this means that the second person has a mutation or polymorphism on both chromosomes, so this person is homozygous (individual number 2 in the lower explanation)
❖ Explanation: for example, in a normal person (individual 1) both chromosomes have an A at nucleotide number 10, so we will see one band in column A at a certain level. Then we sequenced the DNA of another individual (individual 2) and both of his chromosomes have a T at nucleotide no. 10, so we will see one band in column T. That means that this person has a mutation/polymorphism in both chromosomes, because he was supposed to have an A at this level. If we sequenced the DNA of a third person (individual 3) who has an A at nucleotide 10 in the maternal chromosome and a T at nucleotide 10 in the paternal chromosome, we will see two bands at the same level in columns A and T. This person has a polymorphism or a mutation in one chromosome, and he’s heterozygous/a carrier
Picture 1 (individual 1): The normal site of this band is in column A; a normal person, homozygous (same nucleotide in both chromosomes)
Picture 2 (individual 2): The band should’ve been in column A at this level but it’s in column T; a person with a mutation/polymorphism in both chromosomes, homozygous
Picture 3 (individual 3): There are 2 bands at the same level, one in column A, which is the normal site, and the other one in column T; a person with a mutation/polymorphism in one chromosome, heterozygous/a carrier
❖ Then we have to study whether this genetic variant or mutation is pathogenic or not

2) Fluorescence-based DNA sequencing
❖ Working with radioactivity isn’t really
friendly, it’s hazardous. The radioactive phosphorus (used to label the primer) can get into the body of the individual working with it and cause what are called induced mutations (it’s harmful)
❖ So, scientists suggested using fluorescence instead of radioactivity, which would also make the work less laborious (easier), because the whole process of dealing with 4 tubes, then looking at lanes and trying to read the sequence, is really laborious
❖ The scientists also suggested making DNA sequencing automated by letting an instrument read it
❖ We use fluorescent substrates, so in addition to the (dATP, dTTP, dGTP, dCTP) we’re going to have ddATP, which gives a red fluorescence, ddCTP, which gives a blue fluorescence, ddGTP, which gives a green fluorescence, and ddTTP, which gives a yellow fluorescence
❖ Reactions include the four deoxynucleotides plus the four dideoxynucleotides in the same reaction, with each ddNTP labeled with a unique fluorescent tag.
❖ In a single tube we’re going to have:
1) the template
2) the primer (not labeled)
3) the normal substrates (dNTPs)
4) the fluorescent substrates (ddNTPs), each of which gives a certain color
❖ The DNA fragments are easily read. In the picture above, in one single tube, the DNA polymerase reads a G and adds a dCTP or a ddCTP, so in some fragments synthesis stops when it adds a ddCTP, but for the majority the synthesis continues
❖ Then it sees an A, so it adds either a dTTP for the majority or a ddTTP for some of them
❖ So synthesis continues for the dTTP fragments, and so on
❖ Remember that a fragment ending with a ddCTP fluoresces blue, and fragments ending with ddTTP fluoresce yellow
❖ Now the fragment that ends with a C and the one that ends with a T differ by just one nucleotide, so we can separate our DNA fragments through a gel based on their size even though they differ by one nucleotide, because the resolution of the gel
is very high
[Figure labels: Large fragments; Small fragments]
❖ So, instead of having four lanes, we’re going to have our fragments separated in one lane; the ones at the top are larger and the ones at the bottom are smaller
❖ Each one of these fragments gives a certain color, and as they migrate a sensor reads the fluorescence and turns the results into peaks (as we can see in the picture above)
❖ So this sensor reads a peak and can translate it into a letter
❖ Now we have an instrument that reads the DNA sequence by translating colors into letters
❖ We’re going to have results that look like this picture
❖ Let’s say that the left one is normal and the right one is from an affected individual. You look at the peaks, and the instrument translates each color into a letter (A, T, C, G)
❖ The right diagram shows an individual that has two peaks appearing at the same location (like individual 2 with two bands), so the instrument will say that we have either a polymorphism or a mutation.
This affected person is heterozygous because he has two overlapping bands that both give signals, so the instrument reads both signals at the same time from the same band and puts two peaks at the same location
❖ Even though we have two peaks, the instrument gives us one letter. Since there are two peaks we can figure out that there’s a C on one chromosome and an A on the other one, so this person is heterozygous and has a polymorphism or a mutation (individual 2)
❖ If we knew that a normal person should have a blue peak (C) at a certain location but the diagram shows a green one (A) instead, this individual has a mutation or polymorphism on both chromosomes and he’s homozygous (individual 3) (he has an A on both chromosomes instead of a C)

Next-generation sequencing
❖ Remember that the Human Genome Project cost about 3 billion dollars, and they aren’t done yet because they still have the Y chromosome to finish, and it takes a lot of time. So scientists invented a much faster method to sequence DNA
❖ This method is called next-generation sequencing, and different technologies have been invented so far
❖ So, basically what happens is that a long DNA fragment (a whole genome) is taken (costs $100-500) and we want to sequence it
❖ Cellular DNA is fragmented randomly, so we’re going to have fragments of different lengths (long or short, and they usually fall within a certain range of sizes)
❖ We aren’t dealing with just one molecule; we have millions of human genome molecules, and since all of them are fragmented randomly, some of these fragments overlap
❖ Then DNA adapters are added to both ends of each DNA fragment; these adapters have known sequences
❖ Each DNA fragment is attached to a solid surface/platform, and DNA synthesis goes on for each one of these fragments
❖ The fragments are amplified, like in PCR, using primers that anneal to the adapter sequences (the fragments have different sequences, but all the added adapters have the same
sequence, so the same primer is added to all of them, because this primer is complementary to the adapter)
❖ Four-color nucleotides with terminating ends are added (so sequencing goes on for each one of these fragments)
❖ The platform has billions of clusters (DNA fragments) attached to it. How can we sequence all of these fragments at the same time? Using technology: a camera records what each one of these clusters does and what color each one of them gives
❖ Let’s say we took one of these clusters and added special nucleotides:
1) Whenever one of these nucleotides is added, no other nucleotide can be added
2) In order to add another nucleotide, the nucleotide that was added (at the 3’ end) must be chemically modified (activated). Then the DNA polymerase puts the incoming nucleotide in place, so now no other nucleotide can be added again
3) Each nucleotide gives a certain color, so whenever a nucleotide is added it gives a color, this color is recorded and translated into a letter, and then the nucleotide gets chemically modified so we can add another one
❖ So, one nucleotide is added at a time to each one of the DNA fragments in a cluster
❖ A single nucleotide is incorporated, and unincorporated nucleotides are removed.
❖ The incorporated nucleotide is handled in two ways: it is detected by a special camera, and it is activated so that a new nucleotide can then be added to it.
❖ The cycle is repeated. (The nucleotide is activated so we can add another nucleotide, a color is given, the second nucleotide is activated to add the third one, and so on)
This video is from the slides, and it contains a lot of details that we don’t have to worry about: https://youtu.be/womKfikWlxM?si=MNiYeJFA8AT3cO_P

The detection
❖ This black square is a part of the solid platform
❖ In round 1, we add the special nucleotides.
We can see 4 clusters: in the first one a G is added, while in the 2nd, 3rd, and 4th, C, A, and T are added respectively
❖ The addition of each nucleotide gives a certain color, and a camera on top detects the color that comes out of each cluster
❖ So, the camera tells the computer which nucleotide was added to each cluster
❖ In the second round the added nucleotides are activated (modified) so we can add other nucleotides. In clusters 1, 2, 3, and 4 the nucleotides C, T, G, and A are added respectively, and the camera reads these clusters all at once
❖ In the third round different colors are generated from each cluster and the camera reads them
❖ At the end, the sequence of the DNA in cluster 1, for example, is GCTGA
❖ While the DNA sequence in cluster 2 is CTTAG, and so on
❖ So, we would be able to know the DNA sequence of each cluster

A real look
❖ It really looks like fireworks going on, and all the clusters can be read at the same time by the camera
❖ Then each color from each cluster is translated into a letter
❖ Using bioinformatics, we take the sequence of each cluster, and since there’s an overlap between the different fragments, the computer can put together one sequence from all the DNA fragments by combining all of this information

End of Sheet 4
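The round-by-round detection can be sketched in a few lines of Python. This is only an illustration, not how a real sequencer's software works. The color table below is an assumption: it reuses the ddNTP colors from the fluorescence section of these notes, and the notes do not say which colors the next-generation platform uses. The names `COLOR_TO_BASE`, `rounds`, and `decode_clusters` are hypothetical. The colors are chosen so that clusters 1 and 2 spell the reads given in the notes (GCTGA and CTTAG).

```python
# Assumed color-to-letter table (borrowed from the fluorescence section).
COLOR_TO_BASE = {"red": "A", "blue": "C", "green": "G", "yellow": "T"}

# One dictionary per round: the color the camera records from each cluster.
rounds = [
    {"cluster 1": "green",  "cluster 2": "blue"},    # round 1: G, C
    {"cluster 1": "blue",   "cluster 2": "yellow"},  # round 2: C, T
    {"cluster 1": "yellow", "cluster 2": "yellow"},  # round 3: T, T
    {"cluster 1": "green",  "cluster 2": "red"},     # round 4: G, A
    {"cluster 1": "red",    "cluster 2": "green"},   # round 5: A, G
]


def decode_clusters(rounds):
    """Build each cluster's read by appending one letter per round."""
    reads = {}
    for snapshot in rounds:
        for cluster, color in snapshot.items():
            reads[cluster] = reads.get(cluster, "") + COLOR_TO_BASE[color]
    return reads


for cluster, read in decode_clusters(rounds).items():
    print(cluster, read)
```

Every cluster is handled in the same pass over each round, which mirrors the camera reading all clusters at once.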

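The last step, combining overlapping fragments, can be illustrated with a toy example. The two reads below are hypothetical pieces of the ATGTCAGTCCAG sequence from the gel example, and `merge_by_overlap` is a made-up helper. Real assembly software is far more involved (it handles errors, repeats, and millions of reads), so treat this only as a picture of the idea.

```python
def merge_by_overlap(left, right, min_overlap=3):
    """Join two reads when the end of `left` matches the start of `right`.

    Tries the longest possible overlap first and returns None when no
    overlap of at least `min_overlap` letters exists.
    """
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return None


# Hypothetical overlapping reads taken from two random fragments.
read_a = "ATGTCAGT"
read_b = "CAGTCCAG"

print(merge_by_overlap(read_a, read_b))
```

The shared letters CAGT are what let the two reads be placed next to each other; reads with no shared stretch cannot be joined this way.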