12_BL_20231102_WhisperAI_1.docx
ETH Zürich
Full Transcript
We will continue on the topic of transcription. We will discuss the expression of genetic information, the first step of expression of genetic information, and that is how a DNA molecule is copied into an RNA molecule. This process is called transcription. We will discuss the basic machinery involved in transcription, RNA polymerase, and what features of DNA RNA polymerase recognizes. The process of transcription can be divided into several stages: initiation of transcription, elongation, that is, iterative addition of one nucleotide after the other, and ultimately termination of transcription. And in this context, there are particular sequence features that play an important role, for example, in either the initiation or the termination process. And then we will continue on the topic of RNA processing. I will give you some examples of RNA processing in prokaryotes, and this will give you a hint of what is about to come when eukaryotic systems are discussed. So, eukaryotic cells, for example, our cells, have much more extensive RNA processing enzymes and machines that are not present in bacteria. But already in bacteria, there are some aspects of RNA processing that will be a good introduction to the topic. And finally, I will also give you an outlook on what is happening in eukaryotes, and that includes self-splicing introns, catalytic RNAs, and then how transcription in eukaryotes is different in terms of the architecture of RNA polymerases, what kind of regulation of transcription takes place in eukaryotes, and a bit of an idea of what kind of mRNA processing is important in eukaryotes. Let me start with a little bit of history about how this extremely important aspect of expression of genetic information was discovered.
So, I told you last week about how the discovery of DNA came about, and very soon it became obvious that in order for this genetic information to ultimately be expressed in a phenotype of any cell, from bacteria to our cells, there has to be some intermediate molecule. And RNA, which is also a nucleic acid, was considered a prime candidate for this role. But there was a bit of a dilemma and confusion in the field, and it's quite remarkable. I'll show you now what people thought might be happening. So, as I mentioned to you, in eukaryotic cells, which are much bigger than bacterial cells, the DNA molecule was found to be localized in the nucleus by Friedrich Miescher. Of course, he didn't know what DNA does, but he found this molecule to reside in the nuclei of eukaryotic cells. However, most RNA molecules that were found to be associated with the site where protein synthesis takes place were found in the cytoplasm, outside of the nucleus. And the stable RNAs that could be isolated belonged to these little particles called ribosomes. They were later termed ribosomes, but they were first identified as particles that were very rich in RNA molecules and closely associated with the site where proteins are synthesized. So, let me tell you a little bit about the RNA molecule. I know you heard about it, but I would like to emphasize a few additional points that go beyond the differences in the architecture induced by the presence or absence of the 2' OH group. In DNA, of course, this hydroxyl group is missing, and in RNA it's present. So, this leads to these differences in the architecture of the double helical segments of DNA versus RNA. And sometimes DNA can also adopt structures similar to the RNA double helix, but it's rare. That's the so-called A-form DNA, the form that, as I told you, is also present in RNAs.
But beyond this difference in the presence or absence of the 2' OH group, there is also a difference in the composition of nucleotides. And already last time I mentioned to you that thymine, which has very similar structural features to uracil, and in fact can form a Watson-Crick base pair geometry equivalent to that of uracil, is present in DNA, while uracil is present in RNA. So, the difference between these two bases is this methyl group that is there in thymine, but absent in uracil. However, this is the region of the nucleotide that is involved in Watson-Crick base pairing, and you will see that this distribution of atoms and groups is exactly the same in both cases. The other nucleotides are exactly the same, so you have adenine, guanine, cytosine, and then the only difference is uracil or thymine, and both of them can pair with an A in exactly the same way. And this is shown here. So, this is one aspect of structural formulas that you should be able to answer in case it is asked in your exam. So, this is a GC Watson-Crick base pair, and this is an AT Watson-Crick base pair, and it looks exactly the same when you compare it to the AU base pair. So, in principle, the U that is present in RNA can base pair in exactly the same way. And this is very important to understand, because this is the basis of how information in DNA can be very easily transcribed into an RNA molecule. So, it is not copying, which takes place when you generate an exactly identical molecule, and that is the case in DNA replication. In this case, when you go from an A-T base pair in DNA to an A-U base pair with RNA, you are transcribing the information. So, basically, it is very close to copying, but it is changing the identity of the molecule that is being synthesized. And in this context, I would again like to emphasize one fact that will be very important in our understanding of transcription and many other RNA processing steps.
An RNA molecule, even if it is single-stranded, and it is mostly single-stranded in nature, can nevertheless form double helical regions through self-complementarity. So, if you have a linear sequence where one part of the sequence is complementary to another part of the sequence, for example where on one side there is C U G and on the opposite side you have the antiparallel sequence G A C, then those will form Watson-Crick base pairs. And so, an RNA molecule, even though it is mostly single-stranded, will tend to stabilize its own structure through formation of these double-stranded regions. So, this will actually be the minimum energy state, because base pairing and formation of the stacking interactions along the RNA duplex are stabilizing. Stabilizing means that there is an energetic benefit from these types of interactions. So, this is the summary. Through self-complementarity, you end up with the secondary structure of RNA. In three dimensions, this RNA can fold into three-dimensional shapes, not only forming these double helical regions, but also forming complex three-dimensional folds. It is a little bit similar to what you have heard about how proteins can fold, except for the main difference: folding of proteins is mostly driven by formation of the hydrophobic core, where hydrophobic residues pack together and get away from the solvent. RNA packing involves double helical segments, and then additional interactions, often hydrogen bonding patterns, that bring these secondary structure features into more three-dimensional shapes. So, you can have multiple double helical segments that sometimes interact. Sometimes the tips like this, which are called loops (this is called a stem-loop structure), can interact and form three-dimensional, quaternary-type interactions even between different RNAs. Okay, so let's come back to this question that I asked you just a minute ago.
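The self-complementarity argument above can be checked with a few lines of Python. This is only a minimal sketch: the hairpin sequence (stem, loop, stem) is made up for illustration, the function names are hypothetical, and only Watson-Crick pairs are considered (G-U wobble pairs and any energy model are left out).

```python
# Watson-Crick partners in RNA (G-U wobble pairs are ignored in this sketch).
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the strand that pairs antiparallel with `rna`, written 5'->3'."""
    return "".join(PAIRS[base] for base in reversed(rna))


def can_form_stem(left: str, right: str) -> bool:
    """True if `right` (5'->3') pairs antiparallel with `left` (5'->3')."""
    return right == reverse_complement(left)


# Hypothetical hairpin: stem CUG, a four-nucleotide loop, then the other stem half.
stem_5p, loop, stem_3p = "CUG", "GAAA", "CAG"

# CAG written 5'->3' is the same strand as GAC read 3'->5',
# which is how the partner of CUG is described in the lecture.
print(reverse_complement(stem_5p))
print(can_form_stem(stem_5p, stem_3p))
```

Reading the partner strand in the opposite direction is the whole point of "antiparallel": the same three nucleotides are written CAG or GAC depending on which end you start from.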
So, since ribosomes were abundant in the cells that were making proteins, and those ribosomes had their own RNA, people assumed erroneously that this ribosomal RNA is actually the message that can specify a particular protein. But did this really make sense? I'll show you why this does not make sense. Ribosomal RNA is homogeneous in length, so if you analyze these ribosomes, they will all have RNA molecules of the same length. Can this explain the great variety of lengths of polypeptide chains that are found in proteins? Of course not. So the information content is homogeneous in the ribosomal RNA, but the information content in proteins is highly heterogeneous. So, the RNA found in ribosomes was not the one that specifies the sequence of proteins. And people spent days and years thinking about this problem theoretically. But the solution came from a beautiful experiment that was carried out in 1958 by Volkin and Astrachan. What they did was analyze the composition of bases, and this kind of analysis was of course also critical in Chargaff's experiments to demonstrate the nature of base pairing between bases in DNA. This experiment was more difficult to do, but it illustrated exactly what the relationship is between the sequence in the DNA and the sequence in the RNA product. So in this experiment they would infect bacteria, again using a virus, the Phi X 174 virus, which infects bacteria, and because of that these types of viruses are called phages. For the template used in this infection they knew the composition: they didn't know the sequence, but they could see that there is 25% of As, 18% of Cs, and so forth.
In the end, within experimental error, since they couldn't determine it so precisely, they saw that the RNA product that they could isolate had a percentage of Us equal to the As in the template, of As equal to the Ts, of Gs equal to the Cs, and of Cs equal to the Gs. So basically as a result of this, it turned out that there is a direct relationship between the DNA template strand that is used for synthesis of the RNA and the RNA product. So T in the template is complementary to the A that is being produced, and therefore the numbers are exactly the same. In the same way, the percentage of As in the template is equivalent to the percentage of Us in the RNA. And just look at this. So that meant that when you have a template strand of DNA, there is a one-to-one relationship between A in the template and U in the RNA, between T and A in the RNA, between C and G in the RNA, between G and C in the RNA. What was very useful in this case was that they could use a DNA template that consisted of only one strand of DNA, and that's why they could see this relationship. Otherwise, if they had had a template, a virus with double-stranded DNA, this would have been more difficult to interpret, because then you would have also had to account for the number of Ts in the coding strand of DNA, and then the numbers would not be as clean, because you would end up with averages of the Ts that are here and the Ts that are in the opposite strand of DNA. So this experiment showed you just the relationship between the strand of DNA that is being transcribed and the strand of messenger RNA. So based on these findings, these people did not really know what this meant, and this is quite remarkable. They didn't have enough overview of the science at that time to really understand what this meant. But there were people who thought about it a lot, and many of these scientists at that point had started their careers as physicists or mathematicians.
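The base-composition argument can be reproduced numerically. The sketch below is an illustration only: the template sequence is made up (it is not the phage sequence or the measured data), and the helper names are hypothetical. It transcribes a single-stranded template and compares the percentages of each base in template and product.

```python
from collections import Counter

# Base in the DNA template -> base incorporated into the RNA product.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template(template_3to5: str) -> str:
    """RNA (written 5'->3') made from a DNA template strand read 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5)


def composition(seq: str) -> dict:
    """Percentage of each base in `seq`."""
    counts = Counter(seq)
    return {base: 100 * n / len(seq) for base, n in counts.items()}


# Made-up single-stranded template, 20 nucleotides long.
template = "TACGGATTACAGCTTAGCAT"
rna = transcribe_template(template)

dna_pct = composition(template)
rna_pct = composition(rna)

# Each template base is compared with the RNA base it specifies.
for dna_base, rna_base in DNA_TO_RNA.items():
    print(f"template {dna_base}: {dna_pct[dna_base]:.0f}%   "
          f"RNA {rna_base}: {rna_pct[rna_base]:.0f}%")
```

With a double-stranded template, `composition` would be computed over both strands, so the A and T percentages (and the G and C percentages) would be forced to be equal, and the one-to-one correspondence with the product would no longer be visible.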
So they were very good at understanding these types of abstract relationships. They, François Jacob, Sydney Brenner, and Francis Crick, discussed these results and said, well, this is the answer. This is the messenger RNA. This is the intermediate between the DNA sequence and protein, the molecule that ultimately will be used for protein synthesis. So according to the central dogma of molecular biology, the information flow is from DNA to RNA to protein, and with this discovery, Jacob, Brenner, and Crick said, aha, and this was the aha moment, this is the RNA, the messenger RNA molecule. So messenger RNA, a big new chapter, will now be discussed. So how does this work? This concept I already introduced: a gene, a sequence of nucleotides in the DNA, is not directly translated into protein. So a gene does not build a protein directly. You need an intermediate, and this works for both prokaryotes and eukaryotes. I'm sorry, I know this class is mostly focused on prokaryotes, but in eukaryotes, you have DNA that is spatially separated in the cell, in the nucleus, and then it has to be transcribed into RNA and then translated into proteins. So messenger RNA is a copy of just one small region of DNA, not the entire thing. DNA replication always involves copying the entire information content. Here we are copying just a segment, but it still has to be copied very accurately. Messenger RNA still speaks the same language as DNA, so we are not talking about a change of the type of information. It's just transcription into a molecule of slightly different composition, a chemically different molecule, but it is still made of four nucleotides. It has the sugar-phosphate backbone. It has four types of nucleotides, with the T in DNA being replaced by U in RNA. So these are the differences between DNA and RNA. In DNA you have deoxyribose; in RNA you have ribose, with the extra OH group. These are the four bases present in RNA. These are the four bases present in DNA.
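To make "a copy of just one small region" concrete, here is a small Python sketch. The genome string and the segment coordinates are invented for illustration, and the function names are hypothetical. It builds the mRNA by pairing against the template strand and then checks that the result has the same sequence as the coding strand of that segment, with T replaced by U.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}    # DNA-DNA pairing
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}   # DNA template -> RNA


def template_strand(coding: str) -> str:
    """Template strand written 3'->5', aligned base by base with the coding strand."""
    return "".join(COMPLEMENT[base] for base in coding)


def transcribe_segment(coding: str, start: int, end: int) -> str:
    """mRNA (5'->3') for the region coding[start:end], read off the template strand."""
    template = template_strand(coding)[start:end]
    return "".join(RNA_PARTNER[base] for base in template)


# Made-up coding strand (5'->3'); only positions 12..23 are transcribed here.
genome = "GGCTATAATGCGATGACCGTTAAGCTGA"
mrna = transcribe_segment(genome, 12, 24)

print(mrna)
# Same "language" as the DNA: the mRNA matches the coding strand with U instead of T.
print(mrna == genome[12:24].replace("T", "U"))
```

The rest of the genome is untouched by this operation, which is the contrast with replication, where the whole sequence is copied.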
The shape of DNA is double helical. Sometimes DNA can also be single-stranded, like with the example of this virus. RNA, in contrast, is mostly single-stranded. It can also adopt secondary structure, and sometimes RNAs can form double helical segments if you have two complementary RNA strands. So some viruses, for example, will have their genome as a double-stranded RNA molecule. So what are the reasons for using a messenger? First of all, it's a very elegant solution, because copying information from DNA is a very strictly controlled process, as I told you. Whenever you need to copy or read the information from DNA, you have to separate the DNA strands. So during DNA copying, you had to have these DNA binding proteins that protect single-stranded DNA. However, in this case, you will see that RNA synthesis can be done with minimal opening of the DNA structure. So this protects the original and its genetic information. Once you have a single-stranded copy of one of the DNA strands, the single-stranded molecule is actually easier to read the information from. So if you need to read off what the meaning of these bases is, it is obvious that this will be easier if these bases are not involved in Watson-Crick base pairing. So a single-stranded molecule is more suitable for reading off the information. And of course, it's obvious, but it should be emphasized, that by using an intermediate in this process between DNA and protein, you can make a lot of these intermediate molecules. So messenger RNA from a single region of DNA can be made in one or ten or a hundred or a thousand copies.
So that has important implications for regulation of transcription, or how this information from DNA is read. So one part of DNA can be read just a little bit, and another part of DNA can be read in large amounts. So this allows a unique level of regulation that I'll discuss later. Different versions of messenger RNA, and the machinery that is involved in editing and changing these features of RNA, can be used on top of everything else to allow additional heterogeneity of products. So this again gives you an opportunity to go beyond the information in the DNA. And on top of that, the propensity of RNA to form secondary structures can also be exploited to regulate gene expression. So how does this central process in the cell work? Transcription means synthesis of RNA according to a DNA template. So I will first introduce to you the basic concepts, almost geometric concepts, of what has to happen for transcription to take place. And then I will show you the machinery involved in the process. So for the information in DNA to be transcribed, you in fact have to melt the DNA. So this is concept number one. You cannot read off the information according to Watson-Crick base pairing while DNA is still double helical, because then the Watson-Crick faces of each base are engaged in hydrogen bonding interactions. So to read the information, the first thing that has to happen is for the DNA duplex to melt. Once you melt the DNA duplex, you will, as I told you before in the case of DNA replication, generate winding of the DNA in front and strain on the DNA. However, since you do not need to keep unwinding the DNA, you will not keep generating this torsional strain on the DNA, because as the information is copied, only a small region of the DNA has to be melted. And then once the copying is done, the DNA is allowed to re-form its base pairs, and it can simply rewind.
So whatever type of unwinding or winding tension is generated on the DNA during transcription is immediately resolved. So you do not need enzymes that will alleviate this type of tension on the DNA. Local melting is permitted, and it's energetically costly, but not such that you would need special machinery to account for it. Now, the second main thing that has to happen when you melt the DNA is that you have to accurately synthesize an RNA molecule that is increasing in length, and therefore a duplex between RNA and DNA has to form when the information is copied. This is referred to as the RNA-DNA hybrid helix. So at the 3' end of the RNA in this hybrid helix, the next nucleotide is added. I will show you more details later, but the chemistry of nucleotide addition is very similar to the chemistry of nucleotide addition during DNA copying, or DNA replication. This would be the active site of the enzyme that has to carry out transcription. The enzyme is called RNA polymerase. And, as I mentioned to you, it is usually difficult to accurately introduce the first nucleotide that is being copied. This is the reason why DNA copying starts with the initial insertion of an RNA primer, because this first step is imprecise. So the precise addition of the next nucleotide critically depends on the stacking interactions, and that's why you have to maintain a certain length of double helical hybrid during transcription, because only then can the next nucleotide be added very precisely. But the length of this duplex is not so critical. So as long as it exists, it's good, but at some point you actually do not want this duplex to be long, because you want the DNA molecule to re-form the duplex and not to generate too much winding tension, and that's why the product RNA molecule, the so-called newly synthesized RNA molecule, or nascent RNA, is allowed to depart from the DNA, at which point the DNA duplex re-forms.
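The geometry described here (add one nucleotide at the 3' end, keep only a short RNA-DNA hybrid, let the older part of the nascent RNA peel away) can be sketched as a toy simulation. Everything specific in it is an assumption for illustration: the template sequence is made up, the function name is hypothetical, and the hybrid length of 8 is just a chosen parameter, since the lecture only says that the exact length is not critical.

```python
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def elongate(template_3to5: str, hybrid_len: int = 8):
    """Yield (nascent_rna, hybrid_part, released_part) after each nucleotide addition.

    `hybrid_len` (assumed, must be >= 1) is the maximum number of RNA
    nucleotides still paired with the DNA template.
    """
    rna = ""
    for base in template_3to5:
        rna += RNA_PARTNER[base]        # new nucleotide joins the 3' end
        hybrid = rna[-hybrid_len:]      # most recent stretch, still paired to DNA
        released = rna[:-hybrid_len]    # older 5' part, already displaced
        yield rna, hybrid, released


# Made-up template strand, read 3'->5' by the polymerase.
for rna, hybrid, released in elongate("TACGGATTACAG", hybrid_len=8):
    print(f"released={released or '-':<6} hybrid={hybrid}")
```

The hybrid never grows beyond the chosen window, so the melted region of DNA stays small while the transcript itself keeps getting longer.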
So you have local melting of the DNA and formation of a duplex when the synthesis starts, and it starts from zero. The first nucleotide might not be so correct, but later transcription is a highly accurate process. So how about the chemistry of nucleotide addition? This slide looks very similar to the slide that I showed you for DNA, except that here we have an OH group in the 2' position of ribose. Again, you have addition of the next nucleotide based on Watson-Crick base complementarity. And when the next nucleotide is positioned, you have nucleophilic attack by the oxygen of the hydroxyl group attached to the 3' carbon of ribose, so this is the 3' OH group, on the phosphorus atom of the alpha phosphate of the new incoming nucleotide. As a result, you have departure of pyrophosphate, so the beta and the gamma phosphates are allowed to leave. And you have a covalent bond between the last nucleotide of the growing RNA strand and the nucleotide that follows, based on the sequence in the DNA. So in this reaction you have an extension of the growing RNA strand by one nucleotide. But as I already mentioned at the beginning, this process of transcription involves very defined stages. So now that I have introduced some basic aspects of how nucleotide addition takes place, I would like to come back and reintroduce all the key stages and how they happen. So transcription can be divided into initiation, elongation, and termination. Initiation is pretty much the stage of transcription where RNA polymerase has to find the right place to start. This is very important. It's a little bit like your eyes scanning a text and identifying a chapter that you are actually interested in reading. So the RNA polymerase has to find the start point. And the start point on the DNA is identified without even separating the DNA strands.
So as I mentioned to you before, DNA has the minor and major grooves, where external enzymes and proteins have access to the sequence information from the side. So not from the Watson-Crick side, but from the side, they can recognize a certain chemical pattern that is present only when certain sequences are present in the DNA. So by screening through the DNA, the RNA polymerase identifies the place where to start transcription. And these sites on the DNA, as you will hear more about, are called promoters. So these are the initial binding sites. When the promoter is recognized, you have opening of the DNA, so melting of the DNA duplex. And then once the first bond between two successive nucleotides is formed, you shift to the elongation stage of transcription. So during initiation, RNA polymerase binds to the promoter region. And you will hear that there are additional factors involved in the process. The DNA melts, and then the first nucleotides are recognized and paired to the template DNA. During elongation, RNA polymerase continues adding nucleotide after nucleotide. The initial transcription factors that helped find the start point move away. And this opened region of DNA, which is called the transcription bubble, moves along the DNA. And only one of the two DNA strands is copied. I'll tell you more about it later. And then finally, when certain sequences in the DNA are encountered, the termination stage of transcription takes place. At that point, the RNA polymerase dissociates, and the fully synthesized RNA transcript is allowed to dissociate and then engage in subsequent cellular processes. So this is how you would envision a transcription bubble that moves along: a short segment of RNA-DNA duplex is formed, and the nucleotides that have to be incorporated are added one after the other. I'm sorry about this. There was a blank slide. So what kind of sequences are recognized by RNA polymerase during initiation of transcription?
This is very important to understand, and I already mentioned it in the brief introduction of the features and stages of transcription. So RNA polymerase will find, with the help of additional proteins called transcription factors, transcription start sites. And this is what that means. If you look at different sites where transcription is initiated and you analyze the sequences, you will notice that even though there are sometimes small differences in the sequence, like in one case here there is a T, in some cases a G, and so on, there is a pretty strong consensus for this sequence. This T-A-T-A-A-T sequence, or something very close to it, is found at a very precisely defined distance from the start of transcription. So here you have the transcription start, it's usually an A, and at the position that is referred to as the minus 10 region of bacterial promoters, you have this particular sequence. So in bacteria, by analyzing the sequence and analyzing where transcription starts, you can predict very well which regions of DNA are transcribed. This is much more difficult in eukaryotes, because these signature sequences are much more diverse and influenced by many more factors. But in bacteria you can do this very accurately. So basically these are the key characteristics of bacterial promoters.
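The idea of predicting promoters from sequence can be illustrated with a simple scan for the minus 10 consensus. This is a minimal sketch, not a real promoter finder: the test sequences are made up, the function names and the mismatch threshold are assumptions, and it looks only at the TATAAT hexamer on one strand, ignoring the distance to the start site and any other promoter element.

```python
CONSENSUS = "TATAAT"  # minus 10 consensus mentioned in the lecture


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_minus10_candidates(dna: str, max_mismatches: int = 1):
    """Return (position, hexamer, mismatch_count) for windows close to the consensus."""
    hits = []
    for i in range(len(dna) - len(CONSENSUS) + 1):
        window = dna[i:i + len(CONSENSUS)]
        mm = mismatches(window, CONSENSUS)
        if mm <= max_mismatches:
            hits.append((i, window, mm))
    return hits


# Made-up sequences: one with a perfect hexamer, one with a single difference.
print(find_minus10_candidates("CCGGTATAATGCGC", max_mismatches=0))
print(find_minus10_candidates("CCGGTATGATGCGC", max_mismatches=1))
```

Allowing a mismatch or two reflects the point that real promoters vary slightly around the consensus; raising `max_mismatches` too far would start reporting hexamers that have nothing to do with promoters.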