12_BL_20231102_WhisperAI_2.docx
ETH Zürich
Full Transcript
This is very important information that I'm telling you now. Promoters are specific DNA sequences that recruit RNA polymerase, and these are the sites where transcription is initiated. In bacteria, the promoter is recognized directly by the RNA polymerase holoenzyme. I'll tell you a little bit more about how these RNA polymerases look. Holoenzyme means that it's an enzyme with an associated sigma factor. So together, sigma factor and RNA polymerase form what is called a holoenzyme, the entire enzyme necessary for initiation of transcription. Bacterial promoters have highly conserved hexanucleotide sequences in the minus 10 and minus 35 regions, and the distance between them is also highly conserved, which is important to know. The minus 10 sequence is also called the Pribnow box, and the other region is simply called the minus 35 region. So E. coli promoters have three conserved elements. First, the minus 35 sequence, located 35 nucleotides before the transcription start. This is the sequence TTGACA. Second, the minus 10 sequence, which is centered 10 base pairs upstream of the start point of transcription, and this sequence is TATAAT. And third, what is also very important, the spacing between the minus 10 and minus 35 regions is 17 nucleotides, give or take one base pair. So this is a very exact distance, and you will soon see why this is the case. It will then be easier to remember why all of these bacterial promoters have these conserved sequences and this spacing between the conserved regions. So how do promoter recognition and transcription take place in a bacterial cell? Time-wise, the formation of the initial complex between RNA polymerase and DNA takes some time. It can be measured in seconds. So this is the slowest step of transcription.
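To make the promoter layout concrete, here is a minimal sketch of a scan for the two consensus hexamers separated by a 16 to 18 nucleotide spacer. It assumes exact matches to the consensus, which real promoters rarely show; the function name `find_promoters` and the example sequence are made up for illustration, and this is not how RNA polymerase or any real promoter-prediction tool works.

```python
import re

MINUS_35 = "TTGACA"  # consensus -35 hexamer
MINUS_10 = "TATAAT"  # consensus -10 hexamer (Pribnow box)


def find_promoters(dna, spacer_min=16, spacer_max=18):
    """Return (start index of the -35 hexamer, spacer length) for each exact match.

    Assumption: only perfect consensus matches count. The spacer sequence
    itself is ignored, only its length matters.
    """
    pattern = re.compile(
        f"(?=({MINUS_35})([ACGT]{{{spacer_min},{spacer_max}}}){MINUS_10})"
    )
    return [(m.start(), len(m.group(2))) for m in pattern.finditer(dna)]


# Hypothetical sequence: a -35 box, 17 arbitrary nucleotides, then a -10 box.
example = "GG" + MINUS_35 + "A" * 17 + MINUS_10 + "CC"
print(find_promoters(example))
```

The spacer is matched as "any 16 to 18 nucleotides" because, as described above, only the distance between the two hexamers is conserved, not the sequence in between.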
So finding the right promoter and melting the DNA takes seconds. But once the reading of the information in the DNA takes place, this is much faster. It's still slower than DNA polymerases, which can copy around a thousand nucleotides per second. Here, the elongation during transcription takes place at a rate of about 20 to 50 nucleotides per second. The process is highly processive. That means that once transcription starts, RNA polymerase should not, and usually does not, dissociate from the template. If the RNA were released before the entire region of DNA that carries the information is copied, it would be a faulty RNA molecule. One thing that you see here, and that maybe you did not realize immediately: I told you before that a DNA segment can be copied many times, and this is a way to regulate the amount of RNA present for a particular sequence in the DNA. But this can also happen simultaneously. You can have one RNA polymerase follow another one before the first one has finished synthesis. So this means that many RNA polymerases can simultaneously transcribe a gene. Here you have the initiation site, and these little strings that you see getting longer and longer are actually longer and longer RNAs that are synthesized by RNA polymerases moving down the DNA. And this is the stop of synthesis, and this is another initiation site. So you can see very nicely how this works. In this process, the error rate of transcription is much higher than the error rate during copying of DNA, but it's still a highly accurate process. Bacterial RNA polymerase will introduce one wrong nucleotide per 10,000 incorporations. So how about this enzyme? How does it look? Let me show you. It consists of multiple protein subunits, and these are the core subunits here in ochre and blue, gray, yellow and green.
The holoenzyme that I mentioned before, the one capable of recognizing the promoter, has one additional subunit called sigma, and it is shown here in purple. The holoenzyme binds weakly to DNA in general, then diffuses linearly along the DNA sequence and finds the promoter. When the promoter is found, RNA polymerase binds tightly and clamps onto the DNA, then it opens up the bubble, and, as you'll see later, the sigma subunit dissociates. So what are these subunits and what are their roles? Well, the catalytically active subunits are the beta and beta prime subunits. These are the two subunits of RNA polymerase that clamp around the DNA. They form this claw-like feature, like a crab's claw, that goes around the DNA, opens up the duplex and catalyzes the reaction. These two are very similar. They are not identical in sequence, but they are very, very similar. Then alpha one and alpha two, these two subunits are required for assembly. They simply help this clamp or claw to be formed. So these are architectural subunits, and the same is true for the omega subunit, shown here in gray: it is needed to stabilize the interaction between beta prime and beta. And finally the sigma subunit, the purple one, is the one that gives the RNA polymerase high affinity for promoters. So how does this enzyme look in real life? This is what the structure looks like. Beta in ochre, beta prime in gray. These are the two alpha subunits and this is the sigma subunit. You see it's bridging the beta and the beta prime. And when you compare bacterial RNA polymerases with those found in archaea, you will notice that although some basic aspects of the architecture are the same, archaeal RNA polymerase has many more subunits.
And in fact this is already a hint of what RNA polymerases look like in eukaryotes. They're much more complex than the bacterial ones. This also tells us a little bit about the evolutionary origins of many eukaryotic complexes involved in DNA replication, transcription and translation. There is typically more similarity between those and the archaeal enzymes or complexes. So let me show you this structure in a little bit more detail. You do not have to be able to draw something like this, of course, but let me just show you how it looks. It will make the discussion that follows easier to understand. You have the two subunits that form this clamp, the beta prime and the beta subunit. Then you have the alpha one and alpha two subunits in the back, here in green. Then you have this omega subunit in gray, which stabilizes the interaction between the beta prime and beta subunits, so between the blue one and the purple one. And the parts in yellow are different regions of this clamp that are important for interactions with the RNA molecule and with the DNA molecule that is recognized. But I'll show you that a little bit later. Time-wise I would like to pause now, and next I would like to show you the features of promoters that are recognized by RNA polymerase during initiation of transcription. So thanks for now. We'll continue a few days from now.

So, hello everybody and welcome. Let me just briefly recap what was discussed in the last lecture, and then we can continue with the initiation of transcription. In the last lecture, I introduced to you the basic geometric and chemical aspects of transcription.
During transcription, the sequence in the DNA, not the entire sequence but just one part of it, has to be read off, and a different molecule with very similar features, an RNA molecule, has to be synthesized. This is called transcription. For this to happen, you have to melt the DNA. You have to synthesize RNA based on the sequence of one of the two strands of DNA. Once a sufficient length of RNA is synthesized, it peels away from the DNA so that the transcription bubble that was formed can close again. So during transcription you continuously open the DNA and allow it to re-form the double helix after the information is copied. Transcription has distinct stages: initiation, elongation and termination. The process of transcription follows very similar chemistry to DNA copying, meaning that the next nucleotide that is incorporated comes from a nucleoside triphosphate complementary to the next base in the template DNA strand. The only difference is that the strand of nucleic acid being synthesized is an RNA molecule, meaning that it has a 2' OH group and that the composition of nucleotides is different. Instead of a T, RNA will have a U. And then I explained to you that the initiation of transcription is one of the key steps in the process and that it is the slowest step, but it's a very important aspect of transcription because it allows the RNA polymerase to scan very long DNAs and identify the start site where transcription should occur. This happens because RNA polymerase associates with an extra subunit called the sigma subunit, which allows the RNA polymerase to recognize, with high affinity, certain sequences in the DNA called promoter sequences.
And I showed you that these sequences have unique motifs in terms of primary structure, that is, sequence, which are positioned very accurately relative to the transcription start, meaning that these sequences are located at the minus 10 and minus 35 positions relative to the plus one position, which indicates the transcription start. This is the case for most genes, and the sigma subunit involved is called sigma 70. However, some other types of genes have to be controlled in response to environmental conditions. Let's say the bacterium is exposed to a higher temperature; then it has to express a certain set of genes that are under the control of different promoters. This different, so-called heat shock promoter has a sequence which is different, but its elements are also at the minus 10 and minus 35 locations. And the reason why RNA polymerase will sometimes transcribe these heat shock genes is that at higher temperatures there is some sort of sensor, for example an RNA duplex that melts, which allows production of the sigma 32 subunit, which is different from sigma 70. This sigma subunit will then bind RNA polymerase and allow it to recognize a different promoter sequence. So in this way you can very elegantly respond when the bacterium is at a different temperature: through some sort of temperature switch, you first induce production of a specialized sigma subunit, which in turn allows transcription of maybe 100 different genes that are under the control of the heat shock promoter. So this is a very elegant mechanism. With a little switch you can control the expression of many, many genes and increase production of proteins that help the bacterium cope with higher temperatures. So what are the characteristics of these sigma subunits that let them recognize the minus 10 and minus 35 features? This is shown here, and it nicely explains what's happening. You have a sigma subunit that has an unusual structure.
So it has one domain here, then a long connection, and another domain here. One of them is responsible for recognition of the minus 10 sequence, and roughly two turns of the DNA helix further along you end up at the minus 35 element. This spacing is approximately 17 nucleotides, but of course the distance between the center of the minus 10 element and the center of the minus 35 element is more than that. It's of course 25 nucleotides. As a result of this separation, you have a sigma subunit with two different binding regions, and the sequence in between is not relevant, just the spacing between the two regions of the DNA that are recognized. So this will help you remember why bacterial promoters have this unique pattern of conserved sequences, between which there is no particular pressure for a particular sequence to exist. So how does this look in three dimensions? Let me show you this animation. You have the clamp of the RNA polymerase in purple and blue. And then you have the DNA strand that is recognized by sigma, in red. The sigma subunit will recognize the minus 10 region, shown here in yellow, and the minus 35 region, also shown in yellow. This is one domain of the sigma subunit. It's connected here, and that's the other domain, right there. So let's move on to the elongation stage. Once the RNA polymerase recognizes the promoter and melts the DNA, what happens next? I showed you the chemistry: you have successive addition of the next nucleotide, and this proceeds in the 5' to 3' direction in the newly synthesized transcript. During elongation the RNA polymerase untwists the DNA as it moves along and exposes about 15 to 20 bases so that the RNA-DNA duplex can form. And finally, as I mentioned before, after a certain segment of DNA has been copied, the newly synthesized RNA peels away from the DNA so the DNA strands can re-form the double helix. The speed of elongation is about 50 nucleotides per second.
That is the figure for prokaryotes, and it is similar in eukaryotes. So, in addition to understanding that the DNA molecule is copied in a way that only one of the strands is used as a template, there are a few additional nomenclature features that I would like to cement and emphasize again. If you look at the newly synthesized RNA transcript, you have to realize that this RNA transcript is complementary to the template. So you should be aware, when somebody talks about template DNA or coding DNA, what the difference is. The template DNA strand is the one that is complementary to the RNA transcript. At the same time, the other strand of the double-stranded DNA will have exactly the same sequence as the RNA, except that the RNA has U's instead of T's. And this other strand of DNA, the one that is not being read by the RNA polymerase, is called the coding or sense strand. This is an important concept. This was exactly what was compared in the initial experiment with positive strand DNA, in which messenger RNA was discovered, that I talked about last time. Then, if you look at the newly synthesized transcript, the first nucleotide that is synthesized is referred to as the plus one position. So this plus one position is the reference for the minus 10 and minus 35 regions of DNA sequence found in the promoters. And how about the position right before? There is no zero; the one before is referred to as minus one, and this is how it continues. So minus 10 is one region of the promoter, minus 35 the other, and so on. The five prime end of the transcript is usually A or G, and it's a triphosphate. So that's what is found initially at the five prime ends of new transcripts. So how does this elongation look? Once the duplex of DNA is melted, this is a nice overview. These images were actually provided by Seth Darst at Rockefeller University, who solved the structures of these transcribing complexes.
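The strand nomenclature and the position numbering can be checked with a short sketch. The function names and the example sequence are assumptions made for illustration; the template strand is written 3' to 5' so that the RNA and the coding strand both come out 5' to 3'.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5):
    """RNA (5'->3') synthesized from a template strand written 3'->5'."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template_3to5)


def coding_strand(template_3to5):
    """Coding (sense) strand, written 5'->3', for the same template."""
    return "".join(DNA_COMPLEMENT[base] for base in template_3to5)


def position_label(offset_from_start):
    """Convert a 0-based offset from the start site into promoter numbering.

    Offset 0 is +1, offset 1 is +2, and offset -1 is -1: there is no zero.
    """
    return offset_from_start + 1 if offset_from_start >= 0 else offset_from_start


template = "TACGGT"  # hypothetical template strand, written 3'->5'
rna = transcribe(template)
coding = coding_strand(template)
# The RNA matches the coding strand once T is replaced by U.
print(rna, coding, coding.replace("T", "U") == rna)
```

Writing the template 3' to 5' avoids reversing any strings: each template base simply lines up with the RNA base that pairs with it.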
So basically you have the DNA molecule with its two strands, yellow and red, and now you can see which one of these two is the template strand. The template strand is the red one, and based on this one, a new RNA, in orange, is being synthesized. So the newly synthesized RNA is complementary to the template strand. The other strand, the yellow one, is melted and goes out of the way. So this is how the actively transcribing RNA polymerase looks: the sigma subunit has dissociated, and this is the elongating complex of RNA polymerase. And what is happening here in the active site? Well, you know that this type of chemistry is taking place, but in three dimensions you have the incoming ribonucleoside triphosphate, the nucleophilic attack occurs and the pyrophosphate leaves. So this is, in a way, a transition state of the reaction, in which the substrates are ultimately converted into a product that has the next nucleotide covalently linked to the previous one. So this is how the active site of RNA polymerase looks, and this is the chemistry that takes place. Okay, so how about the conformational changes that take place, still molecular but a little bit more macroscopic? Here we are not talking about atoms and the reactivity between atoms; we are talking about the positioning of the nucleotides. You may have noticed in the previous slides that there are very conserved metal ion sites, usually magnesium sites, and conserved amino acids on the RNA polymerase that play a role in stabilizing this transition state so that the nucleophilic attack can occur. So in the next stage I would like to show you what kind of conformational steps take place.
So first of all you have the growing RNA strand and the template DNA, which has the next nucleotide in position to be base-paired with the next nucleotide that has to be incorporated into the RNA. The metal is in its place, the next nucleoside triphosphate binds, base pairs with the complementary base, and then the reaction takes place. This is the nucleophilic attack that releases the pyrophosphate. Then what happens next? How does the next template nucleotide, which is here, get positioned so that the active site can bind the next nucleotide? Well, this is what happens. Once the reaction takes place, you have a translocation: this entire DNA-RNA duplex moves back, or equivalently the RNA polymerase moves forward, such that the active site is here and the next nucleotide is shifted from a splayed-out conformation to the position where it is nicely stacked on the previous nucleotide, ready for the next addition, the next elongation step of polymerization. And then finally, in addition to this forward elongation mode of RNA polymerase, I would like to show you what happens in case RNA polymerase makes a mistake. With DNA polymerases, a wrongly incorporated nucleotide causes the DNA strands to separate a little bit, and because of this a separate exonuclease domain of the DNA polymerase chews away the wrongly incorporated nucleotide, and only then does DNA replication continue. Here, unlike that, the error correction is done by the active site of the RNA polymerase itself. And how does this happen? Let's say a wrong nucleotide is incorporated, and this orange one is the incorrect one. Normally the active site would be repositioned by forward translocation, meaning that this next nucleotide would be positioned here as the RNA-DNA duplex moves back or the RNA polymerase moves forward. Instead of moving forward, the RNA polymerase actually backtracks a little bit on the substrate.
So the active site is here, and the wrong nucleotide doesn't base pair with this one. This is why the backtracking happens: these two are not tightly base paired, because this is a wrong nucleotide here. So the polymerase backtracks a little bit, this nucleotide is unpaired, this one is in the previous position, and then, in a reaction that is very similar to the original reaction that connects the growing RNA with the next nucleotide, a metal ion activated water molecule attacks the RNA strand and hydrolyzes off the last two nucleotides of the RNA. As a result the wrongly incorporated nucleotide is eliminated, and then the normal steps of elongation can continue. So this backtracking, which the RNA polymerase has a bit of a tendency to do anyway, happens much more readily when a wrong nucleotide is incorporated. This is a mechanism that ensures a high level of accuracy of RNA polymerases, to the point that errors are introduced in only one in ten thousand cases. Now, one in ten thousand seems a very high error rate compared to DNA polymerases. The reason why this still very accurate, but less accurate, transcription can be tolerated by the cell is that these errors are not propagated to progeny, to the daughter organism. If an error is made in an RNA molecule, it is typically not the only RNA molecule of its kind in the cell, as is often the case with genomic DNA. Rather, if a mistake happens in one of the RNAs, the next one will likely be correct, and in that case transcription will still yield properly formed RNA molecules that are then used for translation, which you will hear about later. So you have error correction that reduces the error rate to one in ten thousand, still higher than for DNA, but a sufficiently low error rate to allow many, many RNA molecules to be synthesized without any errors, and that's all the organism needs.
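To get a feeling for what one error in ten thousand means for a whole transcript, here is a small worked calculation. It assumes that errors at different positions are independent and uses a hypothetical transcript length of 1,000 nucleotides; both are simplifications made for illustration, not figures from the lecture.

```python
ERROR_RATE = 1e-4  # one wrong nucleotide per 10,000 incorporations


def expected_errors(length_nt, error_rate=ERROR_RATE):
    """Average number of wrong nucleotides in a transcript of the given length."""
    return length_nt * error_rate


def fraction_error_free(length_nt, error_rate=ERROR_RATE):
    """Fraction of transcripts with no error, assuming independent positions."""
    return (1.0 - error_rate) ** length_nt


length = 1000  # hypothetical transcript length in nucleotides
print(f"expected errors per transcript: {expected_errors(length):.2f}")
print(f"fraction of error-free transcripts: {fraction_error_free(length):.3f}")
```

Under these assumptions, roughly nine out of ten transcripts of that length carry no error at all, which is consistent with the point that the cell still gets plenty of correct RNA molecules.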
And now I would like to move to the last aspect of transcription that I have not discussed yet, and this is the process of termination. At the end of reading the message in the DNA, the RNA polymerase should not keep going indefinitely, because it would be very wasteful for the cell to synthesize RNA molecules that are very long. In addition, if they were synthesized past the point where the information is contained in the DNA, they might also be more susceptible to damage in the cell, which would be problematic beyond the wasteful incorporation of extra nucleotides into the RNA molecule. The way RNA polymerase knows where to stop is by encountering special sequences. There are several different mechanisms by which this termination can take place, and here I will only discuss termination through a unique, stable RNA hairpin structure. The terminator sequence is located downstream of transcribed genes. It's a stretch of 30 to 40 nucleotides that is able to form a very stable hairpin, and this is an example of a structured region of an RNA molecule that forms through self-complementarity. This very stable hairpin will cause the RNA polymerase to slow down and then release from the DNA. In addition to this mechanism there is a second mechanism involving the protein Rho, which will not be discussed in this class. Okay, so now that I have introduced to you the stages of transcription, I would just like to recap certain basic terminology that you are expected to know at the end of this class. Transcriptional unit is a term that refers to a DNA segment that is transcribed into an RNA molecule. You will see here that the definition is a little bit broader than transcribing a gene, because not all regions of DNA that are transcribed encode proteins. Most of them do, but some of these transcriptional units actually encode RNA molecules that are themselves the final product.
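Self-complementarity, the property that lets a terminator fold into a hairpin, can be illustrated with a minimal sketch. It only counts perfect Watson-Crick pairs between the two ends of a sequence, ignores G-U pairs and folding energies, and uses a made-up example sequence; it is not a terminator predictor.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def stem_length(rna, min_loop=3):
    """Number of consecutive base pairs formed between the 5' and 3' ends.

    Pairs position 0 with the last position, position 1 with the second to
    last, and so on, stopping at the first mismatch or when fewer than
    `min_loop` unpaired nucleotides would remain for the loop.
    """
    n = len(rna)
    stem = 0
    while (2 * (stem + 1) + min_loop <= n
           and RNA_PAIR[rna[stem]] == rna[n - 1 - stem]):
        stem += 1
    return stem


# Hypothetical hairpin: a GC-rich stem, a four-nucleotide loop, and the
# reverse-complementary stem.
hairpin = "GCCGCC" + "UUCG" + "GGCGGC"
print(stem_length(hairpin))
```

A long, GC-rich stem like this one is the kind of self-complementary region that can fold back on itself as soon as it emerges from the polymerase.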
So RNA molecules like ribosomal RNA and transfer RNA are not translated later, and typically what is referred to as a gene is a region whose RNA is ultimately converted into protein in the process of expression of genetic information. There are three types of these ribosomal RNAs, called 16S, 23S and 5S, but you will hear about all of this later. What is also interesting about transcription in bacteria is that the transcriptional unit is frequently not just one gene, that is, one gene that encodes a protein. The transcriptional unit is often an operon. What does operon mean? It means that a single mRNA molecule in fact covers the information in the DNA that corresponds to the production of several proteins, and this is why this single mRNA is called a polycistronic mRNA. It has multiple regions, so-called open reading frames, which you will hear about later, that encode proteins. This is again quite specific to bacteria. In our own cells, for example, there are virtually no polycistronic mRNAs; an mRNA typically results in one protein. There are other aspects of transcriptional units that make things more complicated in eukaryotes, but polycistronic mRNAs are in most cases not present. So what happens once you generate such transcriptional units, which may not always correspond exactly to what the final product of transcription should be? What happens is that there are follow-up biochemical steps that have to take place, which are referred to as RNA processing. RNA processing happens both in bacteria and in eukaryotes, and here you will hear a little bit about how it works in bacteria.
So for example, in order to make these ribosomal RNAs that have different names, as I told you before, a single region of DNA is transcribed from a promoter, just the way I described before. But the single RNA that is produced then has to be separated into individual RNA molecules, and this happens through processing steps. These processing steps involve enzymatic cleavage of this primary transcript RNA into individual functional pieces. So basically ribosomal RNAs are made from a single transcript from which different parts are cut out. For example, this is actually a tRNA, which you will learn much more about. This part next to the promoter is still made; it is a five prime region that is not meaningful, so it has to be cut out. All these regions in between that are not relevant for the generation of the functional RNAs in the end, which are shown here in orange, are cut out and degraded, since they are unnecessary, and then the spacers and ends are trimmed and the final product is produced. So this is the difference between the primary transcript and the mature transcript, and this terminology is used for both bacterial and eukaryotic cells. So some DNA regions encode RNA molecules that, after this processing, are used as they are. They are not further translated into proteins; they are used as RNA molecules. But in the case of protein-coding genes in bacteria, you often generate these polycistronic messenger RNAs, and these are not cut or processed. Rather, there is a different level of regulation in terms of how the parts of this polycistronic RNA are used. Polycistronic means that it is a single RNA that has multiple genes on it, so different regions of this RNA are then translated into proteins.
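The difference between a primary transcript and its mature products can be sketched as simple cutting by coordinates. The region names, coordinates and toy sequence below are invented for illustration, and real processing is carried out by specific ribonucleases that recognize structure rather than by fixed positions.

```python
def mature_rnas(primary, regions):
    """Cut the functional RNAs out of a primary transcript.

    `regions` maps a name to (start, end), 0-based with an exclusive end.
    """
    return {name: primary[start:end] for name, (start, end) in regions.items()}


def discarded_pieces(primary, regions):
    """Return the leader, spacer and trailer segments that are removed."""
    pieces, cursor = [], 0
    for start, end in sorted(regions.values()):
        if start > cursor:
            pieces.append(primary[cursor:start])
        cursor = max(cursor, end)
    if cursor < len(primary):
        pieces.append(primary[cursor:])
    return pieces


# Toy primary transcript: lowercase letters mark the parts that are cut away.
primary = "ggg" + "AAAA" + "cc" + "UUUU" + "g"
regions = {"rna_a": (3, 7), "rna_b": (9, 13)}  # hypothetical coordinates
print(mature_rnas(primary, regions))
print(discarded_pieces(primary, regions))
```

A polycistronic mRNA would not go through a step like this: it stays in one piece, and its separate regions are read out individually during translation.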
So you have the so-called open reading frames one, two and three. In the initial stage, transcription generates the big RNA, but then the different segments are used separately, not through processing but through the way the information here is read. Most of what I have told you so far refers to bacterial RNA polymerases and bacterial recognition of promoters. So how does this work in archaea? Well, archaeal RNA polymerases are more complex, which is something I already introduced to you, and they have different promoters. These promoters are actually a step towards what promoter sequences look like in our own cells. In archaea the promoter is recognized by a TATA box binding protein, and you will hear much more about that in the classes that follow. So archaeal RNA polymerase binds to promoters using a transcription factor, the so-called TATA box binding protein, which recognizes the promoter, and then the RNA polymerase undergoes a conformational change that positions the transcription start site, the site where the DNA duplex has to be melted in order for transcription to begin. So the most important recognition sequence in the promoter is the TATA box, and here we do not see the same pattern as in bacteria, where you have the minus 10 and minus 35 regions. So with this we conclude the main topics of discussion in these classes that dealt with transcription,