13_BL_20231103_WhisperAI_1.docx
ETH Zürich
Full Transcript
In this lecture, we will continue on the topic of expression of genetic information, and we will be specifically discussing the process of translation. This follows the presentation on transcription that we ended with in the previous lecture. Just a brief reminder of what transcription is and what the latest topics are that you have heard from me about. We have discussed that transcription, carried out by RNA polymerase in prokaryotes, involves recognition of promoter sequences with the help of the sigma subunit, which binds to the RNA polymerase and then recognizes the bacterial promoters. Now, sigma 70 is the canonical sigma factor that allows RNA polymerase to bind to most of the main promoters in a bacterial cell. However, there are other sigma subunits, and they allow RNA polymerase to recognize a subset of other promoters that are specific for certain bacterial needs under different environmental conditions. For example, when there is heat shock, high temperature, or a change of food source, then these other sigma subunits are produced. And they allow transcription of other genes that are necessary under those conditions.
You have also heard that transcription takes place in three different stages. Initiation involves recognition of the promoter. Elongation involves successive addition of nucleotides. And this addition of nucleotides takes place through discrete conformational changes in the active site of RNA polymerase that allow the DNA-RNA duplex to be translocated one step further, so that the active site of RNA polymerase can then incorporate the next cognate nucleotide according to the template DNA sequence. The next nucleotide that is incorporated is an RNA nucleotide, and the RNA molecule is extended. And then finally, I told you that termination of transcription involves, in certain cases, formation of a very stable RNA hairpin structure that allows the RNA to peel away from the DNA template. The DNA transcription bubble can disappear.
The DNA re-forms a duplex, and RNA polymerase dissociates. And then finally, I told you that in bacteria, and particularly in eukaryotes, the primary RNA transcript is frequently modified. In bacteria, there are often cleavage reactions that result in mature RNA molecules, and these transcripts are not always going to be converted into proteins through the process of translation that you will hear about today. These transcripts are sometimes used as functional RNA molecules, like tRNA and ribosomal RNA, that again you will hear about today. And those are produced by cutting the primary transcript into functional pieces. And then finally, I also told you that in bacteria, these transcripts often do not correspond to one protein product. They're usually so-called polycistronic transcripts that are read by the ribosome in several different segments. So they contain multiple messages.
And today, we will continue on the topic of translation, and I would like to introduce the main chapters that will be covered. First, the genetic code. Then I will tell you about the enzymes that prepare so-called adapter molecules, which couple the information in the nucleic acids to the information in proteins, which consist of amino acids. And I will show you where this translation takes place. The cellular machines that carry out translation are called ribosomes. And then next time, I will talk about the stages of translation and the more detailed structural information about ribosomes.
So let's start. In the past several lectures, you have heard about DNA replication. You have heard about the process of transcription. And today, we will start our discussion about translation, for which ribosomes are responsible. These are cellular assemblies. They're very complex, much more complex than, for example, RNA polymerases and DNA polymerases. This process is also very accurate, as you will see.
However, this accuracy slightly diminishes from transcription to translation. So why do we refer to replication of DNA as copying of information? Because it's a one-to-one copy: new DNA should be indistinguishable from the parent DNA. When we talk about transcription, we mean that the information is slightly changed but mostly preserved. So basically, you're just changing the type of paper that it is written on, but the information has a direct one-to-one correspondence between RNA and DNA. However, when we talk about translation, we are talking about a change of the language of the molecules that build either the nucleic acids or ultimately proteins. And this is what is meant by this concept of translation. Basically, you are moving from one language that consists of four letters in nucleic acids. So in DNA and RNA, there is a total of four nucleotides. And these are the examples: for DNA, cytosine, adenine, guanine, and thymine. Then it goes into RNA: cytosine, adenine, guanine, and uracil. And then this information, this language of four letters, has to be translated into the language of amino acids. And you will see that conceptually, this is a much more complex process. There is no one-to-one correspondence between one nucleotide and one amino acid: there are 20 possible amino acids and four nucleotides. So we are changing the alphabet of the language from a four-letter alphabet to a 20-letter alphabet. That means that you cannot say that adenine is going to have the meaning of one of these amino acids, like phenylalanine or alanine. There is no one-to-one correspondence. So how many of these nucleotides would you need to specify 20 different amino acids?
Assume different combinations: if you have a linear sequence of these nucleotides, you can think of them being arranged as adenine and then guanine and then thymine and so on, in any combination possible. So if you think of four letters and two-letter words, how many different combinations can you have? Can you think of it for a moment? What would this number be? That would be four squared, or 16 different combinations, which is not enough to correspond to 20 different amino acids. Therefore, through simple mathematical logic, people concluded (and I will mention a few people who were critical in these early days of molecular biology in deriving the basic concepts of the expression of genetic information, a fascinating problem in information theory) that in order to specify 20 different amino acids uniquely, you in fact need at least three nucleotides in a row if there are four possibilities for each nucleotide. This corresponds to 64 different combinations: four different nucleotides in words of three, four to the third, is 64. This immediately implied that the genetic code is degenerate. What does that mean? It means that you in fact have more codewords than you need. And therefore, it could immediately be assumed that a number of these different combinations very likely have the same meaning. So probably two or three, or sometimes even four, of these little words of three nucleotides would mean the same amino acid. And you will soon see that that is indeed the case.
So this is now the process of expression of genetic information shown in a little bit more pictorial way. You have the sequence of DNA. One of these DNA strands is read and transcribed into an RNA molecule. And then this RNA molecule has to be translated into a different language, the language of proteins, where at least three of these nucleotides in a row would specify one amino acid.
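The counting argument above can be checked with a few lines of Python. This is only a minimal sketch: the names `NUCLEOTIDES`, `words`, and `min_codon_length` are made up for this illustration and do not come from the lecture.

```python
from itertools import product

NUCLEOTIDES = "ACGU"      # the four-letter RNA alphabet
AMINO_ACID_COUNT = 20     # size of the protein alphabet

def words(length):
    """Return every possible word of the given length over the RNA alphabet."""
    return ["".join(p) for p in product(NUCLEOTIDES, repeat=length)]

def min_codon_length(targets=AMINO_ACID_COUNT):
    """Smallest word length whose number of combinations covers `targets`."""
    length = 1
    while len(NUCLEOTIDES) ** length < targets:
        length += 1
    return length

for n in (1, 2, 3):
    print(n, len(words(n)))   # 4, 16 and 64 combinations
print(min_codon_length())     # 3
```

Two-letter words give only 16 combinations, fewer than the 20 amino acids, while three-letter words give 64, which is why a triplet code must be degenerate.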
Sometimes several of them would probably have the same meaning and specify the same amino acid, as you will see in a moment. But how is this language of triplets, this triplet code, read? And what does it mean? This, of course, was not immediately obvious. There are so many ways you can read a triplet code. You can read three nucleotides, then you can move by one, you can read another three, and so forth. There are many, many different combinations. So this is something that required careful thinking and also very elegant experimentation. And Sydney Brenner, a biologist who came to the Laboratory of Molecular Biology in Cambridge, worked with Francis Crick to deduce how the genetic code is read. And these were very elegant experiments, again using the simplest possible system: they were investigating mutations in a virus that infects bacteria. Bacteriophage T4 was the system in which they did genetic experiments by introducing mutations. They realized that there are some compounds that can be used to directly introduce the kind of mutation in the DNA sequence that results in the addition of a base pair. So basically, using certain chemicals, they could introduce an extra base pair in the DNA sequence. And then they tried to see what happens if we introduce an extra base pair in this sequence that is read. Of course, they knew that the information is encoded on one of the two DNA strands. That means that they would have added one nucleotide to the sequence. And what did this mean in terms of the protein that is being produced? So they already, at that time, could understand that this message had to be somehow read and proteins had to be synthesized. And now, again, for fun, I would like to use an analogy. I will show you a sentence made out of three-letter words. So this sentence would, in a way, correspond to the sequence of DNA.
It's an extrapolation, of course; in our language, we use more than four letters. But I'm using this ordinary sentence from everyday life as a tool to indicate to you what would happen if you changed the reading frame that we immediately assume when we see spaces between words. So in this sentence, where you have only three-letter words, what happens if you, for example, introduce an extra nucleotide, an extra letter that has no meaning, in the middle of the sentence? Now look what happens. If you have the original sentence, you can very easily read it and you understand what it means. But then if you introduce this X, you shift the reading frame, and this is the concept that is very important. You're shifting the reading frame of this sequence of letters in a sentence. And immediately, the sentence becomes illegible. You're trying to read it, you're looking at the words, and the words have no meaning. However, what Crick and Brenner then did was introduce another insertion into this DNA, again using mutagenesis. With two introduced letters, the sentence would still be illegible. But when they did it the third time, now look, this is the example here. You have one X, a second X, a third X, and then suddenly they would see that, even though maybe the function was a little bit perturbed, most of the text, most of the produced protein, was still fine. And this nicely explained that the reading frame of translation is a triplet. If you introduce a single nucleotide, you mess up the reading frame and everything is gibberish. But if you introduce three, you introduce only a very limited error in this message, and the rest still has the same meaning. And similarly, they also did experiments where they would introduce an extra nucleotide, and then, using similar mutagenesis experiments, they would delete one nucleotide.
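The reading-frame analogy above can be played through in code. The sketch below is purely illustrative: the sentence is a hypothetical stand-in for the one on the slide, and the helper names `read_in_triplets`, `insert_at`, and `delete_at` are invented for this example.

```python
def read_in_triplets(letters):
    """Split a string into consecutive three-letter words, as a reader locked into one frame."""
    return [letters[i:i + 3] for i in range(0, len(letters), 3)]

def insert_at(letters, position, extra="X"):
    """Insert one meaningless letter, mimicking a single base-pair insertion."""
    return letters[:position] + extra + letters[position:]

def delete_at(letters, position):
    """Remove one letter, mimicking a single base-pair deletion."""
    return letters[:position] + letters[position + 1:]

# Hypothetical sentence made only of three-letter words (spaces removed).
sentence = "THECATSAWTHEBIGREDDOG"

one_insertion = insert_at(sentence, 6)
three_insertions = insert_at(insert_at(one_insertion, 8), 10)
insertion_then_deletion = delete_at(one_insertion, 10)

print(read_in_triplets(sentence))                 # readable
print(read_in_triplets(one_insertion))            # gibberish after the X
print(read_in_triplets(three_insertions))         # frame restored after the third X
print(read_in_triplets(insertion_then_deletion))  # frame restored after the deletion
```

With one insertion every word after the X is shifted; with three insertions, or with one insertion followed by a nearby deletion, only a short stretch is garbled and the rest of the sentence is read in the original frame again.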
And what they observed using this genetics is what they referred to as suppressor mutations. A single deletion could restore the meaning of the words, the function of the protein, and this is an example of a suppressor mutation. So these are frameshift mutations, and these are suppressor mutations. An initial insertion or deletion messes things up, but a second mutation of the opposite kind restores the message. Quite remarkable.
But then, they still realized that there is a big open question with respect to how this message with triplets of nucleotides can somehow be converted into amino acids. There is no way that a single amino acid, some of them very small, could nicely and specifically pair with these three nucleotides the way Watson-Crick base pairing happens. So this was, in a way, a much more difficult problem to understand than the concept of Watson-Crick base pairing. So they proposed that in this process, there must be some sort of an adapter molecule. And this is the concept of an adapter. They said, hypothetically, theoretically, that there must be an adapter molecule, which is very likely a nucleic acid, that recognizes triplets, the codons, according to Watson-Crick base pairing. And then somehow, these adapter molecules are connected physically, chemically, to a particular amino acid. And these adapter molecules, you will see later, are the so-called transfer RNA molecules. And with this, I would also like to introduce the concept of a codon. These triplets are the codons. And these adapter molecules have to have complementary anticodons. You can see here, schematically shown, what is meant by complementary. This tRNA will never recognize this codon. They're codon specific. This one will not recognize this one. So there are specific pairings between adapter molecules that carry one type of amino acid and recognize specific codons.
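What "complementary" means for a codon and an anticodon can be sketched as follows. This assumes strict Watson-Crick pairing at all three positions and ignores modified bases; the function names `anticodon_for` and `recognizes` are hypothetical.

```python
# Watson-Crick base pairs in RNA.
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon_for(codon):
    """Return the strictly complementary anticodon, written 5'->3'."""
    # Codon and anticodon pair antiparallel, so complement each base
    # and reverse the order to keep the 5'->3' writing convention.
    return "".join(PAIRS[base] for base in reversed(codon))

def recognizes(anticodon, codon):
    """True if the anticodon pairs perfectly with the codon."""
    return anticodon == anticodon_for(codon)

print(anticodon_for("UUU"))        # AAA
print(anticodon_for("AUG"))        # CAU
print(recognizes("AAA", "UUU"))    # True
print(recognizes("AAA", "AUG"))    # False: this adapter does not read this codon
```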
So once this became apparent, there were many, many scientists who started thinking: well, how can we crack the genetic code? What is the meaning of a particular triplet? Which amino acid does it specify? And this group of scientists, and there were more of them, formed a club. And they called it the RNA Tie Club. See, they had different RNA molecules and sequences here on their ties. And they were determined to figure out, through theoretical considerations, the nature of the genetic code. These were all extremely well-known scientists. Crick and Watson had already discovered the structure of DNA. They got the Nobel Prize for that. There were many theoreticians, many mathematicians. And they were certain that just through theoretical considerations, they could figure out the language of nucleic acids and how it is translated into proteins.
At the same time, there was a young person, a student, Marshall Nirenberg. And at age 20, his parents moved him to Florida because the weather was better. Since his childhood, he had had some issues with rheumatic fever, and they thought that the warm weather would be better for him. In Florida, he really enjoyed the outdoor life. So he was at the age that you are now, in the first year of your studies. And he became extremely fascinated by and interested in biology. He started studying animals and plants. And then this interest led him to tackle bigger biological questions. Later on in his studies, he went on to study biology and worked at NIH. This is the National Institutes of Health, near Washington on the East Coast of the US. And in a way, it's a research institution that is associated with the central funding agencies for science in the US. So it's a little bit like the Swiss National Science Foundation, but in the US, the National Institutes of Health support research across the country, and they also have their own research activities.
And there, Marshall Nirenberg carried out some of the most striking experiments in the history of biology. What he did is he decided to try to experimentally deduce the genetic code. But he did not try to figure out the whole thing from scratch. He decided to ask, as a proof of principle, whether he could figure out the meaning of one of the 64 possible triplets in a nucleic acid. So he took chemically synthesized poly-U, poly-uracil, a very simple, minimal RNA message. And he mixed it with broken-open cells that were very active in translation. And as a result, he could chemically detect the presence of polyphenylalanine, an absolutely remarkable result. So he deduced the first word of the genetic code. The triplet UUU, and it was already known that there should be triplets, meant phenylalanine. So he could figure out this relationship between the amino acid that is brought by adapter molecules and the triplets that interact with the tRNA anticodons.
With this experiment, he came to a biochemistry meeting in Moscow. And it was a big meeting, thousands and thousands of people. But his talk was scheduled for a small audience of about 30. After he delivered his talk, people realized what it meant. And then they said, look, you have to give this talk in front of the entire audience. So the next day, he repeated his talk, and then 2,000 people were present. And when they heard what he found out and how he did it, everybody in the world decided to try to decipher the genetic code using these types of experiments. And many huge labs, which might have had a much larger number of personnel than Marshall Nirenberg had, tackled the problem. But at that point, several of Marshall Nirenberg's colleagues at NIH came together and helped him, and Marshall Nirenberg had another brilliant idea of how he could simplify the problem.
So rather than dealing with long poly-U, poly-this, poly-that messages, he would synthesize very short triplets of RNA, and then he would take these broken-open cells and figure out which radioactively labeled amino acid bound to a particular triplet. He would take a triplet and then see which amino acid was bound: in one case he would radioactively label serine, in another case phenylalanine, in a third case alanine, and so on. And these are the actual notes that Marshall Nirenberg took. He could say, okay, when I use UCA or UCG, this is where the radioactivity is. And this radioactivity originates from C14-labeled serine. And this is the way he figured out the genetic code. Maybe not all the words, but most of them. And this really revealed a whole new world of molecular biology.
Another set of excellent, very informative and elegant experiments was performed using synthetic RNA molecules, which only at that point people could start to synthesize. And there were a few key people who did that. One scientist was Har Gobind Khorana, who in fact worked here at ETH for part of his career. He would synthesize nucleic acids of specific sequences, and then he could figure out some other aspects of the code. With this, he could show that the triplet code is read in groups of three. So basically, if he synthesized a UAC, UAC, UAC RNA, he would get tyrosine, tyrosine, tyrosine, meaning that once the reading starts at UAC, it always jumps by three. UAC is read as the next codon, and then the next UAC. If the reading starts here with an A, then it'll be ACU, next one ACU, next one ACU. So it'll always be either tyrosine, tyrosine, tyrosine, or threonine, threonine, threonine, or leucine, leucine, leucine.
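The logic of the repeating UAC polymer can be illustrated with a short sketch. The three codon meanings are the ones named in the lecture; the function `read_frame` and the length of the polymer are assumptions made for this example.

```python
# Meanings of the three codons that occur in a repeating UAC polymer.
MEANING = {"UAC": "Tyr", "ACU": "Thr", "CUA": "Leu"}

def read_frame(rna, start):
    """Read non-overlapping triplets beginning at `start` and look up each one."""
    codons = [rna[i:i + 3] for i in range(start, len(rna) - 2, 3)]
    return [MEANING[codon] for codon in codons]

polymer = "UAC" * 5   # UACUACUACUACUAC

for start in range(3):
    print(start, read_frame(polymer, start))

# A hypothetical overlapping reading, moving by one nucleotide each time,
# would mix all three amino acids, which is the product that is not observed.
overlapping = [MEANING[polymer[i:i + 3]] for i in range(len(polymer) - 2)]
print(overlapping[:3])   # ['Tyr', 'Thr', 'Leu']
```

Each reading frame yields a product made of a single amino acid; only the choice of starting point decides which one.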
There is never a situation where you have tyrosine, threonine, leucine as a product, so it's a non-overlapping code. So these discoveries came from Robert Holley, who figured out the sequence of a tRNA molecule and where the anticodons are; Marshall Nirenberg, and I told you what he did; and Har Gobind Khorana, who synthesized these nucleic acids. And as a result, the conclusion of this part of the lecture is the following. Basically, you have a triplet code. Individual letters here in this sentence correspond to bases. The groups of three letters are codons. All letters together, they mean something new. That's a gene in the language of biology. Individual words that your brain interprets, these are amino acids. So if you mess up this word, it has no meaning, so you have a wrong amino acid. You change the word if you change the letters. The whole sentence together, and you have to have the full sentence to have a functional meaning, that is a protein. And now, what is doing this translation from these individual letters? Little kids, their brain is often not developed enough to actually read a sentence. They can read individual letters easily. They can recognize them, but they cannot string them together. So the molecule in biology that strings these individual letters into a meaningful sentence is a ribosome. In the cell, this translation is done by the ribosome, and our understanding of this sentence is done by our brain.
So Marshall Nirenberg got the Nobel Prize for his contributions to deciphering the genetic code. And as is the custom in the US, if there is a Nobel laureate, ultimately he's invited to the White House. And here, this is a picture of Marshall Nirenberg explaining the genetic code to the US President Lyndon Johnson. It would be interesting to actually know how long it would take to explain the nature of the genetic code to President Trump.
So, the conclusion of this was that the code is non-overlapping, which means that you read a triplet, another triplet, another triplet, and so forth. And that it depends on the reading frame: once the reading frame is established, there is no changing it. And ultimately, by deciphering one word after another, all 64 different combinations of triplets, one could arrive at a table that relates the sequence of nucleotides to amino acids. This is the first position in each of these possible triplets, let's say U. The second position is labeled here, that's a C, and the last position is a G, for example, here. So what kind of amino acid is specified by UCG? You can nicely look it up: UCG means serine. However, as I told you before, not only UCG means serine. Serine is also specified by UCU, UCC, and UCA; they all specify serine. This is what is meant by the code being degenerate.
Now, there is another feature if you map the distribution of these amino acids. It's non-random, meaning that if you change the first position here, the nature of the amino acid changes from phenylalanine to leucine, from leucine to isoleucine. Very often, with a single nucleotide change, you will end up with sometimes the same and sometimes a different amino acid, but a different amino acid that still has similar chemical characteristics. And this is what is meant by the genetic code being non-random. So basically, even if there are some mutations, they're typically such that they would not lead to a big chemical change in the sequence of the protein.
Lastly, and this is something I didn't even mention to you, a few of these codons, three of them, do not specify an amino acid. In fact, they specify stop. So if you would go back to a sentence like this, it is clear that if you read a message, you would somehow have to know where to start and where to stop.
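The table lookup described above can be sketched in Python using the standard genetic code written with one-letter amino-acid codes. The helper names `codons_for` and `translate` are invented for this illustration, and `translate` is a deliberate simplification that assumes reading simply begins at the first AUG (the standard start codon) and ends at the first stop codon in that frame.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino-acid codes for all 64 codons in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def codons_for(amino_acid):
    """All codons with the same meaning; more than one shows the degeneracy."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)

def translate(mrna):
    """Read triplets from the first AUG until a stop codon (simplified sketch)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":      # a stop codon ends the message
            break
        protein.append(amino_acid)
    return "".join(protein)

print(CODON_TABLE["UCG"])             # S (serine)
print(codons_for("S"))                # the UCN codons plus AGC and AGU
print(codons_for("*"))                # the three stop codons
print(translate("GGAUGUUUUCGUAAGG"))  # MFS
```

In the standard table, serine is specified by six codons in total: the four UCN codons mentioned above and, in addition, AGU and AGC.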
And so I'll show you how you find the start position, but the stop position, the dot here, is specified by three nucleotides that do not correspond to a word. So when you're reading a sentence, you will never read out the dot at the end of the sentence. You will just know this is where I need to stop reading the sentence. So three of these codons actually indicate stop, and I'll show you later what this means. And then finally, AUG is a codon, a unique codon, this one here, that specifies a start. So the cellular machinery that reads this message always starts with a start codon, AUG, and then stops when it encounters one of the three stop codons.
So how about this adapter molecule, the tRNA molecule? Well, Robert Holley's experiments showed that, based on the sequence, a tRNA molecule is likely to form a cloverleaf-type structure that has one stem and three loops, like a three-leaf plant, like a clover leaf. In the stem loop that is opposite to the open end of the RNA molecule, the end that has its five prime and three prime ends, so at the physically opposite side of this cloverleaf secondary structure diagram, there is the anticodon. And this anticodon pairs with the messenger RNA codons. The RNA molecule also has some other features. Not all of these nucleotides are standard, sometimes they're modified, and based on this, this loop is called the D-loop, and this one is called the T-loop. And in three dimensions, this RNA molecule folds such that two of these, the stem and the stem loop, coaxially stack, and the other two do so as well, and because this helix and this helix here are brought together, the L-shaped tRNA molecule is formed. So here is the place where the amino acid is added, and here is where the anticodon is. So this is the secondary structure and the three-dimensional shape of tRNA. These little lines between nucleotides are base pairs. They are Watson-Crick base pairs.
Sometimes people in the exam think that these are disulfide bridges. No, these are base pairs. They are hydrogen bonding interactions. Of course, what is critical for this process is that a particular tRNA that has a particular anticodon has to be physically, chemically connected to the right amino acid. If this goes wrong, the wrong amino acid is going to be incorporated into the protein. And the enzymes that connect the right tRNA with the right amino acid are called aminoacyl-tRNA synthetases. They recognize a particular tRNA and link it chemically to the right amino acid. So a valine tRNA will be connected to a valine, and this valine tRNA will have an anticodon that can recognize the right codon.
And then finally, and this is very interesting, and you will hear more about it in the lectures that follow: about these codon-anticodon interactions, you may wonder, if there are multiple codons that specify the same amino acid, do you really need to have a different tRNA for every codon? Because in principle, if a particular tRNA can interact with multiple codons, you could have the same amino acid attached to a single tRNA that would recognize two codons that have the same meaning. And indeed, that's the case. The base-pairing interactions are truly Watson-Crick only for the first two base pairs. The third codon-anticodon interaction is not completely free to pair any which way, but it has more flexibility. So in a way, you can think of this genetic code as being two and a half bases. It's not extremely strict. So basically, you can have 64 different codons in the messenger RNA recognized by fewer than 64 different tRNAs.
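The flexibility at the third position can be illustrated with a small sketch. The pairing rules below are not stated in this lecture; they are a simplified version of Crick's wobble hypothesis, brought in here as an assumption, and the function name `codons_read_by` and the example anticodons are illustrative only.

```python
# Strict Watson-Crick pairs, used for codon positions 1 and 2.
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Simplified wobble rules (an assumption, not from the lecture): which bases in
# codon position 3 can pair with a given base in anticodon position 1.
# "I" stands for inosine, a modified base found in some tRNAs.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}

def codons_read_by(anticodon):
    """Codons (5'->3') that one anticodon (5'->3') can read under these rules."""
    wobble_base, middle, last = anticodon
    # Pairing is antiparallel: anticodon position 3 reads codon position 1,
    # anticodon position 2 reads codon position 2, both strictly.
    prefix = PAIRS[last] + PAIRS[middle]
    return sorted(prefix + third for third in WOBBLE[wobble_base])

print(codons_read_by("GAA"))   # ['UUC', 'UUU'], both phenylalanine codons
print(codons_read_by("IGA"))   # ['UCA', 'UCC', 'UCU'], all serine codons
print(codons_read_by("CAU"))   # ['AUG'] only
```

Under these rules, one tRNA can read two or three codons with the same meaning, which is why fewer than 64 different tRNAs are enough.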