14_BL_20231106_WhisperAI.docx
ETH Zürich
Full Transcript
Now we will continue on the topic of translation and also have an opportunity to bring together your understanding of DNA replication, transcription signals, the transcription process and translation for the purpose of being able to design what are called expression vectors: synthetic DNA molecules that can be used to produce different proteins according to our wishes, or to change them according to our wishes. So what you have heard about in the previous lecture is what ribosomes are, what they are composed of, that they are responsible for translation, that during translation the ribosomes have to find the start of the message on the messenger RNA, that they have to iteratively add successive amino acids to the growing polypeptide chain, and that finally, when the message is fully read and fully translated, the ribosomes will encounter a stop codon on the messenger RNA and, with the help of release factors, release the fully synthesized protein so that it can fold and be functional in the cell. All of these stages of translation, initiation, elongation and termination, involve the participation of protein factors: there are three of them that help during initiation, three during elongation and three during termination. So once the ribosomes have read the message and the new protein is made, it can simply fold in the cytosol, and this is the fate of all soluble proteins. But in many cases these newly synthesized proteins have to be targeted to membranes, because they are membrane proteins, or they sometimes have to be translocated across membranes. And so there are some special features of these proteins that are recognized as they are being synthesized on the ribosome, and the proteins are then targeted to different cellular fates. So there are a couple of different ways to influence the fate of the protein that is being synthesized by the ribosome. 
In one case, when it's a normal soluble protein, like an enzyme that is not hydrophobic and only has a hydrophobic core when it folds, this protein will have no special features and it will simply fold in the cytosol, in bacterial cells or in our cells. However, sometimes a protein that has to be inserted into a membrane is a very hydrophobic protein, and you cannot just release it from the ribosome because it will immediately aggregate, since it's not able to fold without membranes. So in that case this protein will have a special sequence at its end, a so-called signal sequence, that will be recognized by a so-called signal recognition particle, and then these ribosomes will be brought to the membrane and the membrane proteins will be inserted. Sometimes there are proteins that have to be moved across the membrane. They also have special signal sequences, but they will be released from the ribosome because they are not so hydrophobic. They just need the signal sequence that will bring them to the pore in the membrane, and then they will be moved across the membrane. So one thing I would like you to notice is that this scheme is slightly different from the scheme in the book. Just be aware that here you should actually have the ribosome in complex with the signal recognition particle. The signal recognition particle will recognize the signal sequence only in the context of the ribosome. This is called co-translational insertion into membranes. So this is the concept. If the proteins have no special sequences, no signal sequences, they'll fold on their own. If they have signal sequences that are recognized by the signal recognition particle, then these ribosomes will be brought to the membrane and membrane proteins will be made. Proteins with signal sequences that have to be moved across the membrane are targeted post-translationally with the help of other proteins, and then they go here. 
And this is how it happens in prokaryotes, both bacteria and archaea, and as you will see, also in eukaryotes. So let's discuss regulation of gene expression. This is now the last topic, which brings us back to transcription but also allows us to connect regulatory mechanisms for both transcription and translation. There is a very fundamental concept that was proposed by Jacob and Monod based on experimental evidence and a lot of theoretical considerations. And the concept for how genes can be regulated was based on the following observation. It is intuitively obvious that bacterial metabolism, or any process in any organism, is not supposed to be wasteful. So it was known that bacteria, and so do we, any organism, any life form on earth, are not just doing the same thing all the time. Bacteria, for example, are able to respond to certain environmental conditions. And you already saw that there are these special heat shock promoters that make the bacteria produce special enzymes or special chaperones that help them survive higher temperatures. But it was also known, for example, that when a bacterium finds a particular food source, like in this case lactose, it will suddenly respond and make an enzyme that processes this lactose and breaks it down into simpler sugars, which are then used as food. The enzyme that digests lactose, a disaccharide, is beta-galactosidase. It hydrolyzes this larger sugar into simpler sugars. But this enzyme is usually not needed. If the bacterium doesn't have access to lactose, there is very little of this enzyme present. But when the bacterium finds lactose, suddenly a lot of this enzyme is made. So you can see here that the amount of this enzyme suddenly increases when there is lactose in the medium. So how does this work? 
So historically, a lot of people, when they first observed these responses of organisms to the environment, assumed that it's really just the enzyme itself that gets activated. But then it became obvious that it wasn't the enzyme that becomes active. It is actually the amount of enzyme that is changing. And this allowed Jacob and Monod to come up with a hypothesis that was initially proposed just theoretically. The hypothesis was the following. For a particular messenger RNA to be read by the ribosomes and the protein synthesized, first there has to be messenger RNA present. And they showed that, in fact, this messenger RNA often encodes, or carries the message for, more than one protein. So in bacterial cells, you can have a single messenger RNA, and the ribosomes will read different segments of it to make different proteins. But the initial production of this messenger RNA is not uniform. There is a level of transcription from the promoter that can be controlled. And this was the key concept. So they suggested that transcription of a particular messenger RNA from its promoter, a messenger RNA that can be polycistronic, meaning that it can encode multiple proteins, is controlled by binding of an extra protein called a repressor. And again, they proposed it as a general mechanism not specific to beta-galactosidase, but they used the increase in beta-galactosidase amounts as an example. So what they proposed is that a different protein, the repressor, is produced from a messenger RNA that is under the control of a different promoter. This protein will typically bind to the DNA and prevent these genes from being transcribed into messenger RNA. If an extra compound, an inducer, is present, such as maybe lactose, then this compound will bind to the repressor protein and prevent it from binding at the promoter of the genes that have to be expressed in response to the presence of this inducer. 
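The repressor and inducer logic described above can be written down as a tiny truth table. This is only a minimal sketch of the negative-regulation idea, assuming an all-or-nothing switch; the function name and the boolean simplification are illustrative assumptions and not part of the lecture, and real operons respond in a graded way.

```python
def operon_is_transcribed(repressor_present: bool, inducer_present: bool) -> bool:
    """Toy model of negative regulation (hypothetical helper, not a real API).

    The repressor blocks transcription only when it is present AND
    no inducer is around to bind it and pull it off the DNA.
    """
    repressor_bound_to_dna = repressor_present and not inducer_present
    return not repressor_bound_to_dna


# Print the full truth table for the two inputs.
for repressor in (True, False):
    for inducer in (True, False):
        state = "ON" if operon_is_transcribed(repressor, inducer) else "OFF"
        print(f"repressor={repressor!s:5}  inducer={inducer!s:5}  transcription {state}")
```

In this simplified picture the only combination that switches the genes off is a repressor without inducer, which matches the observation that very little beta-galactosidase is made when no lactose is available.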
So this is one concept of how regulation of transcription can work. But as you can imagine, there are all sorts of different ways you can do it. You can have, for example, a repressor that does not bind in the presence of an inducer, or you can have a protein that is not a repressor until it binds to some molecule. So you can have both positive and negative ways of controlling transcription. This inducer molecule can be, for example, the metabolite that has to be dealt with, or it can be some side product of the reaction. In the case of the transcription system that controls beta-galactosidase production, the inducer molecule is produced from lactose. So this lactose, which is a food source, is changed into a compound that can bind the repressor. Jacob and Monod could even show that similar, artificial molecules can be used instead of this inducer to control the expression of this gene. So even when lactose was not around, they could add a molecule that is similar, an analog, and induce transcription of these genes. This was really revolutionary because it opened up a whole new world of understanding of how bacteria, or any organism, can respond to environmental conditions. But at this point, the nature of this repressor was not really understood. What kind of molecule is it? Is it maybe an RNA or a protein? What does it do? This took decades of further research, and it was shown that these regulatory molecules are typically small proteins that can switch transcription on and off. These protein-nucleic acid interactions, these repressors binding to nucleic acids, can be specific or non-specific. And I already mentioned to you that there are a number of very specific interactions that take place between nucleic acids and different types of proteins. 
So for example, the finding of the promoter by RNA polymerase is an example of specific recognition between the sigma subunit of the RNA polymerase and a particular sequence in the DNA. And the same concept applies to the binding of the repressor to its sequence in the DNA. This specificity is provided by recognition of the sequence in the DNA even when the DNA is double helical. Typically, this happens through recognition of the major groove of the DNA. So the major groove of DNA is the main site of the specific interactions that govern recognition of DNA sequences by protein factors. A very common structural element that proteins use to recognize this major groove of DNA is called helix-turn-helix. It has a helix, a turn, and a helix. Take, for example, repressor transcription factors such as the one that controls transcription of the lac operon, this lactose operon, this set of genes that are transcribed when lactose is around. The lac repressor also has this helix-turn-helix structural element. And this type of structural motif is often present in bacteria and in archaea. But it is not enough to have just this helix-turn-helix. Typically, transcription factors will have dimerization motifs that allow the transcription factor to bind not just a very short segment of DNA sequence, but frequently longer and sometimes somewhat symmetric regions of DNA. So in the case of the lac operon and the lac repressor, the two helix-turn-helix domains here are connected through a part of the lac repressor protein that has a dimerization unit. So you will have two of these helix-turn-helix motifs that recognize inverted repeat sequences in the DNA. This ensures higher specificity, because if you just chose a short single sequence on the DNA, you might accidentally encounter it somewhere else where there is no need to repress transcription. 
But this way, these tandem helix-turn-helix motifs that recognize inverted repeat sequences will ensure a high level of specificity and very precise regulation of transcription. So let me just try to turn the light on and off, because right now it seems to be producing some noise. Okay, sorry about this. It had to be dealt with, because otherwise you would have too much noise in the presentation. Okay, so this explains a little bit how the lac repressor will find its target sequence in the DNA and somehow influence transcription. So how does this happen? I'll show you an example from a different system. This is a transcription factor that recognizes another promoter. What these DNA-binding proteins, these transcription factors, do is literally interfere with the binding of the RNA polymerase, or block the binding of other factors like the TATA-box binding protein in archaea. So a very simple explanation is that you have a repressor that binds very tightly to a sequence in the DNA that is just next to the promoter. And because of this, the RNA polymerase in bacteria, with its sigma factor, will not be able to find and bind the promoter. Very simply, if you don't have binding of the RNA polymerase to the promoter, there is no transcription. In archaea, there is no sigma factor. There is this TATA-box binding protein. Just like binding of the repressor to the bacterial promoter, in archaea binding of the repressor close to this TATA-box sequence in the DNA will prevent the binding of the transcription factor that then delivers the RNA polymerase. So no matter what the mechanism is, you literally physically interfere with the binding of RNA polymerase, and that's what controls transcription. You can also have positive regulation: transcription factors that bind to particular sequences under certain conditions and stimulate the binding of RNA polymerase. So you can have either interference or stimulation. Everything is possible. 
These are just some examples where you literally have this interference. So how about protein expression in E. coli? What would you need? You have to have a gene with a promoter from which the messenger RNA is made. You can have an operator. That's the site where the repressor can bind and prevent transcription. But if there is no repressor bound there, then transcription will take place and messenger RNA will be synthesized. You need to have a place where the ribosomes will bind to find the start site. You also have to have a place where messenger RNA synthesis will be terminated. And in the gene itself you have to have a stop codon. So these are all the genetic elements needed for transcription and for translation of a particular sequence within the messenger RNA. And you can use this type of construct to generate, for example, a DNA that has all these features, and if you introduce it, for example on a plasmid, into E. coli, you can express any gene of interest. This is called recombinant protein production, and it's remarkably powerful. So today in medicine you can literally harness the bacterial protein synthesis capability to make human proteins. Take, for example, insulin, which is used to respond to the presence of sugar in our bloodstream and to store this sugar. If people are lacking insulin and they have symptoms of diabetes, they can be helped with insulin synthesized in bacteria, which can then be delivered into the bloodstream. So this whole concept of designing artificial vectors for the production in bacterial cells of proteins that may not even exist in those cells requires some tools. And so let me introduce to you a few of the tools that can be used. First of all, you have to have a bacterium that is not dangerous, not pathogenic. Escherichia coli is typically used in the lab. It was discovered by a pediatrician, because we have a lot of Escherichia coli bacteria in our intestines. It's not dangerous for us. 
It actually even helps us with food digestion and so on. But if you isolate this bacterium and grow it in culture, you can use it for different purposes. And today Escherichia coli is used as a great experimental system. The second thing that you have to have is tools. Today there are many tools, and these are some of the old ones that were discovered and became remarkably important in molecular biology. Werner Arber, a Swiss biochemist and biologist, discovered restriction enzymes and showed that they can be used for cutting DNA at particular sites. So it's like having molecular scissors for DNA that allow you to put together different pieces of DNA according to your wishes. He received the Nobel Prize for this discovery. So what do these restriction enzymes do? Just the way CRISPR was discovered as a defense system of bacteria against infection by phages, restriction enzymes are also molecular defense molecules that bacteria use to prevent foreign DNA from being introduced into the bacterium. Whether it's a phage infection or some other invasion of foreign DNA, all of this can be prevented by the presence of these enzymes in the bacterium, which will literally recognize certain sequences in the foreign DNA and cut them. These sequences are not cut in the DNA of the host because they are methylated. So the concept is very similar to the repair mechanism in DNA. When you have methylation, the DNA is not touched. And it is the same in bacteria. If a particular sequence is methylated on some of its nucleotides, it will not be cut by the enzyme, but the same enzyme will cut that sequence when foreign DNA comes in. And what is fascinating is that these different enzymes, which have names like EcoRI depending on the strain of bacteria they were isolated from, will recognize very specific sequences and cut them. And very often these sequences will be palindromic. That means that you will be able to read them the same way in both directions. 
GAATTC, GAATTC in both directions, when you read each strand from its 5' end. This is another sentence that can be read in both directions, just an example of a palindromic series of letters. So once you have these scissors, molecular scissors for DNA, you can start constructing artificial plasmids. One of the first and most famous plasmids that was designed was created by Francisco Bolivar in Mexico City. He realized that to have a successful artificial plasmid, what you need is a circular DNA that can be copied in bacteria. And for this it has to have an origin of replication. It also has to carry some sort of resistance, a gene for some enzyme that will make the bacterium better off. So let's say you introduce this plasmid into a bacterium: if the bacterium is surrounded by some antibiotic, the plasmid will direct synthesis of an enzyme that prevents the bacterium from being killed by the antibiotic. So if you have one out of a thousand bacteria with this plasmid, by adding ampicillin you will only be able to grow the bacteria that have this plasmid. In addition to this resistance, the plasmid has to have a unique cutting site, like an EcoRI restriction enzyme site. Because that way you can cut it open, put in a different DNA according to your wishes, let's say the gene for insulin, and then put it back together. A small plasmid is also beneficial. You don't want it to be too big, because otherwise you will not be able to put such plasmids into bacteria. So how does this work? You will literally have a plasmid like that. You will use a restriction enzyme to cut the plasmid. Then you will take foreign DNA that was also cut on both sides with the same restriction enzyme, you will put it in, and you will close the plasmid. Some bacteria will pick up the plasmid that has your insert, and then you have to select only those that have your insert. There are many ways to do it. Either there's a color indicator or something else that allows you to pick only the plasmids that have your insert. 
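The palindrome property and the cut-and-paste cloning step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: only the top strand is tracked, the circular plasmid is written as a linear string that does not split the site across its ends, and the example sequences and function names are made up for illustration. The EcoRI recognition site GAATTC and its cut position after the G are the only facts taken from the enzyme itself.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # EcoRI cuts after the G: G^AATTC


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic_site(site: str) -> bool:
    """A site is palindromic if both strands read the same 5' to 3'."""
    return site == reverse_complement(site)


def digest_linear(seq: str, site: str = ECORI_SITE,
                  offset: int = ECORI_CUT_OFFSET) -> list[str]:
    """Cut a linear top-strand sequence at every occurrence of the site."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + offset])
        start = pos + offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


def insert_into_plasmid(plasmid: str, insert: str, site: str = ECORI_SITE,
                        offset: int = ECORI_CUT_OFFSET) -> str:
    """Open a plasmid at its unique site and ligate in a matching insert."""
    if plasmid.count(site) != 1:
        raise ValueError("plasmid must contain exactly one cutting site")
    cut = plasmid.find(site) + offset
    return plasmid[:cut] + insert + plasmid[cut:]


# Hypothetical toy sequences, far shorter than any real plasmid or gene.
foreign_dna = "CCGAATTCATGAAATAAGAATTCGG"
plasmid = "AAAAGAATTCTTTT"

fragments = digest_linear(foreign_dna)
insert = fragments[1]  # the piece flanked by two EcoRI cuts
recombinant = insert_into_plasmid(plasmid, insert)
print(is_palindromic_site(ECORI_SITE), fragments, recombinant)
```

Note that the ligated product again contains an EcoRI site on each side of the insert, which is why the same enzyme can later be used to cut the insert back out.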
And then once you have this new gene introduced, you can use the bacteria to make any protein that you need. What else do you need to pay attention to? Well, of course, when you cut the DNA into pieces, you need to connect them again. And which enzyme, you might remember, is responsible for this connection? This enzyme is called ligase. So you can use ligase, the enzyme that connects the little gaps in the DNA during copying of the lagging strand in DNA synthesis. The polymerases take care of their job, but there is sometimes still a gap, a covalent gap, in one of the DNA strands. And then ligase will connect these DNA strands. The same thing will happen here. With ligase, you can connect the two pieces of DNA that were previously cut and brought together. So using a restriction enzyme and ligase, you can really bring them together. You can also use additional enzymes to increase the efficiency of the reaction, but the restriction enzyme and ligase are the two critical components. So what kind of proteins can you make? Well, you can make insulin. You can make human growth hormone in E. coli. You can produce interferon, which is used for the treatment of different viral infections, also in E. coli. The list is huge. For people who need chemotherapy for cancer treatment, E. coli is also used. So this is remarkable. It's a long list of different things. And it shows you that today molecular biology and biotechnology are so powerful. We can literally control how we produce proteins, how we change the proteins, and so forth. Okay. So with this, I would like to use the last 10 minutes or so to tell you about the relationships between proteins. And this is what is referred to as evolution and bioinformatics. 
So what was noticed early on is that the sequence of a protein can vary quite a bit, but nevertheless, proteins might still fold into the same three-dimensional shapes. So the chemical properties of the amino acids are more important than the actual sequence. The information content in amino acid sequences is actually very powerful, not only in terms of the identity of the amino acids, but also in terms of their chemical characteristics. Because of this, you can sometimes compare proteins that have very distant relationships, where the identity between amino acids is very low. So when you have two proteins that have different sequences but the same folds and the same functions, you will see there are special terms that can be used to describe them. Basically, if you have two enzymes that are similar in terms of sequence, they're called homologous. When they are homologous and also perform the same function, so for example, this is ribonuclease, human versus cow, and both carry out the same reaction, that's when they're called orthologs. But sometimes there is a protein that still has some sequence and structural similarity but does not carry out the same function as the homologous molecule. And in that case, this pair of proteins are called paralogs. So all of these are homologs, but these are paralogs, and these are orthologs. So how can we best compare such proteins? Well, take hemoglobin and myoglobin; these are slightly different oxygen-binding proteins. One is in the blood, and the other one is in your muscle. They're quite different in terms of sequence, but when you have identical amino acids at a position, valine and valine in both cases, no problem, that's identity. But sometimes you can have amino acids that are not identical, valine and leucine, but they're similar. 
And in that case, you can still use that similarity. Take leucine and isoleucine, for example. You can see some similarity here, because they're both small hydrophobic amino acids. So a very elegant way to compare proteins is to use a matrix that says: if you have on one side, let's say, a proline, and in the other protein also a proline, this is ranked very, very high, because proline is an unusual amino acid. If you have, let's say, an arginine in one molecule but a lysine, a K, in the other molecule, that is still very favorable. So this is a good score for the comparison. If you have an arginine replaced by a completely opposite type of amino acid, let's say a negatively charged amino acid, this will give a negative score. So this way you can really compare very distant proteins. For example, these are all proteins that bind heme and bind oxygen; one is in animals, and this one is in plants. Those are very different sequences, and obviously they separated in evolution a very long time ago, but using this type of analysis, you can compare them. And then if you know how many millions of years ago plants separated from animals, you can use this as a molecular clock and say, okay, if plants and animals separated 500 million years ago, then how many changes in the sequence happen per million years? Because you can compare this plant heme-binding protein with the animal heme-binding protein, and you will see that the identity is maybe only 20%. So only every fifth amino acid is the same; four out of five amino acids will be different. And from this you can calculate the evolutionary clock, meaning how well the sequence is preserved in the course of evolution. So with this you get a sense for the tools of molecular biology. You also get a sense for how you can compare sequences and how you can analyze them. 
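The scoring idea and the molecular clock arithmetic above can be sketched as follows. This is a hedged illustration only: the score values are invented toy numbers chosen to mirror the qualitative statements in the lecture (proline with proline high, arginine with lysine positive, arginine with a negatively charged residue negative), not values from a published matrix. The sequences are assumed to be already aligned without gaps, and the clock estimate is a naive one that ignores repeated changes at the same position.

```python
# Hypothetical toy scores; real analyses use published substitution matrices.
TOY_SCORES = {
    ("P", "P"): 7,                      # unusual residue, identity ranked very high
    ("R", "K"): 2,                      # both positively charged
    ("V", "L"): 1, ("L", "I"): 2,       # similar hydrophobic residues
    ("R", "D"): -2,                     # opposite charge
}
IDENTITY_DEFAULT = 4
MISMATCH_DEFAULT = -1


def pair_score(a: str, b: str) -> int:
    """Look up a symmetric score for one aligned pair of amino acids."""
    if (a, b) in TOY_SCORES:
        return TOY_SCORES[(a, b)]
    if (b, a) in TOY_SCORES:
        return TOY_SCORES[(b, a)]
    return IDENTITY_DEFAULT if a == b else MISMATCH_DEFAULT


def alignment_score(seq_a: str, seq_b: str) -> int:
    """Sum pair scores over two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(pair_score(a, b) for a, b in zip(seq_a, seq_b))


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions with identical amino acids."""
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def naive_changes_per_site_per_million_years(identity_fraction: float,
                                             divergence_time_my: float) -> float:
    """Observed fraction of differing positions divided by divergence time."""
    return (1.0 - identity_fraction) / divergence_time_my


print(alignment_score("VPRL", "LPKI"), percent_identity("VPRL", "LPKI"))
print(naive_changes_per_site_per_million_years(0.20, 500))
```

With the lecture's numbers, 20% identity over 500 million years, the naive estimate comes out at roughly 0.0016 observed differences per position per million years. The toy pair "VPRL" and "LPKI" shows how two sequences with only 25% identity can still score well once similar residues are taken into account.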
And finally I would now like to use the last few minutes to give you some examples of future topics that you will hear about. So I'll come back to the protein synthesis machinery. You have seen what ribosomes look like, but in the classes that you will attend in the future, you will see how you can use the true atomic structure of many of these large assemblies to get to the chemical basis of protein synthesis. Just the way you saw how enzymes catalyze reactions, how they bind their substrates and how the chemical features of amino acids are important in catalysis, that's the same level of understanding that we now also have about the process of protein synthesis. You will see that the two subunits are composed of RNA and proteins, how they are arranged, and how they bind the tRNAs in the three binding sites, the A, the P, and the E site. This is the large subunit, and you will see what the peptidyl transferase active site looks like, and so forth. So here is the view into the active site of the ribosome. You'll see that the active site itself is based on RNA structure. So it's truly an RNA machine. The RNA part is shown here in silver, and the proteins are shown in yellow and orange. And then finally, as a bit of an outlook and conclusion, I would like to show you an animation of what this protein synthesis process involves, which you will understand much better in the future. First you have initiation: you have a small subunit that has to find the messenger RNA with the help of initiation factors. The tRNA has to bind. Initiation factor 2 uses GTP to deliver the tRNA. Initiation factor 1 helps the messenger RNA to stay in place. Then the large subunit joins and the initiation factors dissociate. Initiation factors 1 and 3 dissociate. Initiation factor 2 hydrolyzes GTP and dissociates. And then you have elongation. So you have a tRNA in the P site and the messenger RNA interacting with the small subunit. 
Elongation factor Tu will deliver the next tRNA. GTP will be hydrolyzed. The tRNAs will be positioned next to each other. Then the peptide bond is formed: the peptidyl transferase reaction happens. Then elongation factor G will push these two tRNAs. This is the translocation stage of the elongation phase of protein synthesis. The two are pushed. Elongation factor G will then dissociate. Then the exit site tRNA will dissociate. The next tRNA will be delivered. This one comes in, this one leaves. And then the next peptide bond is formed. And so, at a rate of 30 to 40 amino acids per second, the bacterial ribosome will polymerize one amino acid after another, so that the polypeptide chain can be formed. If this protein is soluble, it'll fold. If it's a membrane protein, then it'll be recognized by a signal recognition particle. The whole ribosome will then be delivered to the membrane, and protein synthesis will continue. When a stop codon is encountered, there is no tRNA that will recognize it. Instead, there will be a release factor, release factor 1 or 2, that will recognize the specific codon here and release the polypeptide chain. And finally, the ribosome is disassembled. So with this, we conclude the last several lectures on the topic of the central dogma. You have heard about DNA replication. You have heard about transcription. You have heard about translation. And you have heard how all of this knowledge can be used to manipulate and engineer these processes for the production of proteins for medicine. Or, to better understand a process, you can change the proteins through mutagenesis and then study how their function is changed. So this is the end of the lecture today. We'll continue next time with the possibility to ask questions and with the topics that follow. Thank you very much. Thank you.