Lecture 2 Genomics - Sept 5 2024 - Abbreviated Version PDF

Health Sciences 1I06: Cellular and Molecular Biology
Lecture 2: Genomic Processes Review
Dr. Eric Seidlitz B.Sc., B.A. (Hons.), M.Sc., Ph.D.
Assistant Professor, Anesthesia

There are a lot of slides here, but we will not deal with many of them in depth. Consider this slide deck as a resource for further study if some of the concepts are unfamiliar. Since every structure that is involved in cellular communication has its origin in genomics, you will need to re-acquaint yourselves with the world of molecular biology. Much of the material I will skim over may be familiar to you from high school – if it isn’t, please talk to us to help get you up to speed.

Some courses try to cover every aspect of the genomic system in minute detail, but that will not be very practical, nor would it be useful here. Instead, the main goal is to look at the genomic system in terms of how you might use the information. In particular, you should attempt to determine which structures and processes could be prone to failure, or what things might be manipulated to fix something that has already gone wrong. This practical perspective on biology will hopefully provide you with a better sense of what details are important to explore further in your understanding of health and disease.

General note: Much of what we will cover in this lecture has been discovered using prokaryotes (single celled organisms like bacteria) rather than eukaryotes (organisms with nucleated cells, including multicellular organisms like you). I will point out any major differences between the two groups of organisms. A third group is the Archaea, and these often have intermediate features.

This is the familiar framework that was introduced in the first lecture. Memorize this!

We will expect that you understand the fundamental structures and processes involved in communicating information from DNA to proteins. Yes, genomics can also be viewed as a communication system that follows all the same communication steps we have already outlined.
You will need to understand the differences between DNA, RNA, and proteins, and you should be able to identify and describe the processes that occur between each of these structures. Beyond that are the details that you may need to look up and understand if they are relevant to solving problems. Proteins will be a significant focus of this lecture, as they are exceptionally complex and essential for all aspects of cellular signalling.

DNA & Replication
DNA is the stable storage molecule that carries the genetic information that codes for the development and functions of all cells
Replication is the process where double-stranded DNA is copied to produce two identical DNA molecules
Enzymes involved: DNA polymerase, helicase, primase, ligase, etc.
Some viruses are DNA-based, and DNA mutations have obvious connections to many disorders

I will start with DNA and the processes involved in duplicating it. Why would we be interested in DNA and replication? The answer to that relates to the purpose of DNA. That purpose is to be a stable storage mechanism for information. I highlight “stable” as an important feature because it is probably the single feature that makes life practical on this planet. Replication is the process that generates faithful copies of the information. Cells that reproduce need to make exact duplicates (with some limited variations) of the information they carry in their DNA. There are several enzymes involved, and all I want to get across is what functions they serve in the process of replication.

Many diseases relate to changes in DNA and related processes. As well, many viruses are DNA based. A recent example is Mpox (previously known as monkeypox), which is a double stranded DNA virus. DNA viruses can be single stranded (with genomes as small as about 2,000 nucleobases) or double stranded (with genomes that can exceed 375,000 bases).

Biological information storage
Biological information is typically stored in a stable chemical format
This single common format has been the virtual standard for 3.5 billion years
Comprised of polymer chains of 4 different monomer subunits (the DNA alphabet)
Code can be read and understood by any and all organisms using this language
Image: Wikipedia

Biological organisms record their important information chemically in a single common format that has been the virtual standard on Earth for the past 3.5 billion years. Although somewhat unexpected, given all the diversity we see out there, essentially all life forms on the planet store their information in double-stranded molecules of deoxyribonucleic acid (DNA), which are polymer chains of 4 different monomer subunits (always A, T, C, or G; although some exclusively RNA-based storage is also known). Such chemically coded information can be read and understood by any and all organisms using the same language, such that a piece of DNA from one organism can be inserted into another, and it can be read, interpreted, and copied without any problem. This language is read using specialized proteins found in all organisms.

DNA structure
A biopolymer comprised of:
A backbone of sugars (deoxyribose) linked by phosphate groups forming an unbranched chain
Attached to the backbone are 4 different bases called purines (2 rings) or pyrimidines (1 ring)
[Diagram: the bases G, A, T, and C, each attached to a deoxyribose (D) in a backbone linked by phosphates (P)]

As a review, DNA is a polymer comprised of a backbone made with a chain of deoxyribose sugars (a deoxyribose is a ribose with the 2’ hydroxyl group removed, which makes it more stable), which are linked together by phosphate groups. Attached to each sugar is one of 4 bases that are called purines (2 rings) or pyrimidines (1 ring). Pyrimidines are single ring structures: Cytosine, Uracil (in RNA), and Thymine. Purines, on the other hand, are double ring structures (Guanine and Adenine).
Here is a good mnemonic to help remember which bases are the pyrimidines and the purines: “Pyrimidines are like the pyramids, and will CUT your ring finger” [for Cytosine, Uracil, Thymine].

Nucleotides and nucleic acids
Sugar + phosphate + base = nucleotide
Polymer of nucleotides = nucleic acid
Base complementation is critical for storing information
A binds to T
G binds to C
[Diagram: a single nucleotide made of a phosphate (P), a deoxyribose (D), and a base (G)]

Here is some terminology you may encounter: A unit made of a sugar, a base, and a phosphate is called a nucleotide. A chain of nucleotides is called a nucleic acid (DNA is one such polymer). The most critical part of DNA chemistry is that the bases on the nucleotides have a very strong preference to pair with each other in a very specific pattern: a G prefers to bind to a C, and an A likes to bind with a T. This is called base complementation.

Aside: In 1950, Erwin Chargaff noticed that the amount of A nucleotides in DNA matched almost exactly the amount of T nucleotides. The same thing occurred with G and C nucleotides, although the absolute amounts of G and C were less than those of A and T. He suggested that this was a “strange but possibly meaningless” phenomenon. He never really got it at the time, but he had identified the concept of base complementation. This would later be called Chargaff’s rule.

[Animated diagram: paired nucleotides (C-G, T-A, T-A, G-C, A-T) joined through their deoxyribose-phosphate backbones. Legend: P = phosphate; D = deoxyribose; C = cytosine and T = thymine (pyrimidines); G = guanine and A = adenine (purines)]

This is an animated sequence showing nucleotides and how they combine to form the backbone of DNA. Once you have generated this basic ‘ladder’ structure, all the polymer does is twist on its own due to its unique chemistry. Note that there are only two hydrogen bonds between A and T, but three bonds between G and C (making these somewhat stronger).
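
If it helps to see base complementation in action, here is a minimal Python sketch of the pairing rules and hydrogen-bond counts described above. The example sequence and the function names are made up for illustration; they are not from the lecture.

```python
# Pairing rules from the lecture: A binds to T, G binds to C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds per base pair: two for A-T, three for G-C.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complement(strand: str) -> str:
    """Return the partner strand, base by base (strand direction is ignored here)."""
    return "".join(PAIR[base] for base in strand)


def duplex_base_counts(strand: str) -> dict:
    """Count each base across both strands of the double-stranded molecule."""
    both = strand + complement(strand)
    return {base: both.count(base) for base in "ATGC"}


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds holding the two strands together."""
    return sum(H_BONDS[base] for base in strand)


strand = "GATTACATTA"  # hypothetical example sequence
print(complement(strand))
print(duplex_base_counts(strand))  # A matches T and G matches C, as Chargaff observed
print(hydrogen_bonds(strand))
```

Because every A on one strand sits across from a T on the other (and every G across from a C), the counts for the whole duplex always come out matched, which is exactly the pattern Chargaff noticed.
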
Here is a video that might be useful to picture what is happening in DNA replication: https://youtu.be/TNKWgcFPHqw?si=Jd1ZdH7hBqI_iaWZ

What makes DNA useful?
DNA is chemically very stable
RNA may have been used in the past, but DNA is far more stable (100x)
Smaller grooves and stronger bonds between strands
2’-OH group of RNA can be easily hydrolyzed
https://sphweb.bumc.bu.edu/otlt/mph-modules/ph/ph709_basiccellbiology/ph709_basiccellbiology6.html
https://passel2.unl.edu/view/lesson/6f214d098527/3

Here are some of the features of DNA that may be important to its success: The genetic material must be extremely stable so that sequence information can be passed on from generation to generation without errors. RNA is also an option, but it is susceptible to base-catalyzed hydrolysis. The removal of the 2′-hydroxyl group from the ribose (making it deoxyribose) decreases the rate of hydrolysis by approximately 100-fold under neutral conditions and perhaps even more under extreme conditions. Thus, the conversion of the genetic material from RNA into DNA would have substantially increased its chemical stability.

Double stranded DNA also has smaller grooves than double stranded RNA – therefore less physical opportunity for it to get degraded. Even better, the strength of the bonds in double stranded DNA is actually greater than in double stranded RNA. RNA is continually made, broken down, and re-used… it appears more suitable to fast information transfer (like volatile memory, flash drives, etc.) rather than for long term storage (DVD or CD). Yes, I like analogies. Get over it.

DNA topology
Information initially encoded in the primary structure – the sequence of the bases
4 letter alphabet is much more information rich than the binary (2 letter) alphabet of computers
Amount of information encoded is limited only by chain length (>2.2 million gigabytes per gram)
Complementary nature of nucleotides is very useful for duplication of the structures

You know DNA is a polymer made up of 4 different subunits. It makes sense that the sequence of those subunits along the chain would be an important feature of the polymer. The four different “letters” provide a means of encoding information (regardless of whether you know what that information means). Compared with computers that use an alphabet of only two letters (0 and 1), you can predict that the amount of information that can be stored at each position of a nucleotide sequence (4 letters) is even greater than with binary: each base can carry up to two bits, versus one bit per binary digit. Put another way, the same message can be written with fewer letters, meaning that DNA has increased information density. With primary structure, the amount of information stored is limited only by the length of the chain. With DNA, the length of the strand is easily changed by simply adding or removing nucleotides.

ASIDE: An interesting Nature paper ( https://pubmed.ncbi.nlm.nih.gov/23354052/ ) demonstrated that DNA can store ~2.2 petabytes of information per gram (using only 3 of the 4 nucleotides for information storage). A petabyte is 1 × 10^15 or 1,000,000,000,000,000 bytes (1 million gigabytes or 1,000 terabytes). How heavy is a 3-terabyte hard drive? Probably more than one gram...

DNA Topology
Information can also be stored in many ways by DNA
Primary structure (1°): Sequence of bases
Secondary structure (2°): Double helix
Tertiary structure (3°): Folding and coiling
Quaternary structure (4°): Interaction between chains
Right-handed only!

Topology is a description of the “higher order structure” of molecules. Boring, right? Primary structure of DNA is simply the sequence of bases.
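
Since primary structure is just a sequence written in a four-letter alphabet, the information-density claims from the previous slide can be checked with a little arithmetic. The sketch below is only an illustration: the function names are invented here, and the 2.2 petabytes per gram figure is the one quoted in the aside above.

```python
import math

PETABYTE = 10**15
TERABYTE = 10**12
GIGABYTE = 10**9


def bits_per_symbol(alphabet_size: int) -> float:
    """Maximum information carried by one letter of an alphabet of this size."""
    return math.log2(alphabet_size)


def symbols_needed(n_bytes: int, alphabet_size: int) -> int:
    """Fewest letters needed to write n_bytes in the given alphabet."""
    return math.ceil(n_bytes * 8 / bits_per_symbol(alphabet_size))


# 2.2 petabytes per gram (figure quoted in the aside), kept as whole bytes.
DENSITY_BYTES_PER_GRAM = 22 * PETABYTE // 10

print(bits_per_symbol(2), bits_per_symbol(4))       # binary digit vs. DNA base
print(symbols_needed(1, 2), symbols_needed(1, 4))   # letters needed for one byte
print(DENSITY_BYTES_PER_GRAM // TERABYTE, "terabytes per gram")
print(DENSITY_BYTES_PER_GRAM // GIGABYTE, "gigabytes per gram")
```

One byte takes eight binary digits but only four bases, which is the "shorter messages" point made above. Restricting the code to 3 of the 4 nucleotides, as in the Nature paper, lowers the ceiling to log2(3), or roughly 1.58 bits per base.
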
Secondary structure of DNA is simply the double helix shape. This shape has several physical and chemical advantages. Tertiary structure is how the helix folds upon itself in 3-dimensional space. This includes supercoiling and knotting. Quaternary structure is how the DNA chains interact and connect with other strands. Concatenation is an example of this.

The 3D twist gives you the familiar double helix shape (it is called double, since there are two backbone strands connected together by their bases). The shape (or secondary structure) displays the major and minor grooves, and this shape allows or prevents interactions with other molecules.

Aside: Most DNA (called B-DNA) is a “right-handed” spiral. If you hold your right hand out, thumb up, your fingers describe the direction that the helix winds. Looking from either end, it turns clockwise as it goes away from you. This is the same way a standard bolt or screw gets tightened. If you turn it clockwise from above, the screw will move away from the screwdriver – this only works for right-handed helices. A simpler way to tell is to think of the DNA as a winding staircase. Visualize which hand you would use to hold the outer railing going UP, and that will tell you whether the structure is right- or left-handed. Left-handed DNA has little difference in the width between the major and minor grooves, making it chemically less favourable. These grooves are formed by the bonds between the two strands (the minor groove) and space created by the twisting of the strands (the major groove).

The double helix staircase designed by Leonardo Da Vinci is one of the most famous features of the Chateau de Chambord in the Loire Valley. Two helical ramps serving the main floors of the building were designed so that two people could use the different sets of staircases at the same time. They can see each other going up or down, yet they never meet.
It is suggested that the Queen and the King’s mistress could be in the castle at the same time… Notice that Da Vinci designed a left-handed spiral. Photo is my own (my wife is on the way up one of the spirals).

DNA polymerases
Enzymes that bind deoxyribonucleotides together to create a DNA strand
Multiple types (at least 15 in humans)
Multiple functions: reads the template bases, adds the complementary bases, checks for errors
https://commons.wikimedia.org/wiki/File:DNA_polymerase.svg

DNA polymerases are a class of enzymes that have multiple functions in this replication process. Their purpose is to synthesize DNA by reading the base sequence, adding the complementary bases, and checking to make sure that the copy is correct. There are many different types of polymerases, but the main point is that they all do the job of replication.

The winding problem
DNA replication moves very quickly: up to 1000 nucleotides/s for prokaryotes, up to 50 nucleotides/s for eukaryotes
Tangles can form as strands are pulled apart
Helix ahead of replication fork needs to rotate: 50+ revolutions per second for prokaryotes, 2-5 revolutions per second for eukaryotes
Tangles will slow down or stop replication
Solution? DNA topoisomerases

If you pull apart the two halves of a double stranded structure, you have a physical problem. They get twisted. Yes, DNA strands very easily get tangled while trying to separate them, especially when you are replicating nucleotides at the speeds found in bacteria. Bacterial DNA replication can move very quickly – up to 1000 nucleotides per second. This leads to a “winding” problem, in that the parental helix ahead of the replication fork may need to rotate at 50 or more revolutions per second just to keep up! To prevent tangles, enzymes called DNA topoisomerases are used. Eukaryote DNA replication occurs at about 50 nucleotides per second (the DNA is packed very tightly rather than being in loose or circular structures).
This would lead to the eukaryote DNA winding at maybe 2-5 revolutions per second… tangling is still an issue. For perspective, a Compact Disc (CD) spins at 5-8 revolutions per second (depending on whether reading near the outside edge or the hub) and most hard disc drives (7200 rpm) spin at 120 revolutions per second.

Topoisomerases
Isomerase enzymes that bind to DNA to cut then repair the backbone
Create temporary break to allow DNA to untangle (nuclease activity)
DNA resealed to correct the strand break (ligase activity)
Resealed DNA has the same structure (it is an isomer) but different TOPOLOGY
Topoisomerase I cuts a single strand (doesn’t require ATP)
Topoisomerase II cuts double strands (requires ATP)

In simple terms, all the topoisomerases do is cut one DNA strand while hanging onto the other, let that strand pass through the cut, then seal up the original strand. Topoisomerases got their name because the tangled and untangled DNA molecules are chemical isomers that differ only in their global topology. Topoisomerases are isomerase enzymes that act on the topology of DNA. They possess both nuclease (cut) and ligase (fix) activity. These tasks are important in both replication and DNA repair. Specifically, DNA topoisomerase I breaks a single strand to allow the two strands to rotate freely – only a short length of helix needs to rotate. Topoisomerase II causes a double strand break that is temporary, allowing a gate to form (making it easier to separate two DNA pieces…).

Why is this relevant? Some chemotherapy drugs inhibit the topoisomerases. In fact, one with which I have personal experience (doxorubicin) inhibits topoisomerase II. On its own, that doesn’t do a lot, but if you happen to cause DNA damage with something like radiation, the cancer cells are unable to repair the DNA breaks and they no longer replicate.
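
The rotation rates quoted for the winding problem can be sanity-checked with a rough calculation. The sketch below assumes about 10.5 base pairs per turn of the helix, a commonly cited figure for B-DNA that is not stated in the lecture, and the function names are invented for illustration.

```python
# Assumption (not from the lecture): B-DNA has roughly 10.5 base pairs per helical turn.
BP_PER_TURN = 10.5


def helix_revolutions_per_second(fork_speed_nt_per_s: float,
                                 bp_per_turn: float = BP_PER_TURN) -> float:
    """How fast the parental helix must spin ahead of a fork moving at this speed."""
    return fork_speed_nt_per_s / bp_per_turn


def rpm_to_rps(rpm: float) -> float:
    """Convert revolutions per minute to revolutions per second."""
    return rpm / 60


for label, speed in [("prokaryote", 1000), ("eukaryote", 50)]:
    print(label, round(helix_revolutions_per_second(speed), 1), "rev/s")

print("7200 rpm hard drive:", rpm_to_rps(7200), "rev/s")
```

Under that assumption, 50 nucleotides per second works out to just under 5 revolutions per second, in line with the 2-5 quoted for eukaryotes. A fork moving at 1000 nucleotides per second would need closer to 95 revolutions per second, and a fork at about 500 nucleotides per second would give roughly 50, which fits the "50+" on the slide.
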

Helicase, Ligase, Primase
Classes of enzymes:
HELICASE: separates and unpacks DNA to allow for replication
LIGASE: Binds together (covalently) separate strands
PRIMASE: Initiates the addition of new bases by making an RNA primer (“start here”)
https://en.wikipedia.org/wiki/Helicase#/media/File:DNA_replication_en.svg

In addition to the topoisomerases, there are a number of enzymes involved in DNA replication that you may read about, and the main ones are helicase, ligase, and primase. Each has multiple forms, and their names describe their functions. Many drugs are available that alter the functions of these various enzymes.

Question: What do you think would be the result of inhibiting helicase, ligase, or primase?

Genetic punctuation
How does replication know where to stop?
Telomeres are like DNA punctuation
Special nucleotide sequences marking the ends of DNA strands
Multiple tandem repeats of a sequence containing blocks of G nucleotides
In humans the telomere sequence is TTAGGG and it often extends for ~10,000 nucleotides
The very ends of telomere sequences are NOT paired (they form a single-stranded overhang)
They attract an enzyme called telomerase
Image: http://www.corbisimages.com

How does the DNA polymerase determine when it has reached the end of a strand it is replicating? It waits for a signal to let go – almost like a period that defines the end of a sentence. Telomeres are the punctuation of the genetic code. They are special sequences to mark the end of DNA strands in eukaryotes. They mark the ends to show that these are indeed supposed to be the ends rather than just broken strands needing to be repaired. In humans the code for a telomere is “TTAGGG” – this sequence can be repeated thousands of times. Strangely, the very end of the telomere DNA is not paired (it is a single-stranded overhang). The image here shows the telomeres on the ends of chromosomes (all 4 ends of the strands).
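
To make the "punctuation" idea concrete, here is a small sketch that counts TTAGGG repeats at the end of a strand and uses that count to tell a marked end from a plain break. The sequences, the three-repeat threshold, and the function names are hypothetical and chosen only for illustration.

```python
TELOMERE_REPEAT = "TTAGGG"  # human telomere repeat, from the slide


def count_terminal_repeats(seq: str, repeat: str = TELOMERE_REPEAT) -> int:
    """Count complete tandem copies of `repeat` at the right-hand (3') end of `seq`."""
    count = 0
    end = len(seq)
    while end >= len(repeat) and seq[end - len(repeat):end] == repeat:
        count += 1
        end -= len(repeat)
    return count


def looks_like_marked_end(seq: str, min_repeats: int = 3) -> bool:
    """Treat a strand as a genuine end if it finishes with enough repeats.

    The threshold of 3 is an arbitrary illustrative choice.
    """
    return count_terminal_repeats(seq) >= min_repeats


intact_end = "ACGTACGT" + TELOMERE_REPEAT * 4   # hypothetical chromosome end
broken_end = "ACGTACGT"                          # hypothetical broken strand

print(count_terminal_repeats(intact_end), looks_like_marked_end(intact_end))
print(count_terminal_repeats(broken_end), looks_like_marked_end(broken_end))

# Roughly how many 6-base repeats fit in a ~10,000 nucleotide telomere?
print(10_000 // len(TELOMERE_REPEAT))
```

A telomere of about 10,000 nucleotides holds on the order of 1,600 to 1,700 copies of the repeat, which is what "repeated thousands of times" is pointing at.
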

Telomerase
Telomerase recognizes telomere sequences and makes more telomeres using a built-in RNA template
The 3’ tail is always longer than the 5’ end
Telomeres loop back to a similar repeat sequence further up the strand, thus creating a unique “knot-like” tail
Telomerase acts as a reverse transcriptase (Nobel prize 1975 Dulbecco, Temin, Baltimore; for genome interaction of oncoviruses)
Telomeres are pieces of DNA
Telomerase makes new telomeres
Image: Wikipedia.org

Telomerase the enzyme recognizes the telomere sequences and acts like a reverse transcriptase by synthesizing DNA using an RNA template (not a DNA template) onto the 3’ end of the parental strand. This makes the 3’ end always a little longer than the paired 5’ end. The extended tail loops back to a similar repeated sequence further upstream, thus creating a unique tail for a DNA molecule (and distinguishing it from a broken strand). Telomerase recognizes telomeres because the RNA template within the enzyme is complementary to the telomere sequence. Once bound, it adds more telomere repeats, built from free nucleotides, onto what was already there.

HISTORY: Reverse transcriptase was discovered independently by Howard Temin and David Baltimore, both of whom were former students of Renato Dulbecco. All 3 shared the 1975 Nobel Prize in Physiology or Medicine – they studied oncoviruses and discovered how they interacted with the host genome. I met Dulbecco in the early 1990s and showed him through the lab where I was working in Toronto. Yes, I was a science geek back then, too!
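
The reverse transcriptase behaviour of telomerase can be sketched in a few lines: an RNA template is read, and the complementary DNA is added to the 3’ end. The six-base template used here is a simplification chosen so that one pass yields one TTAGGG repeat; it is not meant to represent the full RNA carried by the real enzyme, and the function names are invented for illustration.

```python
# Which DNA base is laid down opposite each RNA base.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna_template_5to3: str) -> str:
    """DNA made from an RNA template; both are written 5' to 3'.

    The new strand runs antiparallel to the template, so the template is
    read backwards while complementary DNA bases are added.
    """
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_template_5to3))


def extend_3prime_end(dna_5to3: str, rna_template_5to3: str, rounds: int) -> str:
    """Add `rounds` copies of the templated repeat onto the 3' end of a strand."""
    return dna_5to3 + reverse_transcribe(rna_template_5to3) * rounds


template = "CCCUAA"     # simplified, hypothetical template complementary to TTAGGG
chromosome_end = "GGTTAGGG"  # hypothetical 3' end of a parental strand

print(reverse_transcribe(template))
print(extend_3prime_end(chromosome_end, template, rounds=2))
```

Each round adds one more repeat to the 3’ end only, which is why that end finishes a little longer than its paired 5’ partner.
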

DNA and Replication Applications
DNA replication and repair disorders: Li-Fraumeni syndrome (missense mutations in TP53); Xeroderma pigmentosum (nucleotide excision repair); Fanconi anemia (impaired DNA repair)
Mutations: Cystic fibrosis (base deletion/frameshift); Huntington’s disease (trinucleotide repeat expansion); Sickle-cell anemia (single base substitution)
DNA Viruses: Most are double stranded; Mpox (dsDNA); Herpes simplex virus (linear dsDNA; encodes own polymerase); Can target the viral polymerase (e.g. with acyclovir or tenofovir)

Many diseases relate to changes in DNA and related processes. For example, many viruses are DNA-based. A recent example is the Mpox virus (previously known as monkeypox), which is a double stranded DNA virus. DNA viruses can be single stranded (with genomes as small as about 2,000 nucleobases) or double stranded (with genomes that can exceed 375,000 bases). The DNA can be circular or linear. One of the benefits of being a DNA virus is that the genomic material is very stable, and thus a DNA virus can have a very large genome. These viruses either code for their own polymerase (sometimes replicating in the cytoplasm) or they use the host’s own machinery in the nucleus for this.

Payne S. 33 - Introduction to DNA viruses. In: Payne S (ed). Viruses (Second Edition). Academic Press, 2023, pp 301–307. https://www.sciencedirect.com/science/article/pii/B9780323903851000273
Kausar S, Said Khan F, Ishaq Mujeeb Ur Rehman M, Akram M, Riaz M, Rasool G et al. A review: Mechanism of action of antiviral drugs. Int J Immunopathol Pharmacol 2021; 35: 20587384211002621. doi:10.1177/20587384211002621

RNA & Transcription
RNA is the temporary structure used as a message to generate a specific protein
Transcription is the process where a segment of DNA is copied into RNA by the enzyme RNA polymerase.
RNA polymerase separates the DNA strands and synthesizes a complementary RNA strand that is the same sequence as the original gene.
RNA can have multiple functions (structural, catalytic, etc.) due to its topology

The next major structure to explore is RNA (ribonucleic acid). This structure is similar to DNA in some ways but is used more as a temporary copy of the information stored in the DNA. Due to its chemical properties, RNA has some advantages as an information transfer structure.

RNA is for information transfer
Gene expression
Moving information out of DNA
First step (transcription) starts inside the nucleus (eukaryotes)
A temporary copy made in RNA
RNA sent out of the nucleus to be made into protein
Transcription is a similar process to replication
Only parts of the DNA are transcribed
Synthesis is in multiple steps
https://www-archiv.fdm.uni-hamburg.de/b-online/library/biology107/bi107vc/fa99/terry/RNAprot.html

Information storage and copying by DNA is all well and good, but how useful is this to a biological organism? What good are blueprints if the reader has no idea what they mean or what can be done with them? This is where I’ll bring in the term “gene expression”. The real use of DNA is in the application of the information that is stored in it. Making RNA is the first step to putting that information to use. Transcription is the process that makes a temporary copy of a gene sequence. It occurs inside the nucleus and the final product is sent out for the remaining steps in…

Differences between DNA and RNA
Only one strand of DNA is used
5’ coding strand (aka sense or non-template) is not used
3’ template strand (aka non-coding or antisense) is used
Okazaki fragments not needed
Much faster than DNA replication (uses a different polymerase)
Only small sections of DNA are processed
Multiple copies in progress at the same time
Less error correction needed
Generated RNA strand will be identical in sequence to original gene

Only the 3’ (template) strand of DNA is used – thus Okazaki fragments are not necessary since the copy or transcript is generated quickly in the 5’ to 3’ direction.
The transcription process itself is like making a rapid photocopy of blueprints to give that copy to the contractors who are building a house. If you always keep the original and only give the copies to the builders, the original plans won’t ever be lost. Only small sections of the DNA template are transcribed into RNA – not the whole thing.

The non-template strand is considered your “original gene sequence”, and it is called the coding strand. The template strand is complementary to that one and it is the one that is used to make the RNA transcript. Thus, your generated RNA is identical in sequence to your gene (except for the switching out of the THYMINEs by URACILs). The template strand can also be called the non-coding strand or the antisense strand. The coding strand can also be called the sense strand or the non-template strand. There are fewer error checking routines built into transcription since errors will have little impact - more copies can be made quickly to replace poorly coded RNA strands.

RNA structure
Phosphate sugar (ribose) backbone with bases that are similar to those in DNA: Adenine, Guanine, Cytosine, Uracil (instead of thymine)
Image: By Narayanese, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=3481560
https://courses.lumenlearning.com/suny-orgbiochemistry/chapter/19-1-nucleotides/

The structure of RNA is probably familiar, so I won’t go over the details. It works very much like DNA, but uses the bases Adenine, Guanine, Cytosine, and Uracil (used instead of thymine). If you take a single strand of DNA, there is an equivalent strand of RNA that matches by following the complementary base rules. The pairing rules are that A, U, C, and G on the RNA pair with T, A, G, and C on the DNA, respectively. Remember the names of these two classes of bases? The chemical difference between RNA and DNA is quite subtle (OH on the 2’ carbon) but it allows for a much more flexible final structure.
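
The relationship between the coding strand, the template strand, and the RNA transcript is easy to mix up, so here is a minimal sketch of it. The short gene sequence and the function names are invented for illustration only.

```python
# DNA-DNA pairing (coding strand to template strand).
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

# RNA base placed opposite each DNA template base (A on DNA pairs with U on RNA).
RNA_FOR_DNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_from_coding(coding_5to3: str) -> str:
    """Template (antisense) strand, written 3' to 5' so it lines up under the coding strand."""
    return "".join(DNA_PAIR[base] for base in coding_5to3)


def transcribe(template_3to5: str) -> str:
    """RNA transcript, written 5' to 3', built by pairing against the template strand."""
    return "".join(RNA_FOR_DNA[base] for base in template_3to5)


coding = "ATGGCCTAA"                  # hypothetical coding (sense) strand, 5' to 3'
template = template_from_coding(coding)
rna = transcribe(template)

print("coding  :", coding)
print("template:", template)
print("RNA     :", rna)
print(rna == coding.replace("T", "U"))  # same as the gene, with U in place of T
```

The transcript is built against the template strand, yet it reads the same as the coding strand with every T swapped for a U, which is why the coding strand is treated as the "original gene sequence".
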

Like DNA replication, there are polymerases involved in RNA synthesis. An RNA polymerase works in similar ways to a DNA polymerase. The polymerase needs to find the right place to start (a promoter) and cooperates with a number of other proteins to make sure that it will start making the copy at the right place. The double helix DNA here is an oversimplification – in the nucleus, the DNA is not a simple double helix, and in fact is so tightly coiled that there is no way that the polymerase would be able to access a section of DNA to copy.

Histones and Nucleosomes
[Diagram label: Nucleosome]

DNA being so tightly coiled and packaged in the nucleus is a physical problem to solve. To be able to make a copy as a piece of RNA, you have to gain access to the DNA bases. For prokaryotes that do not have separate nuclei and often have circular DNA, this isn’t an issue. For eukaryotes (and some archaeans) the DNA is wound up tightly on things called histones. Multiple histones (usually 8) together, with the DNA wound around them, are called a nucleosome. Histones are a family of proteins that act as spools around which DNA is wound. This is a very efficient way to package the average 1.8 metres of DNA in every human cell. The histones hold onto the DNA mostly due to electrostatic force between the negatively charged phosphate backbone of the DNA and the positively charged histones. There are enzymes that can change these forces to essentially tighten up or relax the winding.

HATs and HDACs
Histone acetyltransferases (HATs): Acetylation removes the positive charge of histones (by adding acetyl groups to lysine residues on the histones) thereby decreasing the interaction of the N-termini of histones with the negatively charged phosphate backbone of DNA
Histone deacetylases (HDACs): Deacetylation restores the positive charge of histones (removes the acetyl groups) thereby increasing the interaction of the N-termini of histones with the negatively charged phosphate backbone of DNA

To modify how tightly the DNA is coiled, there are enzymes that change the positive charge on the histones. These are called HATs and HDACs. To relax the DNA and allow access for the RNA polymerase, HATs add an acetyl group to lysine residues on the histone protein sequences.

One way to remember is to imagine a “Relaxed HAT”

This image shows the acetylation and deacetylation of the histones and the resulting conformations. One way I remember the direction of the effect is to imagine a “relaxed hat” – I chose an image of one that you may recognize.
Histone image from: Eslaminejad MB, Fani N, Shahhoseini M. Epigenetic regulation of osteogenic and chondrogenic differentiation of mesenchymal stem cells in culture. Cell J 2013; 15: 1–10. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3660019/

Transcription Process
1. DNA is unwound and bonds are broken between paired nucleotides (using gyrase & helicase)
2. RNA nucleotides line up on the DNA bases
3. RNA polymerase links the correct nucleotides
4. RNA strand breaks away from the DNA
5. RNA further processed and sent out of nucleus (if the cell has one)

Once you have access to the DNA, here is the process of transcription:
1. The enzyme helicase breaks/unwinds the DNA over a short section. This whole process moves like a wave down the DNA backbone. The enzyme gyrase further relaxes the unwinding double helix by making cuts in the DNA – in prokaryotes, gyrase is a special type of topoisomerase II.
2. RNA nucleotides line up around the separated strand of DNA using their base pairing rules.
3. RNA polymerase attaches all the correct bases together with some minor error checking.
4. The RNA transcript breaks off the DNA and another copy can be started right away (even before the first one is finished).
5. There is some post-processing of the RNA transcript that does many important things – we can get to this later.

This is mostly what happens in prokaryotes… eukaryotes are typically more complicated. If the cell has a nucleus, the RNA is further processed and eventually exits to the cytoplasm through the nuclear pore complex.

RNA polymerases
Eukaryotes have 3 main RNAP types
RNA polymerase I (transcribes rRNA)
RNA polymerase II (transcribes most genes for proteins)
RNA polymerase III (transcribes tRNA, rRNA, and small RNA)
Eukaryotes require transcription factors to:
Help polymerase bind to a promoter
Pull apart DNA to start transcription
Release RNA polymerase from promoter to allow elongation

The three main RNA polymerases (RNAP) share some common subunits and many structural features, but they transcribe different types of genes. RNA polymerases I and III transcribe the genes encoding transfer RNA, ribosomal RNA, and various small RNAs. RNA polymerase II transcribes the vast majority of genes, including all those that encode proteins, and our subsequent discussion therefore focuses on this enzyme.

While bacterial RNA polymerase is able to initiate transcription on a DNA template without the help of additional proteins, eukaryotic RNA polymerases cannot go it alone. They need transcription factors to get them started (I suppose it is easy for a polymerase to get lost in all that DNA). Transcription factors help to position the RNA polymerase correctly at the promoter, aid in pulling apart the two strands of DNA to allow transcription to begin, and release RNA polymerase from the promoter into the elongation mode once transcription has begun.
ASIDE: RNA polymerase was discovered independently by Sam Weiss, Audrey Stevens, and Jerard Hurwitz in 1960. They didn’t get a Nobel prize, though, since by this time the 1959 Nobel Prize in Medicine had been awarded to Severo Ochoa and Arthur Kornberg for the discovery of what was believed to be RNAP, but instead turned out to be polynucleotide phosphorylase.

1. Pre-initiation
Pre-initiation complex
Core promoter region: region of DNA that facilitates transcription of a gene
Required in eukaryotes for RNA polymerase to bind
Common example is the TATA box
Located upstream from transcription start site (TSS)
Transcription factors (TFs)
DNA helicase (splits the DNA helix)
RNA polymerase
Activators and Repressors

A promoter is a region of DNA that facilitates the transcription of a particular gene. In eukaryotes, core promoters are found at -30, -75, and -90 base pairs upstream from the transcription start site (abbreviated to TSS). The TATA box (also called Goldberg-Hogness box) is a DNA sequence found in the promoter region of genes in archaea and eukaryotes; approximately 24% of human genes contain a TATA box within the core promoter. The TATA box has the core DNA sequence 5'-TATAAA-3' or a variant, which is usually followed by three or more adenine bases. It is usually located 25 base pairs upstream of the transcription start site. The sequence is believed to have remained consistent throughout much of the evolutionary process, possibly originating in an ancient eukaryotic organism.

The TATA Binding Protein (TBP) is a subunit of the general transcription factor TFIID (it is pronounced as “T F 2 D”) that is responsible for recognizing and binding to the TATA box sequence. TBP bends or distorts the DNA and this unique DNA bending (two kinks in the double helix separated by partly unwound DNA) may serve as a landmark that helps to attract the other general transcription factors.
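
As a rough illustration of how a TATA box is located relative to the transcription start site, the sketch below scans the region upstream of a TSS for the core sequence TATAAA and reports its position in the usual negative "upstream" numbering. The promoter sequence is made up, the function name is hypothetical, and the search only looks for the exact core sequence (no variants).

```python
TATA_CORE = "TATAAA"  # core TATA box sequence, from the notes above


def find_tata_boxes(coding_strand: str, tss_index: int, core: str = TATA_CORE) -> list:
    """Return positions of the core sequence relative to the TSS.

    `tss_index` is the index of the first transcribed base; a motif that starts
    25 bases before it is reported as -25. Only exact matches are found.
    """
    positions = []
    start = coding_strand.find(core)
    while start != -1 and start < tss_index:
        positions.append(start - tss_index)
        start = coding_strand.find(core, start + 1)
    return positions


# Hypothetical promoter: 40 upstream bases followed by the start of a gene.
upstream = "G" * 15 + TATA_CORE + "C" * 19
gene_start = "ATGGCC"
sequence = upstream + gene_start
tss = len(upstream)

print(find_tata_boxes(sequence, tss))
```

In this toy sequence the motif sits 25 base pairs upstream of the TSS, matching the typical placement described above. Real promoters vary, and about three quarters of human genes would return an empty list because they have no TATA box in the core promoter.
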
38
Initiation & the pre-initiation complex
http://www.mun.ca/
Preinitiation complex is composed of over 100 proteins!
This just shows visually what some of the required components are for pre-initiation. The message to take away here is that transcription pre-initiation is a very complex process. Each and every factor that is involved is controllable in its own way, and this gives the cell exquisite control over what is transcribed. TFIID is an even more complex complex (awkward…!) than is shown here. What I mean by exquisite control in this case is that each of the individual subunits of TFIID can potentially be controlled by other factors…and that is just one of many different transcription factors.
39
2. & 3. Initiation and Promoter Clearance
Begins at the Transcription Start Site
In prokaryotes this usually begins with an A
Promoters must be cleared to allow elongation to proceed
Allows polymerase to maintain contact with DNA after moving away from initiation factors
Adding ~23 bases is usually enough
Abortive initiation can occur
Truncated transcripts
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3529798
After the first ribonucleotide bond is synthesized, the RNA polymerase must clear the promoter or it can get stuck (and transcription stops). During this time, it is common for the polymerase to release the RNA early and produce truncated transcripts. This is called abortive initiation and is normal in both eukaryotes and prokaryotes. Approximately 23 nucleotides must be synthesized before RNA polymerase loses its tendency to slip off and prematurely release the RNA transcript. The diagram shows many different points where initiation and promoter clearance can fail. The first base in the ‘coding region’ is usually an A because AUG is the typical start codon when making a protein.
A transcription bubble is loosely defined as the region of the double-stranded DNA that is unwound by helicase to allow transcription to occur.
The position of the transcription bubble along the DNA is initially determined by the relative position of the TATA box, not by the transcription start site. Just after initiation, the transcription bubble stays in the same spot until the bubble becomes unstable. Transcription continues while the bubble is stretched until the force of the RNA polymerase movement pulls everything away from the TATA box. For the elongation phase, the transcription bubble just travels along with the polymerase.
In prokaryotes, one study found that up to 165 short transcripts were made for every 1 normal “productive” transcript. Eukaryotes are far more successful at getting full length transcripts, primarily due to the presence of the core promoters.
40
4. Elongation
Adding complementary bases to achieve full length of transcript
Little need for error checking
Proceeds until a termination signal is recognized
ck12.org
Elongation is simply adding on bases to make the full length of the transcript. There is little requirement for error checking since this system is designed to produce multiple copies very rapidly. Any errors will likely be irrelevant since there are always other copies of the RNA being made.
41
5. Termination
Termination region/site on DNA (or on transcript) prompts the polymerase to release
Termination factors
Required by eukaryotes, sometimes used by prokaryotes
Each eukaryote RNA polymerase recognizes different termination factors
Rho is a common prokaryote termination factor (it is inhibited by bicyclomycin)
Rho-independent termination has sequences that make RNA polymerase pause (2° structure!)
http://www.ncbi.nlm.nih.gov/books/NBK21601
The termination signal will be recognized by the polymerase and will be a signal to release the transcript from the transcriptional complex and send the RNA polymerase to do other work.
There are a number of models for how the termination occurs – one is that the terminator region of the newly formed transcript changes the conformation of the RNA and this promotes dissociation of the complex.
Rho protein is a termination factor sometimes used by prokaryotes. It recognizes a region of C bases in the transcript and somehow causes termination. Rho is inhibited by the naturally derived antibiotic bicyclomycin. Conveniently, this antibiotic stops transcription in prokaryotes and thus stops bacteria from being able to grow.
The image above is the rho-independent termination sequence in the trp operon. A stem-loop structure forms naturally by complementary base pairing within the new transcript, and its formation somehow causes the RNA polymerase to pause and fall off (RNA is wiggly and changes shape all the time). The polymerase makes the multiple U repeats after the loop has been formed and then releases.
42
6. Post processing
Prepare transcript for final destination and use
Three common processes
Tag the 5’ end (5’ capping)
Tag the 3’ end (3’ polyadenylation)
Remove introns (RNA splicing)
meyerbio1b.wikispaces.com
Post processing of the RNA is done to make sure the RNA is ready for its next task (which could be to generate a protein). The most important post processing steps are to cap both ends in what is called 5’ capping and 3’ polyadenylation. These special ends allow the cell to assess whether both ends of an mRNA molecule are present (and the message is therefore intact) before it exports the RNA sequence from the nucleus for translation into protein. If there is no cap on each end of the transcript, other mechanisms in the cell will likely degrade the bad piece of RNA. Since eukaryote DNA has a lot of non-coding regions and introns (defined as anything removed by RNA splicing before the final transcript is produced), these need to be excised or cut out before the RNA goes on to perform its eventual function.
The opposite of intron is exon (i.e. the pieces that are not spliced out). The order of the steps presented above is just arbitrary – they usually happen roughly at the same time. If you are interested, this is a short video that shows some of these steps: http://www.youtube.com/watch?v=DoSRu15VtdM
43
5’ Capping
5’ end of newly forming transcript is tagged
Occurs shortly after initiation
A guanine nucleotide attached by a 5’ to 5’ triphosphate linkage
Subsequently methylated
Purpose:
Promote 5’ intron excision
Regulate nuclear export
Prevent degradation
Promote protein translation
The 5’ cap consists of a guanine nucleotide connected to the mRNA via an unusual 5' to 5' triphosphate linkage. The cap of eukaryotic RNA transcripts helps make sure introns are removed, ensures nuclear export, prevents RNA degradation, and promotes binding to a ribosome during translation (protein production).
44
Polyadenylation
Adding multiple adenosine monophosphates to the 3’ end
Poly-A polymerase adds ~200 adenine bases
Serves as a termination signal
Increases stability and helps the export from nucleus
NOT copied from the template strand
Poly(A) tail shortens over time and can eventually lead to degradation of the 3’ end of the RNA…
Now called messenger RNA (mRNA)
Polyadenylation is the addition of a tail to an RNA molecule. This poly(A) tail consists of multiple adenosine monophosphates; in other words, it is a stretch of RNA that has only adenine bases. You will learn later about the cyclic version of adenosine monophosphate (cAMP is made with adenylyl cyclase and ATP). In eukaryotes, polyadenylation is part of the process that produces mature messenger RNA (mRNA) for translation.
Like the 5’ cap, the poly(A) tail is important for the nuclear export, translation, and stability of mRNA. The tail is shortened over time, and, when it is too short, the mRNA is enzymatically degraded. Does this remind you of anything?
45
RNA splicing
Why waste good coding?
Easy genetic recombinations make new proteins from same genes
Alternative RNA splicing
Intentionally used to generate many proteins from same DNA message (60% of genes)
Enhances coding potential (more information rich)
http://www.nature.com
The presence of numerous introns in DNA appears to allow genetic recombination to readily combine the exons (important bits) of different genes, allowing the same genes to generate new protein products.
RNA splicing also has another advantage. The transcripts of many eukaryotic genes (estimated at 60% of genes in humans) are intentionally spliced in a variety of different ways to produce a set of different mRNAs, thereby allowing a corresponding set of different proteins to be produced from the same gene. Rather than being a wasteful process, RNA splicing enables eukaryotes to dramatically increase the already enormous coding potential of their genomes. No longer is it one gene = 1 RNA. The definition of an exon vs. an intron is dependent on what happens here – one RNA’s exon is another RNA’s intron…
46
Very simple version of transcription
Get set up at the right spot, with help
Start working, but don’t get stuck
Make the RNA
Stop at the right spot
Tweak the RNA to make sure it includes what you want
Move it to where you want it to go
This is the quick and dirty way to describe transcription in eukaryotes.
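The "make the RNA, then tweak it" steps can be sketched in a few lines of Python. This toy example pairs each template base with its RNA partner and then shows how one pre-mRNA can give more than one mature mRNA when an internal exon is optionally skipped. The sequences, exon coordinates, and function names are invented for illustration, and exon skipping is only one of several ways that real alternative splicing works.

```python
from itertools import combinations

# RNA partner for each DNA template base (A pairs with U in RNA).
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Build the RNA (5'->3') from a DNA template strand written 3'->5'."""
    return "".join(RNA_PARTNER[base] for base in template_3_to_5)


def splice(pre_mrna, exons):
    """Join the given exon spans (start, end); everything else is discarded as intron."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def exon_skipping_isoforms(pre_mrna, exons):
    """Keep the first and last exon, and include any subset of the internal exons."""
    first, *internal, last = exons
    isoforms = []
    for n_kept in range(len(internal) + 1):
        for kept in combinations(internal, n_kept):
            isoforms.append(splice(pre_mrna, [first, *kept, last]))
    return isoforms


# Made-up pre-mRNA: exons in upper case, introns in lower case for readability.
toy_pre_mrna = "AUGGCC" + "guaag" + "GAAGAA" + "guaag" + "UUCUAA"
toy_exons = [(0, 6), (11, 17), (22, 28)]

print(transcribe("TACCGG"))                             # prints AUGGCC
print(exon_skipping_isoforms(toy_pre_mrna, toy_exons))
# prints ['AUGGCCUUCUAA', 'AUGGCCGAAGAAUUCUAA']
```

With three exons there are two possible products here; each extra internal exon doubles the count, which is one way to picture how splicing increases coding potential.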
47
RNA and Transcription Applications
Some viruses hijack transcription
HIV (ssRNA virus)
It reverse transcribes itself into DNA because it has coding for reverse transcriptase
The single stranded virus DNA merges with the host genome and produces multiple copies of the RNA
Drugs can target the reverse transcriptase (emtricitabine, tenofovir) (PrEP)
Hepatitis C (ssRNA virus, positive sense)
Viral RNA polymerase inhibition (remdesivir)
Polio virus (ssRNA virus)
Produces products that cleave the TATA binding protein to stop host transcription
Influenza virus
Interferes with splicing and polyadenylation to prevent nuclear export, allowing abnormal transcript generation
Wikipedia.org
Some viruses (such as HIV, the cause of AIDS) have the ability to transcribe RNA into DNA. HIV has an RNA genome that is duplicated into DNA. The resulting DNA can be merged with the DNA genome of the host cell. The main enzyme responsible for synthesis of DNA from an RNA template is called reverse transcriptase. It is also called RNA-dependent DNA polymerase and acts like a DNA polymerase enzyme that transcribes single-stranded RNA into single-stranded DNA. In the case of HIV, reverse transcriptase is encoded directly in the viral genome. HIV is an example of a retrovirus – a virus that reverse transcribes its RNA into DNA to replicate and alter the host.
Nucleoside reverse transcriptase inhibitors (NRTIs) are a modern approach to preventing and treating HIV. This is the basis for PrEP (pre-exposure prophylaxis) for HIV prevention. Similar drugs exist for treating the hepatitis C virus, taking advantage of detailed knowledge of how transcription works.
Poliovirus encodes a protease that specifically cleaves the TATA-binding protein component of TFIID, effectively shutting off all host cell transcription via RNA polymerase II. Poliovirus has a positive-sense RNA genome, so it is translated directly into protein.
Influenza virus produces a protein that blocks both the splicing and the polyadenylation of mRNA transcripts, which therefore fail to be exported from the nucleus. 48 Proteins & Translation Proteins are the structures that are the information output of the genomic system. Translation is the process where the mRNA is decoded by a ribosome to produce a specific amino acid chain, or polypeptide. Role of ribosomes, tRNA, mRNA: Ribosomes synthesize proteins, tRNA brings amino acids, mRNA is the template for protein synthesis. After DNA and RNA, proteins are the final step along the genomic pathway and are involved in all cellular signaling. Proteins are also critical to all the molecular processes previously developed. If you thought DNA and RNA were complicated… hold on for a wild ride! Proteins are far more complicated. However, understanding just the basics of what I call the ‘business end’ of the genomic system will provide you with a wide range of possibilities when solving problems that involve signalling gone wrong. 49 If DNA and RNA are the software, proteins are the hardware. Paraphrasing from a quote by Arvind Gupta: “Biology is the most powerful technology ever created. DNA is software, protein are hardware, cells are factories.” Question: In your view, what is the difference between software and hardware? 50 Why are we interested in proteins? DNA and RNA are the information storage and transfer molecules Proteins are the information! Structural proteins Signalling proteins Receptors, transcription factors, neurotransmitters Enzymes Antibodies Growth factors, cytokines, hormones https://www.youtube.com/watch?v=y-uuk4Pr2i8 kinesin moving vesicle to the cell membrane When you consider that the DNA and RNA in your cells are the information storage and transfer molecules, it is clear that proteins ARE the intended information! Yes, proteins can be seen as the communication output of the entire genomic system. 
DNA and RNA are the molecules with the information stored in their structures, but proteins are the molecules that make use of the information and do most of the dirty work. They are what the information represents. Proteins perform all the tasks that you would associate with normal cellular functions and intercellular communication.
This cool animation shows a kinesin (a motor protein) moving a vesicle towards the cell membrane. https://www.youtube.com/watch?v=y-uuk4Pr2i8
51
Protein structure
Biopolymers of amino acids joined by peptide bonds
[Figure: general structure of an amino acid, H2N-CH(R)-COOH]
Amino acids are molecules containing:
an amine group
a carboxylic acid group, and
a side-chain that varies (R)
Polypeptide or protein?
Peptide chains of 3 or more AAs are called polypeptides
Peptide chains that are ‘processed’ are called proteins
As you know, a protein is a polymer made up of a series of monomers called amino acids, compounds that have a very similar basic structure to one another (molecules containing an amine group, a carboxylic acid group and a side-chain that varies between different amino acids). Polypeptides are chains of 3 or more amino acids. Once they are processed, they often take on characteristic 3-D structures and are then called proteins.
The peptide bond is critical, so it is worth looking at what chemically happens to make this bond. Don’t worry, chemistry is not my thing… I won’t go into detail!
52
Protein Topology
1° = Primary sequence
2° = Secondary folding and coiling
3° = Tertiary interactions between side chains
4° = Quaternary interactions between polypeptides
Just like DNA and RNA, the shape of the protein is also a very important way of providing more information than just the simple amino acid sequence, which is called the primary structure.
Primary = the sequence (ends, length)
Secondary = folding and coiling (alpha helix and beta pleated sheets are examples).
Tertiary = interactions between the amino acid side chains on the same molecule.
Quaternary = interactions between multiple polypeptides.
FYI: All proteins bind in some way to other molecules.
56
Primary structure
N-terminus (amine)
Targeting: target protein to organelles
Survival: first AA determines half-life
Post-translationally modified for further processing
C-terminus (carboxyl)
Retention: “KDEL” most common sequence for ER retention
Sorting: post-translationally modified for further processing
As you know, each amino acid has a carboxyl group and an amine group. The amino acids link to one another to form a chain through a dehydration reaction between the amine group of one and the carboxyl group of the next (i.e. the carboxyl group loses its –OH, which picks up a proton from the amine group to make water).
N-terminal signals
The identity of the N-terminal amino acid determines how long the peptide can last before it degrades (this is called the N-end rule). The specific amino acid on the end of the sequence can provide a peptide half-life from 100+ hours (valine) to under 1 hour (glutamine).
C-terminal signals
While the N-terminus of a protein often acts as a targeting signal, the C-terminus can contain retention signals for later protein sorting. The most common ER retention signal is the amino acid sequence “KDEL” (K = Lysine, D = Aspartic acid, E = Glutamate, and L = Leucine) at the C-terminus, which keeps the protein in the endoplasmic reticulum and prevents it from entering a secretory pathway. The C-terminus of proteins can also be modified post-translationally, most commonly by the addition of a lipid anchor that allows the protein to be inserted into a membrane without having a transmembrane domain.
Bottom line – the ends of a peptide structure are quite important.
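As a rough illustration of how the two ends of a sequence carry information, the Python sketch below reads the first residue and the last four residues of a one-letter amino acid string. The half-life entries are only the two values quoted in these notes (valine and glutamine), not a full N-end rule table, and the function name and example peptide are made up for this sketch.

```python
# Only the two half-lives quoted in the notes; every other residue is left unlisted.
N_END_HALF_LIFE = {"V": "100+ hours", "Q": "under 1 hour"}

ER_RETENTION_SIGNAL = "KDEL"  # Lys-Asp-Glu-Leu at the C-terminus


def describe_ends(protein):
    """Summarize what the N- and C-terminal residues suggest about a protein."""
    n_terminal = protein[0]
    return {
        "n_terminal": n_terminal,
        "half_life": N_END_HALF_LIFE.get(n_terminal, "not listed here"),
        "er_retained": protein.endswith(ER_RETENTION_SIGNAL),
    }


# Made-up peptide: starts with valine (V) and ends with KDEL.
print(describe_ends("VLSPADKTNVKAAWKDEL"))
```

For the made-up peptide, the sketch reports a long half-life and an ER retention signal; a real prediction would need far more than the terminal residues.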
57
Secondary structure
Purely defined by AA sequence
Generally formed by hydrogen bonds
Temporary/constantly changing
Huge number of possible configurations
Most common are α-helices and β-sheets
Polar AA side chains
Go to outside of the structure to interact with water
Images: α-helix, β-sheet (http://www0.cs.ucl.ac.uk, http://www.chemguide.co.uk, http://en.citizendium.org)
It turns out that the secondary structure is, in fact, related to the AA sequence only. This feature of proteins alone provides for a huge number of possible configurations. What I mean here is that, given a set length of chain, the polypeptide can bend essentially at every position. You can imagine how many different structural shapes you can make with that same chain.
The polar amino acid side chains tend to gather on the outside of the protein, where they can interact with water; the nonpolar amino acid side chains are buried on the inside to form a tightly packed hydrophobic core of atoms that are hidden from water. This could be very handy, depending on the eventual function of the polypeptide.
Although the two most common secondary structures are alpha helices and beta sheets, many others are possible. Beta sheets can wrap around each other to become beta sheet tubes. Other options are imaginatively named beta barrels, beta propellers, alpha/beta horseshoes, and jelly-roll folds.
Bottom line – the overall shape of a peptide structure helps to determine potential interactions with the environment.
58
If you were able to observe 100 billion combinations each second, it would take more than 100 billion years to see all of the combinations.
To give some perspective on how many different shapes a protein can have based on secondary structure alone, if a very small polypeptide of 100 AAs in length has bonds that can only move into 2 different conformations (this is very unlikely – multiple configurations are normal), it could adopt over 10^30 different 3-dimensional shapes.
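The arithmetic behind this estimate is easy to check. The short Python sketch below assumes, as the notes do, 2 possible conformations at each of 100 positions and an observer who can look at 100 billion combinations per second; the function name and the use of a 365.25-day year are choices made for this illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_enumerate(n_residues, states_per_residue=2,
                       rate_per_second=100_000_000_000):
    """Return (number of conformations, years needed to view them all)."""
    conformations = states_per_residue ** n_residues
    years = conformations / rate_per_second / SECONDS_PER_YEAR
    return conformations, years


shapes, years = years_to_enumerate(100)
print(f"{shapes:.2e} shapes, about {years:.1e} years")
```

Under these assumptions the count is 2^100, a little over 10^30, and the viewing time works out to a few hundred billion years, which is consistent with the "more than 100 billion years" figure on the slide.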
For even more perspective: if you were able to observe 100 billion combinations each second, it would take more than 100 billion years to see all of the combinations! Remember – this is a very small chain of only 100 amino acids, and this ONLY considers secondary structure.
BTW, an average length of a peptide chain in eukaryotes is approximately 472 AAs, so this is a gross simplification of the expansion of possible shapes.
Tiessen A, Pérez-Rodríguez P, Delaye-Arredondo LJ. Mathematical modeling and comparison of protein size distribution in different plant, animal, fungal and microbial species reveals a negative correlation between protein size and protein number, thus providing insight into the evolution of proteomes. BMC Research Notes 2012;5:85.
59
Tertiary structure
3-D structure of the protein
More permanent than 2° structure
Created by interactions between the side chains
A combination of 2° structures
Mostly determined by 1° structure
Some effect of environment
Far more possible 3° structures than for 2° structures!
Image: Histamine N-methyltransferase http://proteopedia.org/wiki/index.php/1jqd
Tertiary structure is the normal 3-dimensional shape of the protein. The difference between secondary and tertiary is that the tertiary structure is a combination of secondary structures. Tertiary structure is created by the chemical interactions and binding between side chains on the amino acids. The shape is mostly determined by primary structure, so there is some predictability based on sequence, but the environment helps to define the final shape.
There are more possible tertiary structures for proteins of a given size than there are secondary structures. The complexity is endless.
Bottom line – the very specific shape of a peptide structure provides huge variability and specificity for interactions with the environment.
60 Quaternary structure Created by interactions between more than one polypeptide chain Binding site on one protein recognizes another to create a ‘complex’ Dimers, trimers, tetramers,… Heterodimers, heterotrimers, heterotetramers,… More than 20, it is called a “#-mer” 2-8 chains is typical, but some viruses have shells made of multiples of 60 units Potassium ion channel (Streptomyces lividans). ∞ possible interactions! Quaternary structure is the interaction between more than one polypeptide chain. A protein can contain binding sites for a variety of molecules. If a binding site recognizes the surface of a second protein, the tight binding of two folded polypeptide chains at this site creates a larger protein molecule with a specific shape - forming a symmetric complex of two protein subunits (a dimer) held together by interactions between two identical binding sites. If 2 identical peptide chains bind together, it is called a dimer (if they are two different chains, this is a heterodimer). Once you have an interaction between 21 or more polypeptides, you name it as a #-mer (e.g. a 52-mer protein). Most protein quaternary structures are up to 8 units, but some viruses can have protein shells (capsids) in multiples of 60 units. Bottom line – all bets are off with regards to the number of possible interactions when peptide chains interact with each other. Image: Quaternary structure of Streptomyces lividans tetrameric potassium ion channel protein. https://www.chromacademy.com/lms/sco879/05-protein-tertiary-structure-disulphide- bridges.html?fChannel=27&fCourse=114&fSco=879&fPath=sco879/05-protein-tertiary-structure-disulphide-bridges.html 61 Shape = Information https://agclassroom.org/matrix/lesson/575/ So clearly there are infinite possible shapes with proteins. So what? Since this course focuses on cellular communication… when talking proteins, shape equals information. With unlimited potential structures, there is a lot you can do with proteins. 
Now that you know how proteins can make different shapes and that different shapes can do different things, it hopefully makes more sense why you would want to look at all levels of organization of these molecular structures. And yes, this is exactly the reason why proteins are so important in cell signalling (and in life) – it is because they have so many shapes, and each and every one of them potentially could have a different function. 62 Ribosomes To generate a peptide chain from an mRNA transcript Ribosome location is partly determined by protein to be generated, partly by targeting signals Membrane bound ribosomes on rough ER Proteins targeted for release or within the membrane https://organelleswithsydney.weebly.com/smooth-endoplasmic-reticulum.html Free ribosomes in cytoplasm Proteins often used only within the cytoplasm There are 3 tRNA binding sites on ribosomes A = aminoacyl (add amino acid) P = peptidyl (peptide bond) E = exit (move on to next) You already know that, to make a protein, a ribosome (shown here) attaches to an mRNA and zips along its length, grabbing onto tRNAs and their attached amino acids, linking everything together into what eventually becomes a protein. The ribosome has three tRNA- binding sites, designated A (aminoacyl), P (peptidyl), and E (exit). Ribosomes in general can come in two forms – free and membrane-bound. These ribosomes differ only in their location - they are identical in structure. Whether the ribosome exists in a free or membrane-bound state depends on the presence of an ER- targeting signal sequence on the protein being synthesized, so an individual ribosome might be membrane-bound when it is making one protein, but free in the cytoplasm when it makes another protein. Proteins that are formed from free ribosomes are released into the cytoplasm and used within the cell. Bound ribosomes usually produce proteins that are used within the plasma membrane or are expelled from the cell via exocytosis. 
Structure of the human mitochondrial ribosome (class 1) https://3d.nih.gov/entries/3DPX-003894 3j9m-surf-bychain-print. Molecular surface representation with each biopolymer chain a different colour.
63
Translation alphabet
Twenty-two amino acids are incorporated into polypeptides and are called proteinogenic or natural amino acids.
More information dense than DNA or RNA
1. Alanine A
2. Arginine R
3. Asparagine N
4. Aspartic acid D
5. Cysteine C
6. Glutamic acid E
7. Glutamine Q
8. Glycine G
9. Histidine H
10. Isoleucine I
11. Leucine L
12. Lysine K
13. Methionine M
14. Phenylalanine F
15. Proline P
16. Pyrrolysine O
17. Selenocysteine U (only made when selenium is present; UGA is usually a stop codon)
18. Serine S
19. Threonine T
20. Tryptophan W
21. Tyrosine Y
22. Valine V
Analogous to the 4-letter alphabet used in DNA and RNA, proteins have an alphabet as well. This time it is 22 letters long. With 22 letters, you can encode a lot more information into a short “word”. This makes proteins a more information dense medium.
The amino acids making up the letters here are just the proteinogenic or natural amino acids (AAs). These 22 amino acids are naturally incorporated into polypeptides. Of these, only 21 are actually found in eukaryotic cells (not pyrrolysine). Essential amino acids (which can’t be synthesized by human cells) are histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine (i.e. H, I, L, K, M, F, T, W, V). Non-proteinogenic amino acids refer to those AAs that are not used for making proteins (e.g. GABA or L-DOPA) or AAs that are produced indirectly (like hydroxyproline). The two amino acids in purple (pyrrolysine and selenocysteine) are unusual amino acids specified by “expansions” of the genetic code.
Aside: Selenocysteine is coded for by the UGA codon (which is normally a stop codon called “opal”).
When selenium is present selenocysteine is incorporated, when it is not present - translation stops (since the ribosome ‘thinks’ it is a stop codon). The truncated protein in this case is typically non-functional. This arrangement makes for a good genomic-based selenium sensor (an on/off switch, controlled by selenium). Pyrrolysine is naturally occurring in some methanogenic Archaeans and is part of the methane-producing metabolic system. It is coded in a complex mechanism by UAG (another sequence normally used as a stop codon, called “amber”). 64 UAG = Pyrrolysine UGA = Selenocysteine You will likely see charts like this showing the 3-letter (codons) and single letter codes for these amino acids. You read from the centre outwards to determine the amino acid that is coded by that sequence. Remember that the tRNA will have the anticodon (complementarity ends once you make a peptide). The two special amino acids I just mentioned are actually ambiguous stop codons that only continue to make the peptide chain when the environmental conditions around the ribosome are suitable. With respect to selenocysteine, it is incorporated into peptide chains when selenium concentrations reach a certain level (and other sequences are in place in the mRNA). Yes, I do have a favourite amino acid (hey, I never said I was normal!). It happens to go by the same letter code as the beginning of my first name, and it may make you GAG. You’ll likely hear more about this amino acid later in the course. 65 Translation steps 1. Initiation - binding of ribosome and mRNA 2. Elongation - tRNAs add amino acids to a growing chain 3. Termination - ribosome reaches a stop codon 4. Processing - modification or addition of a functional group e.g. glycosylation, acylation, methylation These steps in translation will seem somewhat familiar when compared with transcription. However, the language is being transduced to amino acids and the processes are no longer based on base complementation. 
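Reading codons off a chart and stopping at a stop codon can be sketched as a small lookup loop. The Python example below uses a deliberately partial codon table (just enough entries for the toy message) and treats UAA, UAG, and UGA as plain stop codons; it does not model the selenocysteine or pyrrolysine exceptions described above. The table contents shown are standard codon assignments, but the function name, the "X" placeholder for unlisted codons, and the example mRNA are assumptions for illustration.

```python
# Partial codon table: only the codons needed for this toy example.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "GAG": "E", "GAA": "E",  # glutamic acid
    "UGG": "W",  # tryptophan
    "AAA": "K", "AAG": "K",  # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def translate(mrna):
    """Translate from the first AUG until a stop codon, reading 3 bases at a time."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is made
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")  # X = codon not in this partial table
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


toy_mrna = "GC" + "AUGGAGUGGAAAUAA" + "GG"  # made-up message with short untranslated ends
print(translate(toy_mrna))  # prints MEWK
```

The bases before the AUG and after the UAA are ignored, which mirrors the idea that the ribosome starts at the start codon and dissociates at the stop codon.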
The functional groups added during processing are important to final function. I mention glycosylation here partly because it is hard to pronounce (sorry), but also because it is easy to picture. Essentially, it is adding a sugar molecule to the peptide/protein. An example some may have heard about is the common blood test for diabetes called HbA1C. This test simply measures the percentage of hemoglobin molecules (a protein) that are glycosylated (have a sugar attached).
67
Speeding up translation
Polysomes (polyribosomes, ergosomes)
Translational parallel processing!
More than 1 ribosome attached to the mRNA
Usually 35 nucleotides apart, due to size of ribosome
Significantly increases the speed of protein production
mRNA can become circular in 2° structure
The 5’ cap & 3’ poly(A) tail on mRNA help with this process
Eukaryotes have 4 ribosomes/turn in a left-handed helix!
http://www.nobelprize.org
Molecular Biology of the Cell. 4th edition. Alberts B, Johnson A, Lewis J, et al. New York: Garland Science; 2002.
At only 5 AA per second, how do cells get things done with such slow translation mechanisms? One protein at a time would take forever for big proteins, right? Well, both prokaryotes and eukaryotes (as well as Archaeans) have developed some parallel processing for translation. Many ribosomes can read one mRNA at the same time, progressing along the mRNA to synthesize the same protein (sometimes in a circular fashion). Because of the relatively large size of ribosomes, they can only attach to sites on mRNA about 35 nucleotides apart. In eukaryotes, polyribosomes are described as ‘densely packed left-handed helices with four ribosomes per turn’. It is interesting to note that left-handed helices are quite uncommon (DNA coils in a right-handed way). Polysomes were originally called ergosomes.
69
https://gfycat.com/altruisticsoreiraniangroundjay
This animation may make it clearer how this works.
Here, the mRNA strand is in blue, the ribosomal subunits are light green, and the newly elongating protein is the green squiggle. Note that the action starts at the start codon, elongation proceeds, and the ribosomes and peptide dissociate at the stop codon. This process will occur until the mRNA itself degrades, so the number of peptides that are produced is clearly associated with the lifespan of the mRNA.
70
Ribosomal math
Prokaryote ribosomes: 50 + 30 = 70
Eukaryote ribosomes: 60 + 40 = 80
Molecular biologists are not always very good at math… although this is actually correct! You may have seen these numbers in relation to the “size” of ribosomal subunits. Prokaryote subunits are listed as being 50 and 30 units in size, while eukaryote ribosome subunits are 60 and 40. These numbers actually don’t tell you about the size of the subunits, and oddly, the math doesn’t add up when you combine the units together.
71
Ribosome size
Ribosomes are isolated using centrifugation
Ultracentrifugation pioneered by Swedish chemist Theodor H.E. (“The”) Svedberg (1884-1971)
Nobel prize in chemistry in 1926 for work on chemistry of colloids
Ribosomes have 2 subunits
Named according to the sedimentation coefficient
Sedimentation rate normalized to acceleration
Not exactly related to weight …
1 svedberg = 10^-13 seconds (or 100 femtoseconds)
The 50S and 30S ribosomes (prokaryotes), centrifuge out at 70S
The 60S and 40S ribosomes (eukaryotes), centrifuge out at 80S
Surface area is the key feature determining the different sedimentation coefficients
Image: Theodor H.E. ("The") Svedberg (30 August 1884 – 25 February 1971)
beckmancoulter.com
Ribosomes are isolated by differential ultracentrifugation and are named accordingly. The unit of measurement here is the Svedberg unit, a measure of the rate of sedimentation through a viscous medium in an ultracentrifuge (which generates up to 10^6 times the force of gravity).
The coefficient is the ratio of a particle's sedimentation velocity to the applied acceleration (the sedimentation velocity being the particle's terminal velocity, that is, the fastest it can move through a specific medium). The Svedberg is technically a measure of time and is defined as exactly 10⁻¹³ seconds (100 fs). A femtosecond is 10⁻¹⁵ seconds. With the centrifuge conditions standardized (distance to travel particularly), time is an appropriate measure. Prokaryotic ribosomes have 50S and 30S subunits, but when they are together, they spin out at 70S. Eukaryotic ribosomes have 60S and 40S subunits that, when combined, spin out at 80S. Ribosomes consist of two subunits that fit together and work as one to translate the mRNA into a polypeptide chain. Bigger particles tend to sediment faster and thus have higher Svedberg values, but sedimentation coefficients are not additive. Sedimentation rate does not depend only on the mass or volume of a particle, and when two particles bind together there is a loss of surface area. This accounts for why the subunit sedimentation coefficients do not add up (70S is made of 50S and 30S). Shape is important! ASIDE: The unit is named after the Swedish chemist Theodor H. E. Svedberg (1884-1971), winner of the Nobel Prize in chemistry in 1926. His initials were T.H.E. and he went by the shortened name “The”. Svedberg was married 4 times and had 12 children (6 boys, 6 girls). About the unit: https://en.wikipedia.org/wiki/Svedberg 72 Does size matter? (for ribosomes) So, why does the size of the ribosomes matter…? The answer is here. Right here…. 
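Before moving on to that answer, the Svedberg arithmetic described above can be made concrete. The following is a minimal Python sketch and not part of the lecture material: the function names, the dictionary layout, and the example acceleration are illustrative assumptions. Only the definition 1 S = 10⁻¹³ s, the ratio of velocity to acceleration, and the 50S/30S/70S and 60S/40S/80S values come from the notes.

```python
# Minimal sketch of the Svedberg arithmetic (illustrative names, not from the lecture).

SECONDS_PER_SVEDBERG = 1e-13   # 1 S = 10^-13 s, as defined in the notes
STANDARD_GRAVITY = 9.80665     # m/s^2, used only to build an example acceleration


def sedimentation_coefficient(velocity_m_per_s, acceleration_m_per_s2):
    """Return s = v / a, expressed in Svedberg units (S)."""
    if acceleration_m_per_s2 <= 0:
        raise ValueError("acceleration must be positive")
    seconds = velocity_m_per_s / acceleration_m_per_s2
    return seconds / SECONDS_PER_SVEDBERG


# Subunit and intact-ribosome values quoted in the notes.
RIBOSOMES = {
    "prokaryote": {"subunits": (50, 30), "intact": 70},
    "eukaryote": {"subunits": (60, 40), "intact": 80},
}


def additivity_gap(organism):
    """How far the naive sum of subunit S values overshoots the intact ribosome."""
    entry = RIBOSOMES[organism]
    return sum(entry["subunits"]) - entry["intact"]


if __name__ == "__main__":
    # Hypothetical run at 10^6 x g (the upper figure mentioned in the notes).
    accel = 1e6 * STANDARD_GRAVITY
    # Velocity a 70S particle would need to have at that acceleration.
    velocity = 70 * SECONDS_PER_SVEDBERG * accel
    print(f"terminal velocity of a 70S particle: {velocity:.3e} m/s")
    print(f"recovered coefficient: {sedimentation_coefficient(velocity, accel):.1f} S")
    for organism in RIBOSOMES:
        print(f"{organism}: naive sum overshoots by {additivity_gap(organism)} S")
```

The non-zero gap for both organisms is the point of the "ribosomal math" slide: S values describe sedimentation behaviour, which depends on shape and surface area as well as mass, so they cannot simply be summed.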
73 Taking advantage of the differences… Antibiotic compounds isolated from dirt from graveyards… Streptomyces aureofaciens (a fungal-like bacterium) Developed into aureomycin, a tetracycline Tetracyclines bind reversibly to the 30S subunit Prevents binding of aminoacyl-tRNA to the mRNA–ribosome complex Tetracycline and derivatives used clinically since 1948 Broad spectrum – works for both Gram-positive and Gram-negative bacteria Fluorescent compounds Side effects (e.g. yellow stained teeth, tiredness) can be significant The size of ribosomes is an important feature because prokaryotes and eukaryotes have different ribosomes. This provides an opportunity to distinguish prokaryotes and eukaryotes – and potentially target them independently. “Tetracyclines are a class of related antibiotic compounds first discovered by retired American botanist Benjamin M. Duggar in the late 1940s. Duggar extracted a yellowish crystalline compound with unique antibacterial properties, called aureomycin (7-chlortetracycline), from soil found near cemeteries containing Streptomyces aureofaciens, a fungal-like bacteria of the Actinomycetales order.” From: Seidlitz E, Saikali Z, Singh G. Use of tetracyclines for bone metastases. In: Singh G, Rabbani S (eds). Bone Metastasis: Experimental and Clinical Therapeutics. Humana Press Inc.: Totowa, N.J., 2005, pp 295–305. Tetracyclines specifically and reversibly bind to the 30S ribosomal subunit. They stop the aminoacyl-tRNA from binding to the mRNA–ribosome complex, and thus prevent bacteria from translating new proteins. This usually kills them. Question: Why would you get side effects like stained teeth and tiredness from tetracycline? FYI: Gram-positive bacteria stain purple with crystal violet – they have a thick peptidoglycan layer in their cell wall. Gram-negative bacteria don’t retain the stain well (after alcohol wash) and turn pink instead. 74 However, tetracycline-like compounds have been around a lot longer than the late 1940s! 
In fact, this stuff proves it. 75 Yes, beer can lead to a clinical application… Ancient mummies from the Nubian kingdom ~250 CE to 550 CE Bones had intermittent bands of fluorescence Streptomyces contamination of grains used to make bread and beer Tetracyclines fluoresce under UV light Useful clinically for bone apposition studies https://www.wired.com/2010/09/antibiotic-beer/ https://www.mdpi.com/1422-0067/20/24/6229/htm# In fact, ancient mummies from the Nubian kingdom (in present-day Sudan) were found to have ingested quite a bit of tetracycline. It was likely from the beer and bread that they made with Streptomyces-contaminated grains. Bone samples from the mummies showed intermittent bands of fluorescence. Tetracyclines are useful for bone apposition (bone growth) studies, where two doses are given a known time apart, then a bone sample is taken and analyzed. The distance between the two fluorescent tetracycline lines is the amount of bone growth over the time between the doses. Original study: Nelson ML, Dinardo A, Hochberg J, Armelagos GJ. Brief communication: Mass spectroscopic characterization of tetracycline in the skeletal remains of an ancient population from Sudanese Nubia 350–550 CE. American Journal of Physical Anthropology 2010;143:151–154. An interesting news article about the study: https://www.wired.com/2010/09/antibiotic-beer/ Image of experimental mineral apposition rate measurement: Li D, Tian Y, Yin C, Huai Y, Zhao Y, Su P et al. Silencing of lncRNA AK045490 promotes osteoblast differentiation and bone formation via β-Catenin/TCF1/Runx2 signaling axis. International Journal of Molecular Sciences 2019;20:6229. FYI: BCE (before the common era) and CE (common era) are the currently preferred terms used in archeology and anthropology for time, rather than the old terms BC and AD. 
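The double-label bone apposition measurement described above is a simple rate calculation: the distance between the two fluorescent lines divided by the time between the two doses. The sketch below illustrates that arithmetic only. The function name, the choice of micrometres and days as units, and the example numbers are assumptions for illustration and are not taken from the lecture or the cited studies.

```python
# Minimal sketch of a double-label apposition rate calculation.
# Names, units, and example values are hypothetical.


def mineral_apposition_rate(interlabel_distance_um, days_between_doses):
    """Return growth rate in micrometres per day.

    interlabel_distance_um: distance between the two fluorescent
        tetracycline lines in the bone sample (micrometres).
    days_between_doses: time between the two tetracycline doses (days).
    """
    if days_between_doses <= 0:
        raise ValueError("days_between_doses must be positive")
    if interlabel_distance_um < 0:
        raise ValueError("interlabel_distance_um cannot be negative")
    return interlabel_distance_um / days_between_doses


if __name__ == "__main__":
    # Hypothetical measurement: lines 12 micrometres apart, doses 10 days apart.
    rate = mineral_apposition_rate(12.0, 10)
    print(f"apposition rate: {rate:.2f} micrometres/day")
```

Because tetracycline is deposited only where bone is being laid down at the time of each dose, the two lines act as time stamps, which is why the spacing between them can be read as growth over a known interval.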
76 Tetracyclines also inhibit release factors RF-1 and RF-2 https://www.mun.ca/biology/scarr/iGen3_06-14.html Tetracycline is, of course, an antibiotic. How does it stop bacteria? There are 3 sites on the bacterial ribosome that are identified here as the A, P and E sites. Tetracycline binds on the small (30S) subunit at the A site to prevent the aminoacyl-tRNA from binding and starting the process. This effectively stops the ribosome from generating a peptide chain. However, bacteria have backup plans. They can sometimes mimic the structure and functions of elongation factors to competitively bump tetracycline from the A site, and they are also able to ‘terminate’ tetracyclines by enzymatically degrading them or simply pumping them out using efflux pumps. Tetracyclines have other effects that inhibit protein synthesis – they also prevent binding of the release factors RF-1 and RF-2 during termination of translation, regardless of the stop codon. 77 Protein and Translation Applications Interrupting translation (at all steps) is an effective way of stopping cell growth Antibiotics Tetracyclines, aminoglycosides, macrolides Chemotherapies Rapamycin, Everolimus https://www.rpicorp.com/ There are so many applications where targeting the protein synthesis process turns out to be useful for human health that it would be impossible to provide a useful overview. Most applications focus on the ribosome or elongation processes. 
78 Bacterial protein synthesis as a target
Antibiotic: Action
Streptomycin & other aminoglycosides: Inhibit initiation and cause misreading of mRNA (prokaryotes)
Tetracycline: Binds to the 30S subunit and inhibits binding of aminoacyl-tRNAs (prokaryotes)
Chloramphenicol: Inhibits the peptidyl transferase activity of the 50S ribosomal subunit (prokaryotes)
Cycloheximide: Inhibits the peptidyl transferase activity of the 60S ribosomal subunit (eukaryotes)
Erythromycin: Binds to the 50S subunit and inhibits translocation (prokaryotes)
Puromycin: Causes premature chain termination by acting as an analog of aminoacyl-tRNA (prokaryotes and eukaryotes)
Why would you want to inhibit protein synthesis in eukaryotes?
Biochemistry. 5th edition. Berg JM, Tymoczko JL, Stryer L. New York: W H Freeman; 2002. Copyright © 2002, W. H. Freeman and Company. http://www.ncbi.nlm.nih.gov/books/NBK22531/
Protein synthesis is a frequent target for antibiotic drugs other than those in the tetracycline class. In addition to the 30S ribosome subunit shown for tetracycline, other aspects of protein synthesis can be interrupted. Don’t worry about the details here, just look at the processes you already know about and remember that they can be disrupted. Question: Why would you want to inhibit protein synthesis in eukaryotes rather than just prokaryotes? Adapted from: Berg JM, Tymoczko JL, Stryer L. Eukaryotic Protein Synthesis Differs from Prokaryotic Protein Synthesis Primarily in Translation Initiation. Biochemistry 5th edition 2002. http://www.ncbi.nlm.nih.gov/books/NBK22531/
79 Signalling connections
Synthesis: Post-translational modification of signal synthesis enzyme changes its activity
Release: Failure to shuttle from Golgi to the membrane
