Fundamentals in Biology 1: From Molecules to the Biochemistry of Cells: Introduction to Transcription

Script to accompany: Fundamentals in Biology 1: From Molecules to the Biochemistry of Cells

This script is for internal ETH use only, not to be copied, distributed or sold. It contains referenced images from external sources and copyrighted new images.

Part 2: Introduction to Transcription

TRANSCRIPTION

Introduction

Transcription is the process of creating an RNA copy of a gene or a section of DNA. For protein-encoding genes this copy, called mRNA, is then used as a template for protein synthesis. This section presents the basic molecular machinery of transcription, including the RNA polymerase. It also discusses the features of DNA that are important for transcription and how the process is divided into initiation, elongation, and termination stages. In addition, we will examine particular sequence features that play an important role in initiation or termination, and then move on to post-transcriptional RNA processing in prokaryotes, with some examples.

Differences between DNA and RNA

We know that the ribose sugar in RNA carries a 2’OH group that is missing in DNA, and that this leads to differences in the architecture of the double-helical segments. Additionally, the base composition differs between the two molecules. DNA contains the bases adenine (A), guanine (G), cytosine (C) and thymine (T), while RNA contains adenine (A), guanine (G), cytosine (C) and uracil (U) instead of thymine. However, uracil and thymine differ only in a region of the base that is not involved in Watson-Crick base pairing. Therefore an A pairs with a T in DNA, whereas it pairs with a U when DNA:RNA hybrid base pairing occurs.

Pyrimidine and purine bases and possible Watson-Crick base pairs occurring in double-stranded RNA, DNA and RNA:DNA hybrid molecules. Note the structural difference between T and U, which does not affect base pairing with A.

In the case of DNA replication, exactly the same type of molecule is generated.
However, when RNA is synthesized from a DNA template, the RNA contains a U wherever the DNA would carry a T, and this is the reason why the process is called transcription instead of copying.

RNA is mostly single-stranded in nature, but it can form double-helical regions through self-complementarity, which predominantly adopt the A-form conformation. Self-complementarity means that one part of the sequence is complementary to another part of the same sequence, so that the two can form Watson-Crick base pairs. In addition, RNA can form complex 3D shapes, conceptually similar to how proteins fold into secondary and tertiary structures. However, the physical basis is base pairing and stacking rather than formation of a hydrophobic core.

Examples of RNA structures. Apart from forming simple A-form helices in which the 2’OH group is exposed towards the minor groove, RNA can fold into intricate compacted shapes, as exemplified by this RNA hairpin loop. (Adapted from Stryer, Biochemistry, 8th Edition, Freeman 2015, additional illustrations by Marc Leibundgut)

Messenger RNA (mRNA)

Jacob, Brenner and Crick proposed that before proteins can be synthesized, information from DNA has to be converted into an intermediate “messenger” molecule. They found evidence for this hypothesis in an experiment by Volkin and Astrachan, which showed the relationship between the sequence in the DNA and the sequence in the RNA product. The experiment used bacteriophage T2, a DNA virus that infects bacteria, and made it possible to isolate an RNA fraction whose base composition was equivalent to that of the DNA template and which displayed rapid turnover. This showed that there was a one-to-one relationship between the DNA template strand and the RNA product, suggesting that an RNA intermediate rather than the DNA itself may be used for the synthesis of proteins.

An unpublished sketch of the Central Dogma, drawn in 1956 by Francis Crick.
(Image: Wellcome Library, London)

Messenger RNA (mRNA) is an intermediate between the genetic information stored in DNA and the proteins produced from this information. mRNA synthesis, or transcription, is a strictly controlled process that involves separating the double-stranded DNA and copying the genetic information into a single-stranded messenger. This can be done with minimal opening of the DNA structure, thus protecting the original genetic information. Being single-stranded, the mRNA is also more suitable for reading off information. It can be used to regulate gene expression, as it can be made in one or many copies at a time, may be processed to form multiple products, and can form secondary structures that control its stability and readability. We will next discuss the process of transcription and the machinery involved.

The Process of Transcription and the Role of RNA Polymerase

The process of transcription requires the DNA duplex to melt locally and form a “transcription bubble”, such that the genetic information can be read off while the remaining DNA stays as a well-protected duplex. Although this process locally increases DNA unwinding, significant tension is not generated, since only a short segment of DNA is unwound. Therefore, special machinery to relieve this tension, such as the topoisomerases used during DNA replication, is less critical, although on a global level the topology of the DNA still plays an important role for transcription efficiency. During the copying process, the newly synthesized single-stranded messenger RNA maintains base pairing with the DNA template only over a small region of the DNA. The enzyme responsible for this process is called RNA polymerase.

Schematic view of an RNA polymerase bound to the locally unwound DNA, thereby forming the transcription bubble. (Adapted from Stryer, Biochemistry, 8th Edition, Freeman 2015)

The transcription process can be divided into three stages: initiation, elongation and termination.
Transcription is much slower than DNA replication, occurring at a rate of 20-50 nucleotides per second. RNA polymerase does not usually dissociate from the template strand of DNA until the entire region of DNA has been copied. Multiple RNA polymerases can copy the same gene simultaneously, meaning that the amount of RNA transcribed from a particular sequence can be regulated by how often RNA polymerase initiates transcription.

RNA polymerase is composed of multiple protein subunits that form a stable core complex, plus weakly bound additional subunits termed transcription factors. One general transcription factor is the sigma (σ) subunit, responsible for specific recognition of the promoter, the DNA region at the beginning of a gene at which transcription initiates. When the promoter is found, the σ subunit dissociates and the core subunits form a “clamp” around the DNA, opening up the double helix and allowing for transcription. Although the error rate of transcription is higher than that of DNA replication, the process is still highly accurate.

The catalytically active subunits of RNA polymerase are beta and beta prime (ββ’), which are very similar in sequence. The core additionally contains two copies of alpha (termed αIαII, or simply α2), and an omega (ω) subunit that helps to stabilize the interaction between the β and β’ subunits. The core structure of the enzyme is similar between bacteria, archaea and eukaryotes, although the latter two have more subunits. Factors analogous to the bacterial σ factor are termed TFB in archaea and TFIIB in eukaryotes.

Structural comparison of RNA polymerases from different kingdoms of life. The five α2ββ’ω core subunits are universally conserved, while archaea and eukaryotes harbor additional subunits.
(From Brock, Biology of Microorganisms, 15th Edition 2019)

Initiation of Transcription and the Role of Promoters

The start point of transcription on the DNA is identified without separating the DNA strands. The RNA polymerase holoenzyme scans the DNA and recognizes the chemical patterns of the nucleotides in the promoter region, which are exposed in the major groove of the DNA. Transcription is initiated once the start point is identified. The RNA polymerase binds to the promoter region and the DNA strands begin to melt, allowing the nucleotides to be recognized and read. This is an energetically costly process, but not one that requires external chemical energy. The first nucleotide is imprecisely incorporated, but the subsequent ones are incorporated with high accuracy. The opened region of DNA is called the transcription bubble, and it moves along the DNA while only one of the two DNA strands is transcribed by the polymerase.

Organization of a bacterial gene with its transcriptional promoter. Under normal growth conditions, the transcription of most genes is controlled by the σ70 factor, which recognizes the “standard” promoter motifs. Switching of sigma factors allows the cell to react to different environmental conditions by expressing sets of alternative genes, including those needed for nutrient uptake, temperature changes or sporulation. Note that while the start point of RNA transcription is at position +1, the start point for translation by the ribosome lies further downstream, at an AUG start codon of the transcript. (Adapted from Griffith et al. 2004, and Stryer, Biochemistry, 8th Edition, Freeman 2015)

Transcription initiation in bacteria involves the sigma factor subunit, which together with the core subunits forms the bacterial RNA polymerase holoenzyme. The σ factor helps in detecting, binding and melting the promoter regions of genes to initiate transcription.
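As a rough illustration of how promoter recognition can be mimicked computationally, the sketch below scans a DNA string for the σ70 consensus hexamers TTGACA (-35 element) and TATAAT (-10 element) named in this section. The spacer window of 16 to 18 base pairs and the mismatch tolerance are assumptions chosen for illustration and are not values given in this script. The function names are hypothetical, and real promoter prediction typically relies on weighted scoring rather than simple mismatch counting.

```python
# Consensus hexamers for sigma-70 promoters as quoted in the script.
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def count_mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(dna, spacer_range=(16, 18), max_mismatches=2):
    """Return candidate promoters as (pos_35, pos_10, spacer, mismatches).

    Positions are 0-based indices into the given strand (read 5'->3').
    spacer_range and max_mismatches are illustrative assumptions.
    """
    dna = dna.upper()
    n = len(dna)
    hits = []
    for i in range(n - 5):  # i = start of a possible -35 hexamer
        m35 = count_mismatches(dna[i:i + 6], MINUS_35)
        for spacer in range(spacer_range[0], spacer_range[1] + 1):
            j = i + 6 + spacer  # start of the possible -10 hexamer
            if j + 6 > n:
                break
            total = m35 + count_mismatches(dna[j:j + 6], MINUS_10)
            if total <= max_mismatches:
                hits.append((i, j, spacer, total))
    return hits


if __name__ == "__main__":
    # Toy sequence: perfect -35 and -10 elements separated by 17 bp.
    toy = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GGGGGGG"
    for hit in find_promoters(toy):
        print(hit)
```

Treating the spacer length as a parameter reflects the point that the distance between the two elements is itself a conserved feature of the promoter.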
Promoters can be identified by computational alignment and analysis of their DNA sequences. They are characterized by a strong consensus sequence, TATAAT, which is located at a fixed distance upstream of the start of transcription, in what is known as the -10 region of the promoter. Bacterial promoters have three conserved elements: the -35 sequence, TTGACA, located 35 nucleotides upstream of the transcription start; the -10 sequence, TATAAT; and the spacing between these two regions. Promoter recognition and transcription initiation, which involves formation of a complex between the RNA polymerase and the DNA followed by establishment of the transcription bubble, is the slowest stage of transcription.

Overview showing the three stages of the transcription process: initiation, elongation and termination. (Adapted from Abril et al., Applied Microbiology and Biotechnology 2020)

Elongation Stage of Transcription

In the elongation stage, the RNA molecule is synthesized according to the DNA template. During this stage, the chemistry of nucleotide addition is similar to that of DNA replication, with the addition of the next nucleotide depending on base complementarity between the DNA template and the incoming RNA nucleotide. The length of the duplex in the transcription bubble is around 17 base pairs, and beyond that the DNA must be allowed to re-form the duplex. The newly synthesized RNA strand is separated from the DNA template strand, allowing the two DNA strands to pair again. This process continues until the entire RNA transcript is synthesized.

It is important to note that the newly synthesized RNA is complementary to the template DNA strand and not to the coding strand. The coding DNA strand is not the one being copied, but its sequence is the same as that of the RNA transcript, except that thymine is present instead of uracil.
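The relationship between template strand, coding strand and transcript can be made concrete with a short sketch. It assumes the template is written in the 3'→5' direction, so that the RNA produced reads 5'→3' from left to right. The function names and the example sequence are hypothetical.

```python
# Base-pairing rules: each template DNA base selects one RNA base.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# DNA:DNA pairing, used to derive the coding strand from the template.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5):
    """Return the RNA (5'->3') complementary to a template written 3'->5'."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template_3to5.upper())


def coding_strand(template_3to5):
    """Return the coding DNA strand (5'->3') for a template written 3'->5'."""
    return "".join(DNA_COMPLEMENT[base] for base in template_3to5.upper())


if __name__ == "__main__":
    template = "TACGGT"            # hypothetical template, 3'->5'
    rna = transcribe(template)     # AUGCCA
    coding = coding_strand(template)  # ATGCCA
    print(rna, coding)
    # The transcript equals the coding strand with U in place of T.
    print(rna == coding.replace("T", "U"))
```

The final comparison expresses the statement above: the transcript is complementary to the template strand and identical to the coding strand apart from the T to U substitution.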
The position in the DNA complementary to the first nucleotide of the newly synthesized RNA is referred to as the “plus 1” position, and the promoter elements at the -10 and -35 positions are numbered relative to it. The 5’ end of the transcript usually retains its triphosphate, since this first nucleotide did not have a “preceding” nucleotide to react with.

During the elongation stage of transcription, the active site of the RNA polymerase undergoes several conformational changes. Initially, the active site is open, with the catalytic residues in a conformation that allows binding of the incoming nucleotide. Once the incoming nucleotide is bound to the template strand according to the Watson-Crick pairing rules, the active site closes. The catalytic residues, aided by magnesium ions, then catalyze the nucleophilic attack of the 3’ oxygen of the growing RNA chain on the α phosphate of the incoming nucleotide, leading to formation of the covalent bond and release of a pyrophosphate. Following this reaction, the RNA polymerase moves along the template strand and the active site opens again, allowing the next nucleotide to bind.

If the polymerase makes a mistake, it can correct the error by backtracking. When the enzyme has incorporated a mismatched base during transcription, it moves backwards to the last correct base. The last two nucleotides, including the wrongly incorporated one, are then cleaved away. This allows the enzyme to re-read the template strand and correct the error before continuing with transcription, which improves the fidelity of RNA polymerase.

Schematic representation of the transcription elongation mechanism. A metal ion in the active site, typically a Mg2+, is essential for catalysis. During the reaction, a new phosphodiester bond is formed, and an inorganic pyrophosphate (PPi) is released.
(Adapted from Stryer, Biochemistry, 8th Edition, Freeman 2015)

Termination Stage of Transcription

The termination stage is the final stage of transcription, in which the newly synthesized mRNA molecule dissociates from the DNA template. RNA polymerase should not keep going indefinitely, because synthesizing very long RNA molecules would be wasteful for the cell. The RNA polymerase stops after transcribing special sequences located downstream of the transcribed genes that lead to the formation of a stable RNA hairpin structure. This is a stretch of 40 nucleotides that is able to form very stable base pairing, and it is an example of a structured region of the RNA molecule that arises through self-complementarity. The very stable hairpin loop causes the polymerase to slow down, detach from the DNA and release the transcript. The DNA then re-forms the duplex and the transcription process is complete.

Post-Transcriptional Processing

‘Transcriptional unit’ is a term that refers to DNA segments that are transcribed into functional RNA molecules. This definition is broader than the transcription of a protein-encoding gene into messenger RNA, because not all regions of DNA that are transcribed are protein-encoding genes. Most of them are, but some transcriptional units encode other functional ‘structural’ RNA molecules, for example transfer RNAs (tRNAs) or ribosomal RNAs (rRNAs), the latter being of three types called 16S, 23S, and 5S rRNAs.

In bacteria, the transcriptional unit is often polycistronic, meaning that it includes multiple open reading frames that encode proteins. This is quite specific to prokaryotes, whereas in eukaryotes polycistronic mRNAs are usually not present. In order to make the right ‘structural’ RNA molecules, different regions of DNA are transcribed from a single promoter, and this primary transcript then has to be separated into individual functional pieces.
This is done through post-transcriptional processing, which involves cleavage of the transcript into separate segments and trimming of the ends.

Concept of a transcriptional unit. (Adapted from Brock, Biology of Microorganisms, 15th Edition 2019)
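The two processing steps described above, cleavage of the primary transcript followed by trimming of the fragment ends, can be pictured with a minimal sketch. The sequence, the cleavage position and the trim lengths are hypothetical placeholders, and the functions model only the bookkeeping, not any specific enzyme.

```python
def cleave(primary, sites):
    """Cut a primary transcript at the given 0-based positions.

    Returns the resulting fragments in 5'->3' order.
    """
    bounds = [0] + sorted(sites) + [len(primary)]
    return [primary[a:b] for a, b in zip(bounds, bounds[1:])]


def trim(fragment, five_prime=0, three_prime=0):
    """Remove the given number of nucleotides from each end of a fragment."""
    return fragment[five_prime:len(fragment) - three_prime]


if __name__ == "__main__":
    # Hypothetical primary transcript carrying two functional segments
    # (GGGG and UUUU) embedded in spacer nucleotides.
    primary = "AAGGGGCCUUUUAA"
    left, right = cleave(primary, [7])
    print(trim(left, five_prime=2, three_prime=1))   # GGGG
    print(trim(right, five_prime=1, three_prime=2))  # UUUU
```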
