Molecular Genetics: Transcription I PDF

The DNA in genomes does not direct protein synthesis itself, but instead uses RNA as an intermediary. When the cell needs a particular protein, the nucleotide sequence of the appropriate portion of the immensely long DNA molecule in a chromosome is first copied into RNA (a process called transcription). The flow of genetic information in cells is therefore from DNA to RNA to protein. All cells, from bacteria to humans, express their genetic information in this way, a principle so fundamental that it is termed the central dogma of molecular biology.

For many genes, RNA is the final product. Like proteins, some of these RNAs fold into precise three-dimensional structures that have structural and catalytic roles in the cell. Other RNAs act primarily as regulators of gene expression. But the roles of many noncoding RNAs are not yet known.

The genomes of most multicellular organisms are surprisingly disorderly, reflecting their chaotic evolutionary histories. The genes in these organisms largely consist of a long string of alternating short exons and long introns. Moreover, small bits of DNA sequence that code for protein are interspersed with large blocks of seemingly meaningless DNA. Some sections of the genome contain many genes and others lack genes altogether. Proteins that work closely with one another in the cell often have their genes located on different chromosomes, and adjacent genes typically encode proteins that have little to do with each other.

The problems that cells face in decoding genomes can be appreciated by considering a tiny portion of the human genome (Figure 6–2). The region illustrated represents less than 1/2000th of our genome and includes at least 48 genes that encode proteins and 6 genes for noncoding RNAs. The known protein-coding genes (starting with Abcd1 and ending with F8) are shown in dark gray, with coding regions (exons) indicated by bars that extend above and below the central line.
Noncoding RNAs with known functions are indicated by purple diamonds. Yellow triangles indicate positions within protein-coding regions where the Neanderthal genome sequence codes for a different amino acid than the human genome. The stretch of yellow triangles in the Txtl1 gene appears to have been positively selected for since the divergence of Homo sapiens from Neanderthals some 200,000 years ago. Note that most of the proteins are identical between us and our extinct relative. The blue histogram indicates the extent to which portions of the human genome are conserved with other vertebrate species. It is likely that additional genes, currently unrecognized, also lie within this portion of the human genome.

Transcription and translation are the means by which cells read out, or express, the genetic instructions in their genes. Because many identical RNA copies can be made from the same gene, and each RNA molecule can direct the synthesis of many identical protein molecules, cells can synthesize a large amount of protein from a gene when necessary. But genes can be transcribed and translated with different efficiencies, allowing the cell to make vast quantities of some proteins and tiny amounts of others.

The enzymes that perform transcription are called RNA polymerases. The RNA polymerase (pale blue) moves stepwise along the DNA, unwinding the DNA helix at its active site, indicated by the Mg²⁺ (red), which is required for catalysis. As it progresses, the polymerase adds nucleotides one by one to the RNA chain at the polymerization site, using an exposed DNA strand as a template. The RNA transcript is thus a complementary copy of one of the two DNA strands. A short region of DNA/RNA helix (approximately nine nucleotide pairs in length) is formed only transiently, and a “window” of DNA/RNA helix therefore moves along the DNA with the polymerase as the DNA double helix reforms behind it.
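The copying rule described above, in which each template base specifies the complementary ribonucleotide, can be illustrated with a short Python sketch. The function names and the six-base example sequence are hypothetical and chosen only for illustration; the sketch ignores promoters, terminators, and everything else about the real enzyme.

```python
# Minimal sketch of template-directed RNA synthesis (illustrative only).
# Base-pairing rule: each DNA template base specifies one RNA base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Return the RNA (written 5' to 3') copied from a DNA template
    strand that is written 3' to 5', the direction in which it is read."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


def coding_to_rna(coding_5to3: str) -> str:
    """The RNA has the same sequence as the non-template (coding) strand,
    except that U replaces T."""
    return coding_5to3.upper().replace("T", "U")


# Hypothetical example: a template strand and its partner coding strand.
template = "TACGGT"   # written 3' to 5'
coding = "ATGCCA"     # written 5' to 3'
rna = transcribe(template)
print(rna)
```

Both routes give the same transcript, which is why the RNA is described as a complementary copy of the template strand and an equivalent (T replaced by U) copy of the other strand.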
The incoming nucleotides are in the form of ribonucleoside triphosphates (ATP, UTP, CTP, and GTP), and the energy stored in their phosphate-phosphate bonds provides the driving force for the polymerization reaction.

Although RNA polymerase catalyzes essentially the same chemical reaction as DNA polymerase, there are some important differences between the activities of the two enzymes. First, and most obviously, RNA polymerase catalyzes the linkage of ribonucleotides, not deoxyribonucleotides. Second, unlike the DNA polymerases involved in DNA replication, RNA polymerases can start an RNA chain without a primer (de novo). Third, unlike DNA polymerases, which make their products in segments that are later stitched together, RNA polymerases are absolutely processive; that is, the same RNA polymerase that begins an RNA molecule must finish it without dissociating from the DNA template.

Signals Encoded in DNA Tell RNA Polymerase Where to Start and Stop

To transcribe a gene accurately, RNA polymerase must recognize where on the genome to start and where to finish. The way in which RNA polymerases perform these tasks differs somewhat between bacteria and eukaryotes. The initiation of transcription is an especially important step in gene expression because it is the main point at which the cell regulates which proteins are to be produced and at what rate.

A is close to the TATA box, so it makes more protein. B has a weaker promoter than A.

The bacterial RNA polymerase core enzyme is a multisubunit complex that synthesizes RNA using the DNA template as a guide. An additional subunit called sigma (σ) factor associates with the core enzyme and assists it in reading the signals in the DNA that tell it where to begin transcribing. Together, σ factor and core enzyme are known as the RNA polymerase holoenzyme. This complex adheres only weakly to bacterial DNA when the two collide, and a holoenzyme typically slides rapidly along the long DNA molecule and then dissociates.
However, when the polymerase holoenzyme slides into a promoter, a special sequence of nucleotides indicating the starting point for RNA synthesis, the polymerase binds tightly, because its σ factor makes specific contacts with the edges of bases exposed on the outside of the DNA double helix. The tightly bound RNA polymerase holoenzyme at a promoter opens up the double helix to expose a short stretch of nucleotides on each strand. The region of unpaired DNA (about 10 nucleotides) is called the transcription bubble, and it is stabilized by the binding of σ factor to the unpaired bases on one of the exposed strands. The other exposed DNA strand then acts as a template for complementary base-pairing with incoming ribonucleotides.

Often, there are several attempts to start polymerization that do not come to fruition. This so-called abortive initiation creates short RNAs that are released, while the polymerase, which remains in place, begins synthesis over again. Eventually this process of abortive initiation is overcome, allowing the core enzyme to break free of its interactions with the promoter DNA and discard the σ factor. At this point, the polymerase begins to move down the DNA, synthesizing RNA in a stepwise fashion: the polymerase moves forward one base pair for every nucleotide added. During this process, the transcription bubble continually expands at the front of the polymerase and contracts at its rear.
Chain elongation continues until the enzyme encounters a second signal, the terminator, where the polymerase halts and releases both the newly made RNA molecule and the DNA template. The free polymerase core enzyme then reassociates with a free σ factor to form a holoenzyme that can begin the process of transcription again.

In most bacterial genes, a termination signal consists of a string of A-T nucleotide pairs preceded by a twofold symmetric DNA sequence, which, when transcribed into RNA, folds into a “hairpin” structure through Watson-Crick base-pairing. As the polymerase transcribes across a terminator, the formation of the hairpin helps to disengage the RNA transcript from the active site.

Transcription Start and Stop Signals Are Heterogeneous in Nucleotide Sequence

Despite these differences, one can find common features, which are often summarized in the form of a consensus sequence. A consensus nucleotide sequence is derived by comparing many sequences with the same basic function and tallying up the most common nucleotides found at each position. It therefore serves as a summary or “average” of a large number of individual nucleotide sequences.

The DNA sequences of individual bacterial promoters differ in ways that determine their strength (the number of initiation events per unit time of the promoter). Evolutionary processes have fine-tuned each to initiate as often as necessary and have thereby created a wide spectrum of promoter strengths. Promoters for genes that code for abundant proteins are much stronger than those associated with genes that encode less abundant proteins, and the nucleotide sequences of their promoters are responsible for these differences.

Promoter sequences are asymmetric, ensuring that RNA polymerase can bind in only one orientation. Because the polymerase can synthesize RNA only in the 5ʹ-to-3ʹ direction, the promoter orientation specifies the strand to be used as a template.
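The consensus-sequence tally described above (the most common nucleotide at each position across many aligned sequences) can be sketched in a few lines of Python. The function name is hypothetical, and the five six-base sequences are invented for illustration; they are not real promoter data. Ties are broken alphabetically, which is an arbitrary choice made here so that the result is deterministic.

```python
from collections import Counter


def consensus(sequences: list[str]) -> str:
    """Return the most common base at each position of equal-length,
    already-aligned sequences (ties broken alphabetically)."""
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")

    result = []
    for column in zip(*sequences):          # one column per position
        counts = Counter(column)            # tally the bases in this column
        # sorted() fixes the tie-break order; max() keeps the first maximum
        result.append(max(sorted(counts), key=lambda base: counts[base]))
    return "".join(result)


# Invented example sequences (hypothetical, for illustration only).
examples = ["TATAAT", "TATGAT", "TACAAT", "GATAAT", "TATACT"]
print(consensus(examples))
```

Note that none of the individual sequences needs to match the consensus exactly; in this toy set only the first one does, which mirrors the point that a consensus is an "average" rather than a required sequence.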
Genome sequences reveal that the DNA strand that is used as the template for RNA synthesis varies from gene to gene, depending on the orientation of the promoter. Some genes are transcribed using one DNA strand as a template, while others are transcribed using the other DNA strand. The direction of transcription is determined by the promoter at the beginning of each gene (green arrowheads).

Transcription in Eukaryotes

Differences between transcription in eukaryotes and prokaryotes:
1. Prokaryotes have only one RNA polymerase, while eukaryotes have three.
2. While bacterial RNA polymerase requires only a single transcription-initiation factor (σ) to begin transcription, eukaryotic RNA polymerases require many such factors, collectively called the general transcription factors.
3. Eukaryotic transcription initiation must take place on DNA that is packaged into nucleosomes and higher-order forms of chromatin structure, features that are absent from bacterial chromosomes.

In contrast to bacteria, which contain a single type of RNA polymerase, eukaryotic nuclei have three: RNA polymerase I, RNA polymerase II, and RNA polymerase III. The three polymerases are structurally similar to one another and share some common subunits, but they transcribe different categories of genes.

All three RNA polymerases have similar subunits. The CTD (C-terminal domain) is present only in the largest subunit of RNA Pol II (RPB1). In humans it contains 52 repeats of the following sequence: Tyr-Ser-Pro-Thr-Ser-Pro-Ser. The CTD is phosphorylated during transcription.

Fully extended, the CTD is nearly 10 times longer than the remainder of RNA polymerase. As a flexible protein domain, it serves as a scaffold or tether, holding a variety of proteins close by so that they can rapidly act when needed.
This strategy, which greatly speeds up the overall rate of a series of consecutive reactions, is one that is commonly utilized in the cell.

RNA Polymerase II Requires a Set of General Transcription Factors

The general transcription factors:
1. help to position eukaryotic RNA polymerase correctly at the promoter,
2. aid in pulling apart the two strands of DNA to allow transcription to begin,
3. and release RNA polymerase from the promoter to start elongation.

The proteins are “general” because they are needed at nearly all promoters used by RNA polymerase II. They consist of a set of interacting proteins denoted arbitrarily as TFIIA, TFIIB, TFIIC, TFIID, and so on (TFII standing for “transcription factor for polymerase II”). In a broad sense, the eukaryotic general transcription factors carry out functions equivalent to those of the σ factor in bacteria; indeed, portions of TFIIF have the same three-dimensional structure as the equivalent portions of σ.

TRANSCRIPTION INITIATION

To begin transcription, RNA polymerase requires several general transcription factors.

(A) The promoter contains a DNA sequence called the TATA box, which is typically located 25 nucleotides upstream of the site at which transcription is initiated. It is not the only DNA sequence that signals the start of transcription, but for most polymerase II promoters it is the most important. Others include BRE, INR, and DPE.

Consensus sequences found in the vicinity of eukaryotic RNA polymerase II transcription start points: for most RNA polymerase II transcription start points, only two or three of the four sequences are present. Many polymerase II promoters have a TATA box sequence, but those that do not typically have a “strong” INR sequence. Although most of the DNA sequences that influence transcription initiation are located upstream of the transcription start point, a few, such as the DPE shown in the figure, are located in the transcribed region.

(B) Through its subunit TBP (TATA-binding protein), TFIID recognizes and binds the TATA box.
This binding induces a large distortion in the DNA of the TATA box. This distortion is thought to serve as a physical landmark for the location of an active promoter in the midst of a very large genome, and it brings DNA sequences on both sides of the distortion closer together to allow for subsequent protein assembly steps.

(C) Binding of TFIID to the promoter facilitates the recruitment of TFIIB. This general factor:
1. recognizes the BRE sequence,
2. accurately positions RNA polymerase at the start site of transcription.

(D) The rest of the general transcription factors, as well as the RNA polymerase itself, assemble at the promoter. Once all the general transcription factors and RNA Pol II are assembled at the promoter, the assembly is called the transcription initiation complex.

(E) After forming a transcription initiation complex on the promoter DNA, RNA polymerase II must gain access to the template strand at the transcription start point. TFIIH, which contains a DNA helicase as one of its subunits, makes this step possible by hydrolyzing ATP and unwinding the DNA, thereby exposing the template strand. Next, RNA polymerase II, like the bacterial polymerase, remains at the promoter synthesizing short lengths of RNA until it undergoes a series of conformational changes that allow it to move away from the promoter and enter the elongation phase of transcription. A key step in this transition is the addition of phosphate groups to the “tail” of the RNA polymerase (known as the CTD or C-terminal domain); this is also performed by TFIIH, which contains a kinase subunit. The polymerase can then disengage from the cluster of general transcription factors. During this process, it undergoes a series of conformational changes that tighten its interaction with DNA, and it acquires new proteins that allow it to transcribe for long distances, in some cases for many hours, without dissociating from DNA.
Once polymerase II has begun elongating the RNA transcript, most of the general transcription factors are released from the DNA so that they are available to initiate another round of transcription with a new RNA polymerase molecule. As we see later, the phosphorylation of the tail of RNA polymerase II has an additional function: it causes components of the RNA-processing machinery to load onto the polymerase and thus be positioned to modify the newly transcribed RNA as it emerges from the polymerase. Some of these processing proteins are thought to “hop” from the polymerase tail onto the nascent RNA molecule to begin processing it as it emerges from the RNA polymerase.

Polymerase II Also Requires Activator, Mediator, and Chromatin-Modifying Proteins

DNA in eukaryotic cells is packaged into nucleosomes, which are further arranged in higher-order chromatin structures. As a result, transcription initiation in a eukaryotic cell is more complex and requires more proteins than it does on purified DNA.

Gene regulatory proteins known as transcriptional activators must bind to specific sequences in DNA (called enhancers) and help to attract RNA polymerase II to the start point of transcription.

Eukaryotic transcription initiation in vivo requires the presence of a large protein complex known as Mediator, which allows the activator proteins to communicate properly with polymerase II and with the general transcription factors.

Transcription initiation in a eukaryotic cell typically requires the recruitment of chromatin-modifying enzymes, including chromatin remodeling complexes and histone-modifying enzymes.
Both types of enzymes can increase access to the DNA in chromatin, and by doing so they facilitate the assembly of the transcription initiation machinery onto DNA.

Many proteins (well over 100 individual subunits) must assemble at the start point of transcription to initiate transcription in a eukaryotic cell. The order of assembly of these proteins does not seem to follow a prescribed pathway; rather, the order differs from gene to gene. Indeed, some of these different protein complexes may be brought to DNA as preformed subassemblies. To begin transcribing, RNA polymerase II must be released from this large complex of proteins.

TRANSCRIPTION ELONGATION

Once RNA polymerase has initiated transcription, it moves jerkily, pausing at some DNA sequences and rapidly transcribing through others. Elongating RNA polymerases, both bacterial and eukaryotic, are associated with a series of elongation factors, proteins that decrease the likelihood that RNA polymerase will dissociate before it reaches the end of a gene. These factors typically associate with RNA polymerase shortly after initiation and help the polymerase move through the wide variety of different DNA sequences that are found in genes.

Eukaryotic RNA polymerases must also contend with chromatin structure as they move along a DNA template, and they are typically aided by ATP-dependent chromatin remodeling complexes that either move with the polymerase or simply seek out and rescue the occasional stalled polymerase. In addition, histone chaperones help by partially disassembling nucleosomes in front of a moving RNA polymerase and assembling them behind.

Transcription Elongation in Eukaryotes Is Tightly Coupled to RNA Processing

Transcription is only the first of several steps needed to produce a mature mRNA molecule.
Other critical steps are the covalent modification of the ends of the RNA (addition of a 5ʹ cap and a 3ʹ poly-A tail) and RNA splicing.

mRNA differences in Prokaryotes and Eukaryotes

The 5ʹ and 3ʹ ends of a bacterial mRNA are the unmodified ends of the chain synthesized by the RNA polymerase, and no splicing occurs. The 5ʹ and 3ʹ ends of a eukaryotic mRNA are formed by adding a 5ʹ cap and by cleavage of the pre-mRNA transcript near the 3ʹ end followed by the addition of a poly-A tail, respectively. Bacterial mRNAs can contain the instructions for several different proteins, whereas eukaryotic mRNAs nearly always contain the information for only a single protein.

Eukaryotic RNA Pol II CTD acts as a recruitment spot for mRNA processing factors

Eukaryotic RNA polymerase II as an “RNA factory.” As the polymerase transcribes DNA into RNA, it carries RNA-processing proteins on its tail that are transferred to the nascent RNA at the appropriate time.

Once the CTD is phosphorylated and transcription starts, it recruits 5ʹ capping enzymes. This strategy ensures that the RNA molecule is efficiently capped as soon as its 5ʹ end emerges from the RNA polymerase. It also gives the capping enzymes a way to distinguish mRNAs from other types of RNA: RNA Pol I and RNA Pol III do not have a CTD and therefore do not recruit capping enzymes, so the 5ʹ ends of their transcripts are not capped.

As the polymerase continues transcribing, its tail attracts splicing and 3ʹ-end processing proteins, positioning them to act on the newly synthesized RNA as it emerges from the RNA polymerase. There are many RNA-processing enzymes, and not all travel with the polymerase. For RNA splicing, for example, the tail carries only a few critical components; once transferred to an RNA molecule, they serve as a nucleation site for the remaining components.
When RNA polymerase II finishes transcribing a gene, it is released from DNA, soluble phosphatases remove the phosphates on its tail, and it can reinitiate transcription.

RNA Capping is the first modification of Eukaryotic pre-mRNA

As soon as RNA polymerase II has produced about 25 nucleotides of RNA, the 5ʹ end of the new RNA molecule is modified by addition of a cap that consists of a modified guanine nucleotide. Three enzymes, acting in succession, perform the capping reaction:
1. a phosphatase removes a phosphate from the 5ʹ end of the nascent RNA,
2. a guanyl transferase adds a GMP in a reverse linkage (5ʹ to 5ʹ instead of 5ʹ to 3ʹ),
3. and a methyl transferase adds a methyl group to the guanosine.

Functions of the 5ʹ cap

The 5ʹ-methyl cap signifies the 5ʹ end of eukaryotic mRNAs, and this landmark helps the cell to distinguish mRNAs from the other types of RNA molecules present in the cell. For example, RNA polymerases I and III produce uncapped RNAs during transcription, in part because these polymerases lack a CTD. The 5ʹ cap also protects mRNA from exonucleases. In the nucleus, the cap binds a protein complex called CBC (cap-binding complex), which helps a future mRNA be further processed and exported. The 5ʹ-methyl cap also has an important role in the translation of mRNAs in the cytosol.

SPLICING

Both intron and exon sequences are transcribed into RNA. The intron sequences are removed from the newly synthesized RNA through the process of RNA splicing. The vast majority of RNA splicing that takes place in cells functions in the production of mRNA, and our discussion of splicing focuses on this so-called precursor-mRNA (or pre-mRNA) splicing. Only after 5ʹ- and 3ʹ-end processing and splicing have taken place is such RNA termed mRNA.
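The net effect of splicing on sequence, removing introns and joining the flanking exons, can be illustrated with a small Python sketch. It assumes the intron positions are already known and passed in as coordinates; the function name, the coordinate convention (0-based, end-exclusive), and the example sequences are all hypothetical. It models only the outcome, not how the cell finds introns or the chemistry of the reaction.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Return the spliced RNA after removing each intron.

    Introns are (start, end) pairs, 0-based and end-exclusive, and must
    not overlap. Everything outside the introns is treated as exon.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        if start < position or start >= end or end > len(pre_mrna):
            raise ValueError(f"bad or overlapping intron: {(start, end)}")
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


# Hypothetical pre-mRNA: exons in upper case, the intron in lower case.
exon1, intron, exon2 = "AUGGCC", "guaaguuuuuuag", "GAUUAA"
pre_mrna = exon1 + intron + exon2
intron_coords = [(len(exon1), len(exon1) + len(intron))]
print(splice(pre_mrna, intron_coords))
```

Writing the intron in lower case is just a visual aid; the function itself treats the string as plain characters.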
SPLICING – the ‘What’

Each splicing event removes one intron, proceeding through two sequential phosphoryl-transfer reactions known as transesterifications; these two transesterifications join two exons together while removing the intron between them as a “lariat.”

The machinery that catalyzes pre-mRNA splicing is complex, consisting of five additional RNA molecules and several hundred proteins, and it hydrolyzes many ATP molecules per splicing event. This complexity ensures that splicing is accurate, while at the same time being flexible enough to deal with the enormous variety of introns found in a typical eukaryotic cell.

Splicing proceeds via two sequential transesterification reactions, each a nucleophilic attack:
1. First transesterification: the 2ʹ hydroxyl group of the adenine at the branch point attacks the phosphate at the 5ʹ splice site (the junction between exon 1 and the intron), cutting the RNA there and forming a lariat.
2. Second transesterification: the 3ʹ hydroxyl group of the last nucleotide of exon 1 attacks the phosphate group of the first nucleotide of exon 2, joining the two exons and releasing the intron lariat.

SPLICING – the ‘Why’

It may seem wasteful to remove large numbers of introns by RNA splicing. In attempting to explain why it occurs, scientists have pointed out that the exon-intron arrangement would seem to facilitate the emergence of new and useful proteins over evolutionary time scales. Thus, the presence of numerous introns in DNA allows genetic recombination to readily combine the exons of different genes, enabling genes for new proteins to evolve more easily by the combination of parts of preexisting genes. The observation that many proteins in present-day cells resemble patchworks composed from a common set of protein domains supports this idea.

Alternative splicing: the transcripts of many eukaryotic genes (estimated at 95% of genes in humans) are spliced in more than one way, thereby allowing the same gene to produce a corresponding set of different proteins.
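To make the idea of one gene yielding several mRNAs concrete, here is a deliberately simplified Python sketch that enumerates the transcripts obtainable by exon skipping alone. It assumes that the first and last exons are always kept and that any subset of internal exons may be skipped; real alternative splicing is regulated, includes other patterns, and rarely produces every combination. The function name and exon labels are hypothetical.

```python
from itertools import combinations


def exon_skipping_isoforms(exons: list[str]) -> list[str]:
    """List every transcript that keeps the first and last exon and any
    subset of the internal exons, always in their original order."""
    if len(exons) < 2:
        raise ValueError("need at least a first and a last exon")
    first, *internal, last = exons

    isoforms = []
    # Go from keeping all internal exons down to keeping none.
    for kept_count in range(len(internal), -1, -1):
        # combinations() preserves the original left-to-right exon order
        for kept in combinations(internal, kept_count):
            isoforms.append("".join([first, *kept, last]))
    return isoforms


# Placeholder labels stand in for real exon sequences.
isoforms = exon_skipping_isoforms(["E1", "E2", "E3", "E4"])
print(isoforms)
```

Under these assumptions a gene with n internal exons could give up to 2^n transcripts, which hints at how splicing can expand coding potential without adding genes.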
Rather than being the wasteful process it may have seemed at first sight, RNA splicing enables eukaryotes to increase the coding potential of their genomes.

SPLICING – the ‘Players’

Introns: much larger than exons.

Key elements:
GU sequence at the 5ʹ splice site (beginning of the intron)
AG sequence at the 3ʹ splice site (end of the intron)
Branch point (adenine), around 20-50 nucleotides from the 3ʹ splice site
Pyrimidine-rich region (10-12 nucleotides) between the branch point and the 3ʹ end of the intron

A small fraction of pre-mRNAs (
