Gene Expression SF Lectures Prof Martin 2021 PDF

Summary

This document contains lecture notes on the regulation of gene expression and protein translation. It discusses how genes convert information from DNA into RNA and proteins, influencing cell morphology, function, and behavior. It also discusses different cell types and genetic variants.

Full Transcript

Regulation of gene expression & protein translation

Prof. Seamus J. Martin, Dept. of Genetics, Senior Fresh, Trinity College Dublin

See Griffiths et al., Introduction to Genetic Analysis, 11th Edition (Chapters 8, 11, 12, 13)

Lecture 1

Gene expression is required to convert the information that resides within our DNA into gene products (RNA and proteins), and these gene expression products then influence the phenotype (i.e. morphology, function and behaviour) of the cell. Using the same common set of genes (i.e. the genome), cells differentially express these genes by turning on, or off, specific subsets of genes at different times and places to achieve a huge diversity of cell types or cellular activation states (e.g. dividing, non-dividing, differentiating, activated, secreting). Thus, although they have exactly the same genome, a human liver cell (a hepatocyte) will express a very different complement of genes than a T-cell, a neuron, or a muscle cell. The ability to regulate gene expression, through molecules that bind to the regions that flank the coding sequences of genes, is critical to achieving different outputs/phenotypes from the same genome.

What exactly is a gene?

From classical genetics and studies of heredity, a gene is: a genetic variant affecting a trait, where a genetic mutation (or variant of the same gene, an allele) has an effect on some phenotype (e.g. hair color, eye color, nutrient utilization, etc.). Mutations can be in either the protein-coding part or the regulatory part of the gene.

A more precise molecular definition is that a gene is: a transcriptional unit encoding an RNA or protein, where the gene product has some biochemical or cellular function.

As depicted to the right, a gene is the stretch of DNA that encompasses the coding sequence (i.e. the sequence that is actually transcribed into RNA) AND the accompanying DNA regulatory elements (i.e.
the promoter) that regulate the expression of the gene in a positive or negative manner. Thus, variants of a gene (which may be caused by mutation or natural sequence variation between individuals) can affect either the coding sequence or the regulatory sequence of the gene. Changes to the gene regulatory sequence can affect when, where and how much of a gene is expressed and this, in turn, will have an impact on the phenotype (i.e. how the cell looks and behaves) of the cell expressing this gene. Changes to the coding sequence of the gene can subtly or radically change the protein product (or RNA product) encoded by the gene and will also have an impact on the cellular phenotype. So, the key point here is that a gene comprises the coding sequence AND the accompanying regulatory sequence that regulates expression of the gene.

Gene products can function either as RNA molecules or proteins

All genes encode RNA molecules, some of which (i.e. messenger RNAs, mRNAs) are translated into proteins. There is no doubt that the protein-coding fraction of our genomes is a major part of what makes cells different from each other and is also largely responsible for the different activity states that cells can achieve. However, it is very important to remember that many genes encode RNAs that are not translated into protein, and these RNAs have enzymatic or regulatory functions as RNA molecules. The key non-translated RNAs are ribosomal RNA (rRNA), transfer RNA (tRNA) and regulatory RNAs such as small nuclear RNA (snRNA), microRNA (miRNA) and long non-coding RNA (lncRNA) that influence the expression of the protein-coding genes and their translation into proteins.

Thus, the major gene expression products are:

Proteins, encoded by messenger RNAs (mRNAs)
Structural RNAs: ribosomal RNAs (rRNAs) and transfer RNAs (tRNAs)
Regulatory RNAs: small nuclear RNAs (snRNAs), microRNAs (miRNAs) and long non-coding RNAs (lncRNAs)
Indeed, ribosomal and transfer RNAs comprise approximately 80-90% of the total RNA within cells, while mRNA comprises only 2-3% of the total RNA fraction. The remainder is made up of various regulatory RNAs that influence the stability, translation and modification of mRNA.

Prokaryotic genomes are relatively compact and gene dense but the genomes of higher organisms contain a huge proportion of non-coding sequence that has gene regulatory function

In higher eukaryotic cells (such as those of mammals), only a tiny fraction of the DNA (about 2%) is actually transcribed and translated into protein. Of the approximately 3 million genes that the human genome could potentially contain (if all of the DNA coded for proteins and the average gene was 1000 base pairs long), it turns out that there are only 25,000 or so protein-coding genes. The rest of the genome consists of non-coding sequence, much of which regulates the expression of the coding fraction of the genome. Thus, the regulation of gene expression is a key source of complexity in higher organisms and is more complex and sophisticated than the gene expression controls that exist in prokaryotes.

Higher eukaryotes don't have considerably more protein-coding genes but can get more from their genes through alternative splicing of mRNAs

Bacteria (prokaryotes) have approximately 4,500 genes, yeasts (lower eukaryotes) approximately 6,500, fruit flies about 14,000 genes and mammals only 25,000 or so. Thus, although we don't have considerably more genes than the much simpler fruit fly, how we regulate the expression of these genes is under more sophisticated control.
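The back-of-envelope estimate above (about 3 million potential genes versus about 25,000 actual protein-coding genes) can be reproduced with a few lines of arithmetic. This is a minimal sketch: the genome size of roughly 3 billion base pairs is an assumption implied by the figures in the notes (3 million genes of 1000 bp each), and the function names are illustrative only.

```python
# Rough arithmetic behind the "3 million potential genes" estimate.
GENOME_SIZE_BP = 3_000_000_000  # assumption: ~3 billion bp, implied by the notes' figures
AVERAGE_GENE_BP = 1_000         # average gene length used in the notes
ACTUAL_GENES = 25_000           # approximate protein-coding gene count from the notes


def potential_gene_count(genome_bp: int, gene_bp: int) -> int:
    """Number of genes that would fit if every base pair were coding."""
    return genome_bp // gene_bp


def coding_fraction(n_genes: int, gene_bp: int, genome_bp: int) -> float:
    """Fraction of the genome occupied by n_genes of average length gene_bp."""
    return n_genes * gene_bp / genome_bp


potential = potential_gene_count(GENOME_SIZE_BP, AVERAGE_GENE_BP)
fraction = coding_fraction(ACTUAL_GENES, AVERAGE_GENE_BP, GENOME_SIZE_BP)

print(f"Potential genes if all DNA were coding: {potential:,}")
print(f"Fraction occupied by {ACTUAL_GENES:,} genes: {fraction:.2%}")
```

Under these simplified assumptions the coding fraction comes out at under 1%, the same order of magnitude as the figure of about 2% quoted in the notes.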
In addition to greater levels of control over the expression of their genes, as we shall see, higher eukaryotes also extensively splice the mRNAs of their protein-coding genes to make different variations of the same protein, or proteins with very distinct functions, rather like using the same set of building blocks to make different creations by using combinations of different blocks within the set.

Fully differentiated cells are still capable of expressing all of the genes required to build an organism

As cells differentiate from a fertilized egg to a fully differentiated state, they lose expression of some genes and acquire expression of others that are particular to specific cell types. However, even upon reaching the fully differentiated state, most cells still retain all of the genetic information needed to reconstitute the whole organism under the right circumstances. There are some exceptions to this, where cells lose their nuclei as they differentiate (e.g. erythrocytes) or have their genome irreversibly altered in some way (e.g. senescent cells, such as some skin cells or neurons, that have lost their chromosome ends, called telomeres).

Experiments carried out by Ian Wilmut and colleagues at the Roslin Institute in Scotland in the mid-1990s showed that transfer of a nucleus from a fully differentiated somatic cell (taken from the udder of an adult sheep) into a sheep egg that had been enucleated (had its nucleus removed) could, when the egg was implanted into a surrogate mother, produce a fully viable lamb (called Dolly) that was a clone of the donor sheep. Although this somatic cell nuclear transfer procedure has a high rate of failure (for reasons that we won't go into here), this experiment proved that even fully differentiated cells still contain all of the necessary information required to build a whole organism from scratch.
However, differentiated cell types normally only express a subset of the genes they are capable of expressing because environmental and developmental cues switch on, or off, the expression of specific genes that distinguish one cell type from another, through activating specific transcription factors. Thus, within a given organism, all cells have the same genes but not all genes are “expressed” in every cell.

Differential gene expression is achieved through environmental or developmental cues that activate expression of genes in a cell-type specific manner

As outlined above, gene expression is regulated by a variety of mechanisms that permit cells to express only subsets of their genes, at specific times (i.e. in response to developmental or environmental cues) or in specific places (cells in different parts of an organism 'know' where they are and express the appropriate cohort of genes for that particular location). Differential gene expression is controlled through signaling pathways that activate the correct transcription factors (TFs), leading to further cascades of gene expression, as many TFs also induce the expression of other TFs.

Transcription factors play a key role in enabling differential gene expression

There is a vast array of transcription factors that bind to DNA in a sequence-specific manner and permit the correct genes to be expressed in the correct cells. Switching on the correct transcription factors is typically achieved through external signals (either from the environment or from other cells) that activate the specific transcription factors that enable the expression of specific subsets of genes required for a particular process (e.g. cell division, cell differentiation) or cell function (e.g. secretion of insulin, secretion of antibodies, phagocytosis). Thus, transcription factors play a critical role in the activation of gene expression in a cell-type specific manner.
Example I: Initiation of developmental gene expression in the Drosophila embryo through spatial signaling from the mother

A very well worked out example of how differential gene expression is initiated early in embryogenesis comes from development of the fruit fly Drosophila melanogaster. During early development of a fly embryo, spatial signals are provided to the developing embryo by cells of the mother (called nurse cells) that effectively instruct cells within the embryo to express genes involved in making the different parts of the fly along the anterior to posterior (i.e. head to tail) axis, as well as instructing the embryo which is the dorsal (back) and which the ventral (belly) surface. This is achieved through the nurse cells transferring mRNAs, largely those encoding 2 proteins, Bicoid and Nanos, into the embryo, with Bicoid mRNA deposited at the anterior end and Nanos mRNA at the posterior end. Bicoid is a transcription factor that, once translated in the anterior part of the developing embryo, forms a gradient running from head to tail, and this transcription factor gradient then induces the expression of a variety of other transcription factors (Hunchback, Kruppel, Knirps, Giant), depending on the concentration of Bicoid protein present at each position along the embryo.

Thus, the spatial signal of Bicoid mRNA deposition at the anterior pole of the embryo, upon translation into Bicoid protein, is rapidly converted into the differential expression of several additional transcription factors whose expression Bicoid directs (Hunchback, Kruppel, Giant, Knirps), running from the anterior (left on the figure above) to the posterior (right on the figure above) pole of the embryo.
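The idea that position along the embryo is read out from the local Bicoid concentration can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of the real embryo: the exponential shape of the gradient, the decay length and both thresholds are hypothetical values chosen purely for illustration, and the bands are left unlabelled rather than assigned to particular target genes.

```python
import math


def bicoid_concentration(position: float, c0: float = 1.0,
                         decay_length: float = 0.2) -> float:
    """Relative Bicoid level at a position along the axis.

    position runs from 0.0 (anterior pole) to 1.0 (posterior pole).
    The exponential form and decay_length are illustrative assumptions.
    """
    return c0 * math.exp(-position / decay_length)


# Hypothetical thresholds, checked from highest to lowest.
THRESHOLDS = [(0.5, "high"), (0.1, "intermediate")]


def concentration_band(level: float) -> str:
    """Classify a Bicoid level into a coarse band that a cell could 'read'."""
    for threshold, band in THRESHOLDS:
        if level >= threshold:
            return band
    return "low"


for position in (0.0, 0.25, 0.5, 0.75, 1.0):
    level = bicoid_concentration(position)
    print(f"position {position:.2f}: Bicoid {level:.3f} -> {concentration_band(level)}")
```

In this picture each band would switch on a different subset of the downstream transcription factors, so a single smooth gradient is converted into discrete domains of gene expression.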
The different combinations of Bicoid, Hunchback, Kruppel, Giant and Knirps running from head to tail of the embryo then switch on a battery of additional developmental genes that 'tell' each cell within the embryo where it is and what fly parts it should specify. They do so by turning on additional genes, called Hox genes, that in turn switch on the correct genes to make a leg, a wing, an antenna, a sex bristle, etc.

Example II: Activation of inflammatory genes within the immune system due to detection of pathogen components by Toll-like receptors

Many immune cells express receptors for conserved components of infectious agents, called pathogen-associated molecular patterns (PAMPs). PAMP receptors enable cells of the immune system to detect the presence of pathogens in the body. There are a number of different flavors of PAMP receptors, and one well-known class of these receptors is the Toll-like receptors. Upon binding of a PAMP to one of the Toll-like receptors capable of detecting it, the receptor becomes activated and this results in activation of a transcription factor called NFκB that switches on a diverse array of genes (numbering in the hundreds) that are involved in coordinating the process of inflammation, which helps the body fight infection.

Thus, the external cue (the PAMP) promotes expression of a new set of genes on demand by activating the NFκB transcription factor, which directs expression of a battery of immune defence genes that are required to fight the infection. There are many PAMP receptors in higher eukaryotes, collectively called pattern recognition receptors, and they all function similarly, turning on a battery of genes that are involved in fighting infection.

Lecture 2

Differential gene expression patterns produce distinct cell types

Gene expression in eukaryotic cells is under complex control.
This is absolutely essential because multi-cellular eukaryotes contain many different cell types and what makes one cell type different from another is the specific cohort of genes that each cell type expresses. Thus, even within the same tissue there will be considerable variation in gene expression patterns between different types of cells. There will also be considerable variation in gene expression patterns between cells of the same type during different cellular activities (e.g. during cell division, in response to stimulation with glucose, in response to starvation or stress, etc.). Thus, the ability to regulate gene expression is fundamental to what an organism can do and whether it can adapt to changing circumstances.

Different cell types arise early in embryonic development, where developmental signals initiate different patterns of gene expression that result in a diversity of cell types (we saw an example of this earlier when we discussed development in the Drosophila embryo, triggered by a gradient of Bicoid that is initiated by a signal from the mother). Cell differentiation occurs during multiple stages of development and still occurs in the adult. The cells in the early embryo (blastomeres) begin as groups of cells that are totipotent (i.e. can differentiate into any cell type) and, in response to signals, then further differentiate into pluripotent, multipotent and finally fully differentiated cells.

Housekeeping versus tissue-specific gene expression

Some genes are expressed in almost all cells of a given organism and others are expressed in a highly cell-type specific manner. The subset of genes that are expressed in practically all cells within a given organism are called 'housekeeping' genes and include the RNAs and proteins that are required to build common cellular structures (cytoskeleton, plasma membrane, microtubule organizing center etc.) and organelles (e.g.
nucleus, mitochondria, Golgi, endoplasmic reticulum, lysosomes) as well as carry out the basic functions of the cell (metabolism and survival).

Housekeeping genes:
Structural proteins (actin, tubulin, lamin A and lamin B)
Enzymes for cellular metabolism
Proteins and other enzymes for organelle synthesis and maintenance

However, in addition to housekeeping genes, each cell type also expresses a distinct subset of genes that are relatively unique to that cell type and these are called tissue-specific genes.

Tissue-specific genes:
Expressed in specific cell types (e.g. haemoglobin, insulin, liver enzymes, neurotransmitter receptors, etc.)

Thus each cell type has a characteristic gene expression “profile” made up of a different subset of the genes in its genome. Because all (or almost all) somatic cells within an individual multicellular organism contain the same set of genes, it follows that there must be mechanisms in place to ensure that the set of proteins required for the proper functioning of a given cell type is expressed only where and when required. Without gene expression controls, complex organisms with differentiated cell types would not be possible. In mammalian cells only 2% or so of the DNA is copied into functional RNA.

Different patterns of gene expression are achieved by a variety of mechanisms

Different patterns of gene expression between different cell types, or even in a specific cell type over time, are achieved at several levels:

1. Transcriptional (factors that affect RNA transcription)
2. Post-transcriptional (factors that affect stability or composition of RNAs)
3. Translational (factors that affect translation of mRNAs into protein)
4. Post-translational (factors that affect protein function or activity state)

Here we will focus primarily on the transcriptional and post-transcriptional mechanisms for regulating gene expression.
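The distinction drawn earlier between housekeeping and tissue-specific genes maps naturally onto set operations over expression profiles. The sketch below is illustrative only: the three cell types and the tiny gene lists assigned to them are hypothetical toy data built from the examples named in the notes, not measured expression profiles.

```python
# Toy expression profiles: the set of genes "on" in each cell type.
# The assignments are hypothetical and only meant to illustrate the logic.
expression_profiles = {
    "hepatocyte": {"actin", "tubulin", "lamin A", "liver enzymes"},
    "erythroid precursor": {"actin", "tubulin", "lamin A", "haemoglobin"},
    "pancreatic beta cell": {"actin", "tubulin", "lamin A", "insulin"},
}


def housekeeping_genes(profiles: dict) -> set:
    """Genes expressed in every cell type (intersection of all profiles)."""
    return set.intersection(*profiles.values())


def tissue_specific_genes(profiles: dict, cell_type: str) -> set:
    """Genes expressed in cell_type and in no other cell type."""
    expressed_elsewhere = set().union(
        *(genes for name, genes in profiles.items() if name != cell_type)
    )
    return profiles[cell_type] - expressed_elsewhere


print("Housekeeping:", sorted(housekeeping_genes(expression_profiles)))
for cell_type in expression_profiles:
    specific = tissue_specific_genes(expression_profiles, cell_type)
    print(f"{cell_type}: {sorted(specific)}")
```

Real profiles are quantitative rather than on/off, so this captures only the qualitative idea of a shared core plus a cell-type specific cohort.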
RNA Polymerase regulates RNA synthesis in tandem with transcriptional activators and repressors

At its most basic, gene expression is regulated through binding of an RNA polymerase to the regulatory elements (called a promoter region) upstream of a gene. There are specific DNA sequences found close to the transcription start sites of most genes that permit the RNA polymerase to bind the promoter and activate transcription. In prokaryotes, one main RNA polymerase is utilized, whereas in eukaryotes there are 3 RNA polymerase complexes.

E. coli RNA polymerase comprises 6 subunits: 2 α subunits, 1 β, 1 β', 1 ω and 1 σ factor, as depicted. The sigma subunit recognizes the specific sequences that are common in bacterial promoters and helps to position the RNA polymerase at the right place upstream of the transcriptional start site of the gene.

Most bacterial promoters have the same basic organization, with common sequence elements at -10 bp (TATAAT, the Pribnow box, a TATA-like element) and -35 bp (TTGACA) upstream of the transcription start site of the gene. These sequences are recognized by the RNA polymerase, specifically the σ (sigma) subunit, which positions the polymerase at the correct position to initiate transcription and also helps to separate (melt) the strands of the double helix to enable transcription to begin.

Upon initiation of transcription, the σ (sigma) subunit dissociates from the polymerase and the RNA polymerase proceeds along the DNA molecule, unwinding the strands and copying the DNA into RNA (called the elongation phase).

Finally, RNA polymerase recognizes specific sequences within DNA that act to terminate RNA synthesis, a step called chain termination. These sequences typically cause hairpins to form in the growing RNA molecule that cause it to fall off the DNA template (intrinsic termination signals).
Alternatively, a protein called Rho can recognize certain sequence elements in the newly made RNA that also act as termination signals and, upon binding to these elements, can then bind to RNA polymerase, causing it to fall off the DNA and release the RNA (Rho-dependent termination).

Transcriptional repressors work by sitting on or close to the promoter regions of the DNA, preventing RNA polymerase or initiator/activator proteins from initiating transcription (thereby "turning off" the gene). In prokaryotes, transcriptional control is relatively straightforward and gene promoters are constitutively on unless the binding or activity of RNA polymerase is blocked or impeded through binding of a transcriptional repressor to the promoter. We will discuss a historically important bacterial repressor called the Lac repressor that is involved in repressing (i.e. switching off) the genes that are involved in lactose utilization in bacteria.

Bacterial genes are often co-transcribed within 'Operons' under the control of the same promoter and repressor

Another feature of bacterial genes is that they are often organized into groups of related genes that are co-transcribed under the control of the same promoter and repressor. This type of clustered gene arrangement is called an Operon and many bacterial genes are organized into these functionally related groups. This is because bacteria live in harsh and ever-changing environments where food sources may vary rapidly, requiring the bacterium to be able to rapidly express new sets of genes that are required to deal with varying food sources. It makes sense that all genes that are required for related processes (such as lactose or glucose uptake and utilization) should be capable of being turned on or off all at once. The Lac Operon is controlled by the Lac repressor protein and is historically important because it gave important clues concerning the principles of how gene expression is generally regulated.
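A repressor-controlled operon behaves like a simple switch, and that logic can be written down in a few lines. This is a deliberately minimal sketch of the Lac Operon as covered in these notes: the genes are on unless the repressor is bound, and the repressor is released when lactose is available. The function names are illustrative, the gene names are the standard lac structural genes, and other inputs to the real operon (such as glucose levels) are ignored.

```python
# The lac structural genes are co-transcribed as a single unit.
OPERON_GENES = ("lacZ", "lacY", "lacA")


def repressor_bound(lactose_present: bool) -> bool:
    """In this simplified model the repressor sits on the DNA unless lactose is present."""
    return not lactose_present


def transcribed_genes(lactose_present: bool) -> tuple:
    """Return the genes transcribed under the given condition.

    Because the genes share one promoter and one repressor,
    they are switched on or off all at once.
    """
    if repressor_bound(lactose_present):
        return ()
    return OPERON_GENES


for lactose in (False, True):
    print(f"lactose present={lactose}: transcribed={transcribed_genes(lactose)}")
```

The all-or-nothing return value is the point of the sketch: a single regulatory decision controls every gene in the operon together.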
Jacob and Monod received the Nobel Prize in Physiology or Medicine in 1965 for their work on unravelling how the Lac Operon is organized and regulated.

The Lac Operon regulates the expression of genes involved in lactose utilization in bacteria

This classic example from bacteria was discovered by Jacob and Monod while studying the genes that enable bacteria to use lactose instead of glucose as a food source. They discovered that the enzyme β-galactosidase, which bacteria normally don't synthesize when grown on glucose, is induced 10,000-fold over normal expression levels when bacteria are grown on lactose.

Intrigued by this, they set out to identify the reason by performing a mutagenesis screen in bacteria. They found that growth on lactose depends on three enzymes:

β-galactosidase (lacZ)
Permease (lacY)
Thiogalactoside transacetylase (lacA)

Each is encoded by a different gene, but:

(i) The 3 genes were tightly linked on the chromosome
(ii) All were induced coordinately
(iii) The ratio of LacZ:LacY:LacA proteins remained constant

They found that all 3 genes were under a common control system (which they called an “operon”).

So, how does the Lac Operon work? lacI encodes a repressor (LacI) that switches off the whole operon in the absence of lactose (by binding to the Operator of the Lac Operon). When lactose is present, it binds (in the form of its derivative allolactose) to the LacI repressor protein and displaces it from the operator, permitting the Lac Operon to switch on and express the genes (lacZ, lacY, lacA) required for lactose utilization.

Lecture 3

Prokaryotic RNAs are translated as they are transcribed whereas eukaryotic RNAs are further modified and transported from the nucleus before translation

Another key difference between prokaryotic and eukaryotic RNA synthesis is that, due to the absence of a nuclear envelope, prokaryotic RNAs are translated into protein as soon as they are made (i.e.
co-transcriptionally). In contrast, because eukaryotic DNA is compartmentalized within a nucleus, eukaryotic RNAs need to be transported into the cytoplasm for translation and this necessitates the addition of sequences to the 5' end of the RNA (the 5' cap) as well as the 3' end (the polyA tail) that can regulate the stability of mRNA molecules and exert another level of control over gene expression that simple prokaryotes don't have. Furthermore, as we will see later, eukaryotic RNAs also undergo splicing within the nucleus to remove introns, and this adds yet another layer of control and complexity to eukaryotic gene expression.

Modifications to eukaryotic mRNAs

As noted above, eukaryotic mRNAs are modified in several ways, and this stabilizes the mRNA as well as facilitating its export from the nucleus. mRNAs from higher eukaryotes also undergo splicing, directed by small nuclear RNAs (snRNAs), to remove introns and generate splice variants of genes. We will return to alternative splicing of mRNA later to provide more details and examples of this process.

Eukaryotic RNA polymerases require transcription factors and their access to DNA is regulated by histones and other DNA-binding proteins

In eukaryotes, the situation is more complex and gene expression is constitutively off at most promoters due to:

(a) The requirement for specific and general transcription factors that regulate the binding of the RNA Pol complex to the promoter, and
(b) The physical inaccessibility of gene promoters caused by histones and other DNA-binding proteins. Histones form particles called nucleosomes that bind tightly to DNA (each nucleosome has approximately 150 bp of DNA wound around it, or about 200 bp including the linker DNA to the next nucleosome, like beads on a necklace). Together with other DNA-binding proteins, they can regulate the binding of RNA Pol to the promoter regions by compacting the DNA in that region into a 'closed' state that blocks access of the RNA polymerase complex to the promoter.
Eukaryotic RNA Polymerases are somewhat more complex than prokaryotic RNA polymerases but bind to similar promoter sequences

Eukaryotic cells possess 3 different RNA polymerases (I, II, III), which are related to the prokaryotic polymerase. The major difference between prokaryotic and eukaryotic RNA polymerases is that the latter require other factors (called transcription factors or TFs) to be bound to the promoter region before they can initiate transcription.

The three different RNA polymerases transcribe the 3 main classes of RNA: rRNA, mRNA and tRNA. RNA Pol I is responsible for generating rRNA precursor molecules, RNA Pol II generates mRNA and RNA Pol III generates small RNAs such as tRNAs and 5S rRNA.

RNA Pol II promoters (that regulate mRNA synthesis) bear a lot of similarity to prokaryotic promoters. A common feature of these promoters (located at about -30 relative to the transcription start site) is an A-T rich region called the TATA box. The consensus sequence is TATAAAA (T's frequently replace the A's in the 5th and 7th positions: TATATAT). The TATA box bears close resemblance to the prokaryotic -10 or Pribnow box found in bacterial promoters. Sometimes G's and C's appear (as in CATAAAA) and some Pol II promoters have no TATA box at all.

Other (somewhat less common) upstream elements of Pol II promoters are the GGCCCAATCT element or CCAAT box (cat box), which typically lies at -75, and the GC box, which has the sequence GGGCGG and is found around the same region. Together, these promoter elements control expression of the genes they flank, as they are typically required for efficient binding of the transcription factors that are required to initiate gene transcription.

As a general rule, promoters recognized by RNA Pol II typically contain a TATA box and at least one other upstream element.
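The promoter elements described above are short sequence motifs, so a simple pattern search shows how they could be located in a stretch of DNA. This is a minimal sketch: the TATA pattern encodes the consensus TATAAAA with A or T allowed at the 5th and 7th positions, as stated in the notes; the test sequence is a made-up toy promoter, only one strand is searched, and real promoter prediction is considerably more involved.

```python
import re

# Motifs taken from the notes; [AT] allows A or T at positions 5 and 7 of the TATA box.
PROMOTER_ELEMENTS = {
    "TATA box": re.compile(r"TATA[AT]A[AT]"),
    "CCAAT box": re.compile(r"CCAAT"),
    "GC box": re.compile(r"GGGCGG"),
}


def find_promoter_elements(sequence: str) -> dict:
    """Return the 0-based start positions of each element on the given strand."""
    sequence = sequence.upper()
    return {
        name: [match.start() for match in pattern.finditer(sequence)]
        for name, pattern in PROMOTER_ELEMENTS.items()
    }


# Hypothetical toy sequence, not a real promoter.
toy_promoter = "GCGCCAATCTGGGCGGACGTTATAAAAGGCTCAGT"
hits = find_promoter_elements(toy_promoter)

for name, positions in hits.items():
    print(f"{name}: {positions}")
```

Promoters that lack a TATA box, or that carry variants such as CATAAAA, would not be matched by this strict pattern, which is one reason such exact-match searches are only a first approximation.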
RNA Pol II complexes require transcription factors to bind gene promoters

While bacterial RNA polymerase is capable of binding to the promoter regions of genes and initiating transcription without help from other factors, eukaryotic RNA polymerases are incapable of recognizing the promoter regions of genes on their own and require transcription factors (TFs) to assist them in this. TFs fall into two broad categories: general and gene-specific.

The general transcription factors that collaborate with RNA Pol II are TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH and TFIIJ. The best studied of these is TFIID. This factor contains many subunits, but the most important of these is the TATA box-binding protein (TBP). As its name suggests, TBP binds to the TATA box, bringing the rest of TFIID along with it. The DNA-binding domain of TBP consists of a large β-sheet that forms a saddle-shaped structure that sits astride the DNA in the region of the TATA box.

Binding of TBP to the TATA box induces a dramatic distortion in this region of the DNA. X-ray crystallography has shown that TBP binding causes a large kink in the DNA helix, causing the DNA duplex to unwind over an 8 base pair span, thereby allowing access of the RNA polymerase to the template.

TFIID (of which TBP is a part) serves as a platform for organization of the other transcription factors, which then bind in the following order: D, A, B, F + Pol II, E, H and J.

Upon assembly of the RNA Pol II preinitiation complex, the C-terminal domain of the largest subunit of RNA polymerase II becomes heavily phosphorylated (most likely by TFIIH, which is a kinase), and this is thought to act as the trigger that uncouples the polymerase from some of the TFs (which remain at the TATA box), allowing the polymerase to move along the DNA template.
Gene-specific transcription factors, although numerous, share common DNA binding domains

Although the general TFs are sufficient to enable a basal level of transcription, this can be dramatically enhanced by binding of additional transcription factors to other regions of the promoter. Literally hundreds of gene-specific TFs exist (some of the well-known ones are Myc, Max, Fos, Jun, NFκB, NFAT) and it is not possible to discuss these individually within the scope of this course. However, these can be subdivided into a number of categories based on the structural features they share in common. The DNA-binding domains of most TFs can be grouped into four broad classes whose members contain related motifs that allow interaction between the TF and the major groove of DNA:

1. The zinc finger motif
2. The helix-loop-helix (HLH) motif
3. The leucine zipper (LZ) motif
4. The high mobility group-box (HMG) motif

All of these motifs are slightly different solutions to the same problem: that of allowing recognition of specific DNA sequences by the TFs. Most of these motifs contain a segment (often an α-helix) that is inserted into the major groove of DNA, where it recognizes the bases lining the groove and forms non-covalent interactions with these (mainly hydrogen bonds and ionic interactions). While the strength of each contact between the TF and the DNA is relatively weak, because of the number of such contacts (approx. 20 in all), the overall strength and specificity of the TF-DNA interaction is very high. In fact, the affinity of TFs for their specific DNA sequences ranks among the highest seen in biological systems.
Specific transcription factors can act at a considerable distance from the proximal promoter of a gene by binding to enhancer elements

Although specific TFs may bind relatively close to the proximal promoter of the gene(s) they control, enhancer sequences can often be found thousands of base pairs away from the promoter region that they control. This is because chromatin looping, facilitated by the specific TF and associated proteins (called mediators) bound to the enhancer, can bring this region into close proximity to the TATA box and the associated RNA Pol II complex sitting on the promoter. This leads to dramatic enhancement of gene expression.

Histones and other chromatin-binding proteins can also regulate gene expression in eukaryotes

Histones can be modified through acetylation, methylation, phosphorylation and other modifications to regulate access to promoters and switch the chromatin state at that location between open and closed. In general, modifications that neutralize the positive charges on histones reduce their affinity for DNA (which is negatively charged) and make the DNA more accessible in these regions. There are a huge number of possible patterns of histone modifications and this is called the histone code.

One such modification of the histone N-terminal tail, called acetylation, is carried out by enzymes called histone acetyltransferases (HATs) and is reversible (it is removed by histone deacetylases, HDACs). Acetylation can lead to a more open chromatin state (due to neutralization of the positive charges on histones, thereby loosening their contact with DNA). Thus, histones associated with more active genes tend to be rich in acetyl modifications (called hyperacetylated), whereas histones associated with inactive genes tend to be poor in them.
Furthermore, some histone modifications can also recruit proteins (such as Polycomb proteins) that stay associated with the DNA in this region and repress expression of the genes decorated in this way almost permanently; this repressed state can even be inherited by daughter cells. This type of regulation is called epigenetic control. Thus, gene expression is regulated by DNA-binding proteins, called transcriptional regulators, as well as by the accessibility of the stretch of DNA harboring the gene (i.e. chromatin state).

Nucleosomes (i.e. DNA wrapped around histone octamers) can also be moved off promoters by SWI/SNF complexes to facilitate the initiation of transcription

Another way to pry open regions of DNA that are obstructed by nucleosomes, thereby allowing access of the basal transcription machinery, is via the SWI/SNF (switch/sniff) complex. All eukaryotic cells are thought to contain this very large multisubunit complex, which can disrupt histone-DNA interactions in an ATP-dependent manner and allow binding of TFs to gene regulatory regions. Interestingly, this complex was discovered in two separate yeast genetic screens, one for genes involved in growth on sucrose (sucrose non-fermenting mutants) and the other for genes involved in the ability to switch mating type (switch mutants); one of the genes found in both screens was dubbed swi/snf. Subsequent work found that the product of this gene is part of a large complex that can reposition nucleosomes to permit transcriptional activation. The SWI/SNF complex is thought to bulldoze histones off promoters and facilitate access of the RNA Pol complex to the promoter. The yeast SWI/SNF complex is composed of 11 proteins (with a total mass of 2,000 kilodaltons, or 2 MDa) and seems to act as an ATP-driven ‘nucleosome plough’ to facilitate transcription. Certain TFs may function by attracting the SWI/SNF complex to the promoter.
DNA methylation also regulates gene expression patterns

Examination of the DNA of mammals and other vertebrates reveals that up to 1% of nucleotides carry a methyl group, added at carbon 5 of cytosine to produce 5-methylcytosine. This simple modification is thought to act as a molecular tag that allows certain regions of DNA to be regulated differently from others. Almost all of the methylated cytosines are found in CG dinucleotides (80% of CG dinucleotides are methylated) within symmetrical sequences (e.g. CCGG or GCGC). These sequences do not occur randomly within the genome but tend to be clustered in GC-rich regions (called CpG islands) that are often located in, or close to, gene regulatory regions. Most unmethylated CpG islands are found in clusters near active gene promoters. Thus, methylation is likely involved in silencing gene expression.

Methylation in vertebrates is dynamic, as enzymes exist that can remove methyl groups from DNA (demethylases) as well as add them. Remarkable shifts are seen in methylation patterns during the lifetime of mammals. The first major change is seen during the first few divisions of the zygote, when enzymes remove almost all of the methyl groups that were inherited from the parents. Then, around the time of implantation of the embryo in the uterus, a wave of de novo methylation establishes a new pattern of methylation throughout the DNA. Once this pattern is established, it is passed on to daughter cells in a process called maintenance methylation (an enzyme called maintenance methylase acts preferentially on those CG sequences that are already methylated on the complementary strand). Mice that have been engineered to knock out a specific enzyme involved in de novo methylation die midway through gestation, underscoring the importance of this process.
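The logic of maintenance methylation described above can be sketched in a few lines of Python. Because CG reads the same on both strands, every CG on one strand sits opposite a CG on the other, so a mark on the parental strand tells the enzyme which cytosine to methylate on the newly made strand. The sequence, the chosen marks and the function names below are hypothetical, and the model ignores everything except the positions of the marks.

```python
def cpg_sites(top_strand):
    """Return 0-based positions of the C in each CG dinucleotide (strand read 5'->3')."""
    return [i for i in range(len(top_strand) - 1) if top_strand[i:i + 2] == "CG"]


def maintenance_methylate(top_strand, methylated_top):
    """Return positions (in top-strand coordinates) methylated on the new bottom strand.

    The cytosine of the bottom-strand CG lies opposite the G at top-strand
    position i + 1. Only sites whose parental (top) cytosine is already
    methylated are copied; unmethylated sites are left alone.
    """
    return {i + 1 for i in cpg_sites(top_strand) if i in methylated_top}


top = "ATCCGGTACGCGTT"      # hypothetical sequence containing three CG sites
parental_marks = {3, 10}    # assumption: two of the three sites are methylated

print(cpg_sites(top))                                       # [3, 8, 10]
print(sorted(maintenance_methylate(top, parental_marks)))   # [4, 11]
```

The unmethylated site at position 8 stays unmethylated on the new strand, so the daughter duplex carries the same pattern as the parent.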
DNA methylation has been found to correlate with gene expression patterns, with levels of methylation falling sharply within the regulatory regions of genes that are activated during development. For example, the γ-globin gene is highly transcribed in the liver during fetal development and this correlates with a large decrease in methylation within the region lying upstream of this gene. However, much evidence also indicates that methylation is unlikely to be the initial event that inactivates certain genes; rather, methylation may serve to maintain a gene in an inactive state. Methylation is thought to inhibit transcription in two ways: (1) by interfering with the recognition of DNA-binding sites by TFs and (2) by attracting transcriptional repressors to certain sites.

Lecture 4

RNA splicing regulates the complexity of gene expression products in eukaryotes

As we mentioned earlier, in prokaryotes genes are relatively short and code directly for proteins or regulatory RNAs. However, in eukaryotes, gene-coding sequences (called exons, expressed regions) are often interrupted by non-coding sequences (called introns, intervening regions) that must be spliced out of the mRNA molecule after it is synthesized. This enables the coding sequences of genes to be split up into smaller pieces that act as modules that can be mixed and matched to create a variety of alternative proteins. This vastly increases the complexity of gene products that higher organisms can produce from a relatively small protein-coding genome. For example, despite having only 25,000 protein-coding genes, mammals can make approximately 100,000-200,000 different proteins due to differential pre-mRNA splicing, and this complexity can be increased further by post-translational modifications of proteins (phosphorylation, proteolysis, ubiquitination) that alter protein functional states.
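The idea that exons act as modules that can be mixed and matched can be made concrete with a small Python sketch. It assumes a hypothetical gene in which the first and last exons are always kept and each internal ("cassette") exon can be independently included or skipped; real genes are far more constrained, so this is an upper bound on one simple kind of splicing, not a model of any particular gene. The exon names and function name are invented for illustration.

```python
from itertools import combinations


def isoforms(first_exon, cassette_exons, last_exon):
    """Yield every exon combination that keeps the first and last exons
    and includes any subset of the cassette exons, in their original order."""
    for k in range(len(cassette_exons) + 1):
        for chosen in combinations(cassette_exons, k):
            yield (first_exon, *chosen, last_exon)


variants = list(isoforms("E1", ["E2", "E3", "E4"], "E5"))
for v in variants:
    print("-".join(v))
print(len(variants), "possible mRNAs from one gene")

# Figures quoted in the notes: 25,000 genes and 100,000-200,000 proteins.
genes = 25_000
print(100_000 / genes, "to", 200_000 / genes, "proteins per gene on average")
```

With three skippable exons the sketch yields 2 x 2 x 2 = 8 mRNAs, and the figures quoted above work out to an average of 4 to 8 proteins per gene, so only a modest amount of alternative splicing per gene is needed to account for them.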
Pre-mRNA splicing is carried out by a large molecular complex, called the spliceosome, that recognizes the junctions between introns and exons and excises the introns to produce a fully spliced mRNA molecule. The discovery of introns by Richard Roberts and Phillip Sharp in 1977 (for which they later won the Nobel Prize) came as a major surprise. Studies in bacteria had indicated that genes were composed of uninterrupted stretches of coding sequence. The first indication that eukaryotes were different came from studies of a DNA virus that infects human cells (adenovirus). Comparisons of viral mRNA with the corresponding DNA coding sequence revealed that certain sequences found in the viral DNA were missing from the mRNA produced from the same region. The discovery of similar missing pieces in vertebrate ovalbumin and β-globin mRNAs (these were easy to examine because of their abundance) ruled out the possibility that this was some quirk of viral mRNAs. In fact, it turns out that most mammalian genes contain much more intron sequence than exon sequence. Intron sequences evolve rapidly, with only their flanking sequences (required for removal from pre-mRNA) under selective pressure. It is thought that introns have been lost from prokaryotes (these organisms are more likely to need to replicate their genomes rapidly to facilitate rapid cell division) rather than acquired by eukaryotes. Thus, split genes are probably the more ancient condition.

How does the splicing machinery recognize the intron/exon boundaries? Examination of the sequences at the junctions between hundreds of introns and exons shows that similar sequences (splice sites) are found at the boundaries: introns typically begin with GU and end with AG (GU...AG). These boundary sequences are called the 5' splice site (or donor site) and the 3' splice site (or acceptor site).
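The GU...AG rule described above can be illustrated with a toy Python function that removes introns from a pre-mRNA. This is a minimal sketch under stated assumptions: the intron coordinates are supplied by hand rather than discovered, the sequence is invented, and the only check made is that each intron starts with GU and ends with AG. The real spliceosome relies on additional signals that are not modelled here.

```python
def splice(pre_mrna, introns):
    """Remove introns, given as (start, end) half-open intervals, from pre_mrna.

    Raises ValueError if an interval does not start with GU (5' splice site)
    and end with AG (3' splice site).
    """
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"interval {start}-{end} lacks GU...AG boundaries")
        exons.append(pre_mrna[cursor:start])  # keep the exon before this intron
        cursor = end                          # skip over the intron
    exons.append(pre_mrna[cursor:])           # keep the final exon
    return "".join(exons)


# Hypothetical pre-mRNA: exon 1 + intron (GU...AG) + exon 2
pre = "AUGGCC" + "GUAAGUCCUUAG" + "UUUUAA"
mature = splice(pre, [(6, 18)])
print(mature)  # AUGGCCUUUUAA
```

Shifting the intron boundary by a single nucleotide, for example `splice(pre, [(7, 18)])`, fails the boundary check in this sketch, which mirrors how precisely the real junctions must be positioned.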
RNA splicing must be carried out very precisely, as an error of even a single nucleotide would shift the reading frame of the resulting mRNA.

Alternative splicing is a major source of protein diversity in higher eukaryotes

RNAs may undergo alternative splicing reactions in which the pre-mRNA is processed in different ways to produce mature transcripts lacking one or more exons. The splice variants that are produced may have a completely different function from that of the full-length splice form. A substantial proportion of genes in higher eukaryotes produce multiple forms of a protein using alternative splicing. Alternative splicing is likely to be regulated by molecules that sit on the pre-mRNA, thereby preventing access of the splicing machinery to particular splice sites. Two examples: the calcitonin/neuropeptide gene gives rise to two completely different proteins in thyroid cells versus neurons, and the DSCAM gene in Drosophila, which is involved in specifying neural circuitry, has over 38,000 possible protein isoforms due to alternative splicing.

mRNAs are translated into protein on ribosomes

Once protein-coding genes have been transcribed into mRNA molecules, these mRNAs need to be translated into proteins to play their role in cellular function. This is achieved through binding of the mRNAs to ribosomes, which translate them into proteins, as follows. Translation happens in the cytoplasm, where mRNA binds to ribosomes, which are protein synthesis machines. Ribosomes have several binding sites that play important roles in protein synthesis. One of these is responsible for binding the mRNA. Three others accommodate tRNA molecules (which carry amino acids to the mRNA template): the A (aminoacyl) site, the P (peptidyl) site and the E (exit) site.
tRNAs bring amino acids to the ribosome in a manner dictated by the coding sequence of the mRNA

The attachment of an mRNA molecule to the ribosome makes possible the binding of tRNA molecules to the ribosome in an order defined by the nucleotide sequence of the mRNA. Each tRNA is associated with a specific amino acid, and this function is determined by the structure of the molecule itself. tRNAs are cloverleaf-shaped polynucleotides. The tail end of the tRNA has an acceptor stem that can bind a specific amino acid, while its head has three nucleotides that form the “anticodon”, which recognizes the corresponding codon of the mRNA molecule. Thus, tRNA anticodons base pair with the complementary triplet codons of the mRNA. All tRNA molecules with the same anticodon sequence carry the same amino acid. Thus, there is at least one tRNA specific for each amino acid.

The stages of translation

STEP 1: Initiation. The process of translation begins when the mRNA molecule binds to the small subunit of the ribosome (30S in prokaryotes and 40S in eukaryotes) through recognition of the 5' cap of eukaryotic mRNAs or of a sequence (the Shine-Dalgarno sequence, AGGAGG) upstream of the initiator AUG codon in prokaryotic mRNAs. Protein synthesis is initiated at an AUG codon on the mRNA. The AUG codon marks both the site at which the ribosome engages the mRNA and the site at which the initiator tRNA binds through its anticodon (UAC). Once the AUG codon is recognized, the large ribosomal subunit (60S in eukaryotes and 50S in prokaryotes) associates with the mRNA and the small ribosomal subunit, and protein synthesis is ready to begin. The start codon (i.e. the first one!) always codes for the amino acid methionine (and is always AUG). The start codon enters the P site on the ribosome, while the second codon enters the A site. The anticodon of the tRNA carrying methionine temporarily base pairs with the start codon.
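The codon-anticodon pairing described above can be illustrated with a short Python sketch. The function names are hypothetical, and the model uses strict Watson-Crick pairing only, which is a simplification (it ignores the relaxed "wobble" pairing that can occur at the third codon position). It also makes explicit that the UAC anticodon quoted for AUG is written 3'->5', i.e. aligned antiparallel beneath the codon.

```python
# Watson-Crick pairing rules for RNA
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_3_to_5(codon):
    """Anticodon written 3'->5', aligned base-by-base under the 5'->3' codon."""
    return "".join(PAIR[base] for base in codon)


def anticodon_5_to_3(codon):
    """The same anticodon written in the conventional 5'->3' direction."""
    return anticodon_3_to_5(codon)[::-1]


print(anticodon_3_to_5("AUG"))  # UAC
print(anticodon_5_to_3("AUG"))  # CAU
```

Both strings describe the same three bases; only the direction of reading differs.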
A second tRNA molecule, with an anticodon complementary to the mRNA codon in the A site, then approaches the A site and its anticodon temporarily base pairs with that codon. The amino acid attached to the tRNA in the A site and the methionine in the P site form a peptide bond (catalysed by peptidyl transferase, an RNA-based enzyme that is integrated into the large ribosomal subunit, 50S in prokaryotes), which initiates the polypeptide chain.

STEP 2: Elongation. In prokaryotes and eukaryotes, the basics of elongation are the same. The E. coli ribosome contains three compartments. The A (aminoacyl) site binds incoming charged aminoacyl tRNAs. The P (peptidyl) site binds charged tRNAs carrying amino acids that have formed peptide bonds with the growing polypeptide chain but have not yet dissociated from their corresponding tRNA. The E (exit) site releases dissociated tRNAs so that they can be recharged with free amino acids. The ribosome moves along the mRNA sequence and the tRNA that resided in the A site moves over to the P site, due to a ratchet-like conformational change that takes place between the two subunits of the ribosome. The tRNA that was previously in the P site now moves to the E site and disengages from the ribosome, leaving the growing peptide chain attached to the tRNA now in the P site. Then a new mRNA codon enters the A site. A tRNA with a complementary anticodon, carrying the corresponding amino acid, pairs with the bases of the new codon in the A site. Once this is done, a new peptide bond is formed between the two adjacent amino acids. The energy for each elongation step is derived from GTP hydrolysis, which is catalyzed by separate elongation factors. Again, the ribosome moves down the mRNA sequence. The tRNA molecule from the P site moves to the E site and is released into the cytoplasm, where it can bind another amino acid of the same type.
A tRNA complementary to the new codon in the A site enters and a new peptide bond is created between the new amino acid and the existing peptide chain. Amazingly, the E. coli translation apparatus takes only 0.05 seconds to add each amino acid, meaning that a 200-amino acid protein can be translated in just 10 seconds.

STEP 3: Termination. The process of translation repeats until one of the 3 stop codons (UAA, UAG, UGA) enters the A site of the ribosome. Once the final peptide bond has been created, the protein chain, which is connected only to the tRNA located in the P site, is released into the cytoplasm. The process of translation is now complete and the ribosome is ready to repeat the synthesis several more times.

Ribosome Structure

The ribosome has two subunits, which are composed of ribosomal RNAs (rRNAs) and specific ribosomal proteins. The two subunits differ in size and are called the large and small subunits, respectively. When forming the ribosome, the two subunits fit together to form a roughly spherical structure. During protein synthesis, the two subunits function together to translate the information encoded in mRNA into a polypeptide chain. The ribosomal subunits differ slightly between prokaryotic and eukaryotic cells.

Prokaryotic Ribosomes

Prokaryotes have 70S ribosomes, consisting of one large (50S) and one small (30S) subunit. Each small subunit is built around a 16S rRNA that is 1540 nucleotides long. This rRNA is bound to 21 different ribosomal proteins. The large subunit contains a 5S RNA (120 nucleotides) and a 23S RNA (2900 nucleotides). The two ribosomal RNA molecules in the large subunit are bound to 31 proteins.

Eukaryotic Ribosomes

Eukaryotic ribosomes differ slightly from those of prokaryotes. They have one large (60S) and one small (40S) subunit, forming an 80S ribosome. The eukaryotic small subunit contains 33 proteins bound to an 18S RNA (1900 nucleotides).
Eukaryotes have 3 RNAs in their large subunit: a 5S RNA (120 nucleotides), a 28S RNA (4700 nucleotides) and a 5.8S RNA (160 nucleotides). These 3 ribosomal RNAs are bound to 46 proteins to form the large subunit.
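The three stages of translation described in these notes (initiation at AUG, codon-by-codon elongation, and termination at a stop codon) can be summarised in a short Python sketch. This is a toy model under stated assumptions: it uses the standard genetic code, starts at the first AUG it finds, and ignores the ribosome, the tRNAs and the initiation signals (5' cap, Shine-Dalgarno sequence) entirely. The function names and the test sequence are hypothetical; the 0.05 seconds per amino acid is the E. coli figure quoted in the notes.

```python
from itertools import product

# Standard genetic code, one-letter amino acids, '*' = stop.
# Codons are generated in the order UUU, UUC, UUA, UUG, UCU, ...
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STOP_CODONS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}


def translate(mrna):
    """Translate an mRNA (5'->3') into a one-letter peptide string."""
    start = mrna.find("AUG")                  # initiation: locate the first AUG
    if start == -1:
        return ""                             # no start codon, no protein
    peptide = []
    for i in range(start, len(mrna) - 2, 3):  # elongation: one codon at a time
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":                 # termination: stop codon reached
            break
        peptide.append(amino_acid)
    return "".join(peptide)


def translation_time(n_amino_acids, seconds_per_residue=0.05):
    """Time in seconds to add n amino acids at the rate quoted for E. coli."""
    return n_amino_acids * seconds_per_residue


print(sorted(STOP_CODONS))            # ['UAA', 'UAG', 'UGA']
print(translate("GGAUGGCCUUUUAAGG"))  # MAF
print(translation_time(200))
```

In the example the reading frame is set by the AUG, so the message is read as AUG-GCC-UUU-UAA: methionine, alanine, phenylalanine, then stop.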
