Cell Cycle Control Notes PDF
Summary
These lecture notes cover the cell cycle, describing its phases, checkpoints, and regulation mechanisms. The document details the molecular control of cell division, including the roles of cyclins, CDKs, and checkpoints. It also discusses methods of studying the cell cycle like flow cytometry and immunohistochemistry.
Full Transcript
LECTURE 1: CELL CYCLE CONTROL

All cells come from previous cells, so cell division is essential for embryonic development, tissue regeneration, and homeostasis, among other processes. It is also the basis of replication, development, and growth. Any dysregulation of the cycle can produce cancer and other diseases. From a human perspective a cell is tiny, but at the molecular level a cell is giant, so cell division is a huge undertaking that needs coordination and mobilization of all the molecules and organelles in the cell (the only molecule that is truly critical and has to be split perfectly is DNA; the rest can be compensated for).

The cycle consists of two main phases: interphase (G1, S, G2) and M phase. The G phases are used for growth (if the cell did not grow, it would be smaller after each division) and control, and in S phase the DNA is duplicated. G1 is high in transcription and translation because the cell needs proteins to grow, but it also needs lipids, sugars, etc. These come from nutrition, so in G1 the cell needs to eat (G1 lasts around 1 day); if the cell is not big enough, it will not enter S phase. By contrast, M phase is very quick (around an hour). It is the phase of movement, so it needs a lot of coordination for the nucleus and the cytoplasm to divide.

In tissues most cells are not cycling; for example, if there is not enough food, the cell can wait until there is. For that reason there are two important exits from the cycle: G0 or quiescence, which is reversible, and a permanent exit called senescence (e.g. due to radiation, short telomeres, or DNA damage).

During interphase the chromosomes are relaxed, but to be divided they need to be compact (mitotic chromosomes) so that they can be partitioned equally without mistakes. After S phase the DNA is doubled, but until M phase the sister chromatids must stay together, which is the job of the cohesin rings.
Timing is very important in division, and the centrosome plays an important role. The centrosome is the principal microtubule organizing center (MTOC), a simple organelle that is duplicated in S phase; the pair stays together until division. It is also worth noting that mitotic cells are easy to recognize in cell culture because they round up.

Mitosis:
-Prophase: chromosomes condense and the spindle apparatus begins to form
-Prometaphase: the nuclear envelope breaks down and the microtubules contact the chromosomes
-Metaphase: chromosomes complete their migration to the middle of the cell
-Anaphase: sister chromatids separate into daughter chromosomes and are pulled to opposite poles of the spindle apparatus. The transition from metaphase to anaphase is one of the main checkpoints. Here, enzymes under the control of APC/C cut the cohesin rings so that the sister chromatids can be separated (anaphase does not start until everything is in order)
-Telophase: the nuclear envelope re-forms and the chromosomes decondense
-Cytokinesis: division of the cytoplasm, forming two new daughter cells

CHECKPOINTS AND MOLECULAR CONTROL

The cell cycle has developed checkpoints to maintain directionality, so that it does not return to previous phases. If a cell is damaged, it can temporarily exit the cell cycle; if the damage is too great, it can exit permanently or even undergo apoptosis. The biggest checkpoint is at the beginning, from G1 to S phase, and it is called Start. Once a cell crosses Start it has to finish the cycle sooner or later; it cannot remain stuck in other phases. The checkpoint verifies that everything is in order inside and outside the cell. At every checkpoint the first thing to be checked is the DNA (whether it is damaged and, in G2, whether it has been replicated); beyond the DNA, it can check, for example, the size of the cell.
Each checkpoint requires a different stimulus and acts through a different Cyclin-Cdk pair: G0/G1 (mitogen stimulation); G1/S (restriction point; Start); S/G2 (DNA damage); G2/M (antephase checkpoint: to start mitosis); M/G1 (spindle assembly checkpoint, SAC).

A key factor called MPF (maturation promoting factor) was found to activate other molecules in order to start division; it was later discovered that it consists of two molecules, cyclin and Cdk. Most of the regulation works through post-translational modifications (PTMs) and cycles of synthesis and degradation. Phosphorylation is a quick post-translational control involving two types of enzymes: kinases, which phosphorylate the target, and phosphatases, which dephosphorylate it. A Cdk (cyclin-dependent kinase) is a kinase that requires a cyclin to function: when it binds a cyclin it is activated and phosphorylates a specific set of targets critical for the cell cycle. Cdk levels are constant, while cyclin levels oscillate through the cell cycle. Cyclin production is increased by increasing transcription and translation, and degrading it requires proteases. But to send a protein to the proteasome complex it first has to be marked, that is, ubiquitinated.

In multicellular organisms like us every single cell has to be controlled, so the control is more complex. For that reason we have four types of cyclins (E, A, B, D). Each cyclin pairs with a different Cdk (the main ones are 1, 2, 4, 6), and each pair has a different expression time and different functions. Cyclin D sets the stage by helping decide whether the cell proliferates, but it is not enough: cyclin E also has to be activated, and together these two take the cell through the first checkpoint. Cyclin D is also necessary to bring cells out of G0.

*As soon as the cell passes the metaphase-anaphase transition, the cyclin-Cdk complex disappears

Cyclin regulation: Ubiquitination is a PTM that targets proteins for degradation by the proteasome complex.
In this case ubiquitin is attached to the cyclin (specifically to an exposed lysine), which marks it for the proteasome, where it is cut into small pieces. This causes an abrupt decrease in cyclin-Cdk complexes once their function is finished (there are other mechanisms that use ubiquitination, but this is the main one). The main enzymes in this process are the ubiquitin-activating enzyme (E1), the ubiquitin-conjugating enzyme (E2), and the ubiquitin ligase (E3). APC/C and SCF are two important types of ubiquitin ligase. APC/C is the key regulator of the metaphase-to-anaphase transition and targets securins and S- and M-cyclins; SCF targets CKIs.

-APC/C needs coactivators, and it has two at different times of the cell cycle. Cdc20 acts in the first part of M phase (until metaphase), and together they target molecules such as cyclin A, B, B3 and securins. The other coactivator is Cdh1, which acts from anaphase until the start of S phase; together they target molecules such as Cdc20, cyclin A, B, Cdh1, CDC14, etc. Separase is the enzyme in charge of cutting the cohesin proteins that hold the sister chromatids together; it becomes active once APC/C has marked its inhibitor, securin, for degradation. This allows the microtubules to separate the sister chromatids. Another important APC/C target is cyclin B: when APC/C is activated by Cdc20, it takes part in the ubiquitination of the cyclin B-Cdk complex.

-Cdk function can be blocked by covalently attached chemical groups (such as inhibitory phosphorylation) or by Cdk inhibitors (CKIs) that cover the protein surface. CKIs are mostly active at G1/S, and most of them bind the cyclin-Cdk complex so that it cannot interact with other molecules. In mammals there are two families of CKIs: CIP/KIP and INK4. Two examples: p21 suppresses G1/S- and S-Cdks upon DNA damage, and p27 mostly helps terminally differentiated cells leave the cell cycle. One important case is when the DNA is damaged and it is possible to repair it.
When p53 is active (phosphorylated), it leads p21 to bind the G1/S-Cdk and S-Cdk complexes, which are inactivated so that the cell cycle does not continue. The cell waits in that phase until the DNA is repaired.

REGULATION OF CHECKPOINTS

Normal cells without mitogen stimulation (a mitogen is a signal that tells the cell when to proliferate) will not enter the cell cycle; they remain in G0. Upon stimulation, cells progress into G1 through a signal cascade that switches on proliferation genes, including the expression of cyclin D. The Ras-GTP-PI3K-AKT-mTOR pathway is the one that regulates the expression of Cdks. Myc, a gene regulatory protein, is activated by this pathway and activates G1-Cdk, which inactivates Rb. To be inactive, Rb needs to be phosphorylated; after that, transcription proceeds, leading to cell proliferation. Rb is a tumor suppressor: if it is inactivated, E2F (an oncogene) is no longer held in check, which produces a childhood tumor (retinoblastoma); cyclin E is another oncogene.

DNA damage mediated arrest

When the DNA is damaged, proteins are recruited to the site of damage. This triggers a signal cascade of sequential phosphorylation events that inhibit Cdk-cyclin complexes by recruiting CKIs or inhibiting Cdc25. At each phase different cues are detected, yet DNA quality is checked throughout the cell cycle: G1-Cdk responds to the extracellular environment, G1/S- and S-Cdks to DNA damage, M-Cdks to unreplicated DNA and DNA damage, and APC/C to chromosomes that are unattached to the spindle. This last point is the spindle assembly checkpoint (SAC), whose goal is to prevent premature segregation of sister chromatids and thereby ensure genome stability.

METHODS OF CYCLE STUDY

Flow cytometry

It assesses cell cycle phase by measuring DNA content via several detectors, and via light scattering of individual cells passing through a channel.
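As a rough illustration of how DNA content maps onto cell cycle phase, the following Python sketch assigns a phase to each cell from its DNA-dye fluorescence. It assumes the signal is proportional to DNA content and that the position of the G1 peak is already known. The function name `assign_phase`, the `tolerance` parameter and its default value are hypothetical choices made for this example, not part of any cytometry software.

```python
def assign_phase(dna_signal, g1_peak, tolerance=0.15):
    """Assign a cell cycle phase from one cell's DNA-dye signal.

    dna_signal: fluorescence of the cell (assumed proportional to DNA content)
    g1_peak:    fluorescence of a typical unreplicated (G1) cell
    tolerance:  illustrative width allowed around the 1x and 2x peaks
    """
    if g1_peak <= 0:
        raise ValueError("g1_peak must be positive")
    ratio = dna_signal / g1_peak  # 1.0 = unreplicated DNA, 2.0 = fully replicated
    if ratio <= 1 + tolerance:
        return "G0/G1"   # DNA not yet replicated
    if ratio >= 2 - tolerance:
        return "G2/M"    # DNA fully doubled
    return "S"           # in between: replication in progress


# Hypothetical fluorescence values for five cells, with the G1 peak at 100
signals = [98, 104, 150, 197, 205]
phases = [assign_phase(s, g1_peak=100) for s in signals]
print(phases)  # ['G0/G1', 'G0/G1', 'S', 'G2/M', 'G2/M']
```

DNA content alone cannot separate G0 from G1, or G2 from M, because the cells in each pair carry the same amount of DNA; that is why the sketch reports them as combined labels.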
Size and granularity data (related to synthesis) are also used to determine cell types. If a DNA dye is added, the flow cytometer can be used to determine the proliferating cell population by measuring the DNA content of each cell. A cell with less DNA emits less fluorescence, which means its DNA has not replicated yet; that is how the difference between G1 and G2 phases is seen. The technique requires the cells to be treated with a DNA-binding fluorescent dye (propidium iodide (PI), Hoechst, or DAPI), which labels the whole DNA content. In addition to the DNA dye, BrdU (bromodeoxyuridine, a nucleotide analogue), which is incorporated into replicating DNA, is used to identify DNA synthesis (S phase).

FACS

FACS (fluorescence-activated cell sorter) is a specialized version of the flow cytometer that, in addition to analysis, separates cells using a fluorescent marker.

IHC/IF

Immunohistochemistry (IHC) / immunofluorescence (IF) uses antibodies raised against known mitotic markers, or certain chemical agents, to observe the cell cycle phase. For example, BrdU is incorporated into newly synthesized DNA, so it marks cells that have just been in S phase; and pH3 (phospho-histone H3) is a proliferation marker.

LECTURE 2: GENOME ORGANISATION

DNA MOLECULE

Nucleic acid monomers consist of a 5-carbon sugar, a nitrogenous base, and a phosphate group. The 5 carbons of the sugar carry different things: 1’ carries the base; 2’ carries a hydroxyl group or not (with it the sugar is ribose, as in RNA; without it, deoxyribose, as in DNA); 3’ carries a hydroxyl group; and 5’, attached to 4’, carries the phosphate. All are joined by covalent bonds. Sugar plus base is called a nucleoside; with the phosphate group as well, it is a nucleotide.

*Nucleotides always have a negative charge because of the phosphate, so they are soluble in water.

Nucleic acids always polymerize 5’ to 3’, so the next nucleotide always attaches to the 3’ end.
The nucleotides are covalently linked by phosphodiester bonds, which are very strong, while the whole DNA double strand is held together by hydrogen bonds, which are weak (the chains have to be separated for replication, transcription, etc.). There are millions of hydrogen bonds, however, so together they keep the genome in one piece (at high pH or in boiling water the hydrogen bonds break). The two polymers always run antiparallel.

*Relatively hydrophilic major groove and relatively hydrophobic minor groove

DNA PACKAGING

Packaging is important for controlling the DNA. The genome is not a single molecule; it is divided into pieces called chromosomes, 46 in humans. Chromatin is a complex of DNA and histone/non-histone chromosomal proteins. In a healthy cell naked DNA is almost never seen, except at the moment of replication and transcription. Promoters of active genes and regulatory sequences lack nucleosomes, which is why they are DNase hypersensitive sites. A change in DNase sensitivity can be observed when the cell differentiates (nucleosomes are repositioned).

The first level of compaction is the beads-on-a-string structure; when it is treated with nucleases, the individual nucleosomes are released. A nucleosome consists of a histone core, which is an octamer (2x each of H2A, H2B, H3, H4), with the DNA double helix wrapped around it. The histones have a positive charge and DNA a negative charge, so they stay together, providing a 3X folding of the DNA. Linker DNA is sensitive to digestion by DNase, which digests DNA without a particular recognition sequence.

-ATAC-Seq (Assay for Transposase-Accessible Chromatin Sequencing) is a method in which a transposase interacts with the accessible regions of DNA and inserts adapters there. The DNA is then fragmented and sequenced by NGS. The height of the peaks in this assay reflects the accessibility of the regions (it shows how relaxed the DNA is).

Sometimes histone variants replace the typical histones to perform a specialized function.
H3 variant CENP-A is required at the centromere; H2A variant H2A.X is associated with DNA repair and recombination (it calls for help when there is DNA damage); and H2A.Z is associated with transcriptional regulation.

*Even without reading the DNA sequence, checking the histones can tell you where in the genome you are, for example in the centromere*

The second level of compaction is the solenoid. Nucleosomes are usually packed further by histone H1 binding to linker DNA. Meanwhile, the histone tails of neighboring nucleosomes also interact, forming fibers or solenoid structures (30 nm fibers). For higher-order packaging, solenoids form loops on a protein scaffold for stronger compaction, which can be relaxed for gene expression. Chromosome looping and condensation occur at multiple levels to form the mitotic chromosome.

CHROMATIN REMODELLING

By hydrolyzing ATP, a chromatin remodeling complex interacts with both DNA and histones and pushes the DNA. Changing the position of the DNA relative to the histone core locally changes the accessibility of the DNA to the transcription machinery: the remodeling complex can locally condense or decondense the DNA, making it less or more accessible.

Epigenetic code

Another way of remodeling chromatin is to covalently modify histone proteins on their N-terminal tails (which move randomly and are open to interaction). Common PTMs are methylation, acetylation, phosphorylation, sumoylation, etc. Acetylation is a common PTM in which acetyl groups are added to lysine residues on histone tails. This weakens the electrostatic attraction between DNA and histone, relaxing the chromatin and releasing DNA for other interactions. At the same time it reduces the affinity for the neighboring nucleosome, relaxing the solenoid structure. The DNA becomes free for the transcription machinery, so acetylation increases gene expression.
Conversely, deacetylation prevents RNA Pol from accessing the region. These reactions are catalysed by HATs (histone acetyltransferases) and HDACs (histone deacetylases). Histone modifications thus form a kind of language. For example, H3 can be methylated at lysine 9 (H3K9), which means heterochromatin formation and gene silencing. Histone modifications not only change the affinity for DNA but also create docking sites for regulatory proteins (non-histone effectors). Bromodomains recognize acetylated lysine (which most of the time leads to active gene expression), chromodomains recognize methylated lysine, and PHD domains interact with chromatin in a more general way.

Copying epigenetic information

Epigenetic marks and cell identity are inherited through cell proliferation. Epigenetic marks silence some regions so that cells can specialize, and once a cell has specialized, for example into a muscle cell, it cannot go back. When a cell divides the DNA is replicated, but the histones are not, so more have to be produced via transcription and translation. The chromatin state is passed on to the daughter cells upon cell division. It also spreads to neighboring nucleosomes via an enzyme complex that copies and modifies chromatin structure (reader-writer complex), creating a wave of condensed chromatin. This continues until the complex hits a barrier. Heterochromatin grows in both directions, and these barriers are needed to stop its growth. Barriers can be mutated, which can cause problems if the mutation is near telomeres or centromeres.

Chromatin levels

Regions that are not expressed are more condensed (heterochromatin, 10% of the DNA, mostly around centromeres and telomeres), while regions containing actively expressed genes are more relaxed (euchromatin). Heterochromatin can be constitutive (maintains structure) or facultative (reversible).

Methods:
-ChIP (chromatin immunoprecipitation): It is commonly used to study local chromatin conformation.
You first cut the chromatin into pieces and then precipitate the pieces by binding them with antibodies. This shows which genes in that region are expressed or silent. The DNA can also be hybridized to a microarray, which is called “ChIP on a chip”.

-Chromatin conformation capture: it is used to identify sequences that lie adjacent to each other. Living cells are cross-linked with formaldehyde (FA) so that chromatin regions in close contact are fixed. They are digested with a restriction enzyme and re-ligated. The 3C method uses quantitative PCR of the ligation products to test their frequency, and 4C produces an unbiased list of sequences that interact with a given bait sequence. These methods quantify the number of interactions between genomic loci that are nearby in 3-D space but separated by many nucleotides in the linear genome.

DNA Methylation

DNA is modified by methylation at C5 of cytosine, only when the neighboring base is guanine, that is, at CpG sites (methylated CpG can interact with meCpG-binding proteins). CpG means that C and G are attached to each other by a phosphodiester bond. If a CpG is methylated, it is methylated on both strands. In mammals 70% of CpGs are methylated by DNA methyltransferase enzymes (DNMTs). Very rarely DNA methylation is inherited from the parents, for example when only the maternal copy is needed; such genes are called imprinted genes. Bisulfite modification is a method for identifying cytosine methylation status: after treatment, a methylated C still reads as C, but an unmethylated C turns into U.

NUCLEAR ORGANISATION

During interphase the chromosomes are relaxed, but their localisation in the nucleus is still well organised. Each chromosome occupies its own territory, forming a nuclear architecture (chromosomes are kept separate thanks to the work of some enzymes). Heterochromatin interacts with the nuclear lamina at the periphery of the nucleus through lamina-associated domains (LADs, parts of chromosomes), and euchromatin is located closer to the center.
The location of each chromosome can be detected with fluorescent in situ hybridization (FISH). A prominent structure in the nucleus is the nucleolus, formed around rRNA genes located on 10 different chromosomes; it is in charge of ribosome production. There are also nuclear speckles (granule clusters) with various functions, which are subnuclear structures without a membrane.

LECTURE 3: GENE EXPRESSION

TRANSCRIPTION

The central dogma of biology has four main exceptions: reverse transcription; non-coding RNAs; functional molecules made of RNA, such as ribosomes; and cases where RNA is replicated (e.g. the virus that causes COVID). During transcription the main enzyme is RNA polymerase, which adds new monomers to the 3’ end of the RNA polymer after the PPi group is cleaved off (the PPi released from the monomer provides enough energy to form the phosphodiester bond, which is why the polymerase does not need ATP to work). DNA is a double-stranded chain and both strands are equally important; either can be used for transcription. The strand used to create the RNA is called the template, while the coding strand has the same sequence as the transcript (with U in place of T).

DNA polymerase is active only during S phase and needs a primer to start, which is made by a primase controlled by Cdks; RNA polymerase needs no primer. Moreover, DNA polymerase only starts at origins of replication, while RNA polymerase can start at many places in the genome. For the transcription complex to start, the histones also have to detach and the double strand has to be melted. The promoter always dictates where transcription of a gene starts, and the polymerase then adds new monomers in the 5’-3’ direction.

*The figure shows two different genes, each with several RNA pols working at the same time. Transcription starts from the left of both genes*

Transcription initiation

Eukaryotic RNA Pol requires general transcription factors, which are needed to start the transcription of any protein-coding gene.
Assembly starts with the TATA-box binding protein (TBP), a subunit of TFIID, recognizing the TATA box in the promoter. TATA box recognition is the first step of transcription initiation in genes transcribed by RNA Pol II (RNA Pol I and III transcribe non-coding genes). TFIID locally distorts the double helix when it binds the TATA box, a conformational mark that helps the other transcription factors find and join the promoter. TFIIB, TFIIE and TFIIH join, forming the transcription initiation complex.

*This is not as simple as shown; sometimes an enhancer forms a loop and interacts with the Pol via a mediator.
*RNA Pol II finishes at the transcription termination site

RNA processing

Before RNA goes to the cytosol for translation it has to be checked and processed, in three ways: capping, splicing, and polyadenylation. Capping (a cap added at the 5’ end) and polyadenylation (poly-A tailing; the mRNA needs it to be recognized by the ribosome and code for proteins) occur on pre-mRNAs only. Capping protects the new mRNA from nucleases in the nucleus, and both modifications increase the stability of the mRNA and facilitate its export. RNA splicing is the removal of introns from pre-mRNAs, carried out by the splicing machinery, which is mostly made of RNAs (small nuclear RNAs / snRNAs). The regulation of splicing is as complex as transcription regulation; for example, certain sequences mark the start and end of each exon. After all of this, only correct and mature mRNAs can leave for the cytosol through the nuclear pore complexes.

How to study transcription

-Northern blot: an RNA mixture (which might be in pieces) is separated in a gel and transferred to a membrane, which is blotted with a probe (a small nucleic acid that matches the sequence of a target). It gives the size of the piece and the signal intensity, so it is only semi-quantitative.
-qRT-PCR: the target sequence is amplified with a polymerase, and a fluorescent reporter is recorded in real time. It is quantitative.
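Before moving on to translation, the template/coding strand relationship described under TRANSCRIPTION can be made concrete with a short Python sketch. It is only an illustration under simplifying assumptions: the template strand is written 3’ to 5’ so that the transcript reads 5’ to 3’, and promoters, processing and termination are ignored. The function names `transcribe` and `coding_strand` and the example sequence are hypothetical.

```python
# Base pairing used when RNA is built on a DNA template (A pairs with U in RNA)
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# Base pairing between the two DNA strands
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Return the RNA (5'->3') made from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


def coding_strand(template_3_to_5):
    """Return the coding DNA strand (5'->3') that pairs with the template."""
    return "".join(DNA_TO_DNA[base] for base in template_3_to_5)


template = "TACGGT"               # hypothetical template, written 3'->5'
rna = transcribe(template)        # 'AUGCCA'
coding = coding_strand(template)  # 'ATGCCA'

# The transcript matches the coding strand, with U in place of T
print(rna, coding, rna == coding.replace("T", "U"))  # AUGCCA ATGCCA True
```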
TRANSLATION

Each group of 3 nucleotides that specifies an amino acid is called a codon; each tRNA has an anticodon that matches the codon; and the enzyme that checks the anticodon and attaches the proper amino acid to the tRNA is called aminoacyl-tRNA synthetase. The ribosome is a ribozyme, an RNA with catalytic activity, and it has two subunits with different functions: the small subunit matches codon with anticodon, and the large subunit catalyzes peptide bond formation. Each ribosome has 3 sites, E, P and A: the new charged tRNA enters at the A site and a peptide bond forms with the chain held by the tRNA in the P site. The empty tRNA is ejected from the E site. Once a peptide bond is established, the ribosome moves downstream.

The initiation complex, which includes the initiator tRNA carrying methionine (start codon AUG) and some translation factors, assembles at the cap and moves downstream until it identifies a start codon. Between the cap and the start codon lies an untranslated region, which is a regulatory region. In some cases this first methionine is later removed by a specific protease. Finally, translation ends at a stop codon (UAA, UAG, UGA), which is recognized by a release factor.

In both prokaryotes and eukaryotes, several ribosomes can translate one mRNA simultaneously. Such a group of ribosomes is called a polyribosome (aka polysome). There are, however, differences between the eukaryotic and prokaryotic translation machineries, which is why antibiotics can target and block prokaryotic translation.

Protein breakdown

After some time proteins have to be broken down into amino acids. This is mainly done by proteolysis, in which peptide bonds are cleaved by protease enzymes; the proteasome recycles proteins using energy from ATP and is found in both the nucleus and the cytoplasm. Proteins to be degraded in the proteasome are marked with ubiquitin, a small protein, in the process of ubiquitination.

Protein modification: when a ribosome translates an RNA, the product is a polypeptide.
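The codon-by-codon reading that yields this polypeptide can be sketched in a few lines of Python. This is a simplified illustration, not a model of the ribosome: it scans from the 5’ end for the first AUG, reads triplets with the standard genetic code, and stops at UAA, UAG or UGA. The function name `translate` and the example mRNA are hypothetical, and the output uses one-letter amino acid codes.

```python
from itertools import product

# Standard genetic code, built from the conventional U, C, A, G ordering.
# '*' marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA (5'->3') from its first AUG up to the first stop codon."""
    start = mrna.find("AUG")  # scan downstream for the start codon
    if start == -1:
        return ""             # no start codon, nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: release the chain
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical mRNA: 5' untranslated 'GG', then AUG GCC UAA, then trailing bases
print(translate("GGAUGGCCUAAUU"))  # MA
```

The two bases before the AUG in the example stand in for the untranslated region between the cap and the start codon; they are skipped, just as the scanning initiation complex skips them.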
To become a protein, the polypeptide has to mature (for example through PTMs).

How to study translation

-Western blot: a protein mixture (purified proteins) is separated in a gel and transferred to a membrane, which is blotted with an antibody that recognizes the target protein. It is semi-quantitative.
-Immunostaining (IHC or IF): it uses fluorescently tagged antibodies that recognize proteins in an intact tissue.

LECTURE 4: GENETIC VARIATION

NUCLEIC ACIDS

DNA is a double-helix molecule made of two complementary strands (polymers). The monomers are called nucleotides, and an organism’s information is stored in the sequence of nitrogenous bases, of which there are 5 types. DNA has A, G (purines) and T, C (pyrimidines), while RNA has U (also a pyrimidine) instead of T. Complementary base pairing is established by hydrogen bonding (non-covalent) between the nitrogenous bases: A pairs with T through 2 hydrogen bonds, and G with C through 3.

MUTATION TYPES

-Depurination: removal of purines (A and G), which leads to a loss of information (indel)
-Deamination: a C (in a CpG) degenerates into a T (giving TpG), which leads to a loss of methylation
-Thymine dimers

Depurination and deamination are repaired by BER (base excision repair), while thymine dimers are corrected by NER (nucleotide excision repair). Sometimes the problem is bigger and the strands break. HR (homologous recombination) can be used when a sister chromatid is available (mostly in G2 phase). By contrast, NHEJ (non-homologous end joining) is used when there is no sister chromatid or the damage is too great; it simply joins the two ends of the broken DNA (which can mean a large loss of DNA).

In single nucleotide mutations, a transition is a change of a purine into another purine or of a pyrimidine into another pyrimidine, and a transversion is a change of a purine into a pyrimidine or of a pyrimidine into a purine.

CpG methylation

Methylation is needed because sometimes some regions or genes have to be kept silent.
In vertebrate genomes there are rare stretches of unmethylated DNA with an unusually high CpG frequency (>50%). These are called CpG islands and they are gene markers. They are associated with transcriptionally active regions (never methylated, so the C’s do not degenerate into T’s).

Small Scale Mutations

Single nucleotide alterations are called point mutations. Even when a point mutation is located in an exon, it may not alter the coded amino acid; this is called a silent mutation. If the mutation is in an exon and does alter the coded amino acid, it is called a missense mutation (it can change the function of the protein). The changed codon may instead code for a stop codon rather than a different amino acid; this is called a nonsense mutation (it causes premature termination of translation and truncated, shorter proteins). Nonsense-mediated decay checks whether there is any premature stop codon. Instead of being substituted by another nucleotide, a nucleotide can be deleted or added; this is called an insertion or deletion (indel). Indels have a major effect because they alter the reading frame and therefore change the protein sequence, causing a frameshift mutation. A missense mutation can be critical if it is in the active region of the protein, while frameshifted proteins are sometimes degraded by the proteasome soon after they are made; neither is inherently more dangerous than the other, it depends on things like function.

POLYMORPHISM

Any alternative information in a DNA sequence is called a variant. Certain loci can have two or more variant forms, which is called polymorphism, while a mutation is a change of the normal allele.

SNPs

The most common type of polymorphism is variation at a single nucleotide, called a single nucleotide polymorphism (SNP), and the main repository is SNPedia (each SNP has a unique identifier, rs). Some specific SNPs can be inherited together with a mutation of interest, so they can be used as markers to track the mutation.
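To make "variation at a single nucleotide" concrete, here is a small Python sketch that compares two aligned DNA sequences of equal length, reports each position where they differ, and labels the change as a transition or transversion using the purine/pyrimidine definitions from this lecture. It is an illustration only: real variant calling works on sequencing reads and alignments, and the function names and example sequences are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref_base, alt_base):
    """Label a single-base change as 'transition' or 'transversion'."""
    if ref_base == alt_base:
        raise ValueError("bases are identical, nothing to classify")
    pair = {ref_base, alt_base}
    # Purine <-> purine or pyrimidine <-> pyrimidine is a transition
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def single_nucleotide_differences(reference, sample):
    """List (position, ref, alt, type) for every mismatch; positions are 1-based."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i + 1, ref, alt, classify_substitution(ref, alt))
        for i, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    ]


# Hypothetical reference and sample sequences
differences = single_nucleotide_differences("ACGTAC", "ATGTAG")
print(differences)
# [(2, 'C', 'T', 'transition'), (6, 'C', 'G', 'transversion')]
```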
Because SNPs can serve as markers, genome-wide association studies (GWAS) can discover the common genetic variants (polymorphic regions) that determine risk for a common disease.

Dynamic Mutation

-Due to slippage, the repeat number can increase or decrease between generations (anticipation). While reading a repeated sequence the polymerase makes a jump, which can be forward or back.
-Many STRs (short tandem repeats) are polymorphic in the population (CNV, copy number variation). An increase in copies is problematic in cases like Huntington disease.

Evolution of genomes: In addition to SNPs there are other, larger kinds of polymorphism. Certain mechanisms lead to large-scale changes in the genome through evolution. Common ways in which genes evolve are mutations in genes, mutations in regulatory DNA, gene duplications, and exon shuffling (genes exchange exons, or there is an extra copy of some exons). Mutations can happen anywhere (gene, promoter, enhancer, etc.) without any preference. When there is a mutation in an enhancer, for example, the expression pattern of the gene will be different.

Duplication and Divergence

Unequal crossing-over events (HR), which often take place in repeat-rich regions, lead to a deletion in one chromatid and a duplication in the other, which may span an entire gene. Genes that were once the same but acquired different functions over time are called homologues; they can be divided into paralogues (they diverge but still coexist in the same species) and orthologues (they develop in different species). Not every duplication and divergence gives rise to functional genes, however. By accumulating excessive mutations, some genes lose activity (become non-functional) and turn into pseudogenes.

DNA duplication

Large-scale sub-genomic / segmental duplication is a result of chromosome translocation. Pericentromeric and subtelomeric regions are relatively unstable, which is why segmental duplication is more common there.
Moreover, whole genome duplication took place at least twice during vertebrate evolution.

Transposition

Transposons (mobile genetic elements or jumping genes) can alter the activity or regulation of a gene and can promote gene duplication, exon shuffling and genome rearrangement. Using their terminal repeats and the transposase enzyme, transposons can move to another place in the genome or make new copies of themselves (retrotransposons).

HUMAN GENOME STRUCTURE

Compared to other organisms, human genes are less densely packed and mostly interrupted by introns. The human genome contains a lot of “spacer DNA”. In the human genome the question of how the genome is regulated is even more important than what information it contains (exons).

Repetitive DNA

Repetitive DNA was thought to be mostly junk. It has no direct function, but there are indirect uses: it is better for mutations to happen here than in exons, and it may be important for DNA stability and structure. Up to 50% of the human genome is repetitive sequence, in two forms: tandem repetitive sequences (satellite DNA) and interspersed repeats.

Satellite DNA: Its name comes from its localization. By size it can be divided into groups: macrosatellites, which are very large (~100 kb; AT rich) and are mainly found at centromeres and telomeres (telomeres are usually covered by the shelterin complex so that no information is lost from the ends of the chromosome); minisatellites (10-100 bp; GC rich), which have a high mutation rate and are highly diverse in the population (variable number tandem repeats, VNTRs); and microsatellites, which are polymorphic and are used as molecular markers.
*Minisatellites are highly polymorphic in the population, so these sequences are used in paternity tests and in forensic science*
Interspersed repeats: They can change genomic architecture a lot, and they are mainly LINEs and SINEs. SINEs are generally localized in gene-rich regions, whereas LINEs are enriched in intergenic regions.
ENCODE PROJECT
The ENCODE (Encyclopedia of DNA Elements) project aims at defining all functional DNA elements in the human genome and the various functions of non-coding DNA.
LECTURE 5: DNA DAMAGE AND REPAIR
DNA damage can cause mutations which might lead to diseases such as cancer or neurodegeneration, and which also contribute to ageing. Knowledge of DNA repair pathways can help in the creation of cancer therapeutic strategies and genome engineering methods. DNA replication introduces errors because every time a cell divides it needs to copy an exact sequence of 6 billion base pairs. We can see nucleotide insertions/deletions and the incorporation of wrong or damaged nucleotides. All of these lead to mismatches or small loops, and if they are not repaired before the next replication they can become mutations.
MMR PATHWAY (MISMATCH REPAIR)
It starts by recognizing the damage. The MutS and MutL proteins recognize the mismatch, the PCNA ring holds the complex on the DNA, and exonucleases such as EXO1 are recruited to remove a section of the newly made strand including the damage. A DNA polymerase then resynthesizes the removed section in the 5’ to 3’ direction, and finally a ligase seals the DNA. This pathway usually deals with base mismatches.
BER PATHWAY (BASE EXCISION REPAIR)
This pathway deals with single-strand breaks (SSB) and single base damage. It also deals with several chemical attacks on DNA such as:
-Oxidative damage: ROS (reactive oxygen species) oxidize guanines (forming 8-oxoguanine). During replication, instead of a C opposite the oxidized guanine we will see an A, and if this is not repaired, after the next replication the original G-C pair ends up as a T-A pair.
-Hydrolytic attack: deamination and depurination.
In depurination we lose the purine and end up with only the sugar, which is seen as a gap in the strand where the A or G was (called an AP site).
-Alkylation: for example, the alkylation of guanine gives an O6-methylguanine that distorts the DNA helix. This is a special case because in humans it is repaired directly by an enzyme called O6-methylguanine methyltransferase, which removes the extra group added to the guanine.
BER is used when the DNA damage is small. In the first step, DNA glycosylases identify and remove the damaged base (different damage, different glycosylase). A base loss, as in depurination, can feed into the pathway at the next step, where an AP endonuclease cuts the sugar-phosphate backbone and a dRPase removes the baseless sugar phosphate, leaving a gap. Here PARP1 recruits downstream proteins such as the polymerase that fills this empty space and the ligase that seals it.
NER PATHWAY (NUCLEOTIDE EXCISION REPAIR)
The major type of damage repaired here occurs in cells exposed to ultraviolet irradiation, where a dimer can form between any two neighboring pyrimidine bases in DNA. Cyclobutane pyrimidine dimers (CPDs) prevent the DNA from forming a normal double helix, and they are repaired by NER. There are two ways of recognizing the damage, depending on where it is. If the damage is inside a gene it is repaired by TC-NER (transcription-coupled NER): the dimer leaves the RNA polymerase stuck and unable to continue transcription, which constitutes the signal. When the damage is outside a gene, the polymerase is not going to pass through that section of the DNA, so here we have GG-NER (global genome NER). In this case, the problem is recognized by proteins such as UV-DDB. After this, both TC-NER and GG-NER converge on the same pathway, where a complex that includes several helicases unwinds 20-30 nt around the damage.
This recruits endonucleases which remove a relatively large part of the damaged strand, and the gap is filled and sealed by a DNA pol and a DNA ligase (using the other strand as template).
*In this process there are a couple of important proteins such as the XP proteins (named after xeroderma pigmentosum), which provided one of the first links between DNA repair and cancer.
We have to take into account that the DNA in our cells is not free: it is packed in chromosomes, where DNA is wrapped around histones. There are proteins that can clear the area of nucleosomes in order to free the DNA so that it can be repaired (for example chromatin remodellers such as SWI/SNF and INO80; acetyltransferases such as p300 and GCN5; and PARP1). Their action relaxes the chromatin so that NER can take place.
DOUBLE STRAND BREAKS (DSBs)
DSBs are highly deleterious. The damage is signalled by the formation of γH2AX (a phosphorylated histone variant), which can spread over megabases around the break, marking the damage. There are two pathways or mechanisms (which one is used depends on the phase of the cell cycle).
Nonhomologous end joining (NHEJ): we take the two ends and stick them together. It is the main mechanism in mammals. The damage is recognized by the Ku proteins, then the ends are processed in case they are not clean, and finally there is a ligation.
Homologous recombination (HR): it uses a homologous template to repair the damage. It is limited to S and G2 because these are the phases in which the sister chromatid is available as the homologous template. After resection, the single-strand binding protein RPA binds the DNA and then Rad51 forms a filament that searches for homology; when it is found, this strand (on the sister chromatid) can be used as the template.
SYNTHETIC LETHALITY
There is a huge number of genes involved in DNA repair, and this can be used to understand different therapies, in particular cancer therapies.
A synthetic lethal interaction occurs between two genes when the perturbation of only one gene is viable (the other can compensate) but the perturbation of both genes simultaneously results in the loss of viability. So, if we know that one gene is mutated in a tumor, we can use an inhibitor against the other one to kill those particular cells. An example of this is Olaparib (Lynparza), a PARP inhibitor, which is used to treat cancers with mutations in BRCA1 and BRCA2.
LECTURE 6: THE RNA WORLD
DNA vs RNA
Although both of them are nucleic acids, there are certain differences between DNA and RNA. In RNA the ribose sugar has an extra –OH on C2, and it has the base uracil instead of thymine. RNA is a single-stranded molecule in which the 2’-OH group is free for interactions, allowing RNA to gain enzymatic activity and catalyze reactions (this also makes RNA less stable). The single-stranded nature of the RNA molecule allows it to adopt secondary structures other than the double helix, via conventional and non-conventional base pairing and intramolecular and intermolecular interactions. The most typical secondary structures are double strands, hairpin loops (as in tRNA, where several loops form thanks to intramolecular interactions; the loops can also fold and interact), and junctions. Most enzymes are proteins, but RNA can fold into distinct conformations and show enzymatic activity, and this is where ribozymes appear. There are a few natural catalytic RNAs, but with experiments and artificial selection, novel ribozymes can be obtained in vitro that carry out activities like DNA ligation, splicing, RNA phosphorylation, etc.
RNA World Hypothesis
According to the RNA world hypothesis, RNA was the first “living molecule” capable of both carrying information and catalysis, the autocatalytic matter. We have some evidence for it, like the present-day translation machinery, which is still RNA dependent, or RNA polymerisation.
All RNA information is copied from the DNA molecule by the RNA polymerase enzyme, but there are other enzymes that can use RNA as a template, such as reverse transcriptase (in retroviruses), which copies RNA information into DNA, or RNA-dependent RNA polymerase (RNA replicase), which replicates the RNA in some RNA viruses.
mRNAs
Transcription
In eukaryotes there are 3 different RNA Pol enzymes, transcribing different kinds of genes. mRNAs coding for proteins are transcribed by RNA Pol II, which has a C-terminal tail whose PTMs allow mRNA processing. rRNA dominates in a cell because translation is the main mechanism in life, which is why two different polymerases, RNA Pol I and III, transcribe rRNA genes. RNA Pol I works in the nucleolus, while RNA Pol II leads the regulation, because depending on which genes are expressed we can have different types of cells.
mRNA Processing
As mRNA is transcribed, it goes through certain modifications that distinguish mRNAs from the other kinds of RNA.
-5’ Capping: the cap is a modified guanine (7-methylguanosine) attached in reverse orientation by a specialized enzyme to the beginning of the mRNA. It forms a binding site for translation initiation factors and allows mRNA circularization (joining the 5’ and 3’ ends in a circular formation is a way of making sure that the RNA is intact).
-3’ Poly-A tailing: the 3’ end of the RNA is polyadenylated by poly-A polymerase (PAP), which adds AMP residues (~200 nt). Poly-A tails are useful because they can be targeted with oligo-dT primers, allowing us to distinguish mRNA from other types of RNA.
*The cap and the tail protect new mRNA from exonuclease attack.
The RNA undergoes a quality check before leaving the nucleus through the nuclear pore complex, and if it is all right the initiation factors bind the mRNA and it is ready for translation.
-RNA Splicing: in the protein, typically each exon codes for a different domain, and introns and exons are recognized by the binding of different subsets of proteins (introns are usually bound by hnRNP complexes). snRNPs recognize the beginning and the end of the intron (snRNAs recognize the signal on the mRNA by base pairing) and assemble, forming a loop which is cut out and released. After that the exons are joined together.
mRNA Decay
An important factor in translational control is how long an RNA survives. One example is nonsense-mediated mRNA decay (NMD), in which faulty mRNAs with a premature stop codon are found and destroyed in bodies (P-bodies). There are different decay paths: the deadenylation-dependent path (removal of the cap, the tail and associated proteins), the deadenylation-independent path (cap removal), and the endonuclease-mediated path (an endonuclease cuts in the middle of the transcript and different types of enzymes degrade it).
NONCODING RNAs
Gene annotation is a complex activity, but it lets everyone see the genetic differences between species like C. elegans and Homo sapiens. The difference in complexity between them doesn’t rely on the number of protein-coding genes, or even of noncoding genes; it relies on the number of transcripts. Human beings can produce different transcripts and proteins from the same gene thanks to alternative splicing. Gene annotation studies mostly depended on mRNA capturing assays, which were not sensitive enough to identify small RNA molecules. Advances in the technique and a paradigm shift were required for the discovery of noncoding RNAs (ncRNAs). The more research is conducted, the blurrier the gene concept gets.
We now know that >80% of the genome is transcribed; many genes overlap; there are alternative promoters for some genes; multiple TSS; RNA editing; alternative splicing; trans-splicing (a gene on chromosome 1 and a gene on chromosome 2 are both transcribed, the transcripts come together and are translated as one polypeptide); antisense transcripts; and single transcripts with multiple genes.
rRNAs
rRNA genes exist in five clusters on the short arms of chromosomes 13, 14, 15, 21 and 22, each with hundreds of gene copies (nucleolus). 18S rRNA makes up the small subunit, while 28S, 5.8S and 5S make up the large subunit. Moreover, 18S, 5.8S and 28S are transcribed by RNA Pol I as a single molecule, while 5S rRNA is transcribed by RNA Pol III from a cluster of >250 gene copies. In mature rRNA 10% of the bases are modified, so it contains more nucleosides than the standard A, G, C and U: dihydrouridine, pseudouridine, inosine, N,N’-dimethylguanosine.
tRNAs
tRNAs decode 61 codons that specify 20 amino acids. But we don’t really have just 20 amino acids: there is a 21st, selenocysteine (with a different function). UGA is normally a stop codon, but at high selenium concentration a selenocysteine residue is inserted instead.
miRNAs
MicroRNAs are small (22-nt), single-stranded RNA molecules that can bind target mRNA molecules by complementarity. They are obtained by processing a longer precursor molecule transcribed by RNA Pol II. So, like an mRNA, the precursor has a cap and a tail, but it doesn’t meet the ribosome. It folds and joins the RISC proteins, and together they search for complementary target mRNAs. If they find an extensive match the mRNA is rapidly degraded, and if they find a less extensive match the translation of the mRNA is reduced, and it is sequestered and eventually degraded (on degradation the RISC complex is released).
RNA Interference
-siRNAs: small interfering RNAs are similar to miRNAs in size and function, but their precursor molecule is slightly different.
Knocking down gene expression with siRNA is called RNA interference (RNAi), and it’s a popular method for knock-down experiments. This is not its only function: in yeast it is also used for chromatin remodelling.
-piRNAs: Piwi-interacting RNAs act in the germline, to protect the genome from the mobilization of mobile genetic elements and to induce heterochromatin formation that blocks the expression of parasitic DNA.
TERC RNA
TERC is the telomerase RNA component, functioning at the telomeres. It’s transcribed from a single gene copy and it is used as a template to extend telomeres.
snRNAs
Small nuclear RNAs (snRNAs), 60-360 nt long, have a role in assisting general gene expression, mostly at the level of post-transcriptional processing. Many of these RNAs are uridine rich, and are named accordingly (ex: U2 snRNA). Not all snRNAs in the nucleoplasm function as part of the spliceosome (the RNA-protein complex that catalyses the removal of introns); for example, U7 snRNA is important in histone mRNA 3’ processing.
snoRNAs
Some small RNAs are involved in the post-transcriptional processing of rRNAs; these are the small nucleolar RNAs.
scaRNAs
Small Cajal body RNAs (scaRNAs) are similar to snoRNAs, but they are confined to Cajal bodies, where snRNPs mature. Like snoRNAs, scaRNAs are located in the introns of genes transcribed by RNA Pol II. There are other membraneless granules in the cell that process RNA, like processing bodies (P-bodies) and stress granules.
circRNAs
Circular RNAs are RNA molecules in which the 5’ and 3’ ends are covalently attached. They are circularised by a specialised alternative splicing mechanism. They can have regulatory functions, like transcriptional pausing or regulating the expression of their parental gene; they can act as a miRNA sponge, or can even code for proteins.
Long Noncoding RNAs
Many lncRNAs are thought to have a regulatory role in animal cells, including antisense transcripts and ncRNAs that are capped, spliced and poly(A) tailed. Some are tissue specific, but their functions are mostly unknown.
XIST is a common example of a lncRNA; it induces the inactivation of one of the X chromosomes in early embryogenesis. In this process not the whole chromosome is inactivated: the two pseudoautosomal regions that the X chromosome shares with the Y chromosome stay active.
RNA EDITING
As an exception to the central dogma, some information in RNA might not originate from the DNA. In some organisms, sequences can be inserted into or deleted from RNAs. In human cells, some nucleotides are observed to be edited.
C -> U editing: deamination in the apolipoprotein B transcript, which creates a stop codon (catalysed by APOBECs).
A -> I editing: deamination of adenine is performed by ADAR enzymes. Inosine base pairs with cytosine. It sometimes alters splicing.
U -> C editing: Wilms tumor gene.
LECTURE 7: GENE EXPRESSION 2
The definition of a gene is still evolving; maybe in several years we will have a new one. For now, the lecturer’s definition of a gene is: “A region (or regions) that includes all of the sequence elements necessary to encode a functional transcript”. So many different functions are encoded in one genome, but they are carried out by different structures and systems. The non-coding elements that are active in each tissue and cell type orchestrate gene expression, leading to the differentiation of cells and the formation of different cell types and systems/organs. Differential gene expression is the expression of genes in a way that leads either to the production of different proteins or to a different amount of production (more or less protein).
REGULATORY ELEMENTS
There are several regulatory elements, and we can organize them by their proximity to a gene:
-In genes: splice sites (they allow different splicing isoforms from the same gene), transcription start sites and transcription termination sequences (sequences that mark the beginning and end of the transcript), and promoters.
-Throughout the genome: enhancers, silencers, insulators, and non-coding RNAs.
-Others: telomeres, centromeres, pseudogenes, repetitive regions (thought to have an important function, for example in evolution), transposable elements, etc.
Promoters and Transcription start sites (TSS)
TSS are the sequences where transcription begins (genes can have multiple), while promoters are the primary regulators of gene expression. Promoters contain a core sequence that is bound by general transcription factors (TFs) and includes the TATA box, which is bound by TBP (TATA box binding protein). Every gene has a promoter, some genes have more than one, and they specify the transcription initiation site and regulate the amount of expression.
How do promoters work? Promoters contain DNA sequences that are able to “recruit” proteins that bring in RNA polymerase and initiate transcription. The higher the ability to recruit RNA polymerase, the more RNA will be produced.
Enhancers
Enhancers can turn genes ON. They regulate gene expression (modulating the level of transcription) and enhance the activity of the promoters they are associated with. They can be located in introns, or upstream or downstream of their target gene. Their size and their distribution can vary (some work in isolation and are called orphan enhancers, and others work in groups of transcription enhancers). Only some genes have enhancers, one gene can have multiple enhancers, and some enhancers can regulate more than one gene. Moreover, enhancers are more cell-type and tissue specific than promoters, and can be active only in specific tissues and during certain developmental stages (this is where differential gene expression between cell types and stages comes into play). They are also important in human diseases because, across all the non-coding DNA elements, they seem to be the ones most enriched in disease-risk DNA variants.
How do enhancers work? Enhancers recruit “Transcription Factors” that bind to specific sequences on DNA.
Those factors then “loop” to promoters (allowing the enhancer to touch the promoter) and aid in recruiting RNA polymerase. They can dictate gene expression patterns.
Silencers
Silencers are the conceptual opposite of enhancers: they silence the activity of the promoters they are associated with. They are involved in the regulation of gene expression because they are responsible for decreasing the expression of genes. They can be inside promoters or distal, and they work by looping to the promoter and preventing activation.
Insulators
Insulators define the regulatory domains in which enhancers or silencers can regulate gene expression. They act as a barrier in the sense that they delimit a regulatory domain, which is the portion of the DNA that interacts with a certain gene and is responsible for its regulation. This blocks the activity of an enhancer/silencer from spreading (preventing its interaction with other genes that it should not regulate). They commonly contain binding sites for CTCF and other protein factors that are important for genome architecture.
IMPLICATION IN HUMAN DISEASE
Many monogenic diseases are caused by mutations in protein-coding sequences, but the genetic variants associated with polygenic diseases (such as type 2 diabetes) often reside in non-coding sequences. Some monogenic diseases can also be caused by non-coding mutations. It is always easier to find a mutation in a coding sequence than in a non-coding one, because in non-coding sequence we don’t have a code like the one for proteins, but also because non-coding sequence makes up around 98% of the whole genome. What researchers do is create, for a certain cell type or tissue, a regulatory map on which the positions of the DNA variants can be overlaid. You then prioritize which variants you would take to the lab. The most likely to be disease causing are those in enhancers and promoters.
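The overlay of DNA variants on a regulatory map described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a real annotation pipeline: the `Element` class, the function names, the coordinates and the element labels are all invented for the example, and real maps would come from experimental data for the tissue of interest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One regulatory element on the (hypothetical) regulatory map."""
    chrom: str
    start: int  # first position covered (inclusive)
    end: int    # first position no longer covered (exclusive)
    kind: str   # e.g. "enhancer", "promoter", "insulator"


def annotate_variants(variants, elements):
    """Map each (chrom, pos) variant to the kinds of elements it falls in."""
    hits = {}
    for chrom, pos in variants:
        hits[(chrom, pos)] = [
            e.kind for e in elements
            if e.chrom == chrom and e.start <= pos < e.end
        ]
    return hits


def prioritize(variants, elements, preferred=("enhancer", "promoter")):
    """Keep only variants that overlap a preferred element type."""
    annotated = annotate_variants(variants, elements)
    return [v for v, kinds in annotated.items()
            if any(k in preferred for k in kinds)]


# Made-up map and variants, purely for illustration.
regulatory_map = [
    Element("chr1", 100, 200, "enhancer"),
    Element("chr1", 500, 600, "insulator"),
]
candidate_variants = [("chr1", 150), ("chr1", 550), ("chr2", 150)]
shortlist = prioritize(candidate_variants, regulatory_map)
```

Here `shortlist` keeps only the variant that lands in the enhancer; the variant in the insulator and the one on a chromosome without mapped elements are dropped, which mirrors the idea of choosing which variants to take to the lab.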
LECTURE 8: TELOMERE AND TELOMERASE
TELOMERES
In terms of history, in the 1930s two geneticists using radiation to break chromosomes realized that the natural chromosome ends are different from the rest. Between the 1950s and 1960s the end replication problem was also discovered. And finally, by the 1970s it was possible to sequence telomere DNA, and it was found to consist of short DNA repeats. So, now we know that the telomere is the end part of the chromosome and that it is formed by short DNA repeats (TTAGGG) together with telomere binding proteins. These proteins form the shelterin complex, which has 6 protein subunits. The main ones are TRF1 and TRF2, which bind the double-stranded repeats (TTAGGG-AATCCC). POT1 binds the single-stranded part of the telomere (the G-strand overhang); TPP1 interacts with POT1, and Rap1 binds to TRF2. Finally, TIN2 is at the center and interacts with the other subunits, holding the complex together. The t-loop (similar to the D-loop in DNA repair) is formed when the G overhang folds back and inserts between the two strands of the DNA. TRF2 and its binding protein Rap1 have an important role in this looping.
Function
-Disguise chromosome ends (marking the end)
-Protect chromosome ends from being recognized as DNA breaks
-Act as a buffer zone that protects protein-coding genes from chromosomal attrition, the process in which telomeres shorten after each round of cell division due to the DNA end replication problem
-Contribute to genome integrity and stability.
DNA End Replication Problem
In replication, each strand of the DNA is replicated separately. The leading strand is synthesized continuously in the 5’-3’ direction, while the lagging strand needs several primers for its synthesis, forming the Okazaki fragments. After the removal of the RNA primers, a gap is left at the end which cannot be filled in and ligated, so it leads to a shortened telomere.
In the second division the shortened strand is the template, so the new leading strand is shortened too, and it continues like this with every replication.
TELOMERASE
It is a ribonucleoprotein complex that was discovered in 1984, and its main components are:
-Catalytic subunit, TERT, which functions as a reverse transcriptase
-RNA subunit, TR/TERC, which functions as a template for TERT, allowing it to add new nucleotides based on this template
-Dyskerin, an RNA binding protein that stabilises TR
Its function is simple: telomerase adds new repeats and then, thanks to an RNA primer, DNA polymerase can synthesize the complementary strand. Telomerase is not constitutively expressed in all of our cells. During development, telomerase is almost inactive in the zygote, starts to be active in the morula phase, and has a peak of activity at the blastocyst stage. After that, its activity gradually decreases. After birth, telomerase is highly active in germline tissues and is also active in stem/progenitor cell compartments, but it is inactive in all the somatic cells.
TELOMERE IN CELLULAR AGEING
Cellular ageing is a biological process characterized by a progressive loss of physiological integrity, leading to cell senescence (replicative senescence), which is a permanent proliferation arrest. Senescent cells are enlarged and have a flat morphology; they have an increased cytoplasm/nucleus ratio; they accumulate SA-β-galactosidase and lipofuscin; and they lack replication (the cell can no longer divide).
Hayflick Limit Theory
The Hayflick limit is the number of times a normal human cell population will divide before it stops. The Hayflick hypothesis says that the cell ageing process is controlled by a biological clock contained within each living cell. Telomeres have been identified as the biological (mitotic) clock for cell proliferation.
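The idea of telomeres as a mitotic clock can be illustrated with a toy calculation. This is a minimal sketch under invented assumptions: the starting length, the loss per division and the critical length below are placeholder numbers chosen for round arithmetic, not measured values, and the function name is hypothetical.

```python
def divisions_until_senescence(initial_bp, loss_per_division_bp,
                               critical_bp, telomerase_gain_bp=0):
    """Count divisions until the telomere reaches the critical length.

    Returns None when telomerase adds at least as much as is lost per
    division, because then the telomere never shortens in this toy model.
    """
    net_loss = loss_per_division_bp - telomerase_gain_bp
    if net_loss <= 0:
        return None  # stable or growing telomere: no limit in this model
    length = initial_bp
    divisions = 0
    while length > critical_bp:
        length -= net_loss  # end replication problem shortens the telomere
        divisions += 1
    return divisions


# Placeholder numbers, assumed only for illustration.
somatic = divisions_until_senescence(10_000, 100, 5_000)
with_telomerase = divisions_until_senescence(10_000, 100, 5_000,
                                             telomerase_gain_bp=100)
```

With these assumed numbers the somatic cell stops after a fixed number of divisions, while the cell with full telomerase compensation has no limit, which is the same contrast the notes draw between somatic cells and germline or cancer cells.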
Induction of cell senescence
Multiple factors can induce replicative senescence, such as DNA damage, oncogene activation, or telomere dysfunction. It is also known that TERT, the catalytic subunit of telomerase, is the key factor for telomerase activity, so several studies have been carried out using hTERT. For example, there is a study showing that “ectopic expression of hTERT in human fibroblast extend telomere length and lifespan”.
CELLULAR AGEING AND ORGANISM AGEING
The cell is the basic building block of tissues and organs. Many tissues in our body require cell turnover to replace damaged and dead cells: skin cell turnover takes ~30 to 60 days in adults; intestinal epithelium turnover takes 3-5 days in the small intestine; and the red blood cell lifespan is ~120 days. Cell turnover requires the proliferation of stem/progenitor cells, so if cells become senescent, tissue homeostasis is reduced, leading to organism ageing. This also relates to telomeres: it has been shown that their length is reduced with age. There are several factors that affect telomere length, such as genetic factors, sex (longer telomeres in women), paternal age at conception (offspring of older fathers have longer telomeres), development (shortening is faster during early development), lifestyle and behavior (physical activity decelerates attrition; alcohol and smoking accelerate it), and environmental factors (inflammation, stress).
CANCER
One of the most evident hallmarks of cancer is sustained proliferative signalling. More than 90% of human cancers have active telomerase and stable telomere length, while the other 10% use alternative lengthening of telomeres. So, we can see that stable telomeres are necessary for cancer cells. In vitro, a cancer cell can be produced from a normal cell by continuously activating telomerase and mutating p53 and Ras.
But in cancer telomerase is not always active from the start: some cells are able to escape replicative senescence into a state called crisis, where telomerase is turned on and the cells become immortal.
LECTURE 9: VECTOR BASED GENE THERAPY FOR CNS DISEASES
GENE DELIVERY SYSTEMS
Vectors:
1. Adeno-Associated Virus (AAV): around 20nm in size; their genome is ssDNA; they have a capacity of around 4.7 kb; they integrate with very low frequency; they have been shown to infect neurons, astrocytes, glial, and ependymal cells. Their main applications in the CNS are Parkinson’s, Alzheimer’s, Batten, and Canavan disease, but preliminary studies suggest that AAV vectors could also be used in diseases like epilepsy or Huntington’s.
2. Retrovirus: around 100nm in size; their genome is ssRNA; their capacity is around 8kb; they generally integrate with high efficiency; and they have been shown to infect neurons and astroglia cells. They are used in clinical trials for the treatment of Parkinson’s and Alzheimer’s, and retrovirus vectors are being developed for Huntington’s and lysosomal storage diseases. *Lentiviruses are a group of retroviruses (ex: HIV-1)
3. Adenovirus: 70-100nm in size; their genome is dsDNA; their capacity is around 36kb; their integration is minimal; and they have been shown to infect neural, astroglia, and human glioma cells. They are not in clinical use for CNS gene therapy due to their toxicity, but they have been used as anti-cancer agents due to their oncolytic potential.
4. Herpesvirus: around 186nm in size; their genome is dsDNA; their capacity is around 150kb; their integration is minimal; and they have been shown to infect neurons. They are not in clinical use for CNS gene therapy due to their toxicity and production difficulties, but herpesvirus vectors are being developed for Parkinson’s and as anti-cancer therapy.
Antisense DNA oligonucleotides: they are pieces of DNA that are complementary to an RNA sequence and block its translation into protein, reducing the disease.
In the UK there are several clinical trials using them in neurological diseases like Huntington’s. *We can also have cell therapies
HIV-1 Lentivirus
The Human Immunodeficiency Virus targets immune cells and causes slow disease (“lenti” = slow), in this case a slow takeover of the immune system. The virus was isolated and sequenced, which made it possible to understand its genomic structure. It has a 5’ and a 3’ LTR that contain enhancers to drive expression, and three main genes (gag, pol, env). Gag encodes the structural proteins of the core: the matrix, the capsid and the nucleocapsid; pol encodes the enzymes: reverse transcriptase (with its RNase activity), integrase and protease; and env encodes the external glycoproteins (gp120). It also has non-structural genes that are important during the virus life cycle, such as vif, vpr, vpu and nef. This results in a virion that contains 2 copies of single-stranded genomic RNA, integrase, reverse transcriptase (RT), and protease (PR). The HIV-1 cycle begins when the envelope protein binds to an immune cell and the virus enters. On its way to the nucleus, the RNA is reverse transcribed into DNA by the RT. The DNA enters the nucleus through the nuclear pore, where the integrase integrates it into the cell’s DNA, parasitizing the cell. After this, the immune cell makes copies of the virus genome, which leave the nucleus; the viral genes are translated in the cytoplasm and new virions are assembled, which leave the targeted cell for a new infection. By doing this over the years, the virus is able to take over the immune system and suppress it.
Lentivirus vector production system
We have packaging cell lines into which we transfect four vectors carrying the different parts of the lentivirus genome (adding our targeted gene in the fourth vector). Such cell lines would contain three vector components (Gag/Pol, VSV-G envelope and genome expression constructs). As the vesicular stomatitis virus glycoprotein (VSV-G) envelope protein is cytotoxic, its expression must be regulated.
It is also desirable to regulate Gag/Pol expression to minimise the metabolic burden on the cell. The Tet repressor (TetR) system was selected to regulate the expression of VSV-G and Gag/Pol, necessitating the introduction of a fourth construct, encoding TetR, into the cell line.
Adeno-associated virus (AAV)
They have become the major viral vectors in clinical trials. They are used, for example, in hematopoietic disease, where you introduce your targeted gene (the correct DNA sequence) into stem cells via the vectors and put the cells into patients. But it is necessary to use drugs to eliminate all the affected hematopoietic progenitors. AAV9 is the most used adeno-associated virus.
CELL SPECIFIC TARGETING
You can use specific promoters, envelope or capsid modification, or miRNA vectors (normally used with AAV vectors to target neurons) to make the vector target a specific type of cell. There are four main ways to introduce the vectors (LV/AAV) into the body: intraparenchymal injection (administering substances directly into the brain parenchyma), intrathecal injection (administration of drugs via an injection into the spinal canal), intramuscular injection (administration of drugs into the muscle), and intravenous injection (administration of drugs into the systemic venous circulation; completely restricted to AAV, it doesn’t work with LV).
-Intraparenchymal delivery: in Parkinson’s disease there have been several gene therapy trials, and one of them tried to deliver dopamine where it was missing. There were two main approaches, the dopaminergic and the neuroprotective approach.
-Intramuscular delivery: researchers have tried to target motor neurons with lentivirus vectors by adding to the vector surface the hormone IGF-1, which has a receptor at the neuromuscular junction (IGF1-R).
-Intrathecal delivery: GAN (Giant Axonal Neuropathy) is an autosomal recessive disease mainly characterized by enlarged axons (because of loss-of-function mutations in the GAN gene).
They tried to treat it by intrathecal delivery of AAV9 containing the GAN gene.
-Intravenous delivery: they tried to treat children with X-linked Adrenoleukodystrophy by delivering hematopoietic stem cells transduced with a lentiviral vector. It was quite successful.

GENE THERAPY TOXICITY ISSUES

Adverse events are rare in gene therapy clinical trials, but they have occurred as the number of patients treated has increased. LV can produce insertional mutagenesis, causing Myelodysplastic Syndrome. AAV are related to hepatotoxicity, thrombotic microangiopathy (TMA), and neurotoxicity.

LECTURE 10: TRANSGENIC MOUSE MODELS

Transgenesis involves germline genetically engineered animals. A broad definition includes somatic modifications. In this case we are going to talk about genetically engineered mouse models or GEMM.

THE PURPOSE OF TRANSGENIC MOUSE MODELS:
-Study the molecular basis of tissue-specific and stage-specific gene expression in the developing mouse
-Investigate the phenotypic effects of altered gene expression (overexpression, misexpression, inhibition)
-Introduce foreign DNA as an insertional mutagen and characterize the mutated gene

TYPES
-Gain of function: random insertion; knock-in permissive locus model (constitutive or conditional); reporter knock-in
-Loss of function: knock-out mice (constitutive or conditional)
*Conditional: with controls that permit it to be switched on in specific cell types; constitutive: knock-in/knock-out in the germline that is present throughout the animal

TRANSGENIC MOUSE PRODUCTION

There are three main methods to create a transgenic mouse:
-Microinjection into fertilized oocytes: we obtain an oocyte from a donor female mouse which is fertilized, and then the DNA of the transgene is injected into the male pronucleus. The injected oocyte is implanted into a pseudopregnant mouse, whose offspring are screened for the presence of the gene. Specialised equipment for injection is needed.
-Viral vectors into pre- or post-implantation embryos: we obtain a mouse embryo at the 8-cell stage from a donor female mouse, which is infected in vitro with a retrovirus that contains the transgene. The embryo is implanted into another female, whose offspring are screened for the presence of the gene.
-Targeting vectors into ES cells (embryonic stem cells), selection of targeted ES cells, implantation into early embryos: DNA with the targeted transgene is introduced into an ES cell culture. Then the ES cells that have incorporated the DNA are selected, expanded, and injected into early mouse embryos. The embryo is implanted into a pseudopregnant mouse, which will have chimeric offspring. These chimeras are mated with WT mice, and from the following generations the animals homozygous for the mutation are selected.

Targeted transgenic mice

ROSA is a locus located far away from other genes, so an insertion there doesn’t interfere with them; it gives clean expression and makes the whole process faster.

CRISPR/Cas9 GENE EDITING

You can introduce double-strand DNA sequences at very specific sites, so it has revolutionised the way in which transgenics are made. They are trying to deliver Cas9 with vectors, but it is a really big endonuclease, so it is difficult to deliver it with vectors alone.

TET-INDUCIBLE SWITCH (TET OPERON)

One of the systems you can use is a switch in the transgene, where we have two options:
1. TET-OFF system: we have the TRE sequence (Tet operator sequence), and the tTA transcriptional activator binds to it to induce activity. In the presence of tetracycline (Tet; used as a drug) the transactivator dissociates from the TRE and the gene is switched off.
2. TET-ON system: tetracycline is needed; it binds the reverse tTA (rtTA), allowing it to bind the TRE sequence and activate gene expression. Without the drug there is no expression.
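The two Tet switches can be summarised as a small truth table. The sketch below is only an illustration in Python, assuming an idealised all-or-nothing response to the drug; the function name and the "tet-off"/"tet-on" labels are made up for this example, and real systems respond to the drug dose in a graded way.

```python
def transgene_expressed(system: str, drug_present: bool) -> bool:
    """Return True if the TRE-driven transgene is expected to be on.

    Idealised model: the activator is either bound to the TRE or not.
    """
    if system == "tet-off":
        # tTA binds the TRE only when tetracycline is absent.
        return not drug_present
    if system == "tet-on":
        # rtTA binds the TRE only when tetracycline is present.
        return drug_present
    raise ValueError(f"unknown system: {system!r}")


# Print the four possible combinations.
for system in ("tet-off", "tet-on"):
    for drug_present in (False, True):
        state = "ON" if transgene_expressed(system, drug_present) else "OFF"
        print(f"{system:8s} drug={drug_present!s:5s} -> gene {state}")
```

The point of the table is that the same drug has opposite effects in the two systems, so the choice depends on whether the gene should be on or off by default.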
HUMANIZED XENOGRAFT MODELS OF CANCER:

Xenotransplantation of human cancers into mice was really difficult because of rejection, so humanized models were created to make it possible to implant the xenograft and study, for example, the immune response to the tumour. They are immunodeficient mice that are injected with isolated CD34+ hematopoietic stem cells, which yields humanized mice.

SITE SPECIFIC RECOMBINASES:

Cre-loxP is the most widely used one. It can be used in three situations:
·When we have a gene between two loxP sites, we can use the Cre recombinase to remove that gene (it removes everything in between).
·If we have something that stops the expression of a gene between two loxP sites, we can remove that stop sequence to induce the gene expression.
·DIO/FLEx vectors: when we have double loxP sites on both sides of an inverted gene, we can add Cre recombinase in order to flip the gene.
*We can also use FLP-FRT and Dre-rox.

DREADDS: CHEMOGENETIC MODULATION

It is an example of using drugs to modulate cell activity. You introduce mutations creating unique neural receptors that respond only to a specific drug and, depending on the receptor, either activate or silence the neuron. With this we can either hyperpolarise neurons, inducing silencing, or depolarise them, producing neuronal activation.

OPTOGENETIC MODULATION

It uses Cre recombinase and transgenic animals. In the example, a toxin is used to eliminate dopaminergic neurons, destroying their pathway and creating a model of Parkinson’s disease. This could point to a therapeutic direction for Parkinson’s disease.

LECTURE 11: CRISPR

Life in nature is highly competitive, and species evolved weapons against each other. In molecular biology and biotechnology, we often make use of the tools discovered in nature:
-Fungi secrete antibiotics to kill bacteria in their environment, by specifically targeting the bacterial translation system.
-Bacteria fight common virus infections with restriction endonucleases.
-Bacterial immunity: bacteria remember the viruses that previously infected them and fight them via CRISPR (genetic scissors).

Scientists have been performing direct genetic manipulation (genetic engineering) via several methods, molecular tools adapted from smaller organisms. Many are slow and costly, and they may hit several off-targets in the genome.
-Restriction endonuclease: DNA scissors, adapted from bacteria.
-UAS-Gal4: Drosophila, tissue specific ectopic expression. Adapted from yeast.
-FLP-FRT: Drosophila, site specific recombinase, from yeast.
-Cre-LoxP: Mice, site specific recombinase. Adapted from phage.
-ZFN: Zinc finger nucleases, modified transcription factor.
-TALEN: Transcription activator-like effector nucleases, adapted from proteobacteria.
-CRISPR: Adapted from bacteria to several organisms.

In 2012, a molecular tool was discovered in the bacterium Streptococcus pyogenes, and an alternative methodology based on RNA-programmed Cas9 was proposed that could offer considerable potential for gene-targeting and genome-editing applications. Within a few years, the CRISPR-Cas9 system was being used worldwide and applied to different organisms, and many derivatives of the system were developed.

MOLECULAR TOOL

The CRISPR system is an ancient bacterial adaptive immune system against viruses, consisting of two basic components: a DNA sequence (DNA of previously infecting phages, captured inside the bacterial DNA); and a nuclease which selectively digests phage DNA when the phage infects again. In this system we can distinguish three parts:
-CRISPR or “Clustered Regularly Interspaced Short Palindromic Repeats” is a sequence formed by 20-40 bp palindromic repeats interspaced by short non-repetitive sequences or spacers (where pieces of viral genomes are kept).
-The tracrRNA or trans-activating CRISPR RNA, which is a non-coding RNA matching the repeat sequences.
-Cas operon which contains the Cas genes:
·Cas9: Interference machinery, endonuclease, digests the attacking phage DNA.
·Cas1, Cas2, Csn2: work to embed new phage DNA into the memory. Their purpose is not to attack the phage; when a new phage appears, these enzymes take a piece of its DNA and keep it in the bacterial genome.

The CRISPR array is transcribed into a long pre-crRNA transcript (the source of the crRNA guides), while the tracrRNA is transcribed separately and interacts with the repeat sequences by base-pairing (the pre-crRNA is then cut at the repeats, producing mature crRNAs that can recognise the virus genome). Mature crRNA-tracrRNA engages with Cas9 to target phage DNA, which it recognizes because the viral genome is complementary to the crRNA (the bacteria don’t cut their own genome because Cas9 needs a PAM sequence next to the target, and the CRISPR array lacks it). The information used to degrade the phage is passed on to the next generation.

Cas Enzyme

Cas9 is a multidomain enzyme which interacts with DNA, opening up the double strand so that one of the strands makes hydrogen bonds with the crRNA (gRNA). This enzyme is a single protein that unwinds the DNA locally (a helicase-like activity) and contains two endonuclease domains (HNH & RuvC), each nuclease domain cleaving one strand of the target DNA.
-NUC lobe: Nuclease lobe, containing HNH & RuvC and the variable C-terminal domain (CTD)
-REC lobe: Recognition lobe of Cas9

It is possible to deactivate Cas9 domains. If either HNH or RuvC is disabled we have a nickase, which cleaves only one strand (a single phosphodiester bond), leaving a nick instead of a double-strand break. If both HNH and RuvC are disabled we have a dCas9 or dead Cas9.

Target Sequence

Where Cas9 targets the genome depends on where the gRNA matches by Watson-Crick base pairing. Compared to restriction endonucleases, CRISPR-Cas9 increased specificity tremendously (RE: 1/4^6 vs CRISPR: 1/4^20 = 1 in about 1000 billion). In principle, a 20-nt sequence is expected to be unique in the human genome. However, our genome is full of repeat sequences, so depending on where you target, your gRNA might be complementary to several sequences. Also keep in mind that gRNA can bind with imperfect matching.
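The specificity figures above are simple arithmetic: with 4 possible bases per position, a site of length n matches a random position with probability 1/4^n. The sketch below works this out in Python under two loudly stated assumptions that are not in the notes: every base is equally likely and independent, and the haploid human genome is taken as roughly 3.1 billion bp. The function names are made up for this example, and the PAM requirement and genomic repeats are ignored.

```python
# Assumption: approximate haploid human genome size in base pairs.
GENOME_SIZE_BP = 3_100_000_000


def possible_sequences(site_length: int) -> int:
    """Number of different DNA sequences of a given length (4 bases per position)."""
    return 4 ** site_length


def expected_random_sites(site_length: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Expected number of chance matches on one strand of a random genome."""
    return genome_size / possible_sequences(site_length)


for name, length in (("restriction enzyme, 6 bp site", 6),
                     ("CRISPR guide, 20 nt", 20)):
    print(f"{name}: 1 in {possible_sequences(length):,} "
          f"(expected chance sites: {expected_random_sites(length):.3g})")
```

Under these assumptions a 6 bp site is expected many times by chance while a 20 nt guide is expected less than once, which is why the repeats and imperfect matching mentioned above, not random chance, are the practical source of off-targets.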
The PAM sequence (protospacer adjacent motif), immediately downstream of the sequence matched by the crRNA (or gRNA), is important for both targeted destruction and uptake into the CRISPR array (it also prevents Cas from digesting the bacterial chromosome). The PAM sequence is NGG for SpyCas9, with cleavage occurring 3 bp upstream. The crRNA and tracrRNA can be brought together in a synthetic, chimeric, continuous 83 nt “single guide RNA” or “guide RNA” (sgRNA / gRNA). In molecular biology, the gRNA is usually transcribed from an expression vector into which the target (crRNA half) sequence is cloned. So, when you need to use CRISPR you only have to design the 20 nt guide sequence; the rest is always the same. What exactly happens is that the double-strand DNA is locally melted (helicase-like activity of Cas9). The 20 nt guide targets the now single-stranded DNA by base pairing, and 3 nt upstream of the PAM sequence a cut is made by the HNH and RuvC domains.

Guide RNA

The seed region is the roughly 10 nt PAM-proximal region (3’ end) of the guide, and it is critical for target selection, because a mismatch in this region is less tolerated. To avoid off-targets people usually design three different guides and use the most accurate one. It is also necessary to check the cells/patients/animals, because sometimes SNPs can produce mismatches. An expression vector usually has a promoter for gRNA transcription, a continuous gRNA scaffold, Cas9 or dCas9, and accessory proteins for different purposes.

CRISPR-BASED TECHNOLOGIES

Modifying the Genome

Cas9 creates double-strand breaks (DSB); some designs need two guides delivered at the same time (for example, to cut on both sides of a region). After that the cell reacts by calling in the repair machinery, and two things can happen: NHEJ, which creates random indels whose size is not controlled; or HDR, where a desired template is supplied in parallel (you have to supply the donor DNA) and which creates well-defined genetic changes, including deletions and substitutions (it only works in dividing cells).
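Going back to guide design, the rule "20 nt guide, then NGG, cut 3 bp upstream of the PAM" can be sketched as a small search. This is a minimal illustration in Python, not a real design tool: the function name and the toy sequence are invented for this example, only the forward strand is scanned, and off-target and seed-region checks are left out.

```python
import re


def find_spcas9_sites(dna: str, guide_len: int = 20) -> list:
    """List candidate SpCas9 targets (protospacer followed by NGG) on the given strand."""
    dna = dna.upper()
    sites = []
    # A lookahead is used so that overlapping PAMs are all reported.
    for match in re.finditer(r"(?=[ACGT]GG)", dna):
        pam_start = match.start()
        if pam_start < guide_len:
            continue  # not enough sequence upstream for a full guide
        sites.append({
            "protospacer": dna[pam_start - guide_len:pam_start],
            "pam": dna[pam_start:pam_start + 3],
            # 0-based index of the first base after the cut,
            # i.e. the cut falls 3 bp upstream of the PAM.
            "cut_index": pam_start - 3,
        })
    return sites


# Toy sequence (made up): 2 nt flank + 20 nt target + AGG PAM + 2 nt flank.
toy_guide = "ACGTACGTACGTACGTACGT"
toy_dna = "TT" + toy_guide + "AGG" + "TT"
for site in find_spcas9_sites(toy_dna):
    print(site)
```

A fuller version would also scan the reverse complement, since a PAM on either strand can be used.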
Derivatives of CRISPR

With the dCas9 enzyme, RNA, agents or other enzymes can be targeted very specifically to a genetic locus. It’s a home delivery method for genetic elements, without digesting DNA.
-CRISPRa is a local transcriptional activation method. Transcriptional activators can be attached to dCas9 or to the gRNA scaffold, allowing you to overexpress a gene.
-You can do the reverse and inhibit transcription. CRISPRi is based on recruiting dCas9 to a locus, where the enzyme with the gRNA can be used for transcriptional repression. It might not be efficient enough in eukaryotes, so repression domains like KRAB are added to the system.
-The system can be employed for epigenetic modifications. Epigenetic modifying enzymes such as p300 or LSD1 can be attached to dCas9. Histone marks can be altered this way.
-Certain loci of the genome can be revealed if fluorescent tags are attached to dCas9: fluorescent labelling.
-Base editing: a cytidine deaminase enzyme can be fused to Cas9 nickase, creating a cytidine to uridine conversion at a desired location. It can create a point mutation wherever desired.
-Prime editing: Cas9 nickase is fused to a reverse transcriptase. A prime editing guide RNA is used instead of the sgRNA; it specifies the target site and also encodes the desired edit. It can be used for precise deletions or insertions and all 12 point mutations.

CRISPR APPLICATIONS
-Create a random size deletion in a locus to disable a gene/element
-Create a small or large deletion with known breakpoints in a locus (knock out / LOF)
-Insert a reporter or a tag in a specific target (homology-dependent knock in)
-Precise point mutations (transition or transversion)
-Activate or silence transcription in the target locus
-Change the epigenetic architecture in the target locus
-Specifically mark the target locus (genome imaging)

LECTURE 12: FRAGILE X SYNDROME

Fragile X syndrome is a monogenic disease with a 1:4000 prevalence in males and a 1:8000 prevalence in females.
Among its characteristics we can observe intellectual disability, attention deficit, anxiety, and autistic-like behavior, among others. It is a progressive disease, in which the affected person gets worse with age. It is diagnosed by the “Fragile X chromosome”, which is seen as a break in the marker chromosome (Xq27.3) that becomes visible when the cells are cultured in folate-deficient media (the X chromosome is not actually broken). It is a monogenic, X-linked disease originating from abnormalities of the FMR1 gene (Fragile X Mental Retardation 1; the protein is FMRP). However, its inheritance doesn’t follow a typical X-linked pattern. In its pedigrees the number of affected individuals increases with the generations because of the two-step process or Sherman paradox. This paradox is based on a premutation that expands, becoming a full mutation that shows the disease. Whether an individual will be affected or not depends on the number of CGG repeats (microsatellite) in the FMR1 locus.

UNSTABLE MUTATION

These CGG repeats are transcribed but not translated because they lie after the transcription start (+1) but before the ATG (the translation start codon, where the ribosome starts). The repeats are located in the 5’ UTR, and this is a case of a microsatellite causing disease. The premutation consists of 55-200 CGG repeats (these individuals can be considered carriers), and when a patient has more than 200 repeats it becomes a full mutation and the disease shows. The repeat number has a tendency to expand over generations, especially on the lagging strand, where the polymerase can slip back or forward. If it slips back the number of repeats increases, and that is why this is called a dynamic mutation; the worsening over generations is called anticipation.
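The repeat thresholds above can be turned into a simple classifier. The sketch below is an illustration in Python only: the function names and the toy sequence are invented, the cut-offs are the ones given in these notes (55-200 premutation, more than 200 full mutation), and it counts only the longest uninterrupted CGG run, which ignores any interruptions inside a real repeat tract.

```python
import re


def longest_cgg_run(sequence: str) -> int:
    """Length, in repeat units, of the longest uninterrupted run of CGG."""
    runs = re.findall(r"(?:CGG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_fmr1_allele(repeats: int) -> str:
    """Classify an allele using the thresholds given in these notes."""
    if repeats > 200:
        return "full mutation"
    if repeats >= 55:
        return "premutation"
    return "below premutation range"


# Toy 5' UTR fragment (made up) carrying 60 CGG repeats.
toy_utr = "AT" + "CGG" * 60 + "AT"
count = longest_cgg_run(toy_utr)
print(count, classify_fmr1_allele(count))
```

In practice repeat length is measured in the laboratory rather than read from a short sequence, so this only illustrates how the categories are defined.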
Regarding expression, it is important to know that the premutation increases the transcription rate (while decreasing the translation rate), and the full mutation blocks transcription, and therefore translation, creating a total loss of function (LOF) of FMR1. The premutation doesn’t lead to a fragile X phenotype, but it can lead to Fragile X-Related Primary Ovarian Insufficiency (FXPOI) and Fragile X-Related Tremor/Ataxia Syndrome (FXTAS).

CpG Methylation

Expansion of the CGG repeats to a full mutation induces DNA methylation of the 5’ UTR and the promoter; hypoacetylation of the histones; chromatin condensation; and transcriptional silencing. These epigenetic marks would be inherited by the next generations.
*How could CRISPR be employed? Treat FXS by delivering a dCas9 to turn on the region, deleting some repeats (only feasible in embryos), or inserting a functional copy of the gene. Even creating an animal model by a gene knock-out (adding CGG repeats doesn’t do anything in mice).

RNA & PROTEIN & PHENOTYPE

FMRP Protein

This protein contains the NLS and NES sequences, K homology domains 1 & 2 (KH1, KH2), Agenet domains, and the RGG box (not a domain). An NLS (nuclear localization signal) sends the protein into the nucleus and an NES (nuclear export signal) sends it out, so with both you know that the protein goes in and out of the nucleus (shuttling). The KH domains are RNA binding elements and the Agenet domains have an epigenetic purpose. There are alternative splice products of the gene that can acquire different functions.
*There are two single nucleotide mutations that produce Fragile X Syndrome too (R138Q, I304N).

The protein was identified together with its homologs (paralogs) FXR1 and FXR2. They are not identical copies, so they can take over its function partially but not totally.
This could be one reason why the disease varies between patients, because the compensation mechanism changes depending on which homolog provides it (FXR1, FXR2). Although FMRP is widely expressed in the human body, it is especially abundant in the brain and in the testes. It is mostly cytoplasmic, though it is also observed in P bodies, Cajal bodies and stress granules (RNA aggregates; they store RNA).

Neural Pathogenesis

Neurons are big cells, each part of them needs different mRNAs, and the protein that transports these RNAs is FMRP. Loss of FMRP produces enlargement of the caudate nucleus, which is important for executive functions and repetitive behavior. This loss mainly affects synapse maturation (mature synapses should be short and thick, while immature ones are long and thin), but the neurons are intact. That is why patients have problems with learning (mental retardation): information is not passed properly between neurons. If there is no FMRP there is no actin in the dendritic spine, so it loses its shape. FMRP also interacts with several glutamatergic and GABAergic pathways (mTORC pathways, ERK pathway, activation of K+ channels, endocannabinoid system), which are disrupted in Fragile X patients.

MODEL ORGANISMS

To build them we need the ortholog protein (the same protein as FMRP but in another organism); that is why protein sequences, rather than DNA sequences, are compared to create the animal model. While mammals have three paralogs (FMR1, FXR1, FXR2), Drosophila has a single ortholog, dFmr1 (in the Drosophila model the whole gene is removed). Loss of FMRP causes enlargement of synapses in model animals.

LECTURE 13: THE HUMAN REGULOME

Regulome refers to the whole set of regulatory components in a cell. Those components can be regulatory elements, genes, mRNAs, proteins, and metabolites. So, we have just one genome, but multiple regulomes.
Can epigenomic information be harnessed to understand the genetic basis of a disease? To answer that, we are first going to talk about the case of monogenic diseases, like Mendelian types of diabetes. Monogenic diabetes is often neonatal, can be autosomal dominant or recessive, is less common than other types of diabetes, and is often related to problems in pancreatic development or functional defects (also insulin resistance).

IDENTIFYING CAUSAL MUTATIONS FOR MONOGENIC DIABETES

The study of Weedon et al (Nat Genetics 2014) is a good example. They started with whole genome sequencing in 2 consanguineous families and identified 3 million variants. After excluding heterozygous and common variants they were left with 2.8 million variants, from which they selected the variants located in regulatory regions (7 variants) to test whether they produce any disease. They finally obtained 1 candidate enhancer variant in a shared locus (pancreatic agenesis). They discovered that mutations in this enhancer were the most common cause of isolated pancreatic agenesis (an extremely rare anomaly which results from defective pancreas formation). Pancreatic agenesis is a monogenic disease, so the next question was whether this approach could also be taken for complex/polygenic traits or diseases.

IDENTIFYING CAUSAL MUTATIONS FOR POLYGENIC DIABETES

In this case the example is Type 2 diabetes, which often has late onset, is often related to obesity, is polygenic, is affected by environmental factors, is the most frequent form, and is characterised by β-cell dysfunction + insulin resistance. There are currently >700 loci that carry variants associated with Type 2 diabetes (T2D) susceptibility. The most common SNPs associated with increased risk of developing diabetes are noncoding SNPs which reside within pancreatic islet transcriptional enhancers. Enhancers are bidirectional (they can regulate genes which are upstream and downstream) and can regulate several genes simultaneously.
They can also activate genes that are separated from the enhancer by a long distance (around a megabase), because the genome is a highly dynamic structure. This dynamism produces looping, which allows enhancers to get next to genes that are far away (enhancers can regulate genes that are far away in the linear genome, but close in 3D space).

Hi-C methods

High-throughput conformation capture (Hi-C) methods allow mapping of physical interactions between distal genomic regions; they tell you where in the genome the parts that are connected during looping come from. Using promoter capture Hi-C, we have assigned ~80% of human islet enhancers to target genes.

Cas9 protein

You can use CRISPR-Cas9 technology to cut out a whole enhancer and look at the expression changes of the genes that the enhancer regulates. But by cutting out the whole enhancer you can cause a range of other unwanted effects that can also impact gene expression. So, what is done these days is something called CRISPR activation or inactivation. You attach activators or repressors to dCas9, which lets you change the activity of the enhancer and then analyse the change in the genes that the enhancer affects.

Polygenic risk scores

A score that is given to a person or population based on which genetic variants they have. It can help us to know their probability of having a disease (disease risk) (it can also be used for non-disease traits). In the case of type 2 diabetes, we can see that the variants located within these enhancers actually make up a very small subset of the total variants that have been identified. But with that subset of data alone (looking at only 1.6% of the genome), we can obtain an informative polygenic risk score.

TAKE HOME MESSAGES
-Integration of chromatin interaction maps with (epi)genome editing enables association of disease-relevant enhancers with their target genes.
-Enhancer-promoter hubs reveal additional complexity of the genetic architecture of T2D, with hub enhancers often regulating multiple genes in islets.
-Sequence variation within enhancer hubs contributes to T2D susceptibility, and associates with islet cell dysfunction.
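As a closing illustration of the polygenic risk scores discussed in this lecture, a common way to compute such a score is a weighted sum: for each variant, the number of risk alleles a person carries (0, 1 or 2) is multiplied by a per-variant weight and the products are added up. The notes do not give a formula, so this is an assumption, and the variant names and weights in the Python sketch below are invented purely for illustration.

```python
from typing import Mapping


def polygenic_risk_score(dosages: Mapping[str, int],
                         weights: Mapping[str, float]) -> float:
    """Weighted sum of risk-allele counts; variants missing from `dosages` count as 0."""
    return sum(weight * dosages.get(variant, 0)
               for variant, weight in weights.items())


# Hypothetical per-variant weights (not real effect sizes).
example_weights = {"variant_a": 0.5, "variant_b": 0.25, "variant_c": 0.125}
# Hypothetical person: number of risk alleles carried at each variant.
example_person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

print(polygenic_risk_score(example_person, example_weights))
```

Restricting `weights` to the variants that fall inside islet enhancers would correspond to the small-subset score mentioned above.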