Lecture 3 – Genomes PDF
Anglia Ruskin University
Summary
This lecture covers essential concepts in genomics, including the structure and function of genomes, DNA sequencing techniques like Sanger and Next Generation Sequencing (NGS), and the organization of the human genome, including repetitive sequences and non-coding DNA. It also discusses the implications of NGS in clinical applications and genomic epidemiology, providing a detailed overview of recent developments in the field.
Full Transcript
Lecture 3 – Genomes
The genome of any organism is the complete set of genetic material for that organism. This will be DNA for all organisms, except for some viruses that use RNA. In eukaryotes, the genome usually refers to all the DNA found in nuclear chromosomes (i.e., it doesn’t include mitochondrial DNA). Genomes are the dynamic information centre of the cell and effectively contain the instructions for assembly and maintenance of the organism. How those instructions are read in our cells is a major research enterprise worldwide in both basic biology and clinical medicine.
What is a eukaryotic genome?
Genome = one set of chromosomes (in us, = 23).
Humans are diploid, as we have a pair of each chromosome (2 x 23 = 46).
Two ways to think about any genome, including the human genome, are in terms of structure and function.
In structural terms: the obvious questions are how big the genome is (how much DNA) and how it is physically organised into genes and chromosomes.
In functional terms: we want to know how many genes there are in the genome, what they do and how they interact with one another.
1. Extracting information content of the genome – Genome/DNA sequencing
The advent of DNA sequencing in the 1970s revolutionised Molecular Biology. Two DNA sequencing techniques were developed to read off the order of nucleotides in a given fragment of DNA.
1. The first technique for sequencing DNA, the chain termination method using di-deoxynucleotides, was developed by Fred Sanger at the MRC Laboratory of Molecular Biology in Cambridge (earning him his second Nobel prize; the first was for protein sequencing when he was in the Biochemistry Department).
2. The other method, developed by Maxam and Gilbert at Harvard, is of historical interest (i.e., was never used very much); for it, Wally Gilbert shared the Nobel Prize with Sanger.
Figure 1. The di-deoxynucleotide (ddNTP) used in Sanger sequencing.
The chain termination technique
The Sanger method relies on the fact that DNA polymerase can incorporate 2’,3’-dideoxynucleotides (ddNTPs) into growing DNA chains.
o However, as a dideoxynucleotide does NOT have a 3’ OH group, it terminates the chain (hence the name, chain termination sequencing): ddNTPs act as terminators.
The original technique (first used to sequence a small viral genome; see Figure 2 below):
§ DNA melting to separate complementary strands = template
§ An E. coli enzyme (a fragment of polymerase I) makes copies of the template strand
§ The DNA of interest was primed by a synthetic oligonucleotide.
§ Four reactions were set up: ALL contained dATP, dCTP, dGTP and dTTP in the presence of radiolabel. These are the nucleotides that elongated the primer and formed the new strand. However, each reaction was spiked with a small amount of either 1. ddATP, 2. ddCTP, 3. ddGTP or 4. ddTTP.
§ This results in the 4 reactions containing populations of new DNA chains randomly terminated at either A, C, G or T residues by the ddNTPs (where the dNTPs would have been).
Each chain has a different length depending on where it was terminated -> one of 3 bases, one of 4 bases, one of 5 bases, etc. -> the total number of different lengths = number of bases of the DNA being sampled (minus the primer).
These can be resolved by gel electrophoresis and ordered from smallest (first to be synthesised) to largest -> based on the reaction each fragment came from, the DNA can be sequenced.
o E.g. based on the autoradiogram, smallest -> largest = CCATCGTTGA, which is complementary to GGTAGCAACT of the template strand
As the DNA fragments are radiolabelled, the gel is exposed to X-ray film to visualise the DNA bands, and the sequence is simply read from the bottom to the top of the autoradiogram, as shown in the figure below.
Figure 2. The Sanger sequencing method.
The modern approach to Sanger sequencing
1. ddNTPs are conjugated with specifically coloured fluorescent markers, instead of radioactivity.
a. This enables a sequencing reaction to be assembled in one tube and detection to be automated.
2. Fine capillaries are used for separating the DNA fragments, rather than large gels.
Besides the move to fluorescent nucleotides and the use of capillaries rather than traditional gels, sequencing techniques did not fundamentally change in thirty years, other than in the scale of the projects.
o This set a time and cost limit on what you sequence and how much you sequence.
o The Human Genome Project (1990-2003), which provided the first draft of the human genome at great effort and cost (> 10 years, and 2.7 billion dollars), used Sanger sequencing.
Next Generation Sequencing (NGS)
o Whilst the Sanger method remains the gold standard for accuracy, DNA sequencing was revolutionised in the 2000s by the development of multiple techniques for rapid, cheap sequencing, known collectively as Next Generation Sequencing (NGS), based on novel technologies that tremendously increased both throughput and speed.
o The main difference between Sanger and NGS = sequencing throughput
o Sanger sequences a single DNA fragment at a time, whereas NGS is massively parallel, sequencing millions of fragments (25-30 bases) simultaneously per run
§ Massively parallel (NGS): reads are then computationally assembled into chromosomes and genomes by mapping the reads onto the reference genome produced by the Human Genome Project. This is demanding in storage space and computing power.
o Advantages of NGS: with the availability of large storage space and powerful computers, NGS sequencing has become very cheap, delivering a human genome overnight for < $1000.
o These new sequencing methods are being used in revolutionary new scientific and clinical applications: to study human genetic variation, understand the genetic basis of disease and track pathogens in the world-wide population.
o Example = Illumina
Implications of NGS technology
Ability to catalogue human genetic variation:
o Genetic variants are discovered by mapping the sequence reads onto the reference genome from the Human Genome Project
Clinical implication of DNA sequencing – personalised genomic healthcare:
o The NHS is an international leader in correlating genetic variants with diseases such as cancer (e.g. precision oncology, derived from 13,880 tumours from the 100,000 Genomes Project) to make genomics part of routine healthcare
Genomic epidemiology:
o Rapid identification and tracking of pathogen outbreaks, and sequencing-based taxonomy of pathogenic bacteria/viruses
Privacy issues for individuals and society:
o Companies such as DNA Complete take people’s personal data – can this be exploited?
2. Genome content and size
Some important information about the size and complexity of DNA and genomes was obtained before the advent of sequencing, by ingenious biochemical experiments that studied the rate of reannealing of DNA molecules. See lecture 2.
REVERSIBLE DNA denaturing/melting:
In solution, raise the temperature beyond the DNA melting temperature -> the molecule can separate into its component strands
The melting temperature depends on the length of the DNA and its base composition (sequence), as well as the ionic composition of the solvent
o I.e. G:C-rich sequences will have higher melting temperatures than A:T-rich ones, because of the different number of hydrogen bonds holding together the base pairs.
Figure 3. DNA melting – the structural conversion from double helix to separate strands. DNA melting can be induced by temperature and is reversible.
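The link between base composition and melting temperature described above can be made concrete with a small sketch. It counts A:T and G:C pairs, the hydrogen bonds they contribute (2 per A:T pair, 3 per G:C pair), and applies the Wallace rule of thumb, Tm ≈ 2(A+T) + 4(G+C) °C. The Wallace rule is not part of the lecture; it is a rough estimate for short oligonucleotides only and ignores the length and salt effects mentioned above. All function names are illustrative.

```python
def base_counts(seq: str) -> tuple[int, int]:
    """Return (A+T count, G+C count) for one strand of a DNA duplex."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if not seq or at + gc != len(seq):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    return at, gc


def gc_fraction(seq: str) -> float:
    """Fraction of positions that are G or C."""
    at, gc = base_counts(seq)
    return gc / (at + gc)


def hydrogen_bonds(seq: str) -> int:
    """Watson-Crick hydrogen bonds in the duplex: 2 per A:T, 3 per G:C."""
    at, gc = base_counts(seq)
    return 2 * at + 3 * gc


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb melting temperature in degrees C (assumption, short oligos only)."""
    at, gc = base_counts(seq)
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Three 10-mers of increasing G:C content; the middle one is the
    # sequence read from the autoradiogram example in section 1.
    for oligo in ("ATATATATAT", "CCATCGTTGA", "GCGCGCGCGC"):
        print(oligo, gc_fraction(oligo), hydrogen_bonds(oligo), wallace_tm(oligo))
```

For oligonucleotides of equal length, the estimate rises with G:C content, which is the qualitative point made in the text.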
Hybridisation: Reannealing rate = function of complexity:
Decrease the temperature below the melting temperature -> strands of DNA can specifically pair up in solution again
The rate at which DNA anneals (finds the complementary sequence) is inversely proportional to its complexity = abundance of unique sequence features
o The simpler and more repetitive the DNA is -> less unique sequence -> less complex -> greater probability of finding a complementary strand -> higher rate of reannealing
Complexity in more detail
Eukaryotic DNA = fractions of differing complexity:
Reannealing studies of higher eukaryotic DNA revealed multiple phases, suggesting that a range of different sequence compositions were present, ranging from abundant, simple repetitive DNA to unique sequences.
o Fortunately, things have moved on a long way since these important, but rather low resolution, studies. As discussed above, it is now possible to determine the precise DNA sequence of even highly complicated genomes by direct sequencing.
The C-value paradox: DNA content does NOT correlate with organism complexity
o Notably, there is a lack of correlation between genome size and organismal complexity.
§ Simple organisms, such as bacteria, have relatively small genomes: 4.7 x 10^6 DNA bases in the case of E. coli.
§ More complex organisms, such as yeast (a single-celled eukaryote), have bigger genomes (13 x 10^6 bases), and multicellular organisms have even bigger genomes: the human genome has >3 x 10^9 bases.
o Example: the genomes of some unicellular eukaryotic dinoflagellates, responsible for harmful algal blooms, can be an order of magnitude larger than ours.
Gene number and density:
Comparisons of gene number yield even more surprises.
Yeast, with 13 x 10^6 bases of DNA, has 6000 genes
Humans, with 3 x 10^9 bases of DNA (230 times as much DNA as yeast), have only 20,000 genes (just over 3 times as many as yeast)
The worm C. elegans, with 1 x 10^8 bases, has 19,000 genes
o Therefore, number of bases does not correlate with number of genes
o Larger genomes don’t always mean more genes, and more complex organisms don’t need more genes than simpler ones!
Ensembl
3. Organisation of the human genome – repetitive sequences
Non-coding DNA
Following on from the end of section 2, analyses of complex eukaryotic genomes reveal that very little of the genome encodes genes (regions of DNA that make functional RNAs).
o For example, the human genome is about 230 times as big as the budding yeast genome, yet only encodes 3.3 times the number of genes that yeast does.
o It is not that human coding sequences are any bigger than those of yeast; rather, the human genome contains large amounts (98.5 %) of non-coding DNA without a clearly defined function = junk DNA.
§ This is an overestimate because, as you will hear in your RNA and transcription lectures, humans have complex requirements for gene regulation and some of the non-coding DNA contains regulatory sequences that direct expression of the coding regions. The role of non-coding DNA is controversial.
o Nevertheless, at most 25% of the human genome is the “footprint” of a gene. Thus at least 75% of the genome has no known function.
§ A small part of this will code for the several classes of functionally important NEW non-coding RNA that have been identified, such as lncRNA (long non-coding RNA) and miRNA (microRNA) (see lecture 1), but this still leaves a lot of unexplained DNA.
Repetitive elements in the human genome
Classed into two main categories:
Tandem repeats: short nucleotide stretches in head-to-tail arrangement (-> -> -> ->)
o Telomeres, centromeres, satellite DNA
Interspersed repeats: mobile genetic elements that can move around the genome
o Transposons or Alu repeats
Segmental duplications = low-copy repeats: ranging from 1 to 400 kb, these typically share a high level (>90%) of sequence identity and represent ~5 % of our genome.
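The gene-number comparison earlier in this section (yeast, worm and human) can be checked with a few lines of arithmetic. The sketch below uses only the genome sizes and gene counts quoted in these notes; the dictionary and function names are illustrative, and the rounded lecture figures should not be read as precise measurements.

```python
# Genome size (bases) and approximate gene count, as quoted in the notes.
GENOMES = {
    "yeast": (13_000_000, 6_000),
    "C. elegans": (100_000_000, 19_000),
    "human": (3_000_000_000, 20_000),
}


def bases_per_gene(organism: str) -> float:
    """Average number of bases of genome per gene (a crude inverse gene density)."""
    size, genes = GENOMES[organism]
    return size / genes


def fold_difference(organism: str, reference: str) -> tuple[float, float]:
    """Return (genome-size ratio, gene-number ratio) of organism vs reference."""
    size_a, genes_a = GENOMES[organism]
    size_b, genes_b = GENOMES[reference]
    return size_a / size_b, genes_a / genes_b


if __name__ == "__main__":
    size_ratio, gene_ratio = fold_difference("human", "yeast")
    print(f"human vs yeast: {size_ratio:.0f}x the DNA, {gene_ratio:.1f}x the genes")
    for name in GENOMES:
        print(f"{name}: about {bases_per_gene(name):,.0f} bases per gene")
```

The human genome comes out at roughly 230 times the size of the yeast genome but only about 3.3 times the gene number, so the average stretch of DNA per gene is far larger in humans, consistent with the large non-coding fraction described above.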
Heterochromatin vs Euchromatin:
Heterochromatin = densely packed regions = regions of the genome that are not transcribed (or very little).
o Heterochromatin contains sequence repeats
o Two major regions of heterochromatin (in eukaryotes):
1. Telomeres (the ends of linear chromosomes)
2. Centromeres (region of attachment of the two sister chromatids; forms a structure that is bound by microtubules in mitosis).
a. Centromere position determines the length of the two arms of the chromosome.
i. By convention: shorter arm = ‘p’ (after ‘petit’) and the longer arm = ‘q’.
b. Centromeres contain large tandem arrays of a repeat known as satellite DNA (171 bp long).
Euchromatin = less tightly packed regions of the genome = typically contain transcribed genes
Figure 4. Telomeres and centromeres in the mitotic chromosome.
Telomeres: The ends of linear chromosomes (Tandem Repeats)
Background info: the issue with linear chromosomes:
While leading-strand synthesis can proceed continuously along to the end of the chromosome, the lagging-strand synthesis mechanism means that the very end of the chromosome either isn’t replicated or is left with an RNA-DNA duplex.
o If this is not compensated for, then each successive generation would see a progressive shortening of the chromosome and potential loss of genetic information, as the RNA part would not be replicated = a gap at the 5’ end would form
Figure 5. The problem of the ends of linear chromosomes – DNA replication would leave a short RNA-DNA hybrid at the end of one of the two copies of the chromosome.
The solution = telomeres:
To overcome this, eukaryotes have unique DNA structures at the ends of their chromosomes – telomeres.
Functions (2):
1. Buffer during replication – prevents important genetic information being lost
2.
Protective capping – prevents end-to-end fusion, degradation and recombination
The telomere sequences
These are short DNA sequences (TTAGGG in humans and vertebrates) repeated about 1000 times to produce a protective cap for the chromosome end.
o This sequence is added by an enzyme called telomerase.
1. Telomerase contains BOTH a protein and an RNA component. The RNA component actually serves as a template for DNA synthesis -> telomerase is a self-templating reverse transcriptase
o Reverse transcriptases = class of enzyme that uses RNA to make DNA.
§ Reverse transcriptases are found in many types of RNA viruses, including HIV, and in viruses with an RNA intermediate (Hep B); reverse transcription is also used by telomerase and retrotransposons
§ Their discovery overturned the Central Dogma of molecular biology, as proposed by Francis Crick, which stated that DNA makes RNA makes protein.
2. Telomerase is ONLY active during embryonic development
Somatic cells lack telomerase activity -> Hayflick limit = the critical telomere length that acts as a signal to stop dividing -> somatic cells therefore only undergo a limited number (25-30) of divisions
o Cellular aging and senescence: does the decreasing size of the telomeric region in somatic cells act as a molecular clock?
o Telomeres and cancer: cancer cells can reactivate telomerase activity -> maintain telomere length -> Hayflick limit not reached -> contributes to immortality
§ Inhibitors of telomerase = anti-cancer drugs?
o This repetitive sequence of DNA can adopt some unusual structures:
1. G-quadruplex (G4) - four GGG triplets come together to form three stacked planar G quartets held together by Hoogsteen hydrogen bonds (based on non-Watson/Crick base-pairing)
2.
T loop = a loop where the 3’ overhang folds back and invades the double-stranded region of the telomere to close off the end of the chromosome
T loops confer stability and protection to telomeric DNA
The purpose of these structures is not clear, but they may act to prevent the exposed ends of the chromosomes from being mistakenly recognised by the cell’s DNA damage response systems as damaged DNA.
Figure 6. Telomerase is a reverse transcriptase enzyme that combines a short RNA with a protein to add multiple copies of a short DNA sequence to the ends of chromosomes.
Figure 7. DNA structures at telomeres. On the left, a drawing of G4 DNA, formed from four closely spaced repeats of G triplets. On the right, micrographs showing examples of T-loops. In a T loop the chromosomal overhang reaches back and invades the double-stranded region of the telomere.
The dynamic genome – transposons (Interspersed Repeats)
The repetitive sequences mentioned above emphasise that the genome is not static but is undergoing constant change.
o Many of the repetitive sequences in the human genome are derived from fascinating features of DNA called mobile genetic elements = interspersed repeats
Mobile genetic elements = regions of DNA that can move around and insert in other parts of the genome.
They have a defined size/structure: flanked by short repeats and often containing protein-coding genes (needed for their transposition)
By moving, they can cause mutations, depending on where they insert.
The first examples of mobile genetic elements were found by Barbara McClintock in maize and are popularly referred to as ‘jumping genes’.
Classifications of mobile genetic elements
Class: two broad types of mobile genetic elements are known:
1. DNA-only transposons (class II) = regions of DNA that are moved around the genome by an enzyme called a TRANSPOSASE
a. 2-3% of genome
b.
Example: DNA Transposon Tn3 (antibiotic resistance)
In bacteria, many transposons carry antibiotic resistance genes and so can propagate antibiotic resistance within and even between strains and species.
2. Retrotransposons (class I): DNA is transcribed to RNA, and then the RNA is reverse transcribed to DNA, which is then inserted back into a different part of the genome
a. 42% of genome
Activity – are they still mobile?:
o Active – can still mobilise
o Inactive (majority) = transposon fossils
Autonomy:
o Autonomous – encode all proteins needed for mobilisation
o Non-autonomous – do not encode all proteins needed for mobilisation
Transposition
DNA transposons mechanism: they express one enzyme (transposase) that performs transposition
o Transposase process: recognises the Terminal Inverted Repeats (TIR) flanking the DNA transposon -> catalyses its excision from the original position -> integrates it into a new site
RNA transposons mechanism:
Transposons are mutagenic and cause several diseases in humans, including forms of muscular dystrophy and haemophilia.
However, transposons may also be useful. They may be beneficial on an evolutionary timescale by facilitating shuffling of coding sequences or moving genes so that they are under the control of a new promoter (thereby changing where and when they are expressed). It may be that transposons have helped generate the diversity of our immune system.
o The RAG recombinases that mediate antibody diversity generation are related to transposon-encoded enzymes and may therefore be derived from an ancient transposon.
Transposition may facilitate the appearance of new genes and new patterns of gene expression within a species -> evolution!
Lecture 4 – Working with DNA
DNA is relatively easy to purify from cells and tissues, but how do we make sense of DNA? We can study it functionally (ask what it does) and we can read it, to see if we can interpret its sequence.
To do so, we need to be able to manipulate DNA: cut it into segments, paste DNA molecules together, copy it and read it = Recombinant DNA technology
We can therefore use this to elucidate the function of a gene or region of a genome
All of these things are possible and in common research and clinical use, mainly due to some simple properties of DNA and knowledge of the enzymes that can manipulate DNA.
1. The physical properties of DNA strands: denaturing/renaturing (DNA melting and hybridisation)
DNA melting and hybridisation:
As introduced in lecture 1, the two strands of DNA can separate IN SOLUTION, and likewise complementary strands of DNA can be renatured when cooled (sometimes called re-annealing), depending on:
1. Temperature
2. The length of the dsDNA strands
3. Their base composition (GC content)
4. The ionic strength of the solvent.
DNA hybridisation is the backbone of many techniques in molecular biology, including the polymerase chain reaction (see below), genotyping by in situ hybridisation, Southern blotting and FISH (fluorescence in situ hybridisation)
Hybridisation is also the technical basis of some of the newer DNA-based technologies, such as DNA microarrays (also called DNA chips).
Figure 1. Clinical uses of FISH. Screening for the chromosomal BCR/ABL t(9;22)(q34;q11) translocation, causative of Chronic Myeloid Leukemia, with fluorescent probes for ABL and BCR.
Optical properties of DNA
Another important physical property of nucleic acids (DNA and RNA) is their strong absorption of UV light (with a DNA peak at 260 nm due to the aromatic bases)
o This allows for easy measurement of DNA concentration in solution and is also a means of following the melting and re-annealing process
§ This is due to the hypochromicity effect: base stacking makes UV absorption lower in the double helix than in the denatured state
2.
Cutting DNA – restriction endonucleases
The bacterial restriction-modification system – the CLEAVING mechanism (Aspect 1)
Restriction-modification system
Bacteria have an anti-viral defence mechanism that uses enzymes that digest viral DNA = DNA restriction endonucleases, to destroy incoming viral DNA.
These enzymes cut BOTH strands of the sugar-phosphate backbone = double strand break -> cleaving the viral DNA and destroying it.
They can form either:
1. Blunt ends: cut both DNA strands at the same position = EcoRV enzyme
2. Sticky ends: cut each strand in a different position, thereby generating overhanging stretches of DNA = EcoRI enzyme
Figure 2. Restriction enzymes can cut the two strands of DNA in different ways
Classes of restriction endonucleases
There are hundreds of restriction enzymes known, grouped into four classes: I, II, III and IV.
o Type II restriction enzymes are the most widely used for molecular biology applications (over 3500 identified)
§ They recognise sequences that are commonly palindromic and range in length from 4-8 bp -> form homo-dimers -> cleave within the recognition sequence.
Enzymes distinguish between bacterial and viral DNA because they 1. recognise specific DNA sequences, which may be absent from the bacterial genome, and 2. are specific for either methylated or non-methylated DNA (part 2 of the defence mechanism)
Uses of REs
Restriction endonucleases have many uses:
1. Cut and paste DNA cloning
2. Mapping complex genomes
a. Used before DNA sequencing to analyse unknown regions of DNA
3. Genotyping (restriction fragment length polymorphisms (RFLPs))
a. Study of genetic differences between individuals
4. Making designer pieces of DNA in the test tube
a. By cutting and pasting bits of DNA together that have compatible sticky ends.
5. Making DNA constructs - depends on another enzyme that we met in the context of DNA replication, DNA ligase, which stitches together fragmented DNA.
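The recognition-and-cleavage behaviour of Type II enzymes described above can be sketched in a few lines. The example assumes the standard recognition sites for the two enzymes named in the notes (EcoRI: GAATTC, cut after the first G on each strand, giving sticky ends; EcoRV: GATATC, cut in the middle, giving blunt ends). It models only the top strand of a linear molecule and ignores methylation; the function names and the test sequence are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Recognition site and cut offset within the site (top strand).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC -> sticky ends (5' AATT overhang)
    "EcoRV": ("GATATC", 3),  # GAT^ATC -> blunt ends
}


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """A site is palindromic if it reads the same 5'->3' on both strands."""
    return site.upper() == reverse_complement(site)


def digest(seq: str, enzyme: str) -> list[str]:
    """Cut a linear DNA sequence (top strand) at every site for one enzyme."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


if __name__ == "__main__":
    dna = "AAGAATTCTTGATATCGG"  # made-up sequence with one site for each enzyme
    for name, (site, _) in ENZYMES.items():
        print(name, site, is_palindromic(site), digest(dna, name))
```

Fragment lengths from such a digest are what a restriction map or an RFLP genotype compares; a real analysis would also track the bottom-strand cut to describe the overhang.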
DNA molecules that are the result of laboratory manipulation are generally referred to as recombinant DNA, and the set of related techniques to manipulate DNA in this fashion is known as recombinant DNA technology.
The bacterial restriction-modification system – the MODIFICATION stage (Aspect 2)
The second aspect of the defence mechanism is the modification of the bacterial DNA to protect it from cleavage by the endonucleases
o Bacteria protect their own genome against the restriction endonucleases from part 1 by DNA methylation (an epigenetic modification) at adenine or cytosine residues
Therefore, bacteria modify their own DNA so that the endonucleases selectively cleave the unmodified viral DNA -> restricting the viral infection.
DNA methylation
DNA methylation is performed by DNA methyltransferase enzymes
o Bacteria have many methyltransferases that methylate adenine (position N6 (6mA) is most common) within restriction sequences
3. Visualising DNA – gel electrophoresis and Southern blotting.
Gel electrophoresis
Another useful property of DNA is that it is negatively charged (one negative charge per phosphate = polyanion) -> chemically relatively stable in solution.
This property allows DNA to be separated by gel electrophoresis = based on its size/length
o In the presence of an electrical field in a porous matrix (e.g. agarose or polyacrylamide gel), DNA runs towards the +ve electrode (See Mott lectures)
Visualising DNA (after separation): directly in the gel by dyes/stains that bind double-stranded DNA (between base pairs) -> produce a strong fluorescent signal under ultra-violet light.
o The commonest of these is ethidium bromide, which intercalates into DNA.
BUT… ethidium bromide is also a mutagen and probably a carcinogen -> safer, more sensitive fluorophore molecules are now used
Southern Blotting = DNA visualisation
DNA can be transferred from the gel to a nylon membrane, and then analysed by hybridisation with a labelled, complementary DNA probe -> used to see the location of the DNA you’ve separated by gel electrophoresis and whether it contains the specific sequences that you are looking for.
o We also have the Northern blot = RNA and Western blot = proteins.
Figure 3. DNA separated by size by gel electrophoresis, stained with the fluorescent dye ethidium bromide.
4. Making and amplifying DNA – harnessing microbial genetics
Besides being able to cut and paste bits of DNA together, the final critical element in manipulating DNA is the ability to propagate that DNA without any errors. To do this, we can take advantage of half a century of microbial genetics and bacterial plasmids.
The commonest way to make lots of copies of your favourite piece of DNA is to cut and paste it into a bacterial plasmid = circular molecule of extrachromosomal DNA.
o Depending on the bacterial strain, plasmids can be present in hundreds of copies per cell.
Process: restriction endonucleases cut the plasmid open at their sites -> plug your bit of DNA, with the SAME STICKY ENDS, into the plasmid (remember, prokaryotes = single origin of replication) -> DNA ligase ligates the DNA into the plasmid -> introduce it into appropriate bacteria -> the DNA will be copied and amplified, and is easily purified back from the growing bacteria.
o Plasmids are also involved in the spread of antibiotic resistance (bad)
Figure 4. A typical circular plasmid used to amplify DNA in E. coli, containing an ampicillin resistance gene (green) and a region with many unique restriction sites for inserting the DNA you wish to amplify (Polylinker).
Bacteriophages (a less commonly used alternative to plasmids), viruses that infect bacteria, can also carry the bit of DNA.
Again, you use restriction enzymes and DNA ligase to cut and paste your piece of DNA into the phage genome so it gets copied along with the virus when the virus infects bacteria.
The downside (to all approaches): there is a limit to the length of the piece of DNA you can propagate in a plasmid or bacteriophage - typically around several thousand base pairs. There are more specialised systems for manipulating larger fragments of DNA that you will hear about in your lectures next term: yeast and bacterial artificial chromosomes (YACs and BACs).
5. Amplifying DNA in a tube: PCR – the polymerase chain reaction (better than microbes)
Polymerase chain reaction (PCR): makes large amounts of a defined region of DNA in a test tube, WITHOUT having to use bacteria
A big step in making working with DNA easier
Highly sensitive (single molecule amplification)
Figure 5. First two cycles of a PCR reaction.
Requirements for PCR
1. Specific pair of short (about 20 bases), single-stranded DNA primers (oligonucleotides)
a. They anneal to the sequences at the ends of the stretch of DNA you wish to amplify
b. As in DNA replication, these primers provide a free 3’ end of DNA that DNA polymerase can extend from
c. One key aspect of the reaction is that the primers are present in large molar excess over the template. Thus, when one melts the template DNA and then lowers the temperature, the primers hybridise specifically with the targeted region of the template DNA.
2. Heat-stable DNA polymerase = TAQ POLYMERASE
a. Comes from thermophilic micro-organisms, such as Thermus aquaticus, that normally grow at high temperature (i.e., in hot springs).
b. Is able to withstand heating and cooling = thermostable, an essential property to make PCR practically useful.
i. Most polymerase proteins (such as E. coli DNA polymerase) denature at the >90 °C temperatures used for PCR (when denaturing DNA)
1. Inefficient process: the enzyme would have to be added again every cycle, because it is inactivated during each cycle.
ii. With Taq polymerase, the enzyme is added at the beginning of the PCR process and stays active through >40 cycles of heating and cooling – more efficient
3. Template DNA
a. A region of DNA up to several thousand bases long can be targeted and selectively, exponentially amplified.
4. Supply of nucleotides (dNTPs)
5. A buffer
The PCR cycle: done by cyclically varying the temperature
PCR proceeds in a series of cycles -> in each cycle, the number of target DNA molecules is doubled by copying.
o Each cycle doubles the number of target DNA molecules, which are exponentially amplified according to the relationship 2^n, where n is the number of cycles.
Each cycle has three steps:
1. DENATURE: high temperature (>90 °C) to separate the double-stranded DNA (melting)
2. ANNEAL: a lower temperature step (typically around 55-60 °C) to allow the primers to hybridise to their target sequences (annealing)
3. ELONGATE: a 72 °C step to allow the heat-stable DNA polymerase to make the DNA copy (extension).
Clinical/research applications of PCR
Useful any time you want to detect specific DNA sequences
Virology – diagnose infectious disease in patients
Oncology – detect pathological events such as cancer-causing mutations
Clinical genetics – study genetic variation
o DNA sequences act as unambiguous identifiers for forensics: a DNA sample from sperm, hair or blood can be amplified to identify the donor (used in courts/crimes)
6. Emerging technologies – CRISPR/Cas9
A recent technological advance that is revolutionising molecular biology and holds great promise for the treatment of genetic conditions is the Clustered Regularly Interspaced Short Palindromic Repeats = CRISPR/Cas9 system
o The technology is adapted from a recently discovered form of immunity present in Archaea and some Bacteria that protects them against viruses.
Two main components:
o 1. CRISPR sequences: short viral DNA sequences present in the microbial genome that are transcribed into RNA and used as guides to direct the Cas9 nuclease to cleave the DNA of the invading virus during an infection.
o 2. Cas9 protein: nuclease that cleaves the DNA sequence of the invading virus (during infection), targeted by the guide RNA (which has been transcribed from the CRISPR sequence)
The schematic of the system:
CRISPR/Cas9 has proved to be very successful as a fast and efficient genome editing tool.
o Gene knockouts and allele replacements can be generated by preparing guide RNAs targeting the desired genomic locus.
o CASGEVY: the first genetic treatment based on CRISPR has recently been approved, for patients suffering from sickle-cell anaemia and beta-thalassemia.
§ The therapy reactivates fetal haemoglobin in patients with defective adult haemoglobin.
7. Replication Errors
DNA replication is very accurate (1 error in 10^9 nucleotides), but not infallible…
3 types of replication errors:
1. Base mismatches - from replicative polymerases (if missed by proofreading, these can still be fixed by Mismatch Repair – next lecture).
2. Nucleotide misincorporation - by error-prone Translesion Synthesis (TLS) polymerases.
3. Deletions or insertions - (repeat instability, e.g. trinucleotide repeat diseases).
1. Base mismatches
The polymerase incorporates the wrong base
For cancers: single point mutations (deletion) in the exonuclease domain of Pol epsilon (the leading strand polymerase in eukaryotes) cause a hyper-mutation phenotype: they impair the ability of Pol epsilon to perform proofreading, which drives cancer formation
2. Nucleotide misincorporation
Translesion DNA synthesis
A mechanism to bypass DNA lesions (damaged sites, i.e. chemically altered bases on the DNA template) during replication
o Advantage: DNA replication does not stop
o Disadvantage: increased mutagenesis due to lack of proofreading and poor nucleotide selectivity
3.
Deletions or insertions
Variable penetrance vs variable expressivity → at 40+ CAG repeats, almost 100% penetrance
Lecture 5 – DNA damage and repair
1. DNA damage
Approx. 30,000 damage events happen per mammalian cell per day. Nonetheless, DNA must maintain genetic information in the face of two major challenges that could corrupt that information: 1. errors introduced by the DNA replication process and 2. mutations.
It took a relatively long time from the discovery of DNA as the carrier of genetic information before scientists realized that, although DNA is a highly stable molecule, it is prone to chemical and physical damage, and therefore cells must have ways to repair the damage and preserve their genetic code.
Many endogenous (within the organism) and exogenous (environmental) causes of DNA damage exist → these lead to mutations. These include:
o Endogenous:
o replication errors (lecture 4)
o free radicals that might be by-products of metabolism
o the natural chemical instability of the DNA
o Exogenous:
o UV light
o high-energy ionising radiation (X-rays)
o genotoxic chemical mutagens present in the environment (cigarette smoke, industrial waste, pollution)
The consequences for DNA: the cell will attempt to repair the DNA through repair pathways; if it fails, unrepaired DNA damage accumulates → genomic instability → cell death or disease
Chemical lesions to DNA structure:
o As a consequence, thousands of potentially toxic chemical alterations in the structure of DNA occur in a human cell every day. These include:
o Release of the aromatic bases caused by hydrolysis of the glycosidic bond
o Oxidative damage to the bases
o Intra- and inter-strand crosslinking of the bases
o Breaks in one (nick) or both (double-strand break or DSB) phosphodiester strands of the DNA.
Replication errors (see section 2 below)
o Because DNA repair mechanisms are not completely effective → decay of the covalent structure of DNA may be expected to contribute to mutagenesis → aging and carcinogenesis.
Ames test: a bacterial assay for testing carcinogenicity (based on the correlation of carcinogenesis and mutagenesis)
o A strain of Salmonella typhimurium that is his- is spread on a culture plate lacking histidine
o Exposure to the chemical → mutations convert cells to his+, resulting in colony growth after 2 days (37 ºC)
o The number of colonies, compared with a plate in the absence of the mutagen, is related to mutagenicity.
DNA repair pathways
o We have evolved a set of ~150 human genes that are dedicated to repairing DNA and maintaining genetic stability.
o In order for cells to deal with damage, they must 1. detect that damage has occurred, then 2. switch on genes to repair the damage and perform the repair reaction.
o In eukaryotes → a complex series of checkpoint signals ensures that the cell cycle is arrested while the damage is repaired, so that the cells don’t try to replicate the DNA and either make the error worse or fix the mutation in the genome of the daughter cells of the damaged cell.
o Eukaryotic cells may make the decision to commit suicide (apoptosis) rather than live with potentially mutated DNA.
Figure 1. DNA damage and its downstream cellular responses.
Pathways of DNA repair:
5.2 Replication errors (see lecture 4)
There are a number of ways in which errors can occur during replication:
1. Base mismatches
2. Low-fidelity copying by translesion-synthesis DNA polymerases
3. Insertions & deletions (collectively known as indels)
1. Base mismatches
Polymerases are usually high fidelity…
DNA replication is remarkably accurate. Firstly, replicative DNA polymerases have an intrinsically high fidelity, as the incoming nucleotide must be properly paired by Watson-Crick hydrogen bonding with the templating base in the parental strand before DNA polymerase adds it to the 3’ end of the new DNA strand. Furthermore, replicative DNA polymerases can correct the occasional nucleotide misincorporation by virtue of their 3’ to 5’ exonuclease (proof-reading) activity. 
Proof-reading allows the enzyme to remove the nucleotide it has just incorporated, giving it a second chance to get the right nucleotide.
However, the polymerase may still incorporate the wrong base and escape proof-reading…
Classifications are:
Transition = purine to purine, or pyrimidine to pyrimidine
Transversion = purine to pyrimidine, or pyrimidine to purine
CAUSES OF POINT MUTATIONS:
A. Deamination of aromatic primary amines
The most prevalent form is oxidative deamination of cytosine to form uracil
o Either spontaneously or by treatment with an oxidising agent (e.g. nitrous acid)
o U is a base normally found in RNA, not DNA, for reasons that will soon become obvious
§ U prefers to base pair with A, and so if uncorrected before DNA replication, this could result in conversion of a C-G base pair to T-A in later generations = transition, pyrimidine to pyrimidine conversion
Figure 6. Deamination of cytosine changes the base to uracil, converting a C-G pair to a U-A pair when the DNA is replicated.
B. Damage within cellular metabolism
Guanine is oxidised to 8-oxoguanine (8-oxoG) – very mutagenic!
o Caused by a genotoxic by-product (= hydroxyl radical) of the Fenton reaction within the ETC of aerobic respiration, where the superoxide anion radical O2- is formed
o When the modified strand is replicated, without repair the 8-oxoG will base pair with a C or an A, causing a GC → TA transversion (purine to pyrimidine)
§ = frequent somatic mutations in human cancers.
Solution = Mismatch Repair: mismatched nucleotides that escape proof-reading are corrected by the specialised proteins of Mismatch Repair (MMR) (see later).
Together, these mechanisms push the accuracy of DNA replication to just one wrong nucleotide every 10^8 – 10^9 on average. The importance of these fidelity mechanisms is revealed by the finding that mutations in the proof-reading domain of Pol epsilon, the DNA polymerase responsible for leading strand synthesis, cause a ‘hyper-mutation’ phenotype that drives cancer formation (see lecture 4)
Figure 2. 
Accuracy of DNA synthesis of DNA polymerases.
2. Translesion synthesis
Sometimes during replication, damage to the DNA template is not immediately repaired; instead, specialized DNA polymerases known as DNA translesion synthesis (TLS) polymerases are deployed by the cell to insert residues opposite damaged sites, so that DNA replication can proceed, i.e. the lesion is bypassed.
Several DNA TLS polymerases have been discovered that deal with different types of damage
o Recruitment of the appropriate TLS polymerase is coordinated by the sliding clamp PCNA, which coordinates place swapping with the stalled replicative polymerase at the 3’ end of the new DNA strand.
Advantage: allows the cell to continue the process of DNA synthesis without interruption
Disadvantage: TLS polymerases lack proof-reading ability → increased chance of nucleotide misincorporation
Figure 3. Lesion bypass by DNA TLS polymerases.
3. Insertions & deletions (→ trinucleotide repeat diseases)
Indels = one or more nucleotide pairs are inserted in or deleted from DNA…
Occasionally, the copying process will cause local expansion or contraction in the DNA. Regions of short tandem repeats (STRs, e.g. CAG) display polymorphism (a variable number of repeats between parental alleles) → instability → some repeat sequences are prone to errors during replication, and the location of STRs can determine their probability of causing disease → trinucleotide repeat diseases, which occur when the expansion impairs the biochemical function and behaviour of a gene product.
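To make the idea of a variable repeat number concrete, here is a minimal Python sketch that counts the longest uninterrupted run of a repeat unit such as CAG in a sequence. The function name and the two example alleles are made up for illustration; real repeat genotyping is more involved than this.

```python
import re


def longest_tandem_run(sequence: str, unit: str = "CAG") -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    sequence = sequence.upper()
    unit = unit.upper()
    # One or more consecutive copies of the repeat unit.
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(match.group(0)) // len(unit) for match in pattern.finditer(sequence)]
    return max(runs, default=0)


# Two hypothetical alleles that differ only in the number of CAG repeats.
allele_a = "ATG" + "CAG" * 5 + "CCGCCA"
allele_b = "ATG" + "CAG" * 9 + "CCGCCA"

print(longest_tandem_run(allele_a))  # 5
print(longest_tandem_run(allele_b))  # 9
```

Comparing the two counts is the polymorphism referred to above: the same locus carries a different number of repeats on each allele.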
The mechanisms of repeat expansions are under investigation; one possibility is that, during DNA replication, slippage of the DNA polymerase from the template strand → complementarity between repeats within the same newly synthesised strand leads to transient hairpin formation in the repeat regions → causes the replication machinery to inadvertently expand the repeated region
o Can be repaired with Mismatch Repair
EXAMPLE = Huntington’s Disease: an example of how the replication and repair machinery can contribute to the progression of disease
This is an autosomal dominant genetic condition caused by an insertion of a repeated sequence of three bases (CAG) in the Huntingtin gene.
If an individual has fewer than 30 copies of the CAG repeat, they do not have the disease.
BUT… Over 40 repeats, they develop the classic cognitive and motor symptoms of the disease (penetrance is almost 100%)
The disease exhibits the phenomenon of ANTICIPATION: affected offspring often develop the disease at a younger age than their affected parents did, and this is because they have more copies of the repeat than their parents.
**DNA Methylation
Methylation = modification of DNA (on C residues in eukaryotes, or on A residues (N6 position) in bacteria) that is not mutagenic, but is used as a way of controlling gene expression:
In bacteria = A methylation = key part of the restriction-modification system
o Dam methyltransferase adds a methyl group to the A residue in the sequence GATC, one effect of which is to protect the DNA from digestion with the bacterium’s own restriction endonucleases.
In eukaryotes = C methylation to form 5-methylcytosine (5mC)
o DNA methyltransferase (DNMT) catalyses the reaction
§ Methylation is REVERSIBLE: it can be reversed by ten-eleven translocation (TET) enzymes; 5mC → C – usually done for epigenetic reprogramming
o DNA methylation tends to occur at regions containing repeats of the CG dinucleotide, referred to as CpG islands.
o Methylation of CpG islands switches off the expression of nearby genes and is an important mechanism for controlling gene expression (usually gene silencing = repressing transcription)
§ Basis for the phenomenon of genomic imprinting = genes show parent-of-origin expression (for example, only the copy you inherit from your father is expressed; the maternal copy is methylated and silent).
DNA methylation will shut down transcription in one parental chromosome
Needed for cell differentiation and embryonic development
Is epigenetically inherited, not through Mendelian inheritance
o Disease example = Prader-Willi syndrome
§ Deletion of part of the paternal chromosome 15q, while the corresponding part of the maternal Chr15q is imprinted and silent
5.3 Types of DNA repair mechanisms
Broadly speaking, there are two major categories of DNA repair mechanisms, each with their own sub-categories:
1. Single stranded DNA repair (3): recognise incorrect/damaged base → remove/repair damaged base/nucleotide → fill gap using DNA polymerase and DNA ligase
a. base excision repair (BER)
b. nucleotide excision repair (NER)
c. mismatch repair (MMR)
2. Double strand break repair (2)
a. non-homologous end-joining (NHEJ)
b. homologous recombination (HR)
Direct DNA damage reversal (bacteria ONLY)
UV light exposure can cause base cross-links called cyclobutane pyrimidine dimers (CPDs): a cyclobutyl ring forms between adjacent thymine residues to give a thymine dimer → deforms the double helix and obstructs DNA replication and RNA transcription
In bacteria: DIRECT REVERSAL of the photo-reaction using photolyase enzymes that split the pyrimidine dimers.
o Photolyase binds to DNA → chromophore absorbs light in the 300-500 nm region → transfers the absorbed energy to FADH, which then donates an electron that splits the dimer into two single T residues
In humans: indirectly repaired using NER
Figure 5. UV-radiation can crosslink adjacent thymidines on a DNA strand, leading to formation of a cyclobutyl pyrimidine dimer (CPD).
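Because a thymine dimer needs two thymines side by side on the same strand, the candidate sites in a sequence can be listed with a few lines of Python. This is only an illustrative sketch: the function name and the example strand are hypothetical, and it says nothing about how likely each site is to be damaged.

```python
def adjacent_thymine_sites(strand: str) -> list[int]:
    """Return 0-based positions where a T is immediately followed by another T."""
    strand = strand.upper()
    return [i for i in range(len(strand) - 1) if strand[i:i + 2] == "TT"]


# Hypothetical strand, written 5' to 3'.
example = "ACTTGATTTC"
print(adjacent_thymine_sites(example))  # [2, 6, 7]
```

Note that a run of three thymines (TTT) gives two overlapping candidate sites, which is why positions 6 and 7 both appear.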
Pathways of DNA repair: Single stranded DNA repair (3)
1. Base Excision Repair (BER) - changes to single bases.
Cytosine can deaminate spontaneously or by treatment with oxidising agents to form uracil (U, a base normally found in RNA, not DNA, for reasons that will soon become obvious). U prefers to base pair with A, and so if uncorrected before DNA replication, this could result in conversion of a C-G base pair to T-A in later generations (a pyrimidine to pyrimidine conversion is known as a transition).
Figure 6. Deamination of cytosine changes the base to uracil, converting a C-G pair to a U-A pair when the DNA is replicated.
BER can be used for damaged base repair, such as that discussed in the section above (C → U deamination and G → 8-oxoG oxidation)
The steps:
1. Excision of the damaged base through extra-helical recognition and cleavage by DNA glycosylases
a. DNA glycosylases cause base-flipping
i. DNA glycosylase recognises the base (e.g. uracil)
ii. Flips the base out of the double helix
iii. Cuts it out by cleaving the glycosidic linkage to the deoxyribose sugar
iv. Forms an abasic site (sugar with no base)
b. Specific recognition of different types of bases needs specific glycosylases
i. For C → U = uracil DNA glycosylase (UDG)
ii. For 8-oxoG repair = 8-oxoguanine DNA glycosylase (OGG1)
2. APE1 nuclease recognises the abasic site and cleaves the DNA backbone
a. Produces a 3’ OH group upstream of the abasic site → acts as a priming site for a DNA polymerase
3. Two pathways can be taken: a. short patch and b. long patch BER
a. SHORT PATCH - a specialised DNA repair polymerase, Pol beta, fills in the correct nucleotide.
i. Ribose-phosphate portion of the abasic nucleotide is removed
ii. Missing nucleotide is replaced by DNA Pol beta
iii. The nick is sealed by DNA ligase III
b. LONG PATCH - the same proteins that process Okazaki fragments in replication synthesise over the abasic site
i. DNA Pol delta synthesises over the abasic site
ii. 
The displaced flap of the old strand is cleaved by FEN1
iii. The nick is sealed by DNA ligase I
Figure 7. Steps of BER.
2. Nucleotide Excision Repair (NER) – recognition of bulky lesions.
NER is used mostly to repair BULKY DNA lesions that locally distort the double helix – mainly caused by UV exposure and tobacco (such as CPDs, which in humans cannot be directly reversed as they are in bacteria – see previous section)
Specificity of NER vs BER: in BER, there is specific recognition of the lesion by specific glycosylases, whereas in NER there is NO specific recognition…
There are two types of NER, differing in the mechanism of lesion recognition, but identical in all subsequent steps
1. Global Genomic NER (GG)
2. Transcription-coupled NER (TC)
The steps:
1. The NER pathway detects the lesion by recognising the change in the DNA structure
a. a single detection mechanism therefore operates in all NER cases.
2. TFIIH (NER protein machinery) binds to the lesion
3. TFIIH assembles at the site and unwinds the DNA (approx. 30 nucleotides) around the damaged site to release the lesion strand
4. The strand containing the lesion is then cleaved upstream = 5’ (by XPF/ERCC1 endonuclease) and downstream = 3’ (by XPG endonuclease) of the lesion
5. This results in a short single stranded gap.
6. This gap is then filled in by DNA polymerase and sealed by DNA ligase.
Figure 8. The steps of NER.
Disease from a mutated NER gene = xeroderma pigmentosum, an autosomal recessive syndrome.
Extreme sensitivity to UV light → premature skin aging and 100x increased incidence of skin tumours
o 20% have neurological conditions
3. Mismatch repair
Base mismatches missed by replicative DNA polymerase proof-reading are corrected by MMR… See above section
Although DNA polymerases do not make many errors, occasionally mismatched bases are incorporated into DNA.
Bacteria have a system to detect mismatched bases and correct them.
However, if for example a G is incorporated across from A, how does the machinery tell whether the A or the G is the correct base?
o In bacteria, this problem is solved by the fact that DNA is normally methylated, as discussed above.
o However, a new strand of DNA generated during replication is not methylated until some time after replication is completed.
§ Thus, newly replicated DNA is hemimethylated = the parental template strand is marked by methyl groups and the new DNA is not methylated.
Strand discrimination: nicks present in the strand act as strand discrimination signals
If the mismatch repair machinery detects a mismatch, it preferentially repairs the non-methylated new DNA, and does so by excising a single strand of the new DNA that flanks the error, rather akin to the way the NER machinery does. The single-strand gap is then filled by DNA polymerase.
The steps (in bacteria)
1. The methylated strand is recognised by the methyl group on the GATC sequence
2. Homodimer MutS binds to the mismatched base pair, e.g. G and T
3. MutL and MutH (= endonuclease) join MutS
4. One of the MutH subunits binds to the methyl group and the DNA in between loops out
a. The endonuclease activity of MutH is activated
5. Non-methylated strand is cut by MutH
6. DNA in the mismatch region is degraded
7. New, correct bases are synthesised and added to the new chain by DNA polymerase
Figure 9. The bacterial mismatch repair system.
Disease caused by mutation in an MMR gene = hereditary nonpolyposis colorectal cancer syndrome.
Double stranded break (DSB) repair (2) – MOST DANGEROUS!
Caused by radiation and reactive oxygen species produced in the cell, or during replication when a nick is encountered by the replication fork
o Generates free ends that can lead to chromosomal rearrangements
§ Can either lead to cell death or can be highly mutagenic, potentially moving regions of chromosomes around.
However… DSBs are nonetheless needed in important physiological processes such as meiosis and V(D)J recombination
Repair mechanisms:
1. non-homologous end joining (NHEJ) = direct rejoining of DNA ends (strands aligned, trimmed and ligated)
a. Less complex ☺
b. Low fidelity = error prone (the break is rarely a clean cut, so the ends need editing → risk of indels) ☹
2. homologous recombination (HR) = template-based repair, causing the exchange of homologous segments between two DNA molecules
a. More complex ☹
b. High fidelity = less error prone ☺
How to choose between the two paths?
1. Non-homologous end joining (NHEJ)
As its name suggests, non-homologous end joining (NHEJ) acts to simply stick the two broken DNA ends together. This can restore the original sequence, but often the ends require some cleaning up, and so this can result in the loss of DNA sequences at the joining site. NHEJ is particularly important in the resting, G1 and G0, phases of the eukaryotic cell cycle.
The steps:
Ku = heterodimer; the Ku-DNA complexes dimerise, which aligns the strands
o Ku is the DSB sensor, and is released by proteolytic cleavage
Trimming of the ends creates the potential for mutations, but this is much less harmful than leaving the DSB without repair
Figure 10. NHEJ is inherently mutagenic.
As in single stranded repair, loss of function mutations in the genes encoding NHEJ proteins are found in cancers in humans: for example, mutations in DNA ligase IV are found in certain leukaemias.
Benefit of NHEJ = V(D)J recombination for antibody diversity
Enables extensive generation of antibodies for a wide range of antigens
The genes for antibodies are generated by joining a number of different DNA segments.
o In principle, the number of different antibodies that can be produced depends on the number of combinations possible.
o The low-fidelity (= error prone) nature of NHEJ adds to this: the recombination events that are used to join the segments are somewhat imprecise, increasing the diversity generated by combining DNA segments by adding or removing small stretches of DNA. To introduce even more variation, enzymes such as terminal transferase, which adds extra nucleotides at the junction point, further diversify the antibody repertoire by being sloppy in how many bases they add.
o The pivotal role of the NHEJ pathway in mediating this process is shown by the effects of mutations in genes encoding components of NHEJ: immunodeficiency and failure to generate antibody diversity.
2. Homologous recombination (HR)
HR is a more complex, but potentially error-free, repair mechanism. This takes advantage of the fact that most organisms have more than a single copy (= extensive homology) of a given chromosome and that this extra copy can act as a template for repair of a damaged chromosome.
The steps:
1. MRN and CtIP proteins initiate and then extend the single stranded resection of the 5’ end → leaves an extended single stranded 3’ end
2. The 3’ overhang (= tail, see Miska 3) is bound by RPA (replication protein A, which binds to ssDNA, stabilises the repair complex, and prevents degradation of the ssDNA overhangs)
3. BRCA2 loads the highly conserved protein RAD51 (eukaryotes)/RecA (bacteria) in place of RPA
4. Homology search = RAD51/RecA scans the genome for identical double stranded sequences (i.e. donor DNA on a sister chromatid)
5. On finding a suitable target, it directs “strand invasion” = replacing one of the duplex strands with the single strand from the damaged DNA.
a. The strands of the complementary DNA are nicked, and the invading ssDNA forms a heteroduplex joint = D loop
b. DNA polymerase extends the invading strand using the homologous template
c. CROSSING OVER:
i. Forms Holliday intermediates = mobile four stranded junction of DNA, that can move along the structure via branch migration
ii. 
Holliday junction resolution via cleavage of strands: either of the strands that crossed over (= gene reshuffling, e.g. for genetic diversity in meiosis) or of the strands that did not (conserving the parental configuration) (50% probability)
iii. Formation of two linear duplexes
d. GENE CONVERSION:
i. This can then prime replication from the invaded end, copying DNA from the intact strand and allowing the gap to be bypassed.
Figure 11. Double-strand DNA break repair by homologous recombination
One-ended DSB (at replication forks)
As well as helping repair double strand breaks arising via DNA damage, one of the most important roles for HR is in restarting stalled and broken DNA replication forks = one-ended DSB
There is an average of 10 failed replication forks per eukaryotic cell cycle, so it is essential that these be resolved for cell division to proceed to completion with the genome intact.
Benefits of HR = Meiosis
HR in meiosis is needed to produce genetic diversity
Done after the S/G2 phase, before meiosis I
o Chromosome crossover forms; point of contact = chiasmata
5.4 DNA repair in health and disease
The loss or mutation of genes involved in DNA repair processes can be associated with cancer (and some neurological conditions)…
The genetic instability caused by a loss of repair pathways promotes uncontrolled proliferation of cells
As described above, loss of function in the proteins involved in NER and mismatch repair leads to cancer predisposition. Similarly, a number of genes required for DSB repair also display cancer predisposition phenotypes.
o For example BRCA2, mutated in sporadic and inherited breast cancers, plays a role in HR.
o Many other proteins, including ATM, Chk2, p53, Nbs1 and Mre11, all of which play roles in DSB repair, cause elevated cancer rates when mutated.
o On the other hand, there are also examples of brain tumours that resist radiation treatment through an increased ability to repair radiation-induced breaks.
o In general, DSB repair acts positively to restore or maintain genomic integrity.
§ However, DSB processes can also occasionally have deleterious consequences. For example, a translocation between parts of chromosomes 9 and 22 is associated with Chronic Myelogenous Leukaemia.
DNA repair in clinical practice
Finally, the idea of exploiting the various pathways of DNA repair as potential drug targets for cancer therapy has taken hold and has become a very active area of basic and translational research.
SYNTHETIC LETHALITY = inhibition/loss of BOTH genes (not just one) leads to cell death.
Although the concept of targeting cellular mechanisms that maintain genomic stability might seem counterintuitive, the approach is based on the observation that cancer cells often inactivate some of their own DNA repair mechanisms, to acquire a so-called mutator phenotype, and become overly reliant on a smaller set of repair pathways. By hitting such pathways, it is possible to induce cancer cell death, while causing only minor harm to healthy cells, which have a full range of repair mechanisms still at their disposal.
Example = Olaparib for breast and ovarian cancer, the first of a new class of molecules known as PARP inhibitors, which are highly effective in cancer patients with defective BRCA2 genes.
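The logic of synthetic lethality can be written out as a small truth table. The Python sketch below is a deliberately simplified toy model, assuming that a cell needs at least one of two repair routes (HR, which depends on BRCA2, and PARP-dependent repair) to survive; the function name and the all-or-nothing rule are illustrative assumptions, not a description of real cell biology.

```python
from itertools import product


def cell_survives(hr_works: bool, parp_repair_works: bool) -> bool:
    """Toy rule (assumption): either repair route alone is enough for survival."""
    return hr_works or parp_repair_works


# Enumerate all four combinations of pathway status.
for hr_works, parp_repair_works in product([True, False], repeat=2):
    hr_label = "intact" if hr_works else "lost"
    parp_label = "intact" if parp_repair_works else "inhibited"
    outcome = "survives" if cell_survives(hr_works, parp_repair_works) else "dies"
    print(f"HR {hr_label:6} | PARP-dependent repair {parp_label:9} -> cell {outcome}")
```

In this toy model, a healthy cell treated with a PARP inhibitor still has HR (row "HR intact, PARP inhibited") and survives, whereas a BRCA2-defective tumour cell treated with the same drug has lost both routes and dies.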