Principles and Techniques of Biochemistry and Molecular Biology

Molecular biology, bioinformatics and basic techniques

5.6 THE MANIPULATION OF NUCLEIC ACIDS – BASIC TOOLS AND TECHNIQUES

5.6.1 Enzymes used in molecular biology

The discovery and characterisation of a number of key enzymes has enabled the development of various techniques for the analysis and manipulation of DNA. In particular, the enzymes termed type II restriction endonucleases have come to play a key role in all aspects of molecular biology. These enzymes recognise certain DNA sequences, usually 4–6 bp in length, and cleave them in a defined manner. The sequences recognised are palindromic, or of an inverted repeat nature; that is, they read the same 5′ to 3′ on each strand. When cleaved they leave a flush-ended or a staggered (also termed cohesive-ended) fragment, depending on the particular enzyme used (Fig. 5.20). An important property of staggered ends is that those produced from different molecules by the same enzyme are complementary (or 'sticky') and so will anneal to each other. The annealed strands are held together only by hydrogen bonding between complementary bases on opposite strands. Covalent joining of the ends on each of the two strands may be brought about by the enzyme DNA ligase (Section 6.2.2). This is widely exploited in molecular biology to enable the construction of recombinant DNA, i.e. the joining of DNA fragments from different sources.

Approximately 500 restriction enzymes have been characterised that recognise over 100 different target sequences. A number of these, termed isoschizomers, recognise the same target sequence as one another, whereas others (sometimes termed isocaudomers) recognise different target sequences but produce the same staggered ends or overhangs. A number of other enzymes have proved to be of value in the manipulation of DNA, as summarised in Table 5.3, and are indicated at appropriate points within the text.

Fig. 5.20 Recognition sequences of some restriction enzymes showing (a) full descriptions and (b) conventional representations. Arrows indicate positions of cleavage. Note that all the information in (a) can be derived from knowledge of a single strand of the DNA; in (b) only one strand is shown, drawn 5′ to 3′, which is the conventional way of representing restriction sites.

(a)
Enzyme     Recognition sequence     Products
Hpa II     5′–CCGG–3′               5′–C      CGG–3′
           3′–GGCC–5′               3′–GGC      C–5′
Hae III    5′–GGCC–3′               5′–GG     CC–3′
           3′–CCGG–5′               3′–CC     GG–5′
BamHI      5′–GGATCC–3′             5′–G      GATCC–3′
           3′–CCTAGG–5′             3′–CCTAG      G–5′
Hpa I      5′–GTTAAC–3′             5′–GTT    AAC–3′
           3′–CAATTG–5′             3′–CAA    TTG–5′

(b)
EcoRI      GAATTC
HindIII    AAGCTT
PvuII      CAGCTG
BamHI      GGATCC

Table 5.3 Types and examples of typical enzymes used in the manipulation of nucleic acids

Enzyme class; Specific example; Use in nucleic acid manipulation
DNA polymerases; DNA pol I; DNA-dependent DNA polymerase with 5′→3′ and 3′→5′ exonuclease activity
DNA polymerases; Klenow; DNA pol I fragment that lacks 5′→3′ exonuclease activity
DNA polymerases; T4 DNA pol; Lacks 5′→3′ exonuclease activity
DNA polymerases; Taq DNA pol; Thermostable DNA polymerase used in PCR
DNA polymerases; Tth DNA pol; Thermostable DNA polymerase with RT activity
DNA polymerases; T7 DNA pol; Used in DNA sequencing
RNA polymerases; T7 RNA pol; DNA-dependent RNA polymerase
RNA polymerases; T3 RNA pol; DNA-dependent RNA polymerase
RNA polymerases; Qβ replicase; RNA-dependent RNA polymerase, used in RNA amplification
Nucleases; DNase I; Non-specific endonuclease that cleaves DNA
Nucleases; Exonuclease III; DNA-dependent 3′→5′ stepwise removal of nucleotides
Nucleases; RNase A; RNases used in mapping studies
Nucleases; RNase H; Used in second strand cDNA synthesis
Nucleases; S1 nuclease; Single-strand-specific nuclease
Reverse transcriptase; AMV-RT; RNA-dependent DNA polymerase, used in cDNA synthesis
Transferases; Terminal transferase (TdT); Adds homopolymer tails to the 3′ end of DNA
Ligases; T4 DNA ligase; Links 5′-phosphate and 3′-hydroxyl ends via phosphodiester bond
Kinases; T4 polynucleotide kinase (PNK); Transfers terminal phosphate groups from ATP to 5′-OH groups
Phosphatases; Alkaline phosphatase; Removes 5′-phosphates from DNA and RNA
Methylases; EcoRI methylase; Methylates specific residues and protects from cleavage by restriction enzymes

Notes: PCR, polymerase chain reaction; RT, reverse transcriptase; cDNA, complementary DNA; AMV, avian myeloblastosis virus.

5.7 ISOLATION AND SEPARATION OF NUCLEIC ACIDS

5.7.1 Isolation of DNA

The use of DNA for analysis or manipulation usually requires that it is isolated and purified to a certain extent. DNA is recovered from cells by the gentlest possible method of cell rupture, to prevent the DNA from fragmenting by mechanical shearing. This is usually carried out in the presence of EDTA, which chelates the Mg2+ ions needed by the enzymes that degrade DNA (DNases). Ideally, cell walls, if present, should be digested enzymatically (e.g. lysozyme treatment of bacteria), and the cell membrane should be solubilised using detergent. If physical disruption is necessary, it should be kept to a minimum, and should involve cutting or squashing of cells, rather than the use of shear forces. Cell disruption (and most subsequent steps) should be performed at 4 °C, using glassware and solutions that have been autoclaved to destroy DNase activity.

After release of nucleic acids from the cells, RNA can be removed by treatment with ribonuclease (RNase) that has been heat-treated to inactivate any DNase contaminants; RNase is relatively stable to heat as a result of its disulphide bonds, which ensure rapid renaturation of the molecule on cooling. The other major contaminant, protein, is removed by shaking the solution gently with water-saturated phenol, or with a phenol/chloroform mixture, either of which will denature proteins but not nucleic acids.
Centrifugation of the emulsion formed by this mixing produces a lower, organic phase, separated from the upper, aqueous phase by an interface of denatured protein. The aqueous solution is recovered and deproteinised repeatedly, until no more material is seen at the interface. Finally, the deproteinised DNA preparation is mixed with two volumes of absolute ethanol, and the DNA is allowed to precipitate out of solution in a freezer. After centrifugation, the DNA pellet is redissolved in a buffer containing EDTA to inactivate any DNases present. This solution can be stored at 4 °C for at least a month. DNA solutions can also be stored frozen, although repeated freezing and thawing tends to damage long DNA molecules by shearing.

The procedure described above is suitable for total cellular DNA. If the DNA from a specific organelle or viral particle is needed, it is best to isolate the organelle or virus before extracting its DNA, since the recovery of a particular type of DNA from a mixture is usually rather difficult. Where a high degree of purity is required, DNA may be subjected to density gradient ultracentrifugation through caesium chloride, which is particularly useful for the preparation of plasmid DNA. A flow chart of DNA extraction is shown in Fig. 5.21.

Fig. 5.21 General steps involved in extracting DNA from cells or tissues.
1. Homogenise cells/tissues: 4 °C, sterile equipment
2. Cellular lysis: detergent/lysozyme
3. Chelating agents: EDTA/citrate
4. Proteinase agents: proteinase K
5. Phenol extraction: phenol/chloroform
6. Alcohol precipitation: 70%/100% ethanol
7. Redissolve DNA: TE buffer (Tris-EDTA)

It is possible to check the integrity of the DNA by agarose gel electrophoresis, and to determine the concentration of the DNA by using the fact that, at 260 nm, 1 absorbance unit equates to 50 µg ml⁻¹ of DNA, and so:

50 × A260 = concentration of DNA sample (µg ml⁻¹)

Contaminants may also be identified by scanning UV spectrophotometry from 200 nm to 300 nm.
A ratio of absorbance at 260 nm : 280 nm of approximately 1.8 indicates that the sample is free of protein contamination, which absorbs strongly at 280 nm.

5.7.2 Isolation of RNA

The methods used for RNA isolation are very similar to those described above for DNA; however, RNA molecules are relatively short, and therefore less easily damaged by shearing, so cell disruption can be rather more vigorous. RNA is, however, very vulnerable to digestion by RNases, which are present endogenously in various concentrations in certain cell types and exogenously on fingers. Gloves should therefore be worn, and a strong detergent should be included in the isolation medium to denature any RNases immediately. Subsequent deproteinisation should be particularly rigorous, since RNA is often tightly associated with proteins. DNase treatment can be used to remove DNA, and RNA can be precipitated by ethanol. One reagent in particular that is commonly used in RNA extraction is guanidinium thiocyanate, which is both a strong inhibitor of RNase and a protein denaturant. A flow chart of RNA extraction is shown in Fig. 5.22.

Fig. 5.22 General steps involved in extracting RNA from cells or tissues.
1. Treat reagents: treat with RNase inhibitors, e.g. diethylpyrocarbonate (DEPC)
2. Homogenise cells/tissues: 4 °C, treated reagents
3. Cellular lysis: detergent/lysozyme
4. RNA solvents: guanidinium thiocyanate
5. Proteinase agents: proteinase K
6. Phenol extraction: phenol/chloroform
7. Alcohol precipitation: 70%/100% ethanol
8. Redissolve RNA

It is possible to check the integrity of an RNA extract by analysing it by agarose gel electrophoresis. The most abundant RNA species, the rRNA molecules (23S and 16S for prokaryotes, 18S and 28S for eukaryotes), appear as discrete bands on the agarose gel, and their presence indicates that the other RNA components are likely to be intact. This is usually carried out under denaturing conditions to prevent secondary structure formation in the RNA.
The concentration of the RNA may be estimated by using UV spectrophotometry. At 260 nm, 1 absorbance unit equates to 40 µg ml⁻¹ of RNA, and therefore:

40 × A260 = concentration of RNA sample (µg ml⁻¹)

Contaminants may also be identified in the same way as for DNA, by scanning UV spectrophotometry; however, in the case of RNA a 260 nm : 280 nm ratio of approximately 2 would be expected for a sample containing no protein (Section 5.8.1).

In many cases it is desirable to isolate eukaryotic mRNA, which constitutes only 2–5% of cellular RNA, from a mixture of total RNA molecules. This may be carried out by affinity chromatography on oligo(dT)-cellulose columns. At high salt concentrations, the mRNA containing poly(A) tails binds to the complementary oligo(dT) molecules of the affinity column, and so mRNA will be retained; all other RNA molecules can be washed through the column by further high salt solution. Finally, the bound mRNA can be eluted using a low concentration of salt (Fig. 5.23).

Fig. 5.23 Affinity chromatography of poly(A)+ RNA. Cellular mRNA (transcripts of heterogeneous size) is applied to a poly(dT) affinity column; poly(A)+ RNA binds to the poly(dT); non-poly(A)+ RNA and DNA are washed through the column in high salt concentrations; poly(A)+ RNA is then eluted by changing to low salt concentrations.

Nucleic acid species may also be subfractionated by more physical means, such as electrophoretic or chromatographic separations based on differences in nucleic acid fragment sizes or physicochemical characteristics. Nanodrop spectrophotometer systems have also aided the analysis of nucleic acids in recent years, providing the full absorbance spectrum whilst requiring only a very small (microlitre) sample volume.
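The two conversion factors and the purity ratios quoted above lend themselves to a small calculation. The following is a minimal Python sketch, not a validated laboratory tool: the function names, the `dilution_factor` parameter and the `tolerance` used in the purity check are illustrative assumptions, while the factors (50 and 40 µg ml⁻¹ per absorbance unit) and the expected ratios (about 1.8 for DNA, about 2 for RNA) are those given in the text.

```python
# Concentration (ug/ml) corresponding to 1.0 absorbance unit at 260 nm,
# as quoted in the text for DNA and RNA.
UG_PER_ML_PER_A260 = {"DNA": 50.0, "RNA": 40.0}

# Approximate A260:A280 ratios the text gives for protein-free samples.
EXPECTED_RATIO = {"DNA": 1.8, "RNA": 2.0}


def concentration_ug_per_ml(a260, kind="DNA", dilution_factor=1.0):
    """Estimate nucleic acid concentration from the absorbance at 260 nm.

    dilution_factor (an assumption of this sketch) scales the result back
    to the undiluted stock if the sample was diluted before reading.
    """
    return UG_PER_ML_PER_A260[kind] * a260 * dilution_factor


def purity_ratio(a260, a280):
    """Return the A260:A280 ratio used to judge protein contamination."""
    if a280 <= 0:
        raise ValueError("A280 must be positive to form a ratio")
    return a260 / a280


def looks_protein_free(a260, a280, kind="DNA", tolerance=0.1):
    """Crude check: ratio within `tolerance` of, or above, the expected value.

    The tolerance is an arbitrary illustrative threshold, not a standard.
    """
    return purity_ratio(a260, a280) >= EXPECTED_RATIO[kind] - tolerance


if __name__ == "__main__":
    # A DNA sample diluted 1 in 20 that reads A260 = 0.25 and A280 = 0.14.
    print(concentration_ug_per_ml(0.25, "DNA", dilution_factor=20))
    print(round(purity_ratio(0.25, 0.14), 2))
    print(looks_protein_free(0.25, 0.14, "DNA"))
```

Keeping the factors in a dictionary keyed by nucleic acid type makes it harder to apply the DNA factor to an RNA reading by mistake.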
5.7.3 Automated and kit-based extraction of nucleic acids

Most of the current reagents used in molecular biology and the most common techniques can now be found in kit form or can be automated, and the extraction of nucleic acids by these means is no exception. The advantage of their use lies in the fact that the reagents are standardised and quality control tested, providing a high degree of reliability. For example, glass bead preparations for DNA purification have been used increasingly and with reliable results. Small compact column-type preparations such as QIAGEN columns are also used extensively in research and in routine DNA analysis. Essentially the same reagents for nucleic acid extraction may be used in a format that allows reliable and automated extraction. This is of particular use where a large number of DNA extractions are required. There are also many kit-based extraction methods for RNA; these in particular have overcome some of the problems of RNA extraction such as RNase contamination.

A number of fully automated nucleic acid extraction machines are now employed in areas where high throughput is required, e.g. clinical diagnostic laboratories. Here the raw samples, such as blood specimens, are placed in 96- or 384-well microtitre plates and these follow a set computer-controlled processing pattern carried out robotically. Thus the samples are rapidly manipulated and extracted in approximately 45 min without any manual operations being undertaken.

5.7.4 Electrophoresis of nucleic acids

Electrophoresis in agarose or polyacrylamide gels is the most usual way to separate DNA molecules according to size. The technique can be used analytically or preparatively, and can be qualitative or quantitative. Large fragments of DNA such as chromosomes may also be separated by a modification of electrophoresis termed pulsed field gel electrophoresis (PFGE).
The easiest and most widely applicable method is electrophoresis in horizontal agarose gels, followed by staining with ethidium bromide. This dye binds to DNA by insertion between stacked base pairs (intercalation), and it exhibits a strong orange/red fluorescence when illuminated with ultraviolet light (Fig. 5.24). Very often electrophoresis is used to check the purity and intactness of a DNA preparation, or to assess the extent of an enzymatic reaction during, for example, the steps involved in the cloning of DNA. For such checks 'minigels' are particularly convenient, since they need little preparation, use small samples and give results quickly. Agarose gels can be used to separate molecules larger than about 100 bp. For higher resolution, or for the effective separation of shorter DNA molecules, polyacrylamide gels are the preferred method.

Fig. 5.24 The use of ethidium bromide to detect DNA. Ethidium bromide intercalates between the planar rings of the DNA double helix. Under ultraviolet irradiation the intercalated ethidium bromide fluoresces and the DNA becomes visible. The figure shows the structure of ethidium bromide and a photograph of an agarose gel stained with ethidium bromide and illuminated with UV irradiation, showing discrete DNA bands.

When electrophoresis is used preparatively, the piece of gel containing the desired DNA fragment is physically removed with a scalpel. The DNA may be recovered from the gel fragment in various ways. These include crushing with a glass rod in a small volume of buffer, using agarase to digest the agarose leaving the DNA, or the process of electroelution. In this last method the piece of gel is sealed in a length of dialysis tubing containing buffer, and is then placed between two electrodes in a tank containing more buffer. Passage of an electrical current between the electrodes causes DNA to migrate out of the gel piece, but it remains trapped within the dialysis tubing, and can therefore be recovered easily.

5.7.5 Automated analysis of nucleic acid fragments

Gel electrophoresis remains the established method for the separation and analysis of nucleic acids. However, a number of automated systems using pre-cast gels and standardised reagents are available and are now very popular. These are especially useful in situations where a large number of samples or high-throughput analysis is required. In addition, technologies such as Agilent's Lab-on-a-chip have been developed that obviate the need to prepare electrophoretic gels. These employ microfluidic circuits constructed on small cassette units that contain interconnected micro-reservoirs. The sample is applied in one area and driven through microchannels under computer-controlled electrophoresis. The channels lead to reservoirs allowing, for example, incubation with other reagents such as dyes for a specified time. Electrophoretic separation is thus carried out in a microscale format. The small sample size minimises sample and reagent consumption and the units, being computer controlled, allow data to be captured within a very short timescale. More recently, alternative methods of analysis, including high performance liquid chromatography based approaches, have gained in popularity, especially for DNA mutation analysis. Mass spectrometry is also becoming increasingly used for nucleic acid analysis.

5.8 MOLECULAR BIOLOGY AND BIOINFORMATICS

5.8.1 Basic bioinformatics

Bioinformatics is now an established and vital resource for molecular biology research and is also a mainstay of routine analysis of DNA. This increase in the use of bioinformatics has been driven by the increase in genetic sequence information and the need to store, analyse and manipulate the data.
There are now a huge number of sequences stored in genetic databases from a variety of organisms, including the human genome. Indeed the genetic information from various organisms is now an indispensable starting point for molecular biology research. The main primary databases include GenBank at the National Institutes of Health (NIH) in the USA, EMBL at the European Bioinformatics Institute (EBI) at Cambridge, UK and the DNA Database of Japan (DDBJ) at Mishima in Japan. These databases contain the nucleotide sequences, which are annotated to allow easy identification. There are also many other databases, such as secondary databases that contain information relating to sequence motifs, for example core sequences found in cytochrome P450 domains or DNA-binding domains. Importantly, all of the databases may be freely accessed over the internet. A number of these important databases and internet resources are listed in Table 5.4. Consequently the new, expanding and exciting areas of bioscience research are those that analyse genome and cDNA sequence databases (genomics) and also their protein counterparts (proteomics). This is sometimes referred to as in silico research.

5.8.2 Analysing information using bioinformatics

One of the most useful bioinformatics resources is termed BLAST (Basic Local Alignment Search Tool), located at the NCBI (www.ncbi.nlm.nih.gov). This allows a DNA sequence to be submitted via the internet in order to compare it to all the sequences contained within a DNA database. This is very useful since, once a nucleotide sequence has been deduced by, for example, Sanger sequencing, it is possible to identify similar sequences. Indeed if human sequences are used and have already been mapped, it is possible to locate their position on a particular chromosome using NCBI Map Viewer. Further resources such as ORF (open reading frame) finder allow a search to be undertaken for open reading frames, i.e. sequences beginning with a start codon (ATG) and continuing with a significant number of 'coding' triplets before a stop codon is reached. There are a number of other sequences that may be used to define coding sequences; these include ribosome binding sites, splice site junctions, poly(A) polymerase sequences and promoter sequences that lie outside the coding regions. A number of bioinformatics resources such as GRAIL can be used to identify such features in a DNA sequence.

Table 5.4 Nucleic acid and protein database resources available on the World Wide Web

Category; Database or resource; Description
General DNA sequence databases; EMBL; European Bioinformatics Institute
General DNA sequence databases; GenBank; US genetic database resource
General DNA sequence databases; DDBJ; Japanese genetic database
Protein sequence databases; Swiss-Prot; European protein sequence database
Protein sequence databases; UniProt TrEMBL; European protein sequence database
Protein structure databases; PDB; Protein structure database
Genome project databases; Human Genome Database, USA
Genome project databases; dbEST; cDNA and partial sequences
Genome project databases; Généthon; Genetic maps based on repeat markers

5.9 MOLECULAR ANALYSIS OF NUCLEIC ACID SEQUENCES

5.9.1 Restriction mapping of DNA fragments

Restriction mapping involves the size analysis of restriction fragments produced by several restriction enzymes individually and in combination (Section 5.6.1). The principle of this mapping is illustrated in Fig. 5.25, in which the restriction sites of two enzymes, A and B, are being mapped. Cleavage with A gives fragments of 2 and 7 kb from a 9 kb molecule, hence we can position the single A site 2 kb from one end. Similarly, B gives fragments of 3 and 6 kb, so it has a single site 3 kb from one end; but it is not possible at this stage to say whether it is near to A's site, or at the opposite end of the DNA. This can be resolved by a double digestion.
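The reasoning behind the double digestion can be made concrete by predicting the fragments for each possible arrangement of the two sites. The following Python sketch is an illustration only, limited to the case described here (a linear molecule cut once by each enzyme); the function names and the labels "same end" and "opposite ends" are assumptions introduced for the example.

```python
def fragments(length_kb, cut_positions):
    """Sorted fragment sizes produced by cutting a linear molecule.

    cut_positions are distances (kb) from the left-hand end.
    """
    edges = [0] + sorted(cut_positions) + [length_kb]
    return sorted(right - left for left, right in zip(edges, edges[1:]))


def candidate_maps(length_kb, single_a, single_b):
    """Enumerate the two arrangements consistent with the single digests.

    Each single digest only fixes a site relative to one (unknown) end,
    so A's site is placed from the left by convention and B's site is
    tried from the same end and from the opposite end.
    """
    a_site = min(single_a)
    b_offset = min(single_b)
    return {
        "same end": (a_site, b_offset),
        "opposite ends": (a_site, length_kb - b_offset),
    }


def predict_double_digests(length_kb, single_a, single_b):
    """Predicted double-digest fragment sizes for each candidate map."""
    maps = candidate_maps(length_kb, single_a, single_b)
    return {name: fragments(length_kb, sites) for name, sites in maps.items()}


def interpret(length_kb, single_a, single_b, observed_double):
    """Return the candidate arrangement(s) matching the observed fragments."""
    observed = sorted(observed_double)
    predictions = predict_double_digests(length_kb, single_a, single_b)
    return [name for name, sizes in predictions.items() if sizes == observed]


if __name__ == "__main__":
    # The 9 kb example: enzyme A gives 2 + 7 kb, enzyme B gives 3 + 6 kb.
    print(predict_double_digests(9, [2, 7], [3, 6]))
    print(interpret(9, [2, 7], [3, 6], [2, 3, 4]))
```

Real fragment sizes are measured with error, so a practical version would compare sizes within a tolerance rather than by exact equality.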
If the resultant fragments are 2, 3 and 4 kb, then A and B cut at opposite ends of the molecule; if they are 1, 2 and 6 kb, the sites are near each other. Not surprisingly, the mapping of real molecules is rarely as simple as this, and bioinformatic analysis of the restriction fragment lengths is usually needed to construct a map.

Fig. 5.25 Restriction mapping of DNA. Note that each experimental result and its interpretation should be considered in sequence, thus building up an increasingly unambiguous map.

Treatment; Measured sizes of fragments (kb); Interpretation
No digestion; 9; intact 9 kb molecule
Enzyme A; 2 + 7; single A site 2 kb from one end
Enzyme B; 3 + 6; single B site 3 kb from one end, either the same end as A or the opposite end
Enzymes A + B; 2, 3 + 4; A and B sites at opposite ends of the molecule
Enzymes A + B (alternative result); 1, 2 + 6; A and B sites near each other

5.9.2 Nucleic acid blotting methods

Electrophoresis of DNA restriction fragments allows separation based on size to be carried out; however, it provides no indication as to the presence of a specific, desired fragment among the complex sample. This can be achieved by transferring the DNA from the intact gel onto a piece of nitrocellulose or nylon membrane placed in contact with it. This also provides a more permanent record of the sample, since DNA begins to diffuse out of a gel that is left for a few hours. First the gel is soaked in alkali to render the DNA single stranded. The DNA is then transferred to the membrane so that it becomes bound to it in exactly the same pattern as that originally on the gel. This transfer, named a Southern blot after its inventor Ed Southern, can be performed electrophoretically or by drawing large volumes of buffer through both gel and membrane, thus transferring DNA from one to the other by capillary action (Fig. 5.26). The point of this operation is that the membrane can now be treated with a labelled DNA molecule, for example a gene probe (Section 5.9.4).
This single-stranded DNA probe will hybridise under the right conditions to complementary fragments immobilised onto the membrane. The conditions of hybridisation, including the temperature and salt concentration, are critical for this process to take place effectively. This is usually referred to as the stringency of the hybridisation and it is particular for each individual gene probe and for each sample of DNA. A series of washing steps with buffer is then carried out to remove any unbound probe and the membrane is developed, after which the precise location of the probe and its target may be visualised. It is also possible to analyse DNA from different species or organisms by blotting the DNA and then using a gene probe representing a protein or enzyme from one of the organisms. In this way it is possible to search for related genes in different species. This technique is generally termed zoo blotting.

Fig. 5.26 Southern blot apparatus. Labelled components: weight, absorbent tissue, chromatography paper, nylon or nitrocellulose membrane, gel, chromatography paper and buffer.

The same basic process of nucleic acid blotting can be used to transfer RNA from gels onto similar membranes. This allows the identification of specific mRNA sequences of a defined length by hybridisation to a labelled gene probe and is known as Northern blotting. With this technique it is possible not only to detect specific mRNA molecules but also to quantify the relative amounts of the specific mRNA. It is usual to separate the mRNA transcripts by gel electrophoresis under denaturing conditions, since this improves resolution and allows a more accurate estimation of the sizes of the transcripts (Section 5.7.2). The format of the blotting may be altered from transfer from a gel to direct application to slots on a specific blotting apparatus containing the nylon membrane.
This is termed slot or dot blotting and provides a convenient means of measuring the abundance of specific mRNA transcripts without the need for gel electrophoresis; it does not, however, provide information regarding the size of the fragments.

5.9.3 Design and production of gene probes

The availability of a gene probe is essential in many molecular biology techniques, yet obtaining one is in many cases one of the most difficult steps. The information needed to produce a gene probe may come from many sources; however, the availability of bioinformatics resources and genetic databases has ensured that this is the usual starting point for gene probe design. In some cases it is possible to use related genes, that is, genes from the same gene family, to gain information on the most useful DNA sequence to use as a probe. Similar proteins or DNA sequences from different species may also provide a starting point with which to produce a so-called heterologous gene probe. Although in some cases probes have already been produced and cloned, it is possible, armed with a DNA sequence from a DNA database, to chemically synthesise a single-stranded oligonucleotide probe. This is usually undertaken by computer-controlled gene synthesisers, which couple nucleotide monomers together one at a time according to the desired sequence. It is essential to carry out certain checks before probe production, to confirm that the probe sequence is unique and that it is not self-complementary and so able to self-anneal; failure on either count may compromise its use.

Fig. 5.27 Oligonucleotide probes. Note that only methionine and tryptophan have unique codons. It is impossible to predict which of the indicated codons for phenylalanine, proline and histidine will be present in the gene to be probed, so all possible combinations must be synthesised (16 in the example shown).

Polypeptide:                          Phe       Met    Pro           Trp    His
Corresponding nucleotide sequences:   5′ TT(T/C)  ATG    CC(T/C/A/G)   TGG    CA(T/C) 3′
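The count of 16 in Fig. 5.27 follows from multiplying the number of codons available at each position (2 × 1 × 4 × 1 × 2). The Python sketch below enumerates the coding-strand sequences for that five-residue example. It is a minimal illustration: the `CODONS` table covers only the five amino acids in the figure, taken from the standard genetic code, and the function names are assumptions made for the example.

```python
from itertools import product

# Codons (standard genetic code, DNA coding strand) for the five residues
# shown in Fig. 5.27 only; a full table would cover all 20 amino acids.
CODONS = {
    "Phe": ["TTT", "TTC"],
    "Met": ["ATG"],
    "Pro": ["CCT", "CCC", "CCA", "CCG"],
    "Trp": ["TGG"],
    "His": ["CAT", "CAC"],
}


def degeneracy(peptide):
    """Number of distinct nucleotide sequences that could encode the peptide."""
    count = 1
    for residue in peptide:
        count *= len(CODONS[residue])
    return count


def degenerate_probes(peptide):
    """List every coding-strand sequence that could encode the peptide."""
    choices = [CODONS[residue] for residue in peptide]
    return ["".join(codons) for codons in product(*choices)]


if __name__ == "__main__":
    peptide = ["Phe", "Met", "Pro", "Trp", "His"]
    probes = degenerate_probes(peptide)
    print(degeneracy(peptide), "possible sequences")
    for sequence in probes:
        print(sequence)
```

Running `degeneracy` on a candidate stretch of protein sequence before synthesis is a quick way to compare regions: a stretch rich in methionine and tryptophan gives a smaller number.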
Where little DNA information is available to prepare a gene probe, it is possible in some cases to use the knowledge gained from analysis of the corresponding protein. Thus it is possible to isolate and purify proteins and sequence part of the N-terminal end or an internal region of the protein. From our knowledge of the genetic code, it is possible to predict the various DNA sequences that could code for the protein, and then synthesise appropriate oligonucleotide sequences chemically. Owing to the degeneracy of the genetic code, most amino acids are coded for by more than one codon; therefore there will be more than one possible nucleotide sequence that could code for a given polypeptide (Fig. 5.27). The longer the polypeptide, the greater the number of possible oligonucleotides that must be synthesised. Fortunately, there is no need to synthesise a sequence longer than about 20 bases, since this should hybridise efficiently with any complementary sequences, and should be specific for one gene. Ideally, a section of the protein should be chosen which contains as many tryptophan and methionine residues as possible, since these have unique codons, and there will therefore be fewer possible base sequences that could code for that part of the protein. The synthetic oligonucleotides can then be used as probes in a number of molecular biology methods.

5.9.4 Labelling DNA gene probe molecules

An essential feature of a gene probe is that it can be visualised or labelled by some means. This allows any complementary sequence to which the probe binds to be flagged up or identified. There are two main types of label used for gene probes: traditionally labelling has been carried out using radioactive labels, but non-radioactive labels are gaining in popularity.

Perhaps the most common radioactive label is phosphorus-32 (32P), although for certain techniques sulphur-35 (35S) and tritium (3H) are used.
These may be detected by the process of autoradiography, where the labelled probe molecule, bound to sample DNA located for example on a nylon membrane, is placed in contact with an X-ray-sensitive film. Following exposure the film is developed and fixed just as a black-and-white negative. The exposed film reveals the precise location of the labelled probe and therefore the DNA to which it has hybridised.

Non-radioactive labels are increasingly being used to label DNA gene probes. Until recently radioactive labels were more sensitive than their non-radioactive counterparts. However, recent developments have led to similar sensitivities which, when combined with their improved safety, have led to the greater acceptance of non-radioactive labels.

The labelling systems are termed either direct or indirect. Direct labelling allows an enzyme reporter such as alkaline phosphatase to be coupled directly to the DNA. Although this may alter the characteristics of the DNA gene probe, it offers the advantage of rapid analysis since no intermediate steps are needed. However, indirect labelling is at present more popular. This relies on the incorporation of a nucleotide which has a label attached. At present three of the main labels in use are biotin, fluorescein and digoxigenin. These molecules are covalently linked to nucleotides using a carbon spacer arm of 7, 14 or 21 atoms. Specific binding proteins may then be used as a bridge between the nucleotide and a reporter protein such as an enzyme. For example, biotin incorporated into a DNA fragment is recognised with a very high affinity by the protein streptavidin. This may be coupled or conjugated to a reporter enzyme molecule such as alkaline phosphatase, which is able to convert a colourless substrate, p-nitrophenyl phosphate (PNPP), into a yellow-coloured compound, p-nitrophenol (PNP), and so also offers a means of signal amplification.
Alternatively, labels such as digoxigenin incorporated into DNA sequences may be detected by monoclonal antibodies, again conjugated to reporter molecules such as alkaline phosphatase. Thus, rather than the detection system relying on autoradiography, which is necessary for radiolabels, a series of reactions takes place that results in colour, light or a chemiluminescent product. This has important practical implications, since autoradiography may take 1–3 days whereas colour and chemiluminescent reactions take minutes.

5.9.5 End labelling of DNA molecules

The simplest form of labelling DNA is by 5′ or 3′ end-labelling. 5′ end-labelling involves a phosphate transfer or exchange reaction, where the 5′ phosphate of the DNA to be used as the probe is removed and in its place a labelled phosphate, usually 32P, is added. This is usually carried out by using two enzymes; the first, alkaline phosphatase, is used to remove the existing phosphate group from the DNA. Following removal of the released phosphate from the DNA, a second enzyme, polynucleotide kinase, is added which catalyses the transfer of a phosphate group (32P-labelled) to the 5′ end of the DNA. The newly labelled probe is then purified, usually by chromatography through a Sephadex column, and may be used directly (Fig. 5.28).

Fig. 5.28 End-labelling of a gene probe at the 5′ end with alkaline phosphatase and polynucleotide kinase.
1. Purify gene probe fragment or synthesise oligonucleotide.
2. Alkaline phosphatase treatment of probe to remove 5′-phosphate.
3. Polynucleotide kinase transfers phosphate group from donor to 5′ end of probe.
4. 5′ end of probe is radiolabelled and gene probe is purified.

Using the other end of the DNA molecule, the 3′ end, is slightly less complex. Here a new dNTP which is labelled (e.g. [α-32P]dATP or biotin-labelled dNTP) is added to the 3′ end of the DNA by the enzyme terminal transferase (Fig. 5.29). Although this is a simpler reaction, a potential problem exists because a new nucleotide is added to the existing sequence and so the complete sequence of the DNA is altered, which may affect its hybridisation to its target sequence. End-labelling methods also suffer from the fact that only one label is added to the DNA, so they are of a lower activity in comparison to methods which incorporate label along the length of the DNA.

Fig. 5.29 End-labelling of a gene probe at the 3′ end using terminal transferase. Note that the addition of a labelled dNTP at the 3′ end alters the sequence of the gene probe.
1. Synthesise oligonucleotide or purify gene probe fragment.
2. Transfer labelled dNTP to the 3′ end using terminal transferase.
3. 3′ end of probe is radiolabelled and gene probe is purified.

5.9.6 Random primer labelling and nick translation

In random primer labelling, the DNA to be labelled is first denatured and then placed under renaturing conditions in the presence of a mixture of many different random sequences of hexamers or hexanucleotides. These hexamers will, by chance, bind to the DNA sample wherever they encounter a complementary sequence, and so the DNA will rapidly acquire an approximately random sprinkling of hexanucleotides annealed to it. Each of the hexamers can act as a primer for the synthesis of a fresh strand of DNA catalysed by DNA polymerase, since it has an exposed 3′ hydroxyl group. The Klenow fragment of DNA polymerase I is used for random primer labelling because it lacks a 5′ to 3′ exonuclease activity. This is prepared by cleavage of DNA polymerase I with subtilisin, giving a large enzyme fragment which has no 5′ to 3′ exonuclease activity, but which still acts as a 5′ to 3′ polymerase. Thus when the Klenow enzyme is mixed with the annealed DNA sample in the presence of dNTPs, including at least one which is labelled, many short stretches of labelled DNA will be generated (Fig. 5.30). In a similar way to random primer labelling, the polymerase chain reaction may also be used to incorporate radioactive or non-radioactive labels (Section 5.11.4).

Fig. 5.30 Random primer gene probe labelling. Random primers are incorporated and used as a start point for Klenow DNA polymerase to synthesise a complementary strand of DNA whilst incorporating a labelled dNTP at complementary sites.
1. Single-stranded DNA probe.
2. Anneal random primers to gene probe.
3. Add DNA polymerase (Klenow) and dNTPs, one of which is labelled.
4. Double-stranded labelled gene probe.

A further traditional method of labelling DNA is by the process of nick translation. Low concentrations of DNase I are used to make occasional single-strand nicks in the double-stranded DNA that is to be used as the gene probe. DNA polymerase I then fills in the nicks, using an appropriate dNTP, at the same time making a new nick to the 3′ side of the previous one (Fig. 5.31). In this way the nick is translated along the DNA. If labelled dNTPs are added to the reaction mixture, they will be used to fill in the nicks, and so the DNA can be labelled to a very high specific activity.

Fig. 5.31 Nick translation. Illustrated on the duplex 5′ GCGTAAG 3′ / 3′ CGCATTC 5′.
1. One strand is nicked and a nucleotide removed by DNase I.
2. The gap is filled by a labelled nucleotide (dCTP) and the next nucleotide is removed by DNA polymerase I.
3. The nick moves from 5′ to 3′ as further nucleotides (dGTP, then dTTP) are incorporated in turn.

5.9.7 Molecular-beacon-based probes

A more recent development in the design of labelled oligonucleotide hybridisation probes is that of molecular beacons. These probes contain a fluorophore at one end of the probe
The removal of nucleotides and their subsequent replacement with labelled nucleotides by DNA polymerase I increase the label in the gene probe as nick translation proceeds. and a quencher molecule at the other. The oligonucleotide has a stem–loop structure where the stems place the ﬂuorophore and quencher in close proximity. The loop structure is designed to be complementary to the target sequence. When the stem–loop structure is formed the ﬂuorophore is quenched by Förster or ﬂuorescence resonance energy transfer (FRET), i.e. the energy is transferred from the ﬂuorophore to the quencher and given off as heat. The elegance of these types of probe lies in the fact that upon hybridisation to a target sequence the stem and loop move apart, the quenching is then lost and emission of light occurs from the ﬂuorophore upon excitation. These types of probe have also been used to detect nucleic acid ampliﬁcation system products such as the polymerase chain reaction (PCR) and have the advantage that it is unnecessary to remove the unhybridised probes. 5.10 THE POLYMERASE CHAIN REACTION (PCR) 5.10.1 Basic concept of the PCR The polymerase chain reaction or PCR is one of the mainstays of molecular biology. One of the reasons for the wide adoption of the PCR is the elegant simplicity of the 179 5.10 The polymerase chain reaction (PCR) Complex genomic ‘template’ DNA Region to be amplified ‘target’ DNA expanded view of DNA region 5 3 3 5 PCR primers designed to each DNA strand that flanks region to be amplified 5 3 3 5 Primer 2 Primer 1 5 3 3 5 Primers are complementary to existing sequences necessitating that some flanking sequence information is known Fig. 5.32 The location of polymerase chain reaction (PCR) primers. PCR primers designed for sequences adjacent to the region to be ampliﬁed allow a region of DNA (e.g. a gene) to be ampliﬁed from a complex starting material of genomic template DNA. reaction and relative ease of the practical manipulation steps. 
Indeed, combined with the relevant bioinformatics resources for primer design and for determination of the required experimental conditions, it provides a rapid means for DNA identification and analysis. It has opened up the investigation of cellular and molecular processes to those outside the field of molecular biology.

The PCR is used to amplify a precise fragment of DNA from a complex mixture of starting material, usually termed the template DNA, and in many cases requires little DNA purification. It does require knowledge of some DNA sequence information which flanks the fragment of DNA to be amplified (the target DNA). From this information two oligonucleotide primers may be chemically synthesised, each complementary to a stretch of DNA to the 3′ side of the target DNA, one oligonucleotide for each of the two DNA strands (Fig. 5.32). The PCR may be thought of as a technique analogous to the DNA replication process that takes place in cells, since the outcome is the same: the generation of new complementary DNA stretches based upon the existing ones. It is also a technique that has replaced, in many cases, the traditional DNA cloning methods, since it fulfils the same function, the production of large amounts of DNA from limited starting material; however, this is achieved in a fraction of the time needed to clone a DNA fragment (Chapter 6). Although not without its drawbacks, the PCR is a remarkable development which is changing the approach of many scientists to the analysis of nucleic acids and continues to have a profound impact on core biosciences and biotechnology.

Fig. 5.33 A simplified scheme of one PCR cycle that involves denaturation, annealing and extension. ds, double-stranded. [Figure: one PCR cycle. Denaturation: ds DNA is denatured by heating to > 94 °C. Annealing: oligonucleotide primers bind to the target sequences. Extension: Taq polymerase extends the target sequences.]
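The way in which two flanking primers define the amplified region can be illustrated with a short sketch. The Python below is a hypothetical illustration and not part of any primer-design package: the function names `reverse_complement` and `amplicon`, the template and both primer sequences are invented for the purpose. It assumes that the template is supplied as its top strand written 5′ to 3′ and that both primers match their sites exactly.

```python
# Illustrative sketch only: function names and sequences are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def amplicon(template, forward, reverse):
    """Return the region of `template` delimited by the two primers.

    `template` is the top strand written 5' to 3'. `forward` anneals to the
    bottom strand, so it reads the same as the top strand; `reverse` anneals
    to the top strand, so its site on the top strand is its reverse complement.
    Returns None if either primer site cannot be found.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    site = template.find(reverse_site, start + len(forward))
    if site == -1:
        return None
    return template[start:site + len(reverse_site)]


template = "TTGACCATGGCTAGCTAGGATCCGATTACAGGCTTAA"
product = amplicon(template, forward="ACCATGG", reverse="CCTGTAA")
print(product, len(product))
```

With these invented sequences the product is 29 bases long and begins and ends with the primer sequences themselves, which is why only the flanking sequence, and not the target itself, needs to be known in advance.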
5.10.2 Stages in the PCR

The PCR consists of three defined sets of times and temperatures termed steps: (i) denaturation, (ii) annealing and (iii) extension. Together the three steps make up one cycle, and the cycle is repeated 30–40 times (Fig. 5.33). In the first cycle the double-stranded template DNA is (i) denatured by heating the reaction to above 90 °C. Within the complex DNA the region to be specifically amplified (the target) is thereby made accessible. The temperature is then lowered to 40–60 °C for (ii) the annealing step. The precise temperature is critical and each PCR system has to be defined and optimised. One useful technique for optimisation is touchdown PCR, in which a programmable cycler is used to decrease the annealing temperature incrementally until the optimum is reached. Reactions that are not optimised may give rise to other DNA products in addition to the specific target or may not produce any amplified products at all. The annealing step allows the two oligonucleotide primers, which are present in excess, to hybridise to their complementary sites that flank the target DNA. The annealed oligonucleotides act as primers for DNA synthesis, since they provide a free 3′ hydroxyl group for DNA polymerase. The DNA synthesis step is termed (iii) extension and is carried out by a thermostable DNA polymerase, most commonly Taq DNA polymerase.

DNA synthesis proceeds from both of the primers until the new strands have been extended along and beyond the target DNA to be amplified. It is important to note that, since the new strands extend beyond the target DNA, they will contain a region near their 3′ ends that is complementary to the other primer. Thus, if another round of DNA synthesis is allowed to take place, not only the original strands will be used as templates but also the new strands. Most interestingly, the products obtained from the new strands will have a precise length, delimited exactly by the two regions complementary to the primers.
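The appearance of products of precise length can be followed by simple bookkeeping. The sketch below is an idealised illustration, not a model of a real reaction: it assumes that every strand present is copied once in every cycle, and the function name `strand_counts` and its three strand classes are labels introduced here for convenience.

```python
# Illustrative sketch only: assumes every template is copied in every cycle.
def strand_counts(cycles, duplexes=1):
    """Count single strands of each class after each PCR cycle.

    original: strands of the starting template
    long:     copies of original strands (primer at the 5' end, undefined 3' end)
    short:    strands delimited at both ends by the primer sites
    """
    original = 2 * duplexes
    long_products = 0
    short_products = 0
    history = []
    for _ in range(cycles):
        # Copying an original strand gives a long product; copying a long or
        # short product gives a strand that stops at the template's primer end.
        new_long = original
        new_short = long_products + short_products
        long_products += new_long
        short_products += new_short
        history.append((original, long_products, short_products))
    return history


for cycle, counts in enumerate(strand_counts(5), start=1):
    print(cycle, counts)
```

Under these assumptions the long products increase by only two strands per cycle, whereas the strands of defined length first appear in the second cycle and thereafter come to dominate the reaction: after three cycles a single starting duplex has given 2 original, 6 long and 8 defined-length strands.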
As the system is taken through successive cycles of denaturation, annealing and extension, all the new strands will act as templates and so there will be an exponential increase in the amount of DNA produced. The net effect is to selectively amplify the target DNA and the primer regions flanking it (Fig. 5.34).

One problem with early PCRs was that the temperature needed to denature the DNA also denatured the DNA polymerase. However, the availability of a thermostable DNA polymerase isolated from the thermophilic bacterium Thermus aquaticus, found in hot springs, provided the means to automate the reaction. Taq DNA polymerase has a temperature optimum of 72 °C and survives prolonged exposure to temperatures as high as 96 °C, and so is still active after each of the denaturation steps. The widespread utility of the technique is also due to the ability to automate the reaction, and many thermal cyclers have been produced in which it is possible to program the temperatures and times for a particular PCR.

5.10.3 PCR primer design and bioinformatics

The specificity of the PCR lies in the design of the two oligonucleotide primers. These must not only be complementary to sequences flanking the target DNA but also must not be self-complementary or bind each other to form dimers, since both prevent DNA amplification. They also have to be matched in their GC content and have similar annealing temperatures. The increasing use of bioinformatics resources such as Oligo, Generunner and Genefisher in the design of primers makes the design and the selection of reaction conditions much more straightforward. These resources allow the sequences to be amplified, primer length, product size, GC content, etc. to be input and, following analysis, provide a choice of matched primer sequences. Indeed, the initial selection and design of primers without the aid of bioinformatics would now be unnecessarily time-consuming.
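The kinds of check that primer-design software applies can be sketched in a few lines. The Python below is a minimal illustration and does not reproduce the method used by Oligo, Generunner or Genefisher. It assumes primers written 5′ to 3′ in upper-case A, C, G and T; it estimates the melting temperature with the Wallace rule of thumb, 2(A+T) + 4(G+C) °C, which is intended only for short oligonucleotides; and the function names and the thresholds (`max_tm_gap`, `max_gc_gap`, `tail`) are arbitrary choices made here.

```python
# Illustrative sketch only: thresholds and the Wallace rule are assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def gc_content(primer):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in primer) / len(primer)


def wallace_tm(primer):
    """Rule-of-thumb melting temperature in degrees C: 2(A+T) + 4(G+C)."""
    gc = sum(base in "GC" for base in primer)
    return 2 * (len(primer) - gc) + 4 * gc


def can_prime_on(primer, other, tail=4):
    """True if the last `tail` bases at the 3' end of `primer` can pair with `other`."""
    return reverse_complement(primer[-tail:]) in other


def check_pair(forward, reverse, max_tm_gap=5, max_gc_gap=0.1):
    """Return a list of warnings for a primer pair (empty if none)."""
    warnings = []
    if abs(wallace_tm(forward) - wallace_tm(reverse)) > max_tm_gap:
        warnings.append("melting temperatures poorly matched")
    if abs(gc_content(forward) - gc_content(reverse)) > max_gc_gap:
        warnings.append("GC content poorly matched")
    for name, primer in (("forward", forward), ("reverse", reverse)):
        if can_prime_on(primer, primer):
            warnings.append(name + " primer may self-anneal")
    if can_prime_on(forward, reverse) or can_prime_on(reverse, forward):
        warnings.append("primers may form a dimer")
    return warnings


print(check_pair("ACACACACACACACAATT", "ACACACACACACACACAC"))
```

Only complementarity at the 3′ end is tested, on the reasoning that a paired 3′ end is the one the polymerase can extend; dedicated software examines many more properties than this.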
It is also possible to design primers with additional sequences at their 5′ end, such as restriction endonuclease target sites or promoter sequences. However, modifications such as these require that the annealing conditions be altered to compensate for the areas of non-homology in the primers. A number of PCR methods have been developed in which either one or both of the primers are random. This gives rise to arbitrary priming in genomic templates but, interestingly, may give rise to discrete banding patterns when analysed by gel electrophoresis. In many cases this technique may be used reproducibly to identify a particular organism or species. This is sometimes referred to as random amplified polymorphic DNA (RAPD) and has been used successfully in the detection and differentiation of a number of pathogenic strains of bacteria. In addition, primers can now be synthesised with a variety of labels, such as fluorophores, bound to them, allowing easier detection and quantitation using techniques such as qPCR (Section 5.10.7).

Fig. 5.34 Three cycles in the PCR. As the number of cycles in the PCR increases, the DNA strands that are synthesised and become available as templates are delimited by the ends of the primers. Thus specific amplification of the desired target sequence flanked by the primers is achieved. Primers are denoted as 5′ to 3′. [Figure: the strands present after cycles 1, 2 and 3.]

5.10.4 PCR amplification templates

DNA from a variety of sources may be used as the initial source of amplification templates. The PCR is also a highly sensitive technique and requires only one or two molecules for successful amplification. Unlike many manipulation methods used in current molecular biology, the PCR is sensitive enough to require very little template preparation.
The extraction from many prokaryotic and eukaryotic cells may involve a simple boiling step. Indeed, the components of many extraction techniques, such as SDS and proteinase K, may adversely affect the PCR. The PCR may also be used to amplify RNA, a process termed RT–PCR (reverse transcriptase–PCR). Initially a reverse transcription reaction, which converts the RNA to cDNA, is carried out (Section 6.2.5). This reaction normally involves the use of the enzyme reverse transcriptase, although some thermostable DNA polymerases used in the PCR, such as Tth, have a reverse transcriptase activity under certain buffer conditions. This allows mRNA transcription products to be effectively analysed. It may also be used to differentiate latent viruses (detected by standard PCR) from active viruses, which replicate and so produce transcription products that are detectable by RT–PCR (Fig. 5.35). In addition, the PCR may be extended to determine relative amounts of a transcription product.

5.10.5 Sensitivity of the PCR

The enormous sensitivity of the PCR system is also one of its main drawbacks, since the very large degree of amplification makes the system vulnerable to contamination. Even a trace of foreign DNA, such as that contained in dust particles, may be amplified to significant levels and may give misleading results. Hence cleanliness is paramount when carrying out PCR, and dedicated equipment and in some cases dedicated laboratories are used. It is possible that amplified products may also contaminate the PCR, although this may be overcome by UV irradiation to damage already amplified products so that they cannot be used as templates. A further interesting solution is to incorporate uracil into the PCR products and then treat subsequent reaction mixtures with the enzyme uracil N-glycosylase (UNG), which degrades any carried-over amplicons containing uracil, rendering them useless as templates. In addition, most PCRs are now undertaken using hot start.
Here the reaction mixture is physically separated from the template or the enzyme; mixing occurs only when the reaction begins, which avoids any mispriming that might otherwise have arisen.

Fig. 5.35 Reverse transcriptase–PCR (RT–PCR): mRNA is converted to complementary DNA (cDNA) using the enzyme reverse transcriptase. The cDNA is then used directly in the PCR. [Figure: poly(A)+ RNA is extracted; a poly(dT) primer is annealed to the poly(A) tail; the primer is extended with reverse transcriptase and dNTPs to form cDNA; the cDNA is used directly in the PCR.]

5.10.6 Applications of the PCR

Many traditional methods in molecular biology have now been superseded by the PCR and the applications for the technique appear to be unlimited. Some of the main techniques derived from the PCR are introduced in Chapter 6, while some of the main areas in which the PCR has been put to use are summarised in Table 5.5. The success of the PCR process has given impetus to the development of other amplification techniques that are based on either thermal cycling or non-thermal cycling (isothermal) methods. The most popular alternative to the PCR is termed the ligase chain reaction or LCR. This operates in a similar fashion to the PCR but a thermostable DNA ligase joins together sets of primers which are complementary to the target DNA. Following this a similar exponential amplification reaction takes place, producing amounts of DNA that are similar to the PCR. A number of alternative amplification techniques are listed in Table 5.6.

Table 5.5 Selected applications of the PCR. A number of the techniques are described in the text of Chapters 5 and 6

Field or area of study | Application | Specific examples or uses
General molecular biology | DNA amplification | Screening gene libraries
Gene probe production | Production/labelling | Use with blots/hybridisations
RNA analysis | RT–PCR | Active/latent viral infections
Forensic science | Scenes of crime | Analysis of DNA from blood
Infection/disease monitoring | Microbial detection | Strain typing/analysis, RAPDs
Sequence analysis | DNA sequencing | Rapid sequencing possible
Genome mapping studies | Referencing points in genome | Sequence-tagged sites (STS)
Gene discovery | mRNA analysis | Expressed sequence tags (EST)
Genetic mutation analysis | Detection of known mutations | Screening for cystic fibrosis
Quantification analysis | Quantitative PCR | 5′ nuclease (TaqMan) assay
Genetic mutation analysis | Detection of unknown mutations | Gel-based PCR methods (DGGE)
Protein engineering | Production of novel proteins | PCR mutagenesis
Molecular archaeology | Retrospective studies | Dinosaur DNA analysis
Single-cell analysis | Sexing or cell mutation sites | Sex determination of unborn
In situ analysis | Studies on frozen sections | Localisation of DNA/RNA

Notes: RT, reverse transcriptase; RAPDs, random amplified polymorphic DNA; DGGE, denaturing gradient gel electrophoresis.

5.10.7 Quantitative PCR (qPCR)

One of the most useful PCR applications is quantitative PCR or qPCR. This allows the PCR to be used as a means of identifying the initial concentrations of DNA or cDNA template used. Early qPCR methods involved the comparison of a standard or control DNA template amplified with separate primers at the same time as the specific target DNA. However, these types of quantitation rely on the fact that all the reactions are identical, and so any factors affecting this may also affect the result. The introduction of thermal cyclers that incorporate the ability to detect the accumulation of DNA through fluorescent dyes binding to the DNA has rapidly transformed this area. In its simplest form a PCR is set up that includes a DNA-binding cyanine dye such as SYBR green.
This dye binds to the minor groove of double-stranded DNA but not to single-stranded DNA, and so as amplicons accumulate during the PCR process SYBR green binds the double-stranded DNA proportionally and the fluorescence emission of the dye can be detected following excitation. Thus the accumulation of DNA amplicons can be followed in real time during the reaction run. In order to quantitate unknown DNA templates, a standard dilution series is prepared using DNA of known concentration. As the DNA accumulates during the early exponential phase of the reaction, an arbitrary fluorescence threshold is chosen and the cycle at which each of the diluted DNA samples crosses it is recorded. This is termed the crossing threshold or Ct value. From the various Ct values a log graph is prepared from which an unknown concentration can be deduced.

Table 5.6 Selected alternative amplification techniques to the PCR. Two broad methodologies exist that either amplify the target molecules such as DNA and RNA or detect the target and amplify a signal molecule bound to it

Technique | Type of assay | Specific examples or uses
Target amplification methods | |
Ligase chain reaction (LCR) | Non-isothermal, employs thermostable DNA ligase | Mutation detection
Nucleic acid sequence based amplification (NASBA) | Isothermal, involving use of RNA, RNase H/reverse transcriptase, and T7 RNA polymerase | Viral detection, e.g. HIV
Signal amplification methods | |
Branched DNA amplification (b-DNA) | Isothermal microwell format using hybridisation or target/capture probe and signal amplification | Mutation detection

Note: HIV, human immunodeficiency virus.

Since SYBR green and similar DNA-binding dyes are non-specific, most qPCR cyclers have a built-in melting curve function in order to determine whether a correctly sized PCR product is present. This gradually increases the temperature of each tube until the double-stranded PCR product denatures or melts and allows a precise although not definitive determination of the product.
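The use of a dilution series to deduce an unknown concentration can be sketched as a straight-line fit of Ct against the logarithm of the starting amount. The Python below is a minimal illustration under stated assumptions: the concentrations and Ct values are invented, the function names are hypothetical, and an ordinary least-squares line is assumed to describe the standards adequately. The efficiency estimate uses the commonly quoted relation E = 10^(−1/slope) − 1, for which a slope of about −3.3 corresponds to a doubling in each cycle.

```python
import math


# Illustrative sketch only: the standards below are invented data.
def fit_standard_curve(concentrations, ct_values):
    """Least-squares fit of Ct = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_ct(ct, slope, intercept):
    """Read an unknown concentration off the fitted standard curve."""
    return 10 ** ((ct - intercept) / slope)


def efficiency(slope):
    """Amplification efficiency per cycle (1.0 means perfect doubling)."""
    return 10 ** (-1 / slope) - 1


# Ten-fold dilution series of a standard (copies per reaction) and its Ct values.
standards = [1e5, 1e4, 1e3, 1e2]
cts = [18.0, 21.32, 24.64, 27.96]

slope, intercept = fit_standard_curve(standards, cts)
print("slope", round(slope, 2), "intercept", round(intercept, 2))
print("efficiency", round(efficiency(slope), 3))
print("unknown with Ct 23.0:", round(concentration_from_ct(23.0, slope, intercept)), "copies")
```

An unknown sample is then quantified simply by reading its Ct value against the fitted line; values that fall outside the range of the standards should be treated with caution.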
Confirmation of the product is usually obtained by DNA sequencing.

5.10.8 The TaqMan system

In order to make qPCR specific a number of strategies may be employed that rely on specific hybridisation probes. One ingenious method is called the TaqMan assay or 5′ nuclease assay. Here the probe consists of an oligonucleotide labelled with a fluorescent reporter at one end of the molecule and a quencher at the other end. The PCR proceeds as normal and the oligonucleotide probe binds to the target sequence in the annealing step. As the Taq polymerase extends from the primer its 5′ exonuclease activity degrades the hybridisation probe and releases the reporter from the quencher. A signal is thus generated which increases in direct proportion to the number of starting molecules, and fluorescence can be detected in real time as the PCR proceeds (Fig. 5.36). Although relatively expensive in comparison to other methods for determining expression levels, it is simple, rapid and reliable and is now in use in many research and clinical areas. Further developments in probe-based PCR systems include scorpion probe systems, Amplifluor and real-time LUX probes.

Fig. 5.36 5′ nuclease assay (TaqMan assay). PCR is undertaken with an RQ probe (reporter/quencher dye). As R and Q are in close proximity, fluorescence is quenched. During extension by Taq polymerase the probe is cleaved as a result of Taq having 5′ nuclease activity. This cleaves the R–Q probe and the reporter is released. This results in a detectable increase in fluorescence and allows real-time PCR detection.

5.11 NUCLEOTIDE SEQUENCING OF DNA

5.11.1 Concepts of nucleic acid sequencing

The determination of the order or sequence of bases along a length of DNA is one of the central techniques in molecular biology.
Although it is now possible to derive amino acid sequence information with a degree of reliability, it is frequently more convenient and rapid to analyse the DNA coding information. The precise usage of codons, information regarding mutations and polymorphisms and the identification of gene regulatory control sequences are also only possible by analysing DNA sequences. Two techniques have been developed for this: an enzymatic method, frequently termed Sanger sequencing after its developer, and a chemical method, termed Maxam and Gilbert sequencing for the same reason. At present Sanger sequencing is by far the most popular method and many commercial kits are available for its use. However, there are certain occasions, such as the sequencing of short oligonucleotides, where the Maxam and Gilbert method is more appropriate.

One absolute requirement for Sanger sequencing is that the DNA to be sequenced is in a single-stranded form. Traditionally this demanded that the DNA fragment of interest be inserted and cloned into a specialised bacteriophage vector termed M13, which is naturally single-stranded (Section 6.3.3). Although M13 is still widely used, the advent of the PCR has provided the means not only to amplify a region of any genome or cDNA but also very quickly to generate the corresponding nucleotide sequence. This has led to an explosion in the accumulation of DNA sequence information and has provided much impetus for gene discovery and genome mapping (Section 6.9).

The Sanger method is simple and elegant and mimics in many ways the natural ability of DNA polymerase to extend a growing nucleotide chain based on an existing template. Initially the DNA to be sequenced is allowed to hybridise with an oligonucleotide primer, which is complementary to a sequence adjacent to the 3′ side of the DNA to be sequenced, within a vector such as M13 or in an amplicon.
The oligonucleotide will then act as a primer for synthesis of a second strand of DNA, catalysed by DNA polymerase. Since the new strand is synthesised from its 5′ end, virtually the first DNA to be made will be complementary to the DNA to be sequenced. One of the dNTPs that must be provided for DNA synthesis is radioactively labelled with 32P or 35S, and so the newly synthesised strand will be labelled.

5.11.2 Dideoxynucleotide chain terminators

The reaction mixture is then divided into four aliquots, representing the four dNTPs, A, C, G and T. In addition to all of the dNTPs being present in the A tube, an analogue of dATP is added, 2′,3′-dideoxyadenosine triphosphate (ddATP), which is similar to dATP but has no 3′ hydroxyl group and so will terminate the growing chain, since a 5′ to 3′ phosphodiester linkage cannot be formed without a 3′-hydroxyl group. The situation for tube C is identical except that ddCTP is added; similarly the G and T tubes contain ddGTP and ddTTP respectively (Fig. 5.37). Since the incorporation of ddNTP rather than dNTP is a random event, the reaction will produce new molecules varying widely in length, but all terminating at the same type of base. Thus four sets of DNA sequence are generated, each terminating at a different type of base, but all having a common 5′ end (the primer). The four labelled and chain-terminated samples are then denatured by heating and loaded next to each other on a polyacrylamide gel for electrophoresis. Electrophoresis is performed at approximately 70 °C in the presence of urea, to prevent renaturation of the DNA, since even partial renaturation alters the rates of migration of DNA fragments. Very thin, long gels are used for maximum resolution over a wide range of fragment lengths. After electrophoresis, the positions of radioactive DNA bands on the gel are determined by autoradiography.
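The logic of the four chain-termination reactions can be illustrated with a small idealised sketch. The Python below is not a simulation of the chemistry: it assumes that termination occurs at every possible position in each tube, it counts fragment lengths from the first base added to the primer, and the function names are hypothetical. The short template follows the example shown in Fig. 5.37.

```python
# Illustrative sketch only: assumes every possible terminated fragment is made.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extend(template_3_to_5):
    """Return the new strand (written 5' to 3') copied from a template read 3' to 5'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


def termination_lengths(new_strand, dd_base):
    """Lengths of the fragments produced in the tube containing one ddNTP."""
    return [i + 1 for i, base in enumerate(new_strand) if base == dd_base]


def read_gel(tracks):
    """Read the bands from shortest to longest across the four tracks."""
    bands = sorted(
        (length, base) for base, lengths in tracks.items() for length in lengths
    )
    return "".join(base for _, base in bands)


new_strand = extend("GCTCGCAT")  # template written 3' to 5'
tracks = {base: termination_lengths(new_strand, base) for base in "ACGT"}
print(tracks["G"])       # fragment lengths in the ddGTP track
print(read_gel(tracks))  # sequence of the newly synthesised strand, 5' to 3'
```

Because each fragment length occurs in exactly one of the four tracks, ordering the bands by size across the tracks recovers the sequence of the newly synthesised strand, which is complementary to the template.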
Since every band in the track from the ddATP sample must contain molecules which terminate at adenine, and those in the ddCTP track terminate at cytosine, etc., it is possible to read the sequence of the newly synthesised strand from the autoradiogram, provided that the gel can resolve differences in length equal to a single nucleotide (Fig. 5.38). Under ideal conditions, sequences up to about 300 bases in length can be read from one gel.

Fig. 5.37 Sanger sequencing of DNA. [Figure: a fragment to be sequenced, cloned in M13 phage (template 3′ … GCTCGCAT … 5′), is annealed to a primer and extended by DNA polymerase in the presence of four dNTPs (radioactive) and ddGTP, giving complementary second strands ending in CddG, CGAddG and CGAGCddG; these are denatured to give single strands and run on a sequencing gel alongside the products of the ddCTP, ddATP and ddTTP reactions; the sequence of the second strand is read from the autoradiograph.]

5.11.3 Direct PCR pyrosequencing

Rapid PCR sequencing has also been made possible by the use of pyrosequencing. This is a sequencing-by-synthesis method in which a PCR template is hybridised to an oligonucleotide primer and incubated with DNA polymerase, ATP sulphurylase, luciferase and apyrase. During the reaction the first of the four dNTPs is added and, if incorporated, releases pyrophosphate (PPi). The ATP sulphurylase converts the PPi to ATP, which drives the luciferase-mediated conversion of luciferin to oxyluciferin to generate light. Apyrase degrades the unincorporated dNTPs and the ATP. This is followed by another round of dNTP addition. A resulting pyrogram provides an output of the sequence. The method provides short reads very quickly and is especially useful for the determination of mutations or SNPs.

Fig. 5.38 Autoradiograph of a DNA sequencing gel. Samples were prepared using the Sanger dideoxy method of DNA sequencing. Each set of four samples was loaded into adjacent tracks, indicated by A, C, G and T, depending on the identity of the dideoxyribonucleotide used for that sample. Two sets of samples were labelled with 35S (1 and 3) and one was labelled with 32P (2). It is evident that 32P generates darker but more diffuse bands than does 35S, making the bands nearer the bottom of the autoradiograph easy to see. However, the broad bands produced by 32P cannot be resolved near the top of the autoradiograph, making it impossible to read a sequence from this region. The much sharper bands produced by 35S allow sequences to be read with confidence along most of the autoradiograph and so a longer sequence of DNA can be obtained from a single gel.

It is also possible to undertake nucleotide sequencing directly from double-stranded molecules such as plasmid cloning vectors and PCR amplicons. The double-stranded DNA must be denatured prior to annealing with primer. In the case of plasmids an alkaline denaturation step is sufficient; however, for amplicons this is more problematic and a focus of much research. Unlike plasmids, amplicons are short and reanneal rapidly. Preventing the reannealing process, or biasing the amplification towards one strand by using a primer ratio of 100 : 1, overcomes this problem to a certain extent. Denaturants such as formamide or DMSO have also been used with some success in preventing the reannealing of PCR strands following their separation. It is possible to physically separate and retain one PCR strand by incorporating a molecule such as biotin into one of the primers. Following PCR, the strand carrying the affinity molecule may be removed by affinity chromatography with streptavidin, leaving the complementary PCR strand. This affinity purification provides single-stranded DNA derived from the PCR amplicon and, although it is somewhat time-consuming, does provide high-quality single-stranded DNA for sequencing.
5.11.4 PCR cycle sequencing

One of the most useful methods of sequencing PCR amplicons is termed PCR cycle sequencing. This is not strictly a PCR since it involves linear amplification with a single primer. Approximately 20 cycles of denaturation, annealing and extension take place. Radiolabelled or fluorescent-labelled dideoxynucleotides are then introduced in the final stages of the reaction to generate the chain-terminated extension products (Fig. 5.39). Automated direct PCR sequencing is increasingly being refined, allowing greater lengths of DNA to be analysed in one sequencing run, and provides a very rapid means of analysing DNA sequences.

5.11.5 Automated fluorescent DNA sequencing

Advances in fluorescent dye terminator and labelling chemistry have led to the development of high-throughput automated sequencing techniques. Essentially most systems involve the use of dideoxynucleotides labelled with different fluorochromes. Thus the label is incorporated into the ddNTP and this is used to carry out chain termination as in the standard reaction indicated in Section 5.11.1. The advantage of this modification is that, since a different label is incorporated with each ddNTP, it is unnecessary to perform four separate reactions. Therefore the four chain-terminated products are run on the same track of a denaturing electrophoresis gel. Each product with its base-specific dye is excited by a laser and the dye then emits light at its characteristic wavelength. A diffraction grating separates the emissions, which are detected by a charge-coupled device (CCD), and the sequence is interpreted by a computer. The advantages of the technique include real-time detection of the sequence. In addition, the lengths of sequence that may be analysed are in excess of 500 bp (Fig. 5.40).
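The final step, in which a computer interprets the four dye signals, can be caricatured in a few lines. The sketch below is a deliberately simplified illustration and not the algorithm used by any sequencing instrument: it assumes that the trace has already been reduced to one set of four intensities per band, the intensities are invented, and the function name `call_bases` and the `min_ratio` threshold are hypothetical.

```python
# Illustrative sketch only: real base callers are considerably more elaborate.
def call_bases(peaks, min_ratio=2.0):
    """Call one base per band from four dye-channel intensities.

    `peaks` is a list of dicts mapping each base to the intensity of its dye.
    The strongest channel is called unless the runner-up is too close,
    in which case the position is reported as ambiguous ("N").
    """
    calls = []
    for peak in peaks:
        ranked = sorted(peak.items(), key=lambda item: item[1], reverse=True)
        (best, top), (_, second) = ranked[0], ranked[1]
        if second == 0 or top / second >= min_ratio:
            calls.append(best)
        else:
            calls.append("N")
    return "".join(calls)


peaks = [
    {"A": 900, "C": 40, "G": 30, "T": 20},
    {"A": 50, "C": 60, "G": 800, "T": 30},
    {"A": 400, "C": 380, "G": 20, "T": 10},  # two dyes of similar strength
]
print(call_bases(peaks))
```

Because each ddNTP carries a different dye, all four termination products can share one track and the identity of each band is decided by colour rather than by lane.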
Capillary electrophoresis is increasingly being used for the detection of 192 Molecular biology, bioinformatics and basic techniques Denaturation 5 3 3 5 5 3 3 5 ds DNA denatured by heating to > 94°C Extension/termination reaction Primer annealing reaction 5 A Label Primer 3 5 Cycle sequencing 5 3 5 A (one cycle) 3 5 3 5 Taq polymerase extends target sequences Labelled oligo anneals to target sequence until chain terminator is added (e.g. ddA) Fig. 5.39 Simpliﬁed scheme of cycle sequencing. Linear ampliﬁcation takes place with the use of labelled primers. During the extension and termination reaction, the chain terminator dideoxynucleotides are incorporated into the growing chain. This takes place in four separate reactions (A, C, G and T). The products are then run on a polyacrylamide gel and the sequence analysed. The scheme indicates the events that take place in the A reaction only. ds, double-stranded. sequencing products. This is where liquid polymers in thin capillary tubes are used obviating the need to pour sequencing gels and requiring little manual operation. This substantially reduces the electrophoresis run times and allows high throughput to be achieved. A number of large-scale sequence facilities are now fully automated using 96-well microtitre-based formats. The derived sequences can be downloaded automatically to databases and manipulated using a variety of bioinformatics resources. 5.11.6 Alternative DNA sequencing methods Developments in the technology of DNA sequencing have made whole-genome sequencing projects a realistic proposition within achievable timescales; indeed the ﬁrst diploid genome sequence to be completed was of Craig Venter who pioneered high-throughput sequencing. This makes studies on genome variation and evolution viable, as evidenced by the 1000 Genomes Project which is providing high-resolution sequence analysis of genomes. 
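To give a sense of the scale involved, the arithmetic below estimates how many reads and 96-well runs a genome would need at a given depth of coverage. It is a back-of-envelope sketch: the 500 bp read length and the 96-well format are taken from the text above, whereas the round genome size of 3 × 10⁹ bp, the coverage values, the one-read-per-well simplification and the function name `reads_needed` are illustrative assumptions. Coverage here means total bases sequenced divided by genome length.

```python
def reads_needed(genome_bp, read_bp, coverage):
    """Smallest number of reads with reads * read_bp >= coverage * genome_bp."""
    total_bp = coverage * genome_bp
    return (total_bp + read_bp - 1) // read_bp   # integer ceiling division


GENOME_BP = 3_000_000_000   # assumed round figure for a human-sized genome
READ_BP = 500               # read length quoted in Section 5.11.5
WELLS = 96                  # assumes one read per well of a 96-well plate

for coverage in (1, 8):
    reads = reads_needed(GENOME_BP, READ_BP, coverage)
    plates = (reads + WELLS - 1) // WELLS
    print(f"{coverage}x coverage: {reads:,} reads, {plates:,} plates of {WELLS}")
```

Even at single coverage the count runs to millions of reads under these assumptions, which suggests why automation and newer parallel methods matter for genome-scale projects.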
Sequencing on this scale has been made possible not only by refinements in traditional automated sequencing but also by new developments such as sequencing by synthesis and sequencing by hybridisation arrays. These methods are changing the way genome analysis is undertaken and make individual genome analysis a reality. Indeed, more advanced methods using nanotechnology are in development and may provide an even more effective means of DNA sequencing.

[Figure 5.40 components: fluorescent chain termination products migrate down a single-lane gel past the detector; laser excitation unit; diffraction grating; charge-coupled device (CCD); computer analysis and automated base calling.]
Fig. 5.40 Automated fluorescent sequencing detection using single-lane gel and charge-coupled device.

5.11.7 Maxam and Gilbert sequencing

Sanger sequencing is by far the most popular technique for DNA sequencing; however, an alternative technique developed at the same time may also be used. The chemical cleavage method of DNA sequencing developed by Maxam and Gilbert is often used for sequencing small fragments of DNA such as oligonucleotides, where Sanger sequencing is problematic. A radioactive label is added to either the 3' or the 5' ends of a double-stranded DNA sample (Fig. 5.41). The strands are then separated by electrophoresis under denaturing conditions and analysed separately. DNA labelled at one end is divided into four aliquots, and each is treated with chemicals that act on specific bases by methylation or removal of the base. Conditions are chosen so that, on average, each molecule is modified at only one position along its length; every base in the DNA strand has an equal chance of being modified. Following the modification reactions, the separate samples are cleaved by piperidine, which breaks phosphodiester bonds exclusively at the 5' side of nucleotides whose base has been modified. The result is similar to that produced by the Sanger method, since each sample now contains radioactively labelled molecules of various lengths, all with one end in common (the labelled end), and with the other end cut at the same type of base. Analysis of the reaction products by electrophoresis is as described for the Sanger method.

[Figure 5.41 scheme: single-stranded DNA labelled only at its 3' end, 5'– – –TACGCTCG–³²P 3'. Modification of C using hydrazine removes the base, leaving ribosyl urea. Cleavage at modified bases using piperidine gives the labelled fragments G–³²P, TCG–³²P and GCTCG–³²P, plus non-radioactive fragments. Separation on sequencing gel alongside products of other modification/cleavage reactions.]
Fig. 5.41 Maxam and Gilbert sequencing of DNA. Only modification and cleavage of deoxycytidine is shown, but three more portions of the end-labelled DNA would be modified and cleaved at G, G+A, and T+C, and the products would be separated on the sequencing gel alongside those from the C reactions.

5.12 SUGGESTIONS FOR FURTHER READING

Augen, J. (2005). Bioinformatics in the Post-Genomic Era. Reading, MA: Addison-Wesley.
Brooker, R. J. (2005). Genetics Analysis and Principles, 2nd edn. New York: McGraw-Hill.
Hartwell, L. et al. (2008). Genetics: From Genes to Genomes, 3rd edn. New York: McGraw-Hill.
Lodish, H. et al. (2008). Molecular Cell Biology, 6th edn. San Francisco, CA: W. H. Freeman.
Lewin, B. (2007). Genes IX. Sudbury, MA: Jones & Bartlett.
Strachan, T. and Read, A. P. (2004). Human Molecular Genetics, 3rd edn. Oxford, UK: Bios.
Walker, J. M. and Rapley, R. (2008). Molecular Biomethods Handbook, 2nd edn. Totowa, NJ: Humana Press.
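The logic of the chemical cleavage method in Section 5.11.7 can be sketched in a few lines. This is an idealised illustration, not a protocol: it assumes a strand labelled at its 3' end (as in Fig. 5.41), complete and perfectly specific G, G+A, T+C and C reactions, and a gel that resolves every fragment length. The function names are hypothetical. Because the label is at the 3' end, the labelled fragment left after cleavage at a given base consists of the nucleotides on its 3' side, so in this model the 3'-terminal nucleotide itself is not read.

```python
# Bases attacked in each of the four modification/cleavage reactions.
REACTIONS = {"G": "G", "G+A": "GA", "T+C": "TC", "C": "C"}


def labelled_lengths(seq, bases):
    """Lengths of 3'-end-labelled fragments after cleavage at `bases`."""
    n = len(seq)
    # Cleavage at index i leaves the n - i - 1 nucleotides 3' of it on the label.
    return {n - i - 1 for i, b in enumerate(seq) if b in bases and n - i - 1 > 0}


def run_gel(seq):
    """One set of band positions (fragment lengths) per reaction lane."""
    return {lane: labelled_lengths(seq, bases) for lane, bases in REACTIONS.items()}


def read_gel(lanes, n):
    """Read bands from shortest to longest and return the 5'->3' sequence."""
    calls = []
    for k in range(1, n):
        if k in lanes["G"]:
            calls.append("G")        # G lane (band also present in G+A lane)
        elif k in lanes["G+A"]:
            calls.append("A")        # G+A lane only
        elif k in lanes["C"]:
            calls.append("C")        # C lane (band also present in T+C lane)
        elif k in lanes["T+C"]:
            calls.append("T")        # T+C lane only
        else:
            calls.append("N")        # no band: base not determined
    # Short fragments lie nearest the labelled 3' end, so reverse the order.
    return "".join(reversed(calls))


strand = "TACGCTCG"                  # sequence shown in Fig. 5.41
lanes = run_gel(strand)
print(sorted(lanes["C"]))            # fragment lengths in the C lane
print(read_gel(lanes, len(strand)))  # all bases except the 3'-terminal one
```

For the strand in Fig. 5.41 the C lane holds fragments of 1, 3 and 5 nucleotides, corresponding to the labelled fragments G, TCG and GCTCG in the figure.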
