From Genes to Genomes: Concepts and Applications of DNA Technology

From Genes to Genomes: Concepts and Applications of DNA Technology

Jeremy W Dale and Malcolm von Schantz
University of Surrey, UK

Copyright © 2002 by John Wiley & Sons Ltd, Baffins Lane, Chichester, West Sussex PO19 1UD, England
National 01243 779777
International (+44) 1243 779777
Visit our Home Page on http://www.wileyeurope.com or http://www.wiley.com

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency, 90 Tottenham Court Road, London, UK W1P 9HE, without the permission in writing of the publisher.

Other Wiley Editorial Offices
John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, USA
Wiley-VCH Verlag GmbH, Pappelallee 3, D-69469 Weinheim, Germany
John Wiley & Sons (Australia) Ltd, 33 Park Road, Milton, Queensland 4064, Australia
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 0512
John Wiley & Sons (Canada) Ltd, 22 Worcester Road, Rexdale, Ontario M9W 1L1, Canada

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
ISBN 0-471-49782-7 (Hardback); 0-471-49783-5 (Paperback)

Typeset in 10.5/13 pt Times by Kolam Information Services Pvt. Ltd, Pondicherry, India
Printed and bound in Italy by Conti Tipocolor SpA
This book is printed on acid-free paper responsibly manufactured from sustainable forestry, in which at least two trees are planted for each one used for paper production.
Contents

Preface
1 Introduction
2 Basic Molecular Biology
  2.1 Nucleic Acid Structure
    2.1.1 The DNA backbone
    2.1.2 The base pairs
    2.1.3 RNA structure
    2.1.4 Nucleic acid synthesis
    2.1.5 Coiling and supercoiling
  2.2 Gene Structure and Organization
    2.2.1 Operons
    2.2.2 Exons and introns
  2.3 Information Flow: Gene Expression
    2.3.1 Transcription
    2.3.2 Translation
3 How to Clone a Gene
  3.1 What is Cloning?
  3.2 Overview of the Procedures
  3.3 Gene Libraries
  3.4 Hybridization
  3.5 Polymerase Chain Reaction
4 Purification and Separation of Nucleic Acids
  4.1 Extraction and Purification of Nucleic Acids
    4.1.1 Breaking up cells and tissues
    4.1.2 Enzyme treatment
    4.1.3 Phenol–chloroform extraction
    4.1.4 Alcohol precipitation
    4.1.5 Gradient centrifugation
    4.1.6 Alkaline denaturation
    4.1.7 Column purification
  4.2 Detection and Quantitation of Nucleic Acids
  4.3 Gel Electrophoresis
    4.3.1 Analytical gel electrophoresis
    4.3.2 Preparative gel electrophoresis
5 Cutting and Joining DNA
  5.1 Restriction Endonucleases
    5.1.1 Specificity
    5.1.2 Sticky and blunt ends
    5.1.3 Isoschizomers
    5.1.4 Processing restriction fragments
  5.2 Ligation
    5.2.1 Optimizing ligation conditions
  5.3 Alkaline Phosphatase
  5.4 Double Digests
  5.5 Modification of Restriction Fragment Ends
    5.5.1 Trimming and filling
    5.5.2 Linkers and adapters
    5.5.3 Homopolymer tailing
  5.6 Other Ways of Joining DNA Molecules
    5.6.1 TA cloning of PCR products
    5.6.2 DNA topoisomerase
  5.7 Summary
6 Vectors
  6.1 Plasmid Vectors
    6.1.1 Properties of plasmid vectors
    6.1.2 Transformation
  6.2 Vectors Based on the Lambda Bacteriophage
    6.2.1 Lambda biology
    6.2.2 In vitro packaging
    6.2.3 Insertion vectors
    6.2.4 Replacement vectors
  6.3 Cosmids
  6.4 M13 Vectors
  6.5 Expression Vectors
  6.6 Vectors for Cloning and Expression in Eukaryotic Cells
    6.6.1 Yeasts
    6.6.2 Mammalian cells
  6.7 Supervectors: YACs and BACs
  6.8 Summary
7 Genomic and cDNA Libraries
  7.1 Genomic Libraries
    7.1.1 Partial digests
    7.1.2 Choice of vectors
    7.1.3 Construction and evaluation of a genomic library
  7.2 Growing and Storing Libraries
  7.3 cDNA Libraries
    7.3.1 Isolation of mRNA
    7.3.2 cDNA synthesis
    7.3.3 Bacterial cDNA
  7.4 Random, Arrayed and Ordered Libraries
8 Finding the Right Clone
  8.1 Screening Libraries with Gene Probes
    8.1.1 Hybridization
    8.1.2 Labelling probes
    8.1.3 Steps in a hybridization experiment
    8.1.4 Screening procedure
    8.1.5 Probe selection
  8.2 Screening Expression Libraries with Antibodies
  8.3 Rescreening
  8.4 Subcloning
  8.5 Characterization of Plasmid Clones
    8.5.1 Restriction digests and agarose gel electrophoresis
    8.5.2 Southern blots
    8.5.3 PCR and sequence analysis
9 Polymerase Chain Reaction (PCR)
  9.1 The PCR Reaction
  9.2 PCR in Practice
    9.2.1 Optimization of the PCR reaction
    9.2.2 Analysis of PCR products
  9.3 Cloning PCR Products
  9.4 Long-range PCR
  9.5 Reverse-transcription PCR
  9.6 Rapid Amplification of cDNA Ends (RACE)
  9.7 Applications of PCR
    9.7.1 PCR cloning strategies
    9.7.2 Analysis of recombinant clones and rare events
    9.7.3 Diagnostic applications
10 DNA Sequencing
  10.1 Principles of DNA Sequencing
  10.2 Automated Sequencing
  10.3 Extending the Sequence
  10.4 Shotgun Sequencing: Contig Assembly
  10.5 Genome Sequencing
    10.5.1 Overview
    10.5.2 Strategies
    10.5.3 Repetitive elements and gaps
11 Analysis of Sequence Data
  11.1 Analysis and Annotation
    11.1.1 Open reading frames
    11.1.2 Exon/intron boundaries
    11.1.3 Identification of the function of genes and their products
    11.1.4 Expression signals
    11.1.5 Other features of nucleic acid sequences
    11.1.6 Protein structure
    11.1.7 Protein motifs and domains
  11.2 Databanks
  11.3 Sequence Comparisons
    11.3.1 DNA sequences
    11.3.2 Protein sequence comparisons
    11.3.3 Sequence alignments: CLUSTAL
12 Analysis of Genetic Variation
  12.1 Nature of Genetic Variation
    12.1.1 Single nucleotide polymorphisms
    12.1.2 Large-scale variations
    12.1.3 Conserved and variable domains
  12.2 Methods for Studying Variation
    12.2.1 Genomic Southern blot analysis: restriction fragment length polymorphisms (RFLPs)
    12.2.2 PCR-based methods
    12.2.3 Genome-wide comparisons
13 Analysis of Gene Expression
  13.1 Analysing Transcription
    13.1.1 Northern blots
    13.1.2 RNase protection assay
    13.1.3 Reverse transcription PCR
    13.1.4 In situ hybridization
    13.1.5 Primer extension assay
  13.2 Comparing Transcriptomes
    13.2.1 Differential screening
    13.2.2 Subtractive hybridization
    13.2.3 Differential display
    13.2.4 Array-based methods
  13.3 Methods for Studying the Promoter
    13.3.1 Reporter genes
    13.3.2 Locating the promoter
    13.3.3 Using reporter genes to study regulatory RNA elements
    13.3.4 Regulatory elements and DNA-binding proteins
    13.3.5 Run-on assays
  13.4 Translational Analysis
    13.4.1 Western blots
    13.4.2 Immunocytochemistry and immunohistochemistry
    13.4.3 Two-dimensional electrophoresis
    13.4.4 Proteomics
14 Analysis of Gene Function
  14.1 Relating Genes and Functions
  14.2 Genetic Maps
    14.2.1 Linked and unlinked genes
  14.3 Relating Genetic and Physical Maps
  14.4 Linkage Analysis
    14.4.1 Ordered libraries and chromosome walking
  14.5 Transposon Mutagenesis
    14.5.1 Transposition in Drosophila
    14.5.2 Other applications of transposons
  14.6 Allelic Replacement and Gene Knock-outs
  14.7 Complementation
  14.8 Studying Gene Function through Protein Interactions
    14.8.1 Two-hybrid screening
    14.8.2 Phage display libraries
15 Manipulating Gene Expression
  15.1 Factors Affecting Expression of Cloned Genes
  15.2 Expression of Cloned Genes in Bacteria
    15.2.1 Transcriptional fusions
    15.2.2 Stability: conditional expression
    15.2.3 Expression of lethal genes
    15.2.4 Translational fusions
  15.3 Expression in Eukaryotic Host Cells
    15.3.1 Yeast expression systems
    15.3.2 Expression in insect cells: baculovirus systems
    15.3.3 Expression in mammalian cells
  15.4 Adding Tags and Signals
    15.4.1 Tagged proteins
    15.4.2 Secretion signals
  15.5 In vitro Mutagenesis
    15.5.1 Site-directed mutagenesis
    15.5.2 Synthetic genes
    15.5.3 Assembly PCR
    15.5.4 Protein engineering
16 Medical Applications, Present and Future
  16.1 Vaccines
    16.1.1 Subunit vaccines
    16.1.2 Live attenuated vaccines
    16.1.3 Live recombinant vaccines
    16.1.4 DNA vaccines
  16.2 Detection and Identification of Pathogens
  16.3 Human Genetic Diseases
    16.3.1 Identifying disease genes
    16.3.2 Genetic diagnosis
    16.3.3 Gene therapy
17 Transgenics
  17.1 Transgenesis and Cloning
  17.2 Animal Transgenesis and its Applications
    17.2.1 Expression of transgenes
    17.2.2 Embryonic stem-cell technology
    17.2.3 Gene knock-outs
    17.2.4 Gene knock-in technology
    17.2.5 Applications of transgenic animals
  17.3 Transgenic Plants and their Applications
    17.3.1 Gene subtraction
  17.4 Summary
Bibliography
Glossary
Index

Preface

Over the last 30 years, a revolution has taken place that has put molecular biology at the heart of all the biological sciences, and has had extensive implications in many fields, including the political arena. A major impetus behind this revolution was the development of techniques that allowed the isolation of specific DNA fragments and their replication in bacterial cells (gene cloning).
These techniques also made it possible to engineer bacteria (and subsequently other organisms, including plants and animals) to have novel properties, and to produce pharmaceutical products. This has been referred to as genetic engineering, genetic manipulation and genetic modification, all meaning essentially the same thing. However, many of the applications extend further than that, and do not involve cloning of genes or genetic modification of organisms, although they draw on the knowledge derived in those ways. These include techniques such as nucleic acid hybridization and the polymerase chain reaction (PCR), which can be applied in a wide variety of ways, ranging from the analysis of tissue differentiation to forensic applications of DNA fingerprinting and the diagnosis of human genetic disorders. In an attempt to cover this range of techniques and applications, we have used the term DNA technology in the subtitle.

The main title of the book, From Genes to Genomes, is derived from the progress of this revolution. It signifies the move from the early focus on the isolation and identification of specific genes to the exciting advances that have been made possible by the sequencing of complete genomes. This has in turn spawned a whole new range of technologies (post-genomics) that are designed for genome-wide analysis of gene structure and expression, including computer-based analyses of such large data sets (bioinformatics).

The purpose of this book is to provide an introduction to the concepts and applications of this rapidly moving and fascinating field. In writing this book, we had in mind its usefulness for undergraduate students in the biological and biomedical sciences (who we assume will have a basic grounding in molecular biology).
However, it will also be relevant for many others, ranging from research workers who want to update their knowledge of related areas to anyone who would like to understand rather more of the background to current controversies about the applications of some of these techniques.

Jeremy W Dale
Malcolm von Schantz

1 Introduction

This book is about the study and manipulation of nucleic acids, and how this can be used to answer biological questions. Although we hear a lot about the commercial applications, in particular (at the moment) the genetic modification of plants, the real revolution lies in the incredible advances in our understanding of how cells work. Until about 30 years ago, genetics was a patient and laborious process of selecting variants (whether of viruses, bacteria, plants or animals), and designing breeding experiments that would provide data on how the genes concerned were inherited. The study of human genetics proceeded even more slowly, because of course you could only study the consequences of what happened naturally. Then, in the 1970s, techniques were discovered that enabled us to cut DNA precisely into specific fragments, and join them together again in different combinations. For the first time it was possible to isolate and study specific genes. Since this applied equally to human genes, the impact on human genetics was particularly marked. In parallel with this, hybridization techniques were developed that enabled the identification of specific DNA sequences, and (somewhat later) methods were introduced for determining the sequence of these bits of DNA. Combining those advances with automated techniques and the concurrent advance in computer power has led to the determination of the full sequence of the human genome.
This revolution does not end with understanding how genes work and how the information is inherited. Genetics, and especially modern molecular genetics, underpins all the biological sciences. By studying, and manipulating, specific genes, we develop our understanding of the way in which the products of those genes interact to give rise to the properties of the organism itself. This could range from, for example, the mechanism of motility in bacteria to the causes of human genetic diseases and the processes that cause a cell to grow uncontrollably, giving rise to a tumour. In many cases, we can identify precisely the cause of a specific property. We can say that a change in one single base in the genome of a bacterium will make it resistant to a certain antibiotic, or that a change in one base in human DNA could cause a debilitating disease. This only scratches the surface of the power of these techniques, and indeed this book can only provide an introduction to them. Nevertheless, we hope that by the time you have studied it, you will have some appreciation of what can be (and indeed has been) achieved.

Genetic manipulation is traditionally divided into in vitro and in vivo work. Typically, investigators will first work in vitro, using enzymes derived from various organisms to create a recombinant DNA molecule in which the DNA they want to study is joined to a vector. This recombinant molecule is then introduced into a host organism, more often than not a strain of the bacterium Escherichia coli (E. coli), where it is replicated in vivo. A clone of the host carrying the foreign DNA is grown, producing a great many identical copies of the DNA, and sometimes its products as well. Today, in many cases the in vivo stage is bypassed altogether by the use of PCR (polymerase chain reaction), a method which allows us to produce many copies of our DNA in vitro without the help of a host organism.

In the early days, E.
coli strains carrying recombinant DNA molecules were treated with extreme caution. E. coli is a bacterium which lives in its billions within our digestive system, and those of other mammals, and which will survive quite easily in our environment, for instance in our food and on our beaches. So there was a lot of concern that the introduction of foreign DNA into E. coli would generate bacteria with dangerous properties. Fortunately, this is one fear that has been shown to be unfounded. Some natural E. coli strains are pathogenic, in particular the O157:H7 strain, which can cause severe disease or death. By contrast, the strains used for genetic manipulation are harmless, disabled laboratory strains that will not even survive in the gut. Working with genetically modified E. coli can therefore be done very safely (although work with any bacterium has to follow some basic safety rules). However, plasmids, the most commonly used type of vector, are readily shared between bacteria; indeed, the transmission of plasmids between bacteria is behind much of the natural spread of antibiotic resistance. What if our recombinant plasmids were transmitted to other bacterial strains that do survive on their own? This, too, has turned out not to be a worry in the majority of cases. The plasmids themselves have been manipulated so that they cannot be readily transferred to other bacteria. Furthermore, carrying a gene such as that coding for, say, dogfish insulin, or an artificial chromosome carrying 100 000 bases of human genomic DNA, is a great burden to an E. coli cell, and carries no reward whatsoever. In fact, in order to make the cells keep it, we have to create conditions that will kill all bacterial cells not carrying the foreign gene. If you fail to do so when you start your culture in the evening, you can be sure that by the next morning your bacteria will have dropped the foreign gene. Evolution in progress!

Whilst nobody today worries about genetically modified E.
coli, and indeed diabetics have been injecting recombinant insulin produced by genetically modified E. coli for decades, the issue of genetic engineering is back on the public agenda, this time pertaining to higher organisms. It is important to distinguish the genetic modification of plants and animals from cloning plants and animals. The latter simply involves the production of genetically identical individuals; it does not involve any genetic modification whatsoever. (The two technologies can be used in tandem, but that is another matter.) So, we will ignore the cloning of higher organisms here. Although it is conceptually very similar to producing a clone of a genetically modified E. coli, it is really a matter of reproductive cell biology, and frankly relatively uninteresting from the molecular point of view. By contrast, the genetic modification of higher organisms is both conceptually similar to the genetic modification of bacteria and highly pertinent, as it is a potential and, in principle, fairly easy application following the isolation and analysis of a gene. At the time of writing, the ethical and environmental consequences of this application are still a matter of lively debate and media attention, and it would be very surprising if this were not still the case by the time you read this. Just as in the laboratory, the genetic modification as such is not necessarily the biggest risk here. Thus, if a food crop carries a gene that makes it tolerant of herbicides (weedkillers), it would seem reasonable to worry more about increased levels of herbicides in our food than about the genetic modification itself. Equally, the worry about such an organism escaping into the wild may turn out to be exaggerated. Just as, without an evolutionary pressure to keep the genetic modification, our E.
coli in the example above lost the foreign gene overnight, it appears quite unlikely that a plant that wastes valuable resources on producing a protein that protects it against herbicides will survive long in the wild in the absence of herbicide use. Nonetheless, this issue is by no means as clear-cut as that of genetically modified bacteria. We cannot test these organisms in a contained laboratory. They take months, or even a year, to produce each generation, not the 20 minutes that E. coli needs. And even if they should be harmless in themselves, there are other issues as well, such as the one exemplified above. Thus, this is an important and complicated issue, and to understand it fully you need to know about evolution, ecology, food chemistry, nutrition, and molecular biology. We hope that reading this book will be of some help for the last of these. We also hope that it will convey some of the wonder, excitement, and intellectual stimulation that this science brings to its practitioners. What better way to relieve the boredom of a long journey than to indulge in the immense satisfaction of constructing a clever new screening algorithm? Who needs jigsaw and crossword puzzles when you can figure out a clever way of joining two DNA fragments together? And how can you ever lose the fascination you feel about the fact that the drop of enzyme that you're adding to your test tube is about to manipulate the DNA molecules in it with surgical precision?

2 Basic Molecular Biology

In this book, we assume you already have a working knowledge of the basic concepts of molecular biology. This chapter serves as a reminder of the key aspects of molecular biology that are especially relevant to this book.
2.1 Nucleic Acid Structure

2.1.1 The DNA backbone

Manipulation of nucleic acids in the laboratory is based on their physical and chemical properties, which in turn are reflected in their biological function. Intrinsically, DNA is a very stable molecule. Scientists routinely send DNA samples in the post without worrying about refrigeration. Indeed, DNA of high enough quality to be cloned has been recovered from frozen mammoths and mummified Pharaohs thousands of years old. This stability is provided by the robust, repetitive phosphate–sugar backbone of each DNA strand, in which a phosphate group links the 5′ position of one sugar to the 3′ position of the next (Figure 2.1). The bonds between these phosphorus, oxygen, and carbon atoms are all covalent bonds. Controlled degradation of DNA requires enzymes (nucleases) that break these covalent bonds. These are divided into endonucleases, which attack internal sites in a DNA strand, and exonucleases, which nibble away at the ends. We can for the moment ignore other enzymes that attack, for example, the bonds linking the bases to the sugar residues. Some of these enzymes are non-specific, and lead to a generalized destruction of DNA. It was the discovery of restriction endonucleases (or restriction enzymes), which cut DNA strands at specific positions, together with DNA ligases, which can join two double-stranded DNA molecules together, that opened up the possibility of recombinant DNA technology ('genetic engineering').

RNA molecules, which contain the sugar ribose (Figure 2.2) rather than the deoxyribose found in DNA, are less stable than DNA. This is partly due to their greater susceptibility to attack by nucleases (ribonucleases), but they are also more susceptible to chemical degradation, especially under alkaline conditions.
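To make the idea of cutting DNA "at specific positions" concrete, here is a minimal Python sketch that locates the recognition site of one restriction enzyme and splits a sequence at the cuts. The choice of EcoRI (site GAATTC, cut between the G and the first A, written G^AATTC), the example sequence, and the function names find_sites and digest are illustrative assumptions, not taken from this chapter. The sketch models only the top strand of a linear molecule and ignores the single-stranded overhangs that the cuts leave. Because GAATTC is palindromic (its complementary strand, read 5′ to 3′, is also GAATTC), scanning one strand finds every site; a non-palindromic site would require both strands to be searched.

```python
def find_sites(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Return top-strand cut positions for every occurrence of `site`.

    cut_offset is the number of bases of the site lying 5' of the cut
    (1 for the illustrative EcoRI example, G^AATTC). A cut position p
    means the strand is cut between sequence[p - 1] and sequence[p].
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)  # keep scanning after this hit
    return cuts


def digest(sequence: str, cuts: list[int]) -> list[str]:
    """Split a linear top-strand sequence into fragments at the cut positions."""
    bounds = [0, *cuts, len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical sequence with two EcoRI sites (and a BamHI site, GGATCC, in between)
dna = "TTGAATTCAAGCGGATCCGAATTCTT"
cuts = find_sites(dna, "GAATTC", cut_offset=1)
print(cuts)                # [3, 19]
print(digest(dna, cuts))   # ['TTG', 'AATTCAAGCGGATCCG', 'AATTCTT']
```

Joining the fragments back together end to end reconstructs the original sequence, which is a rough analogy for what DNA ligase does with compatible ends.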
[Figure 2.1 DNA backbone]
[Figure 2.2 Nucleic acid sugars: 2′-deoxyribose and ribose]

2.1.2 The base pairs

In addition to the sugar (2′-deoxyribose) and phosphate, DNA molecules contain four nitrogen-containing bases (Figure 2.3): two pyrimidines, thymine (T) and cytosine (C), and two purines, guanine (G) and adenine (A). (Other bases can be incorporated into synthetic DNA in the laboratory, and sometimes other bases occur naturally.) Since the purines are bigger than the pyrimidines, a regular double helix requires a purine in one strand to be matched by a pyrimidine in the other. Furthermore, the regularity of the double helix requires specific hydrogen bonding between the bases so that they fit together, with an A opposite a T, and a G opposite a C (Figure 2.4). We refer to these pairs of bases as complementary, and hence to one strand as the complement of the other.

Note that the two DNA strands run in opposite directions. In a conventional representation of a double-stranded sequence, the 'top' strand has its 5′ end at the left-hand end (and is said to be written in the 5′ to 3′ direction), while the 'bottom' strand has its 5′ end at the right-hand end. Since the two strands are complementary, there is no information in the second strand that cannot be deduced from the first one. Therefore, to save space, it is common to represent a double-stranded DNA sequence by showing the sequence of only one strand.
When only one strand is shown, we use the 5′ to 3′ direction; the sequence of the second strand is inferred from that, and you have to remember that the second strand runs in the opposite direction. Thus a single-strand sequence written as AGGCTG (or more fully 5′-AGGCTG-3′) would have as its complement CAGCCT (5′-CAGCCT-3′) (see Box 2.1).

[Figure 2.3 Nucleic acid bases: the purines adenine and guanine; the pyrimidines thymine and cytosine]
[Figure 2.4 Base-pairing in DNA: adenine with thymine, guanine with cytosine]

Box 2.1 Complementary sequences
DNA sequences are often represented as the sequence of just one of the two strands, in the 5′ to 3′ direction, reading from left to right. Thus the double-stranded DNA sequence

5′-AGGCTG-3′
3′-TCCGAC-5′

would be shown as AGGCTG, with the orientation (i.e. the position of the 5′ and 3′ ends) being inferred. To get the sequence of the other (complementary) strand, you must not only change the A and G residues to T and C (and vice versa), but you must also reverse the order. So in this example, the complement of AGGCTG is CAGCCT, reading the lower strand from right to left (again in the 5′ to 3′ direction).

Thanks to this base-pairing arrangement, the two strands can be safely separated (both in the cell and in the test tube) under conditions which disrupt the hydrogen bonds between the bases but are much too mild to pose any threat to the covalent bonds in the backbone. This is referred to as denaturation of DNA and, unlike the denaturation of many proteins, it is reversible. Because of the complementarity of the base pairs, the strands will easily join together again and renature.
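The two-step rule in Box 2.1 (swap each base for its partner, then reverse the order) can be written as a short Python function. This is a minimal sketch. The names COMPLEMENT and reverse_complement are our own, and the function handles only the four standard DNA bases, so any other character raises a KeyError.

```python
# Watson-Crick partners for the four DNA bases
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the other strand of `seq`, written 5' to 3'.

    Both steps from Box 2.1 are needed: each base is swapped for its
    partner, and the order is reversed because the strands are antiparallel.
    """
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


top = "AGGCTG"
print(reverse_complement(top))                      # CAGCCT
# Applying the operation twice returns the original strand
print(reverse_complement(reverse_complement(top)))  # AGGCTG
```

If you complement each base but skip the reversal, you get TCCGAC. That is the bottom strand read 3′ to 5′, which is a common source of errors when designing probes and primers.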
In the test tube, DNA is readily denatured by heating, and the denaturation process is therefore often referred to as melting, even when it is accomplished by other means (e.g. by NaOH). Denaturation of a double-stranded DNA molecule occurs over a short temperature range, and the midpoint of that range is defined as the melting temperature (Tm). This is influenced by the base composition of the DNA. Since guanine:cytosine (GC) base pairs have three hydrogen bonds, they are stronger (i.e. melt less easily) than adenine:thymine (AT) pairs, which have only two hydrogen bonds. It is therefore possible to estimate the melting temperature of a DNA fragment if you know the sequence (or the base composition and length). These considerations are important in understanding the technique known as hybridization, in which gene probes are used to detect specific nucleic acid sequences. We will look at hybridization in more detail in Chapter 8.

Although the normal base pairs (A–T and G–C) are the only forms that are fully compatible with the Watson–Crick double helix, pairing of other bases can occur, especially in situations where a regular double helix is less important (such as the folding of single-stranded nucleic acids into secondary structures; see below).

In addition to the hydrogen bonds, the double-stranded DNA structure is maintained by hydrophobic interactions between the bases. The hydrophobic nature of the bases means that a single-stranded structure, in which the bases are exposed to the aqueous environment, is unstable. Pairing of the bases enables them to be removed from interaction with the surrounding water. In contrast to the hydrogen bonding, hydrophobic interactions are relatively non-specific. Thus, nucleic acid strands will tend to stick together even in the absence of specific base-pairing, although the specific interactions make the association stronger.
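As an illustration of estimating Tm from sequence, the sketch below uses the Wallace rule, a widely quoted rough approximation for short oligonucleotides. The rule assigns about 2 °C per A:T pair and 4 °C per G:C pair. This formula is our assumption and is not given in this chapter; it is only a rough guide for short probes (roughly 14 to 20 nucleotides) in standard hybridization salt, and longer fragments need other formulas. The two 16-mers and the function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of the bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough Tm in degrees C by the Wallace rule: 2 per A/T, 4 per G/C.

    Assumption: only meaningful for short oligonucleotides in standard salt.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


at_rich = "ATTATAAGCTATTAAT"   # hypothetical 16-mer with 2 G/C
gc_rich = "GCGGCCAGTGCCGCGG"   # hypothetical 16-mer with 14 G/C
for probe in (at_rich, gc_rich):
    print(probe, gc_content(probe), wallace_tm(probe))
# ATTATAAGCTATTAAT 0.125 36
# GCGGCCAGTGCCGCGG 0.875 60
```

Both probes are the same length, but the GC-rich one is predicted to melt at a much higher temperature. This reflects the stronger G:C pairing described above.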
Because the hydrophobic interactions are relatively non-specific, the specificity of hybridization can be increased by the use of chemicals (such as formamide) that reduce them.

What happens if there is only a single nucleic acid strand? This is normally the case with RNA, but single-stranded forms of DNA also exist; for example, in some viruses the genetic material is single-stranded DNA. A single-stranded nucleic acid molecule will tend to fold up on itself to form localized double-stranded regions, including structures referred to as hairpins or stem-loop structures. This has the effect of removing the bases from the surrounding water. At room temperature, in the absence of denaturing agents, a single-stranded nucleic acid will normally consist of a complex set of such localized secondary structure elements, which is especially evident with RNA molecules such as transfer RNA (tRNA) and ribosomal RNA (rRNA). This can also happen to a limited extent with double-stranded DNA, where short sequences can loop out of the regular double helix. Since this makes it easier for enzymes to unwind the DNA and to separate the strands, these sequences can play a role in the regulation of gene expression and in the initiation of DNA replication.

A further factor to be taken into account is the negative charge on the phosphate groups in the nucleic acid backbone. This works in the opposite direction to the hydrogen bonds and hydrophobic interactions: the strong negative charges on the two DNA strands cause electrostatic repulsion that tends to push the strands apart. In the presence of salt, this effect is counteracted by a cloud of counterions surrounding the molecule, which neutralizes the negative charge on the phosphate groups.
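A hairpin (stem-loop) can form where a stretch of sequence is followed, a few bases later, by its own reverse complement, so that the strand can fold back and pair with itself. The sketch below scans a DNA sequence for such perfect inverted repeats. The stem and loop lengths, the example sequence, and the function name find_hairpins are illustrative assumptions. Real folding programs also score non-standard pairs (such as G·U in RNA), mismatches and stacking energies, and none of these is modelled here.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def find_hairpins(seq: str, stem: int, min_loop: int = 3, max_loop: int = 8):
    """Yield (start, loop_length) for each perfect stem-loop candidate.

    A candidate is `stem` bases, then a loop of min_loop..max_loop bases,
    then the reverse complement of the first arm, so that the strand can
    fold back and base-pair with itself.
    """
    seq = seq.upper()
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        target = reverse_complement(seq[i:i + stem])  # required second arm
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop                        # start of second arm
            if j + stem > len(seq):
                break                                  # longer loops cannot fit either
            if seq[j:j + stem] == target:
                yield i, loop


# Hypothetical sequence: stem GGATC, loop TTTT, then GATCC (reverse complement of GGATC)
hairpin_dna = "AAGGATCTTTTGATCCAA"
print(list(find_hairpins(hairpin_dna, stem=5)))   # [(2, 4)]
```

The result says that the 5-base arm starting at position 2 can pair with a matching arm after a 4-base loop, closing the TTTT loop into a hairpin.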
If the salt concentration is reduced, however, weak interactions between the strands will be disrupted by electrostatic repulsion, and we can therefore use low-salt conditions to increase the specificity of hybridization (see Chapter 8).

2.1.3 RNA structure

Chemically, RNA is very similar to DNA. The fundamental chemical difference is that the RNA backbone contains ribose rather than the 2′-deoxyribose (i.e. ribose without the hydroxyl group at the 2′ position) present in DNA (Figure 2.5). However, this slight difference has a powerful effect on some properties of the nucleic acid, especially on its stability. Thus, RNA is readily destroyed by exposure to high pH. Under these conditions, DNA is stable: although the strands will separate, they will remain intact and capable of renaturation when the pH is lowered again. A further difference between RNA and DNA is that RNA contains uracil rather than thymine (Figure 2.5).

Generally, while most of the DNA we use is double-stranded, most of the RNA we encounter consists of a single polynucleotide strand, although we must remember the comments above regarding the folding of single-stranded nucleic acids. However, this distinction between RNA and DNA is not an inherent property of the nucleic acids themselves, but a reflection of the natural roles of RNA and DNA in the cell, and of the way they are produced. In all cellular organisms (i.e. excluding viruses), DNA is the inherited material responsible for the genetic composition of the cell, and the replication process that has evolved is based on a double-stranded molecule; the roles of RNA in the cell do not require a second strand, and indeed the presence of a second, complementary strand would preclude its role in protein synthesis.
There are, however, some viruses that have double-stranded RNA as their genetic material, as well as some with single-stranded RNA, and some viruses (as well as some plasmids) replicate via single-stranded DNA forms.

[Figure 2.5 Differences between DNA and RNA: 2′-deoxyribose versus ribose, and thymine versus uracil]

2.1.4 Nucleic acid synthesis

We do not need to consider all the details of how nucleic acids are synthesized. The basic features that we need to remember are summarized in Figure 2.6, which shows the addition of a nucleotide to the growing end (3′-OH) of a DNA strand. The substrate for this reaction is the relevant deoxynucleotide triphosphate (dNTP), i.e. the one that makes the correct base pair with the corresponding residue on the template strand. The DNA strand is always extended at the 3′-OH end. For this reaction to occur, it is essential that the residue at the 3′-OH end, to which the new nucleotide is to be added, is accurately base-paired with its partner on the other strand. RNA synthesis occurs in much the same way, as far as this description goes, except that of course the substrates are nucleotide triphosphates (NTPs) rather than deoxynucleotide triphosphates (dNTPs). There is one very important difference, though. DNA synthesis only occurs by extension of an existing strand: it always needs a primer to get it started. RNA polymerases, on the other hand, are capable of starting a new RNA strand from scratch, given the appropriate signals.
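The rules above can be traced with a toy Python model. A primer is required, the new strand grows only at its 3′-OH end, and each incoming nucleotide must pair with the template. This is a hypothetical sketch with our own function names. Both strands are written 5′ to 3′, the primer is assumed to pair perfectly with the template (if the site occurs more than once, the last match is used), and transcribe simply copies a whole template strand with U in place of T, ignoring the promoter signals that a real RNA polymerase needs.

```python
# Base-pairing rules for the incoming nucleotide, keyed by the template base
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}   # RNA puts U opposite A


def reverse_complement(seq: str) -> str:
    return "".join(DNA_PAIR[base] for base in reversed(seq))


def extend_primer(template: str, primer: str) -> str:
    """Toy DNA synthesis: extend `primer` along `template` (both 5'->3').

    The whole primer, including its 3'-terminal base, must be base-paired
    with the template; each step then adds the dNTP complementary to the
    next template base, always at the 3'-OH end of the growing strand.
    """
    site = reverse_complement(primer)       # template bases the primer pairs with
    pos = template.rfind(site)
    if pos == -1:
        raise ValueError("primer is not base-paired with the template")
    new_strand = primer
    # The primer's 3' base sits opposite template[pos]. Because the strands
    # are antiparallel, copying continues toward the template's 5' end.
    for k in range(pos - 1, -1, -1):
        new_strand += DNA_PAIR[template[k]]
    return new_strand


def transcribe(template: str) -> str:
    """Toy RNA synthesis: no primer needed; copy the template with U for T."""
    return "".join(RNA_PAIR[base] for base in reversed(template))


template = "GGATCCTTAG"
print(extend_primer(template, "CTAA"))   # CTAAGGATCC
print(transcribe(template))             # CUAAGGAUCC
```

A primer whose 3′-terminal base does not match the template (for example CTAC here) raises an error instead of being extended. This mirrors the requirement that the 3′ end must be accurately base-paired.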
Figure 2.6 DNA synthesis (formation of a phosphodiester bond between the 3′-OH at the growing end of the strand and an incoming dNTP)

2.1.5 Coiling and supercoiling

DNA can be denatured and renatured, deformed and reformed, and still retain unaltered function. This is a necessary feature, because a molecule as large as DNA will need to be packaged if it is to fit within the cell that it controls. The DNA of a human chromosome, if it were stretched out into an unpackaged double helix, would be several centimetres long. Thus, cells depend for their very existence on the packaging of DNA into modified configurations.

Double-stranded DNA, in its relaxed state, normally exists as a right-handed double helix with roughly one complete turn per 10 base pairs; this is known as the B form of DNA. Hydrophobic interactions between consecutive bases on the same strand contribute to this winding of the helix, as the bases are brought closer together, enabling a more effective exclusion of water from interaction with the hydrophobic bases. Other forms of double helix can also exist, notably the A form (also right-handed but more compact, with 11 base pairs per turn) and Z-DNA, which is a left-handed double helix with a more irregular appearance (a zigzag structure, hence its designation). The latter is of special interest as certain regions of DNA sequence can trigger a localized switch between the right-handed B form and the left-handed Z form. However, for most of its length, natural DNA most closely resembles the B form.

But that is not the complete story. There are higher orders of conformation. The double helix is in turn coiled on itself, an effect known as supercoiling. There is an interaction between the coiling of the helix and the degree of supercoiling.
As long as the ends are fixed, changing the degree of coiling will alter the amount of supercoiling, and vice versa. The effect is easily demonstrated (and probably already familiar to you) with a coiled telephone cord. If you rotate the receiver so as to coil up the cord more tightly and then move the receiver towards the phone, you will not only see the supercoiling of the cord but also, if you look more closely, you will see that the tightness of the winding of the cord reduces as it becomes supercoiled.

DNA in vivo is constrained; the ends are not free to rotate. This is most obviously true of circular DNA structures such as (most) bacterial plasmids. The net effect of coiling and supercoiling (a property known as the linking number) is therefore fixed, and cannot be changed without breaking one of the strands. In nature, there are enzymes known as topoisomerases (including DNA gyrase) that do just that: they break the DNA strands, in effect rotate the ends, and reseal them. This alters the degree of winding of the helix and thus affects the supercoiling of the DNA. Topoisomerases also have an ingenious use in the laboratory, which we will consider in Chapter 5.

So the plasmids that we will be referring to frequently in later pages are naturally supercoiled when they are isolated from the cell. However, if one of the strands is broken at any point, the DNA is then free to rotate at that point and can therefore relax into a non-supercoiled form, with the characteristic B form of the helix. This is known as an open circular form (in contrast to the covalently closed circular form of the native plasmid). The plasmid will also be in a relaxed form after insertion of a foreign DNA fragment, or other manipulations. Even after all the nicks in the DNA have been resealed, the molecule remains relaxed; it will not become supercoiled again until it has been reintroduced into a bacterial cell.
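The trade-off between coiling and supercoiling can be written as a simple sum, using the standard topological terms twist (Tw, the coiling of the helix) and writhe (Wr, the supercoiling); these symbols are introduced here for illustration and are not used elsewhere in the text. As a worked example, assume a hypothetical 5000 bp plasmid, the figure of 10 bp per turn quoted above for B-DNA, and that the helix stays in the B form so that all of any change in linking number appears as writhe:

$$\mathrm{Lk} = \mathrm{Tw} + \mathrm{Wr}$$

$$\mathrm{Lk}_0 = \frac{5000\ \text{bp}}{10\ \text{bp per turn}} = 500 \qquad (\text{relaxed circle: } \mathrm{Tw} = 500,\ \mathrm{Wr} = 0)$$

If topoisomerase action lowers the linking number to Lk = 475 while Tw remains about 500, then

$$\mathrm{Wr} = \mathrm{Lk} - \mathrm{Tw} \approx 475 - 500 = -25$$

i.e. roughly 25 negative supercoils. Nicking one strand removes the constraint on Lk, so the molecule can relax back to Wr = 0, the open circular form.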
Some of the properties of the manipulated plasmid, such as its transforming ability and its mobility on an agarose gel, are therefore not the same as those of the native plasmid isolated from a bacterial cell.

2.2 Gene Structure and Organization

The definition of a 'gene' is rather imprecise. Its origins go back to the early days of genetics, when it could be used to describe the unit of inheritance of an observable characteristic (a phenotype). As the study of genetics progressed, it became possible to use the term gene to mean a DNA sequence coding for a specific polypeptide, although this ignores those 'genes' that code for RNA molecules such as ribosomal RNA and transfer RNA, which are not translated into proteins. It also ignores regulatory regions, which are necessary for proper expression of a gene although they are not themselves transcribed or translated. We often use the term 'gene' as being synonymous with 'open reading frame' (ORF), i.e. the region between the start and stop codons (although even that definition is still vague as to whether we should or should not include the stop codon itself). In bacteria, this region is usually an uninterrupted sequence. In eukaryotes, the presence of introns (see below) makes this definition more difficult; the region of the chromosome that contains the information for a specific polypeptide may be many times longer than the actual coding sequence. Basically, it is not possible to produce an entirely satisfactory definition. However, this is rarely a serious problem. We just have to be careful as to how we use the word, depending on whether we are discussing only the coding region (ORF), the length of sequence that is transcribed into mRNA (including untranslated regions), or the whole unit in the widest sense (including regulatory elements that lie outside the transcribed region).
In this section we want to highlight some of the key differences in 'gene' organization between eukaryotes and prokaryotes (bacteria), as these differences play a major role in the discussion of the application of molecular biology techniques and their use in different systems.

2.2.1 Operons

In bacteria, it is quite common for a group of genes to be transcribed from a single promoter into one long RNA molecule; this group of genes is known as an operon (Figure 2.7). If we are considering protein-coding genes, the transcription product, messenger RNA (mRNA), is then translated into a number of separate polypeptides. This can occur by the ribosomes reaching the stop codon at the end of one polypeptide-coding sequence, terminating translation and releasing the product before re-initiating (without dissociating from the mRNA). Alternatively, the ribosomes may attach independently to internal ribosome binding sites within the mRNA sequence. Generally, the genes involved are responsible for different steps in the same pathway, and this arrangement facilitates the co-ordinate regulation of those genes, i.e. expression goes up or down together in response to changing conditions.

Figure 2.7 Structure of an operon (genes a to d lie between a promoter and a transcriptional terminator and are transcribed into a single mRNA, which carries separate translation start sites for polypeptides A to D)

In eukaryotes, by contrast, ribosomes initiate translation in a different way, which means that they cannot normally produce separate proteins from a single mRNA in this way. There are ways in which a single mRNA can give rise to different proteins, but these work differently, for example through alternative processing of the mRNA (see below) or by producing one long polyprotein or precursor which is then cleaved into different proteins (as occurs in some viruses). A few viruses do actually have internal ribosome entry sites.
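The bacterial operon arrangement of Section 2.2.1 can be sketched in Python as one mRNA yielding several polypeptides. This is a deliberately simplified model under stated assumptions: translation is taken to start at every AUG found after the previous stop codon, and ribosome binding sites, overlapping genes and re-initiation efficiency are all ignored. The function name is hypothetical; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in UCAG order; '*' marks a stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_polycistronic(mrna: str) -> list[str]:
    """Translate each AUG...stop segment of a single mRNA into its own polypeptide."""
    proteins = []
    i = mrna.find("AUG")
    while i != -1:
        peptide = []
        j = i
        while j + 3 <= len(mrna):
            aa = CODON_TABLE[mrna[j:j + 3]]
            if aa == "*":
                break  # stop codon: release this polypeptide
            peptide.append(aa)
            j += 3
        else:
            break  # ran off the end without a stop codon; discard
        proteins.append("".join(peptide))
        i = mrna.find("AUG", j + 3)  # look for the next initiation codon
    return proteins


# Two hypothetical coding sequences on one transcript:
print(translate_polycistronic("GGAUGAAAUAAGGAUGUUUCCCUGAGG"))  # ['MK', 'MFP']
```

Because both polypeptides come from one transcript, anything that changes transcription from the shared promoter changes both together, which is the co-ordinate regulation described above.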
2.2.2 Exons and introns

In bacteria there is generally a simple one-for-one relationship between the coding sequence of the DNA, the mRNA and the protein. This is usually not true for eukaryotic cells, where the initial transcription product can be many times longer than is needed for translation into the final protein. It contains blocks of sequence (introns) which are removed by processing to generate the final mRNA for translation (Figure 2.8).

Introns do occur in bacteria, but quite infrequently. This is partly due to the need for economy in a bacterial cell; the smaller genome and generally more rapid growth provide an evolutionary pressure to remove unnecessary material from the genome. A further factor arises from the nature of transcription and translation in a bacterial cell. As the ribosomes translate the mRNA while it is being made, there is usually no opportunity for sections of the RNA to be removed before translation.

Figure 2.8 Exons and introns (a precursor mRNA with a 5′ cap, exons 1 to 3 separated by introns, and a poly(A) tail added at the polyadenylation site is spliced to give the mature mRNA, which is translated from the AUG to the stop codon)

2.3 Information Flow: Gene Expression

The way in which genes are expressed is so central to much of the subsequent material in this book that it is worth reviewing the salient features briefly. The basic dogma (Figure 2.9) is that while DNA is the basic genetic material that carries information from one generation to the next, its effect on the characteristics of the cell requires firstly its copying into RNA (transcription), and then the translation of the mRNA into a polypeptide by ribosomes. Further processes are required before the protein's activity can be manifested: these include the folding of the polypeptide, possibly in association with other subunits to form a multi-subunit protein, and in some cases modification, e.g. by glycosylation or phosphorylation.
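The splicing step of Section 2.2.2 (Figure 2.8) can be sketched in Python as joining exon segments and discarding the introns. The exon coordinates here are supplied by hand and are hypothetical; in the cell, splice sites are recognized from sequence signals by the splicing machinery, which this sketch does not attempt to model. Capping and polyadenylation are also ignored.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exons of a precursor mRNA in order, removing the introns.

    `exons` are 0-based, end-exclusive (start, end) coordinates.
    """
    ordered = sorted(exons)
    for (_, e1), (s2, _) in zip(ordered, ordered[1:]):
        if s2 < e1:
            raise ValueError("exons overlap")
    return "".join(pre_mrna[s:e] for s, e in ordered)


# Toy precursor: exon 1 + intron + exon 2 + intron + exon 3.
pre = "AUGGC" + "GUAAGUUUAG" + "CAAA" + "GUCCAG" + "UAA"
print(splice(pre, [(0, 5), (15, 19), (25, 28)]))  # AUGGCCAAAUAA
```

The toy introns begin with GU and end with AG, as most real introns do. The 28-base precursor gives a 12-base mature message, AUG GCC AAA UAA, which encodes just three amino acids before the stop codon.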
It should be noted that in some cases, RNA rather than protein is the final product of a gene (ribosomal and transfer RNA molecules, for example).

Figure 2.9 Information flow (DNA is replicated by DNA polymerase and transcribed into RNA by RNA polymerase; ribosomes and tRNA translate the RNA into protein, which undergoes folding and post-translational modification to give biological activity)

2.3.1 Transcription

Transcription is carried out by RNA polymerase. RNA polymerase recognizes and binds to a specific sequence (the promoter), and initiates the synthesis of mRNA from an adjacent position. A typical bacterial promoter carries two consensus sequences (i.e. sequences that are similar, though rarely identical, in many promoters): TTGACA centred at position −35 (i.e. 35 bases before the transcription start site), and TATAAT at −10 (Figure 2.10). It is important to understand the nature of a consensus: few bacterial promoters have exactly the sequences shown, but if you line up a large number of promoters you will see that at any one position a large number of them have the same base (see Box 2.2). The RNA polymerase has higher affinity for some promoters than others, depending not only on the exact nature of the two consensus sequences but, to a lesser extent, on the sequence of a longer region. The nature and regulation of bacterial promoters, including the existence of alternative types of promoters, is considered further in Chapter 13.

Figure 2.10 Structure of the promoter region of the lac operon, showing the −35 and −10 regions upstream of the transcription start site (+1); note that the −35 and −10 regions of the lac promoter do not correspond exactly with the consensus sequences TTGACA and TATAAT respectively.

CAGGTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACA
GTCCAAATGTGAAATACGAAGGCCGAGCATACAACACACCTTAACACTCGCCTATTGTTAAAGTGT

Box 2.2 Examples of E. coli promoters. Bases matching the −10 and −35 consensus sequences are boxed.
Spaces are inserted to optimize the alignment. Note that the consensus is derived from a much larger collection of characterized promoters. Position +1 is the transcription start site.

In eukaryotes, by contrast, the promoter is a considerably larger area around the transcription start site, where a number of trans-acting transcription factors (i.e. DNA-binding proteins encoded by genes in other parts of the genome) bind to a number of cis-acting promoter elements (i.e. elements that affect the expression of the gene next to them) in a considerably more complex scenario. The need for this added complexity can easily be imagined; if cells carrying the same genome are differentiated into a multitude of cell types fulfilling very different functions, a very sophisticated control system is needed to provide each cell type with its specific repertoire of genes, and to fine-tune the degree of expression for each one of them. Nonetheless, the principle is the same: the promoter region, however simple or complex, gives rise to different levels of transcription of different genes.

In eukaryotes, the primary transcript, heterogeneous nuclear RNA (hnRNA), is very short-lived as such, because it is processed in a number of steps. A specialized nucleotide cap is added to the 5′ end; this is the site recognized by the ribosomes in protein synthesis (see below). The precursor mRNA is cleaved at a specific site towards the 3′ end and a poly(A) tail, consisting of a long sequence of adenosine residues, is added to the cut end. This is a specific process, governed by polyadenylation recognition sequences in the 3′ untranslated region. Nature's 'tagging' of mRNA molecules comes in very useful in the laboratory for the isolation of eukaryotic mRNA (see Chapter 7). Finally, in the process of splicing, the introns are removed and the exons are joined together.

In bacteria, the processes of transcription and translation take place in the same compartment and simultaneously.
In other words, the ribosomes translating the mRNA follow closely behind the RNA polymerase, and polypeptide production is well under way long before the mRNA is complete. In eukaryotes, by contrast, the mature mRNA molecule is transported out of the nucleus to the cytoplasm, where translation takes place.

The resulting level of protein production depends on the amount of the specific mRNA available, rather than just its rate of production. The level of an mRNA species will be affected by its rate of degradation as well as by its rate of synthesis. In bacteria, most mRNA molecules are degraded quite quickly (with a half-life of only a few minutes), although some are much more stable. The instability of the majority of bacterial mRNA molecules means that bacteria can rapidly alter their profile of gene expression by changing the transcription of specific genes. The lifespans of most eukaryotic mRNA molecules are measured in hours rather than minutes. This reflects the fact that an organism that is able to control its own environment, to a varying extent, is subjected to less radical environmental changes. Consequently, mRNA molecules tend to be more stable in multicellular organisms than in, for example, yeast. Nonetheless, the principle remains: the level of an mRNA is a function of its production and degradation rates. We will discuss how to study and disentangle these parameters in Chapter 13.

2.3.2 Translation

In bacteria, translation starts when ribosomes bind to a specific site (the ribosome binding site, RBS) which is adjacent to the start codon. The sequence of the ribosome binding site (also known as the Shine-Dalgarno sequence) has been recognized as being complementary to the 3′ end of the 16S rRNA (Figure 2.11).
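The complementarity between the RBS and the 16S rRNA can be illustrated with a small Python sketch that looks upstream of the first AUG for the stretch that best matches the partner of an anti-Shine-Dalgarno sequence. The core CCUCCU used here is an assumption for illustration (it is the commonly cited pairing region near the 3′ end of E. coli 16S rRNA), and the scoring counts only exact Watson-Crick matches; real ribosome binding also depends on spacing, secondary structure and G-U pairs. The function names are hypothetical.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Assumed anti-Shine-Dalgarno core of E. coli 16S rRNA, written 5'->3'.
# Illustrative only; not taken from the text.
ANTI_SD_CORE = "CCUCCU"


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(RNA_COMPLEMENT[b] for b in reversed(rna))


def find_rbs(mrna: str, anti_sd: str = ANTI_SD_CORE):
    """Return (matches, position, spacing) for the best site upstream of the first AUG.

    `matches` counts bases that pair with the rRNA core, `position` is the
    0-based start of the site and `spacing` is the number of bases between
    the site and the AUG. Returns None if there is no room for a site.
    """
    sd = reverse_complement(anti_sd)  # ideal pairing partner: AGGAGG
    start = mrna.find("AUG")
    if start < len(sd):
        return None  # no AUG, or too little upstream sequence
    best_score, best_pos = -1, -1
    for i in range(start - len(sd) + 1):
        score = sum(a == b for a, b in zip(mrna[i:i + len(sd)], sd))
        if score > best_score:
            best_score, best_pos = score, i
    return best_score, best_pos, start - (best_pos + len(sd))


print(find_rbs("CUAAGGAGGCUUAACAUGGCU"))  # (6, 3, 6)
```

In this hypothetical leader, a perfect AGGAGG lies six bases upstream of the AUG; the next paragraph notes that both the match and this distance influence translation efficiency.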
The precise sequence of this site, and its distance from the start codon, do affect the efficiency of translation, although in nature this is less important than transcriptional efficiency in determining the level of gene expression. Translation efficiency will also depend on codon usage, i.e. the match between the synonymous codons used in the mRNA and the availability of tRNAs that will recognize each codon. This concept is explored more fully in Chapter 15.

Figure 2.11 Bacterial ribosome binding site (labels: mRNA; 3′ end of the 16S rRNA in the 30S ribosomal subunit; initiator tRNA-Met)

In bacterial systems, where transcription and translation occur in the same compartment of the cell, ribosomes will bind to the mRNA as soon as the RBS has been synthesized. Thus there will be a procession of ribosomes following close behind the RNA polymerase, translating the mRNA as it is being produced. So, although the mRNA may be very short-lived, the bacteria are capable of producing substantial amounts of the corresponding polypeptide.

In eukaryotes, the mechanism (as usual) is much more complicated. Instead of binding just upstream of the initiation codon, the ribosome binds at the very 5′ end of the mRNA, to the cap, and reads along the 5′ untranslated region (UTR) until it reaches an initiation codon. The sequence AUG may be encountered on the way without initiation; the surrounding sequence is also important in defining the start of protein synthesis. The fact that the 5′ UTR is scanned in its full length by the ribosome makes it an important region for specifying translation efficiency, and different secondary structures can have either a positive or a negative effect on the amount of protein that is produced.

3 How to Clone a Gene

3.1 What is Cloning?
Cloning means using asexual reproduction to obtain organisms that are genetically identical to one another, and to the 'parent'. Of course, this contrasts with sexual reproduction, where the offspring are not usually identical. It is worth stressing that clones are only identical genetically; the actual appearance and behaviour of the clones will be influenced by other factors such as their environment. This applies equally to all organisms, from bacteria to humans.

Despite the emotive language that increasingly surrounds the use of the word 'cloning', this is a concept that will be surprisingly familiar to many people. In particular, anyone with an interest in gardening will know that it is possible to propagate plants by taking cuttings, and that in this way you will produce a number of plants that are identical to the parent. These are clones. Similarly, the routine bacteriological procedure of purifying a bacterial strain by picking a single colony for inoculating a series of fresh cultures is also a form of cloning.

The term cloning is also applied to genes, as an extension of the concept. If you introduce a foreign gene into a bacterium, or any other type of cell, in such a way that it will be copied when the cell replicates, then you will produce a large number of cells all with identical copies of that piece of DNA: you have cloned the gene (Figure 3.1). Once you have a large number of copies in this way, you can sequence the gene, or label it as a probe to study its expression in the organism it came from. You can express its protein product in bacterial or eukaryotic cells. You can mutate it and study what difference that mutation makes to the properties of the gene, its protein product, or the cell that carries it. You can even purify the gene from the bacterial clone and inject it into a mouse egg, and produce a line of transgenic mice that express it. Behind all of these applications lies a cloning process with the same basic steps.
In subsequent chapters, we will consider how this process is achieved, initially with bacterial cells (mainly E. coli) as the host and later extending the discussion to alternative host cells. The purpose of this chapter is to present an overview of the process, with the details of the various steps being considered further in the subsequent chapters.

Figure 3.1 Comparison of bacterial cloning and gene cloning (in bacterial cloning, each colony from a mixed bacterial culture is derived from a single cell; in gene cloning, a bacterial culture is transformed with a mixture of DNA fragments, and each colony is derived from a single cell and contains a different DNA fragment)

3.2 Overview of the Procedures

Some bacterial species will naturally take up DNA by a process known as transformation. However, most species have to be subjected to chemical or physical treatments before DNA will enter the cells. In all cases, the DNA will not be replicated by the host cell unless it either recombines with (i.e. is inserted into) the host chromosome or is incorporated into a molecule that is recognized by enzymes within the host cell as a substrate for replication. For most purposes the latter process is the relevant one. We use vectors to carry the DNA and allow it to be replicated.

There are many types of vectors for use with bacteria. Some of these vectors are plasmids, which are naturally occurring pieces of DNA that are replicated independently of the chromosome, and are inherited by the two daughter cells when the cell divides. (In Chapter 6 we will encounter other types of vectors, including viruses that infect bacteria; these are known as bacteriophages, or phages for short.) The DNA that we want to clone is inserted into a suitable vector, producing a recombinant molecule consisting of vector plus insert (Figure 3.2).
This recombinant molecule will be replicated by the bacterial cell, so that all the cells descended from that initial transformant will contain a copy of this piece of recombinant DNA. A bacterium like E. coli can replicate very rapidly under laboratory conditions, doubling every 20 minutes or so. This exponential growth gives rise to very large numbers of cells; after 30 generations (10 hours), there will be about 1 × 10⁹ (one thousand million, or 1 000 000 000) descendants of the initial transformant. Each one of these cells carries a copy of the recombinant DNA molecule, so we will have produced a very large number of copies of the cloned DNA.

Figure 3.2 Basic outline of gene cloning (the DNA to be cloned is ligated into a vector plasmid; the recombinant plasmid is used to transform a bacterial cell, and bacterial replication produces a culture containing a large number of cells; overnight growth of E. coli will produce about 10⁹ cells/ml)

Of course exponential growth does not continue indefinitely; after a while, the bacteria start to run out of nutrients, and stop multiplying. (The reasons for growth stopping are actually rather more complex than that, but depletion of nutrients, including limited diffusion of oxygen, is the main factor.) With E. coli, this commonly occurs at about 1 × 10⁹ bacteria per ml of culture. However, if we take a small sample and add it to fresh medium, exponential growth will resume. The clone can thus be propagated, and in this way we can effectively produce unlimited quantities of the cloned DNA. If we can get the bacteria to express the cloned gene, we can also get very large amounts of the product of that gene.

In order to carry out this procedure, we require a method for joining pieces of DNA to such a vector, as well as a way of cutting the vector to provide an opportunity for this joining to take place.
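The growth arithmetic above (a 20-minute doubling time giving about 10⁹ cells in 10 hours) can be checked with a short Python sketch. It assumes idealized, uninterrupted exponential growth from a single transformant; as the text explains, real cultures level off at around 10⁹ cells per ml, which this model ignores. The function names are hypothetical.

```python
import math


def cells_after(hours: float, doubling_minutes: float = 20.0, start: float = 1) -> float:
    """Idealized exponential growth: N = N0 * 2 ** (t / doubling time)."""
    generations = hours * 60 / doubling_minutes
    return start * 2 ** generations


def hours_to_reach(target: float, doubling_minutes: float = 20.0, start: float = 1) -> float:
    """Time needed for `start` cells to reach `target` cells."""
    return math.log2(target / start) * doubling_minutes / 60


print(cells_after(10))            # 1073741824.0, i.e. 2**30
print(round(hours_to_reach(1e9), 2))
```

The second call gives just under 10 hours, consistent with the statement that 30 generations yield about a thousand million descendants.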
The key to the development of gene cloning technology was the discovery of enzymes that would carry out these reactions in a very precise way. The main enzymes needed are restriction endonucleases, which break the sugar-phosphate backbone of DNA molecules at precise sites, and DNA ligases, which are able to join together the fragments of DNA that are generated in this way (Figure 3.3). These enzymes, and the ways in which they are used, are described in more detail in Chapter 5.

Once a piece of DNA has been inserted into a plasmid (forming a recombinant plasmid), it has to be introduced into the bacterial host by a transformation process. Generally this process is not very efficient, so only a small proportion of bacterial cells actually take up the plasmid. However, by using a plasmid vector that carries a gene coding for resistance to a specific antibiotic, we can simply plate out the transformed bacterial culture onto agar plates containing that antibiotic, and only the cells which have received the plasmid will be able to grow and form colonies.

Figure 3.3 Cutting and joining DNA (a vector plasmid is cut at a restriction site with a restriction endonuclease to give a linearized plasmid, which is then ligated to form a recombinant plasmid)

This description does not consider how we get hold of a piece of DNA carrying the specific gene that we want to clone. Even a small and relatively simple organism like a bacterium contains thousands of genes, and they are not arranged as discrete packets but are regions of a continuous DNA molecule. We have to break this molecule into smaller fragments, which we can do specifically (using restriction endonucleases) or non-specifically (by mechanical shearing). But however we do it, we will obtain a very large number of different fragments of DNA, with no easy way of reliably purifying a specific fragment, let alone isolating the specific fragment that carries the required gene.
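The specificity of restriction endonucleases can be illustrated with a Python sketch of an in-silico digest. The defaults model EcoRI, which recognizes GAATTC and cuts between the G and the A (G^AATTC); the function name is hypothetical. Only the top strand of a linear molecule is handled, so the four-base sticky ends (described in Chapter 5) are not represented, and circular plasmids would need the wrap-around fragment joined up.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut a linear DNA sequence (top strand) at every occurrence of `site`.

    `cut_offset` is the position within the site at which the top strand is
    cut; 1 corresponds to EcoRI's G^AATTC.
    """
    fragments, last = [], 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[last:cut])
        last = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[last:])
    return fragments


frags = digest("AAGAATTCTTTTGAATTCCC")
print(frags)                     # ['AAG', 'AATTCTTTTG', 'AATTCCC']
print([len(f) for f in frags])   # [3, 10, 7]
```

Applied to a whole bacterial chromosome, the same procedure yields hundreds or thousands of fragments, many of nearly identical length, which is why separation by size alone is not enough.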
The only way of separating the fragments is by size, but there are so many fragments that there will be a lot of different pieces of DNA that are so similar in size that they cannot be separated.

3.3 Gene Libraries

Fortunately it is not necessary to try to purify specific DNA fragments. One of the strengths of gene cloning is that it provides another, more powerful, way of finding a specific piece of DNA. Rather than attempting to separate the DNA fragments, we take the complete mixture and use DNA ligase to insert the fragments into the prepared vector. Under the right conditions, only one fragment will be inserted into each vector molecule. In this way, we produce a mixture of a large number of different recombinant vector molecules, which is known as a gene library (or more specifically a genomic library, to contrast it with other forms of gene library that will be described in Chapter 7). When we transform a bacterial culture with this library, each cell will generally take up only one molecule. When we then plate the transformed culture, each colony, which arises from a single transformed cell, will contain a large number of bacteria, all of which carry the same recombinant plasmid, with a copy of the same piece of DNA from our starting mixture. So instead of a mixture of thousands (or millions, or tens of millions) of different DNA fragments, we have a large number of bacterial colonies, each of which carries one fragment only (Figure 3.4). The production and screening of gene libraries is considered in Chapters 7 and 8, where we will see that a variety of different vectors, other than simple plasmids, are generally used for constructing genomic libraries.

We still have a very complex mixture, but whereas purifying an individual DNA fragment is extremely difficult, it is simple to isolate individual bacterial colonies from this mixture: we just pick them from a plate.
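How large a library must be is not covered in this chapter, but a standard estimate (the Clarke-Carbon formula, brought in here from outside the text) gives a feel for the numbers. It assumes equal-sized inserts drawn at random from the genome, which real libraries only approximate, since some sequences clone poorly. The genome and insert sizes below are hypothetical, chosen to be roughly those of E. coli and a small plasmid insert; the function name is also hypothetical.

```python
import math


def clones_needed(genome_bp: float, insert_bp: float, probability: float = 0.99) -> int:
    """Clarke-Carbon estimate: N = ln(1 - P) / ln(1 - f), f = insert / genome size.

    N is the number of random clones needed for any given sequence to be
    present in the library with probability P.
    """
    if not 0 < insert_bp < genome_bp:
        raise ValueError("insert must be smaller than the genome")
    if not 0 < probability < 1:
        raise ValueError("probability must lie strictly between 0 and 1")
    f = insert_bp / genome_bp
    return math.ceil(math.log(1 - probability) / math.log(1 - f))


# Hypothetical example: 4.6 Mb genome, 5 kb inserts, 99% chance of inclusion.
print(clones_needed(4.6e6, 5e3))
```

This comes to about 4,200 clones, several times the 920 that a simple genome-to-insert ratio would suggest, because random sampling represents some regions repeatedly and misses others.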
Each individual bacterial colony will carry a different piece of DNA from our original complex mixture, so if we can identify which bacterial colony carries the gene that we are interested in, purifying it becomes a simple matter. We just have to pick the right colony and inoculate it into fresh medium.

Figure 3.4 Making a genomic library (a mixture of DNA fragments is ligated into a vector plasmid to give a mixture of recombinant plasmids; after transformation, each colony carries a different insert fragment)

However, we still have the problem of knowing which of these thousands (or millions) of bacterial colonies actually carries the gene that we want. This is considered more fully in Chapter 8, but one commonly used and very powerful method can be introduced here as an example. This depends on the phenomenon of hybridization.

3.4 Hybridization

If a double-stranded DNA fragment is heated, the non-covalent bonds holding the two strands together will be disrupted, and the two strands will separate. This is known as denaturation, or less formally (and less accurately) as 'melting'. When the solution is allowed to cool again, these bonds will re-form and the original double-stranded fragment will be regenerated (the two strands are said to anneal). We can use this phenomenon to identify a specific piece of DNA in a complex mixture by labelling a specific DNA sequence (the probe), and mixing the labelled probe with the denatured mixture of fragments. When the mixture is cooled down, the probe will tend to hybridize to any related DNA fragments (Figure 3.5), which enables us to identify the specific DNA fragments that we want. For screening a gene library, the labelled probe will hybridize to DNA from any colony that carries the corresponding gene or part of it; we can then recover that colony and grow up a culture from it, thus producing an unlimited amount of our cloned gene.
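Library screening by hybridization can be caricatured in Python as a sequence search. Because the target DNA is double-stranded and denatured before probing, the probe can pair with either strand, which is equivalent to finding either the probe sequence or its reverse complement in the top strand of the insert. The `max_mismatches` parameter is a loose stand-in for less stringent conditions; real hybridization depends on temperature, salt concentration, probe length and base composition, none of which is modelled. The clone names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hybridizes(probe: str, target: str, max_mismatches: int = 0) -> bool:
    """True if the probe could pair with either strand of the target."""
    n = len(probe)
    for candidate in (probe, reverse_complement(probe)):
        for i in range(len(target) - n + 1):
            mismatches = sum(a != b for a, b in zip(target[i:i + n], candidate))
            if mismatches <= max_mismatches:
                return True
    return False


library = {
    "clone1": "AAAAGGCTTACCAAAA",  # contains the probe sequence
    "clone2": "CCCCCCCCCCCCCCCC",  # unrelated insert
    "clone3": "TTGGTAAGCCTT",      # contains the reverse complement
    "clone4": "AAGGCATACCAA",      # related sequence, one mismatch
}
probe = "GGCTTACC"
print([name for name, insert in library.items() if hybridizes(probe, insert)])
# ['clone1', 'clone3']
print([name for name, insert in library.items() if hybridizes(probe, insert, 1)])
# ['clone1', 'clone3', 'clone4']
```

Relaxing the mismatch tolerance picks up the related clone4 as well, which mirrors how lowering stringency lets a probe detect related rather than identical sequences.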
Of course it is not quite as simple as that: we cannot hybridize the probe to the colonies on an agar plate. However, it is easy to transfer a part of each colony onto a membrane by replication, and then lyse the colonies so that the DNA they contain is fixed to the membrane. This produces a pattern of DNA spots on the membrane in positions corresponding to the colonies on the original plate (Figure 3.6), which can then be hybridized to the labelled probe to enable identification, and recovery, of the required colony. Hybridization, using labelled DNA or RNA probes, is an important part of many other techniques that we will encounter in subsequent chapters.

Figure 3.5 Hybridization and gene probes (left: denaturation and re-annealing of double-stranded DNA; right: after denaturation, annealing with a labelled probe detects a specific fragment)

Figure 3.6 Colony hybridization (a collection of recombinant bacteria is replicated onto a filter and lysed to create a pattern of DNA spots; the filter is hybridized with a labelled probe, and comparison with the plate identifies the required colony)

3.5 Polymerase Chain Reaction

The technique known as the polymerase chain reaction (PCR) often provides an alternative to gene cloning and gene libraries as a way of obtaining usable quantities of specific DNA sequences. PCR requires the use of a pair of primers that will anneal, one to each strand, to sites on either side of the required region of DNA (Figure 3.7). DNA polymerase action will then synthesize new DNA strands starting from each primer. Denaturation of the products, and re-annealing of the primers, will allow a second round of synthesis.
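Which region a primer pair amplifies can be predicted with a short Python sketch. Writing both primers 5′→3′, the forward primer matches the top strand as written (it anneals to the bottom strand), while the reverse primer anneals to the top strand, so its reverse complement appears in the top strand. The product runs from the 5′ end of one primer to the 5′ end of the other. The function name and sequences are hypothetical, and only exact primer matches are considered.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def pcr_product(template: str, fwd: str, rev: str):
    """Return the predicted amplicon (top strand), or None if none is formed."""
    start = template.find(fwd)
    if start == -1:
        return None
    # The reverse primer site must lie downstream of the forward primer.
    end_site = template.find(reverse_complement(rev), start + len(fwd))
    if end_site == -1:
        return None
    return template[start:end_site + len(rev)]


template = "AAAA" + "GCTAGCTTAC" + "GGGTTTGGG" + "TCCAGTTGCA" + "AAAA"
print(pcr_product(template, "GCTAGCTTAC", "TGCAACTGGA"))
# GCTAGCTTACGGGTTTGGGTCCAGTTGCA
```

Supplying the reverse primer in the wrong orientation (for example as the top-strand sequence "TCCAGTTGCA") returns None, because that primer would not face towards the forward primer.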
Repeated cycles of denaturation, annealing and extension will give rise to an exponential amplification of the DNA sequence between the two primers, with the amount of product doubling in each cycle, so that after, say, 20 cycles there will (theoretically) be a million-fold increase in the amount of product (2²⁰ ≈ 1.05 × 10⁶). This enables the amplification of a specific region of the DNA, and the product can then be cloned directly. The polymerase chain reaction, and some of its many applications, are described more fully in Chapter 9.

Figure 3.7 Polymerase chain reaction (primers 1 and 2 are added to a DNA template, which is denatured and re-annealed; DNA synthesis follows, and repeating the cycle n times gives an amplified product of up to 2ⁿ copies)

In this chapter we have provided a brief overview of the principal methods used in gene cloning. These procedures, and some of the main alternative strategies, are described more fully in subsequent chapters.

4 Purification and Separation of Nucleic Acids

4.1 Extraction and Purification of Nucleic Acids

The first step for most of the procedures referred to in this book is to extract the DNA (or for some purposes RNA) from the cell and to purify it by separating it from other cellular components. Although recent technological advances have made this much less of a challenge than it once was, the quality of the starting material remains a crucially important factor for most purposes. For some applications, such as PCR (see Chapter 9) or hybridization analyses (see Chapter 8), less pure material may be acceptable. In this chapter, we review the concepts underlying the most commonly used methods of purifying and fractionating nucleic acids; for further experimental details, you will need to consult a laboratory manual (see Appendix A).
4.1.1 Breaking up cells and tissues

Although there are many different methods for purifying nucleic acids, they have a number of basic features in common. Firstly, we need the starting material. This could be a culture of bacterial or eukaryotic cells, which would simply need to be separated from the growth medium (for example by centrifugation), or a more complex tissue sample, which first needs to be homogenized so that the individual cells can be lysed. Wherever possible, the material should be freshly harvested, or kept frozen until it is used, to avoid degradation by enzymes present in the cell extract.

The cells then need to be lysed to release their components. The nature of the treatment will vary widely according to the cell type. Bacterial cells have walls that have to be broken before the cell contents can be released. This is usually accomplished by using lysozyme (an enzyme naturally present in egg white and tears for the very purpose of breaking down bacterial cell walls), often in conjunction with EDTA and a detergent such as SDS (sodium dodecyl sulphate). EDTA chelates (binds) divalent cations; this destabilizes the outer membrane in bacteria such as E. coli, and also inhibits DNases, which need divalent cations such as Mg2+ for activity and would otherwise tend to degrade the DNA. The detergent, meanwhile, solubilizes the membrane lipids. Plant and fungal cells have cell walls that are different from those in bacteria, and require alternative treatments, either mechanical or enzymatic, while animal cells (which lack a cell wall) can usually be lysed by more gentle treatment with a mild detergent.

After breaking up the cell wall and plasma membrane, we are left with a mixture of that material and the intracellular components that have now been released: a complex solution of DNA, RNA, proteins, lipids and carbohydrates. Note that the sudden lysis of the cell will usually result in some fragmentation of chromosomal DNA.
In particular, the bacterial chromosome, which is usually circular in its native state, will be broken into linear fragments. Where it is necessary to obtain very large (even intact) chromosomal DNA, gentler lysis conditions are needed (see the description of pulsed field gel electrophoresis in Chapter 12). Bacterial plasmids, however, are readily obtained in their native, circular state by standard lysis conditions. The next step in the procedure is to separate the desired nucleic acid from these other components.

4.1.2 Enzyme treatment

Removal of RNA from a DNA preparation is easily achieved by treatment with ribonuclease (RNase). Since RNase is a very heat-stable enzyme, it is easy to ensure that it is free of traces of deoxyribonuclease (DNase), which would otherwise degrade your DNA, simply by heating the enzyme before use. Removal of DNA from RNA preparations used to be more difficult, since it requires DNase without any RNase activity. However, it is now possible to buy RNase-free DNase (as well as DNase-free RNase). Protein contamination can be removed by digestion with a proteolytic enzyme such as proteinase K.

These treatments are applied as necessary in different nucleic acid purification protocols. However, in some protocols they are omitted, either because the contamination is unimportant for a specific purpose, or because the contaminants will be removed anyway through subsequent steps, as described below.

4.1.3 Phenol-chloroform extraction

Removal of proteins is particularly important, as the cell contains a number of enzymes that will degrade nucleic acids, as well as other proteins that will interfere with subsequent procedures by binding to the nucleic acids.
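Returning briefly to the enzyme treatments of Section 4.1.2: the underlying rule is that an enzyme used to remove one contaminant must itself be free of any activity that would attack the nucleic acid you want to keep. That rule can be sketched as a small lookup. This is a hypothetical illustration; the table and function names are invented here, the enzyme names are simply those given in the text (RNase, DNase, proteinase K), and it does not describe any particular kit or protocol.

```python
# Enzyme that digests each kind of contaminant (Section 4.1.2).
# Hypothetical lookup table, for illustration only.
DIGESTING_ENZYME = {
    "RNA": "RNase",
    "DNA": "DNase",
    "protein": "proteinase K",
}

# Nuclease whose activity would destroy each target nucleic acid.
DAMAGING_NUCLEASE = {"DNA": "DNase", "RNA": "RNase"}


def plan_enzyme_treatment(target: str, contaminants: set) -> list:
    """Return (contaminant, enzyme) pairs for cleaning up a DNA or RNA prep.

    A nuclease used to remove the other nucleic acid must be free of the
    nuclease that would degrade the target: DNase-free RNase for a DNA
    preparation, RNase-free DNase for an RNA preparation.
    """
    if target not in DAMAGING_NUCLEASE:
        raise ValueError("target must be 'DNA' or 'RNA'")
    if target in contaminants:
        raise ValueError("the target cannot also be treated as a contaminant")
    unknown = set(contaminants) - set(DIGESTING_ENZYME)
    if unknown:
        raise ValueError(f"no enzyme listed for: {sorted(unknown)}")

    plan = []
    for contaminant in sorted(contaminants):
        enzyme = DIGESTING_ENZYME[contaminant]
        if contaminant in DAMAGING_NUCLEASE:
            # prefix the activity that must be absent, e.g. "DNase-free RNase"
            enzyme = f"{DAMAGING_NUCLEASE[target]}-free {enzyme}"
        plan.append((contaminant, enzyme))
    return plan


if __name__ == "__main__":
    print(plan_enzyme_treatment("DNA", {"RNA", "protein"}))
    # [('RNA', 'DNase-free RNase'), ('protein', 'proteinase K')]
    print(plan_enzyme_treatment("RNA", {"DNA"}))
    # [('DNA', 'RNase-free DNase')]
```

For a DNA preparation, the text points out a simple way to obtain DNase-free RNase: because RNase is very heat-stable, heating the enzyme before use destroys any contaminating DNase.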
A classical, and still very frequently employed, way of removing proteins is by extraction with liquefied phenol, or preferably a mixture of phenol and chloroform. Phenol and chloroform are (largely) immiscible with water, so when they are added to your cell extract you will get two layers (phases). When the mixture is vigorously agitated, the proteins will be denatured and precipitated at the interphase (Figure 4.1). If you are using phenol that has been equilibrated with a neutral or alkaline buffer (as is normally the case), the nucleic acids (DNA and RNA) will remain in the aqueous layer. On the other hand, if you carry out the extraction with acidic phenol, DNA will partition into the organic phase, allowing you to recover RNA from the aqueous phase. Phenol is naturally acidic, so equilibration with water, or the use of an acidic buffer, will produce the appropriate conditions. Phenol extraction is also useful in subsequent stages of manipulation when it is necessary to ensure that all traces of an enzyme have been removed before proceeding to the next step. (Note: phenol is highly toxic by skin absorption, and gloves must be worn.)

[Figure 4.1 Phenol extraction. Layers from top to bottom at neutral pH: aqueous layer (DNA and RNA), protein precipitate, phenol layer. At acidic pH: aqueous layer (RNA), protein precipitate, phenol layer (DNA).]

4.1.4 Alcohol precipitation

Following phenol extraction, you will have a protein-free sample of your nucleic acid(s). However, it will probably be more dilute than you want it to be, and furthermore it will contain traces of phenol and chloroform. Phenol in particular has a significant degree of solubility in water, and could denature the enzymes used in subsequent steps. The answer is normally to concentrate (and further purify) the solution by precipitating the nucleic acid.
This is done by adding an alcohol, either isopropanol or (more frequently) ethanol; in the presence of monovalent cations (Na+, K+ or NH4+) a nucleic acid precipitate forms, which can be collected at the bottom of the test tube by
