Gene Editing Methods PDF
Summary
This document provides an overview of different programmable nucleases used in gene editing, including meganucleases, zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and the CRISPR-Cas system. It details their mechanisms, target DNA site sizes, and permissivity to mismatches, along with ease of reprogramming. The document also highlights the advantages and disadvantages of each method, particularly focusing on the CRISPR-Cas system's wider applicability.
Full Transcript
8 Gene Editing

Fig. 8.2 Overview of the main programmable nuclease systems. Meganucleases are naturally occurring restriction enzymes that recognize sequences of 12 to 40 base pairs. Zinc-finger nucleases (ZFNs) are chimeric proteins formed by two domains: the DNA recognition motif, composed of a tandem sequence of zinc-finger units, and the DNA-cleaving domain, which harbors the nuclease activity (the bacterial FokI enzyme). Transcription activator-like effector nucleases (TALENs) are a similar set of engineered nucleases, with the DNA recognition motif derived from bacterial transcription activator-like effectors (TALEs). Both ZFNs and TALENs function in pairs. The main CRISPR-Cas system is based on the Streptococcus pyogenes Cas9 ribonucleoprotein. This protein is an endonuclease with two DNA-cutting active sites and binds to guide RNA sequences that are complementary to the target DNA loci.

Table 8.1 Main features of the four programmable nuclease platforms used in gene editing.

| Platform | Origin | Agents required for DSB | DNA-binding mechanism | Target DNA site size | Permissivity to mismatches | Ease of reprogramming |
|---|---|---|---|---|---|---|
| Meganucleases | Prokaryotes and eukaryotes | Single protein (may dimerize) | Protein-DNA interaction | 12–40 bp | Mildly tolerated | Very difficult |
| Zinc-finger nucleases | Eukaryotes | Pair of proteins | Protein-DNA interaction | 9–18 bp (single); 18–36 bp (pair) | Mildly tolerated | Possible, but time-consuming |
| TALENs | Bacteria of the genus Xanthomonas | Pair of proteins | Protein-DNA interaction | Up to 20 bp (single); up to 40 bp (pair) | Mildly tolerated | Possible, but time-consuming |
| CRISPR-Cas (Cas9) | Prokaryotes (Streptococcus pyogenes) | Protein + guide RNA (gRNA or crRNA-tracrRNA) | RNA-DNA base pairing | 20 bp | Tolerated | Easy |

8.4 Programmable Nucleases Used in Gene Editing

[...] the yeast Saccharomyces cerevisiae and I-CreI from the green alga Chlamydomonas reinhardtii [13]. Although successfully employed for diverse approaches, meganucleases present some significant limitations. Since the DNA-binding and DNA-cleaving domains of these proteins are one and the same, it is hard to engineer meganucleases to target different base sequences without affecting their cutting performance. Additionally, there is no clear correspondence between the amino acid sequence of the DNA recognition site of the proteins and the DNA base pair sequence the domain recognizes, making rational reengineering difficult, if not impossible. These disadvantages have precluded meganuclease-based gene editing from ever achieving widespread use. In fact, meganucleases have been largely replaced by other programmable nuclease platforms developed in recent years.

8.4.2 Zinc-Finger Nucleases

Zinc-fingers are protein motifs found in several eukaryotic transcription factors, where they mediate the interaction of these proteins with their specific DNA targets [14]. Because of the high degree of specificity with which zinc-finger motifs bind to particular base pair sequences, they have been used to generate a class of programmable nucleases that was first described in 1996 [15].

Zinc-finger nucleases (ZFNs) are chimeric proteins mainly composed of two distinct domains: a DNA-binding domain and a DNA-cleaving domain, which harbors the actual nuclease activity [16]. The DNA-binding domain is composed of a tandem sequence of zinc-finger units, forming a zinc-finger array. Since each of those units recognizes a particular sequence of three base pairs, combining different zinc-finger units yields a zinc-finger array that is able to recognize a longer base pair sequence. For example, an array of 3 zinc-finger units is capable of recognizing a sequence of 9 base pairs, while an array of 6 zinc-finger units will recognize an 18-base pair sequence.

The DNA-cleaving domain linked to the zinc-finger array is derived from a bacterial restriction enzyme, usually FokI from Flavobacterium okeanokoites. Since FokI requires dimerization to elicit DNA cleavage, a pair of ZFNs is necessary for a DSB to occur; one ZFN will bind to one DNA strand, and the other will bind to the complementary strand. If the target sites of each ZFN are properly spaced, FokI dimerizes and cuts both DNA strands, generating a DSB. Since two ZFNs are required for a DSB to occur and each one of them recognizes 9–18 base pairs, a ZFN pair can recognize 18–36 base pairs, giving the system a high degree of specificity.

8.4.3 Transcription Activator-Like Effector Nucleases

Transcription activator-like effector nucleases (TALENs), first described in 2010 [17], are a set of engineered nucleases that are very similar to ZFNs in terms of the rationale behind their design. TALENs are also made up of two domains, a DNA-binding domain and a DNA-cleavage domain, and they also function in pairs. The DNA-cleavage domain is once again the bacterial FokI enzyme, but TALENs differ from ZFNs in their DNA recognition domain.

In the case of TALENs, recognition and binding of DNA is mediated by an array derived from bacterial transcription activator-like effectors (TALEs). TALEs were first described as an integral part of DNA-binding proteins used by plant pathogens of the genus Xanthomonas [18]. Each TALE array is composed of a collection of 34-amino acid-long modules arranged in tandem, and each of those units is capable of binding one particular DNA base. Binding specificity is determined by only two residues that vary between modules (the repeat-variable diresidue, RVD) [19].

Importantly, TALE arrays are customizable and can be assembled with a particular repeat order that will define their ability to bind a designated base pair sequence [20, 21]. Since each unit recognizes one nucleotide, an array of 20 units, for example, will distinguish a particular 20-base pair sequence; considering that a pair of TALENs is needed for a DSB to occur, the system is overall able to recognize sequences of up to 40 base pairs.

8.4.4 CRISPR-Cas Systems

In 2013, a new programmable nuclease system was introduced, and the advantages it presents over the previously existing gene editing platforms have since attracted unprecedented interest from the scientific community: the CRISPR-Cas system.

“CRISPR” is an acronym that was first used to describe particular genetic loci found in prokaryotes: clustered regularly interspaced short palindromic repeats [4, 22–24]. These loci were puzzling at first, and the extensive research that followed to understand their biological importance led to the recognition that CRISPR loci function as a prokaryotic adaptive immune mechanism directed against viruses and other external sources of nucleic acids. Briefly put, CRISPR loci function as data banks that allow prokaryotes to rapidly counteract invading pathogens through the action of CRISPR-associated (Cas) proteins, which are guided by RNA molecules complementary to DNA sequences of the pathogen. The invading DNA is destroyed by Cas proteins with endonuclease activity.

Several classes of CRISPR systems have since been described, varying in the Cas proteins associated with the CRISPR loci, the RNA molecules that participate in the immune mechanisms, and the processes responsible for the maturation of those RNA molecules [25]. New CRISPR variants are continuously being identified, but one in particular is widely used as a gene editing platform: the one based on the Cas9 protein from a type II CRISPR system of Streptococcus pyogenes, sometimes designated SpCas9 [26, 27].
Contrary to other CRISPR system types, type II systems require only one protein, Cas9, to elicit cleavage of DNA. This, combined with the DNA recognition features of SpCas9 (described below) and the very history of its implementation, has made SpCas9 the CRISPR system that is currently best established as a gene editing platform.

SpCas9, henceforth designated simply as Cas9, is a ribonucleoprotein with endonuclease activity that is able to generate DNA DSBs through the concerted action of its two nuclease domains, RuvC and HNH, each responsible for the cleavage of one DNA strand. In the biological context of S. pyogenes, Cas9 binding to the target DNA sequence is mediated by an RNA molecule that is complementary to 20 bases of the target DNA: the crRNA. Binding occurs through complementary base pairing. Maturation of the crRNA and binding of the crRNA to Cas9 are mediated by another RNA molecule, the trans-activating crRNA (tracrRNA) [28, 29]. In an experimental, or biotechnological, context, Cas9 and the guide RNA (gRNA) molecules can be artificially introduced into a eukaryotic cell, leading to a DNA DSB in the region that is complementary to the crRNA. In order to further simplify the system, the two RNA molecules have been rationally fused to create a single guide RNA (sgRNA) sequence that is sufficient to guide the nuclease activity of Cas9 [27].

Cas9 cannot be freely targeted to every 20-nucleotide sequence of the genome: one particular requirement must be met for it to recognize and bind a designated genetic locus. A specific protospacer adjacent motif (PAM) must be located immediately downstream of the 20-nucleotide sequence that is to be targeted. Among other things, the PAM dictates and defines the search for putative molecular targets and is responsible for initiating binding to the target DNA sequence [28, 29].
In the biological context, the PAM requirement also prevents self-recognition and cleavage of the bacterial DNA, since the crRNA-encoding sequences (spacers) lack the PAM [30]. The PAM of S. pyogenes Cas9 corresponds to an NGG motif, where N can be any nucleotide and GG are two sequential guanine nucleotides; however, it should be noted that different Cas nucleases have different PAM requirements. In a gene editing setting, the PAM restricts the range of possible targets the Cas9 protein can have, but since the motif is very short, this requirement does not constitute a great limitation. In other words, two sequential guanine nucleotides (or two sequential cytosines, in the complementary strand) are sufficiently common for Cas9 to be directed to almost anywhere in the genome.

8.4.5 Comparing the Four Classes of Programmable Nucleases Used in Gene Editing

As with any other biotechnological tool, each class of designer nucleases employed in gene editing presents advantages and disadvantages. Although the CRISPR-Cas system is regarded as having several unprecedented advantages over the other systems, the main shortcoming of this platform is its relative proneness to elicit off-target effects. In fact, Cas9 is known to tolerate mismatches in the 20-nucleotide sequence complementarity and is thus capable of producing DSBs at unintended sites [24]. Meganucleases, ZFNs and TALENs, while also capable of tolerating mismatches, are more restrictive [31].

However, the use of CRISPR-Cas presents diverse benefits over the other platforms, which were arguably responsible for its noteworthy increase in popularity since its inception [32]. Perhaps the principal advantage of CRISPR-Cas is its simple design with regard to the engineering that is necessary to reprogram it to target a particular desired locus.
While meganucleases are near-impossible to reprogram and reengineering ZFNs or TALENs entails expensive and time-consuming rounds of carefully planned molecular cloning [33], directing Cas9 to a new site requires only the definition of a new 20-nucleotide sequence upstream of a PAM, which should be in the vicinity of the target site. The gRNA containing that 20-nucleotide sequence can then be introduced into cells along with Cas9, which does not require reengineering. Additionally, editing using the CRISPR-Cas system requires only one protein, instead of a pair as is the case for ZFNs and TALENs. The system is highly efficient, meaning that Cas9 is likely to induce DSBs in a large percentage of the cells in a population [4]. CRISPR-Cas is also amenable to multiplexing, i.e., targeting several sites simultaneously [24]; using just one protein and several different gRNAs, it is possible to induce DSBs at various locations at once. Finally, and as will be described later in this chapter, CRISPR-Cas is a very versatile system and can be easily employed to perform operations that go beyond the introduction of DSBs in the DNA.

The process of selecting a particular nuclease-based gene editing platform must take into account the experimental objectives at hand, as well as the technical requirements, time constraints and costs that a particular approach may entail. The relevance of the diverse existing nuclease platforms notwithstanding, given the relative simplicity of the CRISPR-Cas system and the growing interest it attracts, the following sections will focus on some practical considerations concerning gene editing using the CRISPR-Cas system.
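The reprogramming step described in this section, choosing a 20-nucleotide sequence that lies immediately upstream of an NGG PAM, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names (`reverse_complement`, `find_protospacers`) and the toy input sequence are hypothetical and not taken from the chapter, and the scan considers the SpCas9 NGG motif only.

```python
import re

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def find_protospacers(seq: str, guide_len: int = 20):
    """List candidate Cas9 target sites on both strands.

    Assumption: SpCas9 with an NGG PAM immediately downstream (3') of the
    guide_len-nucleotide target. Each hit is a tuple
    (strand, start, protospacer, pam), where start is the 0-based position
    of the protospacer on the scanned strand, read 5'->3'.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # The lookahead makes overlapping candidate sites visible too.
        for m in pattern.finditer(s):
            hits.append((strand, m.start(), m.group(1), m.group(2)))
    return hits


# Hypothetical toy locus: one 20-nt target followed by the PAM "AGG".
toy_locus = "GATTACAGATTACAGATTAC" + "AGG"
for strand, start, protospacer, pam in find_protospacers(toy_locus):
    print(strand, start, protospacer, pam)
```

Multiplexing, in this picture, simply amounts to picking several of the returned 20-nucleotide sequences and supplying one gRNA per chosen site alongside the same Cas9 protein.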
8.5 Editing Genes Using CRISPR-Cas

As mentioned above, nuclease-based gene editing relies on the action of a molecular scissor, an endonuclease that cuts a DNA molecule in the vicinity of the site that is to be edited, and on endogenous DNA repair mechanisms that will elicit changes in the nucleotide sequence.

In order to knock out a gene using CRISPR-Cas, Cas9 can be directed by a sgRNA (or a crRNA-tracrRNA pair) to an early exon of the target gene. The subsequent DSB and the ensuing NHEJ may produce a small indel mutation, which in turn can lead to a shift in the DNA reading frame, generating a premature stop codon. Alternatively, Cas9 can be directed to regions both upstream and downstream of the initiation codon (ATG), removing it entirely and preventing translation from taking place. These excision approaches can also be used to remove particular genetic regions from the genome.

If a particular insertion or substitution is intended, gRNAs should direct Cas9 to the vicinity of the region to be altered. If a homology template is provided, either in the form of a single-stranded DNA molecule (for short insertions or substitutions, <200 nucleotides) or a donor plasmid (for longer insertions or substitutions), there is a possibility that the region suffering the DSB will be repaired by HDR using the exogenous template as a model, leading to the introduction of the desired sequence [10].

Design of the gRNA sequences can be performed using several bioinformatic tools that allow searching a particular sequence for PAMs and the corresponding 20-nucleotide sequences. Many of these platforms also catalogue every possible gRNA sequence in a designated genome and have algorithms that predict the probability of off-target effects [34]. Sequences should be selected using criteria that minimize putative off-target effects, by decreasing the overall number of putative off-target sites and/or minimizing the number of putative off-targets in coding regions.
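As a rough illustration of the off-target screening described above, the following Python sketch counts PAM-adjacent sites that differ from a candidate guide by only a few mismatches and ranks candidates accordingly. It is a deliberately naive brute-force comparison, not the algorithm of any particular design tool [34]; the function names, the NGG-only PAM model, the mismatch threshold of 3 and the toy sequences are all assumptions made for the example.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def pam_adjacent_sites(genome, guide_len=20):
    """Yield every guide_len-nt sequence followed by an NGG PAM, on both strands."""
    genome = genome.upper()
    for strand in (genome, reverse_complement(genome)):
        for i in range(len(strand) - guide_len - 2):
            # Positions guide_len+1 and guide_len+2 after i are the "GG" of NGG.
            if strand[i + guide_len + 1 : i + guide_len + 3] == "GG":
                yield strand[i : i + guide_len]


def count_off_targets(guide, genome, max_mismatches=3):
    """Return (perfect_matches, near_matches) for one guide.

    near_matches counts PAM-adjacent sites with 1..max_mismatches mismatches;
    the threshold is an illustrative assumption, not a value from the text.
    """
    perfect = near = 0
    for site in pam_adjacent_sites(genome, len(guide)):
        distance = sum(a != b for a, b in zip(guide, site))
        if distance == 0:
            perfect += 1
        elif distance <= max_mismatches:
            near += 1
    return perfect, near


def rank_guides(guides, genome, max_mismatches=3):
    """Sort candidate guides so that those with the fewest near-matches come first."""
    return sorted(guides, key=lambda g: count_off_targets(g, genome, max_mismatches)[1])


# Hypothetical toy "genome": the intended site plus one 2-mismatch look-alike.
guide = "GATTACAGATTACAGATTAC"
toy_genome = guide + "AGG" + "TATTACAGATTACAGATTAG" + "TGG"
print(count_off_targets(guide, toy_genome))
```

A real design tool would additionally weight mismatches by position and annotate whether each putative off-target falls in a coding region; neither refinement is modeled here.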
The CRISPR-Cas tools can then be introduced into cells through a variety of methods (Fig. 8.3). Cas9 and the sgRNA (or the crRNA-tracrRNA pair) can be directly introduced into the cells in their “natural” form: as a protein and as RNA molecules, respectively. In vitro-synthesized or recombinant Cas9 and in vitro-synthesized gRNAs are assembled as a ribonucleoprotein complex, also in vitro, and then introduced into cells through transfection, electroporation or microinjection. Both agents can also be delivered in the form of RNA, through the same methods. The Cas9-encoding mRNA will then be translated in the cytoplasm, and the ribonucleoprotein complexes will assemble intracellularly. Finally, Cas9 and the gRNAs can be administered in the form of DNA plasmids, which will be transcribed in the cell. These plasmids may be amenable to viral production, allowing viral delivery of the CRISPR-Cas system.

8.6 Expanding the Possibilities of Gene Editing with CRISPR-Cas

Just 1 year after its introduction, CRISPR-Cas had already been successfully used to modify the genome of diverse organisms, from bacteria to yeast and from agriculturally interesting plants to human cells [5]. CRISPR-Cas-based gene editing is regarded as a time- and cost-efficient method for generating new animal models, in particular of nontraditional animal species, for which previous attempts at genetic manipulation have been unfruitful [35]. Numerous publications in recent years underline the potential of CRISPR-Cas as a very versatile tool for manipulating genomes, in ways that go beyond those dependent on the introduction of DNA DSBs.

8.6.1 Gene Editing Beyond DNA Double-Strand Breaks

The sections above illustrate the potential of Cas9 endonuclease activity to easily generate the conditions for gene editing to occur.
The applicability of the CRISPR-Cas system goes, however, well beyond what can be achieved through DNA DSBs, and, in fact, gene editing as a whole can be regarded as a means to alter not only the genome but also its context and its products [4]. Cas9-derived tools developed so far offer the promise of altering epigenetic markers in the DNA, the architecture of the chromatin and the levels of transcribed RNA molecules, among many other possibilities.

Mutating a single amino acid in one or both nuclease active sites of Cas9 renders the protein partially or completely inactive, respectively [5, 22]. When only one site is mutated, the resulting protein is termed a nickase (nCas9), since it is still able to make a nick in the DNA duplex by cutting one of the DNA strands. The completely inactivated Cas9 is termed dead Cas9 (dCas9), and, although no longer able to produce DNA breaks, it retains its ability to bind specifically to the DNA through complementary base pairing between the gRNAs and their targets.

dCas9 can be fused to other proteins or effector domains, and several such fusion variants have been developed so far. What they all have in common is that, by taking advantage of the homing capacity of dCas9, they are able to direct the activity of the particular effectors they harbor to specific genetic loci [32] (Fig. 8.4). Notably, zinc-fingers and TALEs can also be used as homing devices for proteins other than the FokI endonuclease, but the simplicity of CRISPR-Cas has allowed a quick and diverse expansion of its potential in this regard.
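The relationship between the number of active nuclease sites and the resulting Cas9 variant, as described in this section, can be summarized in a small Python sketch. The class and attribute names are hypothetical, and the model is intentionally simplistic: it only records whether each of the two nuclease domains (RuvC and HNH) is still active and derives the variant name and the kind of DNA lesion from that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cas9Variant:
    """Toy model of Cas9 (illustrative names, not from the chapter).

    Each flag states whether the corresponding nuclease domain is still
    catalytically active; each active domain cuts one DNA strand.
    """

    ruvc_active: bool = True
    hnh_active: bool = True

    @property
    def strands_cut(self) -> int:
        # One strand is cut per active nuclease domain.
        return int(self.ruvc_active) + int(self.hnh_active)

    @property
    def name(self) -> str:
        return {2: "Cas9", 1: "nCas9", 0: "dCas9"}[self.strands_cut]

    @property
    def lesion(self) -> str:
        return {
            2: "double-strand break",
            1: "single-strand nick",
            0: "none (DNA binding only)",
        }[self.strands_cut]


for variant in (
    Cas9Variant(),
    Cas9Variant(ruvc_active=False),
    Cas9Variant(ruvc_active=False, hnh_active=False),
):
    print(variant.name, "->", variant.lesion)
```

In all three cases gRNA-directed binding is retained, which is what allows dCas9 to serve as a homing device for fused effector domains.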