Gene Editing PDF
C. Nóbrega et al.
Summary
This book chapter provides an overview of gene editing technologies, including their basis, mechanisms, and applications in biomedical research and gene therapy. It discusses various programmable nucleases like meganucleases, zinc-finger nucleases, TALENs, and CRISPR-Cas systems, and explores their potential for targeted gene modifications.
Full Transcript
8 Gene Editing
© Springer Nature Switzerland AG 2020. C. Nóbrega et al., A Handbook of Gene and Cell Therapy, https://doi.org/10.1007/978-3-030-41333-0_8
8.1 Rewriting the Typewriter
For many decades, scientists have tried to develop methods to modify the fundamental code that underlies every single organism – the genome. The reasons behind this intent have been diverse: altering genes and proteins in order to study their roles and functions; modifying cells and organisms of scientific or commercial interest; or developing strategies and approaches to tackle human diseases, among many others. Beginning in the 1920s, scientists have made use of electromagnetic radiation, mutagenic chemical compounds, recombinant DNA and molecular cloning technology, transfection methods, viruses and transposons in order to alter the nucleic acids inside a cell, generating random mutations, inserting or removing genes, or modifying existing ones [1]. Each of these techniques has revolutionized the fields of molecular biology and biotechnology and has contributed to the development of many other areas. However, the degree of precision and fidelity these techniques allow is below the ideal, perhaps utopian, scenario of being able to “freely rewrite” an existing genome.
In eukaryotic cells, perhaps the approach that has come consistently closest to this objective relies on the exploitation of homologous recombination mechanisms as a means to target and alter specific genetic loci [2, 3]. When a DNA sequence is inserted into a cell, there is the chance that it will be randomly inserted into one of its chromosomes.
However, if that DNA fragment is additionally flanked by two regions that are homologous to a particular DNA sequence of the target cell genome, it is possible that the endogenous homologous recombination mechanisms will lead to the substitution of the homologous site with the exogenous sequence, thereby inserting or substituting the region in between the homology arms. This gene targeting strategy has been recurrently and reliably applied to the generation of knockout mouse models, as well as knock-in animals in which particular genetic sequences have been altered or inserted. These types of models have come to be some of the most valuable tools in biomedical research, providing the bases for studies that have unveiled crucial aspects of human pathologies and contributing to the development and preclinical testing of therapeutic approaches.
Nonetheless, this proven method of modifying genomes may not be practical, feasible, or at all possible in every type of setting. The mastery scientists have acquired over mouse genetics is not always easily translated to other organisms, which may have their own particularities regarding life cycle, reproductive behavior, development, or the very molecular mechanisms that take place inside their cells. Moreover, generation of knock-in and knockout animals relies on a series of procedures, using, for example, stem cell cultures and animal crossings, which are not conceivable in several other contexts, such as when aiming at more straightforward forms of cell manipulation. Genome manipulation as a possible pathway to the treatment of human diseases should ideally target a patient’s cells directly and allow a high degree of control over the changes that are introduced. It is tempting to conjecture that techniques may be developed that, by allowing the direct rewriting of the four-letter blueprints of the human organism, will be able to delete disease-associated genes and correct pathogenic mutations.
Over the last few years, the promise of more direct, precise, and versatile methods for “rewriting” the eukaryotic genome has increasingly drawn the attention of the scientific community. Researchers are now employing a variety of new ways to manipulate genes, collectively termed gene, or genome, editing. These techniques offer the promise of being able to target precise regions of the genome of any cell and produce a variety of customizable modifications with an unprecedented degree of consistency.
8.2 The Basis of Gene Editing
Present-day approaches to gene editing rely on two conditions: (a) the ability to define the region of the genome that is to be altered and (b) the capacity to effect the actual changes or, more precisely, to create the conditions for the desired alterations to occur [4, 5]. These two abilities combine to generate the desired modification(s), at the intended locus (or loci).
Definition of the target site is accomplished by molecules that specifically bind to a particular nucleotide sequence and subsequently cleave both chains of the DNA, producing a double-strand break (DSB) [6]. These molecules are endonucleases – enzymes that are able to separate adjacent nucleotides in the middle of a polynucleotide chain, by cutting the phosphodiester bond between them. In order to target a particular sequence with as much specificity as possible, endonucleases used in gene editing must be as selective as possible with regard to the DNA sequence that they bind to and cut. The region to be altered can coincide with the sequence targeted by the endonucleases or, alternatively, that region can be in the close vicinity of the targeted sequence. Because of the central role endonucleases have in current approaches to gene editing, the term gene editing could, and perhaps should, be more appropriately substituted by “nuclease-based gene editing” [6].
However, DNA DSBs in isolation would be insufficient to edit genes and genomes. The actual modifications that then take place in the nucleotide sequence are in fact enacted by the cell, namely, through its endogenous DNA repair mechanisms [4, 5]. As a direct consequence of a DSB, DNA repair systems are recruited to the vicinity of the DSB site [7]. Changes to the nucleotide sequence may be introduced upon DNA repair, and, if appropriate conditions are established, those changes will result in the desired modification of the target locus.
The following section briefly outlines the two main mechanisms through which DNA DSBs are repaired in a cell and the ways they can be exploited in order to generate a particular desired alteration in the genome. The section after that will describe the four main classes of endonucleases that can be used to introduce the DSBs responsible for triggering modifications at the intended sites.
8.3 DNA Double-Strand Break Repair Mechanisms
Maintaining the integrity of the genome is of crucial importance for the preservation of cellular homeostasis and for the overall health of the organisms the cells compose. For this reason, life has evolved diverse systems and molecular pathways that ensure that the DNA is appropriately repaired in case an insult threatens its integrity. DNA DSBs in particular are mainly repaired through one of two mechanisms: nonhomologous end-joining (NHEJ) and homology-directed repair (HDR; Fig. 8.1). A vast array of proteins participates in both pathways, performing intricate biochemical and structural operations that are beyond the scope of this chapter. Nonetheless, regarding their role in gene editing, it is important to understand that, perhaps ironically, these DNA repair mechanisms are not error-proof and can thus be manipulated in order to produce intentional changes in the genome.
Fig. 8.1 Main mechanisms of DNA double-strand break (DSB) repair. Nonhomologous end-joining (NHEJ) involves resealing of the DSB site by simple linking of the two free ends of the DNA double strand. However, this process is prone to errors and often leads to the introduction or deletion of nucleotides. Homology-directed repair (HDR) is a more complex mechanism, in which the DSB is repaired using a homologous DNA molecule as template. While NHEJ may be exploited in order to generate gene knockouts or delete genetic elements, HDR may be used to introduce or substitute specific nucleotide sequences in a genome.
8.3.1 Nonhomologous End-Joining
Between the two DSB repair pathways, nonhomologous end-joining (NHEJ) is the simpler mechanism, leading to the straightforward resealing of the DSB by “regluing” the free ends left at each side of the break site [7, 8]. The proteins mediating this pathway “mend” the “wound” that was introduced in the DNA, but, importantly, a nucleotide “scar” may be left behind. In fact, this mechanism is somewhat error-prone, considering that NHEJ machinery may introduce or delete a small number of nucleotides as part of the process of resealing the break. As a result, the nucleotide sequence at the DSB site undergoes a small mutation, which may consist of a small insertion or a small deletion of nucleotides [4, 5]. This type of mutation is termed an indel. The number of nucleotide pairs that are added or removed from the DNA chains as part of an indel varies.
If the DSB occurs at an exonic region of a gene and an indel is subsequently introduced, these small insertions or deletions of nucleotides may produce alterations in the reading frame of the mRNA molecules that will be transcribed from that gene [6]. This occurs when the indel size is not a multiple of 3. A frequent consequence of such frameshifts is the appearance of premature stop codons that will halt translation and thus inhibit expression of the gene targeted by the DSB.
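The frameshift rule described above can be illustrated with a small sketch. It is only a toy model: the coding sequence, the helper names (`causes_frameshift`, `first_stop_codon`, `delete_bases`) and the deletion positions are assumptions made for illustration, and real NHEJ outcomes are variable and not under the experimenter's control.

```python
from typing import Optional

# Stop codons of the standard genetic code, written as DNA (coding strand).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def causes_frameshift(indel_size: int) -> bool:
    """An indel shifts the reading frame when its size is not a multiple of 3."""
    return indel_size % 3 != 0


def first_stop_codon(cds: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop codon, if any."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def delete_bases(cds: str, position: int, size: int) -> str:
    """Model a deletion-type indel: remove `size` bases starting at `position`."""
    return cds[:position] + cds[position + size:]


# Hypothetical coding sequence: ATG GCC TTG ACC AAG TAA (stop at codon 5).
toy_cds = "ATGGCCTTGACCAAGTAA"

one_bp_deletion = delete_bases(toy_cds, 3, 1)    # frame shifted
three_bp_deletion = delete_bases(toy_cds, 3, 3)  # frame preserved

print(first_stop_codon(toy_cds))            # natural stop codon position
print(first_stop_codon(one_bp_deletion))    # premature stop after the frameshift
print(first_stop_codon(three_bp_deletion))  # one codon lost, frame intact
```

In this toy sequence, the 1-bp deletion moves the first stop codon from the end of the coding sequence to its third codon, whereas the 3-bp deletion only removes one codon and leaves the downstream frame untouched.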
8.3.2 Homology-Directed Repair
Homology-directed repair (HDR) is a more conservative, and elaborate, mechanism, whereby DSB repair is performed using a homologous DNA molecule as a repair template [7, 9]. The process involves (a) the generation of single-stranded DNA (ssDNA) overhangs at the break site; (b) homology-directed invasion of the DNA template by the ssDNA; and (c) synthesis of DNA primed by the invading DNA strand and using the homologous DNA duplex as a template. Both DNA molecules are then separated through one of several different possible mechanisms that may, or may not, involve DNA crossover. Whatever the case, at the end of the process, the DNA molecule that underwent DSB and HDR is seamlessly repaired, in the large majority of cases [7].
Though HDR is generally not as error-prone as NHEJ, it may also be directed to produce desirable alterations of the DNA sequence in the vicinity of the DSB site [4]. In a normal biological context, the repair template for HDR corresponds to the sister chromatid of the one that underwent the DSB [9]. In the context of gene editing, an exogenous DNA repair template can be provided. Repair templates can be designed so as to induce precise alterations to the genome; they must include sequences bearing complete homology to regions in the vicinity of the DSB site, but they can also include an intentionally designed, and altered, sequence. Usually, these exogenous repair templates consist of a DNA sequence including two homology arms, flanking the region that is to be inserted in the vicinity of the break site or that will substitute a portion of the genome at that vicinity. Upon DSB, the HDR machinery will repair the break using the exogenous template as a model, thereby introducing the altered sequence into the genome that is being repaired.
8.3.3 Manipulating DNA Double-Strand Break Repair Mechanisms to Edit Genomes
NHEJ and HDR define, and limit, what kind of genome alterations can be achieved through nuclease-based gene editing. The action of both mechanisms can be directed to produce different changes that may be advantageous in the context of biomedical investigation and gene therapy development [6].
NHEJ is an exogenous template-independent mechanism and relies only on the ability to precisely define the site at which DSBs will be introduced. As explained above, simply producing a DSB in the coding region of a gene can be sufficient to knock out that gene: the DSB can produce an indel mutation that will generate a premature stop codon. In the same way, an indel can be enough to restore the reading frame of a gene bearing a frameshift-inducing mutation. Moreover, NHEJ can also be used to “excise” a particular genetic region. Upon producing two DSBs, one upstream and another downstream of a region that is to be deleted, the NHEJ pathway may reseal the DNA by uniting the end upstream of the first break site and the end downstream of the second, thus excluding the intervening region.
Taking advantage of HDR requires not only the ability to direct the repair machinery to the target site by inducing a DNA DSB but also the provision of a homologous repair template that will bear the particular alterations to be introduced. Broadly speaking, HDR allows for both substitutions and insertions of particular nucleotide sequences. In principle, HDR can be used to introduce a mutation of one or more nucleotides, to correct a particular mutation, or to eliminate a particular gene sequence, by providing a template in which that sequence was removed. Through HDR, particular genes can be inserted in the genome: a particular therapeutic gene may be introduced at a designated site, or a particular tag or fluorescent protein can be introduced in frame with another existing gene.
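The structure of an exogenous repair template (two homology arms flanking the sequence to be inserted) can be sketched on plain strings. This is a deliberately idealized model: the function names, the toy locus and the 3-bp arm length are assumptions chosen for illustration, and the sketch shows only the intended outcome of a successful HDR event, not its likelihood.

```python
def build_donor(genome: str, cut: int, insert: str, arm_len: int) -> str:
    """Build a donor template: left homology arm + insert + right homology arm.

    The arms are copied from the sequence immediately flanking the cut site.
    """
    if arm_len <= 0 or cut - arm_len < 0 or cut + arm_len > len(genome):
        raise ValueError("homology arms do not fit around the cut site")
    left_arm = genome[cut - arm_len:cut]
    right_arm = genome[cut:cut + arm_len]
    return left_arm + insert + right_arm


def ideal_hdr_outcome(genome: str, cut: int, donor: str, arm_len: int) -> str:
    """Return the locus after an idealized HDR event using `donor` as template."""
    left_arm, right_arm = donor[:arm_len], donor[-arm_len:]
    payload = donor[arm_len:len(donor) - arm_len]
    # Both arms must be fully homologous to the sequence around the break.
    if genome[cut - arm_len:cut] != left_arm or genome[cut:cut + arm_len] != right_arm:
        raise ValueError("homology arms do not match the target locus")
    return genome[:cut] + payload + genome[cut:]


# Hypothetical 12-bp locus cut in the middle; real homology arms are far longer.
locus = "AAACCCGGGTTT"
donor = build_donor(locus, cut=6, insert="TAG", arm_len=3)
edited = ideal_hdr_outcome(locus, cut=6, donor=donor, arm_len=3)
print(donor)
print(edited)
```

A substitution or a deletion could be modeled in the same way by choosing arms that flank, rather than abut, the region to be replaced.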
It must be noted, however, that this type of outline assumes that scientists would have complete control over the DSB repair mechanisms employed by the cell. This is not the current reality. The factors that determine whether DSBs are repaired through one path or the other are still being elucidated [7, 9]. Overall, NHEJ is favored over HDR, making its applications more reliable. What is more, the absence of a repair template would completely preclude HDR in favor of NHEJ, making it more reliable still. HDR-dependent strategies are more challenging. Given its endogenous dependence on homologous sister chromatids, HDR occurs only in dividing cells, excluding any HDR-based strategy from use on postmitotic cells. Additionally, the principles governing exogenous repair template design are still not completely clear. Although small insertions or substitutions can be reliably enacted, introduction of longer sequences is still fraught with several experimental limitations [10].
8.4 Programmable Nucleases Used in Gene Editing
Gene editing relies on endonucleases to precisely define the region of the genome that will be altered. In order for a particular class of endonucleases to be suitable for this end, they have to possess a series of characteristics that are not common to all nucleases that can be found in nature. For these reasons, while some nucleases used in gene editing are, in fact, more or less similar to their natural cognates, others are artificial chimeric proteins, engineered from naturally occurring proteins and protein domains.
Nucleases used in gene editing must be able to cut both chains of a DNA duplex and be highly specific with regard to the nucleotide sequences they target. If that were not the case, DSBs could be introduced at several different sites of the genome at the same time, producing unintended changes. A high specificity minimizes putative off-target effects.
Overall, the longer a particular base sequence that a nuclease recognizes, the greater the specificity of the nuclease, since the probability of that sequence being repeated in the genome is lower. Additionally, nucleases used in gene editing are preferably programmable, i.e., they are amenable to being redesigned and reengineered in order to target them to different gene loci, with high specificity, according to the aims of the gene editing approach at hand. Since the 1980s, four classes of endonucleases have been selected and engineered for use in gene editing approaches (Fig. 8.2; Table 8.1).
Fig. 8.2 Overview of the main programmable nuclease systems. Meganucleases are naturally occurring restriction enzymes, recognizing sequences of 12 to 40 base pairs. Zinc-finger nucleases (ZFNs) are chimeric proteins formed by two domains: the DNA recognition motif, composed of a tandem sequence of zinc-finger units, and the DNA-cleaving domain, harboring the nuclease activity – the bacterial FokI enzyme. Transcription activator-like effector nucleases (TALENs) are a similar set of engineered nucleases, with the DNA recognition motif derived from bacterial transcription activator-like effectors (TALEs). Both ZFNs and TALENs function in pairs. The main CRISPR-Cas system is based on the Streptococcus pyogenes Cas9 ribonucleoprotein. This protein is an endonuclease with two DNA-cutting active sites and binds to guide RNA sequences that are complementary to the target DNA loci.
Table 8.1 Main features of the four programmable nuclease platforms used in gene editing.
Feature | Meganucleases | Zinc-finger nucleases | TALENs | CRISPR-Cas (Cas9)
Origin | Prokaryotes and eukaryotes | Eukaryotes | Bacteria of the genus Xanthomonas | Prokaryotes (Streptococcus pyogenes)
Agents required for DSB | Single protein (may dimerize) | Pair of proteins | Pair of proteins | Protein + guide RNA (gRNA or crRNA-tracrRNA)
DNA-binding mechanism | Protein-DNA interaction | Protein-DNA interaction | Protein-DNA interaction | RNA-DNA base pairing
Target DNA site size | 12–40 bp | 9–18 bp (single); 18–36 bp (pair) | Up to 20 bp (single); up to 40 bp (pair) | 20 bp
Permissivity to mismatches | Mildly tolerated | Mildly tolerated | Mildly tolerated | Tolerated
Ease of reprogramming | Very difficult | Possible, but time-consuming | Possible, but time-consuming | Easy
8.4.1 Meganucleases
Meganucleases, also named homing endonucleases, are naturally occurring restriction enzymes that are found in diverse organisms, including bacteria, archaea, fungi, algae and plants [11, 12]. Contrary to the restriction enzymes that are routinely employed in molecular cloning, such as EcoRI or HindIII, meganucleases recognize extended base pair sequences: from 12 to 40 base pairs, contrasting with the 6 base pairs of those, and many other, traditional restriction enzymes. These long recognition sequences are responsible for the meganuclease designation and for the high degree of target discrimination these enzymes possess. Among the meganucleases most used in genome engineering are I-SceI from the yeast Saccharomyces cerevisiae and I-CreI from the green alga Chlamydomonas reinhardtii [13].
Although successfully employed for diverse approaches, meganucleases present some significant limitations. Since the DNA-binding and DNA-cleaving domains of these proteins are one and the same, it is hard to engineer meganucleases in order to target different base sequences without affecting their cutting performance. Additionally, there is no clear correspondence between the amino acid sequence of the DNA recognition site of the proteins and the DNA base pair sequence the domain recognizes, making rational reengineering difficult, if not impossible. The above disadvantages have precluded meganuclease-based gene editing from ever achieving widespread use. In fact, meganucleases have been largely substituted by other programmable nuclease platforms that have been developed in recent years.
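The link between recognition-site length and specificity can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a random genome in which the four bases are equally frequent and independent, and a round genome size of three billion base pairs; both are simplifying assumptions (real genomes are neither random nor uniform), and the function name `expected_sites` is hypothetical.

```python
def expected_sites(site_len: int, genome_size: int = 3_000_000_000) -> float:
    """Expected number of exact matches to one specific site on one strand.

    Assumes a random genome with equal base frequencies: each of the
    (genome_size - site_len + 1) windows matches with probability 1 / 4**site_len.
    """
    windows = genome_size - site_len + 1
    return windows / 4 ** site_len


# Compare a 6-bp restriction site with longer recognition sequences.
for length in (6, 12, 18, 20, 40):
    print(f"{length:>2} bp site: ~{expected_sites(length):.3g} expected matches")
```

Under these assumptions a 6-bp site is expected hundreds of thousands of times in such a genome, whereas a site of 18 bp or more is expected less than once, which is the rationale for favoring nucleases with long recognition sequences.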
8.4.2 Zinc-Finger Nucleases
Zinc-fingers are protein motifs that can be found in several eukaryotic transcription factors, mediating interaction of these proteins with their specific DNA targets [14]. Because of the high degree of specificity with which the zinc-finger motifs bind to particular base pair sequences, they have been used to generate a class of programmable nucleases that was first described in 1996 [15].
Zinc-finger nucleases (ZFNs) are chimeric proteins mainly composed of two distinct domains: a DNA-binding domain and a DNA-cleaving domain, harboring the actual nuclease activity [16]. The DNA-binding domain is composed of a tandem sequence of zinc-finger units, forming a zinc-finger array. Since each of those units recognizes a particular sequence of three base pairs, combining different zinc-finger units yields a zinc-finger array that is able to recognize a longer base pair sequence. For example, an array of 3 zinc-finger units is capable of recognizing a sequence of 9 base pairs, while an array of 6 zinc-finger units will recognize an 18-base pair sequence.
The DNA-cleaving domain linked to the zinc-finger array is a bacterial restriction enzyme – usually FokI, derived from Flavobacterium okeanokoites. Since FokI requires dimerization to elicit DNA cleavage, a pair of ZFNs is necessary for DNA DSB to occur; one ZFN will bind to one DNA strand, and the other will bind to the complementary strand. If the target sites of each ZFN are properly spaced, FokI dimerizes and cuts both DNA strands, generating a DSB. Since two ZFNs are required for DSB to occur and each one of them recognizes 9–18 base pairs, a ZFN pair can recognize 18–36 base pairs, providing the system a high degree of specificity.
8.4.3 Transcription Activator-Like Effector Nucleases
Transcription activator-like effector nucleases (TALENs), first described in 2010 [17], are a set of engineered nucleases that are very similar to ZFNs in terms of the rationale behind their design. TALENs are also made up of two domains – a DNA-binding domain and a DNA-cleavage domain – and they also function in pairs. The DNA-cleavage domain is once again the bacterial FokI enzyme, but they differ from ZFNs in their DNA recognition domain.
In the case of TALENs, recognition and binding of DNA is mediated by an array derived from bacterial transcription activator-like effectors (TALEs). TALEs were first described as an integral part of DNA-binding proteins used by plant pathogens of the genus Xanthomonas [18]. Each TALE array is composed of a collection of 34-amino acid-long modules arranged in tandem, and each of those units is capable of binding one particular DNA base. Binding specificity is determined by only two residues that vary between modules (repeat-variable diresidue – RVD) [19].
Importantly, TALE arrays are customizable and can be assembled with a particular repeat order that will define their ability to bind a designated base pair sequence [20, 21]. Since each unit recognizes one nucleotide, an array of 20 units, for example, will distinguish a particular 20-base pair sequence; considering a pair of TALENs is needed for DSB to occur, the system is overall able to recognize sequences of up to 40 base pairs.
8.4.4 CRISPR-Cas Systems
In 2013, a new programmable nuclease system was introduced, and the advantageous features it presents over the previously existing gene editing platforms have since gathered unprecedented interest from the scientific community – the CRISPR-Cas system. “CRISPR” is an acronym that was first used to describe particular genetic loci found in prokaryotes – clustered regularly interspaced short palindromic repeats [4, 22–24].
These loci were puzzling at first, but the extensive research that followed led to the recognition that CRISPR loci function as a prokaryotic adaptive immune mechanism, mounted against viruses and other external sources of nucleic acids. Briefly put, CRISPR loci function as data banks that allow prokaryotes to rapidly counteract the action of invading pathogens through the action of proteins – CRISPR-associated (Cas) proteins – that are guided by RNA molecules, complementary to DNA sequences of the pathogen. The invading DNA is destroyed by the action of Cas proteins with endonuclease activity.
Several classes of CRISPR systems have since been described, varying in the Cas proteins associated with the CRISPR loci, the RNA molecules that participate in the immune mechanisms, and the processes responsible for the maturation of those RNA molecules [25]. New CRISPR variants are continuously being identified, but there is one in particular that is being widely used as a gene editing platform – the one based on the Cas9 protein from a type II CRISPR system of Streptococcus pyogenes, sometimes designated as SpCas9 [26, 27]. Contrary to other CRISPR system types, type II systems require only one protein – Cas9 – to elicit cleavage of DNA. This, combined with the DNA recognition features of SpCas9 (described below) and the very history of its implementation, has led SpCas9 to be the CRISPR system that is currently the most well established as a gene editing platform.
SpCas9, henceforth designated simply as Cas9, is a ribonucleoprotein with endonuclease activity that is able to generate DNA DSBs through the concerted action of its two nuclease domains, each responsible for the cleavage of one DNA strand: RuvC and HNH. In the biological context of S. pyogenes, Cas9 binding to the target DNA sequence is mediated by an RNA molecule that is complementary to 20 bases of the target DNA – the crRNA.
Binding occurs through complementary base pairing. Maturation of the crRNA and binding of crRNA to Cas9 are mediated by another RNA molecule – the trans-activating crRNA (tracrRNA) [28, 29]. In an experimental, or biotechnological, context, Cas9 and the guide RNA (gRNA) molecules can be artificially introduced in a eukaryotic cell, leading to a DNA DSB in the region that is complementary to the crRNA. In order to further simplify the system, the two RNA molecules have been rationally fused to create a single guide RNA (sgRNA) sequence that is sufficient to guide the nuclease activity of Cas9 [27].
Cas9 cannot be freely targeted to every 20-nucleotide sequence of the genome, and one particular requirement must be met in order for it to recognize and bind a designated genetic locus. A specific protospacer adjacent motif (PAM) must be located immediately downstream of the 20-nucleotide sequence that is to be targeted. Among other things, the PAM dictates and defines the search for putative molecular targets and is responsible for initiating binding to the target DNA sequence [28, 29]. In the biological context, the PAM requirement also prevents self-recognition and cleavage of the bacterial DNA, since crRNA-coding sequences (spacers) lack the PAM [30]. The PAM of S. pyogenes Cas9 corresponds to an NGG motif, where N can be any nucleotide and GG are two sequential guanine nucleotides; however, it should be noted that different Cas nucleases have different PAM requirements. In a gene editing setting, the PAM circumscribes the range of possible targets the Cas9 protein can have, but since the motif is very short, this requirement does not constitute a great limitation. In other words, two sequential guanine nucleotides – or two sequential cytosines, in the complementary strand – are sufficiently common for Cas9 to be directed to almost anywhere in the genome.
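The targeting rule just described (a 20-nucleotide sequence followed immediately downstream by an NGG PAM, on either strand) can be expressed as a short search routine. The following is a minimal sketch, not a gRNA design tool: the function names and the toy sequence are assumptions for illustration, positions on the minus strand are reported relative to the reverse-complemented sequence, and no off-target or efficiency scoring is attempted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites_on_strand(seq: str, protospacer_len: int = 20):
    """List (position, protospacer, PAM) for every protospacer followed by NGG."""
    seq = seq.upper()
    sites = []
    # A site needs protospacer_len bases plus a 3-base PAM inside the sequence.
    for i in range(len(seq) - protospacer_len - 2):
        pam = seq[i + protospacer_len:i + protospacer_len + 3]
        if pam[0] in "ACGT" and pam[1:] == "GG":
            sites.append((i, seq[i:i + protospacer_len], pam))
    return sites


def find_cas9_sites(seq: str):
    """Search both strands; minus-strand positions refer to the reverse complement."""
    plus = [("+",) + site for site in find_sites_on_strand(seq)]
    minus = [("-",) + site for site in find_sites_on_strand(reverse_complement(seq))]
    return plus + minus


# Hypothetical 24-bp fragment: a 20-nt protospacer, a TGG PAM and one extra base.
fragment = "ACGTACGTACGTACGTACGTTGGA"
for strand, position, protospacer, pam in find_cas9_sites(fragment):
    print(strand, position, protospacer, pam)
```

A different Cas nuclease could be modeled by changing the PAM test, since, as noted above, PAM requirements differ between nucleases.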
8.4.5 Comparing the Four Classes of Programmable Nucleases Used in Gene Editing
As with any other biotechnological tools, each class of designer nucleases employed in gene editing presents advantages and disadvantages. Although the CRISPR-Cas system is regarded as having several unprecedented advantages over the other systems, the main shortcoming of this platform is its relative proneness to elicit off-target effects. In fact, Cas9 is known to tolerate mismatches in the 20-nucleotide sequence complementarity and is thus capable of producing DSBs at unintended sites [24]. Meganucleases, ZFNs and TALENs, while also capable of tolerating mismatches, are more restrictive [31].
However, the use of CRISPR-Cas presents diverse benefits over the other platforms, which were arguably responsible for its noteworthy increase in popularity since its inception [32]. Perhaps the principal advantage of CRISPR-Cas is its simple design, as far as the engineering necessary to reprogram it to target a particular desired locus is concerned. While meganucleases are near-impossible to reprogram and reengineering both ZFNs and TALENs entails expensive and time-consuming rounds of carefully planned molecular cloning [33], directing Cas9 to a new site requires only the definition of a new 20-nucleotide sequence upstream of a PAM, which should be in the vicinity of the target site. The gRNA containing that 20-nucleotide sequence can then be introduced in cells along with Cas9, which does not require reengineering. Additionally, editing using the CRISPR-Cas system requires only one protein, instead of a pair as is the case for ZFNs and TALENs. The system is highly efficient, meaning that Cas9 is likely to induce DSBs in a large percentage of the cells in a population [4].
CRISPR-Cas is also amenable to multiplexing, i.e., targeting several sites simultaneously [24]; using just one protein and several different gRNAs, it is possible to induce DSBs at various locations at once. Finally, and as will be described later in this chapter, CRISPR-Cas is a very versatile system and can be easily employed to perform operations that go beyond the introduction of DSBs in the DNA.
The process of selecting a particular nuclease-based gene editing platform must take into account the experimental objectives at hand, as well as the technical requirements, time constraints and costs that a particular approach may entail. The relevance of the diverse existing nuclease platforms notwithstanding, taking into account the relative simplicity of the CRISPR-Cas system and the growing interest it attracts, the following sections will focus on some practical considerations concerning gene editing using the CRISPR-Cas system.
8.5 Editing Genes Using CRISPR-Cas
As mentioned above, nuclease-based gene editing relies on the action of a molecular scissor – an endonuclease – that cuts a DNA molecule in the vicinity of the site that is to be edited, and on endogenous DNA repair mechanisms that will elicit changes in the nucleotide sequence.
In order to knock out a gene using CRISPR-Cas, Cas9 can be directed by an sgRNA (or a crRNA-tracrRNA pair) to an early exon of that target gene. The subsequent DSB and the ensuing NHEJ may produce a small indel mutation, which in turn can lead to a shift in the reading frame, generating a premature stop codon. Alternatively, Cas9 can be directed to regions both upstream and downstream of the initiation codon (ATG), removing it entirely and preventing translation from taking place. These excision approaches can also be used to remove particular genetic regions from the genome.
If a particular insertion or substitution is intended, gRNAs should direct Cas9 to the vicinity of the region to be altered.
If a homology template is provided, either in the form of a single-stranded DNA molecule (for short insertions or substitutions, of less than 200 nucleotides) or a donor plasmid (for longer insertions or substitutions), there is a possibility that the region undergoing the DSB will be repaired by HDR using the exogenous template as a model, leading to the introduction of the desired sequence [10].
Design of the gRNA sequences can be performed using several bioinformatic tools that allow searching a particular sequence for PAMs and the corresponding 20-nucleotide sequences. Many of these platforms also curate every possible gRNA sequence in a designated genome and have algorithms that allow predicting the probability of off-target effects [34]. Sequences should be selected using criteria that minimize putative off-target effects, by decreasing the overall number of putative off-target sites and/or minimizing the number of putative off-targets in coding regions.
The CRISPR-Cas tools can then be introduced in cells through a variety of methods (Fig. 8.3). Cas9 and the sgRNA (or the crRNA-tracrRNA pair) can be directly introduced in the cells in their “natural” form: as a protein and as RNA molecules, respectively. In vitro-synthesized or recombinant Cas9, along with in vitro-synthesized gRNAs, are assembled as a ribonucleoprotein complex, also in vitro, and then introduced in cells through transfection, electroporation or microinjection. Both agents can also be delivered in the form of RNA, through the same methods. Cas9-encoding mRNA will then be translated inside the cytoplasm, and the ribonucleoprotein complexes will assemble intracellularly. Finally, Cas9 and the gRNAs can be administered in the form of DNA plasmids, which will be transcribed in the cell. These plasmids may be amenable to viral production, allowing viral delivery of the CRISPR-Cas system.
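The off-target criterion mentioned in this section can be approximated, very crudely, by counting mismatches between a guide and every PAM-adjacent site in a sequence. The sketch below is an assumption-laden illustration: it uses an unweighted mismatch count, scans a single strand, requires an NGG PAM, and uses a hypothetical threshold of three mismatches; the prediction algorithms of actual design platforms are more elaborate and are not reproduced here.

```python
def count_mismatches(guide: str, site: str) -> int:
    """Number of positions at which the guide and a candidate site differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(a != b for a, b in zip(guide, site))


def candidate_sites(guide: str, sequence: str, max_mismatches: int = 3):
    """List (position, site, mismatches) for NGG-adjacent sites close to the guide.

    Only the given strand is scanned; a site with 0 mismatches is the intended
    target, any other hit is a putative off-target under this simple model.
    """
    n = len(guide)
    hits = []
    for i in range(len(sequence) - n - 2):
        site = sequence[i:i + n]
        pam = sequence[i + n:i + n + 3]
        if pam[1:] != "GG":
            continue
        mismatches = count_mismatches(guide, site)
        if mismatches <= max_mismatches:
            hits.append((i, site, mismatches))
    return hits


# Hypothetical guide and a toy sequence holding the target and one near-match.
guide = "ACGTACGTACGTACGTACGT"
toy_sequence = guide + "TGG" + "TTTT" + "ACGTACGTACGAACGTACGT" + "AGG"
for position, site, mismatches in candidate_sites(guide, toy_sequence):
    print(position, site, mismatches)
```

Ranking candidate guides by the number of such near-matches, and by whether those near-matches fall in coding regions, mirrors the selection criteria described above.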
8.6 Expanding the Possibilities of Gene Editing with CRISPR-Cas

Just 1 year after its introduction, CRISPR-Cas had already been successfully used to modify the genome of diverse organisms, from bacteria to yeast and from agriculturally interesting plants to human cells [5]. CRISPR-Cas-based gene editing is regarded as a time- and cost-efficient method for generating new animal models, in particular of nontraditional animal species, for which previous attempts at genetic manipulation have been unsuccessful [35]. Numerous publications in recent years underline the potential of CRISPR-Cas as a very versatile tool for manipulating genomes, in ways that go beyond those dependent on the introduction of DNA DSBs.

8.6.1 Gene Editing Beyond DNA Double-Strand Breaks

The sections above illustrate the potential of Cas9 endonuclease activity in easily generating the conditions for gene editing to occur. The applicability of the CRISPR-Cas system goes, however, well beyond what can be achieved through DNA DSBs, and, in fact, gene editing as a whole can be regarded as a means to alter not only the genome but also its context and its products [4]. Cas9-derived tools developed so far offer the promise of altering epigenetic markers in the DNA, the architecture of the chromatin and the levels of transcribed RNA molecules, among many other possibilities. Mutating a single amino acid in one or both nuclease activity sites of Cas9 renders the protein partially, or completely, inactive, respectively [5, 22]. When only one site is mutated, the resulting protein is termed a nickase (nCas9), since it is still able to make a nick in the DNA duplex, by cutting one of the DNA strands. The completely inactivated Cas9 is termed dead Cas9 (dCas9), and, although no longer able to produce DNA breaks, it retains its ability to specifically bind to the DNA, through complementary base pairing between the gRNAs and their targets.
dCas9 can be fused to other proteins or effector domains, and several such fusion variants have been developed so far. What they all have in common is that, by taking advantage of the homing capacity of dCas9, they are able to direct the activity of the particular effectors they harbor to specific genetic loci [32] (Fig. 8.4). Notably, zinc-fingers and TALEs can also be used as homing devices for proteins other than the FokI endonuclease, but the simplicity of CRISPR-Cas has allowed a quick and diverse expansion of its potential in this regard.

Fig. 8.3 Possible delivery strategies for the CRISPR-Cas9 system. The different components of the CRISPR-Cas9 system can be provided in multiple forms. Cas9 can be delivered as a Cas9 cDNA-containing plasmid, as an mRNA molecule or as a recombinant protein, already complexed with the guide RNA molecule(s). Apart from this route, the guide RNA can also be delivered in an unconjugated form to cells, as well as in the form of DNA plasmids.

dCas9 fused to a transcriptional activator such as VP64 can be directed to a particular gene, recruiting transcriptional machinery that will lead to an increase of the expression levels of that gene. Conversely, dCas9 conjugated with a transcriptional repressor such as the Krüppel-associated box (KRAB) domain can be used to decrease the expression of a target gene, without introducing a DNA DSB and thus avoiding the possibility of an undesirable indel mutation. Transcription regulation can also be achieved using dCas9 fused to epigenetic modifiers that alter the acetylation levels of histones, or the methylation levels of the DNA, or pairs of dCas9 fused with proteins that are able to interact and thereby “bend” the chromatin, generating loops in its topology [32].
Fusing dCas9 to fluorescent proteins such as the green fluorescent protein (GFP) can be used to pinpoint the localization of a particular genetic sequence in a chromosome. What is more, mutation of the target genome has also been shown to be achievable without resorting to DNA DSBs. Fusing deaminase proteins such as APOBEC1 along with excision repair inhibitor UGI to dCas9 or nCas9 has been shown to produce direct conversion between nucleotides. This type of strategy, termed base editing, has been successfully employed in converting cytosines to thymines and adenines to guanines [36, 37].

Fig. 8.4 Versatility of the CRISPR-Cas9 system. The figure highlights possible applications of the CRISPR-Cas9 system that go beyond the introduction of DNA double-strand breaks. These applications are based on the use of partially (nCas9) or completely (dCas9) catalytically inactivated Cas9 and include transcription activation or repression, epigenetic modulation and base editing, among others.

In addition to this search to expand what Cas9 can do, scientists have also looked for ways to reliably control when and where its action takes place. In fact, several variants of inducible Cas9 have also been developed. Among them, some respond to chemical compounds and others to ligand-binding, and still others are activated by light [32]. In recent years, reengineering of Cas9 and even of the gRNA sequences has yielded a vast array of tools that is continuously being expanded and that is thus optimizing the applications of the CRISPR-Cas9 system.

8.6.2 Cas Variants

The Cas9 protein from S. pyogenes is not the only Cas that has drawn the attention of scientists looking at improving the existing gene editing platforms. Many more Cas proteins have been identified and more or less extensively characterized, leading to the creation of an ever-expanding Cas protein library. Its other members, although not as widely used as SpCas9, certainly contribute to the ever-growing possibilities offered by the CRISPR-Cas system. Different Cas proteins, including Cas9 proteins from species other than S. pyogenes, have been described to differ in their propensity for off-target effects and are known to display distinctive PAM requirements [32]. Some of these PAMs are longer than three nucleotides, and although this restricts the range of targets the respective Cas proteins can have, the increased length can be beneficial in further decreasing the number of off-targets. Different PAMs may also allow targeting regions that the NGG PAM requirement of SpCas9 does not allow. Different Cas proteins also display different molecular sizes, and smaller Cas variants may be advantageous regarding their delivery to cells, for example. SpCas9 is too large for its coding DNA to be included in AAV particles along with the gRNA; as such, currently each agent has to be provided in a different AAV vector. Smaller Cas variants may be more amenable to AAV-mediated delivery. Finally, some Cas proteins have been described to target nucleic acids differently from SpCas9 [38]. Cas12a, for example, has been shown to produce staggered DNA breaks, leading to the generation of sticky ends at the cut site. Cas13a has been shown to target RNA. New Cas variants are continuously coming to light, in a constant search for a particular Cas protein, or set of Cas proteins, that may display advantages over all others. A smaller molecular size, a low incidence of off-target effects, and a high versatility are important factors in this ongoing search for the “ultimate” Cas protein.

8.7 Limitations and Challenges to Current Gene Editing Strategies

As promising as it may be, implementation of gene editing in an experimental setting, let alone in the development of a gene therapy approach, is subject to several limitations and challenges that must be accounted for. Some of them can be avoided or controlled for, but others will require continuous investigation and development before they can be tackled with unquestionable success. Overall, challenges to the application of gene editing include (a) the ability to properly deliver the molecular tools, (b) problems with specificity, (c) guarantees of fidelity, and (d) control over DSB repair and HDR [4–6]. As explained in Chap. 4, delivery of gene therapy agents faces several physiological barriers that limit their efficacy. What is more, it is important to ensure that the molecular tools reach the proper organ or tissue where their therapeutic effect will take place. All the while, the delivery mechanisms should pose no threat to the health and safety of the organism. Selection and design of methods for the delivery of gene editing tools should abide by the same principles as other gene therapy approaches, possibly preferring ex vivo strategies over local or systemic administration as a means to improve safety and minimize delivery to unintended tissues or cells [31]. As mentioned above, none of the gene editing nuclease platforms is devoid of possible off-target effects, because every system tolerates mismatches to a greater or lesser degree. Ideally, the nuclease systems would have no off-targets, but since it is currently impossible to ensure this, it is crucial that gene editing strategies minimize the occurrence of unintended modifications.
This can be done by selecting variants with a proven lower degree of off-target activity or, in the case of CRISPR-Cas, selecting gRNA sequences with low levels of predicted off-targets. After editing, it is important to be able to search the targeted genome for putative off-target mutations. This can be done by whole-genome sequencing and through other, recently developed, targeted approaches, but such techniques may not be available to every lab. Fidelity, in the context of gene editing, describes the degree to which the intended alteration was inserted in the genome and chiefly concerns mutations introduced by HDR [6]. As a result of a DSB and in the presence of an HDR template, the intended insert may be introduced at a site other than the one expected, more than one copy of the insert may be introduced, and translocations may occur, among many other unintended changes with potentially disastrous effects. Techniques that allow confirmation that only the intended changes took place are necessary. Finally, the infrequent occurrence of HDR is a limiting factor for strategies that rely on this pathway of DSB repair for the intended genome alteration to occur. HDR is limited to dividing cells, and its rate is generally low, compared to NHEJ [9]. Several lines of research have invested in increasing the rate of HDR, but, as with any other manipulation that may interfere with DNA repair mechanisms, the possibility of unexpected outcomes may outweigh the benefits of the intervention.
Nonetheless, in the case of the CRISPR-Cas system, diverse approaches have been described to increase the success of HDR-mediated editing, including employing nickases instead of wild-type Cas9, since a DNA nick is less prone to induce NHEJ-derived indels while still promoting HDR; enriching cell cultures with cells in the G2/M phase of the cell cycle; inducing “cold shocks”; overexpressing the Rad51 protein; employing small molecules and other chemical compounds (RS-1, Brefeldin A, L755507, Nocodazole); and fusing Cas9 with proteins involved in HDR, among many others [39–43]. Studies aiming to define the best parameters for donor repair template design are also ongoing.

8.8 Gene Editing as a Tool for Human Disease Therapy

The possibilities offered by existing gene editing strategies can be translated into diverse approaches to tackle human health conditions and disease. Importantly, gene editing may be a significant tool not only for direct therapeutic intervention but also in other, no less relevant, steps of the therapy development pipeline [44]. Nuclease-based gene editing can be employed in basic research, assisting in the investigation of gene functions. The CRISPR-Cas system, in particular, can be used to perform high-throughput screening of disease modifiers, which may yield important clues into disease pathogenesis and possibly constitute relevant therapeutic targets [45]. Gene editing is also a potent method for generating disease models and can be used to develop isogenic cell lines, i.e., cell cultures derived from individuals, in which disease-causing genes can be introduced or removed, thus producing cultures with the same genetic background, in which the only variable is the designated genetic factor [46]. Additionally, gene editing allows for the rapid and inexpensive generation of animal models, compared with traditional methods used for generating transgenic, knockout and knock-in animals [35].
CRISPR-Cas9 is particularly well-suited for modeling complex diseases, by allowing the alteration of several genes simultaneously through its multiplexing capability [47]. Concerning the use of gene editing as a therapeutic approach, the interest this field is drawing has led research teams to develop and test diverse gene therapy strategies that are based on the growing capabilities of the described systems to operate changes in the DNA. Current literature offers diverse examples, overall focusing on one of several routes: (a) correcting pathogenic mutations; (b) inactivating disease-causing genes; (c) re-establishing gene functions; (d) eliminating disease-causing elements; (e) introducing protective mutations; and (f) generating cells with therapeutic activities. Examples of the potential of these approaches abound in reports using both cell cultures and animal models, and some strategies have already transitioned to clinical testing. Two examples of gene editing-based approaches to therapy follow: one focusing on a genetic disorder and another on an infectious disease.

As described in Chap. 7, Duchenne muscular dystrophy (DMD) is a hereditary disorder that arises as a consequence of mutations in the DMD gene, which in healthy individuals encodes dystrophin, a protein that plays a crucial role in muscular structure and physiology (Fig. 7.4). Amoasii and collaborators systemically administered AAVs encoding CRISPR-Cas tools to dogs harboring a deletion of exon 50 of the DMD gene, which is a hot spot for mutations linked to DMD in humans. CRISPR-Cas activity was directed at an early region of exon 51. Overall, treated animals displayed an improvement in muscular histology and dystrophin levels. Indels were detected at the targeted genomic site, and the amelioration observed was related to reestablishment of the reading frame of the gene or exon 51 skipping [48].
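The reading-frame logic behind the DMD example can be made concrete with a short sketch. It assumes exon lengths of 109 bp for exon 50 and 233 bp for exon 51; these are commonly cited values for the human gene, are not given in this chapter, and are treated here purely as illustrative constants. The function name `restores_frame` is hypothetical. The idea is that a transcript stays in frame only when the net number of bases lost or gained is a multiple of three.

```python
# Assumed exon lengths in base pairs (illustrative; not taken from this chapter).
EXON_50_LEN = 109
EXON_51_LEN = 233


def restores_frame(prior_deletion, net_change):
    """Return True if the reading frame is intact after editing.

    prior_deletion: bases already missing because of the disease mutation.
    net_change: bases added (+) or removed (-) by the edit, e.g. an
                NHEJ indel or the skipping of a whole exon.
    """
    return (net_change - prior_deletion) % 3 == 0


if __name__ == "__main__":
    scenarios = {
        "exon 50 deletion, no edit": 0,
        "1-bp insertion in exon 51": +1,
        "2-bp deletion in exon 51": -2,
        "1-bp deletion in exon 51": -1,
        "exon 51 skipped": -EXON_51_LEN,
    }
    for label, change in scenarios.items():
        print(label, "->", "in frame" if restores_frame(EXON_50_LEN, change)
              else "out of frame")
```

Under these assumptions, only some indel sizes reframe the transcript, which is consistent with the observation that both reframing indels and exon 51 skipping contributed to the recovery of dystrophin.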
Human immunodeficiency virus (HIV) infection of T-cells relies on interaction of the virus with cell receptors that mediate its internalization (Fig. 3.4). As explained in Chap. 3, one of these receptors is CCR5, and it has been known for some time that individuals bearing mutated forms of this receptor are refractory to HIV infection. It has thus been hypothesized that knocking out CCR5 gene function through the action of programmable nucleases may be beneficial in preventing HIV entry into lymphocytes [49, 50]. In 2014, in the very first clinical trial using gene editing, HIV patient cells were edited ex vivo with ZFNs and then autologously reinfused, with promising results regarding circulating viral loads [51–53]. Other clinical trials using similar rationales were also underway at the time, and others followed, with a clinical trial for HIV using CRISPR-Cas-edited stem cells being currently underway [54].

This Chapter in a Nutshell
• Current gene editing strategies rely on two conditions, (i) the ability to define the specific region of the genome to be altered and (ii) the ability to create conditions for the desired alterations to occur.
• The definition of the target site is accomplished by molecules that specifically bind to a nucleotide sequence and then cleave the DNA producing a double-strand break (DSB). These breaks are then repaired through different endogenous mechanisms of DNA repair.
• There are two main mechanisms of DNA DSB repair, nonhomologous end-joining (NHEJ) and homology-directed repair (HDR), with the former being simpler and the latter involving a DNA molecule as template for the repair.
• Four main systems of programmable nucleases have been used in gene editing: meganucleases, zinc-finger nucleases, TALENs, and the CRISPR-Cas system.
• The CRISPR-Cas system presents advantageous features over the other programmable nuclease systems.
Several variations of the system have been developed, for example in order to alter epigenetic markers, the architecture of the chromatin and the levels of transcribed RNA molecules.
• Despite being an important promise for gene therapy, gene editing faces important limitations and challenges. Some are also found in other gene therapy strategies, but others are specific to this approach.

Review Questions
1. In the context of gene therapy, gene editing systems can be used to:
(a) Disrupt a mutated gene
(b) Substitute a malfunctioning gene
(c) Regulate the expression of a mutated gene
(d) None of the above
(e) All of the above