Chapter 2 - Notes PDF
Universiteit Gent
Summary
This document is a chapter on directed evolution. It discusses the process of adapting proteins for human purposes by iteratively evolving them and finding desirable outcomes.
2 - Directed Evolution
Bringing New Chemistry to Life
This section was adapted from the Nobel lecture of Frances Arnold: Innovation by Evolution (2019)
Nature, herself a brilliant chemist and by far the best engineer of all time, invented life that has flourished for billions of years under an astonishing range of conditions. I am among the many inspired by the beauty and remarkable capabilities of living systems, the breathtaking range of chemical transformations they have invented, the complexity and myriad roles of the products. Where does this chemistry come from? It derives from enzymes, the protein catalysts that make life possible, molecular machines that perform chemistry no human has matched.
Evolution, a Grand Diversity-Generating Machine
Equally awe-inspiring is the process by which Nature created these enzyme catalysts. The process is evolution, the diversity-generating machine that created all life on earth, starting more than three billion years ago. Responsible for adaptation, optimization, and innovation in the living world, evolution executes a simple algorithm of diversification and natural selection that works at all levels of complexity from single protein molecules to whole ecosystems. No comparably powerful design process exists in the world of human engineering.
I wanted to engineer Nature's enzymes to make ones tailored to, and uniquely suited for, human purposes. For close to five thousand years we have made use of microbial enzymes to brew beer and leaven bread. Once the protein catalysts were identified and isolated, many more diverse applications were devised. Today, enzymes are used to diagnose and treat disease, reduce farm waste, enhance textiles and other materials, synthesize industrial and pharmaceutical chemicals, and empower our laundry detergents. But so much more could be achieved if we understood how to build new ones. Early protein engineers struggled mightily with this goal.
In those days, we did not know enough about how a DNA sequence encodes enzyme function to design enzymes for human applications.
Exploring the Universe of Possible Proteins
Some researchers think of the protein universe as the set of all proteins that Nature has devised. But these proteins, relevant to biology, are an infinitesimal fraction of the possible proteins. The universe of possible proteins, my universe, contains solutions to many of humanity's greatest needs. I wanted to explore this universe to find those proteins that will serve humanity. There are thousands of ways to make one change in the amino acid sequence of a protein. There are millions of ways to modify it by two changes, and so on: the numbers grow so rapidly that making a single copy of each protein altered by only 1 % of its sequence would require the weight of the world in materials. And the vast majority of these modified sequences are neither usable nor useful. The challenge therefore is to discover protein sequences that provide new benefits and deliver novel improvements on a thrifty scale of weeks, rather than millennia or eons.
Consider an ordered space in which any protein sequence is surrounded by neighbors that have a single mutation. For evolution to work, there must exist functional proteins adjacent to one another in this space. Although most sequences do not encode functional proteins, evolution will work even if just a few meaningful proteins lie nearby. Given low levels of random mutation, the filter of natural selection can find those sequences that retain function. In fact, many of today's proteins are the products of a few billion years of mostly such gradual change. Many of these mutations are neutral and change little, but others can be deleterious. Natural selection picks the wheat from the chaff and guides mutating proteins along continuously functional paths through the vast space of sequences mostly devoid of function.
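As a rough illustration of how quickly this space grows, the number of distinct variants carrying exactly k amino acid substitutions can be counted as C(L, k) · 19^k for a protein of length L. The sketch below assumes a hypothetical 300-residue protein and the 20 standard amino acids; the length and the function name are illustrative choices, not values taken from the lecture.

```python
from math import comb


def count_variants(length: int, n_substitutions: int, alphabet_size: int = 20) -> int:
    """Number of sequences that differ from a parent at exactly n positions."""
    # Choose which positions change, then pick one of the (alphabet_size - 1)
    # alternative residues at each chosen position.
    return comb(length, n_substitutions) * (alphabet_size - 1) ** n_substitutions


# Hypothetical protein length, chosen only for illustration.
PROTEIN_LENGTH = 300

for k in (1, 2, 3):
    print(f"{k} substitution(s): {count_variants(PROTEIN_LENGTH, k):,} variants")
```

For this assumed length, single substitutions number in the thousands and double substitutions in the millions, in line with the orders of magnitude quoted above.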
But by using evolution I want to make better proteins, proteins that serve my purposes. Thus directed protein evolution becomes a search on a new fitness landscape, where fitness is performance and is defined by the artificial selection I impose. I reasoned that directed evolution could find and follow continuous paths leading to higher fitness.
A Process for Evolving Proteins in the Laboratory
To devise a directed evolution strategy suitable for enzymes, I started with the fundamental rule: “You get what you screen for.” We were generating enzymes of interest in recombinant microorganisms by inserting genetic material that we could mutate in the test tube. We used common microbes like Escherichia coli or yeast to produce “libraries” of mutant enzymes to test for desired functions and we turned to good old-fashioned analytical chemistry to develop reproducible, reliable screens that reported what mattered to us. To measure what mattered, we were limited to monitoring the few thousand protein variants we could express and array in readily available 96-well plates or on a Petri dish. Therefore, we could only search deeply those sequences one or two mutations away from the starting protein. Given that such a small change in sequence would be expected to generate only small improvements in function, we would have to deploy reproducible screening assays capable of finding those rare and only slightly improved protein mutants. A desirable mutation might yield only a two-fold increase in catalytic activity or a few degrees’ step up in melting temperature. To achieve significant changes, we would have to multiply those benefits over successive generations.
This strategy works well when re-optimizing enzymes for new tasks. While a natural enzyme generally performs well in its biological job, it is often less enthusiastic about doing a new job and initially works poorly.
New demands change the fitness landscape, often knocking a protein down from a position that was painstakingly acquired through the work of natural evolution. Sequential rounds of random mutation and screening for improved performance, however, can accumulate the beneficial mutations needed to climb to a new peak.
To illustrate, in the late 1980s my research group started to re-engineer a protease, subtilisin E, to perform its hydrolytic reaction under non-natural conditions. We chose to have the enzyme function in high concentrations of the organic solvent DMF, which causes wild-type subtilisin E to lose most of its activity. We performed iterations of random mutagenesis and screening for activity in increasing concentrations of the organic solvent and evolved an enzyme that performed as well in 60 % DMF as its wild-type parent did in the absence of DMF, which corresponds to a 256-fold increase in activity. Strikingly, this enzyme adapted rapidly to a challenge it presumably had not encountered during its evolution. Furthermore, the mutations that led to the improved performance were unexpected. We could not explain how mutations located on loops surrounding the enzyme's active site enhanced activity in high concentrations of organic solvent, much less plan them in a rational approach to engineering an enzyme with this new capability. But we had a process that gave the right result, even if that result would require much more reverse engineering to understand fully.
Practical Protocol For Directed Evolution
For further reading: Zeymer et al. 2018 - Directed Evolution of Protein Catalysts
Following the principles of natural evolution, iterative cycles of mutagenesis and screening/selection are applied to optimize a protein’s properties. First, its encoding gene is submitted to mutagenesis to generate DNA libraries. The biosynthetic machinery of a cell is then exploited to produce the corresponding enzyme variants.
Finally, catalysts with improved properties are identified by screening or selection and the respective genes are amplified. An effective assay requires tight linkage of genotype and phenotype, so that promising variants can be subjected to further cycles of optimization.
Creating Genetic Diversity
In the laboratory, random variation in DNA can be generated by PCR methods. Perhaps the most popular technique is error-prone PCR (epPCR). It introduces copying errors by imposing imperfect or 'sloppy', and thus mutagenic, reaction conditions that reduce the fidelity of the polymerase. The initial protocol made use of (1) higher concentrations of polymerase/dNTP and longer extension times, (2) addition of MnCl2 to reduce base pair specificity and (3) higher concentrations of MgCl2 to stabilize non-complementary base pairs. As a result, the mutagenesis rate of Taq polymerase increased to about 2% per nucleotide position. This rate is rather high (especially for longer sequences) and will mainly yield inactive enzymes (considering the fact that most mutations are deleterious). Furthermore, the resulting libraries contain a disproportionately large number of A→G and T→C transitions, causing a prominent mutational bias (i.e. some codons occur more frequently than others). Nowadays, commercial epPCR formulations are available that solve most of these issues. A popular example is the GeneMorph II Random Mutagenesis kit, which contains a mixture of two polymerases that balance each other’s bias. The mutation rate can also be tightly controlled (ideally to about 3 mutations per protein) by simply changing the amount of template DNA or the number of PCR cycles. Once the target region has been subjected to epPCR, the mutated DNA fragment can be placed back into an expression vector (e.g. pET) by conventional restriction and ligation, or by using the fragment as a 'megaprimer' in a whole-plasmid PCR.
The (non-mutated) template DNA can then be removed by digestion with the restriction enzyme DpnI, which recognizes the methyl groups that were added by the host E. coli during the initial cloning of the wild-type gene.
A major downside of all amplification-based procedures is that the odds of substituting adjacent nucleotides are extremely slim. As a result, not all codons are accessible, because most amino acid changes have to arise from a single nucleotide exchange. On average, the nine possible codons that can result from a single point mutation encode only 5-6 different amino acids. Moreover, some amino acids are encoded by fewer codons than others. Error-prone mutagenesis thus samples mutations at any position in the protein sequence, but only does so sparsely.
The vast majority of random mutagenesis approaches for directed evolution only seek to introduce amino acid substitutions, while insertions or deletions (InDels) remain an overlooked source of variation, despite their frequent occurrence in natural protein evolution. The random insertion or deletion of nucleotides in genomes is commonly caused by a phenomenon called replication slippage or slipped strand mispairing. DNA polymerase occasionally pauses and dissociates from the DNA during replication, making it possible for the end of the growing strand to separate from the template and then reanneal to a homologous region located downstream or upstream. When the polymerase eventually resumes replication, it will have skipped ahead or backtracked from where it first halted, resulting in a deletion or insertion, respectively. Regions with genetic stutters (e.g. short tandem repeats or homopolymeric runs) are particularly susceptible to replication slippage.
Although a few error-prone polymerases exist that frequently make slippage-related errors, those typically insert or remove just one or two bases. The reading frame is then disrupted, resulting in a library that mostly consists of non-functional variants.
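The limited codon accessibility mentioned earlier in this section can be checked directly against the standard genetic code. The sketch below enumerates, for every sense codon, the amino acids reachable by exactly one nucleotide change. It assumes that synonymous changes and stop codons are not counted; the function and variable names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def accessible_amino_acids(codon: str) -> set:
    """Amino acids reachable from `codon` by exactly one nucleotide change.

    Synonymous changes and stop codons are excluded (an assumption of this sketch).
    """
    original = CODON_TABLE[codon]
    reachable = set()
    for position in range(3):
        for base in BASES:
            if base == codon[position]:
                continue
            neighbour = codon[:position] + base + codon[position + 1:]
            amino_acid = CODON_TABLE[neighbour]
            if amino_acid not in ("*", original):
                reachable.add(amino_acid)
    return reachable


sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]
mean_accessible = sum(len(accessible_amino_acids(c)) for c in sense_codons) / len(sense_codons)
print(f"Mean number of new amino acids per sense codon: {mean_accessible:.2f}")
print("ATG (Met) ->", sorted(accessible_amino_acids("ATG")))
```

The printed mean can be compared with the 5-6 figure quoted above; the exact value depends on whether synonymous changes and stop codons are counted.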
A modern strategy that randomly introduces short InDels of exactly one, two or three nucleotide triplets is TRIAD, i.e. Transposition-based Random Insertion And Deletion mutagenesis. First, an engineered transposon fragment is randomly inserted within the target DNA sequence, which ultimately determines the location of the forthcoming InDel event in each variant. The transposon for deletion contains recognition sites for a restriction enzyme that will remove the transposon together with 3 bp of the original target sequence. Self-ligation results in the reassembly of the target sequence minus 3 bp, yielding a library of variants with deletions of one triplet. The transposon for insertion contains recognition sites for two restriction enzymes, which will remove the transposon and create an insertion site for a shuttle cassette. This shuttle cassette carries one, two or three randomized nucleotide triplets. A further restriction digest removes the shuttle sequence, but leaves the triplet insertions behind.
A completely different way of generating random variation is by mimicking sexual evolution, where new genes are created through recombination of parental genes. In the lab, those parental genes can be homologous natural sequences or can originate from an epPCR library. In the conventional protocol for sexual evolution, called DNA shuffling, the parental genes are partially digested with DNase, followed by recombination of the obtained fragments by PCR. However, the staggered extension process (StEP) has become more popular, as generating and purifying fragments is no longer needed. Instead, the recombined genes are synthesized in the presence of the parental genes by sequential annealing of the nascent polynucleotide to different templates with abbreviated extension times, allowing only a small portion of the gene to be filled in.
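To put the error-prone PCR mutation rates quoted earlier in this section into perspective, the mutational load per gene copy can be estimated with a simple binomial model. This is only a sketch: it assumes that every nucleotide mutates independently at the same rate, uses a hypothetical 900-nt gene, and treats the target of about 3 mutations as nucleotide changes per gene. None of these choices comes from the source.

```python
from math import comb


def expected_mutations(rate_per_nt: float, gene_length_nt: int) -> float:
    """Mean number of nucleotide mutations per gene copy."""
    return rate_per_nt * gene_length_nt


def prob_mutation_count(k: int, rate_per_nt: float, gene_length_nt: int) -> float:
    """Binomial probability of exactly k nucleotide mutations in one gene copy."""
    return (comb(gene_length_nt, k)
            * rate_per_nt ** k
            * (1.0 - rate_per_nt) ** (gene_length_nt - k))


GENE_LENGTH_NT = 900             # hypothetical gene length
HIGH_RATE = 0.02                 # about 2% per position, as in the initial protocol
TUNED_RATE = 3 / GENE_LENGTH_NT  # rate giving about 3 mutations per gene on average

for label, rate in (("initial protocol", HIGH_RATE), ("tuned protocol", TUNED_RATE)):
    mean = expected_mutations(rate, GENE_LENGTH_NT)
    p_unmutated = prob_mutation_count(0, rate, GENE_LENGTH_NT)
    print(f"{label}: mean {mean:.1f} mutations per gene, "
          f"fraction unmutated {p_unmutated:.2e}")
```

Under these assumptions the high-rate protocol loads each gene with many simultaneous mutations, which is consistent with the statement that such libraries consist mainly of inactive enzymes.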
Selection vs Screening
In the next step, the obtained mixture of plasmids is transformed into a suitable expression host to generate the corresponding proteins. Absolutely crucial is the fact that each cell will only take up one plasmid and will, thus, only produce a single variant. This enables the processing of individual proteins (rather than the mixture), either by plating the culture on a solid medium (resulting in colonies that are genetically uniform) or by sorting the different cell types in a liquid medium (e.g. by FACS or microfluidics screening). Afterwards, the plasmids are isolated from the positive ‘hits’ and sequenced to identify the beneficial mutations.
Selection relies on a direct link between cell growth and the improved enzyme activity, which is possible if the target substrate can be used as (sole) carbon source. The transformed cells that are “most fit” will grow faster and outcompete the cells containing less successful enzyme variants. Selection thus allows enormous libraries of up to ~10^9 variants to be processed at once. Selection on a solid medium is preferred when only a limited number of variants will allow growth: the few colonies can then simply be picked from the plate. Liquid media are more advantageous when many enzymes will catalyze the desired reaction to some extent: the best enzyme variants will then enable their host cells to reach the exponential growth phase more quickly, outnumbering the others. Upon the first signs of growth, those cells can be isolated by streaking on a fresh Petri dish, or enriched by transferring an inoculum to fresh medium.
Agar plate screening is a simple screening format that involves the incubation of colonies with the enzyme substrate. The crucial factor with this screening technique is that substrate conversion should create a visual signal, such as fluorescence or color, to identify colonies expressing an enzyme with desirable properties.
For example, agar plate screening was used in an influential early study for the conversion of a β-galactosidase into a β-fucosidase. E. coli transformant colonies (~10^3 per plate) were adsorbed to filter paper and then incubated with the chromogenic substrate analogue 5-bromo-4-chloro-3-indolyl β-fucoside. Colonies that possessed β-fucosidase activity released the chromophore, which forms an insoluble dimer that only colours the positive colonies blue.
The most common in vitro method for identifying desirable variants is microplate-based screening. Transformed cells are plated out on Petri dishes and colonies are individually picked for further growth in wells of microtiter plates. With an automated colony picker, thousands of wells can be inoculated per hour. After growth, the minicultures are replicated by inoculating new microtiter plates containing fresh medium, and the original plates are stored as back-ups. The new plates are used to produce the enzyme variants (expression followed by cell lysis) and perform the reaction. The activity assay should be fast and amenable to high-throughput screening, preferably using a liquid-handling (pipetting) robot and a microplate reader. Although chromogenic substrate analogues are a popular option, they are not ideal in light of the first law of directed evolution: "you get what you screen for". Instead, the actual reaction product should be converted to a different molecule that can be measured easily.
Cells can also be screened directly as a mixture in solution by means of fluorescence-activated cell sorting (FACS), where a flow cytometer is used to isolate the positive ‘hits’. The challenge is, however, to generate a fluorescent signal that remains tethered to or contained within the cell. The most straightforward option is to use the cytoplasm as a reaction vessel, at least if the substrate can enter the cell and the product cannot escape.
With such an approach, it was possible to screen a library of mutant fucosyltransferases for activity on lactosamine. Indeed, the fluorescently labelled disaccharide could readily pass the cell membrane but that was no longer the case for the produced trisaccharide because of its larger size. An alternative strategy is to use surface display techniques where a (secreted) enzyme remains attached to the cell through fusion with a membrane anchor. Although this offers more flexibility with respect to reaction conditions, generating a fluorescent reaction signal that sticks to the cell is not straightforward. However, the technique is commonly used for the screening of antibodies and was included in the 2018 Nobel Prize in Chemistry (George Smith for phage display, next to Frances Arnold for directed evolution).
While in vivo compartments are attractive engineering units, the requirement for a cell-tethered assay signal presents a significant limitation for the types of enzymes that can be engineered. To overcome these limitations, engineering platforms have been developed that allow freedom of product and substrate diffusion by using a man-made compartment to keep the assay for enzyme variant activity (phenotype) together with the corresponding DNA or cell (genotype). By in vitro compartmentalization (IVC), the library is partitioned into millions of micron-scale water-in-oil droplets, with each droplet acting as an independent microreactor containing a single library member. Droplets are generated in microfluidic devices at thousands of droplets per second, with a size that is only slightly larger than that of a cell (~ μm, fL). The major advantage of IVC is that droplets are compatible with a wide range of enzyme expression and assay format modalities. In fact, even cell-free systems may be used, whereby a single gene is artificially translated within the droplet by just adding the required cellular components.
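Whether each droplet really holds a single library member depends on how dilute the library is when it is partitioned. A common way to reason about this, offered here as an assumption that the text does not state, is to model the number of cells or genes per droplet as Poisson-distributed around a mean occupancy. The function names and the example occupancies below are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k: int, mean_occupancy: float) -> float:
    """Probability that a droplet contains exactly k library members."""
    return mean_occupancy ** k * exp(-mean_occupancy) / factorial(k)


def single_occupancy_fraction(mean_occupancy: float) -> float:
    """Among non-empty droplets, the fraction holding exactly one library member."""
    p_empty = poisson_pmf(0, mean_occupancy)
    p_single = poisson_pmf(1, mean_occupancy)
    return p_single / (1.0 - p_empty)


# Hypothetical mean occupancies (library members per droplet).
for mean_occupancy in (0.1, 0.3, 1.0):
    print(f"mean occupancy {mean_occupancy}: "
          f"{poisson_pmf(0, mean_occupancy):.1%} empty, "
          f"{single_occupancy_fraction(mean_occupancy):.1%} of occupied droplets are single")
```

In this model, a dilute library leaves most droplets empty but keeps the occupied ones almost exclusively single, which is the trade-off behind maintaining a clean genotype-phenotype linkage.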
Deconvolution
Techniques that introduce random variation in DNA sequences, such as error-prone PCR and shuffling, typically introduce more than one mutation at a time. As a result, the positive 'hits' that are obtained after a round of directed evolution may contain not only beneficial mutations, but also mutations that are superfluous, or even detrimental. It is recommended to perform a deconvolution experiment where all of the mutated positions in the hit are reverted to the wild-type amino acid one by one. If reversion of a mutation has a neutral or positive effect on the properties of the protein, that mutation can be permanently discarded. Alternatively, the mutations can be introduced into the wild-type sequence separately to evaluate their individual effect. The need for deconvolution underlines another downside of random directed evolution: the beneficial effect of a mutation can easily be concealed by the negative effect of other mutations that were introduced simultaneously.
The Quest for the Optimal Starting Point
Further reading: Trudeau 2019 - Protein engineers turned evolutionists
Several factors appear to determine the success of enzyme engineering projects. As it turns out, these factors are similar to those that dictate the success of living organisms in tackling new environmental challenges via the evolution of new enzymatic capabilities. Foremost, the availability of a suitable starting point (a pre-existing, wild-type enzyme in the natural context) is critical. Some enzymes appear highly evolvable (amenable to sequence changes) and innovable (amenable to the acquisition of new functions), and in contrast others appear functionally ‘frozen’. To evolve a new function, an enzyme must be able to accept mutations while retaining its overall structure and catalytic function. Robustness to mutations often correlates with configurational stability. Most mutations are destabilizing, and mutations that mediate new functions are no exception.
A more stable protein scaffold can accept more mutations before its configurational stability drops to a level at which solubility and functionality are compromised (the stability threshold). A key realization in this respect is that stability and activity do not necessarily trade off, and it is possible to obtain enzymes that are both highly stable and highly active. Traditionally, thermostable enzymes originating from thermophilic hosts were chosen as starting points for directed evolution.
A second key feature of engineerable enzymes is their ability to readily acquire new functions. In nature, this process typically occurs through evolutionary intermediates that are generalists. Enzymes evolving towards new activity typically retain their original activity and also exhibit wider promiscuity towards targets they are not selected for. Such generalist enzymes comprise a great starting point for directed evolution, although this property is far more difficult to predict than stability. It can be useful to compare several potential starting enzymes in terms of their ability to use a range of alternative substrates, and then select the enzyme that is least restricted to its original activity. Note that, although promiscuity and/or broad substrate acceptance provides a starting point, it does not guarantee that a new enzyme with high catalytic efficiency can be evolved from this starting point. Ideally, an enzyme should be used that already exhibits the desired function, even when it does so very poorly.
A successful but rather laborious technique for amplifying an enzyme's promiscuity is neutral drifting. Sequential rounds of random mutagenesis are performed, each time selecting variants that maintain the enzyme's original function. The result is a collection of diverse mutant enzymes that are all properly folded and fully functional.
The repeated rounds of mutation and selection lead to a degree of mutational robustness and evolvability that is often not present in the specialized wild-type starting enzyme. Indeed, some of the seemingly neutral mutations that accumulate during the drifting procedure may increase the enzyme's stability or subtly tweak the physicochemical environment of the active site, making it more promiscuous and more permissive to adopt new functions. The resulting genetic polymorphism of the drifted libraries may thus provide an essential advantage in later rounds of adaptive directed evolution, where the selection pressure is changed to drive the enzyme's evolution towards new functions. Careful analysis of such neutral drift pathways has revealed that the sequences that have acquired the highest promiscuity are those that have become more similar to the ancestor from which they had naturally evolved. Therefore, starting the engineering effort directly from that ancestor is also a very attractive and powerful strategy. Indeed, ancestral sequence reconstruction will be discussed as a form of rational design in the next chapter.
The Dynamics and Constraints of Enzyme Evolution
Further reading: Kaltenbach 2014 - Dynamics and Constraints of Enzyme Evolution
General Model of Enzyme Evolution
The remarkable functional divergence that is found in nature has led to a consensus that enzymes are highly evolvable molecules, which can tolerate drastic sequence changes and rapidly adapt to new functions. However, there has been a growing realization that enzyme evolution is not limitless, and that various pathways lead to dead-ends. Indeed, many directed evolution experiments come to an early halt after only a minor improvement or no improvement at all. A good understanding of the constraints of enzyme evolution can help engineers in their search for novel biocatalysts. How does a new enzymatic function evolve?
In 1970, Maynard Smith published a seminal paper that stated, “If evolution by natural selection is to occur, functional proteins must form a continuous network which can be traversed by unit mutational steps without passing through non-functional intermediates”. In other words, the functional expansion of an enzyme superfamily must occur gradually and smoothly by accumulating mutations one step at a time and forming a continuous network in sequence space. Therefore, in order to form a continuous network, a foundation for divergence of the new function must pre-exist in the superfamily. The existence of secondary, low-level enzymatic functions, referred to as “promiscuity”, thus provides a reservoir of candidates for evolutionary tinkering. At a certain point of evolution, if a promiscuous function becomes advantageous, it can then be further increased. Eventually, gene duplication occurs, and each copy may diverge, specialize, and become a new member of its enzyme superfamily. During the adaptive process, as well as through genetic drift, the newly evolved enzyme may acquire novel promiscuous activities previously non-existent in its functional reservoir, and in this way, provide new starting points for further divergence.
The concept of a fitness/sequence landscape is useful to visualize adaptive evolutionary processes. The landscape might be deserted over large areas, as the majority of sequences do not code for a folded, active enzyme. Sequence clusters coding for active enzymes exhibit higher fitness and are depicted as a peak. Moving across the sequence space at the same fitness height is known as neutral roaming and can enable a protein to move closer to a higher peak. In turn, climbing the peak uphill for further optimization is referred to as adaptive walking. Multiple catalytic activities can be depicted as several overlaid fitness landscapes in sequence space, each of which possesses peaks of higher fitness and valleys.
In certain regions, peaks from different fitness landscapes may overlap, which represents catalytic promiscuity or multifunctionality. Evolutionarily related catalytic activities, that is, reactions catalyzed by an enzyme superfamily, generate a mountain range of overlapping peaks, in which adaptation can occur through a “continuous network” of functional sequences.
Mutational Epistasis Causes Ruggedness of the Fitness Landscape
If a fitness landscape is smooth and single-peaked, evolution is not constrained and all possible routes from a given coordinate to the peak are viable. However, if the fitness landscape is rugged and riddled with valleys that separate secondary, suboptimal peaks, evolution will be strongly constrained as valleys, or regions of lower fitness between peaks, cannot be crossed by adaptive evolution. For example, a single point mutation can completely abolish enzymatic function by knocking out a critical catalytic residue or the ability to fold correctly. This observation, and the fact that many directed evolution experiments experience dead-ends, suggest that fitness landscapes are rugged to some extent.
The interaction between individual mutations means their effect is dependent on the genetic background in which they occur, causing ruggedness of the fitness landscape. The term “epistasis” was originally introduced to define interactions between genes, but has also been applied to the interaction between mutations in a single gene. Epistasis can be formally described as a situation where the combined effect of two mutations is not simply additive but is either more (synergistic) or less (antagonistic) than the sum of their individual effects. Strong epistasis is likely to create a rugged landscape where evolutionary trajectories are restricted because some mutations are only favorable in the presence or absence of others. On such a landscape, the order and combination of mutations become crucial.
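The formal description of epistasis given above can be written as a small calculation: the deviation of the double mutant from the sum of the two single-mutant effects. The sketch below assumes that effects are expressed on an additive scale relative to the parent (for example, as log fold-changes in activity); the scale, the function names and the example numbers are hypothetical.

```python
def epistasis(effect_a: float, effect_b: float, effect_ab: float) -> float:
    """Deviation of the double mutant from additivity (same units as the inputs)."""
    return effect_ab - (effect_a + effect_b)


def classify_epistasis(effect_a: float, effect_b: float, effect_ab: float,
                       tolerance: float = 1e-9) -> str:
    """Label a pair of mutations as synergistic, antagonistic or additive."""
    deviation = epistasis(effect_a, effect_b, effect_ab)
    if deviation > tolerance:
        return "synergistic"
    if deviation < -tolerance:
        return "antagonistic"
    return "additive"


# Hypothetical effects relative to the parent enzyme (e.g. log10 fold-change).
print(classify_epistasis(0.3, 0.5, 1.2))  # double mutant better than the sum
print(classify_epistasis(0.3, 0.5, 0.2))  # double mutant worse than the sum
```

In practice a tolerance reflecting the measurement error would replace the tiny default used here, since small deviations from additivity are indistinguishable from assay noise.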
In this way, evolution becomes contingent on historical events; depending on which mutation initially occurs, a different pathway and eventually a different outcome may result. Indeed, restrictive mutations can prevent other mutations from becoming fixed by negating their beneficial effect (i.e. antagonistic epistasis). On the other hand, permissive mutations may have to be fixed prior to the appearance of the other mutations (i.e. synergistic epistasis). For example, a crucial function-switching mutation at an active site position may cause steric clashes with amino acids at neighbouring positions, preventing it from being identified as a positive mutation. However, if the cause of this steric clash is removed by a (neutral) permissive mutation at one of those neighbouring positions, the function-switching mutation can be introduced successfully afterwards.
Connectivity Between Functions on a Multi-Dimensional Fitness Landscape
The overlap between functional regions from different landscapes provides a springboard for the evolution of enzymatic activity. So far, we have primarily discussed cases where an activity was already present promiscuously in the enzyme and could be further enhanced by mutations and selection. However, in some cases, the parent enzyme does not have an inherent basal level of the activity in question. How can large evolutionary distances be overcome?
One way to overcome a greater distance on the fitness landscape is to break down the challenge into several, smaller challenges. An intermediary substrate, which resembles both the native and ultimately desired substrate, can be used. Increases in activity towards the intermediate substrate may then introduce the desired activity. This approach is named substrate walking. An outstanding example was the experimental evolution of a transaminase for the commercial synthesis of the antidiabetic drug sitagliptin.
Because the active site of the starting enzyme was too small to transaminate the sitagliptin precursor, the enzyme was first evolved to bind a smaller intermediate, revealing a single mutation that improved activity toward this functional stepping stone. Subsequently, the active site was widened further by other mutations. It was only after activity for the intermediate substrate had been enhanced that successful selections for the actual substrate could be performed.
Alternatively, one could also try to make bigger jumps in sequence space that make it possible to cross valleys of lower fitness between peaks that are not connected. Such a drastic effect cannot be obtained with simple point mutations but requires more extensive modifications of the sequence that are the result of recombination events. In that way, multiple changes can be introduced at once, which might be hard to mimic with stepwise mutations as some of the intermediates might be non-functional or less functional. Even changes in the overall length of the sequence can be generated through truncations and elongations.
Illustrations
Difference between TRIAD libraries and conventional substitution libraries
Further reading: Emond 2020 - Accessing unexplored regions of sequence space in directed enzyme evolution via insertion/deletion mutagenesis
Phosphotriesterase (PTE) is an enzyme that hydrolyzes the pesticide paraoxon, but it also exhibits promiscuous esterase activity. TRIAD libraries were created, containing PTE variants with insertions (+3, +6 or +9 bp) or deletions (-3, -6 or -9 bp) that were primarily in-frame. For comparison, a conventional substitution library was created as well. By screening these variants for their activity on paraoxon, it was found that InDels are more deleterious than point mutations overall: 83% of deletions and 77% of insertions were strongly deleterious (< 10% activity), compared with only 24% in the substitution library.
However, a few InDels were found to have a beneficial effect (activity increased > 50%) on native activity, while the hundreds of tested substitutions were neutral at best. The same trend was also apparent when screening these variants on alternative substrates such as 4-nitrophenyl butyrate: InDels were more likely to yield variants with improved promiscuous arylesterase activity than substitutions, with improvements of activity ranging from 2- to 140-fold. However, these adaptive InDels appeared to have a more drastic negative effect on the native phosphotriesterase activity, indicating a more severe trade-off (on average) between maintaining original activity and enhancing promiscuous activity. In other words: InDels allow enzyme engineers to make larger leaps across sequence space, pushing the properties of variants in libraries towards extremes.
The influence of neutral genetic drifting on enzyme properties
Further reading: Daudé 2019 - Neutral Genetic Drift-Based Engineering of a Sucrose-Utilizing Enzyme toward Glycodiversification
The potential of neutral drifting for obtaining a collection of polymorphic enzyme variants was demonstrated in a study that used amylosucrase as a model enzyme. Amylosucrase catalyzes the synthesis of an amylose polymer from the disaccharide sucrose. However, it is rather promiscuous and can also transfer the glucosyl moiety of sucrose to other molecules. The amylosucrase gene was randomly mutated by error-prone PCR and, after cloning, the resulting library was screened for the original, native activity. A few hundred active clones were retained, and these became the starting point for another round of mutagenesis. A thorough analysis of ~400 “neutral” variants revealed a surprising functional diversity, despite the fact that most of them only held one or two mutations at the amino acid level.
Numerous variants were capable of transferring the glucosyl group of sucrose to several other molecules with considerably higher activities than the wild-type enzyme. Furthermore, some variants lost their ability to use a non-natural chromogenic substrate analogue instead of sucrose, while others showed a 9-fold improvement. A few variants became more thermostable (higher Tm). Clearly, it is possible to take significant steps across an enzyme's fitness landscape, for better or for worse, with merely a few mutations that are neutral towards the original activity.