Chapter 1 - Notes PDF
Universiteit Gent
1 - Introduction

A Primer on Proteins
Based on: Biocatalysis: a status report, Bommarius 2015

Proteins are a class of macromolecules consisting of a linear chain built from 20 different types of amino acids. The fairly frequent disulfide bonds between cysteines crosslink side chains but do not constitute branching of the backbone chain. Extremely rarely, an additional genetically coded twenty-first amino acid is found, such as selenocysteine in formate dehydrogenase or glutathione peroxidases, or pyrrolysine, the twenty-second amino acid, in methanogens (belonging to the domain of Archaea). The sequence variety of proteins is large, 20^N possible combinations for a chain of N residues, but not infinite. Monomers of enzymes typically consist of 147 (bovine lysozyme) to 667 (alcohol oxidase from Aspergillus) amino acid residues, most of them clustering between 280 and 400 residues.

Protein sequences are searchable with a variety of algorithm-based tools, such as BLAST, ClustalW or FASTA, in a variety of databases, such as NCBI or ExPASy. Recent access to GenBank revealed 184,938,063,614 stored bases. Most sequences deposited in data banks hail from genome sequencing projects and are not functionally verified but only annotated, i.e., computationally compared to sequences with known function and thus surmised to have the same function as their homologs.

Despite the huge number of protein sequences, proteins fold into comparatively few three-dimensional structures, approximately 1,300 folds (http://www.proteinstructures.com/Structure/Structure/protein-fold.html). Protein structures are deposited in databases such as the Protein Data Bank (PDB) of the NIH (http://www.rcsb.org/pdb/home/home.do). As of January 5, 2015, there were 105,499 structures deposited in the PDB, among them 34,987 distinct proteins and 7,454 nucleic acid–containing structures.
Of these structures, 89% were obtained via X-ray crystallography, 10% through solution NMR spectroscopy, and 1% via electron microscopy.

A Primer on Biocatalysts

The three-dimensional structure of enzymes contains a discrete binding pocket called the active site, where small molecules can bind and react. These reactions are subject to the same rules of physics and chemistry as are observed with other types of catalysts. Enzymes do differ from other catalysts in their ability to exclude water from the active site and thus access unusual redox potentials or apparent pKa values. Examples of unusual pKa values, especially of carboxyl or amine groups in proteins, include 6.5 for Glu in lysozyme, 5.9 for Lys in acetoacetate decarboxylase, or 3.4 for His in papain.

The kinetics of a one-substrate enzyme reaction typically can be described by the Michaelis-Menten equation. The underlying mechanism assumes two elementary reaction steps: reversible formation of an enzyme-substrate complex followed by irreversible reaction to product. The maximum reaction rate at saturating [S] is denoted by vmax = kcat·[E]. The KM value corresponds to the substrate concentration at half saturation (v = vmax/2) and is commonly taken as a measure of the binding affinity of the substrate to the enzyme: a high KM value corresponds to loose binding, a low value to tight binding between enzyme and substrate. At low substrate concentration ([S] much smaller than KM), the equation simplifies to v = vmax[S]/KM and consequently is first order with respect to the substrate concentration [S]. At high substrate concentration ([S] much larger than KM), the equation simplifies to v = vmax and consequently is zeroth order with respect to [S]. In all situations, v is proportional to (first order with respect to) [E].

$$E + S \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} ES \xrightarrow{k_2} E + P$$

$$v_0 = \frac{v_{max}\,[S]}{K_M + [S]}$$

What is the most relevant criterion to gauge process-relevant activity of enzymes?
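The limiting behaviour of the Michaelis-Menten rate law described above bears directly on this question. The following is a minimal Python sketch, assuming two hypothetical enzymes (A and B) whose kcat and KM values are invented for illustration so that both share the same kcat/KM; the function name and all numbers are assumptions, not data from the source.

```python
def michaelis_menten_rate(s, kcat, km, e_total):
    """Rate v = kcat*[E]*[S] / (KM + [S]).

    Concentrations in mol/L, kcat in 1/s, result in mol/(L*s).
    """
    vmax = kcat * e_total
    return vmax * s / (km + s)


# Hypothetical enzymes: identical kcat/KM (1e5 L/(mol*s)), different kcat.
enzyme_a = {"kcat": 10.0, "km": 1e-4}
enzyme_b = {"kcat": 1000.0, "km": 1e-2}
e_total = 1e-6  # mol/L of enzyme, illustrative

# Compare a dilute assay-like concentration with a process-like 1 M.
for s in (1e-6, 1.0):
    v_a = michaelis_menten_rate(s, e_total=e_total, **enzyme_a)
    v_b = michaelis_menten_rate(s, e_total=e_total, **enzyme_b)
    print(f"[S] = {s:g} M: v_A = {v_a:.3g}, v_B = {v_b:.3g} mol/(L*s)")
```

With these assumed values the two enzymes are nearly indistinguishable at micromolar substrate, where the rate is governed by kcat/KM, but differ by roughly two orders of magnitude at 1 M, where the rate approaches vmax = kcat·[E].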
Enzyme specificity, expressed as kcat/KM and favored by biochemists, can be misleading at substrate concentrations much higher than KM, as is the case for most industrial applications ([S] > 1 M). For applications in synthesis, the time to reach a certain degree of conversion (preferably 99%) is most crucial. Several common measures of enzyme performance are inadequate in that regard. For example, kcat by itself does not take into account product inhibition. Both kcat/KM and kcat commonly do not take into account the strongly varying substrate concentration during synthesis runs, so different degrees of conversion result in very different residence times. Recently, the average velocity up to a specified degree of conversion (e.g. 99%) has been proposed as an alternative measure, termed catalytic effectiveness.

One important reason for the lack of comparable studies across different catalytic technologies is the absence of accepted catalytic performance criteria. The most important ones are:

Space-time-yield (STY) as a measure of volumetric productivity. Space-time-yields can be calculated as the quotient of final product concentration and residence time in the reactor ([P]/t) or simply as the moles of product per unit time and reactor volume (nProd/(V·t)). Thus, space-time-yields are influenced by the solubility of the product as well as of the substrate (via the attainable reaction rate).

Kinetic stability (i.e. the half-life t50 at a given temperature T) of an enzyme is much more relevant for a process than its thermodynamic stability (i.e. the melting temperature Tm at which the protein unfolds). Operating a process close to the Tm generally results in maximal activity but also in rapid deactivation. Therefore, the optimal process temperature typically is about 10°C lower.

The total turnover number (TTN) is especially useful for cost estimations, as it directly scales the product yield to the catalyst input.
The TTN is a dimensionless number, defined as the moles of product generated divided by the moles of biocatalyst used in a reaction; an alternative definition is the number of catalytic events performed by one active site of one molecule of the enzyme during its lifespan.

Enantioselectivity of the reaction can be measured by the enantiomeric excess (e.e.) of the product, which results from the preference (kcat/KM) of the enzyme for one optical isomer over the other.

Enzyme Engineering
For further reading: Engineering the third wave of biocatalysis, Bornscheuer et al. 2012

The foremost strength of biocatalysts is their often superb specificity: enzymes often far surpass other catalysts with respect to chemical specificity, regioselectivity, or enantioselectivity. Biocatalysts commonly operate at nearly ambient conditions with respect to temperature and pH value in mostly aqueous solution. Operation at near ambient conditions promises energy efficiency of biocatalytic processes. Typical renewable raw materials, such as carbohydrates or fatty acids, are more hydrophilic than raw materials obtained from petrochemicals, which tend to be hydrocarbons. Such hydrophilic raw materials are advantageously processed in aqueous solvents, so biocatalysis is a good technological fit for renewable raw materials.

Although biocatalysts are often highly active and extremely selective, there are still drawbacks associated with biocatalysis as a generally applicable technique. A decisive weakness of biocatalysts is their very limited range of stability with respect to temperature, solvents, pH value, ionic strength, and salt type. Furthermore, the development of new or improved biocatalysts still is prone to chance and does not follow a set of generally applicable rules.

Arguably the most important reason for the progress of biocatalysis over the past decade is the advance of protein engineering. Progress can be divided into three waves:

1. Rational design by site-directed mutagenesis (SDM) was introduced by Smith and coworkers in 1985, a few years after the discovery of recombinant DNA technology. It involves the introduction of so-called point mutations, whereby a given amino acid at a predetermined site in the protein is replaced by one of the other 19 amino acids. An important shortcoming of SDM is that detailed information is required regarding the three-dimensional structure and mechanism of the enzyme. Indeed, the expectation that enzymes with predictable properties could be designed turned out to be premature.

2. In contrast, directed evolution requires no structural information whatsoever. In 1993, Nobel laureate Frances Arnold published a seminal paper describing the use of error-prone PCR (epPCR) for the random mutagenesis of the protease subtilisin to increase its stability in organic solvents. In 1994, Stemmer reported DNA shuffling as a complementary technique that mimics recombination events during natural evolution. These efforts were oriented towards stability, but Reetz demonstrated in 1997 that similar strategies can be used to increase the (enantio)selectivity of an enzyme. Although directed evolution is able to access a vastly wider sequence space than rational design, the processing of huge libraries (~10^6 variants) takes a lot of time and requires high-throughput screening equipment.

3. The third wave, known as semi-rational engineering, has been gaining momentum since 2005 and aims to improve the hit rate of directed evolution while decreasing library size. To that end, the randomization is limited to crucial positions by means of site-saturation mutagenesis. The resulting "smart/focused" libraries are then small enough (~10^2 variants) to be screened manually. Relevant hotspots for mutagenesis can be found through the in silico analysis of both structure (modeling, docking) and sequence (conservation, correlation).
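The library sizes quoted for the three waves follow from simple counting. The sketch below is only an illustration under stated assumptions: it counts protein-level variants (20 amino acids per randomized position), ignores codon degeneracy and the oversampling needed in real screening, and uses a hypothetical enzyme length of 300 residues; the function names are invented for this example.

```python
def point_mutant_count(n_residues):
    """Single substitutions: each position can become one of 19 other amino acids."""
    return 19 * n_residues


def saturation_library_size(n_positions):
    """Protein-level variants when n_positions sites are fully randomized."""
    return 20 ** n_positions


n = 300  # hypothetical enzyme length, within the 280-400 range typical of enzymes

print(point_mutant_count(n))  # 5700 possible single point mutants

# Focused libraries: randomizing only a few hotspot positions keeps numbers small.
for k in (1, 2, 3, 4, 5):
    print(k, saturation_library_size(k))

# Full sequence space 20^N for N = 300: report only the number of decimal digits.
print(len(str(20 ** n)))  # 391
```

Simultaneously saturating two positions gives 400 variants, in line with the ~10^2 scale of focused libraries, whereas five positions already give 3.2 million, the scale associated with directed evolution libraries.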
In addition, reconstructing sequences from extinct/ancient enzymes can help to introduce more flexibility and generate better starting points for the evolution of new catalytic functions.

Despite significant advances, major challenges remain before the advantages of biocatalysis can be fully harnessed. Enzyme engineering is much faster than it was ten years ago, but changing 30-40 amino acids and screening tens of thousands of candidates still requires a large research team. Many, if not all, engineering strategies will yield improved variants, but some will yield better variants and find them faster. Which strategies are the better ones is still unclear. Directly comparing strategies for the same problem and testing the assumptions behind different strategies will identify the most efficient ones.

The first assumption is that the goal can be achieved using enzyme engineering. The thermodynamics of reactions involving non-natural substrates may be less favourable than that of reactions involving natural substrates, and attaining certain enzyme activities may be thermodynamically impossible. Diffusion sets an upper limit to reaction rates. A closer integration of thermodynamics and biocatalytic process development is highly desirable in designing new processes.

Second, enzyme engineering assumes that individual mutations are additive. Although mutations are mostly non-interactive, many interactive mutations are highly useful but difficult to study. One way of identifying cooperative effects is by statistical analysis, but better techniques are needed to predict at an early stage of protein engineering which additional mutations are likely to be additive and which lead to a dead end.

Third, computer design of new enzyme activities is not accurate. Design still requires testing 10–20 predictions and usually results in an enzyme with low activity, which then requires substantial further engineering.
For example, the initial computer-based design of an enzyme for the manufacture of sitagliptin yielded an enzyme that converted only 0.1 substrate molecule per day, even though the substrate fits well in the active site within the computer model derived from the crystal structure. New enzymes can be designed to catalyse reactions not found in nature (e.g. Kemp elimination), but such activities are so far too low for practical use. A better understanding of the mechanistic and especially dynamic aspects of enzymatic catalysis is needed.

Successful Examples
For further reading: Biocatalysis engineering: the big picture, Sheldon et al. 2017

Recent advances in protein engineering have achieved the equivalent of converting mouse proteins into human proteins. The amino-acid sequences of similar proteins in mice and humans typically differ by 13%. Today's advanced protein engineering makes similar changes in converting a wild-type enzyme into an enzyme suitable for chemical process applications. This protein engineering is equivalent to compressing the 75,000,000-year evolution from an early mammal to modern-day mice and humans into several months of laboratory work. Consistent with the more extensive changes made in these proteins, the properties have also changed more dramatically. The catalytic properties of the enzymes have improved quantitatively by factors of thousands to millions, and the engineered enzymes now can act in unusually harsh conditions.

The understanding of protein engineering currently allows for dramatic improvements in enzymatic performance to be realized in just a few months. In the past, an enzyme-based process was designed around the limitations of the enzyme; today, the enzyme is engineered to fit the process specifications. As a result, biocatalysis is becoming an increasingly important tool in chemical synthesis.
A pertinent example is the two-step, three-enzyme process for the synthesis of a key intermediate for atorvastatin, the active ingredient of Pfizer's cholesterol-lowering blockbuster drug Lipitor. Initially, the enzyme activities were too low for commercial viability, and the use of high enzyme loadings led to emulsion formation and problematic product recovery. Codexis used DNA shuffling to improve the activity and stability of the ketoreductase (KRED) while maintaining the nearly perfect enantioselectivity exhibited by the wild-type enzyme. Because of the much lower enzyme loading, no emulsion problems were encountered, and phase separation required less than one minute. Similarly, the activity of the wild-type halohydrin dehalogenase (HHDH) was extremely low, and the enzyme showed poor stability in the presence of substrate and product as well as strong product inhibition. After many rounds of DNA shuffling and screening in the presence of increasing product concentrations, the inhibition was largely overcome and the HHDH activity increased >2500-fold.
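Process improvements of this kind are usually quantified with the performance criteria defined earlier in this chapter. As a minimal sketch, the three ratios can be written out in Python; the batch numbers below are invented for illustration and are not data from the atorvastatin process, and the e.e. expression is the standard textbook definition rather than a formula given in these notes.

```python
def space_time_yield(product_conc, residence_time):
    """STY = [P] / t, e.g. mol/L divided by h gives mol/(L*h)."""
    return product_conc / residence_time


def total_turnover_number(mol_product, mol_enzyme):
    """TTN = moles of product formed per mole of biocatalyst used (dimensionless)."""
    return mol_product / mol_enzyme


def enantiomeric_excess(major, minor):
    """e.e. = (major - minor) / (major + minor), as a fraction between 0 and 1."""
    return (major - minor) / (major + minor)


# Hypothetical batch: 0.5 mol/L product after 8 h in a 1 L reactor,
# made with 5 micromol of enzyme, product enantiomer ratio 99.5 : 0.5.
sty = space_time_yield(0.5, 8.0)
ttn = total_turnover_number(0.5, 5e-6)
ee = enantiomeric_excess(99.5, 0.5)

print(f"STY = {sty:.4f} mol/(L*h)")
print(f"TTN = {ttn:.3g}")
print(f"e.e. = {ee:.1%}")
```

Lowering the enzyme loading at constant product output raises the TTN proportionally, which is why this number is convenient for cost estimates.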