# Chapter 1 The Covalent Structure of Proteins

## Geoffrey Allen

### Abstract

The essential structural feature of all proteins is the peptide bond linking the protein L-amino acids in a linear sequence. The protein amino acids may be classified in many ways, but each has unique features that contribute to the overall structure and function of a protein. Many post-translational modifications of amino acid residues in proteins occur during biosynthesis and degradation in vivo, and most of those identified to date are discussed here; in vitro chemical modifications are not addressed. Currently, the large majority of protein sequence data are deduced from nucleotide sequences; however, protein chemical methods for sequencing are essential for the confirmation of protein structures and the determination of post-translational modifications. Mass spectrometric techniques have become particularly important in these endeavors.

### The Protein Amino Acids and the Peptide Bond

The essential covalent structural feature of all proteins is the peptide bond linking α-amino and α-carboxyl groups of amino acids in a linear sequence of typically 100–1000, but occasionally fewer or more, amino acid residues. Natural proteins are built almost exclusively from the set of 18 common protein L-amino acids, the L-imino acid proline, and glycine (Table 1), although the side chains of many of these may be modified in a wide variety of ways. A rare exception is the incorporation of L-selenocysteine into a few proteins. D-Amino-acid residues are not incorporated biosynthetically into proteins, although trace amounts may form through racemization, for example, following deamidation of asparagine residues and rehydration of the cyclic imide intermediate. Chemically synthesized proteins may, of course, incorporate nonnatural amino acids, limited only by the imagination and skill of the organic chemist.
Nonnatural amino acids may also be incorporated into proteins biosynthetically, by feeding amino acid analogs, assisted by mutations in the host, or under stress, or during in vitro translation in the presence of unnaturally charged semisynthetic tRNA.

The description of protein structure follows rules defined by international convention (IUPAC-IUB, 1970, 1984) as shown in outline in Figure 1 and Table 1. Various journals, however, adopt different conventions for numbering residues in a sequence, such as Cys²⁵, Cys-25, Cys25, or C25; superscripts are inconvenient for electronic databases, and thus the second of these is used here.

The nature of the peptide bond has been exhaustively investigated. The quantum mechanical description corresponds to a model in which π electron density is distributed across the N-C=O system, with partial electronic charge distribution as shown in Figure 2. One consequence of this electronic structure is that there is a substantial energetic barrier to rotation about the C-N bond, with energy minima in planar configurations of all six atoms of the peptide unit, either trans or cis; the trans form is enthalpically preferred over the cis form, and the cis form is very rare in proteins. In peptide units with the imino group of proline residues, there is little difference in free energy between cis and trans configurations, and cis acylproline bonds are relatively common, although they are still in a small minority. Interconversion of cis and trans forms, via rotation of the C(O)-N bond, is slow at the temperatures of living organisms; this has a significant effect on the rate of folding of nascent or denatured polypeptide chains into the native conformation.

A second aspect of the electronic structure is the large dipole moment (about 3.46 D) of the peptide unit.
The high polarity leads to favorable interactions with solvents of high dielectric constant and preferred formation of hydrogen bonds with NH donor and O=C acceptor groups, either with solvent molecules or, to a major extent in the case of globular proteins, with other peptide units or polar side chains.

The side chains (R) of the amino acid residues provide the variety of functional groups that confer particular properties on individual protein molecules, leading to a specific folded conformation, degree of solubility, state of aggregation, ability to form complexes with ligands or other macromolecules, enzyme activity, etc. The frequency of occurrence of each residue in globular proteins is shown in Table 2. The distribution of the various side chains along a particular protein does not generally form a regular pattern (exceptions are fibrous proteins such as collagen) but is far from random: out of a very much larger number of potential sequences (20ⁿ, where n is the number of amino acid residues), evolution has selected only the (admittedly very numerous) subset that can form functionally useful protein molecules. Nature also finds the twenty (twenty-one if selenocysteine is included) mRNA-encoded and ribosomally incorporated amino acids inadequate for the diverse functions required of proteins, and many covalent modifications of nascent polypeptide chains take place during the biosynthesis and maturation of protein molecules.

The set of 20 common protein amino acids may be classified in a variety of ways, such as large, small, charged, neutral, hydrophilic, and hydrophobic. The masses and volumes of the residues are listed in Table 2. Many attempts have been made to classify amino acid residues on a linear scale of hydrophobicity (Table 3). Further examples are given by Heringa et al. (this volume).
Such linear scales, although of some value in correlating preferred structural positions of amino acid side chains in globular proteins, do not adequately represent the subtle differences in properties of the side chains that confer specific properties at specific points in a protein structure. A more informative, although qualitative, method of classifying amino acid residues is as overlapping sets with related properties, as shown in Figure 3; such Venn diagrams may be applied to computer analysis of protein sequences (Taylor, 1986; Heringa et al., this volume). The physical properties of the natural protein amino acids have also been analyzed statistically; ten orthogonal factors were derived (Kidera et al., 1985). In the following paragraphs some features of each amino acid residue that contribute to unique functions in proteins are highlighted.

### Glycine

The simplest amino acid, unique among the protein amino acids in its lack of dissymmetry, glycine frequently plays an important role in protein structures where the lack of a β carbon atom permits a substantially greater degree of conformational flexibility and attainable conformational space than for any other residue. Glycine is thus often located in tight turns, and in positions where bulky side chains would sterically prevent close packing of helices (as in collagen) or binding of substrates. The absence of a sterically hindering side chain also confers greater than normal chemical reactivity at adjacent peptide bonds. For example, Asn-Gly sequences can form cyclic imide structures, with deamidation, much more readily than other Asn-Xaa sequences. Glycine also contributes to sites recognized by enzymes catalyzing specific modifications of proteins, such as the signal sequences for N-terminal myristoylation and arginine methylation. The nitrogen atom in C-terminally amidated peptide hormones is contributed by a glycine residue in the precursor peptides.
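
The observation that Asn-Gly sequences are unusually prone to cyclic imide formation and deamidation lends itself to a simple sequence scan. The following is a minimal sketch, assuming a protein sequence given in one-letter code; the function name `find_asn_gly_sites` and the example sequence are hypothetical and chosen only for illustration. It flags candidate sites and says nothing about whether deamidation actually occurs in a folded protein.

```python
def find_asn_gly_sites(sequence: str) -> list[int]:
    """Return 1-based positions of Asn (N) residues immediately followed by Gly (G).

    Illustrative helper only: a hit marks a sequence motif that the text
    describes as deamidation-prone, not a confirmed modification site.
    """
    seq = sequence.upper()
    return [
        i + 1  # convert 0-based index to conventional 1-based residue number
        for i in range(len(seq) - 1)
        if seq[i] == "N" and seq[i + 1] == "G"
    ]


if __name__ == "__main__":
    # Hypothetical test sequence with Asn-Gly at residues 3-4 and 7-8.
    print(find_asn_gly_sites("MKNGSANGT"))
```

Residue numbering starts at 1 to match the Cys-25 style of numbering used in this chapter.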
### Alanine

One of the most abundant amino acid residues in proteins, alanine is weakly hydrophobic. It is often chosen as a substitute for other amino acid residues in specific mutagenesis experiments designed to test the functional importance of particular side chains. Perhaps surprisingly, it has been found that a large number of residues may be changed to alanine without significant alteration of the tertiary structures of some proteins. As is the case with the other aliphatic residues, valine, isoleucine, and leucine, chemical reactivity is very low, and this is reflected in the lack of covalent modifications of these residues in proteins.

### Valine

The methyl substituent on the β carbon of this moderately hydrophobic residue reduces conformational flexibility and confers steric hindrance to chemical reactions at the adjacent peptide bonds, particularly where the adjacent residue also bears a β-branched side chain (valine or isoleucine).

### Isoleucine

As with valine, the β-branched side chain sterically hinders reactions at adjacent peptide bonds. The hydrophobic side chain prefers a location within the interior of folded protein structures, and the β-sheet secondary structure also accommodates the side chain more readily than does an α-helix. Isoleucine has a second asymmetric center, with the configuration shown in Table 1. Inversion of the configuration at the β carbon (which can occur during vigorous acid treatment with HI) yields L-alloisoleucine, a diastereoisomer with different chemical and physical properties.

### Leucine

Often the most common individual amino acid in globular proteins, leucine, a large hydrophobic residue, supplies a large proportion of the aliphatic side chains that constitute the hydrophobic cores characteristic of this class of protein.
Leucine is also a major component of transmembrane helices in integral membrane proteins and is distributed in a regular pattern in "leucine zipper" structures, where hydrophobic interaction of leucine side chains along one edge of each of two α-helices favors dimerization.

### Phenylalanine

The bulky hydrophobic side chain of phenylalanine is an important constituent of hydrophobic cores of globular proteins and transmembrane domains of integral membrane proteins. The phenyl ring does, however, have some weakly polar properties that contribute to preferred orientations in interactions with other aromatic rings and other polar groups.

### Methionine

One of the two sulfur-containing protein amino acids, methionine is similar in its bulk and hydrophobicity to leucine, but has unique chemical properties. The sulfur atom can interact with metal ions such as zinc, copper, and mercury, and is chemically reactive, particularly toward oxidizing agents. Oxidation to the sulfoxide is commonly observed in proteins stored in vitro in the absence of antioxidants, and in certain instances it is observed in vivo. The thioether group can also be alkylated, although much less readily than the thiol group of cysteine residues, and reacts with cyanogen bromide under acidic conditions, resulting in specific cleavage of methionyl peptide bonds, a reaction that has been much used in sequence determination of proteins. Methionine labeled with ³⁵S is widely used in experiments on protein biosynthesis.

### Serine

The primary aliphatic hydroxyl group of serine residues is both a donor and acceptor of hydrogen bonds, frequently interacting with solvent water at the surface of folded proteins. The oxygen atom is weakly nucleophilic, reacting with strongly electrophilic reagents such as acetic anhydride.
However, its nucleophilicity is enormously enhanced in the active sites of important classes of proteases (the serine proteases) and esterases, where acyl-enzyme intermediates lie on the catalytic pathway. More stable covalent links are also formed in vivo; those formed with carbohydrate (O-linked glycosylation) and phosphate are important examples. Reversible phosphorylation of serine (and threonine) residues in many proteins acts as a control signal in many cellular biochemical pathways. Glycosylated and esterified serine residues are sensitive to β-elimination under alkaline conditions, yielding dehydroalanine derivatives that can react further, for example by Michael addition with nucleophiles.

### Threonine

The secondary hydroxyl group of threonine residues shares some chemical and biochemical properties with the primary hydroxyl group of serine residues, providing sites for O-linked glycosylation and phosphorylation, but is relatively sterically hindered and chemically less reactive. Threonine shares with isoleucine the possession of a second chiral center; the configuration is shown in Table 1.

### Cysteine

The thiol group of cysteine, in its deprotonated form, thiolate (pKa typically about 9.0), is a powerful nucleophile, readily reacting with aldehydes, and with acylating and alkylating reagents. This property is functionally important in enzyme activity, for example in thiol proteases, such as papain, and in glyceraldehyde-3-phosphate dehydrogenase. Its affinity for metal ions such as zinc, iron, and copper (and toxic metals such as cadmium and mercury) is also important for many biological activities of proteins. The formation of disulfide cross-links within or between peptide chains through oxidation of cysteine to cystine residues provides essential stability to many extracellular proteins. This reaction is reversible under reducing conditions, such as the presence of dithiothreitol, although denaturing conditions may also be required.
The redox environment within living cells is generally such that most cysteine residues remain reduced, although reversible formation of disulfides either within proteins or with glutathione is important in some biological functions. Further oxidation to sulfinic and sulfonic acids takes place in vitro under strongly oxidizing conditions. The reactivity of the thiol group is such that, in experimental work in vitro, many proteins require antioxidant and anti-electrophile protection for retention of functional activity, and covalent modification, typically alkylation, is generally performed during determination of covalent structures. Biochemically, cysteine residues may also be modified by acylation, for example with fatty acids (palmitoylation), or alkylation, for example by farnesylation. Cysteine residues can also serve for covalent linkage of prosthetic groups such as heme. Cystine residues undergo β-elimination under alkaline conditions, yielding dehydroalanine, which subsequently reacts with nucleophiles yielding a variety of products, such as lanthionine and lysinoalanine, typically found in wool proteins.

### Aspartic Acid

The carboxylic acid groups of aspartyl residues (pKa typically about 4.0) are generally ionized as carboxylates under physiological conditions. The highly polar side chains are predominantly located on the surfaces of proteins, where they may contribute to metal ion binding sites, in particular for Ca²⁺, and sites for positively charged ligands. Strong hydrogen bonds are also formed between the carboxylate group and donor groups such as guanidinium. The aspartyl β-carboxyl group may also take part directly in enzymic catalysis, for example in aspartyl proteases such as pepsin, renin, and retroviral proteases, and forms an intermediate acyl phosphate during catalysis by the cation-translocating Na⁺,K⁺-ATPase and Ca²⁺-ATPase of plasma membranes and Ca²⁺-ATPase of sarcoplasmic reticulum.
Chemically, aspartic acid side chains undergo typical reactions of carboxylic acids, such as the formation of esters and amides in the presence of activating agents, such as strong acids or carbodiimides, and nucleophiles, such as alcohols and amines, respectively. The proximity of the side-chain carboxyl group to the α-carboxyl peptide bond permits significant neighboring group effects, and in dilute acid conditions aspartyl peptide bonds are more rapidly hydrolyzed than other aminoacyl peptide bonds. In particular, aspartyl-proline bonds are substantially more sensitive to hydrolysis by weak acids than any other peptide bonds. Aspartic acid residues may also be modified in vivo to β-hydroxyaspartic acid residues.

### Asparagine

The β-amide group of asparagine is a hydrogen bond donor and acceptor, and this polar neutral residue may have a structurally important role within hydrogen bond networks, as well as forming hydrogen bonds with solvent water. Asparagine residues are often located in turns in protein folded structures. The amide group is relatively easily hydrolyzed, particularly where the asparagine residue is followed in the sequence by glycine and to a lesser extent serine or other small residues. In the absence of steric hindrance, a cyclic imide structure may form (with loss of ammonia); this may undergo racemization and subsequent cleavage to form L- or D-aspartyl or L- or D-isoaspartyl residues. The imide structure is also susceptible to reaction with hydroxylamine, resulting in cleavage of the peptide chain.

A major role of asparagine is as the linkage site for N-linked glycosylation, mainly in extracellular or extramembranous segments of eukaryotic glycoproteins. The large majority of such structures have substituted N-acetylglucosamine linked through a glycosyl bond with the nitrogen atom of the β-carboxamide group.
A necessary, but not sufficient, condition for the transfer of the glycosyl moiety to asparagine is the sequence -Asn-Xaa-(Ser or Thr)-, where Xaa is any amino acid except Pro, and Cys may possibly substitute for Ser or Thr. Further post-translational modifications of asparagine are rare.

### Glutamic Acid

The carboxyl group of glutamic acid residues has a higher intrinsic pKa (around 4.5) than that of aspartic acid residues, but in general has similar functions and properties in contributing to electrostatic and hydrogen bonding interactions within proteins and with ligands, including metal ions. Glutamic acid residues do not, however, possess unusual peptide bond lability, nor are they essential active-site residues in acid proteases. In vivo modification to γ-carboxyglutamic acid residues is essential to the function of various blood-clotting proteins, such as prothrombin, where the malonic acid group enhances affinity for Ca²⁺.

### Glutamine

The properties of glutamine are in general similar to those of asparagine, although, as the amide of an acid weaker than aspartic acid, glutamine residues are less labile than asparagine residues. However, at the N-terminus of a protein, spontaneous or enzyme-catalyzed cyclization, with elimination of ammonia, forming pyroglutamyl residues, takes place. Some glutamine side chains are substrates for transglutaminases, which catalyze linkage to lysyl side chains.

### Arginine

The strongly basic (pKa around 12.0) guanidine group of arginine residues is protonated at all physiologically relevant pH values. The positive charge, together with multiple hydrogen bond donating capacity and a high degree of polarity, ensures surface localization of arginine side chains, except in rare instances of formation of buried guanidinium-carboxylate salt bridges.
Arginine residues function in proteins generally as positively charged groups, contributing to the binding of negatively charged ligands such as phosphate and phosphate esters, including nucleic acids. Together with lysine residues, arginine residues provide positively charged signals for membrane protein assembly, pro-protein cleavage, and nuclear and nucleolar localization. Rather unreactive chemically at neutral pH, arginine residues yield cyclic, frequently fluorescent, products with α- or β-diketones and related compounds at alkaline pH. In vivo, methylation and ADP-ribosylation are important modifications.

### Lysine

Although less strongly basic than arginine residues, the high pKa (typically around 10.5) of the ε-amino groups of lysine ensures that at neutrality most lysyl side chains bear a positive charge and are generally excluded from the nonpolar interiors of globular proteins. The aliphatic tetramethylene group is hydrophobic, but the hydrophilicity of the protonated amino group dominates. The long flexible side chain contributes to a lack of definition of many ε-amino groups in crystallographically determined structures. As carriers of positive charge, lysine residues contribute to the binding of negatively charged ligands, often involving strong hydrogen bonds.

Chemically, the reactions of lysine side chains are those typical of primary aliphatic amines, such as acylation by acid anhydrides and active esters, and modification by isocyanates (yielding substituted ureas), isothiocyanates (yielding thiocarbamyl derivatives), aldehydes (yielding complex products, including cross-linking with other side chains), and imidate esters (yielding amidines). In vivo, lysine is the linkage point for many prosthetic groups, such as biotin and retinal. The side chain is also the site of enzyme-catalyzed oxidation, yielding a variety of cross-linking structures in collagen and elastin, and hydroxylation to hydroxylysine residues.
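
The statements that aspartyl and glutamyl side chains are ionized, and arginyl and lysyl side chains protonated, at physiological pH follow from the Henderson-Hasselbalch relation. The sketch below is illustrative only: it uses the typical pKa values quoted so far in this chapter (Asp 4.0, Glu 4.5, Cys 9.0, Lys 10.5, Arg 12.0), treats each side chain as an isolated group, and ignores the environmental shifts in pKa that occur in real proteins. The names `SIDE_CHAIN_PKA`, `fraction_deprotonated`, and `mean_charge` are hypothetical.

```python
# Typical side-chain pKa values as quoted in the text; actual values in a
# folded protein can differ substantially with local environment.
SIDE_CHAIN_PKA = {"Asp": 4.0, "Glu": 4.5, "Cys": 9.0, "Lys": 10.5, "Arg": 12.0}

# Groups that are neutral when protonated and negative when deprotonated.
# The remaining groups (Lys, Arg) are positive when protonated.
ACIDIC = {"Asp", "Glu", "Cys"}


def fraction_deprotonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of a group in its deprotonated form."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


def mean_charge(residue: str, ph: float) -> float:
    """Average charge of an isolated side chain at the given pH."""
    f = fraction_deprotonated(SIDE_CHAIN_PKA[residue], ph)
    return -f if residue in ACIDIC else 1.0 - f


if __name__ == "__main__":
    for name in SIDE_CHAIN_PKA:
        print(f"{name}: mean charge at pH 7.0 = {mean_charge(name, 7.0):+.3f}")
```

Under these assumptions the carboxylates and the basic groups are almost fully charged at pH 7, whereas only about 1% of cysteine thiols are present as the reactive thiolate.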
### Histidine

The imidazole group of histidine residues has a typical pKa around 6.5, closer to neutrality than other functional groups on proteins. Histidine residues frequently act as proton-transferring catalysts in enzyme reactions; the pH dependence of many enzymic reaction rates around neutrality is associated with protonation of histidine side chains. The two sites for protonation and hydrogen bonding, linked by charge delocalization across the imidazole ring, contribute to its power as an acid-base catalytic group; its involvement in catalysis by serine proteases is a classic example. Histidine side chains are also favored ligands, together with cysteine, for binding metal ions such as Zn²⁺, Cu²⁺, and Fe²⁺.

Chemically, the imidazole group is a nucleophile that is acylated by carboxylic acid anhydrides (although the products are hydrolytically unstable) and aromatic sulfonyl chlorides, and alkylated, albeit slowly, with iodoacetic acid. The group is also susceptible to photooxidation and iodination. In vivo post-translational modifications reflect this reactivity. In in vitro studies on small proteins, histidine residues have provided useful proton magnetic resonance signals for studying physicochemical properties, particularly microscopic pH titration.

### Tyrosine

The phenolic hydroxyl group of tyrosine residues confers a significant degree of polarity, and the side chain is generally partially exposed to solvent in globular protein structures. The pKa of tyrosine residues in proteins is generally about 10.0, so most tyrosine residues are neutral at physiological pH. Tyrosine side chains can act as ligands for metals, such as Fe²⁺, and take part in hydrogen bonding as donor and acceptor. The phenolate is a reactive nucleophile, both at the oxygen (yielding esters with acid anhydrides or aromatic sulfonyl chlorides) and aromatic ring carbons (undergoing iodination, nitration, condensation with formaldehyde, etc.).
Covalent in vivo modifications include phosphorylation (protein tyrosine phosphate ester formation providing important messages in cellular function), sulfation, and, in invertebrate extracellular proteins, cross-linking to other tyrosine residues. Tyrosine residues are also the source of thyroid hormones.

### Tryptophan

Tryptophan residues make up only a small percentage of most proteins, but, with the largest side chain of the ribosomally incorporated amino acid residues, they frequently play a key role in structural packing interactions. The high hydrophobicity of the side chain is tempered by the presence of the indole NH group, which can serve as a hydrogen bond donor and favors contact with polar groups. Tryptophan residues often form part of ligand-binding sites, such as those of antibodies, avidin, and lysozyme. The indole ring of tryptophan residues is susceptible to electrophilic attack, for example by halogens; products of oxidative halogenation undergo further degradative reactions, including peptide bond cleavage. The light sensitivity of proteins is to a large extent due to the oxidative degradation of tryptophan residues. Tryptophan residues have the greatest UV absorbance and fluorescence of the protein amino acids, properties used widely in physicochemical studies on protein structural changes and ligand binding.

### Proline

Unique as a secondary amino acid, proline residues have restricted conformational mobility. Although hydrophobic, proline residues are often located at the surfaces of proteins in tight turns. Although prolyl residues, lacking an NH group, are unable to take part in full α-helical hydrogen bonding, they are occasionally present in helices, where they confer a distortion on the helical axis.
The presence of cis peptidylproline bonds in proteins (as well as the more frequent trans form) and the slow rate of cis-trans interconversion have substantial effects on the rates of polypeptide chain folding in vitro; in vivo there is enzymic catalysis. Chemically, proline residues are unreactive, but the unusually basic peptide nitrogen contributes to the high sensitivity to acid hydrolysis of the aspartyl-proline peptide bond. Owing to their unique stereochemistry, prolyl residues are generally not cleavage sites for common endopeptidases, but proline-specific endopeptidases occur. Proline-dependent properties of peptides and proteins have been reviewed (Yaron and Naider, 1993), as have the structure and function of proline-rich regions (Williamson, 1994).

### Selenocysteine

Although not generally regarded as one of the set of primary protein amino acids, selenocysteine differs from other known variant amino acids in that it is incorporated on the ribosome, using the UGA codon (normally a terminator) in a specific context, rather than as a post-translational modification (Böck et al., 1991). Rather few selenocysteine-containing proteins are known, and in most of these a single selenocysteine residue functions at the active sites in oxidation-reduction reactions. Selenocysteine is chemically similar to cysteine, but is more reactive as a nucleophile and more sensitive to oxidation and binding to heavy metal ions.

### Post-Translational Modifications: Terminal Groups

Multifarious covalent structural modifications of amino acid residues take place in vivo subsequent to their incorporation into nascent polypeptide chains during translation on the ribosome. Here we are concerned not with the biosynthetic pathways, which are to be discussed in Volume 3, but with the resulting structures. Proteolytic cleavage of the peptide chain during maturation, a very common event, but one that does not modify the amino acid residues, is also treated in Volume 3.
The field of post-translational modification has been extensively reviewed (Wold, 1981; Wold and Moldave, 1984).

#### Terminal Amino Group

The initial step of protein biosynthesis in bacteria generally involves the formation of a peptide bond between N-formyl-L-methionyl-tRNAᴹᵉᵗ and the second aminoacyl-tRNA. Thus nascent proteins bear amino-terminal N-formylmethionyl residues. In eukaryotes, the analogous methionyl-tRNAᴹᵉᵗ is incorporated. Although N-terminal methionyl residues derived from the initiating methionine are not uncommon in mature proteins, the N-formyl group is generally very rapidly removed; subsequent cleavage of the methionyl residue occurs in the majority of cases. The effectiveness of the methionyl aminopeptidase at this cleavage is dependent upon the identity of the second residue. In yeast (Saccharomyces cerevisiae) the methionine is completely removed if the penultimate residue has a radius of gyration of 0.129 nm or less (Gly, Ala, Ser, Cys, Thr, Pro, Val), although a proline residue at the third position causes partial inhibition if Thr or Val is in the second position (Moerschell et al., 1990). In fungal and mammalian mitochondria, cleavage of the initiator methionine does not occur, although in at least one plant mitochondrial protein it does (Braun and Schmitz, 1993). Overexpression of proteins in Escherichia coli using plasmid technology can lead to unnatural retention of the methionyl residue as the aminopeptidase becomes saturated or the nascent protein is protected through segregation into insoluble inclusion bodies. Following removal of the formyl group and/or methionine, many proteins are further modified by acylation or alkylation of the amino-terminal residue (Figure 4).

#### Terminal α-Carboxyl Group

Modification of the terminal α-carboxyl group (apart from proteolytic cleavage) in proteins is less frequently reported than that of the α-amino group.
There are, however, several well-authenticated post-translational structural changes (Figure 5).

##### Amide

The presence of a C-terminal amide group in proteins appears to be essentially limited to small peptide hormones and insect toxins (Mains et al., 1983); some examples are oxytocin, vasopressin, calcitonin, α-MSH, gastrin, and melittin. The amide group is generally essential for full biological activity of these peptides. The peptide hormones are derived from precursor proteins via oxidation of intermediates with C-terminal glycine residues, resulting in retention of the glycine nitrogen atom within the amide group (Eipper and Mains, 1988).

##### Methyl Esters

Methyl esters of C-terminal S-polyisoprenylated cysteine residues are found in a variety of eukaryotic proteins (see Thioether Linkages, below). A second type of C-terminal methylation, of a leucine residue, has been found in a 36-kDa polypeptide of bovine brain (Xie and Clarke, 1993).

##### Amino Acid Addition

α-Tubulin is reversibly modified in vivo by the addition of tyrosine in a peptide bond to the C-terminus (Raybin and Flavin, 1977; Nath and Flavin, 1979). The function of this modification is still unclear.

##### Poly(ADP-ribosyl)ation

In addition to two glutamic acid residue side-chain carboxyl groups, the C-terminal carboxyl group of Lys-213 of rat liver histone H1 accepts ADP-ribose in an ester (O-glycosidic) linkage (Ogata et al., 1980b). Further attachment of ADP-ribosyl residues through ribosyl-ribosyl linkages yields poly(ADP-ribosyl) derivatives. This modification is reversible in vivo (de Murcia et al., 1988).
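
Returning to the amino terminus, the yeast rule for removal of the initiator methionine described under Terminal Amino Group (Moerschell et al., 1990) can be written as a small decision function. This is a minimal sketch, assuming a one-letter-code sequence that begins with the initiator Met; the function name `initiator_met_fate` and the returned labels are hypothetical. The text states only when the methionine is completely removed, so the "retained" outcome for larger second residues is an assumption made here for illustration.

```python
# Second-position residues listed in the text as having a radius of gyration
# of 0.129 nm or less: Gly, Ala, Ser, Cys, Thr, Pro, Val.
SMALL_SECOND_RESIDUES = set("GASCTPV")


def initiator_met_fate(sequence: str) -> str:
    """Predict the fate of the initiator Met under the yeast rule in the text.

    Returns one of "removed", "partially removed", "retained", or
    "no initiator Met". "retained" is an assumed converse of the stated rule.
    """
    seq = sequence.upper()
    if len(seq) < 2 or seq[0] != "M":
        return "no initiator Met"
    second = seq[1]
    if second not in SMALL_SECOND_RESIDUES:
        return "retained"  # assumption: the text does not state this case explicitly
    # Pro at position 3 partially inhibits cleavage when Thr or Val is at position 2.
    if second in "TV" and len(seq) > 2 and seq[2] == "P":
        return "partially removed"
    return "removed"


if __name__ == "__main__":
    for example in ("MAK", "MTPK", "MKA"):  # hypothetical N-terminal sequences
        print(example, "->", initiator_met_fate(example))
```

The rule applies to cytosolic yeast proteins; as noted above, mitochondrial proteins and proteins overexpressed in E. coli can behave differently.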
### Side-Chain Modifications of Amino-Acid Residues

#### Arginine

##### Methylation

Three N-methylated derivatives of arginine (Nω-methyl-, Nω,Nω-dimethyl-, and Nω,Nω,Nω-trimethyl-) (Figure 6) have been identified in several proteins, predominantly nucleic acid-binding proteins such as histones and nucleolar proteins, and myelin basic protein, but also a variety of others, including myosin and tooth matrix proteins (Park and Paik, 1990; Kim et al., 1990; Lischwe, 1990). Methylation of myelin basic protein is essential for the correct formation of compact myelin sheath. The proportion of each of the three methylarginine derivatives in mouse brain myelin basic protein varies with the age of the animal and is altered in dysmyelinating mutant mice (Rawal et al., 1992). The function of methylation is not fully understood; it is possible that resistance to proteolysis is one factor.

##### Phosphorylation

Nω-Phosphorylarginine is also present in myelin basic protein (Smith et al., 1976). A rat liver DNA-bound arginine-specific protein kinase, which phosphorylates itself and an 11-kDa chromosomal protein, has been described (Levy-Favatier et al., 1987).

##### ADP-ribosylation

Nω-ADP-ribosyl arginine (α and β anomers) (Figure 6) is present in proteins modified by the bacterial enterotoxins cholera toxin and E. coli heat-labile toxin (Moss and Vaughan, 1988). These toxins transfer the ADP-ribosyl group from NAD to specific arginine residues in the α subunits of guanine nucleotide regulatory proteins of adenylate cyclase complexes, resulting in activation of adenylate cyclase. Cholera toxin also ADP-ribosylates transducin, a related regulatory GTPase, on Arg-174 (Yatsunami and Khorana, 1985; Abood et al., 1982). Clostridium perfringens iota toxin ADP-ribosylates Arg-177 in skeletal muscle actin (Vandekerckhove et al., 1987).
Vertebrate arginine-specific ADP-ribosyltransferases are also known (Ueda and Hayaishi, 1985; Donnelly et al., 1992; Zolkiewska et al., 1992), as are ADP-ribosylarginine hydrolases. It is likely that one function of this modification is to regulate the activity of the G-protein-mediated metabolic pathways. A rabbit muscle ADP-ribosyltransferase modifies desmin, inhibiting its assembly into filaments (Huang et al., 1993). A third class of arginine-specific ADP-ribosyltransferases is encoded by bacteriophage; the T4 phage mod protein ADP-ribosylates an arginine in the α subunit of E. coli RNA polymerase (Goff, 1984). Dinitrogenase reductase in the photosynthetic bacterium Rhodospirillum rubrum is regulated by specific, reversible ADP-ribosylation, and genes coding for the enzymes have been described (Fitzmaurice et al., 1989).

##### Citrulline

Citrulline, derived by deimination of arginine residues, is present in hair and skin proteins (Rogers et al., 1977; Steinert and Idler, 1979; Rothnagel and Rogers, 1984).

##### Ornithine

Ornithine, presumably derived from arginine, was found in an unusual hydroxyproline-rich glycoprotein lectin from potato tubers (Allen and Neuberger, 1973).

##### Pentosidine

The side chain of arginine residues may condense with oxidized derivatives of glycated lysine residues to form pentosidine (Figure 11j) (Grandhee and Monnier, 1991).

#### Asparagine

##### Glycosylation

One of the most frequent post-translational modifications of eukaryotic plasma membrane extracellular domains and secreted and lysosomal proteins is glycosylation of asparagine side chains (N-linked glycosylation), with β-N-acetylglucosamine as the linkage unit to the nitrogen atom. A large variety of glycan structures have been determined; the majority of these fall into three main structural types (high mannose, complex, and hybrid), depicted in Figure 7a. The structures of the most common monosaccharide moieties found in glycoproteins are shown in Figure 7b.
The field of N-linked glycosylation is very large; it has been reviewed in detail (Kornfeld and Kornfeld, 1985; Kobata, 1992; Allen and Kisailus, 1992; Lis and Sharon, 1993). The biosynthetic pathway in eukaryotes involves the transfer of a dolichyl-linked precursor glycan, Glc₃Man₉GlcNAc₂-, to the amide group of asparagine side chains in the sequence -Asn-Xaa-(Ser or Thr)-, where Xaa is any amino acid residue except Pro (Kobata, 1992; Kornfeld and Kornfeld, 1985). Cys may substitute for Ser or Thr in rare cases, for example in human von Willebrand factor (Titani et al., 1986) and protein C (Grinnell et al., 1991). Following addition of the glycan chain, maturation involves removal of the glucose units and in most cases trimming of the mannose residues, followed by the addition of fucose, galactose, N-acetylglucosamine, and sialic acid, and occasionally additional carbohydrate structures, phosphate, and sulfate.

Yeast (S. cerevisiae) extracellular glycoproteins bear large polymannosylated structures (Kukuruzinska et al., 1987), and mutant strains with defective glycosylation have been described (Ballou, 1990). The methylotrophic yeast Pichia pastoris produces simpler mannose-rich glycan structures in secreted proteins (Trimble et al., 1991). In insect cells (studied in baculovirus-infected cells), relatively simple glycan chains are often present (Knepper et al., 1992), although complex carbohydrate (bisialo-biantennary) has also been observed (Davidson et al., 1990). Other insect glycoprotein structures are more complex: Asn-linked oligosaccharides from apolipoprotein III of Locusta migratoria have 2-aminoethylphosphonate linked to the 6-position of mannose or N-acetylglucosamine residues (Hård et al., 1993). Some examples of N-linked glycan structures are given in Table 7. It is not clear how the specificity for the formation of the precise glycan chains arises.
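The sequence rule for N-linked glycosylation, Asn-Xaa-(Ser or Thr) with Xaa not Pro, lends itself to a simple pattern search. The following Python sketch is an illustration only: the function name `find_sequons`, the `allow_cys` option, and the toy sequence are hypothetical and not part of the chapter. A match identifies a candidate site and nothing more, since a sequence match does not establish that the asparagine is actually glycosylated.

```python
import re

# Candidate sequon: Asn-Xaa-(Ser or Thr), where Xaa is any residue except Pro.
# A lookahead is used so that overlapping sequons (e.g. "NNTS") are all found.
SEQUON = re.compile(r"N(?=[^P][ST])")

# Rare variant mentioned in the text: Cys in place of Ser or Thr.
SEQUON_WITH_CYS = re.compile(r"N(?=[^P][STC])")


def find_sequons(sequence, allow_cys=False):
    """Return 1-based positions of Asn residues in candidate sequons.

    `sequence` is a protein sequence in one-letter code.
    """
    pattern = SEQUON_WITH_CYS if allow_cys else SEQUON
    return [m.start() + 1 for m in pattern.finditer(sequence.upper())]


# Hypothetical toy sequence (not a real protein), chosen to show each case:
# Asn-3 (N-G-S) matches, Asn-8 (N-P-T) is excluded by Pro,
# Asn-13 (N-A-T) matches, Asn-17 (N-V-C) matches only when Cys is allowed.
toy = "MKNGSAANPTLLNATQNVC"
print(find_sequons(toy))                  # [3, 13]
print(find_sequons(toy, allow_cys=True))  # [3, 13, 17]
```

Positions are reported with 1-based numbering to match the residue numbering convention used in the text (for example, Asn-117).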
Not all extracellular asparagine residues in the Asn-Xaa-(Ser or Thr) sequence are glycosylated; in many glycoproteins there is heterogeneity, with some molecules carrying more glycan chains than others. There is generally microheterogeneity within the glycan structures as well, but a particular site carries predominantly a single class of glycan chain. For example, human melanoma cell tissue plasminogen activator has high mannose chains on Asn-117 and complex tri- or tetra-antennary chains on Asn-448, and about 50% of the protein has mainly biantennary complex chains on Asn-184 (Pohl et al., 1987).

Lysosomal enzymes in many cell types bear phosphorylated mannosyl residues (Man-6-P) on some of their high-mannose and hybrid oligosaccharide structures, which serve as recognition signals for the targeting of these enzymes to lysosomes. The phosphate group is added in a two-step process, involving transfer of N-acetylglucosamine-1-phosphate from UDP-GlcNAc to mannosyl residues, followed by hydrolysis of the intermediate phosphodiester (von Figura and Hasilik, 1986). The initial recognition of nascent proteins by the GlcNAc phosphotransferase involves extended protein domains, at least in the case of procathepsin D (Baranski et al., 1992).

An unusual form of N-linked glycosylation is present in nephritogenoside, a basement membrane glycopeptide; a glucose trisaccharide is linked to the amide side chain in the Asn-Pro-Leu N-terminal sequence (Shibata et al., 1988). Asparagine residues in intracytoplasmic proteins and the intracellular domains of transmembrane proteins are rarely glycosylated, although Pedemonte and Kaplan (1992) described N-linked glycosylation of the intracellular domain of the sodium pump α-subunit, presumably on asparagine residues.
Asparagine residues in prokaryotic proteins are not generally glycosylated; most characterized exceptions are from the archaebacteria (Lechner and Wieland, 1989), although eubacterial N-linked glycosylation has been reported (Erickson and Herzberg, 1993). The cell surface of Halobacteria has a glycoprotein with glucosylated asparagine as the linkage structure to a sulfated polyglucuronic chain (Wieland et al., 1983). β-Glc-Asn has also been observed in the eukaryotic extracellular protein laminin (Schreiner et al., 1994). The asparaginyl-N-acetylgalactosamine link (in the amino acid sequence Asn-Ala-Ser) to a sulfated repeating unit saccharide containing galactose, galacturonic acid, N-acetylglucosamine, N-acetylgalactosamine, and 3-O-methylgalacturonic acid is also present (Paul et al., 1986).

The bulky glycan chains of typical N-linked glycoproteins are very hydrophilic and generally protrude from the surface of the protein in a highly hydrated form. In some instances, however, as in the conserved carbohydrate chains in the Fc domain of immunoglobulin IgG, the carbohydrate