Chapter 3

Amino Acids, Peptides, and Proteins

Textbook pages 358–474 (Lehninger, 8e) · 25 MCQs below · Source: printed chapter text extracted from the PDF

…different combinations and sequences. From these building blocks, different organisms can make such widely diverse products as enzymes, hormones, antibodies, transporters, light-harvesting complexes in plants, the flagella of bacteria, muscle fibers, feathers, spider webs, rhinoceros horn, antibiotics, and myriad other substances that have distinct biological functions (Fig. 3-1). Among these protein products, the enzymes are the most varied and specialized. As the catalysts of almost all cellular reactions, enzymes are one of the keys to understanding the chemistry of life, and thus they provide a focal point for any course in biochemistry.

FIGURE 3-1 Some functions of proteins. (a) The light produced by fireflies is the result of a reaction involving the compound luciferin and ATP, catalyzed by the enzyme luciferase (see Box 13-1). (b) Erythrocytes contain large amounts of the oxygen-transporting protein hemoglobin. (c) The protein keratin, formed by all vertebrates, is the chief structural component of hair, scales, horn, wool, nails, and feathers. The black rhinoceros is nearing extinction in the wild because of the belief, prevalent in some parts of the world, that a powder derived from its horn has aphrodisiac properties. In reality, the chemical properties of powdered rhinoceros horn are no different from those of powdered bovine hooves or human fingernails.

Protein structure and function are the topics of this and the next three chapters. Here, we begin with a description of the fundamental chemical properties of amino acids, peptides, and proteins. We also consider how a biochemist works with proteins. The material is organized around four principles:

In every living organism, proteins are constructed from a common set of 20 amino acids. Each amino acid has a side chain with distinctive chemical properties. Amino acids may be regarded as the alphabet in which the language of protein structure is written.
In proteins, amino acids are joined in characteristic linear sequences through a common amide linkage, the peptide bond. The amino acid sequence of a protein constitutes its primary structure, the first of several levels of protein structure that we will introduce.

For study, individual proteins can be separated from the thousands of other proteins present in a cell, based on differences in their chemical and functional properties arising from their distinct amino acid sequences. As proteins are central to biochemistry, the purification of individual proteins for study is a quintessential biochemical endeavor.

Shaped by evolution, amino acid sequences are a key resource for understanding the function of individual proteins and for tracing broader functional and evolutionary relationships.

3.1 Amino Acids

Proteins are polymers of amino acids, with each amino acid residue joined to its neighbor by a specific type of covalent bond. (The term "residue" reflects the loss of the elements of water when one amino acid is joined to another.) Proteins can be broken down (hydrolyzed) to their constituent amino acids by a variety of methods, and the earliest studies of proteins naturally focused on the free amino acids derived from them. Twenty different amino acids are commonly found in proteins. The first to be discovered was asparagine, in 1806. The last of the 20 to be found, threonine, was not identified until 1938. All the amino acids have trivial or common names, in some cases derived from the source from which they were first isolated. Asparagine was first found in asparagus, and glutamate in wheat gluten; tyrosine was first isolated from cheese (its name is derived from the Greek tyros, "cheese"); glycine (Greek glykos, "sweet") was so named because of its sweet taste. Learning the names, structures, and chemical properties of the 20 common amino acids found in proteins is one of the key memorization trials of every beginning biochemistry student.
The need rapidly becomes apparent in succeeding chapters: it is impossible to discuss protein structure, protein function, ligand-binding sites, enzyme active sites, and most other biochemical topics without this foundation. The amino acids are part of the biochemistry vocabulary.

Amino Acids Share Common Structural Features

All 20 of the common amino acids are α-amino acids. They have a carboxyl group and an amino group bonded to the same carbon atom (the α carbon) (Fig. 3-2). They differ from each other in their side chains, or R groups, which vary in structure, size, and electric charge, and which influence the solubility of the amino acids in water. In addition to these 20 amino acids, there are many less common ones. Some are residues modified after a protein has been synthesized, others are amino acids present in living organisms but not as constituents of proteins, and two are special cases found in just a few proteins. The common amino acids of proteins have been assigned three-letter abbreviations and one-letter symbols (see Table 3-1), which are used as shorthand to indicate the composition and sequence of amino acids polymerized in proteins.

FIGURE 3-2 General structure of an amino acid. This structure is common to all but one of the α-amino acids. (Proline, a cyclic amino acid, is the exception.) The R group, or side chain (purple), attached to the α carbon (gray) is different in each amino acid.

KEY CONVENTION The three-letter code is easily understood, the abbreviations generally consisting of the first three letters of the amino acid name. The one-letter code was devised by Margaret Oakley Dayhoff, considered by many to be the founder of the field of bioinformatics. The one-letter code reflects an attempt to reduce the size of the data files (in an era of limited computer memory) used to describe amino acid sequences. It was designed to be easily memorized, and understanding its origin can help students do just that.
For six amino acids (CHIMSV), the first letter of the amino acid name is unique and thus is used as the symbol. For five others (AGLPT), the first letter of the name is not unique but is assigned to the amino acid that is most common in proteins (for example, leucine is more common than lysine). For another four, the letter used is phonetically suggestive (RFYW: aRginine, Fenylalanine, tYrosine, tWiptophan). The rest were harder to assign. Four (DNEQ) were assigned letters found within or suggested by their names (asparDic, asparagiNe, glutamEke, Q-tamine). That left lysine. Only a few letters were left, and K was chosen because it was the closest to L.

Margaret Oakley Dayhoff, 1925–1983

For all the common amino acids except glycine, the α carbon is bonded to four different groups: a carboxyl group, an amino group, an R group, and a hydrogen atom (Fig. 3-2; in glycine, the R group is another hydrogen atom). The α-carbon atom is thus a chiral center (p. 61). Because of the tetrahedral arrangement of the bonding orbitals around the α-carbon atom, the four different groups can occupy two unique spatial arrangements, and thus amino acids have two possible stereoisomers. Since they are nonsuperposable mirror images of each other (Fig. 3-3), the two forms represent a class of stereoisomers called enantiomers (see Fig. 1-21). Such chiral molecules are also optically active; that is, they rotate the plane of plane-polarized light (see Box 1-2).

FIGURE 3-3 Stereoisomerism in α-amino acids. (a) The two stereoisomers of alanine, L- and D-alanine, are nonsuperposable mirror images of each other (enantiomers). (b, c) Two different conventions for showing the configurations in space of stereoisomers. In perspective formulas (b), the solid wedge-shaped bonds project out of the plane of the paper, the dashed bonds behind it. In projection formulas (c), the horizontal bonds are assumed to project out of the plane of the paper, the vertical bonds behind.
However, projection formulas are often used casually and are not always intended to portray a specific stereochemical configuration. See Figure 3-4 for an explanation of the D, L system for specifying absolute configuration.

KEY CONVENTION Two conventions are used to identify the carbons in an amino acid, a practice that can be confusing. The additional carbons in an R group are commonly designated β, γ, δ, ε, and so forth, proceeding out from the α carbon. For most other organic molecules, carbon atoms are simply numbered from one end, giving highest priority (C-1) to the carbon with the substituent containing the atom of highest atomic number. Within this latter convention, the carboxyl carbon of an amino acid would be C-1 and the α carbon would be C-2. In cases such as amino acids with heterocyclic R groups (e.g., histidine), where the Greek lettering system is ambiguous, the numbering system is used. For branched amino acid side chains, equivalent carbons are given numbers after the Greek letters. Leucine thus has δ1 and δ2 carbons (see the structure in Fig. 3-5).

Special nomenclature has been developed to specify the absolute configuration of the four substituents of asymmetric carbon atoms. The absolute configurations of simple sugars and amino acids are specified by the D, L system (Fig. 3-4), based on the absolute configuration of the three-carbon sugar glyceraldehyde, a convention proposed by Emil Fischer in 1891. (Fischer knew what groups surrounded the asymmetric carbon of glyceraldehyde but had to guess at their absolute configuration; he guessed right, as was later confirmed by x-ray diffraction analysis.) For all chiral compounds, stereoisomers having a configuration related to that of L-glyceraldehyde are designated L, and stereoisomers related to D-glyceraldehyde are designated D. The functional groups of L-alanine are matched with those of L-glyceraldehyde by aligning those that can be interconverted by simple, one-step chemical reactions.
Thus the carboxyl group of L-alanine occupies the same position about the chiral carbon as does the aldehyde group of L-glyceraldehyde, because an aldehyde is readily converted to a carboxyl group via a one-step oxidation. Historically, the similar l and d designations were used for levorotatory (rotating plane-polarized light to the left) and dextrorotatory (rotating light to the right). However, not all L-amino acids are levorotatory, and the convention shown in Figure 3-4 was needed to avoid potential ambiguities about absolute configuration. By Fischer's convention, L and D refer only to the absolute configuration of the four substituents around the chiral carbon, not to optical properties of the molecule.

FIGURE 3-4 Steric relationship of the stereoisomers of alanine to the absolute configuration of L- and D-glyceraldehyde. In these perspective formulas, the carbons are lined up vertically, with the chiral atom in the center. The carbons in these molecules are numbered beginning with the terminal aldehyde or carboxyl carbon (red), 1 to 3 from top to bottom as shown. When presented in this way, the R group of the amino acid (in this case the methyl group of alanine) is always below the α carbon. L-Amino acids are those with the α-amino group on the left, and D-amino acids have the α-amino group on the right.

Another system of specifying configuration around a chiral center is the RS system, which is used in the systematic nomenclature of organic chemistry and describes more precisely the configuration of molecules with more than one chiral center (p. 17).

The Amino Acid Residues in Proteins Are L Stereoisomers

Nearly all biological compounds with a chiral center occur naturally in only one stereoisomeric form, either D or L. The amino acid residues in protein molecules are almost all L stereoisomers, with less than 1% being found in the D configuration.
The rare D-amino acid residues generally have a precise structural purpose, and they are introduced into a protein by enzyme-catalyzed reactions that occur after the protein is synthesized on a ribosome. It is remarkable that virtually all amino acid residues in proteins are L stereoisomers. When chiral compounds are formed by ordinary chemical reactions, the result is a racemic mixture of D and L isomers, which are difficult for a chemist to distinguish and separate. But to a living system, D and L isomers are as different as the right hand and the left. The formation of stable, repeating substructures in proteins (Chapter 4) requires that their constituent amino acids be of one stereochemical series. Cells are able to specifically synthesize the L isomers of amino acids because the active sites of enzymes are asymmetric, causing the reactions they catalyze to be stereospecific.

Amino Acids Can Be Classified by R Group

Knowledge of the chemical properties of the common amino acids is central to an understanding of biochemistry. The topic can be simplified by grouping the amino acids into five main classes based on the properties of their R groups (Table 3-1), particularly their polarity, or tendency to interact with water at biological pH (near pH 7.0). The polarity of the R groups varies widely, from nonpolar and hydrophobic (water-insoluble) to highly polar and hydrophilic (water-soluble). A few amino acids (especially glycine, histidine, and cysteine) are somewhat difficult to characterize or do not fit perfectly in any one group. They are assigned to particular groupings based on considered judgments rather than absolutes.
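The five-class grouping and the one-letter symbols lend themselves to simple sequence bookkeeping. The sketch below is illustrative only: the dictionary `AMINO_ACIDS` and the helper functions are hypothetical names, the class assignments and Mr values are transcribed from Table 3-1, and the peptide calculation assumes, as the note to Table 3-1 states, that the elements of water (Mr 18) are lost for each peptide bond formed.

```python
from collections import Counter

# One-letter symbol -> (three-letter abbreviation, R-group class, Mr),
# transcribed from Table 3-1. Mr is for the free amino acid.
AMINO_ACIDS = {
    "G": ("Gly", "nonpolar, aliphatic", 75),
    "A": ("Ala", "nonpolar, aliphatic", 89),
    "P": ("Pro", "nonpolar, aliphatic", 115),
    "V": ("Val", "nonpolar, aliphatic", 117),
    "L": ("Leu", "nonpolar, aliphatic", 131),
    "I": ("Ile", "nonpolar, aliphatic", 131),
    "M": ("Met", "nonpolar, aliphatic", 149),
    "F": ("Phe", "aromatic", 165),
    "Y": ("Tyr", "aromatic", 181),
    "W": ("Trp", "aromatic", 204),
    "S": ("Ser", "polar, uncharged", 105),
    "T": ("Thr", "polar, uncharged", 119),
    "C": ("Cys", "polar, uncharged", 121),
    "N": ("Asn", "polar, uncharged", 132),
    "Q": ("Gln", "polar, uncharged", 146),
    "K": ("Lys", "positively charged", 146),
    "H": ("His", "positively charged", 155),
    "R": ("Arg", "positively charged", 174),
    "D": ("Asp", "negatively charged", 133),
    "E": ("Glu", "negatively charged", 147),
}
WATER_MR = 18  # elements of water lost per peptide bond (note to Table 3-1)


def to_three_letter(seq: str) -> str:
    """Expand a one-letter sequence into hyphenated three-letter abbreviations."""
    return "-".join(AMINO_ACIDS[aa][0] for aa in seq.upper())


def class_composition(seq: str) -> Counter:
    """Count how many residues of the sequence fall into each R-group class."""
    return Counter(AMINO_ACIDS[aa][1] for aa in seq.upper())


def peptide_mr(seq: str) -> int:
    """Approximate Mr of a linear peptide: the sum of the free amino acid Mr
    values minus one water for each of the len(seq) - 1 peptide bonds."""
    seq = seq.upper()
    if not seq:
        return 0
    return sum(AMINO_ACIDS[aa][2] for aa in seq) - WATER_MR * (len(seq) - 1)


print(to_three_letter("GAKDE"))     # Gly-Ala-Lys-Asp-Glu
print(class_composition("GAKDE"))
print(peptide_mr("GG"))             # 132
```

For the dipeptide Gly-Gly, this gives 75 + 75 − 18 = 132. Because Table 3-1 lists rounded Mr values, the result is only an estimate.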
TABLE 3-1 Properties and Conventions Associated with the Common Amino Acids Found in Proteins

Amino acid     Abbr./symbol  Mr   pK1 (—COOH)  pK2 (—NH3+)  pKR (R group)  pI
Nonpolar, aliphatic R groups
Glycine        Gly G         75   2.34         9.60                        5.97
Alanine        Ala A         89   2.34         9.69                        6.01
Proline        Pro P         115  1.99         10.96                       6.48
Valine         Val V         117  2.32         9.62                        5.97
Leucine        Leu L         131  2.36         9.60                        5.98
Isoleucine     Ile I         131  2.36         9.68                        6.02
Methionine     Met M         149  2.28         9.21                        5.74
Aromatic R groups
Phenylalanine  Phe F         165  1.83         9.13                        5.48
Tyrosine       Tyr Y         181  2.20         9.11         10.07          5.66
Tryptophan     Trp W         204  2.38         9.39                        5.89
Polar, uncharged R groups
Serine         Ser S         105  2.21         9.15                        5.68
Threonine      Thr T         119  2.11         9.62                        5.87
Cysteine       Cys C         121  1.96         10.28        8.18           5.07
Asparagine     Asn N         132  2.02         8.80                        5.41
Glutamine      Gln Q         146  2.17         9.13                        5.65
Positively charged R groups
Lysine         Lys K         146  2.18         8.95         10.53          9.74
Histidine      His H         155  1.82         9.17         6.00           7.59
Arginine       Arg R         174  2.17         9.04         12.48          10.76
Negatively charged R groups
Aspartate      Asp D         133  1.88         9.60         3.65           2.77
Glutamate      Glu E         147  2.19         9.67         4.25           3.22

Mr values reflect the structures as shown in Figure 3-5. The elements of water (Mr 18) are deleted when the amino acid is incorporated into a polypeptide.

Hydropathy index: a scale combining hydrophobicity and hydrophilicity of R groups. The values reflect the free energy (ΔG) of transfer of the amino acid side chain from a hydrophobic environment to water. This transfer is favorable (ΔG < 0; negative value in the index) for charged or polar amino acid side chains, and it is unfavorable (ΔG > 0; positive value in the index) for amino acids with nonpolar side chains. See Chapter 11. Source: Data from J. Kyte and R. F. Doolittle, J. Mol. Biol. 157:105, 1982.

Occurrence in proteins: The first value in each row is the average occurrence in more than 1,150 proteins. Source: Data from R. F. Doolittle, in Prediction of Protein Structure and the Principles of Protein Conformation, p. 599, Plenum Press, 1989. The second and third values are, respectively, from the complete proteomes of nine mesophilic bacterial species and seven thermophilic bacterial species. Mesophiles grow at commonly encountered temperatures, whereas thermophiles grow at elevated temperatures up to and beyond the boiling point of water. The decline in glutamine occurrence in thermophiles reflects the tendency of this amino acid to deaminate at high temperatures. Source: Data from A. C. Singer and D. A. Hickey, Gene 317:39, 2003.

As originally composed, the hydropathy index takes into account the frequency with which an amino acid residue appears on the surface of a protein. Because proline often appears on the surface of proteins, it has a lower score than its chain of methylene groups would suggest.

Cysteine is generally classified as polar, despite having a positive hydropathy index. This reflects the ability of the sulfhydryl group to act as a weak acid and to form a weak hydrogen bond with oxygen or nitrogen.

The structures of the 20 common amino acids are shown in Figure 3-5, and some of their properties are listed in Table 3-1. Within each class there are gradations of polarity, size, and shape of the R groups.

FIGURE 3-5 The 20 common amino acids of proteins. The structural formulas show the state of ionization that would predominate at pH 7.0. The unshaded portions are those that are common to all the amino acids; the shaded portions are the R groups. Although the R group of histidine is shown uncharged, its pKa (see Table 3-1) is such that a small but significant fraction of these groups are positively charged at pH 7.0. The protonated form of histidine is shown above the graph in Figure 3-12b.

Nonpolar, Aliphatic R Groups

The R groups in this class of amino acids are nonpolar and hydrophobic. The side chains of alanine, valine, leucine, and isoleucine tend to cluster together within proteins, stabilizing protein structure through the hydrophobic effect. Glycine has the simplest structure.
Although it is most easily grouped with the nonpolar amino acids, its very small side chain makes no real contribution to interactions driven by the hydrophobic effect. Methionine, one of the two sulfur-containing amino acids, has a slightly nonpolar thioether group in its side chain. Proline has an aliphatic side chain with a distinctive cyclic structure. The secondary amino (imino) group of proline residues is held in a rigid conformation that reduces the structural flexibility of polypeptide regions containing proline.

Aromatic R Groups

Phenylalanine, tyrosine, and tryptophan, with their aromatic side chains, are relatively nonpolar (hydrophobic). All can contribute to the hydrophobic effect. The hydroxyl group of tyrosine can form hydrogen bonds, and it is an important functional group in some enzymes. Tyrosine and tryptophan are significantly more polar than phenylalanine because of the tyrosine hydroxyl group and the nitrogen of the tryptophan indole ring. Tryptophan and tyrosine, and to a much lesser extent phenylalanine, absorb ultraviolet light (Fig. 3-6; see also Box 3-1). This accounts for the characteristic strong absorbance of light by most proteins at a wavelength of 280 nm, a property exploited by researchers in the characterization of proteins.

FIGURE 3-6 Absorption of ultraviolet light by aromatic amino acids. Comparison of the light absorption spectra of the aromatic amino acids tryptophan, tyrosine, and phenylalanine at pH 6.0. The amino acids are present in equimolar amounts (10⁻³ M) under identical conditions. The measured absorbance of tryptophan is more than four times that of tyrosine at a wavelength of 280 nm. Note that the maximum light absorption for both tryptophan and tyrosine occurs near 280 nm. Light absorption by phenylalanine generally contributes little to the spectroscopic properties of proteins.
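Because the 280 nm absorbance of a protein comes mostly from its Trp and Tyr residues, a rough molar extinction coefficient can be estimated from the sequence and combined with the Lambert-Beer law (Box 3-1) to estimate concentration. This is a minimal sketch, not a method from this chapter: the per-residue coefficients (5,500 for Trp, 1,490 for Tyr, and 125 per disulfide bond, in M⁻¹ cm⁻¹) are assumed values taken from widely used literature estimates for proteins, the peptide sequence and detector readings are hypothetical, and the function names are illustrative.

```python
import math

# Assumed per-residue molar extinction coefficients at 280 nm (M^-1 cm^-1).
# These are widely used literature estimates, not values given in this chapter.
EPS280_TRP = 5500
EPS280_TYR = 1490
EPS280_DISULFIDE = 125  # per cystine (disulfide-bonded Cys pair)


def estimate_epsilon_280(sequence: str, disulfide_bonds: int = 0) -> int:
    """Sum Trp (W), Tyr (Y), and cystine contributions; Phe is ignored
    because it contributes little at 280 nm (Fig. 3-6)."""
    seq = sequence.upper()
    return (seq.count("W") * EPS280_TRP
            + seq.count("Y") * EPS280_TYR
            + disulfide_bonds * EPS280_DISULFIDE)


def absorbance(i0: float, i: float) -> float:
    """Absorbance A = log10(I0 / I), as defined in Box 3-1."""
    return math.log10(i0 / i)


def concentration_molar(a: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Rearranged Lambert-Beer law: c = A / (epsilon * l), in mol/L."""
    return a / (epsilon * path_cm)


# Hypothetical peptide and detector readings, for illustration only.
eps = estimate_epsilon_280("GWAYKWDE")   # 2 Trp + 1 Tyr -> 12490
a280 = absorbance(100.0, 31.6)
print(eps, f"{concentration_molar(a280, eps):.2e} M")
```

The estimate is only as good as the assumed coefficients and the Lambert-Beer assumptions listed in Box 3-1; a protein with no Trp or Tyr absorbs only weakly at 280 nm, so this approach is poorly suited to it.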
BOX 3-1 METHODS Absorption of Light by Molecules: The Lambert-Beer Law

A wide range of biomolecules absorb light at characteristic wavelengths, just as tryptophan absorbs light at 280 nm (see Fig. 3-6). Measurement of light absorption by a spectrophotometer is used to detect and identify molecules and to measure their concentration in solution. The fraction of the incident light absorbed by a solution at a given wavelength is related to the thickness of the absorbing layer (path length) and the concentration of the absorbing species (Fig. 1). These two relationships are combined into the Lambert-Beer law,

$$\log \frac{I_0}{I} = \varepsilon c l$$

where I0 is the intensity of the incident light, I is the intensity of the transmitted light, the ratio I/I0 (the inverse of the ratio in the equation) is the transmittance, ε is the molar extinction coefficient (in units of liters per mole-centimeter), c is the concentration of the absorbing species (in moles per liter), and l is the path length of the light-absorbing sample (in centimeters). The Lambert-Beer law assumes that the incident light is parallel and monochromatic (of a single wavelength) and that the solvent and solute molecules are randomly oriented. The expression log (I0/I) is called the absorbance, designated A.

FIGURE 1 The principal components of a spectrophotometer. A light source emits light along a broad spectrum, then the monochromator selects and transmits light of a particular wavelength. The monochromatic light passes through the sample in a cuvette of path length l. The absorbance of the sample, log (I0/I), is proportional to the concentration of the absorbing species. The transmitted light is measured by a detector.

It is important to note that each successive millimeter of path length of absorbing solution in a 1.0 cm cell absorbs not a constant amount but a constant fraction of the light that is incident upon it.
However, with an absorbing layer of fixed path length, the absorbance, A, is directly proportional to the concentration of the absorbing solute. The molar extinction coefficient varies with the nature of the absorbing compound, the solvent, and the wavelength, and also with pH if the light-absorbing species is in equilibrium with an ionization state that has different absorbance properties.

Polar, Uncharged R Groups

The R groups of these amino acids are more soluble in water, or more hydrophilic, than those of the nonpolar amino acids because they contain functional groups that form hydrogen bonds with water. This class of amino acids includes serine, threonine, cysteine, asparagine, and glutamine. The polarity of serine and threonine is contributed by their hydroxyl groups, and the polarity of asparagine and glutamine is contributed by their amide groups. Cysteine is an outlier here because its polarity, contributed by its sulfhydryl group, is quite modest. Cysteine is a weak acid and can make weak hydrogen bonds with oxygen or nitrogen.

Asparagine and glutamine are the amides of two other amino acids also found in proteins (aspartate and glutamate, respectively), to which asparagine and glutamine are easily hydrolyzed by acid or base. Cysteine is readily oxidized to form a covalently linked dimeric amino acid called cystine, in which two cysteine molecules or residues are joined by a disulfide bond (Fig. 3-7). The disulfide-linked residues are strongly hydrophobic (nonpolar). Disulfide bonds play a special role in the structures of many proteins by forming covalent links between parts of a polypeptide molecule or between two different polypeptide chains.

FIGURE 3-7 Reversible formation of a disulfide bond by the oxidation of two molecules of cysteine. Disulfide bonds between Cys residues stabilize the structures of many proteins.

Positively Charged (Basic) R Groups

The most hydrophilic R groups are those that are either positively charged or negatively charged.
The amino acids in which the R groups have significant positive charge at pH 7.0 are lysine, which has a second primary amino group at the ε position on its aliphatic chain; arginine, which has a positively charged guanidinium group; and histidine, which has an aromatic imidazole group. As the only common amino acid having an ionizable side chain with pKa near neutrality, histidine may be positively charged (protonated form) or uncharged at pH 7.0. His residues facilitate many enzyme-catalyzed reactions by serving as proton donors/acceptors.

Negatively Charged (Acidic) R Groups

The two amino acids having R groups with a net negative charge at pH 7.0 are aspartate and glutamate, each of which has a second carboxyl group.

Uncommon Amino Acids Also Have Important Functions

In addition to the 20 common amino acids, proteins may contain residues created by modification of common residues already incorporated into a polypeptide, that is, through postsynthetic modification (Fig. 3-8a). Among these uncommon amino acids are 4-hydroxyproline, a derivative of proline found in the fibrous protein collagen, and γ-carboxyglutamate, found in the blood-clotting protein prothrombin and in certain other proteins that bind Ca2+ as part of their biological function. More complex is desmosine, a derivative of four Lys residues, which is found in the fibrous protein elastin.

FIGURE 3-8 Uncommon amino acids. (a) Some uncommon amino acids found in proteins. Most are derived from common amino acids. (Note the use of either numbers or Greek letters in the names of these structures to identify the altered carbon atoms.) Extra functional groups added by modification reactions are shown in red. Desmosine is formed from four Lys residues (the carbon backbones are shaded in light red). Selenocysteine and pyrrolysine are exceptions: these amino acids are added during normal protein synthesis through a highly specialized expansion of the standard genetic code.
Both are found in very small numbers of proteins. (b) Reversible amino acid modifications involved in regulation of protein activity. Phosphorylation is the most common type of regulatory modification. (c) Ornithine and citrulline, which are not found in proteins, are intermediates in the biosynthesis of arginine and in the urea cycle.

Selenocysteine and pyrrolysine are special cases. These rare amino acid residues are not created through a postsynthetic modification. Instead, they are introduced during protein synthesis through an unusual adaptation of the genetic code, which we describe in Chapter 27. Selenocysteine contains selenium rather than the sulfur of cysteine. Actually derived from serine, selenocysteine is a constituent of just a few known proteins. Pyrrolysine is found in a few proteins in several methanogenic (methane-producing) archaea and in one known bacterium; it plays a role in methane biosynthesis.

Some amino acid residues in a protein may be modified transiently to alter the protein's function. The addition of phosphoryl, methyl, acetyl, adenylyl, ADP-ribosyl, or other groups to particular amino acid residues can increase or decrease a protein's activity (Fig. 3-8b). Phosphorylation is a particularly common regulatory modification. Covalent modification as a protein regulatory strategy is discussed in more detail in Chapter 6.

Some 300 additional amino acids have been found in cells. They have a variety of functions, but not all are constituents of proteins. Ornithine and citrulline (Fig. 3-8c) deserve special note because they are key intermediates (metabolites) in the biosynthesis of arginine (Chapter 22) and in the urea cycle (Chapter 18).

Amino Acids Can Act as Acids and Bases

The amino and carboxyl groups of amino acids, along with the ionizable R groups of some amino acids, function as weak acids and bases.
When an amino acid lacking an ionizable R group is dissolved in water at neutral pH, the α-amino and carboxyl groups create a dipolar ion, or zwitterion (German for "hybrid ion"), which can act as either an acid or a base (Fig. 3-9). Substances having this dual (acid-base) nature are amphoteric and are often called ampholytes (from "amphoteric electrolytes"). A simple monoamino monocarboxylic α-amino acid, such as alanine, is a diprotic acid when fully protonated; it has two groups, the —COOH group and the —NH3+ group, that can yield protons:

+H3N—CH(CH3)—COOH ⇌ +H3N—CH(CH3)—COO− + H+
+H3N—CH(CH3)—COO− ⇌ H2N—CH(CH3)—COO− + H+
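The diprotic behavior of alanine can be made quantitative. Assuming the Table 3-1 pKa values for alanine (2.34 and 9.69) and ignoring activity effects, the mole fractions of the cationic, zwitterionic, and anionic forms follow directly from the two acid dissociation constants. The function name below is illustrative.

```python
def diprotic_species(ph: float, pk1: float = 2.34, pk2: float = 9.69):
    """Return (cation, zwitterion, anion) mole fractions for a diprotic
    amino acid; the defaults are the alanine pKa values from Table 3-1."""
    h = 10.0 ** (-ph)     # [H+] in mol/L
    k1 = 10.0 ** (-pk1)   # Ka of the alpha-COOH group
    k2 = 10.0 ** (-pk2)   # Ka of the alpha-NH3+ group
    denom = h * h + k1 * h + k1 * k2
    cation = h * h / denom       # +H3N-CH(CH3)-COOH
    zwitterion = k1 * h / denom  # +H3N-CH(CH3)-COO-
    anion = k1 * k2 / denom      # H2N-CH(CH3)-COO-
    return cation, zwitterion, anion


for ph in (1.0, 2.34, 7.0, 9.69, 12.0):
    c, z, a = diprotic_species(ph)
    print(f"pH {ph:5.2f}: cation {c:.3f}  zwitterion {z:.3f}  anion {a:.3f}")
```

At pH 7.0 the zwitterion accounts for more than 99% of the alanine, consistent with the statement that the zwitterion predominates at neutral pH; at pH = pK1 the cation and zwitterion are present in equal amounts, and at pH = pK2 the zwitterion and anion are.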

FIGURE 3-9 Nonionic and zwitterionic forms of amino acids. The nonionic form does not occur in significant amounts in aqueous solutions. The zwitterion predominates at neutral pH. A zwitterion can act as either an acid (proton donor) or a base (proton acceptor).

Acid-base titration involves the gradual addition or removal of protons (Chapter 2). Figure 3-10 shows the titration curve of the diprotic form of glycine. The two ionizable groups of glycine, the carboxyl group and the amino group, are titrated with a strong base such as NaOH. The plot has two distinct stages, corresponding to deprotonation of two different groups on glycine. Each of the two stages resembles in shape the titration curve of a monoprotic acid, such as acetic acid (see Fig. 2-16), and can be analyzed in the same way. At very low pH, the predominant ionic species of glycine is the fully protonated form, +H3N—CH2—COOH.

FIGURE 3-10 Titration of amino acids. The titration curve of 0.1 M glycine at 25 °C. The ionic species predominating at key points in the titration are shown above the graph. The shaded boxes, centered at about pK1 = 2.34 and pK2 = 9.60, indicate the regions of greatest buffering power. Note that 1 equivalent of OH− = 0.1 M NaOH added. The pI occurs at the arithmetic mean of the two pKa values, and it corresponds to the inflection point in the titration.

In the first stage of the titration, the —COOH group of glycine (with its lower pKa) loses its proton. At the midpoint of this stage, equimolar concentrations of the proton-donor (+H3N—CH2—COOH) and the proton-acceptor (+H3N—CH2—COO−) species are present. As in the titration of any weak acid, a point of inflection is reached at this midpoint where the pH is equal to the pKa of the protonated group that is being titrated (see Fig. 2-17). For glycine, the pH at the midpoint is 2.34; thus its —COOH group has a pKa (labeled pK1 in Fig. 3-10) of 2.34.
(Recall from Chapter 2 that pH and pKa are simply convenient notations for proton concentration and the equilibrium constant for ionization, respectively. The pKa is a measure of the tendency of a group to give up a proton, with that tendency decreasing 10-fold as the pKa increases by one unit.) As the titration of glycine proceeds, another point of inflection is reached at pH 5.97; at this point, removal of the first proton is essentially complete and removal of the second has just begun. At this pH, glycine is present largely as the dipolar ion (zwitterion) +H3N—CH2—COO−. We shall return to the significance of this inflection point in the titration curve (labeled pI in Fig. 3-10) shortly.

The second stage of the titration corresponds to the removal of a proton from the —NH3+ group of glycine. The pH at the midpoint of this stage is 9.60, equal to the pKa (labeled pK2 in Fig. 3-10) for the —NH3+ group. The titration is essentially complete at a pH of about 12, at which point the predominant form of glycine is H2N—CH2—COO−.

From the titration curve of glycine we can derive several important pieces of information. First, it gives a quantitative measure of the pKa of each of the two ionizing groups: 2.34 for the —COOH group and 9.60 for the —NH3+ group. Note that the carboxyl group of glycine is more than 100 times more acidic (more easily ionized) than the carboxyl group of acetic acid, which, as we saw in Chapter 2, has a pKa of 4.76, about average for a carboxyl group attached to an otherwise unsubstituted aliphatic hydrocarbon. (The difference, 4.76 − 2.34 = 2.42 pKa units, corresponds to a factor of about 260.) The perturbed pKa of glycine's carboxyl group is caused primarily by the nearby positively charged amino group on the α-carbon atom, an electronegative group that tends to pull electrons toward it (a process called electron withdrawal), as described in Figure 3-11. The opposite charges on the resulting zwitterion are also somewhat stabilizing. Similarly, the pKa of the amino group in glycine is perturbed downward relative to the average pKa of an amino group.
This effect is due largely to electron withdrawal by the electronegative oxygen atoms in the carboxyl group, increasing the tendency of the amino group to give up a proton. Hence, the α-amino group has a pKa that is lower than that of an aliphatic amine such as methylamine (Fig. 3-11). In short, the pKa of any functional group is greatly affected by its chemical environment, a phenomenon sometimes exploited in the active sites of enzymes to promote exquisitely adapted reaction mechanisms that depend on the perturbed pKa values of proton donor/acceptor groups of specific residues.
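The titration just described can be reproduced numerically by treating each ionizable group independently with the Henderson-Hasselbalch equation. This sketch is a simplification: it assumes ideal behavior and ignores the free H+ and OH− in solution (which matter at the extremes of Fig. 3-10), the group lists use pKa values from Table 3-1, and all names are illustrative. It reports the equivalents of OH− consumed and the average net charge, and it locates the pH of zero net charge by bisection, a search that also works for amino acids with an ionizable R group, such as glutamate and histidine.

```python
# Each ionizable group is (pKa, charge when protonated), using Table 3-1 values.
# Carboxyl groups: 0 when protonated (-COOH), -1 when deprotonated (-COO-).
# Amino and imidazole groups: +1 when protonated, 0 when deprotonated.
GLYCINE = [(2.34, 0), (9.60, +1)]
GLUTAMATE = [(2.19, 0), (4.25, 0), (9.67, +1)]
HISTIDINE = [(1.82, 0), (6.00, +1), (9.17, +1)]


def fraction_deprotonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: [A-] / ([HA] + [A-]) = 1 / (1 + 10**(pKa - pH))."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def equivalents_oh(groups, ph: float) -> float:
    """OH- equivalents consumed starting from the fully protonated form
    (free H+ and OH- in solution are ignored)."""
    return sum(fraction_deprotonated(pka, ph) for pka, _ in groups)


def net_charge(groups, ph: float) -> float:
    """Average net charge: each group loses one unit of charge per proton removed."""
    return sum(z - fraction_deprotonated(pka, ph) for pka, z in groups)


def isoelectric_point(groups, lo: float = 0.0, hi: float = 14.0, tol: float = 1e-9) -> float:
    """Bisection for the pH of zero net charge; net charge falls steadily
    as pH rises, so there is exactly one crossing between lo and hi."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(groups, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for ph in (1.0, 2.34, 5.97, 9.60, 12.0):
    print(f"pH {ph:5.2f}: {equivalents_oh(GLYCINE, ph):.2f} eq OH-, "
          f"net charge {net_charge(GLYCINE, ph):+.2f}")

for name, groups in (("Gly", GLYCINE), ("Glu", GLUTAMATE), ("His", HISTIDINE)):
    print(name, round(isoelectric_point(groups), 2))  # Gly 5.97, Glu 3.22, His 7.59
```

For glycine, the equivalents of OH− consumed and the net charge always sum to 1, because each proton removed lowers the charge by one unit; the pH at which exactly 1 equivalent has been consumed is therefore the point of zero net charge.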

FIGURE 3-11 Effect of the chemical environment on pKa. The pKa values for the ionizable groups in glycine are lower than those for simple, methyl-substituted amino and carboxyl groups. These downward perturbations of pKa are due to intramolecular interactions. Similar effects can be caused by chemical groups that happen to be positioned nearby, for example, in the active site of an enzyme.

The second piece of information provided by the titration curve of glycine is that this amino acid has two regions of buffering power. One of these is the relatively flat portion of the curve, extending for approximately 1 pH unit on either side of the first pKa of 2.34, indicating that glycine is a good buffer near this pH. The other buffering zone is centered around pH 9.60. (Note that glycine is not a good buffer at the pH of intracellular fluid or blood, about 7.4.) Within the buffering ranges of glycine, the Henderson-Hasselbalch equation (p. 60) can be used to calculate the proportions of proton-donor and proton-acceptor species of glycine required to make a buffer at a given pH.

A final important piece of information derived from the titration curve of an amino acid is the relationship between its net charge and the pH of the solution. At pH 5.97, the point of inflection between the two stages in its titration curve, glycine is present predominantly as its dipolar form, fully ionized but with no net electric charge (Fig. 3-10). The characteristic pH at which the net electric charge is zero is called the isoelectric point or isoelectric pH, designated pI. For glycine, which has no ionizable group in its side chain, the isoelectric point is simply the arithmetic mean of the two pKa values:

$$\mathrm{pI} = \tfrac{1}{2}(\mathrm{p}K_1 + \mathrm{p}K_2) = \tfrac{1}{2}(2.34 + 9.60) = 5.97$$

As is evident in Figure 3-10, glycine has a net negative charge at any pH above its pI and thus will move toward the positive electrode (the anode) when placed in an electric field.
At any pH below its pI, glycine has a net positive charge and will move toward the negative electrode (the cathode). The farther the pH of a glycine solution is from its isoelectric point, the greater the net electric charge of the population of glycine molecules. At pH 1.0, for example, glycine exists almost entirely as the form +H3N–CH2–COOH, with a net positive charge of nearly 1.0. At pH 2.34, where there is an equal mixture of +H3N–CH2–COOH and +H3N–CH2–COO−, the average or net positive charge is 0.5. The sign and the magnitude of the net charge of any amino acid at any pH can be predicted in the same way.

Amino Acids Differ in Their Acid-Base Properties

The shared properties of many amino acids permit some simplifying generalizations about their acid-base behaviors. First, all amino acids with a single α-amino group, a single α-carboxyl group, and an R group that does not ionize have titration curves resembling that of glycine (Fig. 3-10). These amino acids have very similar, although not identical, pKa values: pKa of the –COOH group in the range of 1.8 to 2.4, and pKa of the –NH3+ group in the range of 8.8 to 11.0 (Table 3-1). The differences in these pKa values reflect the chemical environments imposed by their R groups.

Second, amino acids with an ionizable R group have more complex titration curves, with three stages corresponding to the three possible ionization steps; thus, they have three pKa values. The additional stage for the titration of the ionizable R group merges to some extent with that for the titration of the α-carboxyl group, the titration of the α-amino group, or both. The titration curves for two amino acids of this type, glutamate and histidine, are shown in Figure 3-12. The isoelectric points reflect the nature of the ionizing R groups that are present. For example, glutamate has a pI of 3.22, considerably lower than that of glycine.
This is due to the presence of two carboxyl groups, which, at the average of their pKa values (3.22), contribute a net charge of −1 that balances the +1 contributed by the amino group. Similarly, the pI of histidine, with two groups that are positively charged when protonated, is 7.59 (the average of the pKa values of the amino and imidazole groups), much higher than that of glycine.

FIGURE 3-12 Titration curves for (a) glutamate and (b) histidine. The pKa of the R group is designated here as pKR. In both cases, the presence of three ionizable groups renders the titration curve more complex. Note that for glutamate, the pI is approximately the arithmetic mean of the pKa values of the two groups that are negatively charged when deprotonated. There is a net charge of 0 (the pI) when these two groups contribute a net charge of −1 (one protonated, the other not) to exactly balance the +1 charge of the protonated α-amino group. Similarly, the pI for histidine is approximately the arithmetic mean of the pKa values of the two groups that are positively charged when protonated.

Finally, in an aqueous environment, only histidine has an R group (pKa = 6.0) providing significant buffering power near the neutral pH usually found in the intracellular and extracellular fluids of most animals and bacteria (Table 3-1).

SUMMARY 3.1 Amino Acids

The 20 amino acids commonly found as residues in proteins contain an α-carboxyl group, an α-amino group, and a distinctive R group substituted on the α-carbon atom. The α-carbon atom of all amino acids except glycine is asymmetric, and thus amino acids can exist in at least two stereoisomeric forms.

Only the L stereoisomers of amino acids, with a configuration related to the absolute configuration of the reference molecule L-glyceraldehyde, are found in proteins.

Amino acids can be classified into five types on the basis of the polarity and charge (at pH 7) of their R groups.
Other, less common amino acids also occur, either as constituents of proteins (usually through modification of common amino acid residues after protein synthesis) or as free metabolites.

Amino acids vary in their acid-base properties and have characteristic titration curves. Monoamino monocarboxylic amino acids (with nonionizable R groups) are diprotic acids (+H3NCH(R)COOH) at low pH and exist in several different ionic forms as the pH is increased. Amino acids with ionizable R groups have additional ionic species, depending on the pH of the medium and the pKa of the R group.

3.2 Peptides and Proteins

We now turn to polymers of amino acids, the peptides and proteins. Biologically occurring polypeptides range in size from small, consisting of two or three linked amino acid residues, to very large, consisting of thousands of residues.

Peptides Are Chains of Amino Acids

Two amino acid molecules can be covalently joined through a substituted amide linkage, termed a peptide bond, to yield a dipeptide. Such a linkage is formed by removal of the elements of water (dehydration): a hydroxyl moiety from the α-carboxyl group of one amino acid and a hydrogen atom from the α-amino group of another (Fig. 3-13). The joined amino acids are referred to as residues, the part left over after the elements of water are removed. Peptide bond formation is an example of a condensation reaction, a common class of reactions in living cells. The reverse reaction, bond breakage involving water, is an example of hydrolytic cleavage or hydrolysis. Under standard biochemical conditions, the equilibrium for the reaction shown in Figure 3-13 favors hydrolysis of the dipeptide into its amino acids. To make the condensation reaction thermodynamically more favorable, the carboxyl group must be chemically modified or activated so that the hydroxyl group can be more readily eliminated. A chemical approach to this problem is outlined later in this chapter.
The biological approach to peptide bond formation is a major topic of Chapter 27.

FIGURE 3-13 Formation of a peptide bond by condensation. The α-amino group of one amino acid (with the R2 group) acts as a nucleophile to displace the hydroxyl group of another amino acid (with the R1 group), forming a peptide bond (shaded in light red). Amino groups are good nucleophiles, but the hydroxyl group is a poor leaving group and is not readily displaced. At physiological pH, the reaction shown here does not occur to any appreciable extent.

Three amino acids can be joined by two peptide bonds to form a tripeptide; similarly, four amino acids can be linked to form a tetrapeptide, five to form a pentapeptide, and so forth. When a few amino acids are joined in this fashion, the structure is called an oligopeptide. When many amino acids are joined, the product is called a polypeptide. Proteins may have thousands of amino acid residues. Although the terms “protein” and “polypeptide” are sometimes used interchangeably, molecules referred to as polypeptides generally have molecular weights below 10,000, and those called proteins have higher molecular weights.

Figure 3-14 shows the structure of a pentapeptide. In a peptide, the amino acid residue at the end with a free α-amino group is the amino-terminal (or N-terminal) residue; the residue at the other end, which has a free carboxyl group, is the carboxyl-terminal (C-terminal) residue.

FIGURE 3-14 The pentapeptide serylglycyltyrosylalanylleucine, Ser–Gly–Tyr–Ala–Leu, or SGYAL. Peptides are named beginning with the amino-terminal residue, which by convention is placed at the left. The peptide bonds are shaded in light red; the R groups are in red.

KEY CONVENTION When an amino acid sequence of a peptide, a polypeptide, or a protein is displayed, the amino-terminal end is placed on the left and the carboxyl-terminal end is placed on the right. The sequence is read from left to right, beginning with the amino-terminal end.
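The naming convention can be captured in a few lines. In this Python sketch, residues are listed from the amino-terminal to the carboxyl-terminal end; every residue except the last takes its "-yl" form, and the carboxyl-terminal residue keeps its free amino acid name. The lookup table is deliberately partial (only the residues named in Figures 3-14 and 3-15), and `describe_peptide` is a hypothetical helper, not a standard library function.

```python
# Partial lookup table covering only the residues of Figs. 3-14 and 3-15.
RESIDUE_NAMES = {
    # three-letter code: (name as a non-C-terminal residue, free amino acid name, one-letter code)
    "Ala": ("alanyl", "alanine", "A"),
    "Glu": ("glutamyl", "glutamate", "E"),
    "Gly": ("glycyl", "glycine", "G"),
    "Leu": ("leucyl", "leucine", "L"),
    "Lys": ("lysyl", "lysine", "K"),
    "Ser": ("seryl", "serine", "S"),
    "Tyr": ("tyrosyl", "tyrosine", "Y"),
}


def describe_peptide(residues: list[str]) -> tuple[str, str, str]:
    """Return (full name, three-letter form, one-letter form) for a peptide.

    `residues` must run from the amino-terminal (N-terminal) residue on the left
    to the carboxyl-terminal (C-terminal) residue on the right.
    """
    if not residues:
        raise ValueError("a peptide needs at least one residue")
    *leading, last = residues
    # Every residue but the C-terminal one gets its "-yl" form
    name = "".join(RESIDUE_NAMES[r][0] for r in leading) + RESIDUE_NAMES[last][1]
    three_letter = "-".join(residues)
    one_letter = "".join(RESIDUE_NAMES[r][2] for r in residues)
    return name, three_letter, one_letter


if __name__ == "__main__":
    print(describe_peptide(["Ser", "Gly", "Tyr", "Ala", "Leu"]))
    print(describe_peptide(["Ala", "Glu", "Gly", "Lys"]))
    # Reading the same residues in the opposite direction describes a different peptide
    print(describe_peptide(["Leu", "Ala", "Tyr", "Gly", "Ser"]))
```

ASCII hyphens stand in for the dashes used in the text. The last call shows why the direction convention matters: the same five residues read from the other end give leucylalanyltyrosylglycylserine, a different molecule.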

Although hydrolysis of a peptide bond is an exergonic reaction, it occurs only slowly because it has a high activation energy (p. 25). As a result, the peptide bonds in proteins are quite stable, with an average half-life (t1/2) of about 7 years under most intracellular conditions.

Peptides Can Be Distinguished by Their Ionization Behavior

Peptides contain only one free α-amino group and one free α-carboxyl group, at opposite ends of the chain (Fig. 3-15). These groups ionize as they do in free amino acids. The α-amino and α-carboxyl groups of all nonterminal amino acids are covalently joined in the peptide bonds. They can no longer ionize and thus do not contribute to the total acid-base behavior of peptides. Ionizable R groups in a peptide (Table 3-1) also contribute to the overall acid-base properties of the molecule (Fig. 3-15).

FIGURE 3-15 Alanylglutamylglycyllysine. This tetrapeptide has one free α-amino group, one free α-carboxyl group, and two ionizable R groups. The groups ionized at pH 7.0 are in red.

Like free amino acids, peptides have characteristic titration curves and a characteristic isoelectric pH (pI) at which the net charge is zero and they do not move in an electric field. These properties are exploited in some of the techniques used to separate peptides and proteins, as we describe later in the chapter. When an amino acid becomes a residue in a peptide, its chemical environment is altered, and the pKa value for an ionizable R group can change somewhat. The pKa values for R groups listed in Table 3-1 can be a useful guide to the pH range in which a given group will ionize, but they cannot be strictly applied when an amino acid becomes part of a peptide.

Biologically Active Peptides and Polypeptides Occur in a Vast Range of Sizes and Compositions

No generalizations can be made about the molecular weights of biologically active peptides and proteins in relation to their functions.
Naturally occurring peptides range in length from two to many thousands of amino acid residues. Even the smallest peptides can have biologically important effects. Consider the commercially synthesized dipeptide L-aspartyl-L-phenylalanine methyl ester, the artificial sweetener better known as aspartame or NutraSweet.

Many small peptides exert their effects at very low concentrations. For example, a number of vertebrate hormones (Chapter 23) are small peptides. These include oxytocin (nine amino acid residues), which is secreted by the posterior pituitary gland and stimulates uterine contractions, and thyrotropin-releasing factor (three residues), which is formed in the hypothalamus and stimulates the release of another hormone, thyrotropin, from the anterior pituitary gland. Some extremely toxic mushroom poisons, such as amanitin, are also small peptides, as are many antibiotics.

How long are the polypeptide chains in proteins? As Table 3-2 shows, lengths vary considerably. Human cytochrome c has 104 amino acid residues linked in a single chain; bovine chymotrypsinogen has 245 residues. At the extreme is titin, a constituent of vertebrate muscle, which has nearly 27,000 amino acid residues and a molecular weight of about 3,000,000. The vast majority of naturally occurring proteins are much smaller than this, containing fewer than 2,000 amino acid residues.

TABLE 3-2 Molecular Data on Some Proteins

Protein                         Molecular weight   Number of residues   Number of polypeptide chains
Cytochrome c (human)            12,400             104                  1
Myoglobin (equine heart)        16,700             153                  1
Chymotrypsin (bovine pancreas)  25,200             241                  3
Hemoglobin (human)              64,500             574                  4
Hexokinase (yeast)              107,900            972                  2
RNA polymerase (E. coli)        450,000            4,158                5
Glutamine synthetase (E. coli)  619,000            5,628                12
Titin (human)                   2,993,000          26,926               1

Some proteins consist of a single polypeptide chain, but others, called multisubunit proteins, have two or more polypeptides associated noncovalently (Table 3-2).
The individual polypeptide chains in a multisubunit protein may be identical or different. If at least two are identical, the protein is said to be oligomeric, and the identical units (consisting of one or more polypeptide chains) are referred to as protomers. Hemoglobin, for example, has four polypeptide subunits: two identical α chains and two identical β chains, all four held together by noncovalent interactions. Each α subunit is paired in an identical way with a β subunit within the structure of this multisubunit protein, so that hemoglobin can be considered either a tetramer of four polypeptide subunits or a dimer of αβ protomers.

A few proteins contain two or more polypeptide chains linked covalently. For example, the two polypeptide chains of insulin are linked by disulfide bonds. In such cases, the individual polypeptides are not considered subunits; instead they are commonly referred to simply as chains.

The amino acid composition of proteins is also highly variable. The 20 common amino acids almost never occur in equal amounts in a protein. Some amino acids may occur only once or not at all in a given type of protein; others may occur in large numbers. Table 3-3 shows the amino acid composition of bovine cytochrome c and chymotrypsinogen, the inactive precursor of the digestive enzyme chymotrypsin. These two proteins, with very different functions, also differ significantly in the relative numbers of each kind of amino acid residue.
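Both the percentage columns of Table 3-3 and the divide-by-110 rule of thumb described after it are simple arithmetic. The Python sketch below recomputes a percentage column from residue counts (bovine cytochrome c, Table 3-3) and applies the 110 rule to molecular weights from Table 3-2. The rule is meant for simple proteins; the chapter identifies chymotrypsin as one, while treating titin as a simple protein is an assumption. The helper names are illustrative.

```python
# Residue counts for bovine cytochrome c (Table 3-3).
CYTOCHROME_C_BOVINE = {
    "Ala": 6, "Arg": 2, "Asn": 5, "Asp": 3, "Cys": 2, "Gln": 3, "Glu": 9,
    "Gly": 14, "His": 3, "Ile": 6, "Leu": 6, "Lys": 18, "Met": 2, "Phe": 4,
    "Pro": 4, "Ser": 1, "Thr": 8, "Trp": 1, "Tyr": 4, "Val": 3,
}

# Average residue mass: ~128 (composition-weighted amino acid Mr) minus 18 (water lost per peptide bond)
AVERAGE_RESIDUE_MR = 128 - 18


def percent_composition(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of all residues contributed by each amino acid."""
    total = sum(counts.values())
    return {aa: 100 * n / total for aa, n in counts.items()}


def estimate_residue_count(molecular_weight: float) -> int:
    """Rough number of residues in a simple (unconjugated) protein."""
    return round(molecular_weight / AVERAGE_RESIDUE_MR)


if __name__ == "__main__":
    pct = percent_composition(CYTOCHROME_C_BOVINE)
    print(f"residues: {sum(CYTOCHROME_C_BOVINE.values())}")
    for aa in ("Gly", "Lys"):
        print(f"{aa}: {pct[aa]:.0f}% of residues")
    # (protein, Mr, actual residue count) from Table 3-2
    for name, mr, actual in [("chymotrypsin (bovine)", 25_200, 241),
                             ("titin (human)", 2_993_000, 26_926)]:
        print(f"{name}: estimated {estimate_residue_count(mr)}, actual {actual}")
```

The estimates (229 versus 241 residues, and 27,209 versus 26,926) land within about 5% of the actual counts, which is the sort of accuracy to expect from a rule built on an average residue mass.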
TABLE 3-3 Amino Acid Composition of Two Proteins

              Bovine cytochrome c      Bovine chymotrypsinogen
Amino acid    Residues  % of total(a)  Residues  % of total(a)
Ala           6         6              22        9
Arg           2         2              4         1.6
Asn           5         5              14        5.7
Asp           3         3              9         3.7
Cys           2         2              10        4
Gln           3         3              10        4
Glu           9         9              5         2
Gly           14        13             23        9.4
His           3         3              2         0.8
Ile           6         6              10        4
Leu           6         6              19        7.8
Lys           18        17             14        5.7
Met           2         2              2         0.8
Phe           4         4              6         2.4
Pro           4         4              9         3.7
Ser           1         1              28        11.4
Thr           8         8              23        9.4
Trp           1         1              8         3.3
Tyr           4         4              4         1.6
Val           3         3              23        9.4
Total         104       102            245       99.7

Residues = number of residues per molecule; % of total = percentage of total residues.

Note: In some common analyses, such as acid hydrolysis, Asp and Asn are not readily distinguished from each other and are together designated Asx (or B). Similarly, when Glu and Gln cannot be distinguished, they are together designated Glx (or Z). In addition, Trp is destroyed by acid hydrolysis. Additional procedures must be employed to obtain an accurate assessment of complete amino acid content.
(a) Percentages do not total to 100%, due to rounding.

We can estimate the number of amino acid residues in a simple protein containing no other chemical constituents by dividing its molecular weight by 110. Although the average molecular weight of the 20 common amino acids is about 138, the smaller amino acids predominate in most proteins. If we take into account the proportions in which the various amino acids occur in an average protein (Table 3-1; the averages are determined by surveying the amino acid compositions of more than 1,000 different proteins), the average molecular weight of protein amino acids is nearer to 128. Because a molecule of water (Mr 18) is removed to create each peptide bond, the average molecular weight of an amino acid residue in a protein is about 128 − 18 = 110.

Some Proteins Contain Chemical Groups Other Than Amino Acids

Many proteins (for example, the enzymes ribonuclease A and chymotrypsin) contain only amino acid residues and no other chemical constituents.
However, some proteins contain permanently associated chemical components in addition to amino acids; these are called conjugated proteins. The non–amino acid part of a conjugated protein is usually called its prosthetic group. Conjugated proteins are classified on the basis of the chemical nature of their prosthetic groups (Table 3-4); for example, lipoproteins contain lipids, glycoproteins contain sugar groups, and metalloproteins contain a specific metal. Some proteins contain more than one prosthetic group. Usually the prosthetic group plays an important role in the protein’s biological function.

TABLE 3-4 Conjugated Proteins

Class              Prosthetic group           Example
Lipoproteins       Lipids                     β1-Lipoprotein of blood (Fig. 17-2)
Glycoproteins      Carbohydrates              Immunoglobulin G (Fig. 5-20)
Phosphoproteins    Phosphate groups           Glycogen phosphorylase (Fig. 6-39)
Hemoproteins       Heme (iron porphyrin)      Hemoglobin (Figs 5-8 to 5-11)
Flavoproteins      Flavin nucleotides         Succinate dehydrogenase (Fig. 19-9)
Metalloproteins    Iron                       Ferritin (Box 16-1)
                   Zinc                       Alcohol dehydrogenase (Fig. 14-12)
                   Calcium                    Calmodulin (Fig. 12-17)
                   Molybdenum                 Dinitrogenase (Fig. 22-3)
                   Copper                     Complex IV (Fig. 19-12)

SUMMARY 3.2 Peptides and Proteins

Amino acids can be joined covalently through peptide bonds to form peptides and proteins. Cells generally contain thousands of different proteins, each with a different biological activity.

The ionization behavior of peptides reflects their ionizable side chains as well as the terminal α-amino and α-carboxyl groups.

Proteins can be very long polypeptide chains of 100 to several thousand amino acid residues. However, some naturally occurring peptides have only a few amino acid residues. Some proteins are composed of several noncovalently associated polypeptide chains, called subunits.

Simple proteins yield only amino acids on hydrolysis; conjugated proteins contain in addition some other component, such as a metal or organic prosthetic group.
3.3 Working with Proteins

Biochemists’ understanding of protein structure and function has been derived from the study of many individual proteins. To study a protein in detail, the researcher must be able to separate it from other proteins in pure form and must have the techniques to determine its properties. The necessary methods come from protein chemistry, a discipline as old as biochemistry itself and one that retains a central position in biochemical research.

Proteins Can Be Separated and Purified

A pure preparation is usually essential before a protein’s properties and activities can be determined. Given that cells contain thousands of different kinds of proteins, how can one protein be purified? Methods for separating proteins take advantage of properties that vary from one protein to the next, including size, charge, and binding properties.

The advent of genetic engineering approaches has provided new and simpler paths for protein purification. These methods, described in Chapter 9, often artificially modify the protein being purified, adding a few or many amino acid residues to one or both ends. In many cases, the modifications alter protein function. Isolation of unaltered native proteins requires removal of the modification or a reliance on the methods described here.

The source of a protein is generally tissue or microbial cells. The first step in any protein purification procedure is to break open these cells, releasing their proteins into a solution called a crude extract. If necessary, differential centrifugation can be used to prepare subcellular fractions or to isolate specific organelles (see Fig. 1-7). Once the extract or organelle preparation is ready, various methods are available for purifying one or more of the proteins it contains. Commonly, the extract is subjected to treatments that separate the proteins into different fractions based on a property such as size or charge; the process is referred to as fractionation.
Early fractionation steps in a purification utilize differences in protein solubility, which is a complex function of pH, temperature, salt concentration, and other factors. The solubility of proteins is lowered in the presence of some salts, an effect called “salting out.” Ammonium sulfate ((NH4)2SO4) is particularly effective for selectively precipitating some proteins while leaving others in solution. Low-speed centrifugation is then used to separate the precipitated proteins from those remaining in solution.

A solution containing the protein of interest usually must be further altered before subsequent purification steps are possible. For example, dialysis is a procedure that separates proteins from small solutes by taking advantage of the proteins’ larger size. The partially purified extract is placed in a bag or tube made of a semipermeable membrane, which is suspended in a much larger volume of buffered solution of appropriate ionic strength. The membrane allows the exchange of salt and buffer but not proteins. Thus dialysis retains large proteins within the membranous bag or tube while allowing the concentration of other solutes in the protein preparation to change until they come into equilibrium with the solution outside the membrane. Dialysis might be used, for example, to remove ammonium sulfate from the protein preparation.

The most efficient methods for fractionating proteins make use of column chromatography, which takes advantage of differences in protein charge, size, binding affinity, and other properties (Fig. 3-16). A porous solid material with appropriate chemical properties (the stationary phase) is held in a column, and a buffered solution (the mobile phase) migrates through it. The protein, dissolved in the same buffered solution that was used to establish the mobile phase, is layered on the top of the column. The protein then percolates through the solid matrix as an ever-expanding band within the larger mobile phase.
Individual proteins migrate faster or more slowly through the column, depending on their properties.

FIGURE 3-16 Column chromatography. The standard elements of a chromatographic column include a solid, porous material (matrix) supported inside a column, generally made of plastic or glass. A solution, the mobile phase, flows through the matrix, the stationary phase. The solution that passes out of the column at the bottom (the effluent) is constantly replaced by solution supplied from a reservoir at the top. The protein solution to be separated is layered on top of the column and allowed to percolate into the solid matrix. Additional solution is added on top. The protein solution forms a band within the mobile phase that is initially the depth of the protein solution applied to the column. As proteins migrate through the column (shown here at five different times), they are retarded to different degrees by their different interactions with the matrix material. The overall protein band thus widens as it moves through the column. Individual types of proteins (such as A, B, and C, shown in blue, red, and green) gradually separate from each other, forming bands within the broader protein band. Separation improves (i.e., resolution increases) as the length of the column increases. However, each individual protein band also broadens with time due to diffusional spreading, a process that decreases resolution. In this example, protein A is well separated from B and C, but diffusional spreading prevents complete separation of B and C under these conditions. Protein C is being detected and its presence recorded as it is eluted from the column.

Ion-exchange chromatography exploits differences in the sign and magnitude of the net electric charge of proteins at a given pH (Fig. 3-17a).
The column matrix is a synthetic polymer (resin) containing bound charged groups; those with bound anionic groups are called cation exchangers, and those with bound cationic groups are called anion exchangers. The affinity of each protein for the charged groups on the column is affected by the pH (which determines the ionization state of the molecule) and the concentration of competing free salt ions in the surrounding solution. Separation can be optimized by gradually changing the pH and/or salt concentration of the mobile phase in order to create a pH or salt gradient. In cation-exchange chromatography, proteins with a net positive charge migrate through the matrix more slowly than those with a net negative charge, because the migration of the former is retarded more by interaction with the stationary phase.

FIGURE 3-17 Three chromatographic methods used in protein purification. (a) Ion-exchange chromatography exploits differences in the sign and magnitude of the net electric charges of proteins at a given pH. (b) Size-exclusion chromatography, also called gel filtration, separates proteins according to size. (c) Affinity chromatography separates proteins by their binding specificities. Further details of these methods are given in the text.

As the protein-containing solution exits a column, successive portions (fractions) of this effluent are collected in test tubes. Each fraction can be tested for the presence of the protein of interest as well as other properties, such as ionic strength or total protein concentration. All fractions positive for the protein of interest can be combined as the product of this chromatographic step of the protein purification.

WORKED EXAMPLE 3-1 Ion Exchange of Peptides

A biochemist wants to separate two peptides by ion-exchange chromatography.
Peptide A has a pI of 5.1, owing to the presence of more Glu and Asp residues than Arg, Lys, and His residues, and therefore carries a net negative charge at neutral pH. Peptide B has a pI of 7.8, reflecting a plurality of positively charged amino acid residues, and therefore carries a net positive charge at neutral pH. If the column is run with a mobile phase at neutral pH, which peptide would elute first from a cation-exchange resin? Which would elute first from an anion-exchange resin?

SOLUTION: A cation-exchange resin has negative charges and binds positively charged molecules, retarding their progress through the column. Peptide B, with its higher pI and net positive charge, will interact more strongly than peptide A with the cation-exchange resin. Thus, peptide A will elute first. On the anion-exchange resin, peptide B will elute first. Peptide A, having a relatively low pI and a net negative charge, will be retarded by its interaction with the positively charged resin.

Figure 3-17 shows two variations of column chromatography in addition to ion exchange. Size-exclusion chromatography, also called gel filtration (Fig. 3-17b), separates proteins according to size. In this method, large proteins emerge from the column sooner than small ones, a somewhat counterintuitive result. The solid phase consists of cross-linked polymer beads with engineered pores or cavities of a particular size. Large proteins cannot enter the cavities and so take a shorter (and more rapid) path through the column, around the beads. Small proteins enter the cavities and are slowed by their more labyrinthine path through the column. Size-exclusion chromatography can also be used to approximate the size of a protein being purified, using methods similar to those described in Figure 3-19.

Affinity chromatography is based on binding affinity (Fig. 3-17c). The beads in the column have a covalently attached chemical group called a ligand, a group or molecule that binds to a macromolecule such as a protein.
When a protein mixture is added to the column, any protein with affinity for this ligand binds to the beads, and its migration through the matrix is retarded. For example, if the biological function of a protein involves binding to ATP, then attaching a molecule that resembles ATP to the beads in the column creates an affinity matrix that can help purify the protein. Proteins that do not bind to ATP flow more rapidly through the column. Bound proteins are then eluted by a solution containing either a high concentration of salt or a free ligand (in this case, ATP or an analog of ATP). Salt weakens the binding of the protein to the immobilized ligand by interfering with ionic interactions. Free ligand competes with the ligand attached to the beads, releasing the protein from the matrix; the protein product that elutes from the column is often bound to the ligand used to elute it.

Protein purification protocols often use genetic engineering to fuse additional amino acids or peptides (tags) to the target protein. Affinity chromatography can be used to bind this tag, achieving a large increase in purity in a single step (see Fig. 9-11). In many cases, the tag can be subsequently removed, fully restoring the function of the native protein.

Chromatographic methods are typically enhanced by the use of HPLC, or high-performance liquid chromatography. HPLC makes use of high-pressure pumps that speed the movement of the protein molecules down the column; it also uses higher-quality chromatographic materials that can withstand the crushing force of the pressurized flow. By reducing the transit time on the column, HPLC can limit diffusional spreading of protein bands and thus can greatly improve resolution.

Choosing the approach to purification of a protein that has not previously been isolated is guided both by established precedents and by common sense.
In most cases, several different methods must be used sequentially to purify a protein completely, each separating proteins on the basis of different properties. The choice of methods is somewhat empirical, and many strategies may be tried before the most effective one is found. Researchers can often minimize trial and error by basing the new procedure on purification techniques developed for similar proteins. Common sense dictates that inexpensive procedures such as salting out be used first, when the total volume and the number of contaminants are greatest. As each purification step is completed, the sample size generally becomes smaller (Table 3-5), making it feasible to use more sophisticated (and expensive) chromatographic procedures at later stages.

A purification table documents the success of each step in a purification protocol. In the hypothetical purification shown in Table 3-5, the ratio of the final specific activity (15,000 units/mg) to the starting specific activity (10 units/mg) gives the purification factor (1,500). The percentage of the total activity at the last step (45,000 units) relative to the total activity in the starting material (100,000 units) gives the yield from the purification procedure (45%).

TABLE 3-5 A Hypothetical Purification Table for an Enzyme

Procedure or step                       Fraction volume (mL)   Total protein (mg)   Activity (units)   Specific activity (units/mg)
1. Crude cellular extract               1,400                  10,000               100,000            10
2. Precipitation with ammonium sulfate  280                    3,000                96,000             32
3. Ion-exchange chromatography          90                     400                  80,000             200
4. Size-exclusion chromatography        80                     100                  60,000             600
5. Affinity chromatography              6                      3                    45,000             15,000

Note: All data represent the status of the sample after the designated procedure has been carried out. “Activity” and “specific activity” are defined on page 90.
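The figures of merit in a purification table are straightforward ratios. This Python sketch recomputes specific activity, purification factor, and yield from the total-protein and activity columns of Table 3-5; the `Step` class and `summarize` function are illustrative names, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    total_protein_mg: float
    activity_units: float

    @property
    def specific_activity(self) -> float:
        """Units of enzyme activity per mg of total protein."""
        return self.activity_units / self.total_protein_mg


def summarize(steps: list[Step]) -> list[tuple[str, float, float, float]]:
    """For each step: (name, specific activity, purification factor, % yield).

    Purification factor and yield are both measured against the first (crude) step.
    """
    first = steps[0]
    rows = []
    for s in steps:
        fold = s.specific_activity / first.specific_activity
        yield_pct = 100 * s.activity_units / first.activity_units
        rows.append((s.name, s.specific_activity, fold, yield_pct))
    return rows


# Total protein (mg) and activity (units) after each step, from Table 3-5
TABLE_3_5 = [
    Step("Crude cellular extract", 10_000, 100_000),
    Step("Ammonium sulfate precipitation", 3_000, 96_000),
    Step("Ion-exchange chromatography", 400, 80_000),
    Step("Size-exclusion chromatography", 100, 60_000),
    Step("Affinity chromatography", 3, 45_000),
]

if __name__ == "__main__":
    for name, sa, fold, yld in summarize(TABLE_3_5):
        print(f"{name:32s} {sa:>8.0f} units/mg {fold:>7.0f}-fold {yld:>5.0f}% yield")
```

Both ratios are taken relative to the crude extract rather than to the preceding step, matching the way the text derives the overall purification factor (1,500) and yield (45%).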
Proteins Can Be Separated and Characterized by Electrophoresis

Protein purification is usually complemented by electrophoresis, an analytical process that allows researchers to visualize and characterize proteins as they are purified. This method does not itself contribute to purification, as electrophoresis often adversely affects the structure and thus the function of proteins. However, it allows a biochemist to rapidly estimate the number of different proteins in a mixture and the degree of purity of a particular protein preparation. Also, electrophoresis can be used to determine such crucial properties of a protein as its isoelectric point and approximate molecular weight.

Electrophoresis of proteins is generally carried out in gels made up of the cross-linked polymer polyacrylamide (Fig. 3-18). The polyacrylamide gel acts as a molecular sieve, slowing the migration of proteins approximately in proportion to their charge-to-mass ratio. Migration may also be affected by protein shape. In electrophoresis, the force moving the macromolecule is the electrical potential, E. The electrophoretic mobility, μ, of a molecule is the ratio of its velocity, V, to the electrical potential. Electrophoretic mobility is also equal to the net charge, Z, of the molecule divided by the frictional coefficient, f, which reflects in part a protein’s shape. Thus,

$$\mu = \frac{V}{E} = \frac{Z}{f}$$

The migration of a protein in a gel during electrophoresis is therefore a function of its net charge as well as its size and shape.

FIGURE 3-18 Electrophoresis. (a) Different samples are loaded in wells or depressions at the top of the SDS polyacrylamide gel. The proteins move into the gel when an electric field is applied. The gel minimizes convection currents caused by small temperature gradients, as well as protein movements other than those induced by the electric field. (b) Proteins can be visualized after electrophoresis by treating the gel with a stain such as Coomassie blue, which binds to the proteins but not to the gel itself.
Each band on the gel represents a different protein (or protein subunit); smaller proteins move through the gel more rapidly than larger proteins and therefore are found nearer the bottom of the gel. This gel illustrates purification of the RecA protein of Escherichia coli. The gene for the RecA protein was cloned so that its expression (synthesis of the protein) could be controlled. The first lane shows a set of standard proteins (of known Mr), serving as molecular weight markers. The second and third lanes show, respectively, proteins from E. coli cells before and after synthesis of RecA protein was induced. The fourth lane shows the proteins in a crude cellular extract. Subsequent lanes (left to right) show the proteins that are present after successive purification steps. Although the protein looks pure in lane 6, two more steps are needed to remove minor contaminants not evident on the gel. The purified protein is a single polypeptide chain (Mr ≈ 38,000), as seen in the rightmost lane.

The electrophoretic method commonly employed for estimation of purity and molecular weight makes use of the detergent sodium dodecyl sulfate (SDS) (“dodecyl” denoting a 12-carbon chain). A protein will bind about 1.4 times its weight of SDS, roughly one molecule of SDS for every two amino acid residues. The sulfate moieties of the bound SDS contribute a large net negative charge, rendering the intrinsic charge of the protein insignificant and conferring on each protein a similar charge-to-mass ratio. In addition, SDS binding partially unfolds proteins, such that most SDS-bound proteins assume a similar rodlike shape. Electrophoresis in the presence of SDS therefore separates proteins almost exclusively on the basis of mass (molecular weight), with smaller polypeptides migrating more rapidly. After electrophoresis, the proteins are visualized by adding a dye such as Coomassie blue, which binds to proteins but not to the gel itself (Fig. 3-18b).
Thus, a researcher can monitor the progress of a protein purification procedure as the number of protein bands visible on the gel decreases after each new fractionation step. When compared with the positions to which proteins of known molecular weight migrate in the gel, the position of an unidentified protein can provide a good approximation of its molecular weight (Fig. 3-19). If the protein has two or more different subunits, the subunits are generally separated by the SDS treatment, and a separate band appears for each.

FIGURE 3-19 Estimating the molecular weight of a protein. The electrophoretic mobility of a protein on an SDS polyacrylamide gel is related to its molecular weight, Mr. (a) Standard proteins of known molecular weight are subjected to electrophoresis (lane 1). These marker proteins can be used to estimate the molecular weight of an unknown protein (lane 2). (b) A plot of log Mr of the marker proteins versus relative migration during electrophoresis is linear, which allows the molecular weight of the unknown protein to be read from the graph. (In similar fashion, a set of standard proteins with reproducible retention times on a size-exclusion column can be used to create a standard curve of retention time versus log Mr. The retention time of an unknown substance on the column can be compared with this standard curve to obtain an approximate Mr.)

Isoelectric focusing is a procedure used to determine the isoelectric point (pI) of a protein (Fig. 3-20). A pH gradient is established by allowing a mixture of low molecular weight organic acids and bases (ampholytes; p. 77) to distribute themselves in an electric field generated across the gel. When a protein mixture is applied, each protein migrates until it reaches the pH that matches its pI. Proteins with different isoelectric points are thus distributed differently throughout the gel.

FIGURE 3-20 Isoelectric focusing. This technique separates proteins according to their isoelectric points.
A protein mixture is placed on a gel strip containing an immobilized pH gradient. With an applied electric field, proteins enter the gel and migrate until each reaches a pH that is equivalent to its pI. Remember that when pH = pI, the net charge of a protein is zero.

Combining isoelectric focusing and SDS electrophoresis sequentially in a process called two-dimensional electrophoresis permits the resolution of complex mixtures of proteins (Fig. 3-21). This is a more sensitive analytical method than either electrophoretic method alone. Two-dimensional electrophoresis separates proteins of identical molecular weight that differ in pI, or proteins with similar pI values but different molecular weights.
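The standard-curve procedure of Figure 3-19 amounts to a straight-line fit of log Mr against relative migration, followed by reading the unknown off the fitted line. The following is a minimal sketch using an ordinary least-squares fit. The marker migration values are invented for illustration (they are not taken from Fig. 3-19), and the function names are hypothetical.

```python
import math


def fit_standard_curve(markers):
    """Least-squares fit of log10(Mr) = intercept + slope * relative_migration.

    markers: list of (Mr, relative_migration) pairs for the standard proteins.
    Returns (intercept, slope)."""
    xs = [rf for _, rf in markers]
    ys = [math.log10(mr) for mr, _ in markers]
    n = len(markers)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def estimate_mr(relative_migration, intercept, slope):
    """Read an unknown protein's Mr off the fitted standard curve."""
    return 10 ** (intercept + slope * relative_migration)


# Hypothetical marker lane: (Mr, relative migration), invented for illustration.
MARKERS = [(97_400, 0.15), (66_200, 0.28), (45_000, 0.42),
           (31_000, 0.56), (21_500, 0.70), (14_400, 0.84)]

b0, b1 = fit_standard_curve(MARKERS)
print(f"slope = {b1:.3f}; unknown at 0.50 -> Mr ~ {estimate_mr(0.50, b0, b1):,.0f}")
```

With these made-up markers, an unknown migrating at a relative distance of 0.50 falls between the 45,000 and 31,000 markers, at roughly 37,000. The same two functions would apply to the size-exclusion variant described in the Fig. 3-19 caption, with retention time in place of relative migration. In either case, it is prudent to rely on markers that bracket the unknown, since extrapolating beyond them is less reliable.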

FIGURE 3-21 Two-dimensional electrophoresis. Proteins are first separated by isoelectric focusing in a thin strip gel. The gel is then laid horizontally on a second, slab-shaped gel, and the proteins are separated by SDS polyacrylamide gel electrophoresis. Horizontal separation reflects differences in pI; vertical separation reflects differences in molecular weight. The original protein complement is thus spread in two dimensions. Thousands of cellular proteins can be resolved using this technique. Individual protein spots can be cut out of the gel and identified by mass spectrometry (see Figs. 3-28 and 3-29).

Unseparated Proteins Are Detected and Quantified Based on Their Functions

To purify a protein, it is essential to have a way of detecting and quantifying that protein in the presence of many other proteins at each stage of the procedure. A common target of purification is one or another of the class of proteins called enzymes (Chapter 6). Each enzyme catalyzes a particular reaction that converts one biomolecule (the substrate) to another (the product). The amount of the protein in a given solution or tissue extract can be measured, or assayed, in terms of the catalytic effect the enzyme produces, that is, the increase in the rate at which its substrate is converted to reaction products when the enzyme is present. For this purpose the researcher must know (1) the overall equation of the reaction catalyzed, (2) an analytical procedure for determining the disappearance of the substrate or the appearance of a reaction product, (3) whether the enzyme requires cofactors such as metal ions or coenzymes, (4) the dependence of the enzyme activity on substrate concentration, (5) the optimum pH, and (6) a temperature zone in which the enzyme is stable and has high activity. Enzymes are usually assayed at their optimum pH and at some convenient temperature within the range 25 to 38 °C.
Also, very high substrate concentrations are generally used so that the initial reaction rate, measured experimentally, is proportional to enzyme concentration (Chapter 6). By international agreement, 1.0 unit of enzyme activity for most enzymes is defined as the amount of enzyme causing transformation of 1.0 μmol of substrate to product per minute at 25 °C under optimal conditions of measurement (for many enzymes, this definition is inconvenient, and a unit may be defined differently). The term activity refers to the total units of enzyme in a solution. The specific activity is the number of enzyme units per milligram of total protein (Fig. 3-22). The specific activity is a measure of enzyme purity: it increases during purification of an enzyme and becomes maximal and constant when the enzyme is pure (Table 3-5).

FIGURE 3-22 Activity versus specific activity. The difference between these terms can be illustrated by considering two flasks containing marbles. The flasks contain the same number of red marbles, but different numbers of marbles of other colors. If the marbles represent proteins, both flasks contain the same activity of the protein, represented by the red marbles. The second flask, however, has the higher specific activity, because red marbles represent a higher fraction of the total.

After each purification step, the activity of the preparation (in units of enzyme activity) is assayed, the total amount of protein is determined independently, and the ratio of the two gives the specific activity. Activity and total protein generally decrease with each step. Activity decreases because there is always some loss due to inactivation or nonideal interactions with chromatographic materials or other molecules in the solution. Total protein decreases because the objective is to remove as much unwanted or nonspecific protein as possible.
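The unit definition turns an assay reading directly into activity. As a minimal sketch, assuming the assay measures micromoles of product formed during a timed interval over which the rate stays at its initial, linear value, the calculation scales from the assayed aliquot up to the whole fraction and then divides by separately measured total protein. All numbers and function names below are hypothetical.

```python
def units_from_assay(umol_converted: float, minutes: float) -> float:
    """Units in the assayed aliquot; 1 unit = 1.0 umol substrate -> product per minute.

    Assumes the rate stayed at its initial (linear) value for the whole interval."""
    return umol_converted / minutes


def fraction_activity(umol_converted: float, minutes: float,
                      aliquot_ml: float, fraction_volume_ml: float) -> float:
    """Total activity (units) of a fraction, scaled up from the assayed aliquot."""
    units_per_ml = units_from_assay(umol_converted, minutes) / aliquot_ml
    return units_per_ml * fraction_volume_ml


def specific_activity(total_units: float, protein_mg_per_ml: float,
                      fraction_volume_ml: float) -> float:
    """Units per milligram of total protein; protein is measured by a separate assay."""
    return total_units / (protein_mg_per_ml * fraction_volume_ml)


# Hypothetical numbers: a 0.020 mL aliquot of a 90 mL fraction converts 0.40 umol
# of substrate in 2.0 min; total protein in the fraction is 5.0 mg/mL.
total = fraction_activity(0.40, 2.0, 0.020, 90.0)
print(f"activity: {total:.0f} units; specific activity: "
      f"{specific_activity(total, 5.0, 90.0):.1f} units/mg")
```

Here the aliquot converts 0.20 μmol per minute, which corresponds to 10 units/mL, 900 units in the 90 mL fraction, and a specific activity of 2.0 units/mg.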
In a successful step, the loss of nonspecific protein is much greater than the loss of activity; therefore, specific activity increases even as total activity falls. The data are assembled in a purification table similar to Table 3-5. A protein is generally considered pure when further purification steps fail to increase specific activity and when only a single protein species can be detected (for example, by electrophoresis in the presence of SDS).

For proteins that are not enzymes, other quantification methods are required. Transport proteins can be assayed by their binding to the molecule they transport, and hormones and toxins by the biological effect they produce; for example, growth hormones will stimulate the growth of certain cultured cells. Some structural proteins represent such a large fraction of a tissue mass that they can be readily extracted and purified without a functional assay. The approaches are as varied as the proteins themselves.

SUMMARY 3.3 Working with Proteins

Proteins are separated and purified on the basis of differences in their properties. Proteins can be selectively precipitated by changes in pH or temperature, and particularly by the addition of certain salts. A wide range of chromatographic procedures makes use of differences in size, binding affinities, charge, and other properties. These include ion-exchange, size-exclusion, affinity, and high-performance liquid chromatography. Electrophoresis separates proteins on the basis of mass or charge for analytical purposes. SDS gel electrophoresis and isoelectric focusing can be used separately or in combination for higher resolution. All purification procedures require a method for quantifying or assaying the protein of interest in the presence of other proteins. Purification can be monitored by assaying specific activity.

3.4 The Structure of Proteins: Primary Structure

Purification of a protein is usually only a prelude to a detailed biochemical dissection of its structure and function.
What is it that makes one protein an enzyme, another a hormone, another a structural protein, and still another an antibody? How do they differ chemically? The most obvious distinctions are structural, and to protein structure we now turn.

We can describe the structure of large molecules such as proteins at several levels of complexity, arranged in a kind of conceptual hierarchy. Four levels of protein structure are commonly defined (Fig. 3-23). A description of all covalent bonds (mainly peptide bonds and disulfide bonds) linking amino acid residues in a polypeptide chain is its primary structure. The most important element of primary structure is the sequence of amino acid residues. Secondary structure refers to particularly stable arrangements of amino acid residues giving rise to recurring structural patterns. Tertiary structure describes all aspects of the three-dimensional folding of a polypeptide. When a protein has two or more polypeptide subunits, their arrangement in space is referred to as quaternary structure. Our exploration of proteins will eventually include complex protein machines consisting of dozens to thousands of subunits. Primary structure is the focus of the remainder of this chapter; we discuss the higher levels of structure in Chapter 4.

FIGURE 3-23 Levels of structure in proteins. The primary structure consists of a sequence of amino acids linked together by peptide bonds, and it includes any disulfide bonds. The resulting polypeptide can be arranged into units of secondary structure, such as an α helix. The helix is a part of the tertiary structure of the folded polypeptide, which is itself one of the subunits that make up the quaternary structure of the multisubunit protein, in this case hemoglobin. [Data from PDB ID 1HGA, R. Liddington et al., J. Mol. Biol. 228:551, 1992.]

Primary structure now becomes our focus.
We first consider empirical clues that amino acid sequence and protein function are closely linked, then describe how amino acid sequence is determined; finally, we outline the many uses to which this information can be put.

The Function of a Protein Depends on Its Amino Acid Sequence

The bacterium Escherichia coli produces more than 3,000 different proteins; a human has about 20,000 genes that may produce over a million different proteins (through genetic processes discussed in Part III of this text). In both species, each type of protein has a unique amino acid sequence that confers a particular three-dimensional structure. This structure in turn confers a unique function.

Amino acid sequences are important elements of the broader realm of biological information. They are a major functional expression of information stored in DNA in the form of genes. The sequences are not at all random. Each protein has a distinctive number and sequence of amino acid residues. As we shall see in Chapter 4, the primary structure of a protein determines how it folds up into its unique three-dimensional structure, and this in turn determines the function of the protein.

Some simple observations illustrate the functional importance of primary structure, or the amino acid sequence of a protein. First, as we have already noted, proteins with different functions always have different amino acid sequences. Second, thousands of human genetic diseases have been traced to the production of proteins with less activity or altered activity. The alteration can range from a single change in the amino acid sequence (as in sickle cell disease, described in Chapter 5) to deletion of a larger portion of the polypeptide chain (as in most cases of Duchenne muscular dystrophy: a large deletion in the gene encoding the protein dystrophin leads to production of a shortened, inactive protein).
Finally, on comparing functionally similar proteins from different species, we find that these proteins often have similar amino acid sequences. Thus, a close link between protein primary structure and function is evident.

The amino acid sequence for a particular protein is not absolutely fixed, or invariant. Virtually all of the proteins in humans are polymorphic, having amino acid sequence variants in the human population. Many human proteins are polymorphic even within an individual, with amino acid variations occurring due to processes that will be described in Part III of this text. Some of these variations have little or no effect on the function of the protein; others may affect function dramatically. Furthermore, proteins that carry out a broadly similar function in distantly related species can differ greatly in overall size and amino acid sequence.

Although the amino acid sequence in some regions of the primary structure might vary considerably without affecting biological function, most proteins contain crucial regions that are essential to their function and thus have sequences that are conserved. The fraction of the overall sequence that is critical varies from protein to protein, complicating the task of relating sequence to three-dimensional structure, and structure to function. Before we can consider this problem further, however, we must examine how sequence information is obtained.

In 1953, Frederick Sanger worked out the sequence of amino acid residues in the polypeptide chains of the hormone insulin (Fig. 3-24), surprising many researchers who had long thought that determining the amino acid sequence of a polypeptide would be a hopelessly difficult task. The elucidation of DNA structure in that same year by Watson and Crick telegraphed a likely relationship between DNA and protein sequences.
Barely a decade after these discoveries, the genetic code relating the nucleotide sequence of DNA to the amino acid sequence of protein molecules was elucidated (Chapter 27).

FIGURE 3-24 Amino acid sequence of bovine insulin. The two polypeptide chains are joined by disulfide cross-linkages (yellow). The A chain of insulin is identical in human, pig, dog, rabbit, and sperm whale insulins. The B chains of the cow, pig, dog, goat, and horse are identical.

The amino acid sequences of proteins are now most often derived indirectly from the DNA sequences in genome databases. However, an array of techniques derived from traditional methods of polypeptide sequencing made important contributions to the broader field of protein chemistry. Sanger’s sequencing of insulin relied on labeling amino-terminal residues with 1-fluoro-2,4-dinitrobenzene (FDNB); the classical method for direct chemical sequencing of proteins from the amino terminus is the Edman degradation, developed by Pehr Edman.

Protein Structure Is Studied Using Methods That Exploit Protein Chemistry

The sequence of a protein can be predicted from the sequence of the gene encoding it, which is usually available in genomic databases. Sequence information can also be obtained directly by mass spectrometry. Many methods used in traditional protein sequencing protocols remain valuable for labeling proteins or breaking them into parts for functional and structural analysis. For example, the amino-terminal α-amino group of a protein can be labeled with 1-fluoro-2,4-dinitrobenzene (FDNB), dansyl chloride, or dabsyl chloride (Fig. 3-25). These reagents also label the ε-amino group of lysine residues. Disulfide bonds within a polypeptide or between polypeptide subunits can be broken irreversibly (Fig. 3-26).

FIGURE 3-25 Modification of the α-amino group at the amino terminus. The reaction is a nucleophilic displacement of the halide ion as shown for (a) FDNB and (b) dansyl chloride. The ε-amino group of lysine will also be labeled.
Dansyl chloride and (c) dabsyl chloride, another labeling reagent, have useful absorbance and/or fluorescent properties at visible wavelengths.

FIGURE 3-26 Breaking disulfide bonds in proteins. Two common methods are illustrated. Oxidation of a cystine residue with performic acid produces two cysteic acid residues. Reduction by dithiothreitol (or β-mercaptoethanol) to form Cys residues must be followed by further modification of the reactive –SH groups to prevent re-formation of the disulfide bond. Carboxymethylation by iodoacetate serves this purpose.

Frederick Sanger, 1918–2013

Enzymes called proteases catalyze the hydrolytic cleavage of peptide bonds and provide the most common method to break a protein into parts. Some proteases cleave only the peptide bond adjacent to particular amino acid residues (Table 3-6) and thus fragment a polypeptide chain in a predictable and reproducible way. A few chemical reagents also cleave the peptide bond adjacent to specific residues. Among proteases, the digestive enzyme trypsin catalyzes the hydrolysis of only those peptide bonds in which the carbonyl group is contributed by either a Lys or an Arg residue, regardless of the length or amino acid sequence of the chain. A polypeptide with three Lys and/or Arg residues will usually yield four smaller peptides upon cleavage with trypsin. Moreover, all except one of these will have a carboxyl-terminal Lys or Arg.

TABLE 3-6 The Specificity of Some Common Methods for Fragmenting Polypeptide Chains

Reagent (biological source) | Cleavage points
Trypsin (bovine pancreas) | Lys, Arg (C)
Chymotrypsin (bovine pancreas) | Phe, Trp, Tyr (C)
Staphylococcus aureus V8 protease (bacterium S. aureus) | Asp, Glu (C)
Asp-N-protease (bacterium Pseudomonas fragi) | Asp, Glu (N)
Pepsin (porcine stomach) | Leu, Phe, Trp, Tyr (N)
Endoproteinase Lys C (bacterium Lysobacter enzymogenes) | Lys (C)
Cyanogen bromide | Met (C)

Note: All reagents except cyanogen bromide are proteases.
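Because each reagent in Table 3-6 recognizes particular residues and cuts on a defined side of them (C for the carbonyl side, N for the amino side), the fragments it should produce can be predicted from a sequence. The Python sketch below applies the table’s rules to a sequence written in one-letter codes. The rule dictionary and the `digest` function are illustrative, not a published tool, and the sketch models complete digestion only (no missed cleavages or other context effects).

```python
from __future__ import annotations

# Cleavage rules transcribed from Table 3-6, in one-letter codes.
# "C": cut on the carbonyl side of the residue; "N": cut on its amino side.
RULES = {
    "trypsin": ("KR", "C"),
    "chymotrypsin": ("FWY", "C"),
    "V8 protease": ("DE", "C"),
    "Asp-N protease": ("DE", "N"),
    "pepsin": ("LFWY", "N"),
    "Lys-C": ("K", "C"),
    "cyanogen bromide": ("M", "C"),
}


def digest(sequence: str, reagent: str) -> list[str]:
    """Predict the fragments from complete cleavage at every recognized bond."""
    residues, side = RULES[reagent]
    cuts = set()
    for i, aa in enumerate(sequence):
        if aa in residues:
            # A cut index is the position where the next fragment starts.
            cut = i + 1 if side == "C" else i
            if 0 < cut < len(sequence):  # no "cut" at either end of the chain
                cuts.add(cut)
    fragments, start = [], 0
    for cut in sorted(cuts):
        fragments.append(sequence[start:cut])
        start = cut
    fragments.append(sequence[start:])
    return fragments


# Hypothetical peptide with three internal Lys/Arg residues:
print(digest("AGKLVRSTWKAE", "trypsin"))  # ['AGK', 'LVR', 'STWK', 'AE']
```

As the text predicts, three Lys/Arg residues give four tryptic peptides, and all but the carboxyl-terminal one end in Lys or Arg.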
In Table 3-6, the residues listed furnish the primary recognition point for the protease or reagent; peptide bond cleavage occurs on either the carbonyl (C) side or the amino (N) side of the indicated amino acid residues.

The capacity to modify proteins in specific ways has many applications in the lab. The methods used to break disulfide bonds can also be used to denature proteins when that is required. The development of reagents to label the amino-terminal amino acid residue led eventually to the development of an array of reagents that could react with specific groups at many locations on a protein. For example, the sulfhydryl group on Cys residues can be modified with iodoacetamides, maleimides, benzyl halides, and bromomethyl ketones (Fig. 3-27). Other amino acid residues can be modified by reagents linked to a dye or other molecule to aid in protein detection or functional studies. The cleavage of proteins into smaller parts with proteases has numerous applications that will be explored in subsequent chapters of this book.

FIGURE 3-27 Reagents used to modify the sulfhydryl groups of Cys residues. (See also Fig. 3-26.)

Mass Spectrometry Provides Information on Molecular Mass, Amino Acid Sequence, and Entire Proteomes

Mass spectrometry can provide a highly accurate measure of the molecular mass of a protein, readily distinguishing masses that differ by a single proton. However, this technology can do much more. The sequences of multiple short polypeptide segments (20 to 30 amino acid residues each) in a protein sample can be obtained within seconds. Unknown purified proteins can be identified, and their mass can be accurately determined. When coupled to powerful peptide separation protocols, mass spectrometry can document a complete cellular proteome (defined as the entire complement of proteins in a cell, including estimates of their relative abundance) in just an hour.

The mass spectrometer has been an indispensable tool in chemistry for more than a century.
Molecules to be analyzed, referred to as analytes, are first ionized in a vacuum. When the newly charged molecules are introduced into an electric and/or magnetic field, their paths through the field are a function of their mass-to-charge ratio, m/z. This measured property of the ionized species can be used to deduce the mass (m) of the analyte with very high precision.

As the m/z measurements are made in the gas phase, the technique was long limited to relatively small molecules. In 1988, two different techniques were introduced to permit transfer of macromolecules to the gas phase while limiting decomposition; these new capabilities revolutionized protein sequencing. In one technique, proteins are placed in a light-absorbing matrix. With a short pulse of laser light, the proteins are ionized and then desorbed from the matrix into the vacuum system. This process, known as matrix-assisted laser desorption/ionization mass spectrometry, or MALDI MS, is used to measure the mass of macromolecules. In a second method, macromolecules in solution are forced directly from the liquid phase to the gas phase. A solution of analytes is passed through a charged needle that is kept at a high electrical potential, dispersing the solution into a fine mist of charged microdroplets. The solvent surrounding the macromolecules rapidly evaporates, leaving multiply charged macromolecular ions in the gas phase. This technique is called electrospray ionization mass spectrometry, or ESI MS. Protons added during passage through the needle give additional charge to the macromolecule. The m/z of the molecule can then be analyzed in the vacuum chamber.

One method for analyzing m/z is called time of flight, or TOF, in which the time an ion takes to cross a fixed distance after acceleration in an electric field depends on its m/z. A newer, more-efficient method is the Orbitrap, in which ions are trapped in orbit between an outer barrel-shaped electrode and an inner spindle electrode.
The oscillation of the trapped ions, whose frequency depends on their mass and charge, is detected and converted to m/z by a Fourier transform.

The process for determining the molecular mass of a protein with ESI MS is illustrated in Figure 3-28. As a protein is injected into the gas phase, it acquires a variable number of protons, and thus positive charges, from the solvent. The variable addition of these charges creates a spectrum of species with different mass-to-charge ratios. Each successive peak corresponds to a species that differs from that of its neighboring peak by a charge of 1 and a mass of 1 (one proton). The mass of the protein can be determined from any two neighboring peaks.

FIGURE 3-28 Electrospray ionization mass spectrometry of a protein. (a) A protein solution is dispersed into highly charged droplets by passage through a needle under the influence of a high-voltage electric field. The droplets evaporate, and the ions (with added protons in this case) enter the mass spectrometer for m/z measurement. (b) The spectrum generated is a family of peaks, with each successive peak (from right to left) corresponding to a charged species with both mass and charge increased by 1. The inset shows a computer-generated transformation of this spectrum. [Information from M. Mann and M. Wilm, Trends Biochem. Sci. 20:219, 1995.]

Amino acid sequence information is extracted using a technique called tandem MS, or MS/MS. A solution containing the protein or multiple proteins under investigation is first treated with a protease (often trypsin, due to its high specificity) to hydrolyze it to a mixture of shorter peptides. The mixture is then injected into a mass spectrometer that has two mass filters in tandem (Fig. 3-29a, top). In the first, the peptide mixture is sorted so that only one of the several types of peptides produced by cleavage emerges at the other end.
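Why two neighboring ESI peaks suffice can be shown with a little algebra. If the peak at higher m/z, p1, carries z added protons and its neighbor at lower m/z, p2, carries z + 1, then p1 = (M + z·mH)/z and p2 = (M + (z + 1)·mH)/(z + 1), where M is the neutral mass of the protein and mH ≈ 1.00728 Da is the mass of a proton. Eliminating M gives z = (p2 - mH)/(p1 - p2), and then M = z(p1 - mH). A minimal Python sketch follows; the function names and the 47,000 Da test protein are hypothetical.

```python
from __future__ import annotations

PROTON_MASS = 1.00728  # Da, mass added with each proton (the text's "mass of 1")


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of a protein of the given neutral mass carrying `charge` extra protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def deconvolute(mz_high: float, mz_low: float) -> tuple[int, float]:
    """Charge and neutral mass from two neighboring ESI peaks.

    mz_high is the peak with z protons; mz_low, its neighbor at lower m/z, has z + 1."""
    # z must be a whole number, so round away small measurement errors.
    z = round((mz_low - PROTON_MASS) / (mz_high - mz_low))
    mass = z * (mz_high - PROTON_MASS)
    return z, mass


# Simulate two neighboring peaks for a hypothetical 47,000 Da protein, then recover it.
p1, p2 = mz(47_000.0, 30), mz(47_000.0, 31)
z, mass = deconvolute(p1, p2)
print(f"peaks {p1:.2f} and {p2:.2f} -> z = {z}, M = {mass:.1f} Da")
```

Software that produces a transformed spectrum like the inset of Figure 3-28 uses the whole family of peaks rather than a single pair, which averages out errors in individual peak positions.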
The sample of the selected peptide, each molecule of which has a charge somewhere along its length, then travels through a vacuum chamber between the two mass spectrometers. In this collision cell, the peptide is further fragmented by high-energy impact with a “collision gas” such as helium or argon that is bled into the vacuum chamber. Each individual peptide is broken in only one place, on average. Although the breaks are not hydrolytic, most occur at the peptide bonds.

FIGURE 3-29 Obtaining protein sequence information with tandem MS. (a) After proteolytic hydrolysis, a protein solution is injected into a mass spectrometer (MS-1). The different peptides are sorted so that only one type is selected for further analysis. The selected peptide is further fragmented in a chamber between the two mass spectrometers, and m/z for each fragment is measured in the second mass spectrometer (MS-2). Many of the ions generated during this second fragmentation result from breakage of the peptide bond, as shown. These are called b-type or y-type ions, depending on whether the charge is retained on the amino- or carboxyl-terminal side, respectively. (b) A typical spectrum with peaks representing the peptide fragments generated from a sample of one small peptide (21 residues). The labeled peaks are y-type ions derived from amino acid residues. The number in parentheses over each peak is the molecular weight of the amino acid ion. The successive peaks differ by the mass of a particular amino acid in the original peptide. The deduced sequence is shown at the top. [Information from T. Keough et al., Proc. Natl. Acad. Sci. USA 96:7131, 1999, Fig. 3.]

The second mass filter then measures the m/z ratios of all the charged fragments. This process generates one or more sets of peaks. A given set of peaks (Fig. 3-29b) consists of all the charged fragments that were generated by breaking the same type of bond (but at different points in the peptide).
One set of peaks includes only the fragments in which the charge was retained on the amino-terminal side of the broken bonds; another includes only the fragments in which the charge was retained on the carboxyl-terminal side of the broken bonds. Each successive peak in a given set has one less amino acid than the peak before. The difference in mass from peak to peak identifies the amino acid that was lost in each case, thus revealing the sequence of the peptide. The only ambiguities involve leucine and isoleucine, which have the same mass. Although multiple sets of peaks usually are generated, the two most prominent sets generally consist of charged fragments derived from breakage of the peptide bonds. The amino acid sequence derived from one set can be confirmed by the other, improving the confidence in the sequence information obtained.

The analysis of complex mixtures of proteins, even entire cellular proteomes, is facilitated by liquid chromatography (LC) that is integrated into the instrument (LC-MS/MS). The organism of interest is generally one in which the genomic sequence is known. Cellular proteins are first isolated in an extract, then digested into relatively short peptides by a protease such as trypsin. The very complex mixture of peptides is subjected to chromatography, so that resolved peptides are introduced to the mass spectrometer successively. Transfer from the liquid phase to the gas phase is facilitated by MALDI or ESI. Each peptide is analyzed for amino acid sequence, and that sequence is compared to the known genomic sequence available in databases to identify the protein it came from. Because more peptides are generated from the more common proteins in the mixture, the exercise also provides a measure of protein abundance. MS/MS scans of dozens of different peptides can be generated in less than a second. The entire proteome of a yeast cell can be analyzed in less than an hour.
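The peak-to-peak reading of a fragment ladder can be sketched as code. Assuming a complete ladder of singly charged y-type ions (so the smallest peak is the carboxyl-terminal residue plus water plus a proton, and each larger peak adds one residue), successive mass differences are matched against residue masses. The monoisotopic residue masses are standard values; the function names, the 0.02 Da tolerance, and the test peptide are assumptions for illustration. Real spectra contain missing peaks, b-type ions, and multiply charged fragments that this sketch ignores.

```python
from __future__ import annotations

WATER = 18.01056   # Da, monoisotopic H2O
PROTON = 1.00728   # Da
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def y_ladder(peptide: str) -> list[float]:
    """Singly charged y-ion m/z values (y1, y2, ...); used here only to make test data."""
    ions, total = [], WATER + PROTON
    for aa in reversed(peptide):  # y ions grow from the carboxyl terminus
        total += RESIDUE_MASS[aa]
        ions.append(total)
    return ions


def read_y_ladder(y_ions: list[float], tol: float = 0.02) -> list[str]:
    """Infer a sequence (N -> C) from a complete ladder of singly charged y ions."""
    ladder = sorted(y_ions)
    # y1 is the C-terminal residue + water + proton; each later peak adds one residue.
    steps = [ladder[0] - WATER - PROTON] + [b - a for a, b in zip(ladder, ladder[1:])]
    residues = []
    for delta in steps:
        hits = sorted({aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol})
        if not hits:
            raise ValueError(f"no residue matches a mass difference of {delta:.4f} Da")
        residues.append("/".join(hits))  # Leu and Ile share a mass, giving "I/L"
    return residues[::-1]  # the ladder was read C -> N; flip to N -> C


print(read_y_ladder(y_ladder("GASLVK")))  # ['G', 'A', 'S', 'I/L', 'V', 'K']
```

Because Leu and Ile have identical residue masses, the sketch reports them as "I/L", mirroring the one ambiguity noted in the text.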
Mass spectrometry provides a wealth of information for proteomics research, enzymology, and protein chemistry in general. The accurately measured molecular mass of a protein is critical to its identification. Changes in the cellular proteome can be monitored as a function of metabolic state or environmental conditions. Mass changes in the peptides scanned during a proteome analysis can reveal protein modifications of all kinds. Amino acid sequencing can reveal changes in protein sequence that result from the editing of messenger RNA in eukaryotes (Chapter 26). These methods, along with modern DNA sequencing processes (Chapter 8), are all part of a robust toolbox used to probe biological information at many levels.

Small Peptides and Proteins Can Be Chemically Synthesized

Many peptides are potentially useful as pharmacologic agents, and their production is of considerable commercial importance. In addition to its commercial applications, the synthesis of specific peptide portions of larger proteins is an increasingly important tool for the study of protein structure and function. There are three ways to obtain a peptide: (1) purification from tissue, a task often made difficult by the vanishingly low concentrations of some peptides; (2) genetic engineering (Chapter 9); and (3) direct chemical synthesis. Powerful techniques now make direct chemical synthesis an attractive option in many cases. The complexity of proteins makes the traditional synthetic approaches of organic chemistry impractical for peptides with more than four or five amino acid residues. One problem is the difficulty of purifying the product after each step.

The major breakthrough in this technology was provided by R. Bruce Merrifield in 1962. His innovation was to synthesize a peptide while keeping one end attached to a solid support. The support is an insoluble polymer (resin) contained within a column, similar to that used for chromatographic procedures.
The peptide is built up on this support one amino acid at a time, through a standard set of reactions in a repeating cycle (Fig. 3-30). At each successive step in the cycle, protective chemical groups block unwanted reactions.

FIGURE 3-30 Chemical synthesis of a peptide on an insoluble polymer support. Reactions through are necessary for the formation of each peptide bond. The 9-fluorenylmethoxycarbonyl (Fmoc) group (shaded blue) prevents unwanted reactions at the α-amino group of the residue (shaded light red). Chemical synthesis proceeds from the carboxyl terminus to the amino terminus, the reverse of the direction of protein synthesis in vivo.

The technology for chemical peptide synthesis has been automated. An important limitation of the process is the efficiency of each chemical cycle. Incomplete reaction at one stage can lead to formation of an impurity (in the form of a shorter peptide) in the next. The chemistry has been optimized to permit the synthesis of proteins of 100 amino acid residues in a few days in reasonable yield. A very similar approach is used to synthesize nucleic acids (see Fig. 8-33). It is worth noting that this technology, impressive as it is, still pales when compared with biological processes. The same 100-residue protein would be synthesized with exquisite fidelity in about 5 seconds in a bacterial cell.

Methods for the efficient ligation (joining together) of peptides allow the assembly of synthetic peptides into larger polypeptides and proteins. Novel forms of proteins can be created with precisely positioned chemical groups, including those that might not normally be found in a cellular protein. This provides one approach to test theories of enzyme catalysis, to create proteins with altered chemical properties, and to design protein sequences that will fold into particular structures.
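The cost of incomplete reactions compounds with every cycle of chemical synthesis. As a rough model, assuming the first residue is already attached to the resin and each of the remaining n - 1 addition cycles succeeds independently with the same efficiency, the fraction of chains that reach full length is that efficiency raised to the power n - 1. The function name is hypothetical, and the model ignores losses during final cleavage from the resin and purification.

```python
def full_length_fraction(n_residues: int, cycle_efficiency: float) -> float:
    """Fraction of chains that reach full length in stepwise solid-phase synthesis.

    Simplified model: the first residue is already on the resin, and each of the
    remaining n_residues - 1 cycles succeeds independently with the same probability.
    Losses during cleavage from the resin and final purification are ignored."""
    if n_residues < 1 or not 0.0 <= cycle_efficiency <= 1.0:
        raise ValueError("need n_residues >= 1 and 0 <= cycle_efficiency <= 1")
    return cycle_efficiency ** (n_residues - 1)


for eff in (0.98, 0.99, 0.999):
    print(f"{eff:.1%} per cycle -> {full_length_fraction(100, eff):.1%} full-length 100-mer")
```

For a 100-residue chain, 99% efficiency per cycle leaves only about 37% of chains full length, while 99.9% per cycle leaves about 91%, which is why the efficiency of each chemical cycle is the critical limitation.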
The last of these applications, designing protein sequences that fold into particular structures, provides the ultimate test of our ability to relate the primary structure of a peptide to the three-dimensional structure that it takes up in solution.

Amino Acid Sequences Provide Important Biochemical Information

Knowledge of the sequence of amino acids in a protein can offer insights into its three-dimensional structure and its function, cellular location, and evolution. Most of these insights are derived by searching for similarities between a protein of interest and previously studied proteins. Comparison of a newly obtained sequence with sequence data in international repositories often reveals relationships both surprising and enlightening. We do not understand in detail exactly how the amino acid sequence determines three-dimensional structure, nor can we always predict function from sequence. However, protein families that have some shared structural or functional features can be readily identified on the basis of amino acid sequence similarities. Individual proteins are assigned to families based on the degree of similarity in amino acid sequence. Members of a family are usually identical across 25% or more of their sequences, and proteins in these families generally share at least some structural and functional characteristics. Some families, however, are defined by identities involving only a few amino acid residues that are critical to a certain function. A number of similar substructures, or “domains” (to be defined more fully in Chapter 4), occur in many functionally unrelated proteins. These domains often fold into structural configurations that have an unusual degree of stability or that are specialized for a certain environment. Evolutionary relationships can also be inferred from the structural and functional similarities within protein families.

Certain amino acid sequences serve as signals that determine the cellular location, chemical modification, and half-life of a protein.
Special signal sequences, usually at the amino terminus, are used to target certain proteins for export from the cell; other proteins are targeted for distribution to the nucleus, the cell surface, the cytosol, or other cellular locations. Still other sequences act as attachment sites for prosthetic groups, such as sugar groups in glycoproteins and lipids in lipoproteins. Some of these signals are well characterized and are easily recognized in the sequence of a newly characterized protein (Chapter 27).

KEY CONVENTION Much of the functional information encapsulated in protein sequences comes in the form of consensus sequences. The term applies to such sequences in DNA, RNA, or protein. When a series of related nucleic acid sequences or protein sequences are compared, a consensus sequence is the one that reflects the most common base or amino acid at each position. Parts of the sequence that have particularly good agreement often represent evolutionarily conserved functional domains. Mathematical tools available online can generate consensus sequences or identify them in sequence databases. Box 3-2 illustrates common conventions for displaying consensus sequences.

BOX 3-2 Consensus Sequences and Sequence Logos

Consensus sequences can be represented in several ways. To illustrate two types of conventions, we use two examples of consensus sequences (Fig. 1): an ATP-binding structure called a P loop (see Fig. 12-2) and a Ca²⁺-binding structure called an EF hand (see Fig. 12-17). The rules described here are adapted from those used by the sequence comparison website PROSITE (http://prosite.expasy.org/sequence_logo.html), using the standard one-letter codes for the amino acids.

FIGURE 1 Representations of two consensus sequences. (a) P loop, an ATP-binding structure; (b) EF hand, a Ca²⁺-binding structure. [Sequence data for (a) from document ID PDOC00017 and for (b) from document ID PDOC00018, www.expasy.org/prosite, N. Hulo et al., Nucleic Acids Res. 34:D227, 2006.
Sequence logos created with WebLogo, http://weblogo.berkeley.edu, G. E. Crooks et al., Genome Res. 14:1188, 2004.]

In one type of consensus sequence designation, shown at the top of (a) and (b) in Figure 1, each position is separated from its neighbor by a hyphen. A position where any amino acid is allowed is designated x. Ambiguities are indicated by listing the acceptable amino acids for a given position between square brackets. For example, in (a), [AG] means Ala or Gly. If all but a few amino acids are allowed at one position, the amino acids that are not allowed are listed between curly brackets. For example, in (b), {W} means any amino acid except Trp. Repetition of an element of the pattern is indicated by following that element with a number or range of numbers between parentheses. In (a), for example, x(4) means x-x-x-x; x(2,4) means x-x, or x-x-x, or x-x-x-x. When a pattern is restricted to either the amino terminus or carboxyl terminus of a sequence, that pattern starts with < or ends with >, respectively (not so for either example here). A period ends the pattern. Applying these rules to the consensus sequence in (a), either A or G can be found at the first position. Any amino acid can occupy the next four positions, followed by an invariant G and an invariant K. The last position is either S or T.

Sequence logos provide a more informative and graphic representation of an amino acid (or nucleic acid) multiple sequence alignment. Each logo consists of a stack of symbols for each position in the sequence. The overall height of the stack (in bits) indicates the degree of sequence conservation at that position, whereas the height of each symbol (letter) in the stack indicates the relative frequency of that amino acid (or nucleotide). For amino acid sequences, the colors denote the characteristics of the amino acid: polar (G, S, T, Y, C, Q, N), green; basic (K, R, H), blue; acidic (D, E), red; and hydrophobic (A, V, L, I, P, W, F, M), black.
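The pattern rules in this box are regular enough to translate mechanically into a regular expression, and the two quantities behind a logo (the most common residue in each column and the column's conservation in bits) are short calculations. The sketch below is a minimal illustration under our own assumptions: it covers only the syntax summarized in this box, not the full PROSITE grammar; it computes stack height as log2(20) minus the column's Shannon entropy, without the small-sample correction that logo tools such as WebLogo can apply; and its function names and test sequences are hypothetical.

```python
import math
import re
from collections import Counter


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regex string.

    Handles only the conventions in Box 3-2: hyphen-separated elements,
    x, [..], {..}, (n) or (n,m) repeats, < and >, and a final period.
    """
    pattern = pattern.strip()
    if pattern.endswith("."):
        pattern = pattern[:-1]  # a period ends the pattern
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]  # anchored at the amino terminus
        if element.endswith(">"):
            suffix, element = "$", element[:-1]  # anchored at the carboxyl terminus
        repeat = ""
        m = re.fullmatch(r"(.+)\((\d+)(?:,(\d+))?\)", element)
        if m:
            element, low, high = m.group(1), m.group(2), m.group(3)
            repeat = "{" + low + ("," + high if high else "") + "}"
        if element == "x":
            core = "."  # any amino acid
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"  # any amino acid except those listed
        else:
            core = element  # one residue, or [..] alternatives (same syntax in regex)
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def consensus(aligned: list[str]) -> str:
    """Most common residue in each column of equal-length aligned sequences.
    Ties go to the residue seen first, an arbitrary choice."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def column_bits(column: str, alphabet_size: int = 20) -> float:
    """Logo stack height for one column: log2(alphabet size) minus the
    Shannon entropy of the residue frequencies (no small-sample correction)."""
    counts = Counter(column)
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - entropy


if __name__ == "__main__":
    p_loop = prosite_to_regex("[AG]-x(4)-G-K-[ST].")
    print(p_loop)  # [AG].{4}GK[ST]
    hit = re.search(p_loop, "MTEYKLVVVGAGGVGKSALT")  # hypothetical test sequence
    print(hit.group() if hit else "no match")  # GAGGVGKS
    aligned_seqs = ["GAGGVGKS", "GPSGSGKS", "AAGGVGKT"]  # hypothetical alignment
    print(consensus(aligned_seqs))  # GAGGVGKS
    print(round(column_bits("KKK"), 2), round(column_bits("GGA"), 2))
```

A fully conserved column reaches the maximum height of log2(20), about 4.3 bits, while a column split between two residues is lower, which is the visual cue a logo gives for conservation.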
The color-coded classification of amino acids used in these logos is somewhat different from the classification in Table 3-1 and Figure 3-5. The amino acids with aromatic side chains are subsumed into the hydrophobic (F, W) and polar (Y) classifications. Glycine, always hard to classify, is assigned to the polar group. Note that when multiple amino acids are acceptable at a particular position, they rarely occur with equal probability. One or a few usually predominate. The logo representation makes the predominance clear and makes a conserved sequence in a protein obvious. However, the logo obscures some amino acid residues that may be allowed at a position, such as the Cys that occasionally occurs at position 8 of the EF hand in (b).

Protein Sequences Help Elucidate the History of Life on Earth

The simple string of letters denoting the amino acid sequence of a protein holds a surprising wealth of information, which is being unlocked by applying the tools of bioinformatics to genomic and protein sequence data. Each protein’s function relies on its three-dimensional structure, which in turn is determined largely by its primary structure. Thus, the biochemical information conveyed by a protein sequence is limited only by our understanding of structural and functional principles. The constantly evolving tools of bioinformatics make it possible to identify functional segments in new proteins and also help to establish both their sequence and their structural relationships to proteins already in the databases. On a different level of inquiry, protein sequences are beginning to tell us how the proteins evolved and, ultimately, how life evolved on this planet.

The field of molecular evolution is often traced to Emile Zuckerkandl and Linus Pauling, whose work in the mid-1960s advanced the use of nucleotide and protein sequences to explore evolution. The premise is deceptively straightforward. If two organisms are closely related, the sequences of their genes and proteins should be similar.
The sequences increasingly diverge as the evolutionary distance between two organisms increases. The promise of this approach began to be realized in the 1970s, when Carl Woese used ribosomal RNA sequences to define the Archaea as a group of living organisms distinct from the Bacteria and Eukarya. The information in genome and protein sequence databases can be used to trace biological history if we can learn to read the genetic hieroglyphics.

Evolution has not taken a simple linear path. For a given protein, the amino acid residues essential for the activity of the protein are conserved over evolutionary time. The residues that are less important to function may vary over time (that is, one amino acid may substitute for another), and these variable residues can provide the information to trace evolution. Some proteins have more variable amino acid residues than others. For these and other reasons, different proteins evolve at different rates. Another complicating factor in tracing evolutionary history is the rare transfer of a gene or a group of genes from one organism to another, a process called horizontal gene transfer. The transferred genes may be similar to the genes they were derived from in the original organism, whereas most other genes in the two organisms may be only distantly related. An example of horizontal gene transfer is the recent rapid spread of antibiotic-resistance genes in bacterial populations. The proteins derived from these transferred genes would not be good candidates for the study of bacterial evolution, because they share only a very limited evolutionary history with their “host” organisms.

The study of molecular evolution generally focuses on families of closely related proteins.
In most cases, the families chosen for analysis have essential functions in cellular metabolism that must have been present in the earliest viable cells, thus greatly reducing the chance that they were introduced relatively recently by horizontal gene transfer. For example, a protein called EF-1α (elongation factor 1α) is involved in the synthesis of proteins in all eukaryotes. A similar protein, EF-Tu, with the same function, is found in bacteria. Similarities in sequence and function indicate that EF-1α and EF-Tu are members of a family of proteins that share a common ancestor. The members of protein families are called homologous proteins, or homologs. The concept of a homolog can be further refined. If two proteins in a family (that is, two homologs) are present in the same species, they are referred to as paralogs. Homologs from different species are called orthologs. The process of tracing evolution involves first identifying suitable families of homologous proteins and then using them to reconstruct evolutionary paths.

Homologs are identified using computer programs that can directly compare specific protein sequences or that can search databases to identify any protein whose amino acid sequence matches within defined parameters. The electronic search process can be thought of as sliding one sequence past the other until a section with a good match is found. Within this sequence alignment, a positive score is assigned for each position where the two sequences are identical, and a negative score (a penalty) is applied wherever gaps must be introduced in one sequence or the other to bring them into register. The overall score provides a measure of the quality of the alignment (Fig. 3-31). The program selects the alignment with the optimal score, the one that maximizes identical amino acid residues while minimizing the introduction of gaps.

FIGURE 3-31 Aligning protein sequences with the use of gaps.
Shown here is the sequence alignment of a short section of the Hsp70 proteins (a widespread class of protein-folding chaperones) from two well-studied bacterial species, E. coli and Bacillus subtilis. Introduction of a gap in the B. subtilis sequence allows a better alignment of amino acid residues on either side of the gap. Identical amino acid residues are shaded. [Information from R. S. Gupta, Microbiol. Mol. Biol. Rev. 62:1435, 1998, Fig. 2.]

Finding identical amino acids is often inadequate in attempts to identify related proteins or, more importantly, to determine how closely related the proteins are on an evolutionary time scale. A more useful analysis also considers the chemical properties of substituted amino acids. Many of the amino acid differences within a protein family may be conservative; that is, an amino acid residue is replaced by a residue that has similar chemical properties. For example, a Glu residue may substitute in one family member for the Asp residue found in another; both amino acids are negatively charged. Logically, such a conservative substitution should receive a higher score in a sequence alignment than a nonconservative substitution does, such as replacement of the Asp residue with a hydrophobic Phe residue.

For most efforts to find homologies and explore evolutionary relationships, protein sequences are superior to nucleic acid sequences that do not encode a protein or functional RNA. For a nucleic acid, with its four different types of residues, random alignment of nonhomologous sequences will generally yield matches for at least 25% of the positions. Introduction of a few gaps can often increase the fraction of matched residues to 40% or more, and the probability of chance alignment of unrelated sequences becomes quite high. The 20 different amino acid residues in proteins greatly lower the probability of uninformative chance alignments of this type.
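The scoring idea described above (reward identities, give conservative substitutions partial credit, penalize nonconservative substitutions and gaps) can be made concrete with the classic Needleman–Wunsch dynamic-programming algorithm for global alignment. This is a minimal sketch, not the method of any particular search program: database tools typically use substitution matrices such as BLOSUM62 together with faster local or heuristic alignment, and the score values and "conservative" residue groups below are hypothetical choices made only for illustration. A second helper estimates how often two random sequences match by chance, assuming uniform residue composition.

```python
import random

# Hypothetical scores, chosen only for illustration.
MATCH, CONSERVATIVE, MISMATCH, GAP = 2, 1, -1, -2
# Hypothetical groups of chemically similar residues (one-letter codes).
CONSERVATIVE_GROUPS = ("DE", "KRH", "ILVM", "FYW", "ST", "NQ")


def pair_score(a: str, b: str) -> int:
    """Score for placing residue a opposite residue b."""
    if a == b:
        return MATCH
    if any(a in group and b in group for group in CONSERVATIVE_GROUPS):
        return CONSERVATIVE  # e.g. Asp opposite Glu: both negatively charged
    return MISMATCH  # e.g. Asp opposite Phe


def global_align(s: str, t: str) -> tuple[int, str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty.
    Returns the optimal score and one optimal alignment ('-' marks a gap)."""
    n, m = len(s), len(t)
    # score[i][j] is the best score for aligning s[:i] with t[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(s[i - 1], t[j - 1]),  # pair residues
                score[i - 1][j] + GAP,  # gap in t
                score[i][j - 1] + GAP,  # gap in s
            )
    # Trace back from the bottom-right cell to recover one optimal path.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(s[i - 1], t[j - 1]):
            top.append(s[i - 1])
            bottom.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            top.append(s[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(t[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def random_identity(alphabet: str, length: int = 10_000, seed: int = 0) -> float:
    """Fraction of identical positions between two random, ungapped sequences
    of uniform composition (a simplification of real sequence composition)."""
    rng = random.Random(seed)
    a = rng.choices(alphabet, k=length)
    b = rng.choices(alphabet, k=length)
    return sum(x == y for x, y in zip(a, b)) / length


if __name__ == "__main__":
    total, aligned_s, aligned_t = global_align("GAGGVGKS", "GAGVGKS")  # hypothetical fragments
    print(total)
    print(aligned_s)
    print(aligned_t)
    print(f"DNA: {random_identity('ACGT'):.2f}")
    print(f"protein: {random_identity('ACDEFGHIKLMNPQRSTVWY'):.2f}")
```

Because Asp opposite Glu scores +1 here while Asp opposite Phe scores −1, an alignment that lines up chemically similar residues is preferred even where they are not identical. The random-identity helper returns values near 0.25 for a four-letter alphabet and near 0.05 for twenty amino acids, the arithmetic behind the point that chance matches are far less common in protein alignments.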
The programs used to generate a sequence alignment are complemented by methods that test the reliability of the alignments. A common computerized test is to shuffle the amino acid sequence of one of the proteins being compared in order to produce a random sequence, then to instruct the program to align the shuffled sequence with the other, unshuffled one. Scores are assigned to the new alignment, and the shuffling and alignment process is repeated many times. The original alignment, before shuffling, should have a score significantly higher than any of those within the distribution of scores generated by the random alignments; this increases the confidence that the sequence alignment has identified a pair of homologs. Note that the absence of a significant alignment score does not necessarily mean that no evolutionary relationship exists between two proteins. As we shall see in Chapter 4, three-dimensional structural similarities sometimes reveal evolutionary relationships where sequence homology has been wiped away by time.

To use a protein family to explore evolution, researchers identify family members with similar molecular functions in the widest possible range of organisms. The sequence divergence in these protein families allows segregation of organisms into classes based on their evolutionary relationships. Certain segments of a protein sequence may be found in the organisms of one taxonomic group but not in other groups; these segments can be used as signature sequences for the group in which they are found. An example of a signature sequence is an insertion of 12 amino acids near the amino terminus of the EF-1α/EF-Tu proteins in all archaea and eukaryotes but not in bacteria (Fig. 3-32). This particular signature is one of many biochemical clues that can help establish the evolutionary relatedness of eukaryotes and archaea.

FIGURE 3-32 A signature sequence in the EF-1α/EF-Tu protein family.
The signature sequence (boxed) is a 12-residue insertion near the amino terminus of the sequence. Residues that align in all species are shaded. Both archaea and eukaryotes have the signature, although the sequences of the insertions are distinct for the two groups. The variation in the signature sequence reflects the significant evolutionary divergence that has occurred at this site since it first appeared in a common ancestor of both groups. [Information from R. S. Gupta, Microbiol. Mol. Biol. Rev. 62:1435, 1998, Fig. 7.]

By considering the sequences of multiple proteins, researchers can construct elaborate evolutionary trees. Figure 3-33 presents one such tree for 10,462 bacterial species, based on the sequences of 120 proteins ubiquitous in bacteria. In Figure 3-33, the free end points of lines are called “external nodes”; each represents an extant species, and each is so labeled. The points where two lines come together, the “internal nodes,” represent extinct ancestor species. In most representations (including Fig. 3-33), the lengths of the lines connecting the nodes reflect amino acid substitutions in the selected proteins that separate one species from another. The use of 120 different proteins permits calibration and a more accurate determination of the time required for the various species to diverge.

FIGURE 3-33 Evolutionary tree derived from amino acid sequence comparisons. This tree includes data from 10,462 bacterial species. The leaf nodes (points of intersection with the outer circle) represent extant species. Inner nodes (points where lines emanating from the leaf nodes come together) represent extinct ancestral species. Line lengths correspond to evolutionary time, as measured by sequence divergence in 120 proteins common to all bacteria. The major bacterial phyla are denoted by lines encompassing parts of the outer circle. [Information from D. H. Parks, Nat. Biotechnol. 36:996, 2018, Fig. 1c.]
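Branch lengths in such trees start from counts of amino acid differences between aligned sequences. The simplest such measure, the uncorrected p-distance (the fraction of aligned positions that differ), is sketched below purely as an illustration: real tree-building pipelines, including the one behind Figure 3-33, generally rely on statistical substitution models rather than raw p-distances, and the sequences and species labels here are hypothetical. Gap positions are simply skipped, which is one of several possible conventions.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


# Hypothetical aligned fragments, not real protein data.
ALIGNMENT = {
    "species_1": "GHVDHGKTTL",
    "species_2": "GHVDSGKSTL",
    "species_3": "GHIDSGKST-",
}

if __name__ == "__main__":
    for (name_a, seq_a), (name_b, seq_b) in combinations(ALIGNMENT.items(), 2):
        print(f"{name_a} vs {name_b}: {p_distance(seq_a, seq_b):.2f}")
```

Here species_2 and species_3 differ at only 1 of 9 comparable positions (about 0.11), so a tree built from these distances would place them closest together.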
As more sequence information is made available in databases, we move toward one of the core goals of biology: creating a detailed tree of life that describes the evolution and relationship of every organism on Earth. The story is a work in progress. The questions being asked and answered are fundamental to how humans view themselves and the world around them.

SUMMARY 3.4 The Structure of Proteins: Primary Structure

Differences in protein function result from differences in amino acid composition and sequence. The chemical properties of particular amino acid residues are often critical to the function of a protein. Most amino acid sequences are deduced from genomic sequences or determined by mass spectrometry. Methods derived from classical approaches to protein sequencing remain important in protein chemistry. Short proteins and peptides (up to about 100 residues) can be chemically synthesized. The peptide is built up, one amino acid residue at a time, while tethered to a solid support. Protein sequences are a rich source of information about protein structure and function. Bioinformatics can analyze changes in the amino acid sequences of homologous proteins over time to trace the evolution of life on Earth.

Chapter Review

KEY TERMS

Terms in bold are defined in the glossary.
amino acids; residue; R group; chiral center; enantiomers; absolute configuration; D, L system; polarity; absorbance, A; zwitterion; amphoteric; ampholyte; isoelectric pH (isoelectric point, pI); peptide; protein; peptide bond; condensation; hydrolysis; oligopeptide; polypeptide; amino-terminal residue; carboxyl-terminal residue; oligomeric protein; protomer; conjugated protein; prosthetic group; crude extract; fraction; fractionation; dialysis; column chromatography; ion-exchange chromatography; cation-exchange resin; anion-exchange resin; size-exclusion chromatography; affinity chromatography; high-performance liquid chromatography (HPLC); electrophoresis; sodium dodecyl sulfate (SDS); isoelectric focusing; specific activity; primary structure; secondary structure; tertiary structure; quaternary structure; proteases; mass spectrometry; proteome; analyte; matrix-assisted laser desorption/ionization mass spectrometry (MALDI MS); electrospray ionization mass spectrometry (ESI MS); tandem mass spectrometry (MS/MS); consensus sequence; bioinformatics; horizontal gene transfer; homologous proteins; homologs; paralogs; orthologs; signature sequence

PROBLEMS

1. Amino Acid Constituents of Glutathione
Glutathione is an important peptide antioxidant found in cells from bacteria to humans. Identify the three amino acid constituents of glutathione. What is unusual about glutathione’s structure?

2. Absolute Configuration of Ornithine
Ornithine is an amino acid that is not a building block of proteins. Instead, ornithine is an important intermediate in the urea cycle, the metabolic process that facilitates the excretion of ammonia waste products in animals. What is the absolute configuration of the ornithine molecule shown here?

3. Relationship between the Titration Curve and the Acid-Base Properties of Glycine
A researcher titrated a 100 mL solution of 0.1 M glycine at pH 1.72 with 2 M NaOH solution. She then monitored the pH and plotted the results in the graph shown. The key points in the titration are designated I to V.
For each of the following statements, identify the appropriate key point in the titration. Note that statement (k) applies to more than one key point in the titration.
a. The pH is equal to the pKa of the carboxyl group.
b. The pH is equal to the pKa of the protonated amino group.
c. The predominant glycine species is ⁺H₃N–CH₂–COOH.
d. The predominant glycine species is ⁺H₃N–CH₂–COO⁻.
e. Glycine exists as a 50:50 mixture of ⁺H₃N–CH₂–COOH and ⁺H₃N–CH₂–COO⁻.
f. The average net charge of glycine is +½.
g. Half of the amino groups are ionized.
h. The average net charge of glycine is 0.
i. The average net charge of glycine is −1.
j. This is the isoelectric point for glycine.
k. Glycine has its maximum buffering capacity at these regions.

4. Charge States of Alanine at Its pI
At a pH equal to the isoelectric point (pI) of alanine, the net charge on alanine is zero. Two structures can be drawn that have a net charge of zero, but the predominant form of alanine at its pI is zwitterionic.
a. Why is alanine predominantly zwitterionic at its pI?
b. What fraction of alanine is in the completely uncharged form at its pI?

5. Ionization State of Histidine
Each ionizable group of an amino acid can exist in one of two states, charged or neutral. The electric charge on the functional group is determined by the relationship between its pKa and the pH of the solution. This relationship is described by the Henderson-Hasselbalch equation.
a. Histidine has three ionizable functional groups. Write the equilibrium equations for its three ionizations, and assign the proper pKa for each ionization. Draw the structure of histidine in each ionization state. What is the net charge on the histidine molecule in each ionization state?
b. Draw the structures of the predominant ionization state of histidine at pH 1, 4, 8, and 12. Note that you can approximate the ionization state by treating each ionizable group independently.
c. What is the net charge of histidine at pH 1, 4, 8, and 12?
For each pH, will histidine migrate toward the anode (+) or toward the cathode (−) when placed in an electric field?

6. Separation of Amino Acids by Ion-Exchange Chromatography
We can analyze mixtures of amino acids by first separating the mixture into its components through ion-exchange chromatography. Amino acids placed on a cation-exchange resin (see Fig. 3-17a) containing sulfonate (−SO₃⁻) groups flow down the column at different rates because of two factors that influence their movement: (1) ionic attraction between the sulfonate residues on the column and positively charged functional groups on the amino acids, and (2) aggregation of nonpolar amino acid side chains with the hydrophobic backbone of the polystyrene resin. Note that the ionic attraction is more significant than hydrophobicity for this column medium. For each pair of amino acids listed, determine which will be eluted first from the cation-exchange column by a pH 7.0 buffer.
a. Glutamate and lysine
b. Arginine and methionine
c. Aspartate and valine
d. Glycine and leucine
e. Serine and alanine

7. Naming the Stereoisomers of Isoleucine
Consider the structure of the amino acid isoleucine.
a. How many chiral centers does isoleucine have?
b. How many optical isomers does isoleucine have?
c. Draw perspective formulas for all the optical isomers of isoleucine.

8. Comparing the pKa Values of Alanine and Polyalanine
The titration curve of alanine shows the ionization of two functional groups with pKa values of 2.34 and 9.69, corresponding to the ionization of the carboxyl and the protonated amino groups, respectively. The titration of di-, tri-, and larger oligopeptides of alanine also shows the ionization of only two functional groups, although the experimental pKa values are different. The table summarizes the trend in pKa values.

Amino acid or peptide      pK1     pK2
Ala                        2.34    9.69
Ala–Ala                    3.12    8.30
Ala–Ala–Ala                3.39    8.03
Ala–(Ala)n–Ala, n ≥ 4      3.42    7.94

a. Draw the structure of Ala–Ala–Ala.
Identify the functional groups associated with pK1 and pK2.
b. Why does the value of pK1 increase with each additional Ala residue in the oligopeptide?
c. Why does the value of pK2 decrease with each additional Ala residue in the oligopeptide?

9. Bonds Form by Condensation
The peptide bond is an amide, generated by eliminating the elements of water from the two amino acids so joined. From which groups are the three atoms of water eliminated?

10. The Size of Proteins
Calculate the approximate molecular weight of a protein composed of 682 amino acid residues in a single polypeptide chain.

11. Relationship between the Number of Amino Acid Residues and Protein Mass
Experimental results describing a protein’s amino acid composition are useful to estimate the molecular weight of the entire protein. A quantitative amino acid analysis reveals that bovine cytochrome c contains 2% cysteine (Mr 121) by weight.
a. Calculate the approximate molecular weight in daltons of bovine cytochrome c if the number of cysteine residues is 2.
Bovine chymotrypsinogen has a molecular mass of 25.6 kDa. Amino acid analysis shows that this enzyme is 4.7% Gly (Mr 75.1).
b. Calculate how many glycine residues are present in a molecule of bovine chymotrypsinogen.

12. Subunit Composition of a Protein
A protein has a molecular mass of 400 kDa when measured by size-exclusion chromatography. When subjected to gel electrophoresis in the presence of sodium dodecyl sulfate (SDS), the protein gives three bands with molecular masses of 180, 160, and 60 kDa. When electrophoresis is carried out in the presence of SDS and dithiothreitol, three bands again form, this time with molecular masses of 160, 90, and 60 kDa. How many subunits does the protein have, and what is the molecular mass of each?

13. Net Electric Charge of Peptides
A peptide has the sequence Glu–His–Trp–Ser–Gly–Leu–Arg–Pro–Gly.
a. Calculate the net charge of the molecule at pH 3, 8, and 11.
(Incorporation into a peptide can alter pKa values somewhat, but for this exercise, use pKa values for side chains and terminal amino and carboxyl groups as given in Table 3-1.)
b. Estimate the pI for this peptide.

14. Isoelectric Point of Histones
Histones are proteins found in eukaryotic cell nuclei, tightly bound to DNA, which has many phosphate groups. The pI of histones is very high, about 10.8. What amino acid residues must be present in relatively large numbers in histones? In what way do these residues contribute to the strong binding of histones to DNA?

15. Solubility of Polypeptides
One method for separating polypeptides makes use of their different solubilities. The solubility of large polypeptides in water depends on the relative polarity of their R groups, particularly on the number of ionized groups: the more ionized groups there are, the more soluble the polypeptides are. Which of each pair of polypeptides is more soluble at the indicated pH?
a. (Gly)₂₀ or (Glu)₂₀ at pH 7.0
b. (Lys–Val)₃ or (Phe–Cys)₃ at pH 7.0
c. (Ala–Ser–Gly)₅ or (Asn–Ser–His)₅ at pH 6.0
d. (Ala–Asp–Phe)₅ or (Asn–Ser–His)₅ at pH 3.0

16. Purification of an Enzyme
A biochemist discovers and purifies a new enzyme, generating the purification table shown.

Procedure                          Total protein (mg)   Activity (units)
1. Crude extract                   10,000               68,000
2. Precipitation (salt)             5,000               65,000
3. Precipitation (pH)               4,000               56,000
4. Ion-exchange chromatography         70               49,000
5. Affinity chromatography             12               42,000
6. Size-exclusion chromatography        8               40,000

a. From the information given in the table, calculate the specific activity of the enzyme after each purification procedure.
b. Which of the purification procedures used for this enzyme is most effective (i.e., gives the greatest relative increase in purity)?
c. Which of the purification procedures is least effective?
d. Is there any indication based on the results shown in the table that the enzyme after step 6 is now pure?
What else could be done to estimate the purity of the enzyme preparation?

17. De-salting a Protein by Dialysis
A purified protein is in a Hepes (N-(2-hydroxyethyl)piperazine-N′-(2-ethanesulfonic acid)) buffer at pH 7 with 500 mM NaCl. A dialysis membrane tube holds a 1 mL sample of the protein solution. The sample in the dialysis membrane floats in a beaker containing 1 L of the same Hepes buffer, but with 0 mM NaCl, for dialysis. Small molecules and ions (such as Na⁺, Cl⁻, and Hepes) can diffuse across the dialysis membrane, but the protein cannot.
a. Calculate the concentration of NaCl in the protein sample, once the dialysis has come to equilibrium. Assume that no volume changes occur in the sample during the dialysis.
b. Calculate the final NaCl concentration in the protein sample after dialysis in 250 mL of the same Hepes buffer, with 0 mM NaCl, twice in succession.

18. Predicting Cation Exchange Elution Order
Suppose a column is filled with a cation-exchange resin at pH 7.0. In what order would the given peptides elute from the column if each has the same number of residues?
Peptide A: Ala 30%, Asp 10%, Lys 10%, Ser 15%, Pro 25%, Cys 10%
Peptide B: Ile 25%, Asp 20%, Arg 5%, Tyr 15%, His 5%, Thr 30%
Peptide C: Ala 40%, Glu 5%, Arg 20%, Ser 5%, His 5%, Trp 25%

19. Protein Analysis by Gel Electrophoresis
Chymotrypsin is a protease with a molecular mass of 25.6 kDa. The figure shows a stained SDS polyacrylamide gel with a single band in lane 1 and three bands of lower molecular weight in lane 2. Lane 1 contains a preparation of chymotrypsin, and lane 2 contains chymotrypsin pretreated with performic acid. Why does performic acid treatment of chymotrypsin generate three bands in lane 2?

20. Sequence Determination of Leucine Enkephalin
Suppose a researcher isolates a peptide from brain tissue that binds to the same receptor as opiate drugs. This peptide is leucine enkephalin, a member of a class of endogenous opioid peptides that act within the brain to lower pain sensation.
The researcher performs a series of procedures to determine the peptide’s sequence. First, she completely hydrolyzes the peptide by boiling it in 18% w/w HCl solution. Analysis of the hydrolysis products indicates the presence of Gly, Leu, Phe, and Tyr, in a 2:1:1:1 molar ratio. Second, she treats the peptide with 1-dimethylaminonaphthalene-5-sulfonyl chloride (dansyl chloride) before subjecting it to complete hydrolysis. Chromatography indicates the presence of the dansylamino acid derivative of tyrosine. No free tyrosine is present.
a. Given the empirical composition and the results from the dansyl chloride reaction, where is Tyr located in the peptide?
Finally, the researcher incubates the peptide with chymotrypsin for two hours at 10 °C and analyzes the products using chromatography. Complete digestion of the peptide with chymotrypsin followed by chromatography yields free tyrosine and leucine, plus a tripeptide containing Phe and Gly in a 1:2 ratio.
b. Give the final sequence for the peptide based on the results of the acid hydrolysis, dansyl chloride reaction, and the chymotryptic digestion.

21. Analysis of a Protein by Mass Spectrometry
Investigators purify a protein produced by yeast grown under standard growth conditions. They incubate the protein with trypsin and sequence the peptides produced using mass spectrometry. One of the detected peptides, called peptide X, has the sequence Ala–Ser–Ala–Gly–Lys–Glu–Leu–Ile–Phe–Gln. The investigators then isolate the same protein, but this time from yeast grown under the stress of ultraviolet irradiation. When the sample is analyzed, a peptide with the mass of peptide X is no longer found. Instead, detection reveals a new peptide with the same sequence, except for an amino acid that replaces Ser and has a molecular mass of 167 Da. The investigators conclude that the protein has been altered in response to stress, and that the serine residue in the analyzed peptide has been modified.
An unmodified serine residue has a molecular mass of 87 Da. What modification might account for the change in the peptide’s mass?

22. Structure of a Peptide Antibiotic from Bacillus brevis Extracts from the bacterium Bacillus brevis contain a peptide with antibiotic properties. This peptide forms complexes with metal ions and seems to disrupt ion transport across the cell membranes of other bacterial species, leading to bacterial death. The structure of the peptide has been determined from a series of observations.
a. Complete acid hydrolysis of the peptide, followed by amino acid analysis, yielded equimolar amounts of Leu, Orn, Phe, Pro, and Val. Orn is ornithine, an amino acid not present in proteins but present in some peptides. Ornithine has the structure
b. The molecular weight of the peptide is approximately 1,200 Da.
c. The peptide failed to undergo hydrolysis when treated with the enzyme carboxypeptidase. This enzyme catalyzes the hydrolysis of the carboxyl-terminal residue of a polypeptide unless the residue is Pro or, for some reason, does not contain a free carboxyl group.
d. Treatment of the intact peptide with 1-fluoro-2,4-dinitrobenzene, followed by complete hydrolysis and chromatography, yielded only free amino acids and the derivative shown here. (Hint: The 2,4-dinitrophenyl derivative involves the amino group of a side chain rather than the α-amino group.)
e. Partial hydrolysis of the peptide followed by chromatographic separation and sequence analysis yielded these di- and tripeptides (the amino-terminal amino acid is always the first amino acid): Leu–Phe, Phe–Pro, Orn–Leu, Val–Orn, Val–Orn–Leu, Phe–Pro–Val, Pro–Val–Orn.
Given this information, deduce the amino acid sequence of the peptide antibiotic. Show your reasoning. When you have arrived at a structure, demonstrate that it is consistent with each experimental observation.

23. Efficiency in Peptide Synthesis A peptide with the primary structure Lys–Arg–Pro–Leu–Ile–Asp–Gly–Ala must be synthesized by the methods developed by Merrifield. Calculate the percentage of the peptides synthesized that will be full length and have the correct sequence if the addition of each amino acid residue is 96% efficient. Do the calculation a second time, but assume a 99% efficiency for each cycle.

24. Sequence Comparisons Proteins called molecular chaperones (described in Chapter 4) assist in the process of protein folding. One class of chaperones, found in organisms from bacteria to mammals, is heat shock protein 90 (Hsp90). All Hsp90 chaperones contain a 10 amino acid “signature sequence” that allows ready identification of these proteins in sequence databases. Two representations of this signature sequence are shown here. a. In this sequence, which amino acid residues are invariant (conserved across all species)? b. At which position(s) are amino acids limited to those with positively charged side chains? For each position, which amino acid is more commonly found? c. At which positions are substitutions restricted to amino acids with negatively charged side chains? For each position, which amino acid predominates? d. There is one position that can be any amino acid, although one amino acid appears much more often than any other. What position is this, and which amino acid appears most often?

25. Chromatographic Methods Three polypeptides, the sequences of which are represented using the one-letter code for their amino acids, are present in a mixture:
1. ATKNRASCLVPKHGALMFWRHKQLVSDPILQKRQHILVCRNAAG
2. GPYFGDEPLDVHDEPEEG
3. PHLLSAWKGMEGVGKSQSFAALIVILA
a. Of the three, which one would migrate most slowly during each of the following: anion-exchange chromatography? cation-exchange chromatography? size-exclusion (gel-filtration) chromatography? b. Which peptide contains the ATP-binding motif shown in the sequence logo?

DATA ANALYSIS PROBLEM

26. Determining the Amino Acid Sequence of Insulin Figure 3-24 shows the amino acid sequence of bovine insulin, determined by Frederick Sanger and his coworkers. Most of this work is described in a series of articles published in the Biochemical Journal from 1945 to 1955. In 1945, researchers knew that insulin was a small protein consisting of two or four polypeptide chains linked by disulfide bonds. Sanger’s team had developed a few simple methods for studying protein sequences.
Treatment with FDNB. FDNB (1-fluoro-2,4-dinitrobenzene) reacted with free amino (but not amide or guanidinium) groups in proteins to produce dinitrophenyl (DNP) derivatives of amino acids (see Fig. 3-25).
Acid Hydrolysis. Boiling a protein with 10% HCl for several hours hydrolyzed all of its peptide and amide bonds. Short treatments produced short polypeptides; the longer the treatment, the more complete the breakdown of the protein into its amino acids.
Oxidation of Cysteines. Treatment of a protein with performic acid cleaved all the disulfide bonds and converted all Cys residues to cysteic acid residues (see Fig. 3-26).
Paper Chromatography. This more primitive version of thin-layer chromatography (see Fig. 10-25) separated compounds based on their chemical properties, allowing identification of single amino acids and, in some cases, dipeptides. Thin-layer chromatography also separates larger peptides.
As reported in his first paper (1945), Sanger reacted insulin with FDNB and hydrolyzed the resulting protein. He found many free amino acids, but only three DNP–amino acids: α-DNP-glycine (DNP group attached to the α-amino group), α-DNP-phenylalanine, and ε-DNP-lysine (DNP attached to the ε-amino group). Sanger interpreted these results as showing that insulin had two protein chains: one with Gly at its amino terminus and one with Phe at its amino terminus. One of the two chains also contained a Lys residue, not at the amino terminus.
Sanger named the chain beginning with a Gly residue “A” and the chain beginning with Phe “B.” a. Explain how Sanger’s results support his conclusions. b. Are the results consistent with the known structure of bovine insulin (see Fig. 3-24)?
In a later paper (1949), Sanger described how he used these techniques to determine the first few amino acids (amino-terminal end) of each insulin chain. To analyze the B chain, for example, he carried out the following steps:
1. Oxidized insulin to separate the A and B chains.
2. Prepared a sample of pure B chain with paper chromatography.
3. Reacted the B chain with FDNB.
4. Gently acid-hydrolyzed the protein so that some small peptides would be produced.
5. Separated the DNP-peptides from the peptides that did not contain DNP groups.
6. Isolated four of the DNP-peptides, which were named B1 through B4.
7. Strongly hydrolyzed each DNP-peptide to give free amino acids.
8. Identified the amino acids in each peptide with paper chromatography.
The results were as follows:
B1: α-DNP-phenylalanine only
B2: α-DNP-phenylalanine; valine
B3: aspartic acid; α-DNP-phenylalanine; valine
B4: aspartic acid; glutamic acid; α-DNP-phenylalanine; valine
c. Based on these data, what are the first four (amino-terminal) amino acids of the B chain? Explain your reasoning. d. Does this result match the known sequence of bovine insulin (Fig. 3-24)? Explain any discrepancies.
Sanger and colleagues used these and related methods to determine the entire sequence of the A and B chains. Their sequence for the A chain was as follows:
Because acid hydrolysis had converted all Asn to Asp and all Gln to Glu, these residues had to be designated Asx and Glx, respectively (exact identity in the peptide unknown). Sanger solved this problem by using protease enzymes that cleave peptide bonds, but not the amide bonds in Asn and Gln residues, to prepare short peptides.
He then determined the number of amide groups present in each peptide by measuring the NH₄⁺ released when the peptide was acid-hydrolyzed. Some of the results for the A chain are shown below. The peptides may not have been completely pure, so the numbers were approximate, but good enough for Sanger’s purposes.
Peptide name (peptide sequence): number of amide groups in peptide
Ac1 (Cys–Asx): 0.7
Ap15 (Tyr–Glx–Leu): 0.98
Ap14 (Tyr–Glx–Leu–Glx): 1.06
Ap3 (Asx–Tyr–Cys–Asx): 2.10
Ap1 (Glx–Asx–Tyr–Cys–Asx): 1.94
Ap5pa1 (Gly–Ile–Val–Glx): 0.15
Ap5 (Gly–Ile–Val–Glx–Glx–Cys–Cys–Ala–Ser–Val–Cys–Ser–Leu): 1.16
e. Based on these data, determine the amino acid sequence of the A chain. Explain how you reached your answer. Compare your answer with Figure 3-24.

References
Sanger, F. 1945. The free amino groups of insulin. Biochem. J. 39:507–515.
Sanger, F. 1949. The terminal peptides of insulin. Biochem. J. 45:563–574.

Stems are from the chapter Problems section; correct choices are drawn from Abbreviated Solutions to Problems (Appendix B) in the same edition.

Practice questions (from chapter Problems & Appendix B)

1. Amino Acid Constituents of Glutathione Glutathione is an important peptide antioxidant found in cells from bacteria to humans. Identify the three amino acid constituents of glutathione. What is unusual about glutathione’s structure?

A. The constituents are Glu, Cys, and Gly. The Glu links to the Cys via its γ-carboxyl group.
B. The arrows correspond to the orientation of the peptide bonds, —CO→NH—.
C. The protein has four subunits, with molecular masses of 160, 90, 90, and 60 kDa. The two 90 kDa subunits (possibly identical) are linked by one or more disulfide bonds.
D. a. (Glu)₂₀ b. (Lys–Val)₃ c. (Asn–Ser–His)₅ d. (Asn–Ser–His)₅

2. Absolute Configuration of Ornithine Ornithine is an amino acid that is not a building block of proteins. Instead, ornithine is an important intermediate in the urea cycle, the metabolic process that facilitates the excretion of ammonia waste products in animals. What is the absolute configuration of the ornithine molecule shown here?

A. a. 10,300. The elements of water are lost when a peptide bond forms, so the molecular weight of a Cys residue is not the same as the molecular weight of free cysteine. b. 21
B. One H comes from the α-amino group of one amino acid, and an OH is removed from the α-carboxyl group of the amino acid to which the first is joined.
C. 75,000
D. It is L-ornithine, because the amino group occupies the same relative position as the hydroxyl group in L-glyceraldehyde.

3. Relationship between the Titration Curve and the Acid-Base Properties of Glycine A researcher titrated a 100 mL solution of 0.1 M glycine at pH 1.72 with 2 M NaOH solution. She then monitored the pH and plotted the results in the graph shown. The key points in the titration are designated I to V. For each of the following statements, identify the appropriate key point in the titration. Note that statement (k) applies to more than one key point in the titration. a. The pH is equal to the pKa of the carboxyl group. b. The pH is equal to the pKa of the protonated amino group. c. The predominant glycine species is ⁺H₃N—CH₂—COOH. d. The predominant glycine species is ⁺H₃N—CH₂—COO⁻. e. Glycine exists as a 50:50 mixture of ⁺H₃N—CH₂—COOH and ⁺H₃N—CH₂—COO⁻. f. The average net charge of glycine is +½. g. Half of the amino groups are ionized. h. The average net charge of glycine is 0. i. The average net charge of glycine is −1. j. This is the isoelectric point for glycine. k. Glycine has its maximum buffering capacity at these regions.

A. Phosphorylation of serine would alter the mass by 80.
B. The constituents are Glu, Cys, and Gly. The Glu links to the Cys via its γ-carboxyl group.
C. a. II b. IV c. I d. III e. II f. II g. IV h. III i. V j. III k. II and IV
D. a. Structure at pH 7: (structure not reproduced) b. Electrostatic interaction between the carboxylate anion and the protonated amino group of the alanine zwitterion favorably affects ionization of the carboxyl group. This favorable electrostatic interaction decreases as the length of the poly(Ala) increases, resulting in an increase in pK1. c. Ionization of the protonated amino group destroys the favorable electrostatic interaction noted in (…

4. Charge States of Alanine at Its pI At a pH equal to the isoelectric point (pI) of alanine, the net charge on alanine is zero. Two structures can be drawn that have a net charge of zero, but the predominant form of alanine at its pI is zwitterionic. a. Why is alanine predominantly zwitterionic at its pI? b. What fraction of alanine is in the completely uncharged form at its pI?

A. a. (structures not reproduced) b., c. pH 1: structure 1, net charge +2, migrates toward the cathode; pH 4: structure 2, net charge +1, migrates toward the cathode; pH 8: structure 3, net charge 0, does not migrate; pH 12: structure 4, net charge −1, migrates toward the anode.
B. a. Y (Tyr) at position 1, F (Phe) at position 7, and R (Arg) at position 9. b. Positions 4 and 9; K (Lys) is more common at 4, R (Arg) is invariant at 9. c. Positions 5 and 10; E (Glu) is more common at both positions. d. Position 2; S (Ser).
C. Lys, His, Arg; negatively charged phosphate groups in DNA interact with positively charged side groups in histones.
D. a. pI > pKa of the α-carboxyl group and pI < pKa of the α-amino group, so both groups are charged (ionized). b. 1 in 2.19 × 10⁷. The pI of alanine is 6.01. From Table 3-1 and the Henderson-Hasselbalch equation, 1 in 4,680 carboxyl groups and 1 in 4,680 amino groups are uncharged. The fraction of alanine molecules with both groups uncharged is 1 in 4,680².
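The arithmetic behind part (b) can be checked with a short Python sketch. It assumes the alanine pKa values quoted in Problem 8 (2.34 and 9.69) and treats the two groups as independent, as the Henderson-Hasselbalch argument does. Using the unrounded pI of 6.015 gives about 1 in 2.24 × 10⁷ molecules fully uncharged, the same order as the 1 in 2.19 × 10⁷ above, which rounds both ratios to 4,680. The function and variable names are illustrative.

```python
# Fraction of alanine with both alpha groups uncharged at its pI.
# Assumes the pKa values quoted in Problem 8 and independent ionization.
PK_COOH = 2.34  # alpha-carboxyl group
PK_NH3 = 9.69   # protonated alpha-amino group


def uncharged_fractions(ph: float) -> tuple[float, float]:
    """Return (fraction of carboxyl as -COOH, fraction of amino as -NH2)."""
    f_cooh = 1.0 / (1.0 + 10 ** (ph - PK_COOH))  # protonated, neutral acid
    f_nh2 = 1.0 / (1.0 + 10 ** (PK_NH3 - ph))    # deprotonated, neutral base
    return f_cooh, f_nh2


pI = (PK_COOH + PK_NH3) / 2             # midpoint of the two pKa values
f_cooh, f_nh2 = uncharged_fractions(pI)
both_uncharged = f_cooh * f_nh2         # independent groups, so fractions multiply

print(f"pI = {pI:.3f}")
print(f"1 in {1 / f_cooh:,.0f} carboxyl groups is uncharged")
print(f"1 in {1 / both_uncharged:.3g} molecules is completely uncharged")
```

At the pI the two ratios are equal by construction, which is why the answer can simply square one of them.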

5. Ionization State of Histidine Each ionizable group of an amino acid can exist in one of two states, charged or neutral. The electric charge on the functional group is determined by the relationship between its pKa and the pH of the solution. This relationship is described by the Henderson-Hasselbalch equation. a. Histidine has three ionizable functional groups. Write the equilibrium equations for its three ionizations, and assign the proper pKa for each ionization. Draw the structure of histidine in each ionization state. What is the net charge on the histidine molecule in each ionization state? b. Draw the structures of the predominant ionization state of histidine at pH 1, 4, 8, and 12. Note that you can approximate the ionization state by treating each ionizable group independently. c. What is the net charge of histidine at pH 1, 4, 8, and 12? For each pH, will histidine migrate toward the anode (+) or toward the cathode (−) when placed in an electric field?

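A. a. (Glu)₂₀ b. (Lys–Val)₃ c. (Asn–Ser–His)₅ d. (Asn–Ser–His)₅
B. a. Glutamate b. Methionine c. Aspartate d. Glycine e. Serine
C. a. (structures not reproduced) b., c. pH 1: structure 1, net charge +2, migrates toward the cathode; pH 4: structure 2, net charge +1, migrates toward the cathode; pH 8: structure 3, net charge 0, does not migrate; pH 12: structure 4, net charge −1, migrates toward the anode.
D. a. [NaCl] = 0.5 mM b. [NaCl] = 8 μM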
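For parts (b) and (c), a minimal sketch that applies the Henderson-Hasselbalch equation to each ionizable group of histidine independently. The pKa values (α-carboxyl 1.82, imidazole 6.00, α-amino 9.17) are assumed from Table 3-1. Because the computed charges are averages (about +1.87 at pH 1, for example), the sketch rounds to the nearest integer before deciding the direction of migration.

```python
# Net charge of histidine at several pH values, treating each ionizable
# group independently. pKa values assumed from Table 3-1.
BASIC_PKAS = {"imidazole": 6.00, "alpha-amino": 9.17}  # charge +1 when protonated
ACIDIC_PKAS = {"alpha-carboxyl": 1.82}                 # charge -1 when deprotonated


def net_charge(ph: float) -> float:
    """Average net charge from the Henderson-Hasselbalch equation."""
    positive = sum(1 / (1 + 10 ** (ph - pka)) for pka in BASIC_PKAS.values())
    negative = sum(1 / (1 + 10 ** (pka - ph)) for pka in ACIDIC_PKAS.values())
    return positive - negative


def migrates_toward(charge: float) -> str:
    """Cations move to the cathode (-), anions to the anode (+)."""
    rounded = round(charge)
    if rounded > 0:
        return "cathode"
    if rounded < 0:
        return "anode"
    return "does not migrate"


for ph in (1, 4, 8, 12):
    z = net_charge(ph)
    print(f"pH {ph:>2}: average charge {z:+.2f}, {migrates_toward(z)}")
```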

6. Separation of Amino Acids by Ion-Exchange Chromatography We can analyze mixtures of amino acids by first separating the mixture into its components through ion-exchange chromatography. Amino acids placed on a cation-exchange resin (see Fig. 3-17a) containing sulfonate (−SO₃⁻) groups flow down the column at different rates because of two factors that influence their movement: (1) ionic attraction between the sulfonate residues on the column and positively charged functional groups on the amino acids, and (2) aggregation of nonpolar amino acid side chains with the hydrophobic backbone of the polystyrene resin. Note that the ionic attraction is more significant than hydrophobicity for this column medium. For each pair of amino acids listed, determine which will be eluted first from the cation-exchange column by a pH 7.0 buffer. a. Glutamate and lysine b. Arginine and methionine c. Aspartate and valine d. Glycine and leucine e. Serine and alanine

A. One H comes from the α-amino group of one amino acid, and an OH is removed from the α-carboxyl group of the amino acid to which the first is joined.
B. a. Glutamate b. Methionine c. Aspartate d. Glycine e. Serine
C. It is L-ornithine, because the amino group occupies the same relative position as the hydroxyl group in L-glyceraldehyde.
D. a. (structures not reproduced) b., c. pH 1: structure 1, net charge +2, migrates toward the cathode; pH 4: structure 2, net charge +1, migrates toward the cathode; pH 8: structure 3, net charge 0, does not migrate; pH 12: structure 4, net charge −1, migrates toward the anode.

7. Naming the Stereoisomers of Isoleucine Consider the structure of the amino acid isoleucine. a. How many chiral centers does isoleucine have? b. How many optical isomers does isoleucine have? c. Draw perspective formulas for all the optical isomers of isoleucine.

A. a. 2 b. 4 c. (structures not reproduced)
B. One H comes from the α-amino group of one amino acid, and an OH is removed from the α-carboxyl group of the amino acid to which the first is joined.
C. B elutes first, A second, C last.
D. a. pI > pKa of the α-carboxyl group and pI < pKa of the α-amino group, so both groups are charged (ionized). b. 1 in 2.19 × 10⁷. The pI of alanine is 6.01. From Table 3-1 and the Henderson-Hasselbalch equation, 1 in 4,680 carboxyl groups and 1 in 4,680 amino groups are uncharged. The fraction of alanine molecules with both groups uncharged is 1 in 4,680².

8. Comparing the pKa Values of Alanine and Polyalanine The titration curve of alanine shows the ionization of two functional groups with pKa values of 2.34 and 9.69, corresponding to the ionization of the carboxyl and the protonated amino groups, respectively. The titration of di-, tri-, and larger oligopeptides of alanine also shows the ionization of only two functional groups, although the experimental pKa values are different. The table summarizes the trend in pKa values:
Ala: pK1 = 2.34, pK2 = 9.69
Ala–Ala: pK1 = 3.12, pK2 = 8.30
Ala–Ala–Ala: pK1 = 3.39, pK2 = 8.03
Ala–(Ala)n–Ala, n ≥ 4: pK1 = 3.42, pK2 = 7.94
a. Draw the structure of Ala–Ala–Ala. Identify the functional groups associated with pK1 and pK2. b. Why does the value of pK1 increase with each additional Ala residue in the oligopeptide? c. Why does the value of pK2 decrease with each additional Ala residue in the oligopeptide?

A. a. Anion-exchange chromatography: peptide 2; cation-exchange chromatography: peptide 1; size-exclusion chromatography: peptide 2 b. peptide 3
B. a. Amino terminus b. Tyr–Gly–Gly–Phe–Leu
C. The constituents are Glu, Cys, and Gly. The Glu links to the Cys via its γ-carboxyl group.
D. a. Structure at pH 7: (structure not reproduced) b. Electrostatic interaction between the carboxylate anion and the protonated amino group of the alanine zwitterion favorably affects ionization of the carboxyl group. This favorable electrostatic interaction decreases as the length of the poly(Ala) increases, resulting in an increase in pK1. c. Ionization of the protonated amino group destroys the favorable electrostatic interaction noted in (…

9. Bonds Form by Condensation The peptide bond is an amide, generated by eliminating the elements of water from the two amino acids so joined. From which groups are the three atoms of water eliminated?

A. The protein has four subunits, with molecular masses of 160, 90, 90, and 60 kDa. The two 90 kDa subunits (possibly identical) are linked by one or more disulfide bonds.
B. One H comes from the α-amino group of one amino acid, and an OH is removed from the α-carboxyl group of the amino acid to which the first is joined.
C. The constituents are Glu, Cys, and Gly. The Glu links to the Cys via its γ-carboxyl group.
D. It is L-ornithine, because the amino group occupies the same relative position as the hydroxyl group in L-glyceraldehyde.

10. The Size of Proteins Calculate the approximate molecular weight of a protein composed of 682 amino acid residues in a single polypeptide chain.

A. The protein has four subunits, with molecular masses of 160, 90, 90, and 60 kDa. The two 90 kDa subunits (possibly identical) are linked by one or more disulfide bonds.
B. 75,000
C. 75%, 93%. If the efficiency of each amino acid addition is x, then the percentage of full-length peptides with the correct sequence after the addition of seven amino acids will be x⁷, as there are seven peptide bonds.
D. a. Specific activity after step 1 is 6.8 units/mg; step 2, 13 units/mg; step 3, 14 units/mg; step 4, 700 units/mg; step 5, 3,500 units/mg; step 6, 5,000 units/mg. b. Step 4 c. Step 3 d. Yes. Specific activity increased only modestly in step 6; SDS polyacrylamide gel electrophoresis.
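A one-line estimate for Problem 10, assuming the usual average residue weight of about 110 (roughly 128 for an average amino acid in proteins, minus 18 for the water removed as each peptide bond forms). The constant names are illustrative.

```python
# Approximate molecular weight from residue count.
# Assumes an average residue weight of ~110: about 128 for an average
# amino acid in proteins, minus 18 for the water lost per peptide bond.
AVG_AMINO_ACID_MR = 128
WATER_MR = 18
AVG_RESIDUE_MR = AVG_AMINO_ACID_MR - WATER_MR  # 110


def approx_protein_mr(n_residues: int) -> int:
    return n_residues * AVG_RESIDUE_MR


print(approx_protein_mr(682))  # 75020, i.e. about 75,000
```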

11. Relationship between the Number of Amino Acid Residues and Protein Mass Experimental results describing a protein’s amino acid composition are useful to estimate the molecular weight of the entire protein. A quantitative amino acid analysis reveals that bovine cytochrome c contains 2% cysteine (Mr 121) by weight. a. Calculate the approximate molecular weight in daltons of bovine cytochrome c if the number of cysteine residues is 2. Bovine chymotrypsinogen has a molecular mass of 25.6 kDa. Amino acid analysis shows that this enzyme is 4.7% Gly (Mr 75.1). b. Calculate how many glycine residues are present in a molecule of bovine chymotrypsinogen.

A. The arrows correspond to the orientation of the peptide bonds, —CO→NH—.
B. a. Glutamate b. Methionine c. Aspartate d. Glycine e. Serine
C. The chymotrypsin protein has three distinct polypeptide chains linked by disulfide bonds. They move on the gel as separate species once the disulfide bonds are broken to form the three peptides in lane 2.
D. a. 10,300. The elements of water are lost when a peptide bond forms, so the molecular weight of a Cys residue is not the same as the molecular weight of free cysteine. b. 21
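A sketch of both calculations in Problem 11. The only chemistry it encodes is the point made in the answer above: a residue weighs 18 less than the free amino acid. The helper names are hypothetical.

```python
# Molecular weight and residue counts from amino acid composition (% by weight).
# A residue weighs 18 less than the free amino acid (water lost in the peptide bond).
WATER_MR = 18.0


def residue_mr(free_amino_acid_mr: float) -> float:
    return free_amino_acid_mr - WATER_MR


def protein_mr(n_residues: int, free_mr: float, weight_fraction: float) -> float:
    """Mr of a protein in which n residues of one amino acid make up weight_fraction."""
    return n_residues * residue_mr(free_mr) / weight_fraction


def residue_count(mr_protein: float, free_mr: float, weight_fraction: float) -> float:
    """Number of residues of one amino acid in a protein of known Mr."""
    return mr_protein * weight_fraction / residue_mr(free_mr)


cyt_c_mr = protein_mr(2, 121.0, 0.02)           # a. two Cys at 2% by weight
gly_count = residue_count(25_600, 75.1, 0.047)  # b. Gly at 4.7% by weight

print(f"a. cytochrome c Mr ~ {cyt_c_mr:,.0f}")
print(f"b. {gly_count:.2f} -> {round(gly_count)} Gly residues")
```

Using the free amino acid weights instead (121 and 75.1) would give answers that are noticeably off, which is the trap the problem is built around.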

12. Subunit Composition of a Protein A protein has a molecular mass of 400 kDa when measured by size-exclusion chromatography. When subjected to gel electrophoresis in the presence of sodium dodecyl sulfate (SDS), the protein gives three bands with molecular masses of 180, 160, and 60 kDa. When electrophoresis is carried out in the presence of SDS and dithiothreitol, three bands again form, this time with molecular masses of 160, 90, and 60 kDa. How many subunits does the protein have, and what is the molecular mass of each?

A. a. (structures not reproduced) b., c. pH 1: structure 1, net charge +2, migrates toward the cathode; pH 4: structure 2, net charge +1, migrates toward the cathode; pH 8: structure 3, net charge 0, does not migrate; pH 12: structure 4, net charge −1, migrates toward the anode.
B. The protein has four subunits, with molecular masses of 160, 90, 90, and 60 kDa. The two 90 kDa subunits (possibly identical) are linked by one or more disulfide bonds.
C. a. Y (Tyr) at position 1, F (Phe) at position 7, and R (Arg) at position 9. b. Positions 4 and 9; K (Lys) is more common at 4, R (Arg) is invariant at 9. c. Positions 5 and 10; E (Glu) is more common at both positions. d. Position 2; S (Ser).
D. a. Glutamate b. Methionine c. Aspartate d. Glycine e. Serine

13. Net Electric Charge of Peptides A peptide has the sequence Glu–His–Trp–Ser–Gly–Leu–Arg–Pro–Gly. a. Calculate the net charge of the molecule at pH 3, 8, and 11. (Incorporation into a peptide can alter pKa values somewhat, but for this exercise, use pKa values for side chains and terminal amino and carboxyl groups as given in Table 3-1.) b. Estimate the pI for this peptide.

A. a. pI > pKa of the α-carboxyl group and pI < pKa of the α-amino group, so both groups are charged (ionized). b. 1 in 2.19 × 10⁷. The pI of alanine is 6.01. From Table 3-1 and the Henderson-Hasselbalch equation, 1 in 4,680 carboxyl groups and 1 in 4,680 amino groups are uncharged. The fraction of alanine molecules with both groups uncharged is 1 in 4,680².
B. One H comes from the α-amino group of one amino acid, and an OH is removed from the α-carboxyl group of the amino acid to which the first is joined.
C. a. at pH 3, +2; at pH 8, 0; at pH 11, −1 b. pI = 7.8
D. 75,000
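A sketch for Problem 13 that sums Henderson-Hasselbalch fractional charges over the ionizable groups and finds the pI by bisection. The pKa values are assumed from Table 3-1 (N-terminal Glu α-amino 9.67, C-terminal Gly α-carboxyl 2.34, Glu side chain 4.25, His 6.00, Arg 12.48). Bisection places the pI near 7.84; the 7.8 estimate is the average of the two pKa values that bracket the neutral species, (6.00 + 9.67)/2.

```python
# Net charge of Glu-His-Trp-Ser-Gly-Leu-Arg-Pro-Gly, using free amino acid
# pKa values assumed from Table 3-1. Only the two termini and the Glu, His,
# and Arg side chains ionize; Trp, Ser, Gly, Leu, and Pro side chains do not.
BASIC_PKAS = [9.67, 6.00, 12.48]  # N-terminal alpha-amino (Glu), His, Arg
ACIDIC_PKAS = [2.34, 4.25]        # C-terminal alpha-carboxyl (Gly), Glu side chain


def net_charge(ph: float) -> float:
    positive = sum(1 / (1 + 10 ** (ph - pka)) for pka in BASIC_PKAS)
    negative = sum(1 / (1 + 10 ** (pka - ph)) for pka in ACIDIC_PKAS)
    return positive - negative


def isoelectric_point(lo: float = 0.0, hi: float = 14.0, tol: float = 1e-6) -> float:
    """Bisection on pH: net charge falls steadily as pH rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(mid) > 0:
            lo = mid  # still positive, so the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


for ph in (3, 8, 11):
    z = net_charge(ph)
    print(f"pH {ph:>2}: average charge {z:+.2f}, nearest integer {round(z):+d}")
print(f"pI ~ {isoelectric_point():.2f}")
```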

14. Isoelectric Point of Histones Histones are proteins found in eukaryotic cell nuclei, tightly bound to DNA, which has many phosphate groups. The pI of histones is very high, about 10.8. What amino acid residues must be present in relatively large numbers in histones? In what way do these residues contribute to the strong binding of histones to DNA?

A. a. 2 b. 4 c. (structures not reproduced)
B. a. Y (Tyr) at position 1, F (Phe) at position 7, and R (Arg) at position 9. b. Positions 4 and 9; K (Lys) is more common at 4, R (Arg) is invariant at 9. c. Positions 5 and 10; E (Glu) is more common at both positions. d. Position 2; S (Ser).
C. Lys, His, Arg; negatively charged phosphate groups in DNA interact with positively charged side groups in histones.
D. It is L-ornithine, because the amino group occupies the same relative position as the hydroxyl group in L-glyceraldehyde.

15. Solubility of Polypeptides One method for separating polypeptides makes use of their different solubilities. The solubility of large polypeptides in water depends on the relative polarity of their R groups, particularly on the number of ionized groups: the more ionized groups there are, the more soluble the polypeptide. Which of each pair of polypeptides is more soluble at the indicated pH? a. (Gly)₂₀ or (Glu)₂₀ at pH 7.0 b. (Lys–Val)₃ or (Phe–Cys)₃ at pH 7.0 c. (Ala–Ser–Gly)₅ or (Asn–Ser–His)₅ at pH 6.0 d. (Ala–Asp–Phe)₅ or (Asn–Ser–His)₅ at pH 3.0

A. a. Y (Tyr) at position 1, F (Phe) at position 7, and R (Arg) at position 9. b. Positions 4 and 9; K (Lys) is more common at 4, R (Arg) is invariant at 9. c. Positions 5 and 10; E (Glu) is more common at both positions. d. Position 2; S (Ser).
B. Phosphorylation of serine would alter the mass by 80.
C. a. Amino terminus b. Tyr–Gly–Gly–Phe–Leu
D. a. (Glu)₂₀ b. (Lys–Val)₃ c. (Asn–Ser–His)₅ d. (Asn–Ser–His)₅

16. Purification of an Enzyme A biochemist discovers and purifies a new enzyme, generating the purification table shown.
Step 1. Crude extract: total protein 10,000 mg, activity 68,000 units
Step 2. Precipitation (salt): total protein 5,000 mg, activity 65,000 units
Step 3. Precipitation (pH): total protein 4,000 mg, activity 56,000 units
Step 4. Ion-exchange chromatography: total protein 70 mg, activity 49,000 units
Step 5. Affinity chromatography: total protein 12 mg, activity 42,000 units
Step 6. Size-exclusion chromatography: total protein 8 mg, activity 40,000 units
a. From the information given in the table, calculate the specific activity of the enzyme after each purification procedure. b. Which of the purification procedures used for this enzyme is most effective (i.e., gives the greatest relative increase in purity)? c. Which of the purification procedures is least effective? d. Is there any indication based on the results shown in the table that the enzyme after step 6 is now pure? What else could be done to estimate the purity of the enzyme preparation?

A. 75%, 93%. If the efficiency of each amino acid addition is x, then the percentage of full-length peptides with the correct sequence after the addition of seven amino acids will be x⁷, as there are seven peptide bonds.
B. a. Specific activity after step 1 is 6.8 units/mg; step 2, 13 units/mg; step 3, 14 units/mg; step 4, 700 units/mg; step 5, 3,500 units/mg; step 6, 5,000 units/mg. b. Step 4 c. Step 3 d. Yes. Specific activity increased only modestly in step 6; SDS polyacrylamide gel electrophoresis.
C. Phosphorylation of serine would alter the mass by 80.
D. It is L-ornithine, because the amino group occupies the same relative position as the hydroxyl group in L-glyceraldehyde.
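A minimal sketch of parts (a) to (c) of the purification problem: it divides activity by total protein for each row of the table, then compares each step’s specific activity with the one before it. Reading “greatest relative increase in purity” as that step-to-step ratio is an interpretation, not a formula given in the problem. Names such as STEPS and gains are illustrative.

```python
# Specific activity after each purification step (units/mg) and the
# fold increase contributed by each step, using the table in Problem 16.
STEPS = [  # (procedure, total protein in mg, activity in units)
    ("Crude extract", 10_000, 68_000),
    ("Precipitation (salt)", 5_000, 65_000),
    ("Precipitation (pH)", 4_000, 56_000),
    ("Ion-exchange chromatography", 70, 49_000),
    ("Affinity chromatography", 12, 42_000),
    ("Size-exclusion chromatography", 8, 40_000),
]

specific = [activity / protein for _, protein, activity in STEPS]

# Relative increase in purity for each step after the first: SA_i / SA_(i-1).
gains = [
    (specific[i] / specific[i - 1], STEPS[i][0]) for i in range(1, len(STEPS))
]

for step, ((name, _, _), sa) in enumerate(zip(STEPS, specific), start=1):
    print(f"Step {step} {name:<30} {sa:8.1f} units/mg")

print("Most effective:", max(gains)[1])
print("Least effective:", min(gains)[1])
```

The last step raises specific activity only about 1.4-fold, which is the hint in part (d) that the preparation may be close to pure; a gel is still needed to confirm it.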

23. De-salting a Protein by Dialysis A purified protein is in a Hepes (N-(2-hydroxyethyl)piperazine-N′-(2-ethanesulfonic acid)) buffer at pH 7 with 500 mM NaCl. A dialysis membrane tube holds a 1 mL sample of the protein solution. The sample in the dialysis membrane floats in a beaker containing 1 L of the same Hepes buffer, but with 0 mM NaCl, for dialysis. Small molecules and ions (such as Na⁺, Cl⁻, and Hepes) can diffuse across the dialysis membrane, but the protein cannot. a. Calculate the concentration of NaCl in the protein sample once the dialysis has come to equilibrium. Assume that no volume changes occur in the sample during the dialysis. b. Calculate the final NaCl concentration in the protein sample after dialysis in 250 mL of the same Hepes buffer, with 0 mM NaCl, twice in succession.

A. a. Y (Tyr) at position 1, F (Phe) at position 7, and R (Arg) at position 9. b. Positions 4 and 9; K (Lys) is more common at 4, R (Arg) is invariant at 9. c. Positions 5 and 10; E (Glu) is more common at both positions. d. Position 2; S (Ser).
B. a. Amino terminus b. Tyr–Gly–Gly–Phe–Leu
C. a. [NaCl] = 0.5 mM b. [NaCl] = 8 μM
D. a. Structure at pH 7: (structure not reproduced) b. Electrostatic interaction between the carboxylate anion and the protonated amino group of the alanine zwitterion favorably affects ionization of the carboxyl group. This favorable electrostatic interaction decreases as the length of the poly(Ala) increases, resulting in an increase in pK1. c. Ionization of the protonated amino group destroys the favorable electrostatic interaction noted in (…
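Both parts of the dialysis problem reduce to diluting the salt over the combined volume inside and outside the bag. A minimal sketch, assuming complete equilibration and a constant 1 mL sample volume, as the problem states; the function name is illustrative.

```python
# NaCl remaining inside the dialysis bag, assuming the salt equilibrates
# evenly over bag + beaker volume and the sample volume stays at 1 mL.
def equilibrium_conc(c_inside_mM: float, sample_mL: float, buffer_mL: float) -> float:
    """Salt concentration (mM) after one round against salt-free buffer."""
    return c_inside_mM * sample_mL / (sample_mL + buffer_mL)


one_bath = equilibrium_conc(500, 1, 1_000)   # a. a single 1 L bath
first = equilibrium_conc(500, 1, 250)        # b. first 250 mL bath
two_baths = equilibrium_conc(first, 1, 250)  # b. second, fresh 250 mL bath

print(f"a. {one_bath:.3f} mM")
print(f"b. {two_baths * 1000:.1f} uM")
```

Two 250 mL changes (500 mL of buffer in total) leave about 8 μM NaCl, far less than the 0.5 mM left by a single 1 L bath, because each fresh change dilutes whatever salt remains.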

24. Predicting Cation Exchange Elution Order Suppose a column is filled with a cation-exchange resin at pH 7.0. In what order would the given peptides elute from the column if each has the same number of residues?
Peptide A: Ala 30%, Asp 10%, Lys 10%, Ser 15%, Pro 25%, Cys 10%
Peptide B: Ile 25%, Asp 20%, Arg 5%, Tyr 15%, His 5%, Thr 30%
Peptide C: Ala 40%, Glu 5%, Arg 20%, Ser 5%, His 5%, Trp 25%

A. a. Specific activity after step 1 is 6.8 units/mg; step 2, 13 units/mg; step 3, 14 units/mg; step 4, 700 units/mg; step 5, 3,500 units/mg; step 6, 5,000 units/mg. b. Step 4 c. Step 3 d. Yes. Specific activity increased only modestly in step 6; SDS polyacrylamide gel electrophoresis.
B. a. II b. IV c. I d. III e. II f. II g. IV h. III i. V j. III k. II and IV
C. 75%, 93%. If the efficiency of each amino acid addition is x, then the percentage of full-length peptides with the correct sequence after the addition of seven amino acids will be x⁷, as there are seven peptide bonds.
D. B elutes first, A second, C last.
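A rough sketch for the elution-order question: it estimates each peptide’s net charge at pH 7.0 from its composition (taken as residues per 100), using side-chain pKa values assumed from Table 3-1 and ignoring the termini, since every peptide has the same number of residues. It also ignores hydrophobic interactions with the resin, so it supports, rather than proves, the order given.

```python
# Approximate net charge at pH 7 per 100 residues for peptides A, B, and C.
# Side-chain pKa values are assumed from Table 3-1; the termini are ignored
# because every peptide has the same number of residues.
SIDE_CHAINS = {  # name: (pKa, charge of the protonated form)
    "Asp": (3.65, 0), "Glu": (4.25, 0), "Cys": (8.18, 0), "Tyr": (10.07, 0),
    "His": (6.00, 1), "Lys": (10.53, 1), "Arg": (12.48, 1),
}

PEPTIDES = {  # residues per 100, from the percent composition
    "A": {"Ala": 30, "Asp": 10, "Lys": 10, "Ser": 15, "Pro": 25, "Cys": 10},
    "B": {"Ile": 25, "Asp": 20, "Arg": 5, "Tyr": 15, "His": 5, "Thr": 30},
    "C": {"Ala": 40, "Glu": 5, "Arg": 20, "Ser": 5, "His": 5, "Trp": 25},
}


def side_chain_charge(residue: str, ph: float) -> float:
    if residue not in SIDE_CHAINS:
        return 0.0  # no ionizable side chain
    pka, protonated_charge = SIDE_CHAINS[residue]
    f_protonated = 1 / (1 + 10 ** (ph - pka))
    # The deprotonated form carries one unit less charge than the protonated form.
    return f_protonated * protonated_charge + (1 - f_protonated) * (protonated_charge - 1)


def net_charge(composition: dict[str, int], ph: float = 7.0) -> float:
    return sum(n * side_chain_charge(res, ph) for res, n in composition.items())


# A cation exchanger carries fixed negative groups: the most negative peptide
# binds least and elutes first, the most positive binds tightest and elutes last.
elution_order = sorted(PEPTIDES, key=lambda name: net_charge(PEPTIDES[name]))
for name in elution_order:
    print(f"Peptide {name}: {net_charge(PEPTIDES[name]):+.1f}")
```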

25. Protein Analysis by Gel Electrophoresis Chymotrypsin is a protease with a molecular mass of 25.6 kDa. The figure shows a stained SDS polyacrylamide gel with a single band in lane 1 and three bands of lower molecular weight in lane 2. Lane 1 contains a preparation of chymotrypsin, and lane 2 contains chymotrypsin pretreated with performic acid. Why does performic acid treatment of chymotrypsin generate three bands in lane 2?

A. a. [NaCl] = 0.5 mM b. [NaCl] = 8 μM
B. a. (Glu)₂₀ b. (Lys–Val)₃ c. (Asn–Ser–His)₅ d. (Asn–Ser–His)₅
C. a. Structure at pH 7: (structure not reproduced) b. Electrostatic interaction between the carboxylate anion and the protonated amino group of the alanine zwitterion favorably affects ionization of the carboxyl group. This favorable electrostatic interaction decreases as the length of the poly(Ala) increases, resulting in an increase in pK1. c. Ionization of the protonated amino group destroys the favorable electrostatic interaction noted in (…
D. The chymotrypsin protein has three distinct polypeptide chains linked by disulfide bonds. They move on the gel as separate species once the disulfide bonds are broken to form the three peptides in lane 2.