Chapter 4

The Three-Dimensional Structure of Proteins

Textbook pages 475–614 (Lehninger, 8e) · 25 MCQs below · Source: printed chapter text extracted from the PDF

molecular units are identical, crystallization was evidence that even very large proteins are discrete chemical entities with unique structures. However, we now know that protein structure is always malleable, sometimes in surprising ways. Changes in structure can be as important to a protein’s function as the structure itself.

FIGURE 4-1 Relationship between protein structure and function. (a) The PurE enzyme from Escherichia coli catalyzes a reaction that forms carbon–carbon bonds in de novo purine biosynthesis. PurE is a small (17 kDa) single-domain protein. In this view, the protein surface of PurE has been modeled and colored by hydrophobicity: yellow for hydrophobic surfaces, blue for hydrophilic surfaces, and shades of green for those in between. It is apparent that the protein folds so that many of its polar groups are accessible to solvent. (b, c) The enzymatically active form of PurE is an octamer; eight PurE protomers combine to create a square-shaped quaternary structure with eight active sites. The structure in (b) is a surface representation; (c) is a ribbon diagram that traces the peptide backbone. Two protomers are colored by surface hydrophobicity. Others are shown in single colors (two each in gray, tan, and pink). (d) Each active site is formed using segments of three different protomers. A molecule of the reaction product carboxyaminoimidazole ribonucleotide bound at the active site is shown as a stick structure. [Data from PDB ID 2NSL, A. A. Hoskins et al., Biochemistry 46:2842, 2007.]

In this chapter, we examine the structure of proteins. We emphasize five principles:

Protein structures are stabilized by noncovalent interactions and forces. Formation of a thermodynamically favorable structure depends on the influences of the hydrophobic effect, hydrogen bonds, ionic interactions, and van der Waals forces.
Natural protein structures are constrained by peptide bonds, whose configurations can be described by the dihedral angles ϕ and ψ.

Protein segments can adopt regular secondary structures such as the α helix and the β conformation. These structures are defined by particular values of ϕ and ψ, and their formation is influenced by the amino acid composition of the segment. All of the ϕ and ψ values for a given protein structure can be visualized using a Ramachandran plot.

Tertiary structure describes the well-defined, three-dimensional fold adopted by a protein. Protein structures are often built by combinatorial use of common protein folds or motifs. Quaternary structure describes the interactions between components of a multisubunit assembly.

Tertiary structure is determined by amino acid sequence. Even though protein folding is complex, some denatured proteins can spontaneously refold into their active conformation based only on the chemical properties of their constituent amino acids. Cellular proteostasis involves numerous pathways that regulate the folding, unfolding, and degradation of proteins. Many human diseases arise from protein misfolding and defects in proteostasis.

The three-dimensional structures of proteins can be defined. Structural biologists use a variety of instruments and computational methods to solve biomolecular structures. The choice of method may depend on factors such as the size of the protein being studied, its properties, or the desired resolution of the final structure.

4.1 Overview of Protein Structure

The possible conformations of a protein or protein segment include any structural state it can achieve without breaking covalent bonds. A change in conformation could occur, for example, by rotation about single bonds. However, of the many conformations that are theoretically possible in a protein containing hundreds of single bonds, one or a few generally predominate under biological conditions.
The need for multiple stable conformations reflects the changes that must take place in most proteins as they bind to other molecules or catalyze reactions. The conformations existing under a given set of conditions are usually the ones that are thermodynamically the most stable, that is, those having the lowest free energy (G). Proteins in any of their functional, folded conformations are often called native proteins.

For the vast majority of proteins, a particular structure or small set of structures is critical to function. However, in many cases, parts of proteins lack discernible structure. These protein segments are intrinsically disordered. In some cases, entire proteins are intrinsically disordered, yet are fully functional.

What determines the most stable conformations of a typical protein? We can build an understanding of protein conformation stepwise from the discussion of primary structure in Chapter 3 through a consideration of secondary, tertiary, and quaternary structures. To this approach we must add emphasis on common and classifiable folding patterns, variously called supersecondary structures, folds, or motifs, which provide an important organizational context to this complex endeavor.

A Protein’s Conformation Is Stabilized Largely by Weak Interactions

Stability is the tendency of a protein to maintain a native conformation. Native proteins are only marginally stable; the ΔG separating the folded and unfolded states in typical proteins under physiological conditions is in the range of only 5 to 65 kJ/mol. A given polypeptide chain can theoretically assume countless conformations, and as a result, the unfolded state of a protein is characterized by a high degree of conformational entropy. This entropy, along with the hydrogen-bonding interactions of many groups in the polypeptide chain with the solvent (water), tends to maintain the unfolded state.
The chemical interactions that counteract these effects and stabilize the native conformation include disulfide (covalent) bonds and the weak (noncovalent) interactions and forces described in Chapter 2: hydrogen bonds, the hydrophobic effect, and ionic interactions.

Covalent disulfide bonds are strong, but they are also uncommon. The environment within most cells is highly reducing due to high concentrations of reductants such as glutathione, and most sulfhydryls will remain in the reduced state. Outside the cell, the environment is often more oxidizing, and disulfide formation is more likely to occur. In eukaryotes, disulfide bonds are found primarily in secreted, extracellular proteins (for example, the hormone insulin). Disulfide bonds are also uncommon in bacterial proteins. However, thermophilic bacteria, as well as the archaea, typically have many proteins with stabilizing disulfide bonds; this is presumably an adaptation to life at high temperatures.

For all proteins of all organisms, weak interactions are especially important in the folding of polypeptide chains into their secondary and tertiary structures. The association of multiple polypeptides to form quaternary structures also relies on these weak interactions.

About 200 to 460 kJ/mol are required to break a single covalent bond, whereas weak interactions can be disrupted by a mere 0.4 to 30 kJ/mol. Individual covalent bonds, such as disulfide bonds linking separate parts of a single polypeptide chain, are clearly much stronger than individual weak interactions. Yet, because they are so numerous, the weak interactions predominate as a stabilizing force in protein structure. In general, the protein conformation with the lowest free energy (that is, the most stable conformation) is the one with the maximum number of weak interactions.

The stability of a protein is not simply the sum of the free energies of formation of the many weak interactions within it.
For every hydrogen bond formed in a protein during folding, a hydrogen bond (of similar strength) between the same group and water was broken. The net stability contributed by a given hydrogen bond, or the difference in free energies of the folded and unfolded states, may be close to zero. Ionic interactions may be either stabilizing or destabilizing. We must therefore look elsewhere to understand why a particular native conformation is favored.

Packing of Hydrophobic Amino Acids Away from Water Favors Protein Folding

On carefully examining the contribution of weak interactions to protein stability, we find that the hydrophobic effect generally predominates. Pure water contains a network of hydrogen-bonded H₂O molecules. No other molecule has the hydrogen-bonding potential of water, and the presence of other molecules in an aqueous solution disrupts the hydrogen bonding of water. When water surrounds a hydrophobic molecule, the optimal arrangement of hydrogen bonds results in a highly structured shell, or solvation layer, of water around the molecule (see Fig. 2-7). The increased order of the water molecules in the solvation layer correlates with an unfavorable decrease in the entropy of the water. However, when nonpolar groups cluster together, the extent of the solvation layer decreases, because each group no longer presents its entire surface to the solution. The result is a favorable increase in entropy. As described in Chapter 2, this increase in entropy is the major thermodynamic driving force for the association of hydrophobic groups in aqueous solution. Hydrophobic amino acid side chains therefore tend to cluster in a protein’s interior, away from water (think of an oil droplet in water).

The amino acid sequences of most proteins thus include a significant content of hydrophobic amino acid side chains (especially Leu, Ile, Val, Phe, and Trp). These are positioned so that they are clustered when the protein is folded, forming a hydrophobic protein core.
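
The entropy argument above can be summarized with the standard Gibbs free-energy relation. This is only a schematic bookkeeping sketch: the subscript labels (fold, water, chain) are introduced here for illustration and are not notation used in the chapter, and no numerical values are implied.

```latex
\Delta G_{\text{fold}} = \Delta H_{\text{fold}} - T\,\Delta S_{\text{total}},
\qquad
\Delta S_{\text{total}} = \Delta S_{\text{water}} + \Delta S_{\text{chain}}
% \Delta S_{\text{water}} > 0 : ordered solvation-layer water is released when nonpolar groups cluster
% \Delta S_{\text{chain}} < 0 : the polypeptide loses conformational entropy as it folds
% Folding is favorable when \Delta G_{\text{fold}} < 0
```

In these terms, the hydrophobic effect favors folding because the positive entropy change of the water makes the term −TΔS more negative, offsetting the conformational entropy lost by the chain.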
Under physiological conditions, the formation of hydrogen bonds in a protein is driven largely by this same entropic effect. Polar groups can generally form hydrogen bonds with water and hence are soluble in water. However, the number of hydrogen bonds per unit mass is generally greater for pure water than for any other liquid or solution, and there are limits to the solubility of even the most polar molecules as their presence causes a net decrease in hydrogen bonding per unit mass. Therefore, a solvation layer forms to some extent even around polar molecules. Although the energy of formation of an intramolecular hydrogen bond between two polar groups in a macromolecule is largely canceled by the elimination of such interactions between these polar groups and water, the release of structured water as intramolecular associations form provides an entropic driving force for folding. Most of the net change in free energy as nonpolar amino acid side chains aggregate within a protein is therefore derived from the increased entropy in the surrounding aqueous solution resulting from the burial of hydrophobic surfaces. This more than counterbalances the large loss of conformational entropy as a polypeptide is constrained into its folded conformation.

Polar Groups Contribute Hydrogen Bonds and Ion Pairs to Protein Folding

The hydrophobic effect is clearly important in stabilizing conformation; the interior of a structured protein is generally a densely packed core of hydrophobic amino acid side chains. It is also important that any polar or charged groups in the protein interior have suitable partners for hydrogen bonding or ionic interactions. One hydrogen bond seems to contribute little to the stability of a native structure, but the presence of hydrogen-bonding groups without partners in the hydrophobic core of a protein can be so destabilizing that conformations containing these groups are often thermodynamically untenable.
The favorable free-energy change resulting from the combination of several such groups with partners in the surrounding solution can be greater than the free-energy difference between the folded and unfolded states. In addition, hydrogen bonds between groups in a protein form cooperatively (formation of one makes formation of the next one more likely) in repeating secondary structures that optimize hydrogen bonding, as described below. In this way, hydrogen bonds often have an important role in guiding the protein-folding process.

The interaction of oppositely charged groups that form an ion pair, or salt bridge, can have either a stabilizing or destabilizing effect on protein structure. As in the case of hydrogen bonds, charged amino acid side chains interact with water and salts when the protein is unfolded, and the loss of those interactions must be considered when researchers evaluate the effect of a salt bridge on the overall stability of a folded protein. However, the strength of a salt bridge increases as it moves to an environment of lower dielectric constant, ε (p. 46): from the polar aqueous solvent (ε near 80) to the nonpolar protein interior (ε near 4). Salt bridges, especially those that are partly or entirely buried, can thus provide significant stabilization to a protein structure. This trend explains the increased occurrence of buried salt bridges in the proteins of thermophilic organisms. Ionic interactions also limit structural flexibility and confer a uniqueness to a particular protein structure that the clustering of nonpolar groups via the hydrophobic effect cannot provide.

Individual van der Waals Interactions Are Weak but Combine to Promote Folding

In the tightly packed atomic environment of a protein, one more type of weak interaction can have a significant effect: van der Waals interactions (p. 49).
Van der Waals interactions are dipole-dipole interactions involving the permanent electric dipoles in groups such as carbonyls, transient dipoles derived from fluctuations of the electron cloud surrounding any atom, and dipoles induced by interaction of one atom with another that has a permanent or transient dipole. As atoms approach each other, these dipole-dipole interactions provide an attractive intermolecular force that operates over only a limited intermolecular distance (0.3 to 0.6 nm). Individually, van der Waals interactions contribute little to overall protein stability. However, in a well-packed protein, or in an interaction between a protein and another protein or other molecule at a complementary surface, the number of such interactions can be substantial.

Most of the structural patterns outlined in this chapter reflect two simple rules: (1) hydrophobic residues are largely buried in the protein interior, away from water, and (2) the number of hydrogen bonds and ionic interactions within the protein is maximized, thus reducing the number of unpaired hydrogen-bonding and ionic groups. Proteins within membranes (which we examine in Chapter 11) and proteins that are intrinsically disordered or have intrinsically disordered segments follow different rules. This reflects their particular function or environment, but weak interactions are still critical structural elements. For example, soluble but intrinsically disordered protein segments are often enriched in amino acid side chains that are charged (especially Arg, Lys, Glu) or small (Gly, Ala), providing little or no opportunity for the formation of a stable hydrophobic core.

The Peptide Bond Is Rigid and Planar

Covalent bonds, too, place important constraints on the conformation of a polypeptide. In the late 1930s, Linus Pauling and Robert Corey embarked on a series of studies that laid the foundation for our current understanding of protein structure.
They began with a careful analysis of the peptide bond.

The α carbons of adjacent amino acid residues are separated by three covalent bonds, arranged as Cα—C—N—Cα. X-ray diffraction studies of crystals of amino acids and of simple dipeptides and tripeptides showed that the peptide C—N bond is somewhat shorter than the C—N bond in a simple amine and that the atoms associated with the peptide bond are coplanar. This indicated a resonance or partial sharing of two pairs of electrons between the carbonyl oxygen and the amide nitrogen (Fig. 4-2a). The oxygen has a partial negative charge and the hydrogen bonded to the nitrogen has a net partial positive charge, setting up a small electric dipole. The six atoms of the peptide group lie in a single plane, with the oxygen atom of the carbonyl group trans to the hydrogen atom of the amide nitrogen. From these findings Pauling and Corey concluded that the peptide C—N bonds, because of their partial double-bond character, cannot rotate freely. Rotation is permitted about the N—Cα and the Cα—C bonds. The backbone of a polypeptide chain can thus be pictured as a series of rigid planes, with consecutive planes sharing a common point of rotation at Cα (Fig. 4-2b). The rigid peptide bonds limit the range of conformations possible for a polypeptide chain.

FIGURE 4-2 The planar peptide group. (a) Each peptide bond has some double-bond character due to resonance and cannot rotate. Although the N atom in a peptide bond is often represented with a partial positive charge, careful consideration of bond orbitals and quantum mechanics indicates that the N has a net charge that is neutral or slightly negative. (b) Three bonds separate sequential α carbons in a polypeptide chain. The N—Cα and Cα—C bonds can rotate, described by dihedral angles designated ϕ and ψ, respectively. The peptide C—N bond is not free to rotate. Other single bonds in the backbone may also be rotationally hindered, depending on the size and charge of the R groups.
(c) The atoms and planes defining ψ. (d) By convention, ϕ and ψ are 180° (or −180°) when the first and fourth atoms are farthest apart and the peptide is fully extended. As the viewer looks out along the bond undergoing rotation (from either direction), the ϕ and ψ angles increase as the fourth atom rotates clockwise relative to the first. In a protein, some of the conformations shown here (e.g., 0°) are prohibited by steric overlap of atoms. In (b) through (d), the balls representing atoms are smaller than the van der Waals radii for this scale.

Linus Pauling, 1901–1994

Robert Corey, 1897–1971

Peptide conformation is defined by three dihedral angles (also known as torsion angles) called ϕ (phi), ψ (psi), and ω (omega), reflecting rotation about each of the three repeating bonds in the peptide backbone. A dihedral angle is the angle at the intersection of two planes. In the case of peptides, the planes are defined by bond vectors in the peptide backbone. Two successive bond vectors describe a plane. Three successive bond vectors describe two planes (the central bond vector is common to both; Fig. 4-2c), and the angle between these two planes is what we measure to describe peptide conformation.

KEY CONVENTION

The important dihedral angles in a peptide are defined by the three bond vectors connecting four consecutive main-chain (peptide backbone) atoms (Fig. 4-2c): ϕ involves the C—N—Cα—C bonds (with the rotation occurring about the N—Cα bond), and ψ involves the N—Cα—C—N bonds. Both ϕ and ψ are defined as ±180° when the polypeptide is fully extended and all peptide groups are in the same plane (Fig. 4-2d). As one looks down the central bond vector in the direction of the vector arrow (as depicted in Fig. 4-2c for ψ), the dihedral angles increase as the distal (fourth) atom is rotated clockwise (Fig. 4-2d). From the ±180° position, the dihedral angle increases from −180° to 0°, at which point the first and fourth atoms are eclipsed. The rotation can be continued from 0° to +180° (same position as −180°) to bring the structure back to the starting point.

The third dihedral angle, ω, is not often considered. It involves the Cα—C—N—Cα bonds. The central bond in this case is the peptide bond, where rotation is constrained. The peptide bond is almost always (99.6% of the time) in the trans configuration, constraining ω to a value of ±180°. For a rare cis peptide bond, ω = 0°.
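
The dihedral-angle convention above can be sketched in a few lines of Python. This is a minimal illustration, not a method from the chapter: the function name `dihedral_degrees` and the toy coordinates are assumptions made here, and the formula is the standard signed-torsion expression built from the three bond vectors. Passing the atoms C(i−1), N, Cα, C would give ϕ; N, Cα, C, N(i+1) would give ψ; and Cα, C, N, Cα(i+1) would give ω.

```python
import math

def _sub(a, b):
    """Vector from point b to point a."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_degrees(p1, p2, p3, p4):
    """Signed dihedral angle in degrees for four consecutive atoms.

    Looking along the central bond (p2 -> p3), the angle is positive when
    the fourth atom is rotated clockwise relative to the first, 0 when the
    first and fourth atoms are eclipsed, and +/-180 when fully extended.
    """
    b1 = _sub(p2, p1)          # first bond vector
    b2 = _sub(p3, p2)          # central bond vector (shared by both planes)
    b3 = _sub(p4, p3)          # third bond vector
    n1 = _cross(b1, b2)        # normal to the plane of atoms 1-2-3
    n2 = _cross(b2, b3)        # normal to the plane of atoms 2-3-4
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Toy coordinates (hypothetical, not from a real structure):
# the central bond runs from the origin along +z.
first, second, third = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
extended = dihedral_degrees(first, second, third, (-1.0, 0.0, 1.0))  # trans-like
eclipsed = dihedral_degrees(first, second, third, (1.0, 0.0, 1.0))   # cis-like
```

With these toy points, `extended` has a magnitude of 180° and `eclipsed` is 0°, matching the reference positions described in the convention.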
In principle, ϕ and ψ can have any value between −180° and +180°, but many values are prohibited by steric interference between atoms in the polypeptide backbone and amino acid side chains. The conformation in which both ϕ and ψ are 0° (Fig. 4-2d) is prohibited for this reason; this conformation is merely a reference point for describing the dihedral angles. Backbone angle preferences in a polypeptide represent yet another constraint on the overall folded structure of a protein.

SUMMARY 4.1 Overview of Protein Structure

A typical protein usually has one or more stable three-dimensional conformations that reflect its function. Some proteins have segments that are intrinsically disordered but are nonetheless essential for function.

Whereas nonpeptide covalent bonds, particularly disulfide bonds, can play a role in stabilization of some structures, proteins are stabilized largely by multiple weak, noncovalent interactions and forces.

The hydrophobic effect, derived from the increase in entropy of the surrounding water when nonpolar molecules or groups are clustered together, makes the major contribution to stabilizing the globular form of most soluble proteins. Hydrogen bonds and ionic interactions are optimized in the thermodynamically most stable structures.

Van der Waals interactions involve attractive forces between molecular dipoles that occur over short distances. Individually these interactions are weak, but they combine in well-packed protein structures to provide significant effects and stabilization.

The nature of the covalent bonds in the polypeptide backbone places constraints on structure. The peptide bond has a partial double-bond character that keeps the entire six-atom peptide group in a rigid planar configuration. The N—Cα and Cα—C bonds can rotate to define the dihedral angles ϕ and ψ, respectively, although permitted values of ϕ and ψ are limited by steric clashes and other constraints.
4.2 Protein Secondary Structure

The term secondary structure refers to any chosen segment of a polypeptide chain and describes the local spatial arrangement of its main-chain atoms, without regard to the positioning of its side chains or its relationship to other segments. A regular secondary structure occurs when each dihedral angle, ϕ and ψ, remains the same or nearly the same throughout the segment. A few types of secondary structure are particularly stable and occur widely in proteins. The most prominent are the α helix and β conformation; another common type is the β turn. Secondary structures without a regular pattern are sometimes referred to as undefined or as random coils. Random coil, however, does not properly describe the structure of these segments. The path of most of the polypeptide backbone in a typical protein is not random; rather, it is highly specific to the structure and function of that particular protein. Our discussion here focuses on the regular structures that are most common.

The α Helix Is a Common Protein Secondary Structure

Pauling and Corey were aware of the importance of hydrogen bonds in orienting polar chemical groups such as the C=O and N—H groups of the peptide bond. They also had the experimental results of William Astbury, who in the 1930s had conducted pioneering x-ray studies of proteins. Astbury demonstrated that the protein that makes up hair and porcupine quills (the fibrous protein α-keratin) has a regular structure that repeats every 5.15 to 5.20 Å. (The angstrom, Å, named after the physicist Anders J. Ångström, is equal to 0.1 nm. Although not an SI unit, it is used universally by structural biologists to describe atomic distances; it is approximately the length of a typical C—H bond.) With this information and their data on the peptide bond, and with the help of precisely constructed models, Pauling and Corey set out to determine the likely conformations of protein molecules.

The first breakthrough came in 1948.
Pauling, at that time a visiting lecturer at Oxford University, became ill and retired to his apartment for several days of rest. Bored with the reading available, Pauling grabbed some paper and pencils to work out a plausible stable structure that could be taken up by a polypeptide chain. The model he developed, and later confirmed in work with Corey and coworker Herman Branson, was the simplest arrangement the polypeptide chain can assume that maximizes the use of internal hydrogen bonding. It is a helical structure, and Pauling and Corey called it the α helix (Fig. 4-3). In this structure, the polypeptide backbone is tightly wound around an imaginary axis drawn longitudinally through the middle of the helix, and the R groups of the amino acid residues protrude outward from the helical backbone (Fig. 4-3b, c). The repeating unit is a single turn of the helix, which extends about 5.4 Å along the long axis, slightly greater than the periodicity that Astbury observed on x-ray analysis of hair keratin. The backbone atoms of the amino acid residues in the prototypical α helix have a characteristic set of dihedral angles that define the conformation of the α helix (Table 4-1), and each helical turn includes 3.6 amino acid residues. The α-helical segments in proteins often deviate slightly from these dihedral angles, and they even vary somewhat within a single, continuous segment so as to produce subtle bends or kinks in the helical axis.

FIGURE 4-3 Models of the α helix, showing different aspects of its structure. (a) Ball-and-stick model showing the intrachain hydrogen bonds. The repeat unit is a single turn of the helix, 3.6 residues. (b) The α helix viewed from one end, looking down the longitudinal axis. Note the positions of the R groups, represented by purple spheres.
This ball-and-stick model, which emphasizes the helical arrangement, gives the false impression that the helix is hollow, because the balls do not represent the van der Waals radii of the individual atoms. (c) As this space-filling model shows, the atoms in the center of the α helix are in very close contact. (d) Helical wheel projection of an α helix. This representation can be colored to identify surfaces with particular properties. The yellow residues, for example, could be hydrophobic and conform to an interface between the helix shown here and another part of the same or another polypeptide. The red (negative) and blue (positive) residues illustrate the potential for interaction of oppositely charged side chains separated by two residues in the helix. [(b, c) Data from PDB ID 4TNC, K. A. Satyshur et al., J. Biol. Chem. 263:1628, 1988.]

TABLE 4-1 Idealized ϕ and ψ Angles for Common Secondary Structures in Proteins

Structure                  ϕ        ψ
α Helix                    −57°     −47°
β Conformation
    Antiparallel           −139°    +135°
    Parallel               −119°    +113°
Collagen triple helix      −51°     +153°
β Turn type I
    i + 1                  −60°     −30°
    i + 2                  −90°     0°
β Turn type II
    i + 1                  −60°     +120°
    i + 2                  +80°     0°

Note: In real proteins, dihedral angles often vary somewhat from these idealized values. The i + 1 and i + 2 angles are those for the second and third amino acid residues in the β turn, respectively.

Pauling and Corey considered both right-handed and left-handed variants of the α helix. The subsequent elucidation of the three-dimensional structure of myoglobin and other proteins showed that the right-handed α helix is the common form (Box 4-1). Extended left-handed α helices are theoretically less stable and have not been observed in proteins. The α helix proved to be the predominant structure in α-keratins. More generally, about one-fourth of all amino acid residues in proteins are found in α helices, the exact fraction varying greatly from one protein to another.
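
The idealized helix parameters quoted in this section (3.6 residues per turn and about 5.4 Å of axial rise per turn) fix the rise and rotation contributed by each residue. The following Python sketch works through that arithmetic; the function and constant names are chosen here for illustration, and the 36-residue example is hypothetical.

```python
RESIDUES_PER_TURN = 3.6  # idealized α helix, value stated in the text
PITCH_ANGSTROM = 5.4     # axial rise per helical turn, value stated in the text

def rise_per_residue():
    """Axial rise contributed by one residue, in Å."""
    return PITCH_ANGSTROM / RESIDUES_PER_TURN

def rotation_per_residue_degrees():
    """Rotation about the helix axis contributed by one residue, in degrees."""
    return 360.0 / RESIDUES_PER_TURN

def helix_length_angstrom(n_residues):
    """Approximate axial length of an idealized α helix of n_residues, in Å."""
    return n_residues * rise_per_residue()

def angstrom_to_nm(length_angstrom):
    """Convert Å to nm (1 Å = 0.1 nm)."""
    return length_angstrom * 0.1

# Hypothetical example: a 36-residue helix spans 10 turns.
length = helix_length_angstrom(36)
print(f"{rise_per_residue():.2f} Å per residue, "
      f"{rotation_per_residue_degrees():.0f}° per residue, "
      f"{length:.0f} Å = {angstrom_to_nm(length):.1f} nm for 36 residues")
```

Real helices deviate from these idealized values, so the result should be read as an estimate rather than a measured dimension.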
BOX 4-1 METHODS Knowing the Right Hand from the Left

There is a simple method for determining whether a helical structure is right-handed or left-handed. Make fists of your two hands with thumbs outstretched and pointing away from you. Looking at your right hand, think of a helix spiraling up your right thumb in the direction in which the other four fingers are curled as shown (clockwise). The resulting helix is right-handed. Your left hand will demonstrate a left-handed helix, which rotates in the counterclockwise direction as it spirals up your thumb.

Why does the α helix form more readily than many other possible conformations? The answer lies, in part, in its optimal use of intrahelical hydrogen bonds. The structure is stabilized by a hydrogen bond between the hydrogen atom attached to the electronegative nitrogen atom of a peptide linkage and the electronegative carbonyl oxygen atom of the fourth amino acid on the amino-terminal side of that peptide bond (Fig. 4-3a). Within the α helix, every peptide bond (except those close to each end of the helix) participates in such hydrogen bonding. Each successive turn of the α helix is held to adjacent turns by three to four hydrogen bonds, conferring significant stability on the overall structure. At the ends of an α-helical segment, there are always three or four amide carbonyl or amino groups that cannot participate in this helical pattern of hydrogen bonding. These may be exposed to the surrounding solvent, where they hydrogen-bond with water, or other parts of the protein may cap the helix to provide the needed hydrogen-bonding partners.

WORKED EXAMPLE 4-1 Secondary Structure and Protein Dimensions

What is the length, in both Å and nm, of a polypeptide with 80 amino acid residues in a single, continuous α helix?

SOLUTION: An idealized α helix has 3.6 residues per turn, and the rise along the helical axis is 5.4 Å. Thus, the rise along the axis for each amino acid residue is 1.5 Å.
The length of the polypeptide is therefore 80 residues × 1.5 Å/residue = 1.2 × 10² Å, or 12 nm.

Amino Acid Sequence Affects Stability of the α Helix

Not all polypeptides can form a stable α helix. Each amino acid residue in a polypeptide has an intrinsic propensity to form an α helix, reflecting the properties of the R group and how they affect the capacity of the adjoining main-chain atoms to take up the characteristic ϕ and ψ angles. Alanine shows the greatest tendency to form α helices in most experimental model systems.

The position of an amino acid residue relative to its neighbors is also important. Interactions between amino acid side chains can stabilize or destabilize the α-helical structure. For example, if a polypeptide chain has a long block of Glu residues, this segment of the chain will not form an α helix at pH 7.0. The negatively charged carboxyl groups of adjacent Glu residues repel each other so strongly that they prevent formation of the α helix. For the same reason, if there are many adjacent Lys and/or Arg residues, with positively charged R groups at pH 7.0, they also repel each other and prevent formation of the α helix. The size and shape of Asn, Ser, Thr, and Cys residues can also destabilize an α helix if they are close together in the chain.

The twist of an α helix ensures that critical interactions occur between an amino acid side chain and the side chain three (and sometimes four) residues away on either side of it. This is made clear when the α helix is depicted as a helical wheel (Fig. 4-3d). Positively charged amino acids are often found three residues away from negatively charged amino acids, permitting the formation of an ion pair. Two aromatic amino acid residues are often similarly spaced, resulting in a juxtaposition stabilized by the hydrophobic effect.

A constraint on the formation of the α helix is the presence of Pro or Gly residues, which have the least likelihood of forming α helices.
In proline, the nitrogen atom is part of a rigid ring (see Fig. 4-7), and rotation about the N—Cα bond is not possible. Thus, a Pro residue introduces a destabilizing kink in an α helix. In addition, the nitrogen atom of a Pro residue in a peptide linkage has no substituent hydrogen to participate in hydrogen bonds with other residues. For these reasons, proline is found only rarely in an α helix. Glycine occurs infrequently in α helices for a different reason: it has more conformational flexibility than the other amino acid residues. Polymers of glycine tend to take up coiled structures quite different from an α helix.

A final factor affecting the stability of an α helix is the identity of the amino acid residues near the ends of the α-helical segment of the polypeptide. A small electric dipole exists in each peptide bond (Fig. 4-2a). These dipoles are aligned through the hydrogen bonds of the helix, resulting in a net dipole along the helical axis that increases with helix length (Fig. 4-4). The partial positive and negative charges of the helix dipole reside on the peptide amino and carbonyl groups near the amino-terminal and carboxyl-terminal ends, respectively. For this reason, negatively charged amino acids are often found near the amino terminus of the helical segment, where they have a stabilizing interaction with the positive charge of the helix dipole; a positively charged amino acid at the amino-terminal end is destabilizing. The opposite is true at the carboxyl-terminal end of the helical segment.

FIGURE 4-4 Helix dipole. The electric dipole of a peptide bond (see Fig. 4-2a) is transmitted along an α-helical segment through the intrachain hydrogen bonds, resulting in an overall helix dipole.
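
The three- and four-residue spacing of interacting side chains noted earlier in this section follows from the helix geometry: with 3.6 residues per turn, each residue is rotated 100° about the axis from its predecessor. The Python sketch below computes where residue i + k sits on a helical wheel relative to residue i. It assumes an idealized helix, and the function names are introduced here for illustration only.

```python
DEGREES_PER_RESIDUE = 100.0  # 360° / 3.6 residues per turn (idealized α helix)

def wheel_angle(offset):
    """Angular position (0 to 360°) of residue i+offset relative to residue i."""
    return (offset * DEGREES_PER_RESIDUE) % 360.0

def angular_separation(offset):
    """Smallest angle (0 to 180°) between residues i and i+offset on the wheel."""
    angle = wheel_angle(offset)
    return min(angle, 360.0 - angle)

# Tabulate the separation for the next seven residues along the chain.
for k in range(1, 8):
    print(f"i+{k}: {angular_separation(k):.0f}° from residue i")
```

Residues i + 3 and i + 4 come out 60° and 40° from residue i, so their side chains project from nearly the same face of the helix, whereas i + 1 and i + 2 (100° and 160°) point in quite different directions. Residue i + 7, two turns away, lies within about 20° of the starting position.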
In summary, five types of constraints affect the stability of an α helix: (1) the intrinsic propensity of an amino acid residue to form an α helix; (2) the interactions between R groups, particularly those spaced three (or four) residues apart; (3) the bulkiness of adjacent R groups; (4) the occurrence of Pro and Gly residues; and (5) interactions between amino acid residues at the ends of the helical segment and the electric dipole inherent to the α helix. The tendency of a given segment of a polypeptide chain to form an α helix therefore depends on the identity and sequence of amino acid residues within the segment.

The β Conformation Organizes Polypeptide Chains into Sheets

In 1951, Pauling and Corey predicted a second type of repetitive structure, the β conformation. This is a more extended conformation of polypeptide chains, and its structure is again defined by backbone atoms arranged according to a characteristic set of dihedral angles (Table 4-1). In the β conformation, the backbone of the polypeptide chain is extended into a zigzag rather than helical structure (Fig. 4-5). A single protein segment in the β conformation is often called a β strand. The arrangement of several strands side by side, all in the β conformation, is called a β sheet. The zigzag structure of the individual polypeptide segments gives rise to a pleated appearance of the overall sheet. Hydrogen bonds form between backbone atoms of adjacent segments of polypeptide chain within the sheet. The individual segments that form a β sheet are usually nearby on the polypeptide chain but can also be quite distant from each other in the linear sequence of the polypeptide; they may even be in different polypeptide chains. The R groups of adjacent amino acids protrude from the zigzag structure in opposite directions, creating the alternating pattern seen in the side view in Figure 4-5.

FIGURE 4-5 The β conformation of polypeptide chains.
These (a) side and (b, c) top views reveal the R groups extending out from the β sheet and emphasize the pleated shape formed by the planes of the peptide bonds. (An alternative name for this structure is β-pleated sheet.) Hydrogen-bond cross-links between adjacent chains are also shown. The amino-terminal to carboxyl-terminal orientations of adjacent chains (arrows) can be the opposite or the same, forming (b) an antiparallel β sheet or (c) a parallel β sheet.

The adjacent polypeptide chains in a β sheet can be either parallel or antiparallel (having the same or opposite amino-to-carboxyl orientations, respectively). The structures are somewhat similar, although the repeat period is shorter for the parallel conformation (6.5 vs. 7.0 Å for antiparallel) and the hydrogen-bonding patterns are different. The interstrand hydrogen bonds are essentially in-line (see Fig. 2-5) in the antiparallel β sheet, whereas they are distorted or not in-line for the parallel variant. In natural proteins, antiparallel β sheets are found twice as frequently as parallel β sheets. The idealized structures exhibit the bond angles given in Table 4-1; these values vary somewhat in real proteins, resulting in structural variation, as seen above for α helices.

β Turns Are Common in Proteins

In globular proteins, which have a compact folded structure, some amino acid residues are in turns or loops where the polypeptide chain reverses direction (Fig. 4-6). These are the connecting elements that link successive runs of α helix or β conformation. Particularly common are β turns that connect the ends of two adjacent segments of an antiparallel β sheet. The structure is a 180° turn involving four amino acid residues, with the carbonyl oxygen of the first residue forming a hydrogen bond with the amino-group hydrogen of the fourth. The peptide groups of the central two residues do not participate in any inter-residue hydrogen bonding.
Several types of β turns have been described, each defined by the ϕ and ψ angles of the bonds that link the four amino acid residues that make up the particular turn (Table 4-1). Gly and Pro residues often occur in β turns, the former because it is small and flexible, the latter because peptide bonds involving the imino nitrogen of proline readily assume the cis configuration (Fig. 4-7), a form that is particularly amenable to a tight turn. The two types of β turns shown in Figure 4-6 are the most common. β turns are often found near the surface of a protein, where the peptide groups of the central two amino acid residues in the turn can hydrogen-bond with water. Considerably less common is the γ turn, a three-residue turn with a hydrogen bond between the first and third residues.
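
As a rough numerical companion to the repeat periods quoted earlier for β sheets, the following sketch compares how far a segment extends in the β conformation and in an α helix. It assumes that each repeat period spans two residues (consistent with R groups alternating from side to side, but not stated explicitly in the text) and that the strand is ideal and straight. The function and constant names are hypothetical.

```python
# Repeat periods for beta sheets quoted in the text (angstroms).
REPEAT_ANGSTROM = {"antiparallel": 7.0, "parallel": 6.5}
RESIDUES_PER_REPEAT = 2      # assumption: one repeat spans two residues
HELIX_RISE_ANGSTROM = 1.5    # alpha-helix rise per residue, from the text


def strand_length(n_residues, sheet_type):
    """Approximate end-to-end length (angstroms) of an ideal beta strand."""
    rise_per_residue = REPEAT_ANGSTROM[sheet_type] / RESIDUES_PER_REPEAT
    return n_residues * rise_per_residue


def helix_length(n_residues):
    """Length (angstroms) of the same segment folded as an ideal alpha helix."""
    return n_residues * HELIX_RISE_ANGSTROM


if __name__ == "__main__":
    n = 80
    for sheet_type in ("antiparallel", "parallel"):
        print(sheet_type, strand_length(n, sheet_type), "angstroms")
    print("alpha helix", helix_length(n), "angstroms")
```

For an 80-residue segment this gives 280 Å (antiparallel) or 260 Å (parallel), compared with 120 Å as an α helix, which illustrates why the β conformation is described as the more extended of the two.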

FIGURE 4-6 Structures of β turns. Type I and type II β turns are most common, distinguished by the ϕ and ψ angles taken up by the peptide backbone in the turn (see Table 4-1). Type I turns occur more than twice as frequently as type II. Note the hydrogen bond between the peptide groups of the first and fourth residues of the bends. (Individual amino acid residues are framed by large blue circles. Not all H atoms are shown in these depictions.)

FIGURE 4-7 Trans and cis isomers of a peptide bond involving the imino nitrogen of proline. Of the peptide bonds between amino acid residues other than Pro, more than 99.95% are in the trans configuration. For peptide bonds involving the imino nitrogen of proline, however, about 6% are in the cis configuration; many of these occur at β turns.

Common Secondary Structures Have Characteristic Dihedral Angles

The α helix and the β conformation are the major repetitive secondary structures in a wide variety of proteins, although other repetitive structures exist in some specialized proteins (an example is collagen; see Fig. 4-12). Every type of secondary structure can be completely described by the dihedral angles ϕ and ψ associated with each residue. Ramachandran plots, introduced by G. N. Ramachandran, are useful tools for visualizing all of the ϕ and ψ angles observed in a particular protein structure and are often used to test the quality of three-dimensional protein structures. In a Ramachandran plot, the dihedral angles that define the α helix and the β conformation fall within a relatively restricted range of sterically allowed structures (Fig. 4-8a). Most values of ϕ and ψ taken from known protein structures fall into the expected regions, with high concentrations near the α helix and β conformation values, as predicted (Fig. 4-8b). The only amino acid residue often found in a conformation outside these regions is glycine.
Because its side chain is small, a Gly residue can take part in many conformations that are sterically forbidden for other amino acids.

FIGURE 4-8 Ramachandran plots showing a variety of structures. (a) The values of ϕ and ψ for various allowed conformations and secondary structures are shown. Peptide conformations deemed possible are those that involve little or no steric interference, based on calculations using known van der Waals radii and dihedral angles, with atoms modeled as hard spheres. Other types of Ramachandran plots make different assumptions. The areas shaded dark blue represent conformations that involve no steric overlap and are thus fully allowed. Medium blue indicates conformations permitted if atoms are allowed to approach each other by an additional 0.1 nm, a slight clash. The lightest blue indicates conformations that are permissible if a very modest flexibility (a few degrees) is allowed in the ω dihedral angle that describes the peptide bond itself (generally constrained to 180°). The white regions are conformations that are not allowed. Although left-handed α helices extending over several amino acid residues are theoretically possible, they have not been observed in proteins. The asymmetry of the plot results from the stereochemistry of the amino acid residues. (b) The values of ϕ and ψ for all the amino acid residues except Gly in the enzyme pyruvate kinase (isolated from rabbit) are overlaid on the plot of allowed conformations. The small, flexible Gly residues were excluded because they frequently fall outside the expected (blue) ranges. [(a) Information from T. E. Creighton, Proteins, p. 166. © 1984 by W. H. Freeman and Company. (b) Data from Hazel Holden, University of Wisconsin–Madison, Department of Biochemistry.]

Common Secondary Structures Can Be Assessed by Circular Dichroism

Any form of structural asymmetry in a molecule gives rise to differences in absorption of left-handed versus right-handed circularly polarized light.
Measurement of this difference is called circular dichroism (CD) spectroscopy. An ordered structure, such as a folded protein, gives rise to an absorption spectrum that can have peaks or regions with both positive and negative values. For proteins, spectra are obtained in the far UV region (190 to 250 nm). In this region, the light-absorbing entity, or chromophore, is the peptide bond; a signal is obtained when the peptide bond is in a folded environment. The difference in molar extinction coefficients (see Box 3-1) for left-handed and right-handed circularly polarized light (Δε) is plotted as a function of wavelength. The α helix and β conformations have characteristic CD spectra (Fig. 4-9). Using CD spectra, biochemists can determine whether proteins are properly folded, estimate the fraction of the protein that is folded in either of the common secondary structures, and monitor transitions between the folded and unfolded states.

FIGURE 4-9 Circular dichroism spectroscopy. These spectra show polylysine entirely as α helix, as β conformation, or in an unstructured, denatured state. The y axis unit is a simplified version of the units most commonly used in CD experiments. Since the curves are different for α helix, β conformation, and unstructured, the CD spectrum for a given protein can provide a rough estimate for the fraction of the protein made up of the two most common secondary structures. The CD spectrum of the native protein can serve as a benchmark for the folded state, useful for monitoring denaturation or conformational changes brought about by changes in solution conditions.

SUMMARY 4.2 Protein Secondary Structure

Secondary structure is the local spatial arrangement of the main-chain atoms in a selected segment of a polypeptide chain; it can be completely defined by the ϕ and ψ angles of all the amino acids in that segment.

In the α helix, the repeating unit is a single helical turn of ∼5.4 Å or 3.6 amino acids.
The common form found in proteins is right-handed with the amino acid R groups protruding away from the helical backbone. The propensity of a protein segment to form an α helix depends on the composition of its amino acids and amino acid positions relative to one another and relative to the helical dipole.

In the β conformation, amino acids are extended in a zigzag fashion. When several β strands are arranged adjacent to one another, they can form either parallel or antiparallel β sheets.

Turns or loops connect segments of α helix or β strands. β turns, which often contain Gly or Pro residues, tend to connect segments of antiparallel β sheets.

The Ramachandran plot is a visual description of the combinations of ϕ and ψ dihedral angles that are permitted in a peptide backbone and those that are not permitted due to steric constraints. Dihedral angles that define the α helix and the β conformation are found only within certain regions of the plot.

Circular dichroism spectroscopy is a method for assessing common secondary structure and monitoring folding in proteins based on absorption of circularly polarized UV light.

4.3 Protein Tertiary and Quaternary Structures

The overall three-dimensional arrangement of all atoms in a protein is referred to as the protein’s tertiary structure. Whereas the term “secondary structure” refers to the spatial arrangement of amino acid residues that are adjacent in a segment of a polypeptide, tertiary structure includes longer-range aspects of amino acid sequence. Amino acids that are far apart in the polypeptide sequence and are in different types of secondary structure may interact within the completely folded structure of a protein. Interacting segments of polypeptide chains are held in their characteristic tertiary positions by several kinds of weak interactions (and sometimes by covalent bonds such as disulfide cross-links) between the segments.
Some proteins contain two or more separate polypeptide chains, or subunits, which may be identical or different. The arrangement of these protein subunits in three-dimensional complexes constitutes quaternary structure.

In considering these higher levels of structure, it is useful to designate the major groups into which many proteins can be classified: fibrous proteins, with polypeptide chains arranged in long strands or sheets; globular proteins, with polypeptide chains folded into a spherical or globular shape; membrane proteins, with polypeptide chains embedded in hydrophobic lipid membranes; and intrinsically disordered proteins, with polypeptide chains lacking stable tertiary structures. We focus here on fibrous, globular, and intrinsically disordered proteins; membrane proteins are discussed in Chapter 11. These three groups are structurally distinct. Fibrous proteins usually consist of a single type of secondary structure, and their tertiary structure is relatively simple. Globular proteins often contain several types of secondary structure. Intrinsically disordered proteins can lack secondary structure entirely. The groups also differ functionally: the structures that provide support, shape, and external protection to vertebrates are made of fibrous proteins. Most enzymes are globular proteins, whereas regulatory proteins can be globular, disordered, or contain both globular and disordered segments.

Fibrous Proteins Are Adapted for a Structural Function

α-Keratin, collagen, and silk fibroin nicely illustrate the relationship between protein structure and biological function (Table 4-2). Fibrous proteins share properties that give strength and/or flexibility to the structures in which they occur. In each case, the fundamental structural unit is a simple repeating element of secondary structure.
All fibrous proteins are insoluble in water, a property conferred by a high concentration of hydrophobic amino acid residues both in the interior of the protein and on its surface. These hydrophobic surfaces are largely buried, as many similar polypeptide chains are packed together to form elaborate supramolecular complexes. The underlying structural simplicity of fibrous proteins makes them particularly useful for illustrating some of the fundamental principles of protein structure discussed previously.

TABLE 4-2 Secondary Structures and Properties of Some Fibrous Proteins

Structure | Characteristics | Examples of occurrence
α Helix, cross-linked by disulfide bonds | Tough, insoluble protective structures of varying hardness and flexibility | α-Keratin of hair, feathers, nails
β Conformation | Soft, flexible filaments | Silk fibroin
Collagen triple helix | High tensile strength, without stretch | Collagen of tendons, bone matrix

α-Keratin

The α-keratins have evolved for strength. Found only in mammals, these proteins constitute almost the entire dry weight of hair, wool, nails, claws, quills, horns, and hooves and much of the outer layer of skin. The α-keratins are part of a broader family of proteins called intermediate filament (IF) proteins. Other IF proteins are found in the cytoskeletons of animal cells. All IF proteins have a structural function and share the structural features exemplified by the α-keratins.

The α-keratin helix is a right-handed α helix, the same helix found in many other proteins. Francis Crick and Linus Pauling, in the early 1950s, independently suggested that the α helices of keratin were arranged as a coiled coil. Two strands of α-keratin, oriented in parallel (with their amino termini at the same end), are wrapped about each other to form a supertwisted coiled coil. The supertwisting amplifies the strength of the overall structure, just as strands are twisted to make a strong rope (Fig. 4-10).
The twisting of the axis of an α helix to form a coiled coil explains the discrepancy between the 5.4 Å per turn predicted for an α helix by Pauling and Corey and the 5.15 to 5.2 Å repeating structure observed in the x-ray diffraction of hair (see end-of-chapter problem 2). The helical path of the supertwists is left-handed, opposite in sense to the α helix. The surfaces where the two α helices touch are made up of hydrophobic amino acid residues, their R groups meshed together in a regular interlocking pattern. This permits a close packing of the polypeptide chains within the left-handed supertwist. Not surprisingly, α-keratin is rich in the hydrophobic residues Ala, Val, Leu, Ile, Met, and Phe.

FIGURE 4-10 Structure of hair. (a) Hair α-keratin is an elongated α helix with somewhat thicker elements near the amino and carboxyl termini. Pairs of these helices are interwound in a left-handed sense to form two-chain coiled coils. These then combine in higher-order structures called protofilaments and protofibrils. About four protofibrils (32 strands of α-keratin in all) combine to form an intermediate filament. The individual two-chain coiled coils in the various substructures also seem to be interwound, but the handedness of the interwinding and other structural details are unknown. (b) A hair is an array of many α-keratin filaments, made up of the substructures shown in (a). [(a) Information from PDB ID 3TNU, C. H. Lee et al., Nature Struct. Mol. Biol. 19:707, 2012.]

An individual polypeptide in the α-keratin coiled coil has a relatively simple tertiary structure, dominated by an α-helical secondary structure with its helical axis twisted in a left-handed superhelix. The intertwining of the two α-helical polypeptides is an example of quaternary structure. Coiled coils of this type are common structural elements in filamentous proteins and in the muscle protein myosin (see Fig. 5-26). The quaternary structure of α-keratin can be quite complex.
Many coiled coils can be assembled into large supramolecular complexes, such as the arrangement of α-keratin that forms the intermediate filament of hair (Fig. 4-10b).

The strength of fibrous proteins is enhanced by covalent cross-links between polypeptide chains in the multihelical “ropes” and between adjacent chains in a supramolecular assembly. In α-keratins, the cross-links stabilizing quaternary structure are disulfide bonds. In the hardest and toughest α-keratins, such as those of rhinoceros horn, up to 18% of the residues are cysteines involved in disulfide bonds.

Collagen

Like the α-keratins, collagen has evolved to provide strength. It is found in connective tissue such as tendons, cartilage, the organic matrix of bone, and the cornea of the eye. In fact, collagen is the most abundant protein in mammals, usually comprising 25% to 35% of total protein content. The collagen helix is a unique secondary structure, quite distinct from the α helix. It is left-handed and has three amino acid residues per turn (Fig. 4-11 and Table 4-1). Collagen is also a coiled coil, but one with distinct tertiary and quaternary structures: three separate polypeptides, called α chains (not to be confused with α helices), are twisted about each other. The superhelical twisting is right-handed in collagen, opposite in sense to the left-handed helix of the α chains.

FIGURE 4-11 Structure of collagen. (a) The α chain of collagen has a repeating secondary structure unique to this protein. The repeating tripeptide sequence Gly–X–Y, where X is often Pro and Y is often 4-Hyp, adopts a left-handed helical structure with three residues per turn. Three of these helices (shown here in white, blue, and purple) wrap around one another with a right-handed twist. (b) The three-stranded collagen superhelix shown from one end, in a ball-and-stick representation. Gly residues are shown in red.
Glycine, because of its small size, is required at the tight junction where the three chains are in contact. The balls in this illustration do not represent the van der Waals radii of the individual atoms. The center of the three-stranded superhelix is not hollow, as it appears here, but very tightly packed. [Data from PDB ID 1CGD, J. Bella et al., Structure 3:893, 1995.]

There are many types of vertebrate collagen. Typically, they contain about 35% Gly, 11% Ala, and 21% Pro and 4-Hyp (4-hydroxyproline, an uncommon amino acid; see Fig. 3-8a). The food product gelatin is derived from collagen. It has little nutritional value as a protein, because collagen is extremely low in many amino acids that are essential in the human diet. The unusual amino acid content of collagen is related to structural constraints unique to the collagen helix. The amino acid sequence in collagen is generally a repeating tripeptide unit, Gly–X–Y, where X is often Pro and Y is often 4-Hyp. Only Gly residues can be accommodated at the very tight junctions between the individual α chains (Fig. 4-11b). The Pro and 4-Hyp residues permit the sharp twisting of the collagen helix. The amino acid sequence and the supertwisted quaternary structure of collagen allow a very close packing of its three polypeptides. 4-Hydroxyproline has a special role in the structure of collagen, and in human history (Box 4-2).

BOX 4-2 MEDICINE Why Sailors, Explorers, and College Students Should Eat Their Fresh Fruits and Vegetables

… from this misfortune, together with the unhealthiness of the country, where there never falls a drop of rain, we were stricken with the “camp-sickness,” which was such that the flesh of our limbs all shrivelled up, and the skin of our legs became all blotched with black, mouldy patches, like an old jack-boot, and proud flesh came upon the gums of those of us who had the sickness, and none escaped from this sickness save through the jaws of death.
The signal was this: when the nose began to bleed, then death was at hand.

(The Memoirs of the Lord of Joinville, ca. 1300*)

This excerpt describes the plight of Louis IX’s scurvy-weakened army before it was destroyed by the Egyptians toward the end of the Seventh Crusade (1248–1254). What was the nature of the malady afflicting these thirteenth-century soldiers?

Scurvy is caused by lack of vitamin C, or ascorbic acid (ascorbate). Vitamin C is required for, among other things, the hydroxylation of proline and lysine in collagen; scurvy is a deficiency disease characterized by general degeneration of connective tissue. Manifestations of advanced scurvy include numerous small hemorrhages caused by fragile blood vessels; tooth loss, poor wound healing, and the reopening of old wounds; bone pain and degeneration; and eventually heart failure. Milder cases of vitamin C deficiency are accompanied by fatigue, irritability, and an increased severity of respiratory tract infections. Most animals make large amounts of vitamin C, converting glucose to ascorbate in four enzymatic steps. But in the course of evolution, humans and some other animals (gorillas, guinea pigs, and fruit bats) have lost the last enzyme in this pathway and must obtain ascorbate in their diet. Vitamin C is available in a wide range of fruits and vegetables. Until 1800, however, it was often absent in the dried foods and other food supplies stored for winter or for extended travel.

Scurvy came to wide public notice during the European voyages of discovery from 1500 to 1800. In fact, during the first circumnavigation of the globe (1519–1522) by Ferdinand Magellan, more than 80% of his crew were lost to scurvy. Winter outbreaks of scurvy in Europe were gradually eliminated in the nineteenth century as the cultivation of the potato, introduced from South America, became widespread.

In 1747, James Lind, a Scottish surgeon in the Royal Navy, carried out the first controlled clinical study in recorded history.
During an extended voyage on the 50-gun warship HMS Salisbury, Lind selected 12 sailors suffering from scurvy and separated them into groups of two. All 12 received the same diet, except that each group was given a different remedy for scurvy from among those recommended at the time. The sailors given lemons and oranges recovered and returned to duty. Lind’s Treatise on the Scurvy was published in 1753, but inaction persisted in the Royal Navy for another 40 years. In 1795, the British admiralty finally mandated a ration of concentrated lime or lemon juice for all British sailors (hence the name “limeys”). Scurvy continued to be a problem in some other parts of the world until 1932, when Hungarian scientist Albert Szent-Györgyi, and W. A. Waugh and C. G. King at the University of Pittsburgh, isolated and synthesized ascorbic acid.

So why is ascorbate so necessary to good health? Of particular interest to us here is its role in the formation of collagen. As noted in the text, collagen is constructed of the repeating tripeptide unit Gly–X–Y, where X and Y are generally Pro or 4-Hyp, the proline derivative 4-hydroxyproline, which plays an essential role in the folding of collagen and in maintaining its structure. The proline ring is normally found as a mixture of two puckered conformations, called Cγ-endo and Cγ-exo (Fig. 1). The collagen helix structure requires the Pro/4-Hyp residue in the Y positions to be in the Cγ-exo conformation, and it is this conformation that is enforced by the hydroxyl substitution at C-4 in 4-Hyp. In the absence of vitamin C, cells cannot hydroxylate the Pro at the Y positions. This leads to collagen instability and the connective tissue problems seen in scurvy.

FIGURE 1 The Cγ-endo conformation of proline and the Cγ-exo conformation of 4-hydroxyproline.

The hydroxylation of specific Pro residues in procollagen, the precursor of collagen, requires the action of the α-ketoglutarate-dependent enzyme prolyl 4-hydroxylase.
In the normal prolyl 4-hydroxylase reaction, one molecule of α-ketoglutarate and one of O₂ bind to the enzyme. The α-ketoglutarate is oxidatively decarboxylated to form CO₂ and succinate. The remaining oxygen atom is then used to hydroxylate an appropriate Pro residue in procollagen. No ascorbate is needed in this reaction. However, prolyl 4-hydroxylase also catalyzes an oxidative decarboxylation of α-ketoglutarate that is not coupled to proline hydroxylation. During this reaction, the Fe²⁺ becomes oxidized, inactivating the enzyme and preventing the proline hydroxylation. Ascorbate is needed to reduce the iron and restore enzyme activity so that proline hydroxylation of procollagen can continue.

Scurvy remains a problem today, not only in remote regions where nutritious food is scarce but, surprisingly, also among young adults with poor eating habits in large cities. A 2009 study of more than 1,100 men and women between the ages of 20 and 29 in Toronto, Canada, found that 1 in 7 young adults had vitamin C deficiency due to unmet dietary needs. Moreover, lower vitamin C levels were associated with higher measures of obesity and blood pressure and fewer servings a day of healthy foods. Just like eighteenth-century sailors, twenty-first-century young adults need to eat their fruits and vegetables!

*From Ethel Wedgwood, The Memoirs of the Lord of Joinville: A New English Version, E. P. Dutton and Company, 1906.

The tight wrapping of the α chains in the collagen triple helix provides tensile strength greater than that of a steel wire of equal cross section. Collagen fibrils (Fig. 4-12) are supramolecular assemblies consisting of triple-helical collagen molecules (sometimes referred to as tropocollagen molecules) associated in a variety of ways to provide different degrees of tensile strength.
The α chains of collagen molecules and the collagen molecules of fibrils are cross-linked by unusual types of covalent bonds involving Lys, HyLys (5-hydroxylysine), or His residues that are present at a few of the X and Y positions. These links create uncommon amino acid residues such as dehydrohydroxylysinonorleucine. The increasingly rigid and brittle character of aging connective tissue results from accumulated covalent cross-links in collagen fibrils.

FIGURE 4-12 Structure of collagen fibrils. Collagen (Mr 300,000) is a rod-shaped molecule, about 3,000 Å long and only 15 Å thick. Its three helically intertwined α chains may have different sequences; each chain has about 1,000 amino acid residues. Collagen fibrils are made up of collagen molecules aligned in a staggered fashion and cross-linked for strength. The specific alignment and degree of cross-linking vary with the tissue and produce characteristic cross-striations in an electron micrograph. In the example shown here, alignment of the head groups of every fourth molecule produces striations 640 Å (64 nm) apart.

A typical mammal has more than 30 structural variants of collagen, particular to certain tissues and each somewhat different in sequence and function. Some human genetic defects in collagen structure illustrate the close relationship between amino acid sequence and three-dimensional structure in this protein. Osteogenesis imperfecta is characterized by abnormal bone formation in babies; at least eight variants of this condition, with different degrees of severity, occur in the human population. Ehlers-Danlos syndrome is characterized by loose joints, and at least six variants occur in humans. The composer Niccolò Paganini (1782–1840) was famed for his seemingly impossible dexterity in playing the violin. He suffered from a variant of Ehlers-Danlos syndrome that rendered him effectively double-jointed. In both disorders, some variants can be lethal, whereas others cause lifelong problems.
All of the variants of both conditions result from the substitution of an amino acid residue with a larger R group (such as Cys or Ser) for a single Gly residue in an α chain in one or another of the collagen proteins (a different Gly residue in each disorder). These single-residue substitutions have a catastrophic effect on collagen function because they disrupt the Gly–X–Y repeat that gives collagen its unique helical structure. Given its role in the collagen triple helix (Fig. 4-11), Gly cannot be replaced by another amino acid residue without substantial deleterious effects on collagen structure.

Fibroin

The protein of silk, fibroin, is produced by insects and spiders. Its polypeptide chains are predominantly in the β conformation. Fibroin is rich in Ala and Gly residues, permitting a close packing of β sheets and an interlocking arrangement of R groups (Fig. 4-13). The overall structure is stabilized by extensive hydrogen bonding between all peptide linkages in the polypeptides of each β sheet and by the optimization of van der Waals interactions between sheets. Silk does not stretch, because the β conformation is already highly extended (Fig. 4-5). However, the structure is flexible, because the sheets are held together by numerous weak interactions rather than by covalent bonds such as the disulfide bonds in α-keratins.

FIGURE 4-13 Structure of silk. The fibers in silk cloth and in a spider web are made up primarily of the protein fibroin. (a) Fibroin consists of layers of antiparallel β sheets rich in Ala and Gly residues. The small side chains interdigitate and allow close packing of the sheets, as shown in the ball-and-stick view. The segments shown here would be just a small part of the fibroin strand. (b) Strands of silk emerge from the spinnerets of a spider in this colorized scanning electron micrograph. [(a) Data from PDB ID 1SLK, S. A. Fossey et al., Biopolymers 31:1529, 1991. (b) Tina Weatherby Carvalho/MicroAngela.]
Structural Diversity Reflects Functional Diversity in Globular Proteins

In a globular protein, different segments of the polypeptide chain (or multiple polypeptide chains) fold back on each other, generating a more compact shape than is seen in the fibrous proteins (Fig. 4-14). The folding also provides the structural diversity necessary for proteins to carry out a wide array of biological functions. Globular proteins include enzymes, transport proteins, motor proteins, regulatory proteins, immunoglobulins, and proteins with many other functions.

FIGURE 4-14 Globular protein structures are compact and varied. Human serum albumin (Mr 64,500) has 585 residues in a single chain. Given here are the approximate dimensions its single polypeptide chain would have if it occurred entirely in extended β conformation or as an α helix. Also shown is the size of the protein in its native globular form, as determined by x-ray crystallography; the polypeptide chain must be very compactly folded to fit into these dimensions.

Our discussion of globular proteins begins with the principles gleaned from the first protein structures to be elucidated. This is followed by a detailed description of protein substructure and comparative categorization. Such discussions are possible only because of the vast amount of information available online from publicly accessible databases, particularly the Protein Data Bank, or PDB (Box 4-3).

BOX 4-3 The Protein Data Bank

The number of known three-dimensional protein structures is now more than 100,000 and doubles every couple of years. This wealth of information is revolutionizing our understanding of protein structure, the relationship of structure to function, and the evolutionary paths by which proteins arrived at their present state, which can be seen in the family resemblances that come to light as protein databases are sifted and sorted. One of the most important resources available to biochemists is the Protein Data Bank (PDB; www.rcsb.org).
The PDB is an archive of experimentally determined three-dimensional structures of biological macromolecules, containing virtually all of the macromolecular structures (such as proteins, RNAs, and DNAs) elucidated to date. Each structure is assigned an identifying label (a four-character identifier called the PDB ID). Such labels are provided in the figure legends for every PDB-derived structure illustrated in this text so that students and instructors can explore the same structures on their own. The data files in the PDB describe the spatial coordinates of each atom for which the position has been determined (many of the cataloged structures are not complete). Additional data files provide information on how the structure was determined and its accuracy. The atomic coordinates can be converted into an image of the macromolecule by using structure visualization software. Students are encouraged to access the PDB and explore structures, using visualization software linked to the database. Macromolecular structure files can also be downloaded and explored on the desktop, using free software such as JSmol.

Myoglobin Provided Early Clues about the Complexity of Globular Protein Structure

The first breakthrough in understanding the three-dimensional structure of a globular protein came from x-ray diffraction studies of myoglobin carried out by John Kendrew and his colleagues in the 1950s. Myoglobin is a relatively small (Mr 16,700), oxygen-binding protein of muscle cells. It functions both to store oxygen and to facilitate oxygen diffusion in rapidly contracting muscle tissue. Myoglobin contains a single polypeptide chain of 153 amino acid residues of known sequence and a single iron protoporphyrin, or heme, group. The same heme group that is found in myoglobin is found in hemoglobin, the oxygen-binding protein of erythrocytes, and is responsible for the deep red-brown color of both myoglobin and hemoglobin.
Myoglobin is particularly abundant in the muscles of diving mammals such as whales, seals, and porpoises, so abundant that the muscles of these animals are brown. Storage and distribution of oxygen by muscle myoglobin permits diving mammals to remain submerged for long periods. The activities of myoglobin and other globin molecules are investigated in greater detail in Chapter 5.

Figure 4-15 shows several structural representations of myoglobin, illustrating how the polypeptide chain is folded in three dimensions (its tertiary structure). The red group surrounded by protein is heme. The backbone of the myoglobin molecule consists of eight relatively straight segments of α helix interrupted by bends, some of which are β turns. The longest α helix has 23 amino acid residues and the shortest has only 7; all helices are right-handed. More than 70% of the residues in myoglobin are in these α-helical regions. X-ray analysis has revealed the precise position of each of the R groups, which fill up nearly all the space within the folded chain that is not occupied by backbone atoms.

FIGURE 4-15 Tertiary structure of sperm whale myoglobin. Orientation of the protein is similar in (a) through (d); the heme group is shown in red. In addition to illustrating the myoglobin structure, this figure provides examples of several different ways to display protein structure. (a) The polypeptide backbone in a ribbon representation of a type introduced by Jane Richardson, which highlights regions of secondary structure. The α-helical regions are evident. (b) Surface contour image; this is useful for visualizing pockets in the protein where other molecules might bind. (c) Ribbon representation including side chains (yellow) for the hydrophobic residues Leu, Ile, Val, and Phe. (d) Space-filling model with all amino acid side chains. Each atom is represented by a sphere encompassing its van der Waals radius.
The hydrophobic residues are again shown in yellow; most are buried in the interior of the protein and thus are not visible. [Data from PDB ID 1MBO, S. E. Phillips, J. Mol. Biol. 142:531, 1980.] Many important conclusions were drawn from the structure of myoglobin. The positioning of amino acid side chains reflects a structure that is largely stabilized by the hydrophobic effect. Most of the hydrophobic R groups are in the interior of the molecule, hidden from exposure to water. All but two of the polar R groups are located on the outer surface of the molecule, and all are hydrated. The myoglobin molecule is so compact that its interior has room for only four molecules of water. This dense hydrophobic core is typical of globular proteins. The fraction of space occupied by atoms in an organic liquid is 0.4 to 0.6. In a globular protein the fraction is about 0.75, comparable to that in a crystal (in a typical crystal the fraction is 0.70 to 0.78, near the theoretical maximum). In this packed environment, weak interactions strengthen and reinforce each other. For example, the nonpolar side chains in the core are so close together that short-range van der Waals interactions make a significant contribution to stabilizing interactions. Deduction of the structure of myoglobin confirmed some expectations and introduced some new elements of secondary structure. As predicted by Pauling and Corey, all the peptide bonds are in the planar trans configuration. The α helices in myoglobin provided the first direct experimental evidence for the existence of this type of secondary structure. Three of the four Pro residues are found at bends. The fourth Pro residue occurs within an α helix, where it creates a kink necessary for tight helix packing. The flat heme group rests in a crevice, or pocket, in the myoglobin molecule. Within this pocket, the accessibility of the heme group to solvent is highly restricted. 
This is important for function, because free heme groups in an oxygenated solution are rapidly oxidized from the ferrous (Fe2+) form, which is active in the reversible binding of O2, to the ferric (Fe3+) form, which does not bind O2. As myoglobin structures from many different species were resolved, investigators were able to observe the structural changes that accompany the binding of oxygen or other molecules and thus, for the first time, to understand the correlation between protein structure and function. Hundreds of proteins have now been subjected to similar analysis.

Globular Proteins Have a Variety of Tertiary Structures

Myoglobin illustrates just one of many ways in which a polypeptide chain can fold. Table 4-3 shows the proportions of α helix and β conformation (expressed as percentage of residues in each type) in several small, single-chain, globular proteins. Each of these proteins has a distinct structure, adapted for its particular biological function, but together they share several important properties with myoglobin. Each is folded compactly, and in each case the hydrophobic amino acid side chains are oriented toward the interior (away from water) and the hydrophilic side chains are on the surface. The structures are also stabilized by a multitude of hydrogen bonds and some ionic interactions.

TABLE 4-3 Approximate Proportion of α Helix and β Conformation in Some Single-Chain Proteins

Protein (total residues)   Residues (%)
                           α Helix   β Conformation
Chymotrypsin (247)         14        45
Ribonuclease (124)         26        35
Carboxypeptidase (307)     38        17
Cytochrome c (104)         39         0
Lysozyme (129)             40        12
Myoglobin (153)            78         0

Source: Data from C. R. Cantor and P. R. Schimmel, Biophysical Chemistry, Part I: The Conformation of Biological Macromolecules, p. 100, W. H. Freeman and Company, 1980.
Note: Portions of the polypeptide chains not accounted for by α helix or β conformation consist of bends and irregularly coiled or extended stretches. Segments of α helix and β conformation sometimes deviate slightly from their normal dimensions and geometry.

To understand a complete three-dimensional structure, we need to analyze its folding patterns. We begin by defining two important terms that describe protein structural patterns or elements in a polypeptide chain; then we turn to the folding rules.

The first term is motif, also called a fold. A motif or fold is a recognizable folding pattern involving two or more elements of secondary structure and the connection(s) between them. A motif can be very simple, such as two elements of secondary structure folded against each other, and may represent only a small part of a protein. An example is a β-α-β loop (Fig. 4-16a). A motif can also be a very elaborate structure involving scores of protein segments folded together, such as the β barrel (Fig. 4-16b). In some cases, a single large motif may comprise the entire protein. The terms “motif” and “fold” are often used interchangeably, although “fold” is applied more commonly to somewhat more complex folding patterns. The segment defined as a motif or a fold may or may not be independently stable. We have already encountered a well-studied motif, the coiled coil of α-keratin, which is also found in some other proteins. The distinctive arrangement of eight α helices in myoglobin is replicated in all globins and is called the globin fold. Note that a motif is not a hierarchical structural element falling between secondary and tertiary structure. It is simply a folding pattern.

FIGURE 4-16 Motifs. (a) A simple motif, the β-α-β loop. (b) A more elaborate motif, the β barrel. This β barrel is a single domain of α-hemolysin (a toxin that kills a cell by creating a hole in its membrane) from the bacterium Staphylococcus aureus. [Data from (a) PDB ID 4TIM, M. E. Noble et al., J. Med. Chem., 34:2709, 1991; (b) PDB ID 7AHL, L. Song et al., Science 274:1859, 1996.]
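The percentages in Table 4-3 can be turned into approximate residue counts, which makes it easier to see how much of each chain lies outside α helix and β conformation. The sketch below uses only the values from the table; the names `TABLE_4_3` and `residue_breakdown` are illustrative, and because the tabulated percentages are themselves approximate, the rounded counts should be read as estimates.

```python
# (total residues, % alpha helix, % beta conformation), copied from Table 4-3
TABLE_4_3 = {
    "Chymotrypsin": (247, 14, 45),
    "Ribonuclease": (124, 26, 35),
    "Carboxypeptidase": (307, 38, 17),
    "Cytochrome c": (104, 39, 0),
    "Lysozyme": (129, 40, 12),
    "Myoglobin": (153, 78, 0),
}


def residue_breakdown(total, pct_helix, pct_beta):
    """Return approximate (helix, beta, other) residue counts."""
    helix = round(total * pct_helix / 100)
    beta = round(total * pct_beta / 100)
    other = total - helix - beta  # bends and irregular or extended stretches
    return helix, beta, other


for protein, (total, pct_helix, pct_beta) in TABLE_4_3.items():
    helix, beta, other = residue_breakdown(total, pct_helix, pct_beta)
    print(f"{protein}: ~{helix} helix, ~{beta} beta, ~{other} other")
```

For myoglobin, for example, the 78% figure corresponds to roughly 119 of the 153 residues in α helix.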
The second term for describing structural patterns is domain. A domain, as defined by Jane Richardson in 1981, is a part of a polypeptide chain that is independently stable or could undergo movements as a single entity with respect to the entire protein. Polypeptides with more than a few hundred amino acid residues often fold into two or more domains, sometimes with different functions. In many cases, a domain from a large protein will retain its native three-dimensional structure even when separated (for example, by proteolytic cleavage) from the remainder of the polypeptide chain. In a protein with multiple domains, each domain may appear as a distinct globular lobe (Fig. 4-17); more commonly, extensive contacts between domains make individual domains hard to discern. Different domains often have distinct functions, such as the binding of small molecules or interaction with other proteins. Small proteins usually have only one domain (the domain is the protein).

FIGURE 4-17 Structural domains in the polypeptide troponin C. This calcium-binding protein, associated with muscle, has two separate calcium-binding domains, shown here in brown and blue. [Data from PDB ID 4TNC, K. A. Satyshur et al., J. Biol. Chem. 263:1628, 1988.]

Folding of polypeptides is subject to an array of physical and chemical constraints, and several rules have emerged from studies of common protein-folding patterns.

1. The hydrophobic effect makes a large contribution to the stability of protein structures. Burial of hydrophobic amino acid R groups so as to exclude water requires at least two layers of secondary structure. Simple motifs such as the β-α-β loop (Fig. 4-16a) create two such layers.

2. Where they occur together in a protein, α helices and β sheets generally are found in different structural layers. This is because the backbone of a polypeptide segment in the β conformation (Fig. 4-5) cannot readily hydrogen-bond to an α helix that is adjacent to it.

3. Segments adjacent to each other in the amino acid sequence are usually stacked adjacent to each other in the folded structure. Distant segments of a polypeptide may come together in the tertiary structure, but this is not the norm.

4. The β conformation is most stable when the individual segments are twisted slightly in a right-handed sense. This influences both the arrangement of β sheets derived from the twisted segments and the path of the polypeptide connections between them. Two parallel β strands, for example, must be connected by a crossover strand (Fig. 4-18a). In principle, this crossover could have a right-handed or left-handed conformation, but in proteins it is almost always right-handed. Right-handed connections tend to be shorter than left-handed connections and tend to bend through smaller angles, making them easier to form. The twisting of β sheets also leads to a characteristic twisting of the structure formed by many such segments together, as seen in the β barrel (Fig. 4-16b) and the twisted β sheet (Fig. 4-18c), which form the core of many larger structures.

FIGURE 4-18 Stable folding patterns in proteins. (a) Connections between β strands in layered β sheets. The strands here are viewed from one end, with no twisting. The connections at a given end (e.g., near the viewer) rarely cross one another. An example of such a rare crossover is illustrated by the red strands in the structure on the right. (b) Because of the right-handed twist in β strands, connections between strands are generally right-handed. Left-handed connections must traverse sharper angles and are harder to form. (c) This twisted β sheet is from a domain of photolyase (a protein that repairs certain types of DNA damage) from E. coli. Connecting loops have been removed so as to focus on the folding of the β sheet. [Data from PDB ID 1DNP, H. W. Park et al., Science 268:1866, 1995.]

Following these rules, complex motifs can be built up from simple ones.
For example, a series of β-α-β loops arranged so that the β strands form a barrel creates a particularly stable and common motif, the α/β barrel (Fig. 4-19). In this structure, each parallel β segment is attached to its neighbor by an α-helical segment. All connections are right-handed. The α/β barrel is found in many enzymes, often with a binding site (for a cofactor or a substrate) in the form of a pocket near one end of the barrel. Note that domains with similar folding patterns are said to have the same motif, even though their constituent α helices and β sheets may differ in length.

FIGURE 4-19 Constructing large motifs from smaller ones. The α/β barrel is a commonly occurring motif constructed from repetitions of the β-α-β loop motif. This α/β barrel is a domain of pyruvate kinase (a glycolytic enzyme) from rabbit. [Data from PDB ID 1PKN, T. M. Larsen et al., Biochemistry 33:6301, 1994.]

Some Proteins or Protein Segments Are Intrinsically Disordered

Although many proteins contain well-folded and stable structures, this is not necessary for the biological function of all proteins. Many proteins or protein segments lack ordered structures in solution. The concept that some proteins function in the absence of a definable three-dimensional structure comes from reassessment of data from many different proteins. As many as a third of all human proteins may be unstructured or may have significant unstructured segments. All organisms have some proteins that fall into this category. Intrinsically disordered proteins have properties that are distinct from those of classical, structured proteins. They often lack a hydrophobic core and instead are characterized by high densities of charged amino acid residues such as Lys, Arg, and Glu. Pro residues are also prominent, as they tend to disrupt ordered structures. Structural disorder and high charge density can facilitate the function of some proteins as spacers, insulators, or linkers in larger structures.
Other disordered proteins are scavengers, binding up ions and small molecules in solution and serving as reservoirs or garbage dumps. However, many intrinsically disordered proteins are at the heart of important protein interaction networks. The lack of an ordered structure can facilitate a kind of functional promiscuity, allowing one protein to interact with multiple or even dozens of partners. Structural disorder allows some inhibitor proteins, such as the mammalian cell division protein p27, to interact with multiple targets in different ways. In solution, p27 lacks definable structure. However, it wraps around and inhibits the action of several enzymes called protein kinases (see Chapter 6) that facilitate cell division. The flexible structure of p27 allows it to accommodate itself to its different target proteins. Human tumor cells, which are cells that have lost the capacity to control cell division normally, generally have reduced levels of p27; the lower the levels of p27, the poorer the prognosis for the cancer patient. Similarly, intrinsically disordered proteins are often present as hubs or scaffolds at the center of protein networks that constitute signaling pathways (see Fig. 12-30). These proteins, or parts of them, may interact with many different binding partners. They often take on an ordered structure when they interact with other proteins, but the structure they assume may vary with different binding partners. The mammalian protein p53 is also critical in the control of cell division. It contains both structured and unstructured segments, and the different segments interact with dozens of other proteins. An unstructured region of p53 at the carboxyl terminus interacts with at least four different binding partners and assumes a different structure in each of the complexes (Fig. 4-20).

FIGURE 4-20 Binding of the intrinsically disordered carboxyl terminus of p53 protein to its binding partners.
(a) The p53 protein is made up of several different segments. Only the central domain is well ordered. (b) The linear sequence of the p53 protein is depicted as a colored bar. The overlaid graph presents a plot of the PONDR (Predictor of Natural Disordered Regions) score versus the protein sequence. PONDR is one of the best available algorithms for predicting the likelihood that a given amino acid residue is in a region of intrinsic disorder, based on the surrounding amino acid sequence and amino acid composition. A score of 1.0 indicates a probability of 100% that a protein will be disordered. In the actual protein structure, the tan central domain is ordered. The amino-terminal (blue) and carboxyl-terminal (red) regions are disordered. (c) The very end of the carboxyl-terminal region has multiple binding partners and folds when it binds to each of them; however, the three-dimensional structure that is assumed when binding occurs is different for each of the interactions shown, and thus this carboxyl-terminal segment (11 to 20 residues) is shown in a different color in each complex. [Information from V. N. Uversky, Intl. J. Biochem. Cell Biol. 43:1090, 2011, Fig. 5. (a) Data from PDB ID 1TUP, Y. Cho et al., Science 265:346, 1994. (c) Data from Cyclin A: PDB ID 1H26, E. D. Lowe et al., Biochemistry 41:15,625, 2002; sirtuin: PDB ID 1MA3, J. L. Avalos et al., Mol. Cell 10:523, 2002; CBP bromodomain: PDB ID 1JSP, S. Mujtaba et al., Mol. Cell 13:251, 2004; s100B(ββ): PDB ID 1DT7, R. R. Rustandi et al., Nature Struct. Biol. 7:570, 2000.]

Protein Motifs Are the Basis for Protein Structural Classification

More than 150,000 structures are now archived in the Protein Data Bank (PDB; for a deeper explanation, see Box 4-3). An enormous amount of information about protein structural principles, protein function, and protein evolution is contained in these data. Other databases have organized this information and made it more readily accessible.
In the Structural Classification of Proteins database, or SCOP2 (http://scop2.mrc-lmb.cam.ac.uk), all of the protein information in the PDB can be searched within four different categories: (1) protein relationships, (2) structural classes, (3) protein types, and (4) evolutionary events. Figure 4-21 presents examples of protein motifs taken from SCOP2 to illustrate the potential of searching within each category. The figure also introduces another way to represent elements of secondary structure and the relationships among segments of secondary structure in a protein — the topology diagram. FIGURE 4-21 Organization of proteins based on motifs. A few of the hundreds of known stable motifs. (a) Structural diagrams of the enzyme alcohol dehydrogenase from two different organisms. Such comparisons illustrate evolutionary relationships that conserve structure as well as function. (b) A topology diagram for the alcohol dehydrogenase from Acinetobacter calcoaceticus. Topology diagrams provide a way to visualize elements of secondary structure and their interconnections in two dimensions; the diagrams can be very useful in comparing structural folds or motifs. (c) The Structural Classification of Proteins (SCOP2) database (http://scop2.mrc-lmb.cam.ac.uk) organizes protein folds into four classes: all α , all β , α /β , and α + β . Examples of all α folds and all β folds are shown with their structural classification data (PDB ID, fold name, protein name, and source organism) from the SCOP2 database. The PDB ID is the unique accession code given to each structure archived in the Protein Data Bank (www.rcsb.org). [Data from (a) PDB ID 2JHF, R. Meijers et al., Biochemistry 46:5446, 2007; (a, b) PDB ID 1F8F, J. C. Beauchamp et al. (c) PDB ID 1BCF, F. Frolow et al., Nature Struct. Biol. 1:453, 1994; PDB ID 1PEX, F. X. Gomis-Ruth et al., J. Mol. Biol. 264:556, 1996.] The number of folding patterns is not infinite. 
Among the tens of thousands of distinct protein structures archived in the PDB, only about 1,400 different folds or motifs are classified by the SCOP2 database. Given the many years of progress in structural biology, new motifs are now discovered only rarely. Many examples of recurring domain or motif structures are available, and these reveal that protein tertiary structure is more reliably conserved than amino acid sequence. The comparison of protein structures can thus provide much information about evolution. Proteins with significant similarity in primary structure and/or with similar tertiary structure and function are said to be in the same protein family. The protein structures in the PDB belong to about 4,000 different protein families. A strong evolutionary relationship is usually evident within a protein family. For example, the globin family has many different proteins with both structural and sequence similarities to myoglobin (as seen in the proteins used as examples in Figures 4-30 and 4-31 and in Chapter 5). Two or more families that have little similarity in amino acid sequence but make use of the same major structural motif and have functional similarities are grouped into superfamilies. An evolutionary relationship among families in a superfamily is considered probable, even though time and functional distinctions — that is, different adaptive pressures — may have erased many of the telltale sequence relationships. A protein family may be widespread in all three domains of cellular life — the Bacteria, Archaea, and Eukarya — suggesting an ancient origin. Many proteins involved in intermediary metabolism and the metabolism of nucleic acids and proteins fall into this category. Other families may be present in only a small group of organisms, indicating that the structure arose more recently. 
Tracing the natural history of structural motifs through the use of structural classifications in databases such as SCOP2 provides a powerful complement to sequence analyses in tracing evolutionary relationships. The SCOP2 database is curated manually, with the objective of placing proteins in the correct evolutionary framework based on conserved structural features. Structural motifs become especially important in defining protein families and superfamilies. Improved protein classification and comparison systems lead inevitably to the elucidation of new functional relationships. Given the central role of proteins in living systems, these structural comparisons can help illuminate every aspect of biochemistry, from the evolution of individual proteins to the evolutionary history of complete metabolic pathways. Protein Quaternary Structures Range from Simple Dimers to Large Complexes Many proteins have multiple polypeptide subunits (from two to hundreds). The association of polypeptide chains can serve a variety of functions. Many multisubunit proteins have regulatory roles; the binding of small molecules may affect the interaction between subunits, causing large changes in the protein’s activity in response to small changes in the concentration of substrate or regulatory molecules (Chapter 6). In other cases, separate subunits take on separate but related functions, such as catalysis and regulation. Some associations, such as those seen in the fibrous proteins considered earlier in this chapter and the coat proteins of viruses, serve primarily structural roles. Some very large protein assemblies are the site of complex, multistep reactions. For example, each ribosome, the site of protein synthesis, incorporates dozens of protein subunits along with RNA molecules. A multisubunit protein can also be referred to as an oligomer or multimer. If an oligomer has nonidentical subunits, the overall structure of the protein can be asymmetric and quite complicated. 
However, many oligomers have identical subunits or repeating groups of nonidentical subunits, usually in symmetric arrangements. As noted in Chapter 3, the repeating structural unit in such an oligomeric protein, whether a single subunit or a group of subunits, is called a protomer.

The first oligomeric protein to have its three-dimensional structure determined was hemoglobin (Mr 64,500), which contains four polypeptide chains and four heme prosthetic groups, in which the iron atoms are in the ferrous (Fe2+) state (as we shall see in Chapter 5). The protein portion, the globin, consists of two α chains (141 residues each) and two β chains (146 residues each). Note that in this case, α and β do not refer to secondary structures. In a practice that can be confusing to the beginning student, the Greek letters α and β (and γ, δ, and others) are often used to distinguish two different kinds of subunits within a multisubunit protein, regardless of what kinds of secondary structure may predominate in the subunits. Because hemoglobin is four times as large as myoglobin, much more time and effort were required to solve its three-dimensional structure by x-ray analysis, finally achieved by Max Perutz, John Kendrew, and their colleagues in 1959. The subunits of hemoglobin are arranged in symmetric pairs (Fig. 4-22), each pair having one α subunit and one β subunit. Hemoglobin can therefore be described either as a tetramer or as a dimer of αβ protomers. The role these distinct subunits play in hemoglobin function is discussed extensively in Chapter 5.

FIGURE 4-22 Quaternary structure of deoxyhemoglobin. X-ray diffraction analysis of deoxyhemoglobin (hemoglobin without oxygen molecules bound to the heme groups) shows how the four polypeptide subunits are packed together. (a) A ribbon representation reveals the secondary structural elements of the structure and the positioning of all the heme prosthetic groups.
(b) A surface contour model shows the pockets in which the heme prosthetic groups are bound and helps to visualize subunit packing. The α subunits are shown in shades of gray, the β subunits in shades of blue. Note that the heme groups (red) are relatively far apart. [Data from PDB ID 2HHB, G. Fermi et al., J. Mol. Biol. 175:159, 1984.]

Max Perutz, 1914–2002 (left), and John Kendrew, 1917–1997

SUMMARY 4.3 Protein Tertiary and Quaternary Structures

Tertiary structure is the complete three-dimensional structure of a polypeptide chain. Many proteins fall into one of four general classes based on tertiary structure: fibrous, globular, membrane, or disordered. Insoluble fibrous proteins, such as those that make up keratin, collagen, and silk, have simple repeating elements of secondary structure. In some fibrous proteins, the individual polypeptide chains interact to form complex quaternary structures like coiled coils for strength and flexibility. Globular proteins have more complicated tertiary structures, often containing several types of secondary structure in the same polypeptide chain, and fulfill many different functional roles in the cell. The first globular protein structure to be determined, by x-ray diffraction methods, was that of the O2-binding protein myoglobin. The myoglobin structure revealed for the first time how protein structure and function are connected. The complex structures of globular proteins can be analyzed by examination of folding patterns, called motifs or folds. The many thousands of known protein structures are generally assembled from a repertoire of only a few hundred motifs. Domains are regions of a polypeptide chain that can fold stably and independently. Some proteins or protein segments are intrinsically disordered, lacking definable three-dimensional structure. These proteins often have distinctive amino acid compositions that allow a more flexible structure, which is critical for their biological function.
Based on structural similarities, proteins can be organized into families and superfamilies, which are informative about protein function and evolution. Quaternary structure results from interactions between the subunits of multisubunit (multimeric) proteins or large supramolecular assemblies. Some multimeric proteins are composed of repeated subunits called protomers.

4.4 Protein Denaturation and Folding

Proteins lead a surprisingly precarious existence. As we have seen, a native protein conformation is only marginally stable. In addition, most proteins must maintain conformational flexibility in order to function. The continual maintenance of the active set of cellular proteins required under a given set of conditions is called proteostasis. Cellular proteostasis requires the coordinated function of pathways for protein synthesis and folding, the refolding of proteins that are partially unfolded, and the sequestration and degradation of proteins that have been irreversibly unfolded or are no longer needed. In all cells, these networks involve hundreds of enzymes and specialized proteins.

As seen in Figure 4-23, the life of a protein encompasses much more than its synthesis and later degradation. The marginal stability of most proteins can produce a tenuous balance between folded and unfolded states. As proteins are synthesized on ribosomes (Chapter 27), they must fold into their native conformations. Sometimes this occurs spontaneously, but often it requires the assistance of specialized enzymes and complexes called chaperones, which we discuss later in the chapter. Many of these same folding helpers function to refold proteins that become transiently unfolded. Proteins that are not properly folded often have exposed hydrophobic surfaces that render them “sticky,” leading to the formation of inactive aggregates.
These aggregates may lack their normal function but are not inert; their accumulation in cells lies at the heart of diseases ranging from diabetes to Parkinson disease and Alzheimer disease. Not surprisingly, all cells have elaborate pathways for recycling and/or degrading proteins that are irreversibly misfolded. FIGURE 4-23 Pathways that contribute to proteostasis. Three kinds of processes contribute to proteostasis, in some cases with multiple contributing pathways. First, proteins are synthesized on a ribosome. Second, various pathways contribute to protein folding, many of which involve the activity of complexes called chaperones. Chaperones (including chaperonins) also contribute to the refolding of proteins that are partially and transiently unfolded. Finally, proteins that are irreversibly unfolded are subject to sequestration and degradation by several additional pathways. Partially unfolded proteins and protein-folding intermediates that escape the quality-control activities of the chaperones and degradative pathways may aggregate, forming both disordered aggregates and ordered amyloidlike aggregates that contribute to disease and aging processes. [Information from F. U. Hartl et al., Nature 475:324, 2011, Fig. 6.] The transitions between the folded and unfolded states, and the network of pathways that control these transitions, now become our focus. Loss of Protein Structure Results in Loss of Function Protein structures have evolved to function in particular cellular or extracellular environments. Conditions different from these environments can result in protein structural changes, large and small. A loss of three-dimensional structure sufficient to cause loss of function is called denaturation. The denatured state does not necessarily equate with complete unfolding of the protein and randomization of conformation. Under most conditions, denatured proteins exist in a set of partially folded states. 
Most proteins can be denatured by heat, which has complex effects on many weak interactions in a protein (primarily on the hydrogen bonds). If the temperature is increased slowly, a protein’s conformation generally remains intact until an abrupt loss of structure (and function) occurs over a narrow temperature range (Fig. 4-24). The abruptness of the change suggests that unfolding is a cooperative process: loss of structure in one part of the protein destabilizes other parts. The effects of heat on proteins can be mitigated by structure. The very heat-stable proteins of thermophilic bacteria and archaea have evolved to function at the temperature of hot springs (∼100 °C). The folded structures of these proteins are often similar to those of proteins in other organisms, but take some of the principles outlined here to extremes. They often feature high densities of charged residues on their surfaces, even tighter hydrophobic packing in their interiors, and folds rendered less flexible by networks of ion pairs. Each of these features makes these proteins less susceptible to unfolding at high temperatures. FIGURE 4-24 Protein denaturation. Results are shown for proteins denatured by two different environmental changes. In each case, the transition from the folded state to the unfolded state is abrupt, suggesting cooperativity in the unfolding process. (a) Thermal denaturation of horse apomyoglobin (myoglobin without the heme prosthetic group) and ribonuclease A (with its disulfide bonds intact; see Fig. 4-26). The midpoint of the temperature range over which denaturation occurs is called the melting temperature, or Tm. Denaturation of apomyoglobin was monitored by circular dichroism (see Fig. 4-9), which measures the amount of helical structure in the protein. Denaturation of ribonuclease A was tracked by monitoring changes in the intrinsic fluorescence of the protein, which is affected by changes in the environment of a Trp residue introduced by mutation. 
(b) Denaturation of disulfide-intact ribonuclease A by guanidine hydrochloride (GdnHCl), monitored by circular dichroism. [Data from (a) R. A. Sendak et al., Biochemistry 35:12,978, 1996; I. Nishii et al., J. Mol. Biol. 250:223, 1995; (b) W. A. Houry et al., Biochemistry 35:10,125, 1996.] Proteins can also be denatured by extremes of pH, by certain miscible organic solvents such as alcohol or acetone, by certain solutes such as urea and guanidine hydrochloride, or by detergents. Each of these denaturing agents represents a relatively mild treatment in the sense that no covalent bonds in the polypeptide chain are broken. Organic solvents, urea, and detergents act primarily by disrupting the hydrophobic aggregation of nonpolar amino acid side chains that produces the stable core of globular proteins; urea also disrupts hydrogen bonds; and extremes of pH alter the net charge on a protein, causing electrostatic repulsion and the disruption of some hydrogen bonding. The denatured structures resulting from these various treatments are not necessarily the same. Denaturation often leads to protein precipitation, a consequence of protein aggregate formation as exposed hydrophobic surfaces associate. The aggregates are often highly disordered. One example is the protein precipitate that can be seen after an egg white is boiled. More-ordered aggregates are also observed in some proteins, as we shall see. Amino Acid Sequence Determines Tertiary Structure The tertiary structure of a globular protein is determined by its amino acid sequence. The most important proof of this came from experiments showing that denaturation of some proteins is reversible. Certain globular proteins denatured by heat, extremes of pH, or denaturing reagents will regain their native structure and their biological activity if they are returned to conditions in which the native conformation is stable. This process is called renaturation. 
A classic example is the denaturation and renaturation of ribonuclease A, demonstrated by Christian Anfinsen in the 1950s. Purified ribonuclease A denatures completely in a concentrated urea solution in the presence of a reducing agent. The reducing agent cleaves the four disulfide bonds to yield eight Cys residues, and the urea disrupts the stabilizing hydrophobic effect, thus freeing the entire polypeptide from its folded conformation. Denaturation of ribonuclease is accompanied by a complete loss of catalytic activity. When the urea and the reducing agent are removed, the denatured ribonuclease spontaneously refolds into its correct tertiary structure, with full restoration of its catalytic activity (Fig. 4-25). The refolding of ribonuclease is so accurate that the four intrachain disulfide bonds are re-formed in the same positions in the renatured molecule as in the native ribonuclease. Later, similar results were obtained using chemically synthesized, catalytically active ribonuclease A. This eliminated the possibility that some minor contaminant in Anfinsen’s purified ribonuclease preparation might have contributed to renaturation of the enzyme, and thus it dispelled any remaining doubt that this enzyme folds spontaneously. FIGURE 4-25 Renaturation of unfolded, denatured ribonuclease. Urea denatures the ribonuclease, and mercaptoethanol (HOCH2CH2SH) reduces and thus cleaves the disulfide bonds to yield eight Cys residues. Renaturation involves reestablishing the correct disulfide cross-links. The Anfinsen experiment provided the first evidence that the amino acid sequence of a polypeptide chain contains all the information required to fold the chain into its native, three-dimensional structure. Subsequent work has shown that only a minority of proteins, many of them small and inherently stable, will fold spontaneously into their native form. Even though all proteins have the potential to fold into their native structure, many require some assistance. 
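To see why re-forming the four native disulfide bonds of ribonuclease A is such a stringent test of accurate refolding, it helps to count how many ways eight Cys residues could pair up. The short sketch below does only that counting; the function name `count_disulfide_pairings` is illustrative and not taken from the text, and the calculation assumes that every Cys pairs with exactly one other Cys and that all pairings are equally likely, which is a simplification.

```python
def count_disulfide_pairings(n_cys: int) -> int:
    """Count the ways to pair n_cys cysteines into n_cys/2 disulfide bonds.

    The first Cys can pick any of (n_cys - 1) partners, the next unpaired
    Cys any of (n_cys - 3), and so on: (n-1) * (n-3) * ... * 1.
    """
    if n_cys < 0 or n_cys % 2 != 0:
        raise ValueError("need a non-negative, even number of Cys residues")
    count = 1
    for partners in range(n_cys - 1, 0, -2):
        count *= partners
    return count


# Ribonuclease A: eight Cys residues forming four disulfide bonds.
pairings = count_disulfide_pairings(8)
native_fraction = 1 / pairings  # chance that random pairing hits the native set

print(f"possible pairings: {pairings}")
print(f"fraction native if pairing were random: {native_fraction:.4f}")
```

Under these assumptions only one pairing arrangement in 105 matches the native one, so full recovery of activity implies that the sequence itself directs the correct cross-links rather than chance.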
Polypeptides Fold Rapidly by a Stepwise Process In living cells, proteins are assembled from amino acids at a very high rate. For example, E. coli cells can make a complete, biologically active protein molecule containing 100 amino acid residues in about 5 seconds at 37 °C. However, the synthesis of peptide bonds on the ribosome is not enough; the protein must fold. How does the polypeptide chain arrive at its native conformation? Let’s assume conservatively that each of the amino acid residues could take up 10 different conformations on average, giving 10¹⁰⁰ different conformations for the polypeptide. Let’s also assume that the protein folds spontaneously by a random process in which it tries out all possible conformations around every single bond in its backbone until it finds its native, biologically active form. If each conformation were sampled in the shortest possible time (∼10⁻¹³ second, or the time required for a single molecular vibration), it would take about 10⁷⁷ years to sample all possible conformations! Clearly, protein folding is not a completely random, trial-and-error process. There must be shortcuts. This problem was first pointed out by Cyrus Levinthal in 1968 and is sometimes called Levinthal’s paradox. The folding pathway of a large polypeptide chain is unquestionably complicated. However, robust algorithms can often predict the structure of smaller proteins on the basis of their amino acid sequences. The major folding pathways are hierarchical. Local secondary structures form first. Certain amino acid sequences fold readily into α helices or β sheets, guided by constraints such as those reviewed in our discussion of secondary structure. Ionic interactions, involving charged groups that are often near one another in the linear sequence of the polypeptide chain, can play an important role in guiding these early folding steps. 
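The Levinthal estimate given earlier in this section can be checked with a few lines of arithmetic. The sketch below is a back-of-the-envelope calculation, not a folding model: it takes the three inputs stated in the text (100 residues, 10 conformations per residue, about 10⁻¹³ second per conformation) and adds one assumption of its own, a year of 365.25 days. It works in base-10 logarithms to keep the numbers readable.

```python
import math

# Inputs taken from the text.
RESIDUES = 100
CONFORMATIONS_PER_RESIDUE = 10
SECONDS_PER_SAMPLE = 1e-13

# Assumption (not from the text): one year = 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# log10 of the total number of conformations: 100 * log10(10) = 100.
log10_conformations = RESIDUES * math.log10(CONFORMATIONS_PER_RESIDUE)

# log10 of the time, in seconds, to sample every conformation once.
log10_seconds = log10_conformations + math.log10(SECONDS_PER_SAMPLE)

# Convert seconds to years.
log10_years = log10_seconds - math.log10(SECONDS_PER_YEAR)

print(f"conformations: about 10^{log10_conformations:.0f}")
print(f"exhaustive search: about 10^{log10_years:.1f} years")
```

With exactly these inputs the product comes out between 10⁷⁹ and 10⁸⁰ years, somewhat larger than the rounded figure quoted above. The result is very sensitive to the assumed number of conformations per residue and to the sampling time, and any value in this range makes the same point: an exhaustive random search is impossible on the timescale of seconds in which proteins actually fold.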
Assembly of local structures is followed by longer-range interactions between, say, two elements of secondary structure that come together to form stable folded structures. The hydrophobic effect plays a significant role throughout the process, as the aggregation of nonpolar amino acid side chains provides an entropic stabilization to intermediates and, eventually, to the final folded structure. The process continues until complete domains form and the entire polypeptide is folded (Fig. 4-26). Notably, proteins dominated by close-range interactions (between pairs of residues generally located near each other in the polypeptide sequence) tend to fold faster than proteins with more complex folding patterns and with many long-range interactions between different segments. As larger proteins with multiple domains are synthesized, domains near the amino terminus (which are synthesized first) may fold before the entire polypeptide has been assembled. FIGURE 4-26 A protein-folding pathway as defined for a small protein. A hierarchical pathway is shown, based on computer modeling. Small regions of secondary structure are assembled first and then gradually incorporated into larger structures. The program used for this model has been highly successful in predicting the three-dimensional structure of small proteins from their amino acid sequence. The numbers indicate the amino acid residues in this 56-residue peptide that have acquired their final structure in each of the steps shown. [Information from K. A. Dill et al., Annu. Rev. Biophys. 37:289, 2008, Fig. 5.] Thermodynamically, the folding process can be viewed as a kind of free-energy funnel (Fig. 4-27). The unfolded states are characterized by a high degree of conformational entropy and relatively high free energy. As folding proceeds, the narrowing of the funnel reflects the decrease in the conformational space that must be searched as the protein approaches its native state. 
Small depressions along the sides of the free-energy funnel represent semistable intermediates that can briefly slow the folding process. At the bottom of the funnel, an ensemble of folding intermediates has been reduced to a single native conformation (or one of a small set of native conformations). The funnels can have a variety of shapes, depending on the complexity of the folding pathway, the existence of semistable intermediates, and the potential for particular intermediates to assemble into aggregates of misfolded proteins. FIGURE 4-27 The thermodynamics of protein folding depicted as free-energy funnels. As proteins fold, the conformational space that can be explored by the structure is constrained. This is modeled as a three-dimensional thermodynamic funnel, with ΔG represented by the depth of the funnel and the native structure (N) at the bottom (lowest free-energy point). The funnel for a given protein can have a variety of shapes, depending on the number and types of folding intermediates in the folding pathways. Any folding intermediate with significant stability and a finite lifetime would be represented as a local free-energy minimum, a depression on the surface of the funnel. (a) A simple but relatively wide and smooth funnel represents a protein that has multiple folding pathways (that is, the order in which different parts of the protein fold is somewhat random), but it assumes its three-dimensional structure with no folding intermediates that have significant stability. (b) This funnel represents a more typical protein that has multiple possible folding intermediates with significant stability on the multiple pathways leading to the native structure. [Information from K. A. Dill et al., Annu. Rev. Biophys. 37:289, 2008, Fig. 9.] Thermodynamic stability is not evenly distributed over the structure of a protein: the molecule has regions of relatively high stability and others of low or negligible stability. 
For example, a protein may have two stable domains joined by a segment that is entirely disordered. Regions of low stability may allow a protein to alter its conformation between two or more states. As we shall see in the next two chapters, variations in the stability of regions within a protein are often essential to protein function. Intrinsically disordered proteins or protein segments do not fold at all. Some Proteins Undergo Assisted Folding Not all proteins fold spontaneously as they are synthesized in the cell. Folding for many proteins requires chaperones, proteins that interact with partially folded or improperly folded polypeptides, facilitating correct folding pathways or providing microenvironments in which folding can occur. Several types of molecular chaperones are found in organisms ranging from bacteria to humans. Two major families of chaperones, both well studied, are the Hsp70 family and the chaperonins. Proteins in the Hsp70 family generally have a molecular weight near 70,000 and are more abundant in cells stressed by elevated temperatures (hence, heat shock proteins of Mr 70,000, or Hsp70). Hsp70 proteins bind to regions of unfolded polypeptides that are rich in hydrophobic residues. These chaperones thus “protect” both proteins subject to denaturation by heat and new peptide molecules being synthesized (and not yet folded). Hsp70 proteins also block the folding of certain proteins that must remain unfolded until they have been translocated across a membrane (as described in Chapter 27). Some chaperones also facilitate the quaternary assembly of oligomeric proteins. The Hsp70 proteins bind to and release polypeptides in a cycle that uses energy from ATP hydrolysis and involves several other proteins (including a class called Hsp40). Figure 4-28 illustrates chaperone-assisted folding as elucidated for the eukaryotic Hsp70 and Hsp40 chaperones. 
The binding of an unfolded polypeptide by an Hsp70 chaperone may break up a protein aggregate or prevent the formation of a new one. When the bound polypeptide is released, it has a chance to resume folding to its native structure. If folding does not occur rapidly enough, the polypeptide may be bound again and the process repeated. Alternatively, the Hsp70-bound polypeptide may be delivered to a chaperonin. FIGURE 4-28 Chaperones in protein folding. The pathway by which chaperones of the Hsp70 class bind and release polypeptides is illustrated for the eukaryotic chaperones Hsp70 and Hsp40. The chaperones do not actively promote the folding of the substrate protein, but instead prevent aggregation of unfolded peptides. The unfolded or partly folded proteins bind first to the open, ATP-bound form of Hsp70. Hsp40 then interacts with this complex and triggers ATP hydrolysis that produces the closed form of the complex, in which the domains colored orange and yellow come together like the two parts of a jaw, trapping parts of the unfolded protein inside. Dissociation of ADP and recycling of the Hsp70 requires interaction with another type of protein called a nucleotide-exchange factor (NEF). For a population of polypeptide molecules, some fraction of the molecules released after the transient binding of partially folded proteins by Hsp70 will take up the native conformation. The remainder are quickly rebound by Hsp70 or diverted to the chaperonin system. [Information from F. U. Hartl et al., Nature 475:324, 2011, Fig. 2. Open Hsp70-ATP: PDB ID 2QXL, Q. Liu and W. A. Hendrickson, Cell 131:106, 2007. Closed Hsp70-ADP: Data from PDB ID 2KHO, E. B. Bertelson et al., Proc. Natl. Acad. Sci. USA 106:8471, 2009, and PDB ID 1DKZ, X. Zhu et al., Science 272:1606, 1996.] Chaperonins are elaborate protein complexes required for the folding of some cellular proteins that do not fold spontaneously. In E. 
coli, an estimated 10% to 15% of cellular proteins require the resident chaperonin system, called GroEL/GroES, for folding under normal conditions (up to 30% require this assistance when the cells are heat stressed). The analogous chaperonin system in eukaryotes is called Hsp60. The chaperonins first became known when they were found to be necessary for the growth of certain bacterial viruses (hence the designation “Gro”). These chaperone proteins are structured as a series of multisubunit rings, forming two chambers oriented back to back. Inside one of the chambers, a protein is given about 10 seconds to fold. Constraining a protein within the chamber prevents inappropriate protein aggregation and also restricts the conformational space that a polypeptide chain can explore as it folds. The GroEL/GroES folding pathway is discussed in Chapter 27. Finally, the folding pathways of some proteins require two enzymes that catalyze isomerization reactions. Protein disulfide isomerase (PDI) is a widely distributed enzyme that catalyzes the interchange, or shuffling, of disulfide bonds until the bonds of the native conformation are formed. Among its functions, PDI catalyzes the elimination of folding intermediates with inappropriate disulfide cross-links. Peptide prolyl cis-trans isomerase (PPI) catalyzes the interconversion of the cis and trans isomers of peptide bonds formed by Pro residues (Fig. 4-7), which can be a slow step in the folding of proteins that contain some Pro peptide bonds in the cis configuration. Defects in Protein Folding Are the Molecular Basis for Many Human Genetic Disorders Protein misfolding is a substantial problem in all cells. Despite the many processes that assist in protein folding, a quarter or more of all polypeptides synthesized may be destroyed because they do not fold correctly. In some cases, the misfolding causes or contributes to the development of serious disease. 
Many conditions, including type 2 diabetes, Alzheimer disease, Huntington disease, and Parkinson disease, are associated with a misfolding mechanism: a soluble protein that is normally secreted from the cell is secreted in a misfolded state and converted into an insoluble extracellular amyloid fiber. The diseases are collectively referred to as amyloidoses. The fibers are highly ordered and unbranched, with a diameter of 7 to 10 nm and a high degree of β -sheet structure. The β segments are oriented perpendicular to the axis of the fiber. In some amyloid fibers the overall structure includes two layers of β sheet, such as that shown for amyloid-β peptide in Figure 4-29. FIGURE 4-29 Formation of disease-causing amyloid fibrils. (a) Protein molecules whose normal structure includes regions of β sheet undergo partial folding. In a small number of the molecules, before folding is complete, the β -sheet regions of one polypeptide associate with the same region in another polypeptide, forming the nucleus of an amyloid. Additional protein molecules slowly associate with the amyloid and extend it to form a fibril. (b) The amyloid-β peptide begins as two α -helical segments of a larger protein. Proteolytic cleavage of this larger protein leaves the relatively unstable amyloid-β peptide, which loses its α -helical structure. It can then assemble slowly into amyloid fibrils (c), which contribute to the characteristic plaques on the exterior of nervous tissue in people with Alzheimer disease. The aromatic side chains shown here play a significant role in stabilizing the amyloid structure. Amyloid is rich in β sheet structure, with the β strands arranged perpendicular to the axis of the amyloid fibril. Amyloid-β peptide takes the form of two layers of extended parallel β sheet. [(a) Information from D. J. Selkoe, Nature 426:900, 2003, Fig. 1. (b) Data from PDB ID 1IYT, O. Crescenzi et al., Eur. J. Biochem. 269:5642, 2002. (c) Data from PDB ID 2BEG, T. Lührs et al., Proc. Natl. 
Acad. Sci. USA 102:17,342, 2005.] Many proteins can take on the amyloid fibril structure as an alternative to their normal folded conformations, and most of these proteins have a concentration of aromatic amino acid residues in a core region of β sheet or α helix. The proteins are secreted in an incompletely folded conformation. The core (or some part of it) folds into a β sheet before the rest of the protein folds correctly, and the β sheets of two or more incompletely folded protein molecules associate to begin forming an amyloid fibril. The fibril grows in the extracellular space. Other parts of the protein then fold differently, remaining on the outside of the β-sheet core in the growing fibril. The effect of aromatic residues in stabilizing the structure is shown in Figure 4-29c. Because most of the protein molecules fold normally, the onset of symptoms in the amyloidoses is often very slow. If a person inherits a mutation such as substitution with an aromatic residue at a position that favors formation of amyloid fibrils, disease symptoms may begin at an earlier age. The amyloid deposition diseases that trigger neurodegeneration, particularly in older adults, are a special class of localized amyloidoses. Alzheimer disease is associated with extracellular amyloid deposition by neurons, involving the amyloid-β peptide (Fig. 4-29b), derived from a larger transmembrane protein (amyloid-β precursor protein) found in most human tissues. When it is part of the larger protein, the peptide is composed of two α-helical segments spanning the membrane. When the external and internal domains are cleaved off by specific proteases, the relatively unstable amyloid-β peptide leaves the membrane and loses its α-helical structure. It can then take the form of two layers of extended parallel β sheet, which can slowly assemble into amyloid fibrils (Fig. 4-29c). 
Deposits of these amyloid fibers seem to be the primary cause of Alzheimer disease, but a second type of amyloidlike aggregation, involving a protein called tau, also occurs intracellularly (in neurons) in people with Alzheimer disease. Inherited mutations in the tau protein do not result in Alzheimer disease, but they cause a frontotemporal dementia and parkinsonism (a condition with symptoms resembling Parkinson disease) that can be equally devastating. Several other neurodegenerative conditions involve intracellular aggregation of misfolded proteins. In Parkinson disease, the misfolded form of the protein α-synuclein aggregates into spherical filamentous masses called Lewy bodies. Huntington disease involves the protein huntingtin, which has a long polyglutamine repeat. In some individuals, the polyglutamine repeat is longer than normal, and a more subtle type of intracellular aggregation occurs. Notably, when the mutant human proteins involved in Parkinson disease and Huntington disease are expressed in Drosophila melanogaster, the flies display neurodegeneration expressed as eye deterioration, tremors, and early death. All of these symptoms are highly suppressed if expression of the Hsp70 chaperone is also increased. Protein misfolding need not lead to amyloid formation to cause serious disease. For example, cystic fibrosis is caused by defects in a membrane-bound protein called cystic fibrosis transmembrane conductance regulator (CFTR), which acts as a channel for chloride ions. The most common cystic fibrosis–causing mutation is the deletion of a Phe residue at position 508 in CFTR, which causes improper protein folding. Most of this protein is then degraded and its normal function is lost (see Box 11-2). Drugs that can correct certain CFTR misfolding events have been developed. Such drugs are called “correctors” or “pharmacological chaperones.” Many of the disease-related mutations in collagen (p. 118) also cause defective folding. 
A particularly remarkable type of protein misfolding is seen in the prion diseases (Box 4-4). BOX 4-4 MEDICINE Death by Misfolding: The Prion Diseases A misfolded brain protein seems to be the causative agent of several rare degenerative brain diseases in mammals. Perhaps the best known of these is bovine spongiform encephalopathy (BSE; also known as mad cow disease). Related diseases include kuru and Creutzfeldt-Jakob disease in humans, scrapie in sheep, and chronic wasting disease in deer and elk. These diseases are also referred to as spongiform encephalopathies, because the diseased brain frequently becomes riddled with holes (Fig. 1). Progressive deterioration of the brain leads to a spectrum of neurological symptoms, including weight loss; erratic behavior; problems with posture, balance, and coordination; and loss of cognitive function. The diseases are fatal. FIGURE 1 (a) Light micrograph of pyramidal cells in the human cerebral cortex. (b) A comparable section from the autopsy of a patient with Creutzfeldt-Jakob disease shows spongiform (vacuolar) degeneration, the most characteristic neurohistological feature. The yellowish vacuoles are intracellular and occur mostly in pre- and postsynaptic processes of neurons. The vacuoles in this section vary in diameter from 20 to 100 μ m. In the 1960s, investigators found that preparations of the disease-causing agents seemed to lack nucleic acids. At this time, Tikvah Alper suggested that the agent was a protein. Initially, the idea seemed heretical. All disease-causing agents known up to that time—viruses, bacteria, fungi, and so on—contained nucleic acids, and their virulence was related to genetic reproduction and propagation. However, four decades of investigations, pursued most notably by Stanley Prusiner, have provided evidence that spongiform encephalopathies are different. The infectious agent has been traced to a single protein (Mr 28,000), which Prusiner dubbed prion protein (PrP). 
The name was derived from proteinaceous infectious, but Prusiner thought that “prion” sounded better than “proin.” Prion protein is a normal constituent of brain tissue in all mammals. Its role is not known in detail, but it may have a molecular signaling function. Strains of mice lacking the gene for PrP (and thus the protein itself) suffer no obvious ill effects. Illness occurs only when the normal cellular PrP, or PrPC, occurs in an altered conformation called PrPSc (Sc denotes scrapie). The structure of PrPC has two α helices. The structure of PrPSc is very different, with much of the structure converted to amyloidlike β sheets (Fig. 2). The interaction of PrPSc with PrPC converts the latter to PrPSc, initiating a domino effect in which more and more of the brain protein converts to the disease-causing form. The mechanism by which the presence of PrPSc leads to spongiform encephalopathy is not understood. FIGURE 2 Structure of the globular domain of human PrP and models of the misfolded, disease-causing conformation PrPSc, and an aggregate of PrPSc. The α helices are labeled to help illustrate the modeled conformational change as the globular protein progresses to the aggregate. Helix A is incorporated into the β-sheet structure of the misfolded conformation. [Data for human PrP from PDB ID 1QLX, R. Zahn et al., Proc. Natl. Acad. Sci. USA 97:145, 2000. Information for models from C. Govaerts et al., Proc. Natl. Acad. Sci. USA 101:8342, 2004.] In inherited forms of prion diseases, a mutation in the gene encoding PrP produces a change in one amino acid residue that is believed to make the conversion of PrPC to PrPSc more likely. A complete understanding of prion diseases awaits new information on how prion protein affects brain function. Structural information about PrP is beginning to provide insights into the molecular process that allows the prion proteins to interact so as to alter their conformation (Fig. 2). 
The significance of prions may extend well beyond spongiform encephalopathies. Evidence is building that prionlike proteins may be responsible for additional neurodegenerative diseases such as multiple system atrophy (MSA), a disease that resembles Parkinson disease. SUMMARY 4.4 Protein Denaturation and Folding The maintenance of the steady-state collection of active cellular proteins required under a particular set of conditions — called proteostasis — involves an elaborate set of pathways and processes that fold, refold, and degrade polypeptide chains. The three-dimensional structure and the function of most proteins can be destroyed by denaturation, demonstrating a relationship between structure and function. Heat, extremes of pH, organic solvents, solutes, and detergents can all be used to denature proteins. Some denatured proteins can renature spontaneously to form biologically active protein, showing that tertiary structure is determined by amino acid sequence. Protein folding occurs too fast for it to be a completely random process. Instead, protein folding is generally hierarchical. Initially, regions of secondary structure may form, followed by folding into motifs and domains. Large ensembles of folding intermediates are rapidly brought to a single native conformation. For many proteins, folding is facilitated by Hsp70 chaperones and by chaperonins. Disulfide-bond formation and the cis-trans isomerization of Pro peptide bonds can also be catalyzed by specific enzymes during folding. Protein misfolding is the molecular basis for many human diseases, including cystic fibrosis and amyloidoses such as Alzheimer disease. 4.5 Determination of Protein and Biomolecular Structures In this chapter we have presented many types of protein structures. How were these structures determined? Structural biology is the study of the three-dimensional structures of biomolecules, including proteins, nucleic acids, lipid membranes, and oligosaccharides. 
Structural biologists combine biochemical approaches with physical tools and computational methods to obtain these structures. Structural biology is extraordinarily powerful for elucidating the relationships between the structure and function of proteins, the molecular basis for enzymatic catalysis and ligand binding, and evolutionary relationships between proteins. Here we focus primarily on three commonly used methods in structural biology: x-ray crystallography, nuclear magnetic resonance (NMR), and cryo-electron microscopy (cryo-EM). Which method a structural biologist uses depends on the system being studied and what information is to be learned. Often structural biologists combine multiple methods to provide a more complete view of function. Increasingly, computational methods such as molecular dynamics simulations and in silico protein folding are proving to be essential for structural biologists, and these are discussed in Box 4-5. BOX 4-5 Video Games and Designer Proteins Computational tools are now indispensable to biochemistry. While advances in computers and software have revolutionized how protein structures can be solved by x-ray crystallography, NMR, and cryo-EM, advances in computational chemistry have now allowed a number of studies of protein structure to be carried out entirely in silico using powerful molecular modeling and dynamics software. Key developments in this field were made by Martin Karplus, Michael Levitt, and Arieh Warshel, who received the Nobel Prize in Chemistry in 2013 for their work on the theoretical and computational tools needed to carry out computer simulations of molecules as large as proteins. Protein folding is a problem particularly well-suited for computational biochemists. Protein-folding landscapes can be complex, and nearly an infinite number of possible peptide conformations are theoretically available for the protein to sample as it folds. 
However, understanding how folding happens efficiently, with only a small portion of all possible conformations being sampled, requires computers that can quickly test tens of thousands of possible folding trajectories. Sometimes these calculations are carried out on powerful supercomputers, but in other instances research groups have taken to crowdsourcing these problems to thousands of citizen scientists. David Baker has pioneered crowdsourcing protein-folding problems to engage millions of computer users directly in biochemistry. The Rosetta@home project distributes complex protein-folding problems to citizen scientists who run the software in the background on their home computers. Each home computer is able to complete a portion of a protein-folding experiment, and these results can be combined with those from other users to predict protein structures. Foldit is a video game in which users compete to solve protein-folding puzzles (Fig. 1). Users are rewarded for making stabilizing contacts in the structure, like hydrogen bonds or favorable van der Waals interactions. Virtual protein designs (sometimes called theozymes) from the video game can then be tested in the real world. DNA genes coding for the virtual proteins can be created in the laboratory and be used to recombinantly produce the proteins in bacteria, using techniques described in Chapter 9. The purified proteins can then be structurally characterized by x-ray crystallography or NMR to see how closely the real-world structures and computationally designed structures match. FIGURE 1 Foldit uses a video game interface to crowdsource protein-folding problems. Proteins designed in the video game can be recreated in the laboratory and studied using biochemical and structural methods. One exciting application of computational biochemistry is the creation of “designer proteins.” These are proteins with completely novel folds or functions not yet identified in nature. 
Designer proteins have potential applications in bioengineering, medicine, materials science, and chemistry. In one recent example, the Baker Lab designed a series of Foldit puzzles (such as “Cover the Ligand,” in which users had to redesign a ligand-binding site by rebuilding the protein backbone and changing amino acid side chains) to improve the catalytic efficiency of a previously engineered Diels-Alderase (Fig. 2). The Diels-Alder reaction is a common method for organic chemists to create carbon–carbon bonds; however, it is almost never used by naturally occurring enzymes. Diels-Alderase enzymes would potentially be useful for carrying out environmentally friendly carbon–carbon bond-forming reactions in water instead of toxic organic solvents. Foldit competitors used the video game to design thousands of virtual enzymes. The players were successful, and once their top virtual enzyme was produced and tested in the real world it was found to be more than 18-fold more active than the original Diels-Alderase. Incredibly, the protein structure created by the Foldit game players was also confirmed by x-ray crystallography. This is definitely a case where we can encourage biochemistry students to spend more time playing video games!

FIGURE 2 The Diels-Alder reaction catalyzed by an enzyme designed by Foldit players. Diels-Alder reactions are commonly used by organic chemists to form C−C bonds; however, there are very few examples of enzymes catalyzing this type of chemistry in nature. [Information from C. B. Eiben et al., Nature Biotechnol. 30:190, 2012.]

X-ray Diffraction Produces Electron Density Maps from Protein Crystals

The spacing of atoms in a crystal lattice can be determined by measuring the locations and intensities of spots produced on a detector by a beam of x-rays of given wavelength, after the beam has been diffracted by the electrons of the atoms.
For example, x-ray analysis of sodium chloride crystals shows that Na+ and Cl− ions are arranged in a simple cubic lattice. The spacing of the different kinds of atoms in complex organic molecules, even very large ones such as proteins, can also be analyzed by x-ray diffraction methods. However, the technique for analyzing crystals of complex molecules is far more laborious than the technique for analyzing simple salt crystals. When the repeating pattern of the crystal is a molecule as large as, say, a protein, the numerous atoms in the molecule yield thousands of diffraction spots that must be analyzed by computer.

Consider how images are generated in a light microscope. Light from a point source is focused on an object. The object scatters the light waves, and these scattered waves are recombined by a series of lenses to generate an enlarged image of the object. The smallest object whose structure can be determined by such a system (that is, the resolving power of the microscope) is determined by the wavelength of the light, in this case visible light, with wavelengths in the range of 400 to 700 nm. Objects smaller than half the wavelength of the incident light cannot be resolved. To resolve objects as small as proteins we must use x-rays, with wavelengths in the range of 0.7 to 1.5 Å (0.07 to 0.15 nm). However, there are no lenses that can recombine x-rays to form an image; instead, the pattern of diffracted x-rays is collected directly and an image is reconstructed by mathematical techniques.

The amount of information obtained from x-ray crystallography depends on the degree of structural order in the sample. Some important structural parameters were obtained from early studies of the diffraction patterns of the fibrous proteins arranged in regular arrays in hair and wool. However, the orderly bundles formed by fibrous proteins are not crystals: the molecules are aligned side by side, but not all are oriented in the same direction.
More-detailed three-dimensional structural information about proteins requires a highly ordered protein crystal. The structures of many proteins are not yet known, simply because they have proved difficult to crystallize. Practitioners have compared making protein crystals to holding together a stack of bowling balls with cellophane tape.

Operationally, there are several steps in x-ray structural analysis (Fig. 4-30). A crystal is placed in an x-ray beam between the x-ray source and a detector, and a regular array of spots, called reflections, is generated. The spots are created by the diffracted x-ray beam, and each atom in a molecule makes a contribution to each spot. An electron-density map of the protein is reconstructed from the overall diffraction pattern of spots by a mathematical technique called a Fourier transform. In effect, the computer acts as a “computational lens.” A model for the structure is then built that is consistent with the electron-density map.
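The “computational lens” idea can be illustrated with a toy one-dimensional calculation. The sketch below is not how crystallographic software works in practice; it assumes a made-up unit cell of 64 grid points holding two Gaussian “atoms” (all names and numbers here are hypothetical). It computes the Fourier coefficients of that density and then rebuilds the map from them. It also shows why each spot matters: a detector records only intensities (squared amplitudes), so a map built from amplitudes alone, with every phase set to zero, no longer places the atoms correctly.

```python
import cmath
import math

def structure_factors(density):
    """Discrete Fourier transform of a 1-D density sampled on n grid points."""
    n = len(density)
    return [
        sum(density[x] * cmath.exp(-2j * math.pi * h * x / n) for x in range(n))
        for h in range(n)
    ]

def fourier_synthesis(factors):
    """Inverse transform: rebuild a real-valued map from complex coefficients."""
    n = len(factors)
    return [
        (sum(factors[h] * cmath.exp(2j * math.pi * h * x / n) for h in range(n)) / n).real
        for x in range(n)
    ]

def gaussian(x, center, weight, width=1.5):
    """A smooth blob of electron density standing in for one atom."""
    return weight * math.exp(-((x - center) ** 2) / (2 * width ** 2))

# Hypothetical unit cell: a heavier atom at grid point 20, a lighter one at 41.
N = 64
density = [gaussian(x, 20, 8.0) + gaussian(x, 41, 6.0) for x in range(N)]

F = structure_factors(density)
intensities = [abs(f) ** 2 for f in F]  # the only quantity a detector measures

full_map = fourier_synthesis(F)                              # amplitudes + phases
amplitude_only_map = fourier_synthesis([abs(f) for f in F])  # phases set to zero

peak_full = max(range(N), key=lambda x: full_map[x])
peak_amp = max(range(N), key=lambda x: amplitude_only_map[x])
print("highest peak using amplitudes and phases:", peak_full)
print("highest peak using amplitudes only:", peak_amp)
```

With both amplitudes and phases the strongest peak returns to the position of the heavier atom, whereas the amplitude-only map piles its density at the origin. Real structure determinations must therefore recover phase information by additional experimental or computational means.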

FIGURE 4-30 Steps in determining the structure of sperm whale myoglobin by x-ray crystallography. (a) X-ray diffraction patterns are generated from a crystal of the protein. (b) Data extracted from the diffraction patterns are used to calculate a three-dimensional electron-density map. The electron density of only part of the structure, the heme, is shown here. (c) Regions of greatest electron density reveal the location of atomic nuclei, and this information is used to piece together the final structure. Here, the heme structure is modeled into its electron-density map. (d) The completed structure of sperm whale myoglobin, including the heme. [(a, b, c) Photo and data from George N. Phillips, Jr., University of Wisconsin–Madison, Department of Biochemistry. (d) Data from PDB ID 2MBW, E. A. Brucker et al., J. Biol. Chem. 271:25,419, 1996.]

John Kendrew found that the x-ray diffraction pattern of crystalline myoglobin (isolated from muscles of the sperm whale) is highly complex, with nearly 25,000 reflections. Computer analysis of these reflections took place in stages. The resolution improved at each stage until, in 1959, the positions of virtually all the nonhydrogen atoms in the protein had been determined. The amino acid sequence of the protein, obtained by chemical analysis, was consistent with the molecular model. Over 100,000 protein structures, many of them much more complex than myoglobin, have since been determined to a similar level of resolution by x-ray crystallography.

The physical environment in a crystal, of course, is not identical to that in solution or in a living cell. A crystal imposes a space and time average on the structure deduced from its analysis, and x-ray diffraction studies provide little information about molecular motion within the protein. The conformation of proteins in a crystal can also be affected by nonphysiological factors such as incidental protein-protein contacts within the crystal.
However, when structures derived from the analysis of crystals are compared with structural information obtained by other means (such as NMR, as described below), the crystal-derived structure almost always represents a functional conformation of the protein.

Distances between Protein Atoms Can Be Measured by Nuclear Magnetic Resonance

An advantage of nuclear magnetic resonance (NMR) studies is that they are carried out on macromolecules in solution, whereas x-ray crystallography is limited to molecules that can be crystallized. NMR can also illuminate the dynamic side of protein structure, including conformational changes, protein folding, and interactions with other molecules.

NMR is a manifestation of nuclear spin angular momentum, a quantum mechanical property of atomic nuclei. Only certain atoms, including 1H, 13C, 15N, 19F, and 31P, have the kind of nuclear spin that gives rise to an NMR signal. Nuclear spin generates a magnetic dipole. When a strong, static magnetic field is applied to a solution containing a single type of macromolecule, the magnetic dipoles are aligned in the field in one of two orientations: parallel (low energy) or antiparallel (high energy). A short (∼10 μs) pulse of electromagnetic energy of suitable frequency (the resonant frequency, which is in the radio frequency range) is applied at right angles to the nuclei aligned in the magnetic field. Some energy is absorbed as nuclei switch to the high-energy state, and the absorption spectrum that results contains information about the identity of the nuclei and their immediate chemical environment. The data from many such experiments on a sample are averaged, increasing the signal-to-noise ratio, and an NMR spectrum such as that in Figure 4-31 is generated.
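Why averaging many experiments improves the signal-to-noise ratio can be seen in a small simulation. The sketch below is illustrative only: it assumes a hypothetical single-peak spectrum of 400 points and random, uncorrelated noise of unit standard deviation in every scan. Under those assumptions the noise left in the average of N scans falls roughly as the square root of N, so 100 scans should improve matters about 10-fold.

```python
import random
import statistics

def lorentzian_peak(n_points, center, width):
    """Idealized noise-free spectrum containing one resonance line."""
    return [1.0 / (1.0 + ((i - center) / width) ** 2) for i in range(n_points)]

def averaged_spectrum(signal, n_scans, noise_sd, rng):
    """Add fresh Gaussian noise to the signal n_scans times and average."""
    total = [0.0] * len(signal)
    for _ in range(n_scans):
        for i, s in enumerate(signal):
            total[i] += s + rng.gauss(0.0, noise_sd)
    return [t / n_scans for t in total]

def residual_noise(spectrum, signal):
    """Standard deviation of what is left after subtracting the true signal."""
    return statistics.pstdev([a - s for a, s in zip(spectrum, signal)])

rng = random.Random(0)  # fixed seed so the run is repeatable
signal = lorentzian_peak(400, center=200, width=5.0)

noise_1 = residual_noise(averaged_spectrum(signal, 1, 1.0, rng), signal)
noise_100 = residual_noise(averaged_spectrum(signal, 100, 1.0, rng), signal)

print(f"residual noise, 1 scan:    {noise_1:.3f}")
print(f"residual noise, 100 scans: {noise_100:.3f}")
print(f"improvement: {noise_1 / noise_100:.1f}x (square-root rule predicts about 10x)")
```

The signal itself is identical in every scan and survives averaging unchanged, while the random noise partly cancels, which is why longer acquisitions give cleaner spectra.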

FIGURE 4-31 NMR spectra of a globin from a marine blood worm. (a) One-dimensional 1H NMR spectrum. (b) Two-dimensional NMR data used to generate a three-dimensional structure of globin. The diagonal in a two-dimensional NMR spectrum is equivalent to a one-dimensional spectrum. The off-diagonal peaks are NOE signals generated by close-range interactions of 1H atoms that may generate signals quite distant in the one-dimensional spectrum. Two such interactions are identified in (b), and their identities are shown with blue lines in (c). Three lines are drawn for interaction 2 between a methyl group in the protein and a hydrogen on the heme. The methyl group rotates rapidly such that each of its three hydrogens contributes equally to the interaction and the NMR signal. Such information is used to determine the complete three-dimensional structure, as in (d). The multiple lines shown for the protein backbone in (d) represent the family of structures consistent with the distance constraints in the NMR data. [Data from (a, b) B. F. Volkman, National Magnetic Resonance Facility at Madison; (c) PDB ID 1VRF; (d) PDB ID 1VRE, B. F. Volkman et al., Biochemistry 37:10,906, 1998.]

1H is particularly important in NMR experiments because of its high sensitivity and natural abundance. For macromolecules, 1H NMR spectra can become quite complicated. Even a small protein has hundreds of 1H atoms, typically resulting in a one-dimensional NMR spectrum too complex for analysis. Structural analysis of proteins became possible with the advent of two-dimensional NMR techniques (Fig. 4-31b, c, d). These methods allow measurement of distance-dependent coupling of nuclear spins in nearby atoms through space (the nuclear Overhauser effect, or NOE, in a method dubbed NOESY) or the coupling of nuclear spins in atoms connected by covalent bonds (total correlation spectroscopy, or TOCSY).
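The distance dependence of the NOE can be made concrete with a short calculation. The text does not give the functional form; the sketch below assumes the commonly used isolated spin-pair approximation, in which NOE intensity falls off as the inverse sixth power of the distance between two protons. An unknown distance is then estimated by comparison with a proton pair of known separation. The function name, the intensities, and the 1.8 Å reference distance are illustrative assumptions, not values from this chapter.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Estimate an interproton distance from an NOE cross-peak intensity.

    Assumes intensity is proportional to r**-6 (isolated spin-pair
    approximation), so r = r_ref * (I_ref / I) ** (1/6).
    """
    if intensity <= 0 or ref_intensity <= 0 or ref_distance <= 0:
        raise ValueError("intensities and reference distance must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)

# Hypothetical calibration: a proton pair known to be 1.8 Å apart gives a
# cross-peak of intensity 100 (arbitrary units).
REF_DISTANCE = 1.8
REF_INTENSITY = 100.0

for observed in (100.0, 10.0, 1.0, 0.25):
    r = noe_distance(observed, REF_INTENSITY, REF_DISTANCE)
    print(f"intensity {observed:6.2f} -> about {r:.2f} Å")
```

Because of the steep sixth-power dependence, a 400-fold drop in intensity corresponds to less than a 3-fold increase in distance, which is why NOE signals report only on protons that are close together. In practice the estimates are usually treated as loose upper bounds rather than exact distances.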
Translating a two-dimensional NMR spectrum into a complete three-dimensional structure can be a laborious process. The NOE signals provide some information about the distances between individual atoms, but for these distance constraints to be useful, the atoms giving rise to each signal must be identified. Complementary TOCSY experiments can help identify which NOE signals reflect atoms that are linked by covalent bonds. Certain patterns of NOE signals have been associated with secondary structures such as α helices. Genetic engineering (Chapter 9) can be used to prepare proteins that contain the rare isotopes 13C or 15N. The new NMR signals produced by these atoms, and the coupling with 1H signals resulting from these substitutions, help in the assignment of individual 1H NOE signals. The process is also aided by a knowledge of the amino acid sequence of the polypeptide. To generate a three-dimensional structure, researchers feed the distance constraints into a computer along with known geometric constraints such as chirality, van der Waals radii, and bond lengths and angles. The computer generates a family of closely related structures that represent the range of conformations consistent with the NOE distance constraints (Fig. 4-31d). The uncertainty in structures generated by NMR is in part a reflection of the molecular vibrations (known as breathing) within a protein structure in solution, discussed in more detail in Chapter 5. Normal experimental uncertainty can also play a role. Thousands of Individual Molecules Are Used to Determine Structures by Cryo-Electron Microscopy Our understanding of highly complex processes such as gene expression, mitochondrial respiration, or viral infection is aided immensely by knowing the detailed molecular structures of the proteins that participate in these processes. 
However, it is often difficult to determine the molecular structure of large, dynamic, macromolecular complexes that contain dozens of individual protein subunits. Moreover, integral membrane proteins often resist crystallization once they are removed from their lipid environment, making their structures difficult to solve by x-ray diffraction, and many are too large for NMR.

In principle, discrete objects in the diameter range 100 to 300 Å can be visualized by electron microscopy (EM). In practice, the high intensity of the EM beam often damages the specimen before a high-resolution image can be obtained. In cryo-electron microscopy (cryo-EM), a sample containing many individual copies of the structure of interest is quick-frozen in vitreous (or noncrystalline) ice and kept frozen while being observed in two dimensions with the electron microscope, greatly reducing damage to the specimen by the electron beam. Particles such as purified, multisubunit enzymes, arranged randomly on the microscope grid, are visualized with the cryo-electron microscope. When cryo-EM is combined with powerful algorithms for transforming the two-dimensional structures of tens of thousands of individual, randomly oriented complexes into a three-dimensional composite, it is sometimes possible to determine molecular structures at a level comparable to that obtained by x-ray crystallography (Fig. 4-32). In favorable cases, the repetitive aspects (choice of objects to be included in the analysis, imaging of each object individually, and calculations to produce a three-dimensional structure from the huge number of two-dimensional images) can be automated. The EMDataResource (www.emdataresource.org) is a unified resource for accessing structure maps deposited into data banks and assigned EMDataBank (EMDB) accession codes.

FIGURE 4-32 Structure of the chaperone protein GroEL as determined by single-particle cryo-EM. (a) Cryo-EM images of many individual GroEL particles. (b) Side and top views of the three-dimensional structure derived from analysis of the EM images. [(b) Data from PDB ID 3E76, P. D. Kaiser et al., Acta Crystallogr. 65:967, 2009.] Many novel structures have now been obtained by cryo-EM without models based on prior x-ray or NMR structures. Since cryo-EM relies on imaging of single molecules of a complex, this technique can also be used to computationally sort the imaged particles and simultaneously determine structures of multiple conformational states. Cryo-EM has now been used to solve the structures of some of the most dynamic and largest molecular complexes in the cell, such as the human telomerase enzyme (Fig. 4-33). Telomerase is an essential enzyme for maintaining chromosome integrity in humans (see Chapter 26) and is the target of significant medical research due to its roles in aging and cancer. Cryo-EM was critical for the laboratories of Eva Nogales and Kathleen Collins in determining the architecture of telomerase due to the heterogeneity of the complex and because only minute quantities could be purified from human cells — far too little for crystallization, but enough to observe single molecules by cryo-EM.

FIGURE 4-33 Cryo-EM structure of human telomerase. The structures of the RNA (green) and protein (ribbon representations) components of human telomerase are shown embedded in the calculated 10.2 Å EM density map. [Data from EMDB ID EMD-7521, T. Nguyen et al., Nature 557:190, 2018.]

SUMMARY 4.5 Determination of Protein and Biomolecular Structures

In x-ray crystallography, protein molecules are crystallized in well-ordered orientations that diffract x-rays. The patterns and intensities of the diffracted x-rays depend on the structure of the protein and its crystalline properties. Mathematical methods can then reconstruct the protein structure that produces a particular diffraction pattern.

NMR is often carried out on molecules in solution and yields information about atomic nuclei and their chemical environment. Protein structures can be computed from NMR data using hundreds of distance and geometric constraints obtained from multi-dimensional NMR experiments.

Biomolecules are frozen in vitreous ice for imaging by cryo-EM. The individual molecules are then identified and computationally sorted. The sorted two-dimensional images are then combined using computers to produce a three-dimensional structure.

Chapter Review

KEY TERMS

Terms in bold are defined in the glossary.
conformation; native conformation; hydrophobic effect; solvation layer; secondary structure; α helix; β conformation; β sheet; β turn; Ramachandran plot; circular dichroism (CD) spectroscopy; tertiary structure; quaternary structure; fibrous proteins; globular proteins; intrinsically disordered proteins; α-keratin; collagen; fibroin; Protein Data Bank (PDB); motif; fold; domain; topology diagram; protein family; multimer; oligomer; protomer; proteostasis; denaturation; renaturation; chaperone; Hsp70; chaperonin; protein disulfide isomerase (PDI); peptide prolyl cis-trans isomerase (PPI); amyloid; amyloidoses; prion; x-ray crystallography; nuclear magnetic resonance (NMR) spectroscopy; cryo-electron microscopy (cryo-EM)

PROBLEMS

1. Properties of the Peptide Bond In x-ray studies of crystalline peptides, Linus Pauling and Robert Corey found that the C–N bond in the peptide link is intermediate in length (1.32 Å) between a typical C–N single bond (1.49 Å) and a C=N double bond (1.27 Å). They also found that the peptide bond is planar (all four atoms attached to the C–N group are located in the same plane) and that the two α-carbon atoms attached to the C–N are always trans to each other (on opposite sides of the peptide bond). a. What does the length of the C–N bond in the peptide linkage indicate about its strength and its bond order (i.e., whether it is single, double, or triple)? b. What do Pauling and Corey’s observations tell us about the ease of rotation about the C–N peptide bond?

2. Structural and Functional Relationships in Fibrous Proteins William Astbury discovered that the x-ray diffraction pattern of wool shows a repeating structural unit spaced about 5.2 Å along the length of the wool fiber. When he steamed and stretched the wool, the x-ray pattern showed a new repeating structural unit at a spacing of 7.0 Å. Steaming and stretching the wool and then letting it shrink gave an x-ray pattern consistent with the original spacing of about 5.2 Å.
Although these observations provided important clues to the molecular structure of wool, Astbury was unable to interpret them at the time. a. Given our current understanding of the structure of wool, interpret Astbury’s observations. b. When wool sweaters or socks are washed in hot water or heated in a dryer, they shrink. Silk, on the other hand, does not shrink under the same conditions. Explain.

3. Rate of Synthesis of Hair α-Keratin Hair grows at a rate of 15 to 20 cm/yr. All this growth is concentrated at the base of the hair fiber, where α-keratin filaments are synthesized inside living epidermal cells and assembled into ropelike structures (see Fig. 4-10). The fundamental structural element of α-keratin is the α helix, which has 3.6 amino acid residues per turn and a rise of 5.4 Å per turn (see Fig. 4-3a). Assuming that the biosynthesis of α-helical keratin chains is the rate-limiting factor in the growth of hair, calculate the rate at which peptide bonds of α-keratin chains must be synthesized (peptide bonds per second) to account for the observed yearly growth of hair.

4. Effect of pH on the Conformation of α-Helical Secondary Structures Specific rotation is a measure of a solution’s capacity to rotate circularly polarized light. The unfolding of the α helix of a polypeptide to a randomly coiled conformation is accompanied by a large decrease in a property called specific rotation. Polyglutamate, a polypeptide made up of only L-Glu residues, is an α helix at pH 3. When researchers raise the pH to 7, there is a large decrease in the specific rotation of the solution. Similarly, polylysine (L-Lys residues) is an α helix at pH 10, but when researchers lower the pH to 7, the specific rotation also decreases, as shown in the graph.

Explain the effect of the pH changes on the conformations of poly(Glu) and poly(Lys). Why does the transition occur over such a narrow range of pH?

5. Disulfide Bonds Determine the Properties of Many Proteins Some natural proteins are rich in disulfide bonds, and their mechanical properties, such as tensile strength, viscosity, and hardness, correlate with the degree of disulfide bonding. a. Glutenin, a wheat protein rich in disulfide bonds, imparts the cohesive and elastic character of dough made from wheat flour. Similarly, the hard, tough nature of tortoise shell results from the extensive disulfide bonding in its α-keratin. What is the molecular basis for the correlation between disulfide-bond content and mechanical properties of the protein? b. Most globular proteins denature and lose their activity when they are briefly heated to 65°C. However, the denaturation of globular proteins that contain multiple disulfide bonds often requires longer heat exposure at higher temperatures. One such protein is bovine pancreatic trypsin inhibitor (BPTI), which has 58 amino acid residues in a single peptide chain and contains three disulfide bonds. After a solution of denatured BPTI is cooled, the protein regains its activity. What is the molecular basis for this property of BPTI?

6. Dihedral Angles Consider the series of torsion angles, ϕ and ψ, that might be taken up by the peptide backbone. Which of these closely correspond to ϕ and ψ for an idealized collagen triple helix? Refer to Figure 4-8 as a guide.

7. Amino Acid Sequence and Protein Structure Our growing understanding of how proteins fold allows researchers to make predictions about protein structure based on primary amino acid sequence data. Consider this amino acid sequence. a. Where might bends or β turns occur? b. Where might intrachain disulfide cross-linkages form? c. Suppose that this sequence is part of a larger globular protein.
Indicate the probable location (external surface or interior of the protein) of each amino acid residue: Asp, Ile, Thr, Ala, Gln, Lys. Explain your reasoning. (Hint: See the hydropathy index in Table 3-1.) 8. Amino Acid Contributions to Protein Folding Like ribonuclease A, lysozyme from T4 phage is a model enzyme for understanding the energetics and pathways of protein folding. Unlike ribonuclease A, however, T4 lysozyme does not contain any disulfide bonds. A number of studies have quantified the thermodynamic contributions that individual amino acid residues and their interactions make to T4 lysozyme folding. a. An ion pair between an Asp and a His residue in T4 lysozyme contributes 13–21 kJ/mol of favorable folding energy at pH 6.0. However, this ion pair contributes much less to lysozyme folding at either pH 2.0 or pH 10.0. How can you explain this observation? b. Suppose that a Met residue buried in the folded, hydrophobic core of T4 lysozyme is replaced by mutation with a Lys residue. How would the mutation affect a plot of the thermal denaturation of T4 lysozyme at pH 3.0? (See Fig. 4-24a for an example of a thermal denaturation plot.) c. Suppose that the thermal denaturation experiment on the protein with the Met to Lys mutation took place at pH 10.0. Predict whether the mutation would have a greater or a lesser impact on protein stability at pH 10.0 than at pH 3.0. Explain your prediction. 9. Bacteriorhodopsin in Purple Membrane Proteins Under the proper environmental conditions, the salt-loving archaeon Halobacterium halobium synthesizes a membrane protein (Mr 26,000), known as bacteriorhodopsin, which is purple because it contains retinal (see Fig. 10-20). Molecules of this protein aggregate into “purple patches” in the cell membrane. Bacteriorhodopsin acts as a light-activated proton pump that provides energy for cell functions. 
X-ray analysis of this protein reveals that it consists of seven parallel α-helical segments, each of which traverses the bacterial cell membrane (thickness 45 Å). Calculate the minimum number of amino acid residues necessary for one segment of α helix to traverse the membrane completely. Estimate the fraction of the bacteriorhodopsin protein that is involved in membrane-spanning helices. (Use an average amino acid residue weight of 110.)

10. Conservation of Protein Structure Margaret Oakley Dayhoff originated the idea of protein superfamilies after noticing that proteins with diverse amino acid sequences can have similar tertiary structures. Why can protein structure be more highly conserved than individual amino acid sequences?

11. Interpreting Ramachandran Plots Examine the two proteins labeled (a) and (b) below. Which of the two Ramachandran plots, labeled (c) and (d) at right, is more likely to be derived from which protein? Why? [Data from (a) PDB ID 1GWY, J. M. Mancheno et al., Structure 11:1319, 2003; (b) PDB ID 1A6M, J. Vojtechovsky et al., Biophys. J. 77:2153, 1999.]

12. Number of Polypeptide Chains in a Multisubunit Protein A researcher treated a sample (660 mg) of an oligomeric protein of Mr 132,000 with an excess of 1-fluoro-2,4-dinitrobenzene (Sanger’s reagent) under slightly alkaline conditions until the chemical reaction was complete. He then completely hydrolyzed the peptide bonds of the protein by heating it with concentrated HCl. The hydrolysate was found to contain 5.5 mg of the compound shown. 2,4-Dinitrophenyl derivatives of the α-amino groups of other amino acids could not be found. a. Explain how this information can be used to determine the number of polypeptide chains in an oligomeric protein. b. Calculate the number of polypeptide chains in this protein. c. What other analytic technique could you employ to determine whether the polypeptide chains in this protein are similar or different?

13.
Amyloid Fibers in Disease Several small aromatic molecules, such as phenol red (used as a nontoxic drug model), have been shown to inhibit the formation of amyloid in laboratory model systems. A goal of the research on these small aromatic compounds is to find a drug that efficiently inhibits the formation of amyloid in the brain in people with incipient Alzheimer disease. a. Suggest why molecules with aromatic substituents would disrupt the formation of amyloid. b. Some researchers suggest that a drug used to treat Alzheimer disease may also be effective in treating type 2 (non-insulin-dependent) diabetes mellitus. Why might a single drug be effective in treating these two different conditions?

14. Protein-Folding Therapies The Food and Drug Administration recently approved the drug lumacaftor for the treatment of cystic fibrosis in patients with the F508ΔCFTR mutation. This mutation is a genetically encoded deletion of amino acid F508 from the protein. About ⅔ of cystic fibrosis patients have this mutation, and lumacaftor is one of the first drugs that functions as a pharmacological chaperone to correct a defect in the protein-folding process. However, lumacaftor is not always effective in treating patients who have other CFTR mutations that result in misfolding. Why is lumacaftor able to correct the misfolding of some mutant CFTR proteins and not others?

15. Structural Biology Methods Which structural biology method (CD, x-ray crystallography, NMR, or cryo-EM) is best suited to each task? a. Obtaining an ultra-high resolution (<1.5 Å) structure of a drug bound to its protein target b. Obtaining a low-to-medium resolution (5–10 Å) reconstruction of the 11 MDa (11,000,000 Da) bacterial flagellar motor c. Identifying the protonation state and pKa of a His side chain in an enzyme active site d. Determining whether a protein is intrinsically disordered or contains secondary structure elements

BIOCHEMISTRY ONLINE

16.
Using the PDB The Protein Data Bank (PDB) contains more than 150,000 different three-dimensional biomolecular structures obtained by x-ray crystallography, NMR, and cryo-EM. Each protein structure deposited into the database is given a PDB ID. Several PDB IDs represent proteins whose structures resemble letters from the Roman alphabet. Find each protein structure in the PDB and view the three-dimensional structure using JSmol, PyMOL, or a similar structure viewer. PDB IDs: 2QYC, 2BNH, 2Q5R, 1XU9, 3H7X, 1OU5, 2WCD a. For each protein, identify its quaternary structure and describe the protomer structure as all α, all β, α/β, or α + β. b. What letter does each protein structure most closely resemble? c. What word(s) can you spell using these protein structures?

17. Protein Modeling Online A group of patients with Crohn disease (an inflammatory bowel disease) underwent biopsies of their intestinal mucosa in an attempt to identify the causative agent. Researchers identified a protein that was present at higher levels in patients with Crohn disease than in patients with an unrelated inflammatory bowel disease or in unaffected controls. The protein was isolated, and the following partial amino acid sequence was obtained (reads left to right): EAELCPDRCI HSFQNLGIQC VKKRDLEQAI SQRIQTNNNP FQVPIEEQRG DYDLNAVRLC FQVTVRDPSG RPLRLPPVLP HPIFDNRAPN TAELKICRVN RNSGSCLGGD EIFLLCDKVQ KEDIEVYFTG PGWEARGSFS QADVHRQVAI VFRTPPYADP SLQAPVRVSM QLRRPSDREL SEPMEFQYLP DTDDRHRIEE KRKRTYETFK SIMKKSPFSG PTDPRPPPRR IAVPSRSSAS VPKPAPQPYP a. You can identify this protein using a protein database such as UniProt (www.uniprot.org). On the home page, click on the link for a “BLAST” search. On the BLAST page, enter about 30 residues from the protein sequence in the appropriate search field and submit it for analysis. What does this analysis tell you about the identity of the protein? b. Try using different portions of the amino acid sequence. Do you always get the same result? c.
A variety of websites provide information about the three-dimensional structure of proteins. Find information about the protein’s secondary, tertiary, and quaternary structures using database sites such as the Protein Data Bank (PDB; www.rcsb.org) or Structural Classification of Proteins (SCOP2; http://scop2.mrc-lmb.cam.ac.uk). d. In the course of your online searches, what did you learn about the cellular function of the protein?

DATA ANALYSIS PROBLEM

18. Mirror-Image Proteins As noted in Chapter 3, “The amino acid residues in protein molecules are almost all L stereoisomers.” It is not clear whether this selectivity is necessary for proper protein function or is an accident of evolution. To explore this question, Milton and colleagues (1992) published a study of an enzyme made entirely of D stereoisomers. The enzyme they chose was HIV protease, a proteolytic enzyme made by HIV that converts inactive viral preproteins to their active forms. Previously, Wlodawer and coworkers (1989) had reported the complete chemical synthesis of HIV protease from L-amino acids (the L-enzyme), using the process shown in Figure 3-30. Normal HIV protease contains two Cys residues, at positions 67 and 95. Because chemical synthesis of proteins containing Cys is technically difficult, Wlodawer and colleagues substituted the synthetic amino acid L-α-amino-n-butyric acid (Aba) for the two Cys residues in the protein. In the authors’ words, this was done to “reduce synthetic difficulties associated with Cys deprotection and ease product handling.” a. The structure of Aba is shown below. Why was this a suitable substitution for a Cys residue? Under what circumstances would it not be suitable? Wlodawer and coworkers denatured the newly synthesized protein by dissolving it in 6 M guanidine HCl and then allowed it to fold slowly by dialyzing away the guanidine against a neutral buffer (10% glycerol, 25 mM NaH2PO4/Na2HPO4, pH 7). b.
There are many reasons to predict that a protein synthesized, denatured, and folded in this manner would not be active. Give three such reasons. c. Interestingly, the resulting L-protease was active. What does this finding tell you about the role of disulfide bonds in the native HIV protease molecule? In a more recent study, Milton and coworkers synthesized HIV protease from D-amino acids, using the same protocol as the earlier study (Wlodawer et al.). Formally, there are three possibilities for the folding of the D-protease: it would be (1) the same shape as the L-protease, (2) the mirror image of the L-protease, or (3) something else, possibly inactive. d. For each possibility, decide whether or not it is a likely outcome, and defend your position. In fact, the D-protease was active: it cleaved a particular synthetic substrate and was inhibited by specific inhibitors. To examine the structure of the D- and L- enzymes, Milton and coworkers tested both forms for activity with D and L forms of a chiral peptide substrate and for inhibition by D and L forms of a chiral peptide-analog inhibitor. Both forms were also tested for inhibition by the achiral inhibitor Evans blue. The findings are given in the table. Inhibition HIV protease Substrate hydrolysis Peptide inhibitor Evans blue (achiral)- substrate - substrate - inhibitor - inhibitor - protease − + − + + - protease + − + − + e. Which of the three models proposed is supported by these data? Explain your reasoning. f. Why does Evans blue inhibit both forms of the protease? g. Would you expect chymotrypsin to digest the D-protease? Explain your reasoning. h. Would you expect total synthesis from D-amino acids followed by renaturation to yield active enzyme for any enzyme? Explain your reasoning. References Milton, R.C., S.C. Milton, and S.B. Kent. 1992. Total chemical synthesis of a D-enzyme: the enantiomers of HIV-1 protease show demonstration of reciprocal chiral substrate specificity. Science 256:1445–1448. 
Wlodawer, A., M. Miller, M. Jaskólski, B.K. Sathyanarayana, E. Baldwin, I.T. Weber, L.M. Selk, L. Clawson, J. Schneider, and S.B. Kent. 1989. Conserved folding in retroviral proteases: crystal structure of a synthetic HIV-1 protease. Science 245:616–621.

Practice
Multiple choice (25 questions)

Stems are from the chapter Problems section; correct choices are drawn from Abbreviated Solutions to Problems (Appendix B) in the same edition.

Practice questions (from chapter Problems & Appendix B)

1. Properties of the Peptide Bond In x-ray studies of crystalline peptides, Linus Pauling and Robert Corey found that the C–N bond in the peptide link is intermediate in length (1.32 Å) between a typical C–N single bond (1.49 Å) and a C=N double bond (1.27 Å). They also found that the peptide bond is planar (all four atoms attached to the C–N group are located in the same plane) and that the two α-carbon atoms attached to the C–N are always trans to each other (on opposite sides of the peptide bond). a. What does the length of the C–N bond in the peptide linkage indicate about its strength and its bond order (i.e., whether it is single, double, or triple)? b. What do Pauling and Corey’s observations tell us about the ease of rotation about the C–N peptide bond?

2. Structural and Functional Relationships in Fibrous Proteins William Astbury discovered that the x-ray diffraction pattern of wool shows a repeating structural unit spaced about 5.2 Å along the length of the wool fiber. When he steamed and stretched the wool, the x-ray pattern showed a new repeating structural unit at a spacing of 7.0 Å. Steaming and stretching the wool and then letting it shrink gave an x-ray pattern consistent with the original spacing of about 5.2 Å. Although these observations provided important clues to the molecular structure of wool, Astbury was unable to interpret them at the time. a. Given our current understanding of the structure of wool, interpret Astbury’s observations. b. When wool sweaters or socks are washed in hot water or heated in a dryer, they shrink. Silk, on the other hand, does not shrink under the same conditions. Explain.

3. Rate of Synthesis of Hair α-Keratin Hair grows at a rate of 15 to 20 cm/yr. All this growth is concentrated at the base of the hair fiber, where α-keratin filaments are synthesized inside living epidermal cells and assembled into ropelike structures (see Fig. 4-10). The fundamental structural element of α-keratin is the α helix, which has 3.6 amino acid residues per turn and a rise of 5.4 Å per turn (see Fig. 4-3a). Assuming that the biosynthesis of α-helical keratin chains is the rate-limiting factor in the growth of hair, calculate the rate at which peptide bonds of α-keratin chains must be synthesized (peptide bonds per second) to account for the observed yearly growth of hair.
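The unit conversions in this problem are easy to get wrong, so a short Python sketch of the arithmetic may help. It is only an illustration under stated assumptions: each helix is taken to run straight along the fiber axis (the slight shortening caused by coiled-coil supertwisting is ignored), a year is taken as 365 days, and one peptide bond is counted per residue added. The function name `peptide_bonds_per_second` is hypothetical.

```python
ANGSTROM_PER_CM = 1e8                 # 1 cm = 10^8 Å
SECONDS_PER_YEAR = 365 * 24 * 3600    # assumes a 365-day year


def peptide_bonds_per_second(growth_cm_per_year,
                             rise_per_turn=5.4,      # Å per turn (from the problem)
                             residues_per_turn=3.6):  # residues per turn (from the problem)
    """Peptide bonds per second needed in one chain to sustain the given growth."""
    # Axial rise contributed by each residue of an α helix.
    rise_per_residue = rise_per_turn / residues_per_turn
    # Fiber growth converted from cm/yr to Å/s.
    growth_angstrom_per_s = growth_cm_per_year * ANGSTROM_PER_CM / SECONDS_PER_YEAR
    # Assumption: one peptide bond is formed for each residue added.
    return growth_angstrom_per_s / rise_per_residue


for growth in (15, 20):
    rate = peptide_bonds_per_second(growth)
    print(f"{growth} cm/yr -> about {rate:.0f} peptide bonds per second per chain")
```

The result is a rate per polypeptide chain; a hair fiber contains many chains growing in parallel, so the total rate of bond formation in the follicle is far higher.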

4. Effect of pH on the Conformation of α-Helical Secondary Structures Specific rotation is a measure of a solution’s capacity to rotate circularly polarized light. The unfolding of the α helix of a polypeptide to a randomly coiled conformation is accompanied by a large decrease in a property called specific rotation. Polyglutamate, a polypeptide made up of only L-Glu residues, is an α helix at pH 3. When researchers raise the pH to 7, there is a large decrease in the specific rotation of the solution. Similarly, polylysine (L-Lys residues) is an α helix at pH 10, but when researchers lower the pH to 7, the specific rotation also decreases, as shown in the graph. Explain the effect of the pH changes on the conformations of poly(Glu) and poly(Lys). Why does the transition occur over such a narrow range of pH?

5. Disulfide Bonds Determine the Properties of Many Proteins Some natural proteins are rich in disulfide bonds, and their mechanical properties, such as tensile strength, viscosity, and hardness, correlate with the degree of disulfide bonding. a. Glutenin, a wheat protein rich in disulfide bonds, imparts the cohesive and elastic character of dough made from wheat flour. Similarly, the hard, tough nature of tortoise shell results from the extensive disulfide bonding in its α-keratin. What is the molecular basis for the correlation between disulfide-bond content and mechanical properties of the protein? b. Most globular proteins denature and lose their activity when they are briefly heated to 65°C. However, the denaturation of globular proteins that contain multiple disulfide bonds often requires longer heat exposure at higher temperatures. One such protein is bovine pancreatic trypsin inhibitor (BPTI), which has 58 amino acid residues in a single peptide chain and contains three disulfide bonds. After a solution of denatured BPTI is cooled, the protein regains its activity. What is the molecular basis for this property of BPTI?

6. Dihedral Angles Consider the series of torsion angles, ϕ and ψ, that might be taken up by the peptide backbone. Which of these closely correspond to ϕ and ψ for an idealized collagen triple helix? Refer to Figure 4-8 as a guide.

7. Amino Acid Sequence and Protein Structure Our growing understanding of how proteins fold allows researchers to make predictions about protein structure based on primary amino acid sequence data. Consider this amino acid sequence. a. Where might bends or β turns occur? b. Where might intrachain disulfide cross-linkages form? c. Suppose that this sequence is part of a larger globular protein. Indicate the probable location (external surface or interior of the protein) of each amino acid residue: Asp, Ile, Thr, Ala, Gln, Lys. Explain your reasoning. (Hint: See the hydropathy index in Table 3-1.)

8. Amino Acid Contributions to Protein Folding Like ribonuclease A, lysozyme from T4 phage is a model enzyme for understanding the energetics and pathways of protein folding. Unlike ribonuclease A, however, T4 lysozyme does not contain any disulfide bonds. A number of studies have quantified the thermodynamic contributions that individual amino acid residues and their interactions make to T4 lysozyme folding. a. An ion pair between an Asp and a His residue in T4 lysozyme contributes 13–21 kJ/mol of favorable folding energy at pH 6.0. However, this ion pair contributes much less to lysozyme folding at either pH 2.0 or pH 10.0. How can you explain this observation? b. Suppose that a Met residue buried in the folded, hydrophobic core of T4 lysozyme is replaced by mutation with a Lys residue. How would the mutation affect a plot of the thermal denaturation of T4 lysozyme at pH 3.0? (See Fig. 4-24a for an example of a thermal denaturation plot.) c. Suppose that the thermal denaturation experiment on the protein with the Met to Lys mutation took place at pH 10.0. Predict whether the mutation would have a greater or a lesser impact on protein stability at pH 10.0 than at pH 3.0. Explain your prediction.

9. Bacteriorhodopsin in Purple Membrane Proteins Under the proper environmental conditions, the salt-loving archaeon Halobacterium halobium synthesizes a membrane protein (Mr 26,000), known as bacteriorhodopsin, which is purple because it contains retinal (see Fig. 10-20). Molecules of this protein aggregate into “purple patches” in the cell membrane. Bacteriorhodopsin acts as a light-activated proton pump that provides energy for cell functions. X-ray analysis of this protein reveals that it consists of seven parallel α-helical segments, each of which traverses the bacterial cell membrane (thickness 45 Å). Calculate the minimum number of amino acid residues necessary for one segment of α helix to traverse the membrane completely. Estimate the fraction of the bacteriorhodopsin protein that is involved in membrane-spanning helices. (Use an average amino acid residue weight of 110.)
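One way to organize the estimate is sketched below in Python. It is an illustration, not a definitive solution: it assumes each helix crosses the membrane perpendicular to its plane, and it reuses the α-helix rise of 1.5 Å per residue (5.4 Å per turn divided by 3.6 residues per turn, the values given in Problem 3). The function names are hypothetical.

```python
import math

RISE_PER_RESIDUE = 1.5  # Å per residue: 5.4 Å per turn / 3.6 residues per turn


def residues_to_span(thickness_angstrom):
    """Minimum whole number of α-helical residues needed to cross the membrane."""
    # Assumption: the helix axis is perpendicular to the membrane plane.
    return math.ceil(thickness_angstrom / RISE_PER_RESIDUE)


def fraction_in_helices(mr, n_helices, thickness_angstrom, avg_residue_mass=110):
    """Estimated fraction of the protein's residues in membrane-spanning helices."""
    total_residues = mr / avg_residue_mass
    helical_residues = n_helices * residues_to_span(thickness_angstrom)
    return helical_residues / total_residues


per_helix = residues_to_span(45)
fraction = fraction_in_helices(26_000, n_helices=7, thickness_angstrom=45)
print(f"Residues per membrane-spanning helix: {per_helix}")
print(f"Approximate fraction of protein in helices: {fraction:.2f}")
```

The computed fraction depends on how the total residue count is rounded, so treat it as an estimate rather than an exact figure.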

10. Conservation of Protein Structure Margaret Oakley Dayhoff originated the idea of protein superfamilies after noticing that proteins with diverse amino acid sequences can have similar tertiary structures. Why can protein structure be more highly conserved than individual amino acid sequences?

11. Interpreting Ramachandran Plots Examine the two proteins labeled (a) and (b) below. Which of the two Ramachandran plots, labeled (c) and (d) at right, is more likely to be derived from which protein? Why? [Data from (a) PDB ID 1GWY, J. M. Mancheno et al., Structure 11:1319, 2003; (b) PDB ID 1A6M, J. Vojtechovsky et al., Biophys. J. 77:2153, 1999.]

12. Number of Polypeptide Chains in a Multisubunit Protein A researcher treated a sample (660 mg) of an oligomeric protein of Mr 132,000 with an excess of 1-fluoro-2,4-dinitrobenzene (Sanger’s reagent) under slightly alkaline conditions until the chemical reaction was complete. He then completely hydrolyzed the peptide bonds of the protein by heating it with concentrated HCl. The hydrolysate was found to contain 5.5 mg of the compound shown. 2,4-Dinitrophenyl derivatives of the α-amino groups of other amino acids could not be found. a. Explain how this information can be used to determine the number of polypeptide chains in an oligomeric protein. b. Calculate the number of polypeptide chains in this protein. c. What other analytic technique could you employ to determine whether the polypeptide chains in this protein are similar or different?

13. Amyloid Fibers in Disease Several small aromatic molecules, such as phenol red (used as a nontoxic drug model), have been shown to inhibit the formation of amyloid in laboratory model systems. A goal of the research on these small aromatic compounds is to find a drug that efficiently inhibits the formation of amyloid in the brain in people with incipient Alzheimer disease. a. Suggest why molecules with aromatic substituents would disrupt the formation of amyloid. b. Some researchers suggest that a drug used to treat Alzheimer disease may also be effective in treating type 2 (non-insulin-dependent) diabetes mellitus. Why might a single drug be effective in treating these two different conditions?

14. Protein-Folding Therapies The Food and Drug Administration recently approved the drug lumacaftor for the treatment of cystic fibrosis in patients with the F508ΔCFTR mutation. This mutation is a genetically encoded deletion of amino acid F508 from the protein. About ⅔ of cystic fibrosis patients have this mutation, and lumacaftor is one of the first drugs that functions as a pharmacological chaperone to correct a defect in the protein-folding process. However, lumacaftor is not always effective in treating patients who have other CFTR mutations that result in misfolding. Why is lumacaftor able to correct the misfolding of some mutant CFTR proteins and not others?

15. Structural Biology Methods Which structural biology method (CD, x-ray crystallography, NMR, or cryo-EM) is best suited to each task? a. Obtaining an ultra-high-resolution (<1.5 Å) structure of a drug bound to its protein target. b. Obtaining a low-to-medium-resolution (5–10 Å) reconstruction of the 11 MDa (11,000,000 Da) bacterial flagellar motor. c. Identifying the protonation state and pKa of a His side chain in an enzyme active site. d. Determining whether a protein is intrinsically disordered or contains secondary structure elements.

