CHAPTER 28 REGULATION OF GENE EXPRESSION

abundant enzymes in the biosphere. Other gene products occur in much smaller amounts; for instance, a cell may contain only a few molecules of the enzymes that repair rare DNA lesions. Requirements for some gene products change over time. The need for enzymes in certain metabolic pathways may wax and wane as food sources change or are depleted. During development of a multicellular organism, some proteins that influence cellular differentiation are present for just a brief time in only a few cells. Specialization of cellular function can greatly affect the need for various gene products; an example is the uniquely high concentration of a single protein, hemoglobin, in erythrocytes. It is clear from these examples that the appearance of gene products must be regulated.

Our exploration of the regulation of gene expression is once again guided by multiple principles:

The cellular concentration of a protein is determined by a delicate balance of at least seven processes, each having several potential points of regulation. These processes include synthesis of the primary RNA transcript (transcription); posttranscriptional modification of mRNA; degradation of mRNA; protein synthesis (translation); posttranslational modification of proteins; protein targeting and transport; and protein degradation.

Regulation is achieved by specialized proteins and RNAs. The proteins are usually ligand-binding proteins with no other function. They bind to specific sequences in DNA or RNA. They respond to molecular signals that can be any kind of biological molecule. The RNAs either interact with other RNAs or serve as protein cofactors.

Regulated gene expression may bring about increases or decreases in the amount of a gene product. Gene products that increase in concentration under particular molecular circumstances are referred to as inducible; the process of increasing their expression is induction.
Conversely, gene products that decrease in concentration in response to a molecular signal are referred to as repressible, and the process is called repression.

The default transcriptional state of a gene, on or off, is dictated in part by the size and complexity of the genome. In bacteria, where genomes are relatively small and DNA is readily accessible, the default state of genes is generally “on.” Transcription of each gene or gene cluster is usually limited by a specific protein repressor. In eukaryotes, where genomes are larger and genes are encapsulated in chromatin, the default state of most genes is “off.” Gene transcription requires chromatin modification followed by the action of transcription activators.

Regulation is expensive. For many genes, especially in eukaryotes, the regulatory processes can require a considerable investment of chemical energy. That expenditure is nevertheless small compared with the cost of RNA and protein synthesis when the gene is expressed.

The steps required to generate and then remove an active protein or RNA, all of which may be regulated, are summarized in Figure 28-1. We have examined several of the relevant regulatory mechanisms in previous chapters. Posttranscriptional modification of mRNA, by processes such as alternative splicing (see Fig. 26-20) or RNA editing (see Figs. 27-10 and 27-12), can affect which proteins are produced from an mRNA transcript and in what amounts. A variety of nucleotide sequences in an mRNA can affect the rate of its degradation (p. 986). Many factors affect the rate at which an mRNA is translated into a protein, as well as the posttranslational modification, targeting, and eventual degradation of that protein (Chapter 27).

FIGURE 28-1 Seven processes that affect the steady-state concentration of a protein. Each process has several potential points of regulation.
Of the regulatory processes illustrated in Figure 28-1, those operating at the level of transcription initiation are particularly well-documented. These processes are a major focus of this chapter, although we also consider other mechanisms. As noted in earlier chapters, the complexity of an organism is not reflected in the number of its protein-coding genes. Instead, as complexity increases from bacteria to mammals, mechanisms of gene regulation become more elaborate, and posttranscriptional and translational regulation play greater roles.

Control of transcription initiation permits the synchronized regulation of multiple genes encoding products with interdependent activities. For example, when their DNA is heavily damaged, bacterial cells require a coordinated increase in the levels of the many DNA repair enzymes. And perhaps the most sophisticated form of coordination occurs in the complex regulatory circuits that guide the development of multicellular eukaryotes, which can include many types of regulatory mechanisms.

We begin by examining the interactions between proteins and DNA that are the key to transcriptional regulation. We next discuss the specific proteins that influence the expression of specific genes, first in bacterial and then in eukaryotic cells. Information about posttranscriptional and translational controls is included in the discussion, where relevant, to provide a more complete overview of the rich complexity of cellular regulation.

28.1 The Proteins and RNAs of Gene Regulation

Transcription is mediated and regulated by protein-DNA interactions, especially those involving the protein components of RNA polymerase (Chapter 26). We first consider how the activity of RNA polymerase is regulated. Next, we proceed to a general description of the proteins participating in this regulation. We then examine the molecular basis for the recognition of specific DNA sequences by DNA-binding proteins.
Regulatory RNAs are encountered most often in the regulation of mRNA translation in eukaryotes, but they also play a role in the life of some bacterial mRNAs. We consider them briefly in this section, and in more detail in Section 28.3.

RNA Polymerase Binds to DNA at Promoters

RNA polymerases bind to DNA and initiate transcription at promoters (see Fig. 26-5), sites generally found near points at which RNA synthesis begins on the DNA template. The regulation of transcription initiation often entails changes in how RNA polymerase interacts with a promoter.

The nucleotide sequences of promoters vary considerably, affecting the binding affinity of RNA polymerases and thus the frequency of transcription initiation. Some Escherichia coli genes are transcribed once per second, others less than once per cell generation. Much of this variation is due to differences in promoter sequence. In the absence of regulatory proteins, differences in promoter sequence may affect the frequency of transcription initiation by a factor of 1,000 or more. Most E. coli promoters have a sequence close to a consensus (Fig. 28-2). Mutations that result in a shift away from the consensus sequence usually decrease the function of bacterial promoters; conversely, mutations toward consensus usually enhance promoter function.

FIGURE 28-2 Consensus sequence for many E. coli promoters. Most base substitutions in the −10 and −35 regions have a negative effect on promoter function. Some promoters also include the UP (upstream promoter) element.

KEY CONVENTION By convention, DNA sequences are shown as they exist in the nontemplate strand, with the 5′ terminus on the left. Nucleotides are numbered from the transcription start site, with positive numbers to the right (in the direction of transcription) and negative numbers to the left. N indicates any nucleotide.
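The idea that mutations toward or away from consensus raise or lower promoter function can be made concrete with a simple match count. The sketch below assumes the widely cited σ70 consensus hexamers TTGACA (−35) and TATAAT (−10) and scores a hypothetical promoter and two hypothetical single-base mutants. It is a crude proxy for illustration, not a model of real promoter strength.

```python
# Minimal sketch: count matches between a promoter's -35 and -10 hexamers and
# the commonly cited sigma70 consensus (nontemplate strand, written 5'->3').
# Spacer length, UP elements, and regulatory proteins are not modeled.
CONSENSUS_35 = "TTGACA"  # assumed -35 consensus
CONSENSUS_10 = "TATAAT"  # assumed -10 consensus

def matches(site: str, consensus: str) -> int:
    """Number of positions at which `site` agrees with `consensus`."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must be the same length")
    return sum(a == b for a, b in zip(site.upper(), consensus))

def consensus_score(box_35: str, box_10: str) -> int:
    """Total matches over both hexamers (0 to 12)."""
    return matches(box_35, CONSENSUS_35) + matches(box_10, CONSENSUS_10)

# Hypothetical promoter and two single-base mutants.
original = consensus_score("TTGACT", "TATGAT")
down_mutant = consensus_score("TTCACT", "TATGAT")  # G->C moves away from consensus
up_mutant = consensus_score("TTGACT", "TATAAT")    # G->A moves toward consensus
print(original, down_mutant, up_mutant)
```

A higher score only says the sequence is closer to consensus; the text's point is that this usually, not always, tracks higher initiation frequency.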
Genes for products that are required at all times, such as those for the enzymes of central metabolic pathways, are expressed continuously, with little variation, in virtually every cell of a species or organism. Such genes are often referred to as housekeeping genes. Expression of a gene at approximately constant levels is called constitutive gene expression. Although housekeeping genes are expressed constitutively, the cellular concentrations of the proteins they encode vary widely. For these genes, the RNA polymerase–promoter interaction strongly influences the rate of transcription initiation. Differences in promoter sequence may be the only level of regulation for a housekeeping gene, allowing the cell to synthesize the appropriate level of each housekeeping gene product.

The basal rate of transcription initiation at the promoters of nonhousekeeping genes is also determined by the promoter sequence, but expression of these genes is further modulated by regulatory proteins. Many of these proteins work by enhancing or interfering with the interaction between RNA polymerase and the promoter.

The sequences of eukaryotic promoters are much more variable than their bacterial counterparts. The three eukaryotic RNA polymerases usually require an array of general transcription factors in order to bind to a promoter, and these can heavily influence basal transcription rates. Yet, as with bacterial gene expression, the basal level of transcription is determined in part by the effect of promoter sequences on the function of RNA polymerase and its associated transcription factors.

Transcription Initiation Is Regulated by Proteins and RNAs

At least three types of regulatory proteins regulate transcription initiation by RNA polymerase: specificity factors alter the specificity of RNA polymerase for a given promoter or set of promoters, repressors impede access of RNA polymerase to the promoter, and activators enhance the RNA polymerase–promoter interaction.
As our understanding of the roles of protein regulators slowly matures, many new roles for gene regulation by noncoding RNAs (ncRNAs) are also beginning to emerge. Among these are the long noncoding RNAs (lncRNAs), generally defined as noncoding RNAs more than 200 nucleotides long that lack an open reading frame (ORF) encoding a protein; this distinguishes them from the small, functional ncRNAs (miRNA, snoRNA, snRNA, etc.) described in Chapter 26. The lncRNAs are found in all types of organisms, with tens of thousands expressed in mammalian cells. Known functions of lncRNAs include regulation of nucleosome positioning and chromatin structure, control of DNA methylation and posttranslational histone modifications, transcriptional gene silencing, multiple roles in transcriptional activation and repression, and much more.

We introduced bacterial specificity factors in Chapter 26, although we did not refer to these proteins by that name. The σ subunit of the E. coli RNA polymerase holoenzyme is a specificity factor that mediates promoter recognition and binding. Most E. coli promoters are recognized by a single σ subunit (Mr 70,000), σ70 (see Fig. 26-5). Under some conditions, some of the σ70 subunits are replaced by one of six other specificity factors. One notable case arises when bacteria are subjected to heat stress, leading to the replacement of σ70 by σ32 (Mr 32,000). When bound to σ32, RNA polymerase is directed to a specialized set of promoters with a different consensus sequence (Fig. 28-3). These promoters control the expression of a set of genes that encode proteins, including some protein chaperones (p. 132), that are part of a stress-induced system called the heat shock response. Thus, through changes in the binding affinity of the polymerase that direct the enzyme to different promoters, a set of genes involved in related processes is coordinately regulated.
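The length-plus-ORF definition of lncRNAs given above can be expressed as a simple filter. This is a naive sketch under stated assumptions: only the three forward reading frames are scanned, an ORF runs from ATG to the first in-frame stop codon, and "protein-coding" is taken to mean an ORF of at least 100 codons (a hypothetical cutoff, not from the text). Real annotation pipelines rely on more sophisticated coding-potential analysis.

```python
# Naive sketch of the ">200 nt and no protein-coding ORF" criterion.
# Assumptions: forward frames only; ORF = ATG ... first in-frame stop;
# ORFs that never reach a stop codon are ignored.
STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_ORF_CODONS = 100  # hypothetical cutoff for calling an ORF protein-coding

def longest_orf_codons(seq: str) -> int:
    """Length in codons (stop included) of the longest ATG...stop ORF."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i + 3 - start) // 3)
                start = None
    return best

def is_lncrna_candidate(seq: str) -> bool:
    """True if the transcript is longer than 200 nt and lacks a long ORF."""
    return len(seq) > 200 and longest_orf_codons(seq) < MIN_ORF_CODONS

# Hypothetical transcripts.
print(is_lncrna_candidate("CCA" * 100))                      # 300 nt, no ATG
print(is_lncrna_candidate("ATG" + "GCC" * 150 + "TAA"))      # 152-codon ORF
```

Note that the length threshold alone would not separate lncRNAs from mRNAs; it is the combination with the ORF test that carries the definition.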
In eukaryotic cells, some of the general transcription factors, in particular the TATA-binding protein (TBP; see Fig. 26-9), may be considered specificity factors.

FIGURE 28-3 Consensus sequence for promoters that regulate expression of the E. coli heat shock genes. This system responds to temperature increases as well as some other environmental stresses, resulting in the induction of a set of proteins. Binding of RNA polymerase to heat shock promoters is mediated by a specialized σ subunit of the polymerase, σ32, which replaces σ70 in the RNA polymerase initiation complex.

Repressors bind to specific sites on the DNA. In bacterial cells, such binding sites, called operators, are generally near a promoter. RNA polymerase binding, or its movement along the DNA after binding, is blocked when the repressor is present. Regulation by means of a repressor protein that blocks transcription is referred to as negative regulation. Repressor binding to DNA is regulated by a molecular signal, or effector, usually a small molecule or a protein that binds to the repressor and causes a conformational change. Depending on the system, the interaction between repressor and signal molecule either increases or decreases transcription. In some cases, the conformational change results in dissociation of a DNA-bound repressor from the operator (Fig. 28-4a), and transcription initiation can then proceed unhindered. In other cases, interaction between an inactive repressor and the signal molecule causes the repressor to bind to the operator (Fig. 28-4b). In eukaryotic cells, gene regulation by a repressor is less common. Where it does occur (more often in lower eukaryotes such as yeast), the binding site for a repressor may be some distance from the promoter. Binding of these repressors to their binding sites has the same effect as in bacterial cells: inhibiting the assembly or activity of a transcription complex at the promoter.
FIGURE 28-4 Common patterns of regulation of transcription initiation. Two types of negative regulation are illustrated. (a) Repressor binds to the operator in the absence of the molecular signal; the external signal causes dissociation of the repressor to permit transcription. (b) Repressor binds in the presence of the signal; the repressor dissociates, and transcription ensues when the signal is removed. Positive regulation is mediated by gene activators. Again, two types are shown. (c) Activator binds in the absence of the molecular signal and transcription proceeds; when the signal is added, the activator dissociates and transcription is inhibited. (d) Activator binds in the presence of the signal; it dissociates only when the signal is removed. Note that “positive” regulation and “negative” regulation refer to the type of regulatory protein involved: the bound protein either facilitates or inhibits transcription. In either case, addition of the molecular signal may increase or decrease transcription, depending on its effect on the regulatory protein.

Activators provide a molecular counterpoint to repressors; they bind to DNA and enhance the activity of RNA polymerase at a promoter. This is positive regulation. In bacteria, activator-binding sites are often adjacent to promoters that are bound weakly or not at all by RNA polymerase alone, such that little transcription occurs in the absence of the activator. Some activators are usually bound to DNA, enhancing transcription until dissociation of the activator is triggered by the binding of a signal molecule (Fig. 28-4c). In other cases the activator binds to DNA only after interaction with a signal molecule (Fig. 28-4d). Signal molecules can therefore increase or decrease transcription, depending on how they affect the activator. Positive regulation by activators is particularly common in eukaryotes.
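The four cases of Figure 28-4 reduce to two yes/no questions: is the regulator a repressor or an activator, and does the signal make it bind DNA or leave it? The sketch below encodes that logic, treating transcription as strictly on or off (a simplification; real systems are graded). The names are illustrative only.

```python
# Sketch of the four patterns in Figure 28-4 as boolean logic.
from enum import Enum

class Regulator(Enum):
    REPRESSOR = "negative"  # bound protein inhibits transcription
    ACTIVATOR = "positive"  # bound protein facilitates transcription

def bound_to_dna(signal_present: bool, signal_promotes_binding: bool) -> bool:
    """The regulator is on the DNA when signal status matches its binding mode."""
    return signal_present == signal_promotes_binding

def transcribed(kind: Regulator, signal_present: bool,
                signal_promotes_binding: bool) -> bool:
    bound = bound_to_dna(signal_present, signal_promotes_binding)
    if kind is Regulator.REPRESSOR:
        return not bound  # a bound repressor blocks RNA polymerase
    return bound          # a bound activator is needed for transcription

PATTERNS = {
    "a": (Regulator.REPRESSOR, False),  # signal causes repressor to dissociate
    "b": (Regulator.REPRESSOR, True),   # signal causes repressor to bind
    "c": (Regulator.ACTIVATOR, False),  # signal causes activator to dissociate
    "d": (Regulator.ACTIVATOR, True),   # signal causes activator to bind
}

for label, (kind, promotes) in PATTERNS.items():
    no_signal, with_signal = (transcribed(kind, s, promotes) for s in (False, True))
    print(f"({label}) {kind.value:8} no signal: {no_signal!s:5}  signal: {with_signal}")
```

Running the loop shows that adding the signal turns transcription on in (a) and (d) and off in (b) and (c), which is the caption's point: the sign of regulation and the effect of the signal are independent.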
Many eukaryotic activators bind to DNA sites, called enhancers, that are distant from the promoter, affecting the rate of transcription at a promoter that may be located thousands of base pairs away. The distance between a promoter and the binding site of an activator or repressor is bridged by looping out of the DNA between the two sites (Fig. 28-5). The looping is facilitated in some cases by proteins called architectural regulators that bind to intervening sites. Interaction between activators and the RNA polymerase at the promoter is often mediated by intermediary proteins called coactivators. In some instances, protein repressors may take the place of coactivators, binding to the activators and preventing the activating interaction.

FIGURE 28-5 Interaction between activators/repressors and RNA polymerase in eukaryotes. Eukaryotic activators and repressors frequently bind sites thousands of base pairs distant from the promoters they regulate. DNA looping, often facilitated by architectural regulators, brings the sites together. The interaction between activators and RNA polymerase may be mediated by coactivators, as shown. Repression is sometimes mediated by repressors (described later) that bind to activators, thereby preventing the activating interaction with RNA polymerase.

Many Bacterial Genes Are Clustered and Regulated in Operons

Bacteria have a simple general mechanism for coordinating the regulation of multiple genes: these genes are clustered on the chromosome and are transcribed together. Many bacterial mRNAs are polycistronic (multiple genes on a single transcript), and the single promoter that initiates transcription of the cluster is the site of regulation for expression of all the genes in the cluster. The gene cluster and promoter, plus additional sequences that function together in regulation, are called an operon (Fig. 28-6). Operons that include two to six genes transcribed as a unit are common; some operons contain 20 or more genes.
The identity and order of the genes in an operon are not random. In many cases, genes in the same operon encode subunits of a larger protein complex, and cotranslation directly enables assembly of the complex. Some operons organize genes involved in related processes that require coordinated regulation. In other cases, the genes may seem to be unrelated, but they encode products required by the cell under similar conditions.

FIGURE 28-6 Representative bacterial operon. Genes A, B, and C are transcribed on one polycistronic mRNA. Typical regulatory sequences include binding sites for proteins that either activate or repress transcription from the promoter.

Many of the principles of bacterial gene expression were first defined by studies of lactose metabolism in E. coli, which can use lactose as its sole carbon source. In 1960, François Jacob and Jacques Monod published a short paper in the Proceedings of the French Academy of Sciences that described how two adjacent genes involved in lactose metabolism were coordinately regulated by a genetic element located at one end of the gene cluster. The genes were those for β-galactosidase, which cleaves lactose to galactose and glucose, and for galactoside permease, which transports lactose into the cell (Fig. 28-7). The terms “operon” and “operator” were first introduced in this paper. With the operon model, gene regulation could, for the first time, be considered in molecular terms.

FIGURE 28-7 Lactose metabolism in E. coli. Uptake and metabolism of lactose require the activities of galactoside (lactose) permease and β-galactosidase. Conversion of lactose to allolactose by transglycosylation is a minor reaction also catalyzed by β-galactosidase.

The lac Operon Is Subject to Negative Regulation

The lactose (lac) operon (Fig. 28-8a) includes the genes for β-galactosidase (Z), galactoside permease (Y), and thiogalactoside transacetylase (A).
The last of these enzymes seems to modify toxic galactosides to facilitate their removal from the cell. Each of the three genes is preceded by a ribosome-binding site (not shown in Fig. 28-8) that independently directs the translation of that gene (Chapter 27). Regulation of the lac operon by the Lac repressor follows the pattern outlined in Figure 28-4a.

FIGURE 28-8 The lac operon. (a) In the lac operon, the lacI gene encodes the Lac repressor. The lac Z, Y, and A genes encode β-galactosidase, galactoside permease, and thiogalactoside transacetylase, respectively. P is the promoter for the lac genes, and PI is the promoter for the I gene. O1 is the main operator for the lac operon; O2 and O3 are secondary operator sites of lesser affinity for the Lac repressor. The inverted repeat to which the Lac repressor binds in O1 is shown. (b) The Lac repressor binds to the main operator and O2 or O3, and it seems to form a loop in the DNA. (c) Lac repressor (shades of red) is shown bound to short, discontinuous segments of DNA (blue and orange). [(c) Data from PDB ID 2PE5, R. Daber et al., J. Mol. Biol. 370:609, 2007.]

The study of lac operon mutants has revealed some details of the workings of the operon’s regulatory system. In the absence of lactose, the lac operon genes are repressed. Mutations in the operator or in another gene, the I gene, result in constitutive synthesis of the gene products. When the I gene is defective, repression can be restored by introducing a functional I gene into the cell on another DNA molecule, demonstrating that the I gene encodes a diffusible molecule that causes gene repression. This molecule proved to be a protein, now called the Lac repressor, a tetramer of identical monomers. The operator to which it binds most tightly (O1) abuts the transcription start site (Fig. 28-8a). The I gene is transcribed from its own promoter (PI) independent of the lac operon genes.
The lac operon has two secondary binding sites for the Lac repressor: O2 and O3. O2 is centered near position +410, within the gene encoding β-galactosidase (Z); O3 is near position −90, within the I gene. To repress the operon, the Lac repressor seems to bind to both the main operator and one of the two secondary sites, with the intervening DNA looped out (Fig. 28-8b, c). Either binding arrangement blocks transcription initiation.

Despite this elaborate binding complex, repression is not absolute. Binding of the Lac repressor reduces the rate of transcription initiation by a factor of 10³. If the O2 and O3 sites are eliminated by deletion or mutation, the binding of repressor to O1 alone reduces transcription by a factor of about 10². Even in the repressed state, each cell has a few molecules of β-galactosidase and galactoside permease, presumably synthesized on the rare occasions when the repressor transiently dissociates from the operators. This basal level of transcription is essential to operon regulation.

When cells are provided with lactose, the lac operon is induced. An inducer (signal) molecule binds to a specific site on the Lac repressor, causing a conformational change that results in dissociation of the repressor from the operator. The inducer in the lac operon system is not lactose itself but allolactose, an isomer of lactose (Fig. 28-7). After entry into the E. coli cell (via the few preexisting molecules of lactose permease), lactose is converted to allolactose by one of the few preexisting β-galactosidase molecules. Release of the operator by Lac repressor, triggered as the repressor binds to allolactose, allows expression of the lac operon genes and leads to a 10³-fold increase in the concentration of β-galactosidase. Several β-galactosides structurally related to allolactose are inducers of the lac operon but are not substrates for β-galactosidase; others are substrates but not inducers.
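Why the basal level matters can be seen in a deliberately crude toy model; all parameters are hypothetical and in arbitrary units. It assumes that permease and β-galactosidase rise and fall together because they are cotranscribed (level E, with 1.0 meaning fully induced), that allolactose A is proportional to lactose × E, and that the fraction of operators freed from repressor follows a simple saturating curve A/(K + A). A basal level of 0.001 is chosen only to echo the roughly 10³-fold repression described above.

```python
# Toy model (hypothetical parameters, arbitrary units) of lac operon induction.
# E: level of the cotranscribed permease and beta-galactosidase (1.0 = fully induced)
# A: allolactose, assumed proportional to lactose * E (uptake and conversion)
# Fraction of operators released by the repressor is assumed to be A / (K + A).

def steady_enzyme_level(lactose: float, basal: float,
                        K: float = 1.0, steps: int = 50) -> float:
    """Iterate E -> basal + (1 - basal) * A / (K + A), starting from basal."""
    E = basal  # before induction, only leaky (basal) expression is present
    for _ in range(steps):
        A = lactose * E  # without permease/enzyme, no allolactose can form
        released = A / (K + A) if A > 0 else 0.0
        E = basal + (1.0 - basal) * released
    return E

print(steady_enzyme_level(lactose=1000.0, basal=0.001))  # bootstraps to near 1.0
print(steady_enzyme_level(lactose=1000.0, basal=0.0))    # never induced
print(steady_enzyme_level(lactose=0.0, basal=0.001))     # no lactose: stays basal
```

With zero basal expression the loop never starts, even with abundant lactose, because no permease is present to admit lactose and no β-galactosidase is present to make allolactose. This is a cartoon of the positive-feedback logic, not a quantitative description of the lac system.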
One particularly effective and nonmetabolizable inducer of the lac operon that is often used experimentally is isopropylthiogalactoside (IPTG). An inducer that cannot be metabolized allows researchers to explore the physiological function of lactose as a carbon source for growth, separate from its function in the regulation of gene expression.

In addition to the multitude of operons now known in bacteria, a few polycistronic operons have been found in the cells of lower eukaryotes. In the cells of higher eukaryotes, however, almost all protein-coding genes are transcribed separately.

The mechanisms by which operons are regulated can vary significantly from the simple model presented in Figure 28-8. Even the lac operon is more complex than indicated here, with an activator also contributing to the overall scheme, as we shall see in Section 28.2. Before any further discussion of the layers of regulation of gene expression, however, we examine the critical molecular interactions between DNA-binding proteins (such as repressors and activators) and the DNA sequences to which they bind.

Regulatory Proteins Have Discrete DNA-Binding Domains

Regulatory proteins generally bind to specific DNA sequences. Their affinity for these target sequences is roughly 10⁴ to 10⁶ times higher than their affinity for any other DNA sequence. Most regulatory proteins have discrete DNA-binding domains containing substructures that interact closely and specifically with the DNA. These binding domains usually include one or more of a relatively small group of recognizable and characteristic structural motifs.

To bind specifically to DNA sequences, regulatory proteins must recognize surface features on the DNA (see Fig. 8-13). Most of the chemical groups that differ among the four bases and thus permit discrimination between base pairs are hydrogen-bond donor and acceptor groups exposed in the major groove of DNA (Fig.
28-9), and most of the protein-DNA contacts that impart specificity are hydrogen bonds. A notable exception is the nonpolar surface near C-5 of pyrimidines, where thymine is readily distinguished from cytosine by its protruding methyl group. Protein-DNA contacts are also possible in the minor groove of the DNA, but the hydrogen-bonding patterns there generally do not allow ready discrimination between base pairs.

FIGURE 28-9 Groups in DNA available for protein binding. (a) Shown here are functional groups on all four base pairs that are displayed in the major and minor grooves of DNA. Hydrogen-bond acceptor (A) and donor (D) atoms are marked by blue and red disks, respectively. Other hydrogen atoms (H) are marked with purple disks, and methyl groups (M) with yellow disks. (b) Recognition patterns for each base pair (from left to right). The much greater variation in the patterns for the major groove gives rise to a much greater discriminatory power in the major groove relative to the minor groove. [Information from J. L. Huret, Atlas Genet. Cytogenet. Oncol. Haematol. 2006, http://atlasgeneticsoncology.org/Educ/DNAEngID30001ES.html.]

Within regulatory proteins, the amino acid side chains that most often form hydrogen bonds with bases in the DNA are those of Asn, Gln, Glu, Lys, and Arg residues. Is there a simple recognition code in which a particular amino acid always pairs with a particular base? The two hydrogen bonds that can form between Gln or Asn and the N6 and N-7 positions of adenine cannot form with any other base. And an Arg residue can form two hydrogen bonds with N-7 and O6 of guanine (Fig. 28-10). Examination of the structure of many DNA-binding proteins, however, has shown that a protein can recognize each base pair in more than one way, leading to the conclusion that there is no simple amino acid–base code.
For some proteins, the Gln-adenine interaction can specify A═T base pairs, but in others a van der Waals pocket for the methyl group of thymine can recognize A═T base pairs. Researchers cannot yet examine the structure of a DNA-binding protein and infer the DNA sequence to which it binds.
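The major- versus minor-groove comparison in Figure 28-9 can be checked directly from the recognition patterns. The sketch below encodes the patterns as they are commonly drawn (A = hydrogen-bond acceptor, D = donor, H = nonpolar hydrogen, M = methyl, read left to right across each base pair). The exact strings are an assumption taken from that drawing convention; they are consistent with the text's examples (adenine N-7 and N6, guanine N-7 and O6, the thymine methyl group).

```python
# Recognition patterns per base pair, as commonly drawn (assumed convention):
# A = H-bond acceptor, D = H-bond donor, H = nonpolar hydrogen, M = methyl group.
MAJOR_GROOVE = {"A-T": "ADAM", "T-A": "MADA", "G-C": "AADH", "C-G": "HDAA"}
MINOR_GROOVE = {"A-T": "AHA", "T-A": "AHA", "G-C": "ADA", "C-G": "ADA"}

def distinguishable(patterns: dict[str, str]) -> int:
    """How many of the four base pairs give a unique pattern."""
    return len(set(patterns.values()))

print("major groove:", distinguishable(MAJOR_GROOVE), "of 4")  # every pair differs
print("minor groove:", distinguishable(MINOR_GROOVE), "of 4")  # only A-T vs G-C
```

In the major groove each of the four base pairs presents a distinct pattern, so both identity and orientation can be read; in the minor groove A-T looks like T-A and G-C looks like C-G, which is why discrimination there is poor.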
FIGURE 28-10 Specific amino acid residue–base pair interactions. The two examples shown have been observed in DNA-protein binding.

To interact with bases in the major groove of DNA, a protein requires a relatively small substructure that can stably protrude from the protein surface. The DNA-binding domains of regulatory proteins tend to be small (60 to 90 amino acid residues), and the structural motifs within these domains that are actually in contact with the DNA are smaller still. Many small proteins are unstable because of their limited capacity to form layers of structure to bury hydrophobic groups. The DNA-binding motifs provide either a very compact stable structure or a way of allowing a segment of protein to protrude from the protein surface.

The DNA-binding sites for regulatory proteins are often inverted repeats of a short DNA sequence (a palindrome) at which multiple (usually two) subunits of a regulatory protein bind cooperatively. The Lac repressor is unusual in that it functions as a tetramer, with two dimers tethered together at the end distant from the DNA-binding sites (Fig. 28-8b). An E. coli cell usually contains about 20 tetramers of the Lac repressor. Each tethered dimer binds to a palindromic operator sequence, contacting 17 bp of a 22 bp region in the lac operon, and the two dimers bind independently, one generally to O1 and the other to O2 or O3 (as in Fig. 28-8b). The symmetry of the O1 operator sequence corresponds to the twofold axis of symmetry of two paired Lac repressor subunits. The tetrameric Lac repressor binds to its operator sequences in vivo with an estimated dissociation constant of 10⁻¹⁰ M. The repressor discriminates between the operators and other sequences by a factor of about 10⁶, so binding to these few base pairs among the 4.6 million or so of the E. coli chromosome is highly specific.
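Two points in this passage lend themselves to a quick calculation. First, an inverted repeat reads the same on both strands, that is, the sequence equals its own reverse complement. Second, the dissociation constant and the copy number together give a rough sense of how often O1 is occupied. The sketch assumes a cell volume of about 1 fL (10⁻¹⁵ L; an assumption, not from the text), approximates free repressor by total repressor because there is essentially one operator per cell, and uses the single-site relation θ = [R]/([R] + Kd), ignoring nonspecific binding and DNA looping. It is meant for orders of magnitude only.

```python
# Order-of-magnitude sketch; the cell volume is an assumed value.
AVOGADRO = 6.022e23        # molecules per mole
CELL_VOLUME_L = 1e-15      # assumed E. coli volume, about 1 femtoliter
KD_M = 1e-10               # dissociation constant given in the text
REPRESSOR_TETRAMERS = 20   # copies per cell given in the text

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]

def is_inverted_repeat(seq: str) -> bool:
    """True if the duplex reads identically 5'->3' on both strands."""
    return seq.upper() == reverse_complement(seq)

def operator_occupancy(copies: int, kd_molar: float, volume_l: float) -> float:
    """Fraction of time one operator is bound: theta = R / (R + Kd).
    Free repressor is approximated by total repressor."""
    conc = copies / AVOGADRO / volume_l  # molar concentration of repressor
    return conc / (conc + kd_molar)

conc_nM = REPRESSOR_TETRAMERS / AVOGADRO / CELL_VOLUME_L * 1e9
theta = operator_occupancy(REPRESSOR_TETRAMERS, KD_M, CELL_VOLUME_L)
print(f"repressor ~{conc_nM:.0f} nM; O1 occupied {theta:.4f}, free {1 - theta:.4f}")
print(is_inverted_repeat("GAATTC"))  # the EcoRI recognition site, a classic palindrome
```

Under these assumptions roughly 20 tetramers correspond to a few tens of nanomolar, and the operator is free only a few tenths of a percent of the time. That is in the same general range as the 10²- to 10³-fold repression described earlier, though a model this simple should not be pushed further than that.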
Several DNA-binding motifs have been described, but here we focus on two that play prominent roles in the binding of DNA by regulatory proteins from all domains of life: the helix-turn-helix and the zinc finger. We also consider two other types of such motifs: the homeodomain and the RNA recognition motif, which, as its name implies, also binds RNA; both motifs play prominent roles in some eukaryotic regulatory proteins.

Helix-Turn-Helix

The helix-turn-helix motif is crucial to the interaction of many regulatory proteins with DNA in bacteria, and similar motifs occur in some eukaryotic regulatory proteins. The helix-turn-helix comprises about 20 amino acid residues in two short α-helical segments, each 7 to 9 residues long, separated by a β turn (Fig. 28-11). This structure generally is not stable by itself; it is simply the interactive portion of a somewhat larger DNA-binding domain. One of the two α-helical segments is called the recognition helix, because it usually contains many of the amino acids that interact with DNA in a sequence-specific way. This α helix is stacked on other segments of the protein structure so that it protrudes from the protein surface. When bound to DNA, the recognition helix is positioned in or nearly in the major groove. The Lac repressor has this DNA-binding motif.

FIGURE 28-11 Helix-turn-helix. (a) DNA-binding domain of the Lac repressor bound to DNA. (b) The entire Lac repressor. The DNA-binding domains and the α helices involved in tetramer formation are labeled. The remainder of the protein has the binding sites for allolactose. The allolactose-binding domains are linked to the DNA-binding domains through linker helices. [Data from PDB ID 2PE5, R. Daber et al., J. Mol. Biol. 370:609, 2007.]

Zinc Finger

In a zinc finger, about 30 amino acid residues form an elongated loop held together at the base by a single Zn²⁺ ion, which is coordinated to 4 of the residues (4 Cys, or 2 Cys and 2 His).
The zinc does not itself interact with DNA; rather, the coordination of zinc with the amino acid residues stabilizes this small structural motif. Several hydrophobic side chains in the core of the structure also lend stability. Figure 28-12 shows the interaction between DNA and three zinc fingers of a single polypeptide from the mouse regulatory protein Zif268.

FIGURE 28-12 Zinc fingers. Three zinc fingers of the regulatory protein Zif268, complexed with DNA. Each Zn²⁺ coordinates with two His and two Cys residues. [Data from PDB ID 1ZAA, N. P. Pavletich and C. O. Pabo, Science 252:809, 1991.]

Many eukaryotic DNA-binding proteins contain zinc fingers. The interaction of a single zinc finger with DNA is typically weak, and many DNA-binding proteins, like Zif268, have multiple zinc fingers that substantially enhance binding by interacting simultaneously with the DNA. One DNA-binding protein of the frog Xenopus has 37 zinc fingers. There are few known examples of the zinc finger motif in bacterial proteins. The precise manner in which proteins with zinc fingers bind to DNA differs from one protein to the next. Some zinc fingers contain the amino acid residues that are important in sequence discrimination, whereas others seem to bind DNA nonspecifically (the amino acids required for specificity are located elsewhere in the protein). Zinc fingers can also function as RNA-binding motifs, for example in certain proteins that bind eukaryotic mRNAs and act as translational repressors. We discuss this role later (Section 28.3).

Homeodomain

Another type of DNA-binding domain has been identified in some proteins that function as transcriptional regulators, especially during eukaryotic development. This domain of 60 amino acid residues, called the homeodomain because it was discovered in homeotic genes (genes that regulate the development of body patterns), is highly conserved and has now been identified in proteins from a wide variety of organisms, including humans (Fig.
28-13). The DNA-binding segment of the domain is related to the helix-turn-helix motif. The DNA sequence that encodes this domain is known as the homeobox. FIGURE 28-13 Homeodomains. Shown here are two homeodomains bound to DNA. In each homeodomain, the recognition helix, layered on two others, can be seen protruding into the major groove. This is only a small part of a larger regulatory protein from a class called Pax, active in the regulation of development in fruit flies (see Section 28.3). [Data from PDB ID 1FJL, D. S. Wilson et al., Cell 82:709, 1995.] RNA Recognition Motif New classes of proteins with RNA-binding domains continue to be identified. RNA recognition motifs (RRMs) are found in some eukaryotic gene activators, where they may do double duty in binding DNA and RNA. When bound to specific binding sites in DNA, these activators induce transcription. The same activators are sometimes regulated in part by specific lncRNAs that compete with DNA binding and decrease gene transcription. Other proteins with RRMs bind to mRNA, rRNA, or any of a range of other smaller, noncoding RNAs. The RRM consists of 90 to 100 amino acid residues, arranged in a four-stranded antiparallel β sheet sandwiched against two α helices, with a β1-α1-β2-β3-α2-β4 topology (Fig. 28-14a). This motif may be present as part of DNA-binding regulatory proteins that also have other DNA-binding motifs or may occur in proteins that bind only RNA. The RRM is just one of many diverse protein motifs known to interact with RNA (Fig. 28-14b). FIGURE 28-14 RNA recognition motifs (RRMs). (a) An RRM from the p50 subunit of regulatory protein NF-κB is shown, bound to (left) DNA and (right) RNA. Black lines indicate hydrogen-bonding interactions between particular amino acid residues and bases in the DNA or RNA.
NF-κB is the name of a family of structurally related eukaryotic transcription factors that regulate processes ranging from immune and inflammatory responses to cell growth and apoptosis. (b) Three additional RNA-binding motifs from widespread RNA-binding protein families. [(a) Data from (left) PDB ID 1OOA, D. B. Huang et al., Proc. Natl. Acad. Sci. USA 100:9268, 2003; (right) PDB ID 1VKX, F. E. Chen et al., Nature 391:410, 1998. (b) DEAD box RIG1: PDB ID 3LRR, C. Lu et al., Structure 18:1032, 2010; ROQ domain: PDB ID 4QIK, D. Tan et al., Nat. Struct. Mol. Biol. 21:679, 2014; Pumilio-Nos-hunchback: PDB ID 5KL1, C. A. Weidmann et al., eLife 5, 2016.] Regulatory Proteins Also Have Protein-Protein Interaction Domains Regulatory proteins contain domains not only for DNA binding but also for protein-protein interactions — with RNA polymerase, other regulatory proteins, or other subunits of the same regulatory protein. Examples include many eukaryotic transcription factors that function as gene activators, which often bind as dimers to the DNA through DNA-binding domains that contain zinc fingers. Some structural domains are devoted to the interactions required for dimer formation, which is generally a prerequisite for DNA binding. Like DNA-binding motifs, the structural motifs that mediate protein-protein interactions tend to fall within one of a few common categories. Two important examples are the leucine zipper and the basic helix-loop-helix. Structural motifs such as these are the basis for classifying some regulatory proteins into structural families. Leucine Zipper The leucine zipper is an amphipathic α helix with a series of hydrophobic amino acid residues concentrated on one side (Fig. 28-15), with the hydrophobic surface forming the area of contact between the two polypeptides of a dimer. A striking feature of these α helices is the occurrence of Leu residues at every seventh position, forming a straight line along the hydrophobic surface.
Although researchers initially thought the Leu residues interdigitated (hence the name “zipper”), we now know that they line up, side by side, as the interacting α helices coil around each other (forming a coiled coil; Fig. 28-15b). Regulatory proteins with leucine zippers often have a separate DNA-binding domain with a high concentration of basic (Lys or Arg) residues that can interact with the negatively charged phosphates of the DNA backbone. Leucine zippers have been found in many eukaryotic proteins and a few bacterial proteins.
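The heptad spacing of the Leu residues can be checked mechanically. The short Python sketch below, offered only as an illustration, scans a one-letter amino acid sequence for the longest series of Leu residues spaced exactly seven positions apart. The function name and the test sequence are hypothetical; real leucine zippers are not built from identical repeats, and the sketch ignores the other hydrophobic positions of the helix.

```python
def longest_heptad_leu_run(seq: str, period: int = 7) -> list[int]:
    """Return the 0-based indices of the longest series of Leu (L) residues
    spaced exactly `period` positions apart in a one-letter protein sequence."""
    seq = seq.upper()
    best: list[int] = []
    for start, residue in enumerate(seq):
        if residue != "L":
            continue
        run = [start]
        # Step forward one heptad at a time while the residue there is still Leu.
        while run[-1] + period < len(seq) and seq[run[-1] + period] == "L":
            run.append(run[-1] + period)
        if len(run) > len(best):
            best = run
    return best


# Hypothetical sequence made up for illustration: a 3-residue N-terminal stub
# followed by four repeats of an invented heptad with Leu at its fourth position.
zipper_like = "MSK" + "VEELKSK" * 4
print(longest_heptad_leu_run(zipper_like))  # [6, 13, 20, 27]
```

Four Leu residues, each seven positions after the last, is the pattern that places them in a single line along one face of the helix.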
FIGURE 28-15 Leucine zippers. (a) Comparison of amino acid sequences of several leucine zipper proteins. Notice the Leu (L) residues (shaded) at every seventh position in the zipper region, and the number of Lys (K) and Arg (R) residues in the DNA-binding region. (b) Leucine zipper from the yeast activator protein GCN4. Only the “zippered” α helices, derived from different subunits of the dimeric protein, are shown. The two helices wrap around each other in a gently coiled coil. The interacting Leu side chains and the conserved residues in the DNA-binding region are highlighted to correspond to the sequence in (a). [(a) Information from S. L. McKnight, Sci. Am. 264 (April):54, 1991. (b) Data from PDB ID 1YSA, T. E. Ellenberger et al., Cell 71:1223, 1992.] Basic Helix-Loop-Helix Another common structural motif, the basic helix-loop-helix, occurs in some eukaryotic regulatory proteins implicated in the control of gene expression during development of multicellular organisms. These proteins share a conserved region of about 50 amino acid residues important in both DNA binding and protein dimerization. This region can form two short, amphipathic α helices linked by a loop of variable length, the helix-loop-helix (distinct from the helix-turn-helix motif associated with DNA binding). The helix-loop-helix motifs of two polypeptides interact to form dimers (Fig. 28-16). In these proteins, DNA binding is mediated by an adjacent short amino acid sequence rich in basic residues, similar to the separate DNA-binding region in proteins containing leucine zippers. FIGURE 28-16 Helix-loop-helix. The human transcription factor Max, bound to its DNA target site. The protein is dimeric; one subunit is colored. The recognition helix is linked via the loop to the dimer-forming helix, which merges with the carboxyl-terminal end of the subunit. Interaction of the carboxyl-terminal helices of the two subunits describes a coiled coil very similar to that of a leucine zipper (see Fig. 
28-15b), but with only one pair of interacting Leu residues (side chains at the right) in this example. The overall structure is sometimes called a helix-loop-helix/leucine zipper motif. [Data from PDB ID 1HLO, P. Brownlie et al., Structure 5:509, 1997.] Protein-Protein Interactions in Eukaryotic Regulatory Proteins In eukaryotes, most genes are regulated by activators, and most genes are monocistronic. If a different activator were required for each gene, the number of activators (and genes encoding them) would have to equal the number of regulated genes. However, in yeast, about 300 transcription factors (many of them activators) are responsible for the regulation of many thousands of genes. Many of the transcription factors regulate the induction of multiple genes, and most genes are subject to regulation by multiple transcription factors. Appropriate regulation of different genes is accomplished by different combinations of a limited repertoire of transcription factors at each gene, a mechanism referred to as combinatorial control. Combinatorial control is accomplished in part by mixing and matching the variants within a regulatory protein family to form a series of different active protein dimers. Several families of eukaryotic transcription factors have been defined on the basis of close structural similarities. Within each family, dimers can sometimes form between two identical proteins (a homodimer) or between two different members of the family (a heterodimer). A hypothetical family of four different leucine-zipper proteins could thus form up to 10 different dimeric species. In many cases, the different combinations have distinct regulatory and functional properties and regulate different genes. As we shall see, multiple regulatory proteins of this kind function in the regulation of most eukaryotic genes, further contributing to combinatorial control.
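The count of 10 dimeric species follows from simple combinatorics: a family of n members can form n homodimers plus n(n − 1)/2 heterodimers, or n(n + 1)/2 dimers in all. A minimal Python sketch, assuming hypothetical members labeled A to D and treating A-B and B-A as the same dimer, enumerates them:

```python
from itertools import combinations_with_replacement


def possible_dimers(family: list[str]) -> list[tuple[str, str]]:
    """Every unordered pairing of family members, homodimers included."""
    return list(combinations_with_replacement(family, 2))


family = ["A", "B", "C", "D"]  # hypothetical four-member leucine-zipper family
dimers = possible_dimers(family)
homodimers = [pair for pair in dimers if pair[0] == pair[1]]
heterodimers = [pair for pair in dimers if pair[0] != pair[1]]

n = len(family)
assert len(dimers) == n * (n + 1) // 2  # n homodimers + n(n - 1)/2 heterodimers
print(len(dimers), len(homodimers), len(heterodimers))  # 10 4 6
```

Adding a fifth member raises the total to 15, which is why modest families can supply many distinct regulators.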
In addition to having structural domains devoted to DNA binding and protein dimerization, which direct a particular protein dimer to a particular gene, many regulatory proteins have domains that interact with RNA polymerase, with regulatory RNAs, with unrelated regulatory proteins, or with some combination of the three. At least three types of additional domains for protein-protein interaction have been characterized (primarily in eukaryotes): glutamine-rich, proline-rich, and acidic domains, the names reflecting the amino acid residues that are especially abundant. Protein-DNA and protein-RNA binding interactions are the basis of the intricate regulatory circuits fundamental to gene function. We now turn to a closer examination of these gene regulatory schemes, first in bacteria, then in eukaryotes. SUMMARY 28.1 The Proteins and RNAs of Gene Regulation Transcription is initiated when an RNA polymerase interacts with a site called a promoter. In bacteria, the frequency of transcription initiation is dictated in part by differences in promoter sequence. For gene products required all the time at a defined level — the products of housekeeping genes — the promoter sequence may be the sole element of regulation. For genes encoding products that are not always needed, regulation is imposed by additional proteins and RNAs. Regulation of gene transcription is imposed primarily by three types of proteins: specificity factors, repressors, and activators. Regulatory RNAs also play an important role in regulating the expression of many genes. In bacteria, genes that encode products with interdependent functions are often clustered in an operon, a single transcriptional unit. Transcription of the genes is generally blocked by binding of a specific repressor protein at a DNA site called an operator. Dissociation of the repressor from the operator is mediated by a specific small molecule, an inducer.
Many principles of gene regulation in bacteria were first elucidated in studies of the lactose (lac) operon. The Lac repressor dissociates from the lac operator when the repressor binds to its inducer, allolactose. Regulatory proteins are DNA-binding proteins that recognize specific DNA sequences; most have distinct DNA-binding domains. Within these domains, common structural motifs that bind DNA (and/or RNA) are the helix-turn-helix, zinc finger, homeodomain, and RNA recognition motif. Regulatory proteins often contain domains such as the leucine zipper and helix-loop-helix, required for dimerization or other protein-protein interactions, and other motifs required for activation of transcription. Mixing and matching of protein family variants in dimeric transcription factors provides for more efficient and responsive regulation through combinatorial control. 28.2 Regulation of Gene Expression in Bacteria As in many other areas of biochemical investigation, the study of the regulation of gene expression advanced earlier and faster in bacteria than in other experimental organisms. The examples of bacterial gene regulation presented here are chosen from among scores of well-studied systems, partly for their historical significance, but primarily because they provide a good overview of the range of regulatory mechanisms in bacteria. Many of the principles of bacterial gene regulation are also relevant to understanding gene expression in eukaryotic cells. We begin by examining the lactose and tryptophan operons; each system has regulatory proteins, but the overall mechanisms of regulation are very different. This is followed by a short discussion of the SOS response in E. coli, illustrating how genes scattered throughout the genome can be coordinately regulated.
We then describe two bacterial systems of quite different types, illustrating the diversity of gene regulatory mechanisms: regulation of ribosomal protein synthesis at the level of translation, with many of the regulatory proteins binding to RNA (rather than DNA), and regulation of the process of “phase variation” in Salmonella, which results from genetic recombination. Finally, we examine some additional examples of posttranscriptional regulation in which the RNA modulates its own function. The lac Operon Undergoes Positive Regulation The operator-repressor-inducer interactions described earlier for the lac operon (Fig. 28-8) provide an intuitively satisfying model for an on/off switch in the regulation of gene expression, but operon regulation is rarely so simple. A bacterium’s environment is too complex for its genes to be controlled by one signal. Other factors besides lactose, such as the availability of glucose, affect the expression of the lac genes. Glucose, metabolized directly by glycolysis, is the preferred energy source in E. coli. Other sugars can serve as the main or sole nutrient, but extra enzymatic steps are required to prepare them for entry into glycolysis, necessitating the synthesis of additional enzymes. Clearly, expressing the genes for proteins that metabolize sugars such as lactose or arabinose is wasteful when glucose is abundant. What happens to the expression of the lac operon when both glucose and lactose are present? A regulatory mechanism known as catabolite repression restricts expression of the genes required for catabolism of lactose, arabinose, and other sugars in the presence of glucose, even when these secondary sugars are also present. The effect of glucose is mediated by cAMP, as a coactivator, and an activator protein known as cAMP receptor protein, or CRP (the protein is sometimes called CAP, for catabolite gene activator protein). CRP is a homodimer (subunit Mr 22,000) with binding sites for DNA and cAMP. 
Binding is mediated by a helix-turn-helix motif in the protein’s DNA-binding domain (Fig. 28-17). When glucose is absent, CRP-cAMP binds to a site near the lac promoter (Fig. 28-18) and stimulates RNA transcription 50-fold. The wild-type lac promoter is a relatively weak promoter, diverging from the consensus shown in Figure 28-2. The open complex of RNA polymerase and the promoter (see Fig. 26-6) does not form readily unless CRP-cAMP is present and also bound (Fig. 28-18a, c). CRP-cAMP is therefore a positive regulatory element responsive to glucose levels, whereas the Lac repressor is a negative regulatory element responsive to lactose. The two act in concert. CRP-cAMP has little effect on the lac operon when the Lac repressor is blocking transcription, and dissociation of the repressor from the lac operator has little effect on transcription of the lac operon unless CRP-cAMP is present to facilitate transcription. CRP interacts directly with RNA polymerase (at the region shown in Fig. 28-17) through the polymerase’s α subunit. Thus, optimal expression of the lac operon requires dissociation of the Lac repressor (indicating that lactose is available) and the binding of CRP-cAMP (indicating that glucose is not available). FIGURE 28-17 CRP homodimer with bound cAMP. Note the bending of the DNA around the protein. The region that interacts with RNA polymerase is labeled. [Data from PDB ID 1RUN, G. Parkinson et al., Nat. Struct. Biol. 3:837, 1996.]
FIGURE 28-18 Positive regulation of the lac operon by CRP. The binding site for CRP-cAMP is near the promoter. The combined effects of glucose and lactose availability on lac operon expression are shown. When lactose is absent, the repressor binds to the operator and prevents transcription of the lac genes. It does not matter whether glucose is (a) present or (b) absent. (c) If lactose is present, the repressor dissociates from the operator. However, if glucose is also available, low cAMP levels prevent CRP-cAMP formation and DNA binding. RNA polymerase may occasionally bind and initiate transcription, resulting in a very low level of lac gene transcription. (d) When lactose is present and glucose levels are low, cAMP levels rise. The CRP-cAMP complex forms and facilitates robust binding of RNA polymerase to the lac promoter and high levels of transcription. The effect of glucose on CRP is mediated by the cAMP interaction (Fig. 28-18). CRP binds to DNA most avidly when cAMP concentrations are high. In the presence of glucose, the synthesis of cAMP is inhibited and efflux of cAMP from the cell is stimulated. As [cAMP] declines, CRP binding to DNA declines, thereby decreasing the expression of the lac operon. CRP and cAMP participate in the coordinated regulation of many operons, primarily those that encode enzymes for the metabolism of secondary sugars such as lactose and arabinose. A network of operons with a common regulator is called a regulon. This arrangement, which allows coordinated shifts in cellular functions that can require the action of hundreds of genes, is a major theme in the regulated expression of dispersed networks of genes in eukaryotes. Other bacterial regulons include the heat shock gene system that responds to changes in temperature and the genes induced in E. coli as part of the SOS response to DNA damage, described later.
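The four cases of Figure 28-18 amount to a small truth table in which lactose acts through the Lac repressor and glucose acts through cAMP and CRP. The Python sketch below encodes that table qualitatively. The names are hypothetical, and reducing lactose and glucose to present/absent flags is a deliberate simplification of what are really graded concentrations (CRP binds DNA most avidly when [cAMP] is high).

```python
from enum import Enum


class LacOutput(Enum):
    REPRESSED = "repressed"  # Lac repressor bound to the operator
    VERY_LOW = "very low"    # repressor released, but little CRP-cAMP on the DNA
    HIGH = "high"            # repressor released and CRP-cAMP bound near the promoter


def lac_operon_output(lactose: bool, glucose: bool) -> LacOutput:
    """Qualitative lac operon output for the four cases of Fig. 28-18."""
    repressor_on_operator = not lactose  # lactose present -> allolactose releases the repressor
    crp_camp_on_dna = not glucose        # glucose absent -> [cAMP] high -> CRP-cAMP binds DNA
    if repressor_on_operator:
        # CRP-cAMP has little effect while the repressor blocks transcription.
        return LacOutput.REPRESSED
    if crp_camp_on_dna:
        return LacOutput.HIGH
    # Repressor is off, but the weak lac promoter is rarely used without CRP-cAMP.
    return LacOutput.VERY_LOW


for lactose in (False, True):
    for glucose in (True, False):
        print(f"lactose={lactose} glucose={glucose} -> {lac_operon_output(lactose, glucose).value}")
```

Only one of the four combinations gives high output, which is the sense in which the two regulators act in concert.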
Many Genes for Amino Acid Biosynthetic Enzymes Are Regulated by Transcription Attenuation The 20 common amino acids are required in large amounts for protein synthesis, and E. coli can synthesize all of them. The genes for the enzymes needed to synthesize a given amino acid are generally clustered in an operon and are expressed whenever existing supplies of that amino acid are inadequate for cellular requirements. When the amino acid is abundant, the biosynthetic enzymes are not needed and the operon is repressed. The E. coli tryptophan (trp) operon (Fig. 28-19) includes five genes for the enzymes required to convert chorismate to tryptophan (see Fig. 22-19). Note that two of the enzymes catalyze more than one step in the pathway. The mRNA from the trp operon has a half-life of only about 3 min, allowing the cell to respond rapidly to changing needs for this amino acid. The Trp repressor is a homodimer. When tryptophan is abundant, it binds to the Trp repressor, causing a conformational change that permits the repressor to bind to the trp operator and inhibit expression of the trp operon. The trp operator site overlaps the promoter, so binding of the repressor blocks binding of RNA polymerase. FIGURE 28-19 The trp operon. This operon is regulated by two mechanisms: when tryptophan levels are high, (1) the repressor (upper left) binds to its operator and (2) transcription of trp mRNA is attenuated (see Fig. 28-20). The biosynthesis of tryptophan by the enzymes encoded in the trp operon is diagrammed at the bottom. Once again, this simple on/off circuit mediated by a repressor is not the entire regulatory story. Different cellular concentrations of tryptophan can vary the rate of synthesis of the biosynthetic enzymes over a 700-fold range.
Once repression is lifted and transcription begins, the rate of transcription is fine-tuned to cellular tryptophan requirements by a second regulatory process, called transcription attenuation, in which transcription is initiated normally but is abruptly halted before the operon genes are transcribed. The frequency with which transcription is attenuated is regulated by the availability of tryptophan and relies on the very close coupling of transcription and translation in bacteria. The trp operon attenuation mechanism uses signals encoded in four sequences within a 162-nucleotide leader region at the 5' end of the mRNA, preceding the initiation codon of the first gene (Fig. 28-20a). The leader contains a region known as the attenuator, made up of sequences 3 and 4. These sequences base-pair to form a G≡C-rich stem-and-loop structure closely followed by a series of U residues. The attenuator structure acts as a transcription terminator (Fig. 28-20b; see also Fig. 26-7a). Sequence 2 is an alternative complement for sequence 3 (Fig. 28-20c). If sequences 2 and 3 base-pair, the attenuator structure cannot form and transcription continues into the trp biosynthetic genes; the loop formed by the pairing of sequences 2 and 3 does not obstruct transcription. FIGURE 28-20 Transcriptional attenuation in the trp operon. Transcription is initiated at the beginning of the 162-nucleotide mRNA leader encoded by a DNA region called trpL (see Fig. 28-19). A regulatory mechanism determines whether transcription is attenuated at the end of the leader or continues into the structural genes. (a) The trp mRNA leader (trpL). The attenuation mechanism in the trp operon involves sequences 1 to 4 (highlighted). (b) Sequence 1 encodes a small peptide, the leader peptide, containing two Trp residues (W); it is translated immediately after transcription begins. Sequences 2 and 3 are complementary, as are sequences 3 and 4.
The attenuator structure forms by the pairing of sequences 3 and 4 (top). Its structure and function are similar to those of a transcription terminator. Pairing of sequences 2 and 3 (bottom) prevents the attenuator structure from forming. Note that the leader peptide has no other cellular function. Translation of its open reading frame has a purely regulatory role that determines which complementary sequences (2 and 3, or 3 and 4) are paired. (c) Base-pairing schemes for the complementary regions of the trp mRNA leader. Regulatory sequence 1 is crucial for a tryptophan-sensitive mechanism that determines whether sequence 3 pairs with sequence 2 (allowing transcription to continue) or with sequence 4 (attenuating transcription). Formation of the attenuator stem-and-loop structure depends on events that occur during translation of regulatory sequence 1, which encodes a leader peptide (so called because it is encoded by the leader region of the mRNA) of 14 amino acids, two of which are Trp residues. The leader peptide has no other known cellular function; its synthesis is simply an operon regulatory device. This peptide is translated immediately after it is transcribed, by a ribosome that follows closely behind RNA polymerase as transcription proceeds. When tryptophan concentrations are high, concentrations of charged tryptophan tRNA (Trp-tRNATrp) are also high. This allows translation to proceed rapidly past the two Trp codons of sequence 1 and into sequence 2, before sequence 3 is synthesized by RNA polymerase. In this situation, sequence 2 is covered by the ribosome and unavailable for pairing with sequence 3 when sequence 3 is synthesized; the attenuator structure (sequences 3 and 4) forms and transcription halts (Fig. 28-20b, top). When tryptophan concentrations are low, however, the ribosome stalls at the two Trp codons in sequence 1, because charged tRNATrp is less available.
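The tryptophan-sensitive choice between the 2:3 and 3:4 pairings can be reduced, for illustration, to one question: where is the ribosome translating sequence 1 when RNA polymerase finishes sequence 3? The Python sketch below is a toy, all-or-none model for a single transcript, with hypothetical names. In the cell the outcome for each transcript is probabilistic, which is why attenuation changes the proportion of transcripts that terminate rather than acting as a simple switch.

```python
from typing import NamedTuple


class LeaderOutcome(NamedTuple):
    ribosome_at: str    # where the leader-peptide ribosome sits as sequence 3 emerges
    pairing: str        # which leader sequences base-pair
    transcription: str  # fate of the transcript


def trp_leader_outcome(charged_trp_trna_abundant: bool) -> LeaderOutcome:
    """Toy, all-or-none model of trp attenuation for a single transcript."""
    if charged_trp_trna_abundant:
        # Translation runs past the two Trp codons; the ribosome covers sequence 2.
        ribosome_at = "sequence 2"
    else:
        # The ribosome stalls at the Trp codons, leaving sequence 2 exposed.
        ribosome_at = "Trp codons in sequence 1"

    if ribosome_at == "sequence 2":
        # Sequence 3 can pair only with sequence 4: the attenuator (terminator) forms.
        return LeaderOutcome(ribosome_at, "3:4", "attenuated")
    # Sequence 2 pairs with sequence 3, so the 3:4 attenuator cannot form.
    return LeaderOutcome(ribosome_at, "2:3", "continues into trp structural genes")


for abundant in (True, False):
    outcome = trp_leader_outcome(abundant)
    print(abundant, outcome.ribosome_at, outcome.pairing, outcome.transcription)
```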
Sequence 2 remains free while sequence 3 is synthesized, allowing these two sequences to base-pair and permitting transcription to proceed (Fig. 28-20b, bottom). In this way, the proportion of transcripts that are attenuated declines as tryptophan concentration declines. Many other amino acid biosynthetic operons use a similar attenuation strategy to fine-tune the levels of biosynthetic enzymes to meet the prevailing cellular requirements. The 15-amino-acid leader peptide produced by the phe operon contains seven Phe residues. The leu operon leader peptide has four contiguous Leu residues. The leader peptide for the his operon contains seven contiguous His residues. In fact, in the his operon and several others, attenuation is sufficiently sensitive to be the only regulatory mechanism. Induction of the SOS Response Requires Destruction of Repressor Proteins Extensive DNA damage in the bacterial chromosome triggers the induction of nearly 60 genes scattered about the chromosome. The genes involved in the coordinated inducible response, called the SOS response (p. 939), constitute the SOS regulon. Many of the induced genes are involved in DNA repair. The key regulatory proteins are the RecA protein and the LexA repressor. The LexA repressor (Mr 22,700) inhibits transcription of all the SOS genes (Fig. 28-21), and induction of the SOS response requires removal of LexA. This is not a simple dissociation from DNA in response to binding of a small molecule, as in the regulation of the lac operon described above. Instead, the LexA repressor is inactivated when it catalyzes its own cleavage at a specific Ala–Gly peptide bond, producing two roughly equal protein fragments. At physiological pH, this autocleavage reaction requires the RecA protein. RecA is not a protease in the classical sense, but its interaction with LexA enables the repressor’s self-cleavage reaction. This function of RecA is sometimes called a co-protease activity. FIGURE 28-21 SOS response in E. coli.
The LexA protein is the repressor in this system, which has an operator site near each gene. Because the recA gene is not entirely repressed by the LexA repressor, the normal cell contains about 1,000 RecA monomers. When DNA is extensively damaged (such as by UV light), DNA replication is halted and the number of single-strand gaps in the DNA increases. RecA protein binds to this damaged, single-stranded DNA, activating the protein’s co-protease activity. While bound to DNA, the RecA protein facilitates cleavage and inactivation of the LexA repressor. When the repressor is inactivated, the SOS genes, including recA, are induced; RecA levels increase 50- to 100-fold. The RecA protein provides the functional link between the biological signal (DNA damage) and induction of the SOS genes. Heavy DNA damage leads to numerous single-strand gaps in the DNA, and only RecA that is bound to single-stranded DNA can facilitate cleavage of the LexA repressor (Fig. 28-21, bottom). Binding of RecA at the gaps eventually activates its co-protease activity, leading to cleavage of the LexA repressor and SOS induction. During induction of the SOS response in a severely damaged cell, RecA also promotes the autocatalytic cleavage of, and thus inactivates, the repressors that otherwise allow propagation of certain viruses in a dormant lysogenic state within the bacterial host. This provides a remarkable illustration of evolutionary adaptation. These repressors, like LexA, undergo self-cleavage at a specific Ala–Gly peptide bond, so induction of the SOS response permits replication of the virus and lysis of the cell, releasing new viral particles. Thus, the bacteriophage can make a hasty exit from a compromised bacterial host cell. The destruction of the LexA repressor proteins as part of the response means that LexA must be resynthesized in order to reestablish gene control when the DNA damage is no longer present. 
The considerable amount of ATP and GTP needed for protein synthesis to maintain SOS regulon repression provides one example of the energetic cost of regulation. Synthesis of Ribosomal Proteins Is Coordinated with rRNA Synthesis In bacteria, an increased cellular demand for protein synthesis is met by increasing the number of ribosomes rather than altering the activity of individual ribosomes. In general, the number of ribosomes increases as the cellular growth rate increases. At high growth rates, ribosomes make up approximately 45% of the cell’s dry weight. The proportion of cellular resources devoted to making ribosomes is so large, and the function of ribosomes so important, that cells must coordinate the synthesis of the ribosomal components: the ribosomal proteins (r-proteins) and RNAs (rRNAs). This regulation is distinct from the mechanisms described so far: it occurs largely at the level of translation. The 52 genes that encode the r-proteins are distributed across at least 20 operons, each with 1 to 11 genes. Some of these operons also contain the genes for the subunits of DNA primase, RNA polymerase, and protein synthesis elongation factors — reflecting the close coupling of replication, transcription, and protein synthesis during bacterial cell growth. The r-protein operons are regulated primarily through a translational feedback mechanism. One r-protein encoded by each operon also functions as a translational repressor, which binds to the mRNA transcribed from that operon and blocks translation of all the genes the messenger encodes (Fig. 28-22). In general, the r-protein that plays the role of repressor also binds directly to an rRNA. Each translational repressor r-protein binds with higher affinity to the appropriate rRNA than to its mRNA, so the mRNA is bound and translation repressed only when the level of the r-protein exceeds that of the rRNA.
This ensures that translation of the mRNAs encoding r-proteins is repressed only when synthesis of these r-proteins exceeds that needed to make functional ribosomes. In this way, the rate of r-protein synthesis is kept in balance with rRNA availability. FIGURE 28-22 Translational feedback in some ribosomal protein operons. The r-proteins that act as translational repressors are shown (red circles). Each translational repressor blocks the translation of all genes in that operon by binding to the indicated site on the mRNA. The operons include the genes that encode the α, β, and β' subunits of RNA polymerase and the elongation factors EF-G and EF-Tu (labeled). The r-proteins of the large (50S) ribosomal subunit are designated L1 to L34; those of the small (30S) subunit are designated S1 to S21. The mRNA-binding site for the translational repressor is near the translational start site of one of the genes in the operon, often but not always the first gene (Fig. 28-22). In other operons this would affect only that one gene, because in bacterial polycistronic mRNAs, most genes have independent translation signals. In the r-protein operons, however, the translation of one gene depends on the translation of all the others. The translation of multiple genes seems to be blocked by folding of the mRNA into an elaborate three-dimensional structure that is stabilized both by internal base pairing and by binding of the translational repressor protein. When the translational repressor is absent, ribosome binding and translation of one or more of the genes disrupts the folded structure of the mRNA and allows all the genes to be translated. Because the synthesis of r-proteins is coordinated with the availability of rRNA, the regulation of ribosome production reflects the regulation of rRNA synthesis. In E. coli, rRNA synthesis from the seven rRNA operons responds to cellular growth rate and to changes in the availability of crucial nutrients, particularly amino acids.
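The logic of translational feedback, in which the repressor r-protein fills its higher-affinity rRNA sites before any is left over to bind its own mRNA, can be illustrated with a toy time course. The Python sketch below assumes arbitrary integer units, one repressor-binding site per rRNA molecule, and made-up synthesis rates; the function names are hypothetical, and the model ignores mRNA decay, the other r-proteins of the operon, and everything else about ribosome assembly.

```python
def operon_repressed(r_protein: int, rrna_sites: int) -> bool:
    """True if repressor r-protein is left over after filling its rRNA sites.

    The r-protein binds rRNA with higher affinity than its own mRNA, so only
    the surplus (r_protein - rrna_sites, if positive) can bind the mRNA.
    """
    surplus = max(0, r_protein - rrna_sites)
    return surplus > 0


def simulate(rrna_per_step: int, r_protein_per_burst: int,
             steps: int) -> list[tuple[int, int, bool]]:
    """Toy time course: rRNA accumulates at a fixed rate, and the r-protein
    operon is translated (one burst per step) only while it is not repressed."""
    rrna = r_protein = 0
    history = []
    for _ in range(steps):
        rrna += rrna_per_step
        translated = not operon_repressed(r_protein, rrna)
        if translated:
            r_protein += r_protein_per_burst
        history.append((rrna, r_protein, translated))
    return history


for rrna, r_protein, translated in simulate(rrna_per_step=2, r_protein_per_burst=5, steps=6):
    print(f"rRNA={rrna:2d}  r-protein={r_protein:2d}  translated this step={translated}")
```

Under these assumptions, translation switches off whenever the r-protein gets ahead of the rRNA and resumes once rRNA catches up, so the r-protein level never runs more than one burst ahead of rRNA.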
The regulation coordinated with amino acid concentrations is known as the stringent response (Fig. 28-23). When amino acid concentrations are low, rRNA synthesis is halted. Amino acid starvation leads to the binding of uncharged tRNAs to the ribosomal A site; this triggers a sequence of events that begins with the binding of an enzyme called stringent factor (RelA protein) to the ribosome. When bound to the ribosome, stringent factor catalyzes formation of the unusual nucleotide guanosine tetraphosphate (ppGpp); it adds pyrophosphate to the 3′ position of GTP, in the reaction GTP + ATP → pppGpp + AMP. Then a phosphohydrolase cleaves off one phosphate to convert some pppGpp to ppGpp. The abrupt rise in pppGpp and ppGpp levels in response to amino acid starvation results in a great reduction in rRNA synthesis, mediated at least in part by the binding of ppGpp to RNA polymerase. FIGURE 28-23 Stringent response in E. coli. This response to amino acid starvation is triggered by binding of an uncharged tRNA in the ribosomal A site. A protein called stringent factor binds to the ribosome and catalyzes the synthesis of pppGpp, which is converted by a phosphohydrolase to ppGpp. The signal ppGpp reduces transcription of some genes and increases transcription of others, in part by binding to the β subunit of RNA polymerase and altering the enzyme’s promoter specificity. Synthesis of rRNA is reduced when ppGpp levels increase. The nucleotides pppGpp and ppGpp, along with cAMP, belong to a class of modified nucleotides that act as cellular second messengers. In E. coli, these two nucleotides serve as starvation signals; they cause large changes in cellular metabolism by increasing or decreasing the transcription of hundreds of genes. In eukaryotic cells, similar nucleotide second messengers also have multiple regulatory functions. The coordination of cellular metabolism with cell growth is highly complex, and further regulatory mechanisms undoubtedly remain to be discovered.
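The stringent-factor reaction is easy to misread: the product of the first step is pppGpp (guanosine pentaphosphate), and only the phosphohydrolase step yields the tetraphosphate ppGpp. A simple phosphate bookkeeping check makes the two steps explicit. The Python snippet below counts phosphoryl groups on each side of both reactions; it is an arithmetic check under the stated counts, not a model of either enzyme, and the dictionary and function names are illustrative only.

```python
# Phosphoryl groups carried by each species (pppGpp: 3 on the 5' position + 2 on the 3').
PHOSPHATES = {"GTP": 3, "ATP": 3, "AMP": 1, "pppGpp": 5, "ppGpp": 4, "Pi": 1}


def phosphate_count(species: list[str]) -> int:
    """Total phosphoryl groups in a list of reactants or products."""
    return sum(PHOSPHATES[s] for s in species)


# Stringent factor: the pyrophosphate (2 P) lost as ATP becomes AMP ends up on the 3' position of GTP.
transferred = PHOSPHATES["ATP"] - PHOSPHATES["AMP"]
assert transferred == 2
assert PHOSPHATES["GTP"] + transferred == PHOSPHATES["pppGpp"]
assert phosphate_count(["GTP", "ATP"]) == phosphate_count(["pppGpp", "AMP"])  # 6 == 6

# Phosphohydrolase step: pppGpp (+ H2O) -> ppGpp + Pi; water carries no phosphate.
assert phosphate_count(["pppGpp"]) == phosphate_count(["ppGpp", "Pi"])  # 5 == 5
```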
The Function of Some mRNAs Is Regulated by Small RNAs in Cis or in Trans

As described throughout this chapter, proteins play an important and well-documented role in regulating gene expression. But RNA also has a crucial role — one that is becoming increasingly recognized as more examples of regulatory RNAs are discovered. Once an mRNA is synthesized, its functions can be controlled by RNA-binding proteins, as seen for the r-protein operons just described, or by an RNA. A separate RNA molecule may bind to the mRNA “in trans” and affect its activity. Alternatively, a portion of the mRNA itself may regulate its own function. When part of a molecule affects the function of another part of the same molecule, it is said to act “in cis.” A well-characterized example of RNA regulation in trans is regulation of the mRNA of the gene rpoS (RNA polymerase sigma factor), which encodes σS (formerly known as σ38), one of seven E. coli sigma factors. The cell uses this specificity factor in certain stress situations, such as when it enters the stationary phase (a state of no growth, necessitated by lack of nutrients) and σS is needed to transcribe large numbers of stress response genes. The σS mRNA is present at low levels under most conditions but is not translated, because a large hairpin structure upstream of the coding region inhibits ribosome binding (Fig. 28-24). Under certain stress conditions, one or both of two small ncRNAs, DsrA (downstream region A) and RprA (rpoS regulator RNA A), are induced. Both can pair with one strand of the hairpin in the σS mRNA, disrupting the hairpin and thus allowing translation of rpoS.

FIGURE 28-24 Regulation of bacterial mRNA function in trans by sRNAs. Several sRNAs (small RNAs) — DsrA, RprA, and OxyS — participate in regulation of the rpoS gene. All require the protein Hfq, an RNA chaperone that facilitates RNA-RNA pairing. Hfq has a toroid structure, with a pore in the center.
(a) DsrA promotes translation by pairing with one strand of a stem-loop structure that otherwise blocks the ribosome-binding site. RprA (not shown) acts in a similar way. (b) OxyS blocks translation by pairing with the ribosome-binding site. [Information from M. Szymański and J. Barciszewski, Genome Biol. 3:reviews0005.1, 2002.] Another small RNA, OxyS (oxidative stress gene S), is induced under conditions of oxidative stress and inhibits the translation of rpoS, probably by pairing with and blocking the ribosome-binding site on the mRNA. OxyS is expressed as part of a system that responds to a different type of stress (oxidative damage) than does the rpoS RNA, and its task is to prevent expression of unneeded repair pathways. DsrA, RprA, and OxyS are all relatively small bacterial RNA molecules (less than 300 nucleotides), designated sRNAs (s for small; there are, of course, other “small” RNAs with other designations in eukaryotes). All sRNAs require for their function a protein called Hfq, an RNA chaperone that facilitates RNA-RNA pairing. The known bacterial genes regulated in this way are few in number, just a few dozen in a typical bacterial species. However, these examples provide good model systems for understanding patterns present in the more complex and numerous examples of RNA-mediated regulation in eukaryotes. Regulation in cis involves a class of RNA structures known as riboswitches. As described in Box 26-4, aptamers are RNA molecules, generated in vitro, that are capable of specific binding to a particular ligand. As one might expect, such ligand-binding RNA domains are also present in nature — in riboswitches — in a significant number of bacterial mRNAs (and even in some eukaryotic mRNAs). These natural aptamers are structured domains found in untranslated regions at the 5' ends of certain bacterial mRNAs. Some riboswitches also regulate the transcription of certain noncoding RNAs. 
Binding of an mRNA’s riboswitch to its appropriate ligand results in a conformational change in the mRNA that either inhibits transcription, by stabilizing a premature transcription termination structure, or inhibits translation (in cis), by occluding the ribosome-binding site (Fig. 28-25). In most cases, the riboswitch acts in a kind of feedback loop. Most genes regulated in this way are involved in the synthesis or transport of the ligand that is bound by the riboswitch; thus, when the ligand is present in high concentrations, the riboswitch inhibits expression of the genes needed to replenish this ligand.

FIGURE 28-25 Regulation of bacterial mRNA function in cis by riboswitches. The known modes of action are illustrated by several different riboswitches, based on a widespread natural aptamer that binds thiamine pyrophosphate. TPP binding to the aptamer leads to a conformational change that produces the varied results illustrated in (a), (b), and (c) in several different systems in which the aptamer is utilized. [Information from W. C. Winkler and R. R. Breaker, Annu. Rev. Microbiol. 59:487, 2005.]

Each riboswitch binds only one ligand. Distinct riboswitches have been detected that respond to more than a dozen different ligands, including thiamine pyrophosphate (TPP, the active form of vitamin B1), cobalamin (vitamin B12), flavin mononucleotide, lysine, S-adenosylmethionine (adoMet), purines, N-acetylglucosamine 6-phosphate, glycine, and some metal cations such as Mn²⁺. It is likely that many more remain to be discovered. The riboswitch that responds to TPP seems to be the most widespread; it is found in many bacteria, fungi, and some plants. The bacterial TPP riboswitch inhibits translation in some species and induces premature transcription termination in others (Fig. 28-25). The eukaryotic TPP riboswitch is found in the introns of certain genes and modulates the alternative splicing of those genes. It is not yet clear how common riboswitches are.
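The feedback logic of an "OFF" riboswitch can be sketched with simple 1:1 equilibrium binding. This is an illustrative assumption rather than a description of any particular riboswitch: it treats expression as proportional to the fraction of aptamer that is unbound, and it uses a hypothetical dissociation constant in arbitrary units. Some riboswitches are thought to operate under kinetic rather than equilibrium control, so the curve is only a first approximation.

```python
def fraction_bound(ligand: float, kd: float) -> float:
    """Equilibrium occupancy of a 1:1 aptamer-ligand complex: [L] / ([L] + Kd)."""
    if ligand < 0 or kd <= 0:
        raise ValueError("ligand must be >= 0 and kd must be > 0")
    return ligand / (ligand + kd)


def relative_expression(ligand: float, kd: float) -> float:
    """Expression from an 'OFF' riboswitch, assumed proportional to the unbound fraction."""
    return 1.0 - fraction_bound(ligand, kd)


if __name__ == "__main__":
    kd = 1.0  # hypothetical dissociation constant, arbitrary concentration units
    for ligand in (0.0, 0.1, 1.0, 10.0):
        print(ligand, round(relative_expression(ligand, kd), 3))
```

Under these assumptions, expression falls to half when the ligand concentration equals Kd and to about 9% (1/11) at ten times Kd, which is the sense in which a ligand-rich cell stops making the enzymes that replenish the ligand.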
However, estimates suggest that more than 4% of the genes of Bacillus subtilis are regulated by riboswitches. Most of the riboswitches described to date, including the one that responds to adoMet, have been found only in bacteria. A drug that bound to and activated the adoMet riboswitch would shut down the genes encoding the enzymes that synthesize and transport adoMet, effectively starving the bacterial cells of this essential cofactor. Drugs of this type are being sought for use as a new class of antibiotics. The pace of discovery of functional RNAs shows no signs of abating and continues to bolster the hypothesis that RNA played a special role in the evolution of life (Chapter 26). The sRNAs and riboswitches, like ribozymes and ribosomes, may be vestiges of an RNA world obscured by time but persisting as a rich array of biological devices still functioning in the biosphere. The laboratory selection of aptamers and ribozymes with novel ligand-binding and enzymatic functions tells us that the RNA-based activities necessary for a viable RNA world are possible. Discovery of many of the same RNA functions in living organisms tells us that key components for RNA-based metabolism do exist. For example, the natural aptamers of riboswitches may be derived from RNAs that, billions of years ago, bound to cofactors needed to promote the enzymatic processes required for metabolism in the RNA world.

Some Genes Are Regulated by Genetic Recombination

We turn now to another mode of bacterial gene regulation, at the level of DNA rearrangement — recombination. Salmonella typhimurium, which inhabits the mammalian intestine, moves by rotating the flagella on its cell surface (Fig. 28-26). The many copies of the protein flagellin (Mr 53,000) that make up the flagella are prominent targets of mammalian immune systems.
But Salmonella cells have a mechanism that evades the immune response: they switch between two distinct flagellin proteins (FljB and FliC) roughly once every 1,000 generations, using a process called phase variation. FIGURE 28-26 Salmonella typhimurium. The appendages emanating from the cell are flagella. The switch is accomplished by periodic inversion of a segment of DNA containing the promoter for a flagellin gene. The inversion is a site-specific recombination reaction (see Fig. 25-37) mediated by the Hin recombinase at specific 14 bp sequences (hix sequences) at each end of the DNA segment. When the DNA segment is in one orientation, the gene for FljB flagellin and the gene encoding a repressor, FljA, are expressed (Fig. 28-27a); the repressor shuts down expression of the gene for FliC flagellin. When the DNA segment is inverted (Fig. 28-27b), the fljA and fljB genes are no longer transcribed, and the fliC gene is induced as the repressor becomes depleted. The Hin recombinase, encoded by the hin gene in the DNA segment that undergoes inversion, is expressed when the DNA segment is in either orientation, so the cell can always switch from one state to the other. FIGURE 28-27 Regulation of flagellin genes in Salmonella: phase variation. The products of genes fliC and fljB are different flagellins. The hin gene encodes the recombinase that catalyzes inversion of the DNA segment containing the fljB promoter and the hin gene. The recombination sites (inverted repeats) are called hix. (a) In one orientation, fljB is expressed along with a repressor protein (product of the fljA gene) that represses transcription of the fliC gene. (b) In the opposite orientation, only the fliC gene is expressed; the fljA and fljB genes cannot be transcribed. The interconversion between these two states, known as phase variation, also requires two other nonspecific DNA-binding proteins (not shown), HU and FIS. 
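The switching rate quoted above can be turned into a quick probability estimate. The sketch assumes, as a simplification, that "roughly once every 1,000 generations" means a constant, orientation-independent inversion probability of 10⁻³ per generation; the state labels and function names are hypothetical.

```python
import random
from typing import Tuple


def prob_unswitched(generations: int, p_switch: float = 1e-3) -> float:
    """Probability that a lineage has not yet inverted the hix-flanked segment."""
    return (1.0 - p_switch) ** generations


def simulate_lineage(generations: int, p_switch: float = 1e-3,
                     seed: int = 0) -> Tuple[str, int]:
    """Follow one lineage, inverting the segment with probability p_switch per generation."""
    rng = random.Random(seed)
    state = "FljB"  # orientation in Fig. 28-27a: FljB plus the FljA repressor of fliC
    switches = 0
    for _ in range(generations):
        if rng.random() < p_switch:
            state = "FliC" if state == "FljB" else "FljB"
            switches += 1
    return state, switches


if __name__ == "__main__":
    print(round(prob_unswitched(1000), 3))  # about 0.368
    print(simulate_lineage(10_000, seed=42))
```

Even at this low rate, only about 37% of lineages would still carry the original flagellin after 1,000 generations, so a large population always contains cells of both phases.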
This type of regulatory mechanism has the advantage of being absolute: gene expression is impossible when the gene is physically separated from its promoter (note the position of the fljB promoter in Fig. 28-27b). An absolute on/off switch may be important in this system (even though it affects only one of the two flagellin genes) because a flagellum with just one copy of the wrong flagellin might be vulnerable to host antibodies against that protein. The Salmonella system is by no means unique. Similar regulatory systems occur in some other bacteria and in some bacteriophages, and recombination systems with similar functions have been found in eukaryotes (Table 28-1). Gene regulation by DNA rearrangements that move genes and/or promoters is particularly common in pathogens that benefit by changing their host range or by changing their surface proteins, thereby staying ahead of host immune systems.

TABLE 28-1 Examples of Gene Regulation by Recombination

System | Recombinase/recombination site | Type of recombination | Function
Phase variation (Salmonella) | Hin/hix | Site-specific | Alternative expression of two flagellin genes allows evasion of host immune response.
Host range (bacteriophage μ) | Gin/gix | Site-specific | Alternative expression of two sets of tail fiber genes affects host range.
Mating-type switch (yeast) | HO endonuclease, RAD52 protein, other proteins/MAT | Nonreciprocal gene conversionᵃ | Alternative expression of two mating types of yeast, a and α, creates cells of different mating types that can mate and undergo meiosis.
Antigenic variation (trypanosomes)ᵇ | Varies | Nonreciprocal gene conversionᵃ | Successive expression of different genes encoding the variable surface glycoproteins (VSGs) allows evasion of host immune response.

ᵃIn nonreciprocal gene conversion (a class of recombination events not discussed in Chapter 25), genetic information is moved from one part of the genome (where it is silent) to another (where it is expressed). The reaction is similar to replicative transposition (see Fig. 25-41).
ᵇTrypanosomes cause African sleeping sickness and other diseases (see Box 6-1). The outer surface of a trypanosome is made up of multiple copies of a single VSG, the major surface antigen. A cell can change surface antigens to more than 100 different forms, precluding an effective defense by the host immune system.

SUMMARY 28.2 Regulation of Gene Expression in Bacteria

In addition to repression by the Lac repressor, the E. coli lac operon undergoes positive regulation by the cAMP receptor protein (CRP). When [glucose] is low, [cAMP] is high and CRP-cAMP binds to a specific site on the DNA, stimulating transcription of the lac operon and production of lactose-metabolizing enzymes. The presence of glucose depresses [cAMP], decreasing expression of the lac genes and other genes involved in metabolism of secondary sugars. A group of coordinately regulated operons is referred to as a regulon.

Operons that produce the enzymes of amino acid synthesis have a regulatory circuit called attenuation, which uses a transcription termination site, called the attenuator, in the mRNA. Formation of the attenuator is modulated by a mechanism that couples transcription and translation while responding to small changes in amino acid concentration.

In the SOS system, multiple unlinked genes repressed by a single repressor are induced simultaneously when DNA damage triggers RecA protein–facilitated autocatalytic proteolysis of the repressor.

In the synthesis of ribosomal proteins, one protein in each r-protein operon acts as a translational repressor. The mRNA is bound by the repressor, and translation is blocked only when the r-protein is present in excess of available rRNA.

Posttranscriptional regulation of some mRNAs is mediated by sRNAs that act in trans or by riboswitches, part of the mRNA structure itself, that act in cis.
Some genes are regulated by genetic recombination processes that move promoters relative to the genes being regulated. Regulation can also take place at the level of translation.

28.3 Regulation of Gene Expression in Eukaryotes

Initiation of transcription is a crucial regulation point for gene expression in all organisms. Although eukaryotes and bacteria use some of the same regulatory mechanisms, the regulation of transcription in the two systems is fundamentally different. We can define a transcriptional ground state as the inherent activity of promoters and transcriptional machinery in vivo in the absence of regulatory sequences. In bacteria, RNA polymerase generally has access to every promoter and can bind and initiate transcription at some level of efficiency in the absence of activators or repressors. In eukaryotes, however, strong promoters are generally inactive in vivo in the absence of regulatory proteins. This fundamental difference gives rise to at least five important features that distinguish the regulation of gene expression at eukaryotic promoters from that observed in bacteria. First, access to eukaryotic promoters is restricted by the structure of chromatin, and activation of transcription is associated with many changes in chromatin structure in the transcribed region. Second, although eukaryotic cells have both positive and negative regulatory mechanisms, positive mechanisms are more prominent. Almost every eukaryotic gene requires activation to be transcribed. Third, regulatory mechanisms involving lncRNAs are more common in eukaryotic transcriptional regulation. Fourth, eukaryotic cells have larger, more complex multimeric regulatory proteins than do bacteria. Finally, transcription in the eukaryotic nucleus is separated from translation in the cytoplasm in both space and time. The complexity of regulatory circuits in eukaryotic cells is extraordinary, as is evident from the following discussion.
The section ends with an illustrated description of one of the most elaborate circuits: the regulatory cascade that controls development in fruit flies.

Transcriptionally Active Chromatin Is Structurally Distinct from Inactive Chromatin

The effects of chromosome structure on gene regulation in eukaryotes have no clear parallel in bacteria. In the eukaryotic cell cycle, interphase chromosomes appear, at first viewing, to be dispersed and amorphous (see Fig. 24-22). Nevertheless, several forms of chromatin can be found along these chromosomes. About 10% of the chromatin in a typical eukaryotic cell is in a more condensed form than the rest of the chromatin. This form, heterochromatin, is transcriptionally inactive. Heterochromatin is generally associated with particular chromosome structures — the centromeres, for example. The remaining, less condensed chromatin is called euchromatin. Transcription of a eukaryotic gene is strongly repressed when its DNA is condensed within heterochromatin. Some, but not all, of the euchromatin is transcriptionally active. Transcriptionally active chromosomal regions are distinguished from heterochromatin in at least three ways: the positioning of nucleosomes, the presence of histone variants, and the covalent modification of nucleosomes. These transcription-associated structural changes in chromatin are collectively called chromatin remodeling. The remodeling employs a set of enzymes that promote these changes (Table 28-2).
TABLE 28-2 Some Enzyme Complexes That Catalyze Chromatin Structural Changes Associated with Transcription

Enzyme complex | Oligomeric structure (number of polypeptides) | Source | Activities
Histone movement, replacement, or editing, requiring ATP
SWI/SNF family | 8–17; Mr > 10⁶ | Eukaryotes | Nucleosome remodeling; transcriptional activation
ISWI family | 2–4 | Eukaryotes | Nucleosome remodeling; transcriptional repression; transcriptional activation in some cases
CHD family | 1–10 | Eukaryotes | Nucleosome remodeling; nucleosome ejection for transcriptional activation; some have repressive roles
INO80 family | >10 | Eukaryotes | Nucleosome remodeling; transcriptional activation; family member SWR1 engages in replacement of H2A-H2B with H2AZ-H2B
Histone modification
GCN5-ADA2-ADA3 | 3 | Yeast | GCN5 has type A HAT activity
SAGA/PCAF | >20 | Eukaryotes | Includes GCN5-ADA2-ADA3; acetylates residues in H3, H2B, H2AZ
NuA4 | ≥12 | Eukaryotes | Esa1 component has HAT activity; acetylates H4, H2A, and H2AZ
Histone chaperones not requiring ATP
HIRA | 1 | Eukaryotes | Deposition of H3.3 during transcription

The abbreviations for eukaryotic genes and proteins are often more confusing or obscure than those used for bacteria. SWI (switching) was discovered as a protein required for expression of certain genes involved in mating-type switching in yeast, and SNF (sucrose nonfermenting) as a factor for expression of the yeast gene for sucrase. Subsequent studies revealed multiple SWI and SNF proteins that act in a complex. The SWI/SNF complex has a role in expression of a wide range of genes and has been found in many eukaryotes, including humans. ISWI is imitation SWI. CHD is chromodomain, helicase, DNA binding; INO80 is inositol-requiring 80; and SWR1 is Swi2/Snf2-related ATPase 1. The complex of GCN5 (general control nonderepressible) and ADA (alteration/deficiency in activation) proteins was discovered during investigation of the regulation of nitrogen metabolism genes in yeast.
These proteins can be part of the larger SAGA (SPT, ADA2,3, GCN5, acetyltransferase) complex in yeasts. The equivalent of SAGA in humans is PCAF (p300/CBP-associated factor). NuA4 is nucleosome acetyltransferase of H4; ESA1 is essential SAS2-related acetyltransferase; HIRA is histone regulator A.

Four known families of chromatin remodeling complexes, distinguished by their structural features, act directly to alter nucleosome composition in transcribed regions. They may unwrap, translocate, remove, or exchange nucleosomes on the DNA, hydrolyzing ATP in the process (Table 28-2; see the table footnote for an explanation of the abbreviated names of enzyme complexes described here). In some cases, the enzymes catalyze the exchange of pairs of histones within nucleosomes to alter nucleosome composition. The many different complexes are specialized to function at particular genes or chromosomal regions. There are two related complexes in the SWI/SNF family in all eukaryotic cells, both of which remodel chromatin so that nucleosomes are ejected from the DNA near transcription start sites. They appear to be involved in a dynamic cycle to allow replacement of nucleosomes with transcription factors (Fig. 28-28). The two distinct complexes generally function at different sets of genes. Most of the ISWI family complexes optimize nucleosome spacing to allow chromatin assembly and transcriptional silencing. There are generally 9 or 10 different CHD family complexes in eukaryotic cells, separated into three subfamilies. The different family members have specialized roles, either ejecting nucleosomes to activate transcription or assembling chromatin to repress transcription. The INO80 family complexes have a variety of roles in remodeling chromatin for transcriptional activation and DNA repair. One family member, SWR1, promotes subunit exchange in nucleosomes to introduce histone variants such as H2AZ (see Box 24-1), found in transcriptionally active regions.
FIGURE 28-28 Nucleosome ejection by a SWI/SNF remodeler. The SWI/SNF enzyme engulfs the nucleosome, interacting with short CGCG sequences nearby. With the aid of ATP hydrolysis, the DNA is partially separated from the nucleosome, exposing a site for transcription factor (TF) binding. After the transcription factor is bound, the nucleosome is ejected. When transcription is no longer needed, the nucleosome can again replace the transcription factor or factors, completing the cycle. [Information from S. Brahma and S. Henikoff, Trends Biochem. Sci. 45:13, 2020.]

The covalent modification of histones is altered dramatically within transcriptionally active chromatin. The core histones of nucleosome particles (H2A, H2B, H3, H4; see Fig. 24-24) are modified by methylation of Lys or Arg residues, phosphorylation of Ser or Thr residues, acetylation (see below), ubiquitination (see Fig. 27-47), or SUMOylation (SUMOs are small ubiquitin-like modifiers). Each of the core histones has two distinct structural domains. A central domain is involved in histone-histone interaction and the wrapping of DNA around the nucleosome. A lysine-rich amino-terminal domain is generally positioned near the exterior of the assembled nucleosome particle; the covalent modifications occur at specific residues concentrated in this amino-terminal domain. The patterns of modification have led some researchers to propose the existence of a histone code, in which modification patterns are recognized by enzymes that alter the structure of chromatin. Indeed, some of the modifications are essential for interactions with proteins that play key roles in transcription. The acetylation and methylation of histones figure prominently in the processes that activate chromatin for transcription. During transcription, histone H3 in nucleosomes is methylated (by specific histone methylases) at Lys4 near the 5′ end of the coding region and at Lys36 within the coding region.
These methylations enable the binding of histone acetyltransferases (HATs), enzymes that acetylate particular Lys residues. Cytosolic (type B) HATs acetylate newly synthesized histones before the histones are imported into the nucleus. The subsequent assembly of the histones into chromatin after replication is facilitated by histone chaperones: CAF1 for H3 and H4 (see Box 24-1), and NAP1 for H2A and H2B. Where chromatin is being activated for transcription, the nucleosomal histones are further acetylated by nuclear (type A) HATs. The acetylation of multiple Lys residues in the amino-terminal domains of histones H3 and H4 can reduce the affinity of the entire nucleosome for DNA. Acetylation of particular Lys residues is critical for the interaction of nucleosomes with other proteins. When transcription of a gene is no longer required, the extent of methylation and acetylation of nucleosomes in that vicinity is reduced as part of a general gene-silencing process that restores the chromatin to a transcriptionally inactive state. There are two known classes of demethylases. One class, called LSD (lysine-specific histone demethylases), first converts the CH3—N linkage to an imine (CH2═N) linkage, followed by hydrolysis to generate formaldehyde and the demethylated lysine. The other class of demethylases contains JmjC (Jumonji-C) domains; these enzymes first hydroxylate the methyl group, which is then released, again as formaldehyde. More than 20 JmjC domain–containing histone demethylases are encoded by mammalian genomes. They are part of the same α-ketoglutarate-dependent hydroxylase enzyme family that includes the enzyme that hydroxylates proline residues in collagen (see Box 4-2). These enzymes are strongly inhibited by 2-hydroxyglutarate, an unusual metabolite produced in abundance by a mutated form of isocitrate dehydrogenase that is common in human cancers (see Fig. 16-20).
Within the tumors, the high levels of 2-hydroxyglutarate produce global changes in gene expression. Histone acetylation is reduced by the action of histone deacetylases (HDACs). The deacetylases include SIRT1, SIRT2, SIRT6, and SIRT7, which are NAD⁺-dependent enzymes in the sirtuin family (SIRT1–7 in mammals). These enzymes deacetylate specific Lys residues in histones and in other targets, including cytoplasmic proteins. In addition to the removal of certain acetyl groups, new covalent modification of histones marks chromatin as transcriptionally inactive. For example, Lys9 of histone H3 is often methylated in heterochromatin. The net effect of chromatin remodeling in the context of transcription is to make a segment of the chromosome more accessible and to “label” (chemically modify) it so as to facilitate the binding and activity of transcription factors that regulate expression of the gene or genes in that region.

Most Eukaryotic Promoters Are Positively Regulated

As already noted, eukaryotic RNA polymerases have little or no intrinsic affinity for their promoters. The default state of eukaryotic genes is “off,” and initiation of transcription is almost always dependent on the action of multiple activator proteins. One important reason for the apparent predominance of positive regulation seems obvious: the storage of DNA within chromatin effectively renders most promoters inaccessible, so genes are silent in the absence of other regulation. The structure of chromatin affects access to some promoters more than others, but repressors that bind to DNA so as to preclude access of RNA polymerase (negative regulation) would often be simply redundant. Other factors must be at play in the use of positive regulation, and speculation generally centers on two: the large size of eukaryotic genomes and the greater efficiency of positive regulation. First, nonspecific DNA binding of regulatory proteins becomes a more important problem in the much larger genomes of higher eukaryotes.
And the chance that a single specific binding sequence will occur randomly at an inappropriate site also increases with genome size. Combinatorial control thus becomes important in a large genome (Fig. 28-29). Specificity for transcriptional activation can be improved if each of several positive regulatory proteins must bind specific DNA sequences to activate a gene. The average number of regulatory sites for a gene in a multicellular organism is six, and genes that are regulated by a dozen such sites are common. The requirement for binding of several positive regulatory proteins to specific DNA sequences vastly reduces the probability of the random occurrence of a functional juxtaposition of all the necessary binding sites. This requirement also reduces the number of regulatory proteins that must be encoded by a genome to regulate all of its genes (Fig. 28-29). Thus, a new regulator is not needed for every gene, although regulation is complex enough in higher eukaryotes that regulatory proteins may represent 5% to 10% of all protein-coding genes.

FIGURE 28-29 The advantages of combinatorial control. Combinatorial control allows specific regulation of many genes using a limited repertoire of regulatory proteins. Consider the possibilities inherent in regulation by two different families of leucine zipper proteins (red and green). If each regulatory gene family had three members (as shown here, in dark, medium, and light shades, each binding to a different DNA sequence) that could freely form either homo- or heterodimers, there would be six possible dimeric species in each family and each dimer would recognize a different bipartite regulatory DNA sequence. If a gene had a regulatory site for each protein family, 36 different regulatory combinations would be possible, using just the six proteins from these two families.
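The counting in Figure 28-29 and the genome-size argument can both be checked with a few lines of arithmetic. This is an illustrative sketch, not taken from the source: it assumes every member of a family can pair with every other (so n members give n homodimers plus n(n−1)/2 heterodimers), treats genomic DNA as random sequence with equal base frequencies, counts matches on both strands for a non-palindromic site, and uses approximate genome sizes (about 4.6 Mbp for E. coli and 3.1 Gbp for the haploid human genome). Treating two adjacent 6-bp sites at a fixed spacing as one 12-bp composite site is a further simplification.

```python
from math import comb, prod
from typing import Iterable


def dimer_species(members: int) -> int:
    """Distinct dimers from one family whose members freely homo- or heterodimerize."""
    return members + comb(members, 2)  # homodimers + heterodimers


def regulatory_combinations(family_sizes: Iterable[int]) -> int:
    """Combinations when a gene has one regulatory site per protein family."""
    return prod(dimer_species(n) for n in family_sizes)


def expected_random_sites(genome_bp: float, site_bp: int) -> float:
    """Expected chance matches to one specific site, counting both strands.

    Assumes equal base frequencies, independent positions, and a non-palindromic site.
    """
    return 2 * (genome_bp - site_bp + 1) / 4 ** site_bp


if __name__ == "__main__":
    print(dimer_species(3), regulatory_combinations([3, 3]))  # 6 36
    for name, size in (("E. coli", 4.6e6), ("human", 3.1e9)):  # approximate sizes
        one_site = expected_random_sites(size, 6)
        composite = expected_random_sites(size, 12)  # two 6-bp sites, fixed spacing
        print(name, round(one_site), round(composite))
```

Under these assumptions a given 6-bp sequence is expected to occur by chance roughly 2,000 times in an E. coli-sized genome but about 1.5 million times in a human-sized genome, whereas requiring a composite 12-bp arrangement reduces the human figure to a few hundred.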
With six or more sites used in the regulation of a typical eukaryotic gene, the number of possible variants is much greater than this example suggests.

In principle, a similar combinatorial strategy could be used by multiple negative regulatory elements, but this brings us to the second reason for the use of positive regulation: it is simply more efficient. If the ~20,000 genes in the human genome were negatively regulated, each cell would have to synthesize, at all times, all of the different repressors in concentrations sufficient to permit specific binding to each “unwanted” gene. In positive regulation, most of the genes are usually inactive (that is, RNA polymerases do not bind to the promoters) and the cell synthesizes only the activator proteins needed to promote transcription of the subset of genes required in the cell at that time. These arguments notwithstanding, there are examples of negative regulation in eukaryotes, from yeasts to humans, as we shall see. Some of that negative regulation involves lncRNAs, which are more economical to synthesize than repressor proteins.

DNA-Binding Activators and Coactivators Facilitate Assembly of the Basal Transcription Factors

To continue our exploration of the regulation of gene expression in eukaryotes, we return to the interactions between promoters and RNA polymerase II (Pol II), the enzyme responsible for the synthesis of eukaryotic mRNAs. Although many (but not all) Pol II promoters include the TATA box and Inr (initiator) sequences, with their standard spacing (see Fig. 26-8), they vary greatly in both the number and the location of additional sequences required for the regulation of transcription. The additional regulatory sequences, generally bound by transcription activators, are usually called enhancers in higher eukaryotes and upstream activator sequences (UASs) in yeast.
A typical enhancer may be found hundreds or even thousands of base pairs upstream from the transcription start site, or may even be downstream, within the gene itself. When bound by the appropriate regulatory proteins, an enhancer increases transcription at nearby promoters regardless of its orientation in the DNA. The UASs of yeast function in a similar way, although generally they must be positioned upstream and within a few hundred base pairs of the transcription start site. Successful binding of the active Pol II holoenzyme at one of its promoters usually requires the combined action of proteins of five types: (1) transcription activators, which bind to enhancers or UASs and facilitate transcription; (2) architectural regulators, which facilitate DNA looping; (3) chromatin modification and remodeling proteins, described above; (4) coactivators; and (5) basal transcription factors, also called general transcription factors (see Fig. 26-9, Table 26-2), required at most Pol II promoters (Fig. 28-30). The coactivators are required for essential communication between activators and the complex composed of Pol II and the basal transcription factors. Coactivators also play a direct role in assembly of the preinitiation complex (PIC). Furthermore, a variety of repressor proteins can interfere with communication between Pol II and the activators, resulting in repression of transcription (Fig. 28-30b). Here we focus on the protein complexes shown in Figure 28-30 and how they interact to activate transcription. FIGURE 28-30 Eukaryotic promoters and regulatory proteins. RNA polymerase II and its associated basal (general) transcription factors form a preinitiation complex at the TATA box and Inr site of the cognate promoters, a process facilitated by transcription activators, acting through coactivators (Mediator, TFIID, or both). (a) A composite promoter with typical sequence elements and protein complexes found in both yeast and higher eukaryotes. 
The carboxyl-terminal domain (CTD) of Pol II (see Fig. 26-9) is an important point of interaction with Mediator and other protein complexes. Histone modification enzymes (not shown) catalyze methylation and acetylation; remodeling enzymes alter the content and placement of nucleosomes. The transcription activators have distinct DNA-binding domains and activation domains. In some cases, their function is affected by interaction with lncRNAs. Arrows indicate common modes of interaction often required for the activation of transcription. The HMG proteins are a common type of architectural regulator (see Fig. 28-5), allowing the looping of the DNA required to bring together system components bound at distant binding sites. (b) Eukaryotic transcriptional repressors function through a range of mechanisms. Some bind directly to DNA, displacing a protein complex required for activation (not shown); many others interact with various parts of the transcription or activation protein complexes to prevent activation. Possible points of interaction are indicated with arrows. (c) The structure of an HMG protein complex with DNA shows how HMG proteins facilitate DNA looping. The binding is relatively nonspecific, although DNA sequence preferences have been identified for many HMG proteins. Shown here is the HMG domain of the protein HMG-D of Drosophila, bound to DNA. [(c) Data from PDB ID 1QRV, F. V. Murphy IV et al., EMBO J. 18:6610, 1999.]

Transcription Activators

The requirements for activators vary greatly from one promoter to another. A few are known to activate transcription at hundreds of promoters, whereas others are specific for a few promoters. Many activators are sensitive to the binding of signal molecules, providing the capacity to activate or deactivate transcription in response to a changing cellular environment. Some enhancers bound by activators are quite distant from the promoter’s TATA box.
Multiple enhancers (often six or more) are bound by a similar number of activators for a typical gene, providing combinatorial control and response to multiple signals. Some transcription activators can bind to both DNA and RNA, and their function is affected by one or more lncRNAs. The protein NF-κB, for example (Fig. 28-14), activates transcription of many genes involved in the immune response and cytokine production. It can bind to a DNA enhancer site or, alternatively, to an lncRNA called lethe, named after the river of forgetfulness in Greek mythology. The lncRNA reduces transcription of genes controlled by NF-κB.

Architectural Regulators

How do activators function at a distance? The answer in most cases seems to be that, as indicated earlier, the intervening DNA is looped so that the various protein complexes can interact directly. The looping is promoted by architectural regulators that are abundant in chromatin and bind to DNA with limited specificity. Most prominently, the high mobility group (HMG) proteins (Fig. 28-30c; “high mobility” refers to their electrophoretic mobility in polyacrylamide gels) play an important structural role in chromatin remodeling and transcriptional activation.

Coactivator Protein Complexes

Most transcription requires the presence of additional protein complexes. Some major regulatory protein complexes that interact with Pol II have been defined both genetically and biochemically. These coactivator complexes act as intermediaries between the transcription activators and the Pol II complex. Mediator, a complex consisting of 25 (yeast) to 30 (human) polypeptides, is a major eukaryotic coactivator (Fig. 28-30). Many of the 25 core polypeptides are highly conserved from fungi to humans. A subcomplex of four subunits has a kinase role, interacting transiently with the remainder of the Mediator complex, and may dissociate prior to transcription initiation.
Mediator binds tightly to the carboxyl-terminal domain (CTD) of the largest subunit of Pol II. The Mediator complex is required for both basal and regulated transcription at many promoters used by Pol II, and it also stimulates phosphorylation of the CTD by TFIIH (a basal transcription factor). Transcription activators interact with one or more components of the Mediator complex, with the precise interaction sites differing from one activator to another. Coactivator complexes function at or near the promoter’s TATA box. Additional coactivators, functioning with one or a few genes, have also been described. Some of these operate in conjunction with Mediator, and some may act in systems that do not employ Mediator.

TATA-Binding Protein and Basal Transcription Factors

The first component to bind in the assembly of a preinitiation complex (PIC) at the TATA box of a typical Pol II promoter is the TATA-binding protein (TBP). At promoters lacking a TATA box, TBP is usually delivered as part of a larger complex (13 to 14 subunits) called TFIID. The complete complex also includes the basal transcription factors TFIIB, TFIIE, TFIIF, and TFIIH; Pol II; and perhaps TFIIA. This minimal PIC, however, is often insufficient for initiation of transcription and generally does not form at all if the promoter is obscured within chromatin. Positive regulation, leading to transcription, is imposed by the activators and coactivators. Mediator interacts directly with TFIIH and TFIIE, allowing their recruitment to the PIC.

Choreography of Transcriptional Activation

We can now begin to piece together the sequence of transcriptional activation events at a typical Pol II promoter (Fig. 28-31). The exact order of binding of some components may vary, but the model in Figure 28-31 illustrates the principles of activation as well as one common path. Many transcription activators have significant affinity for their binding sites even when the sites are within condensed chromatin.
The binding of activators is often the event that triggers subsequent activation of the promoter. Binding of one activator may enable the binding of others, gradually displacing some nucleosomes.

FIGURE 28-31 The components of transcriptional activation. Activators bind the DNA first. The activators recruit the histone modification/nucleosome remodeling complexes and a coactivator such as Mediator. Mediator facilitates the binding of TBP (or TFIID) and TFIIB, and the other basal transcription factors and Pol II then bind. Phosphorylation of the CTD of Pol II leads to transcription initiation (not shown). [Information from J. A. D’Alessio et al., Mol. Cell 36:924, 2009.]

Crucial remodeling of the chromatin then takes place in stages, facilitated by interactions between activators and HATs or enzyme complexes such as SWI/SNF, or both. In this way, a bound activator can draw in other components necessary for further chromatin remodeling to permit transcription of specific genes. The bound activators interact with the large Mediator complex. Mediator, in turn, provides an assembly surface for the binding of, first, TBP (or TFIID), then TFIIB, and then other components of the PIC, including Pol II. Mediator stabilizes the binding of Pol II and its associated transcription factors and greatly facilitates formation of the PIC. Complexity in these regulatory circuits is the rule rather than the exception, with multiple DNA-bound activators promoting transcription. The script can change from one promoter to another. For example, many promoters have a different set of recognition sequences and may not have a TATA box, and in multicellular eukaryotes the subunit composition of factors such as TFIID can vary from one tissue to another. However, most promoters seem to require a precisely ordered assembly of components to initiate transcription. The assembly process is not always fast.
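The ordered assembly described above can be pictured as a set of precedence constraints: some steps must follow others, while the relative order of some (for example, chromatin remodeling and Mediator recruitment, both triggered by bound activators) is left open. The sketch below encodes one common path, simplified from Figure 28-31, as a dependency graph and uses Python's standard `graphlib` module to produce one admissible order. The step labels are paraphrases, and the requirement that remodeling precede binding of Pol II is a simplifying assumption of this sketch, not a rule stated for every promoter.

```python
# Minimal sketch of ordered PIC assembly at a typical Pol II promoter
# (one common path, simplified). Each step maps to the steps that must precede it.
from graphlib import TopologicalSorter

PRECEDES = {
    "activators bind DNA": set(),
    "chromatin modification/remodeling (HATs, SWI/SNF)": {"activators bind DNA"},
    "Mediator recruited": {"activators bind DNA"},
    "TBP (or TFIID) binds": {"Mediator recruited"},
    "TFIIB binds": {"TBP (or TFIID) binds"},
    "other basal factors and Pol II bind": {
        "TFIIB binds",
        "chromatin modification/remodeling (HATs, SWI/SNF)",  # assumed prerequisite
    },
    "CTD phosphorylated; initiation": {"other basal factors and Pol II bind"},
}


def is_valid_order(order: list[str], deps: dict[str, set[str]] = PRECEDES) -> bool:
    """True if every step appears exactly once and only after all its prerequisites."""
    done: set[str] = set()
    for step in order:
        if step not in deps or step in done or not deps[step] <= done:
            return False
        done.add(step)
    return done == set(deps)


# One order consistent with the constraints; graphlib picks among the valid orders.
one_path = list(TopologicalSorter(PRECEDES).static_order())

if __name__ == "__main__":
    for step in one_path:
        print(step)
```

Representing the model as a partial order, rather than a single fixed list, reflects the caveat that the exact order of binding of some components may vary. Any order that `is_valid_order` accepts is consistent with the simplified constraints.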
For some genes, assembly may take minutes; for certain genes of higher eukaryotes, the process can take days.

The Genes of Galactose Metabolism in Yeast Are Subject to Both Positive and Negative Regulation

Some of the general principles described above can be illustrated by one well-studied eukaryotic regulatory circuit (Fig. 28-32). The enzymes required for the importation and metabolism of galactose in yeast are encoded by genes scattered over several chromosomes (Table 28-3). Each of the GAL genes is transcribed separately, and yeast cells have no operons like those in bacteria. However, all the GAL genes have similar promoters and are regulated coordinately by a common set of proteins. The promoters for the GAL genes consist of the TATA box and Inr sequences, as well as an upstream activator sequence (UASG) recognized by the transcription activator Gal4 protein (Gal4p). Regulation of gene expression by galactose entails an interplay between Gal4p and two other proteins, Gal80p and Gal3p. Gal80p forms a complex with Gal4p, preventing Gal4p from functioning as an activator of the GAL promoters. When galactose is present, it binds Gal3p, which then interacts with the Gal80p-Gal4p complex and allows Gal4p to function as an activator at the GAL promoters. As the various galactose genes are induced and their products build up, Gal3p may be replaced with Gal1p (a galactokinase needed for galactose metabolism that also acts as a regulator) for sustained activation of the regulatory circuit.

FIGURE 28-32 Regulation of transcription of GAL genes in yeast. Galactose imported into the yeast cell is converted to glucose 6-phosphate by a pathway involving five enzymes, whose genes are scattered over three chromosomes (see Table 28-3). Transcription of these genes is regulated by the combined actions of the proteins Gal4p, Gal80p, and Gal3p, with Gal4p playing the central role of transcription activator. The Gal4p-Gal80p complex is inactive.
Binding of galactose to Gal3p leads to interaction of Gal3p with the Gal80p-Gal4p complex and activates Gal4p. The Gal4p subsequently recruits SAGA, Mediator, and TFIID to the galactose promoters, leading to recruitment of RNA polymerase II and initiation of transcription. Chromatin remodeling to allow transcription also requires a SWI/SNF complex.

TABLE 28-3 Genes of Galactose Metabolism in Yeast

Gene | Protein function | Chromosomal location | Protein size (number of residues) | Relative protein expression in glucose | in glycerol | in galactose
Regulated genes
GAL1 | Galactokinase | II | 528 | − | − | +++
GAL2 | Galactose permease | XII | 574 | − | − | +++
PGM2 | Phosphoglucomutase | XIII | 569 | + | + | ++
GAL7 | Galactose 1-phosphate uridylyltransferase | II | 365 | − | − | +++
GAL10 | UDP-glucose 4-epimerase | II | 699 | − | − | +++
MEL1 | α-Galactosidase | II | 453 | − | + | ++
Regulatory genes
GAL3 | Inducer | IV | 520 | − | + | ++
GAL4 | Transcriptional activator | XVI | 881 | +/− | + | +
GAL80 | Transcriptional inhibitor | XIII | 435 | + | + | ++

Information from R. Reece and A. Platt, Bioessays 19:1001, 1997.

Other protein complexes also have a role in activating transcription of the GAL genes. These include the SAGA complex for histone acetylation and chromatin remodeling, the SWI/SNF complex for chromatin remodeling, and Mediator. The Gal4 protein is responsible for recruitment of these additional factors needed for transcriptional activation. SAGA may be the first and primary recruitment target for Gal4p. Glucose is the preferred carbon source for yeast, as it is for bacteria. When glucose is present, most of the GAL genes are repressed, whether galactose is present or not. The GAL regulatory system described above is effectively overridden by a complex catabolite repression system that includes several proteins (not depicted in Fig. 28-32).
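The combined positive and negative control of the GAL genes can be summarized as Boolean logic. The minimal sketch below assumes an all-or-none response, treats catabolite repression as an absolute override, and ignores the later substitution of Gal1p for Gal3p. The function names are invented for this illustration. The three conditions correspond to the carbon-source columns of Table 28-3.

```python
# Minimal Boolean sketch of GAL gene control in yeast (simplified: all-or-none
# responses; glucose repression treated as an absolute override).


def gal4p_active(galactose: bool) -> bool:
    """Gal4p activates only when galactose-bound Gal3p relieves Gal80p inhibition."""
    gal3p_has_galactose = galactose
    gal80p_blocks_gal4p = not gal3p_has_galactose  # Gal80p-Gal4p complex is inactive
    return not gal80p_blocks_gal4p


def gal_genes_on(galactose: bool, glucose: bool) -> bool:
    """GAL promoters fire only with active Gal4p and no glucose-triggered repression."""
    return gal4p_active(galactose) and not glucose


# Carbon-source columns of Table 28-3, as (galactose present, glucose present).
CONDITIONS = {
    "glucose": (False, True),
    "glycerol": (False, False),
    "galactose": (True, False),
}

if __name__ == "__main__":
    for name, (gal, glc) in CONDITIONS.items():
        print(f"{name:>9}: GAL genes on = {gal_genes_on(gal, glc)}")
```

The output pattern (off, off, on) matches the strongly regulated genes such as GAL1, GAL7, and GAL10. A Boolean model cannot capture the basal expression seen for genes such as PGM2 or GAL80.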
Transcription Activators Have a Modular Structure

Transcription activators typically have a distinct structural domain for specific DNA binding and one or more additional domains for transcriptional activation or for interaction with other regulatory proteins. Interaction of two regulatory proteins is often mediated by domains containing leucine zippers (Fig. 28-15) or helix-loop-helix motifs (Fig. 28-16). We consider here three distinct types of structural domains used in activation by the transcription activators Gal4p, Sp1, and CTF1 (Fig. 28-33a).

FIGURE 28-33 Transcription activators. (a) Typical activators such as CTF1, Gal4p, and Sp1 have a DNA-binding domain and an activation domain. The nature of the activation domain is indicated by symbols: – – –, acidic; Q Q Q, glutamine-rich; P P P, proline-rich. These proteins generally activate transcription by interacting with coactivator complexes such as Mediator. Note that the binding sites illustrated here are not generally found together near a single gene. (b) A chimeric protein containing the DNA-binding domain of Sp1 and the activation domain of CTF1 activates transcription if a GC box is present.

Gal4p contains a zinc finger–like structure in its DNA-binding domain, near the amino terminus; this domain has six Cys residues that coordinate two Zn²⁺. The protein functions as a homodimer (with dimerization mediated by interactions between two coiled coils) and binds to UASG, a palindromic DNA sequence about 17 bp long. Gal4p has a separate activation domain with many acidic amino acid residues. Experiments that substitute a variety of different peptide sequences for the acidic activation domain of Gal4p suggest that the acidic nature of this domain is critical to its function, although its precise amino acid sequence can vary considerably. Sp1 (Mr 80,000) is a transcription activator for many genes in higher eukaryotes.
Its DNA-binding site, the GC box (consensus sequence GGGCGG), is usually quite near the TATA box. The DNA-binding domain of the Sp1 protein is near its carboxyl terminus and contains three zinc fingers. Two other domains in Sp1 function in activation and are notable in that 25% of their amino acid residues are Gln. A wide variety of other activator proteins also have these glutamine-rich domains. CTF1 (CCAAT-binding transcription factor 1) belongs to a family of transcription activators that bind a sequence called the CCAAT site (its consensus sequence is TGGN₆GCCAA, where N is any nucleotide). The DNA-binding domain of CTF1 contains many basic amino acid residues, and the binding region is probably arranged as an α helix. This protein has neither a helix-turn-helix motif nor a zinc finger motif; its DNA-binding mechanism is not yet clear. CTF1 has a proline-rich activation domain, with Pro accounting for more than 20% of the amino acid residues. The discrete activation and DNA-binding domains of regulatory proteins often act completely independently, as has been demonstrated in “domain-swapping” experiments. Genetic engineering techniques (Chapter 9) can join the proline-rich activation domain of CTF1 to the DNA-binding domain of Sp1 to create a protein that, like intact Sp1, binds to GC boxes on the DNA and activates transcription at a nearby promoter (as in Fig. 28-33b). The DNA-binding domain of Gal4p has similarly been replaced experimentally with the DNA-binding domain of the E. coli LexA repressor (of the SOS response; Fig. 28-21). This chimeric protein neither binds at UASG nor activates the yeast GAL genes (as would intact Gal4p) unless the UASG sequence in the DNA is replaced by the LexA recognition site.
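The logic of the domain-swapping experiments is that the DNA-binding domain alone decides where a protein binds, while the activation domain only needs to be present. The sketch below encodes that rule. The site labels come from the text, but the class and function names are hypothetical. Treating a protein without an activation domain as inactive is a simplifying assumption of this model, not a reported experiment.

```python
# Minimal sketch of domain-swapping logic: an activator (natural or chimeric) works
# only if its DNA-binding domain finds its own site and it carries an activation domain.
from dataclasses import dataclass
from typing import Optional

# Source of the DNA-binding domain -> DNA site it recognizes (as described in the text)
SITE_FOR_DBD = {
    "Sp1": "GC box",
    "CTF1": "CCAAT site",
    "Gal4p": "UASG",
    "LexA": "LexA site",
}


@dataclass(frozen=True)
class Activator:
    dbd: str           # protein that supplied the DNA-binding domain
    ad: Optional[str]  # protein that supplied the activation domain (None = absent)


def activates(protein: Activator, sites_near_promoter: set[str]) -> bool:
    """Binding specificity comes only from the DBD; activation needs some AD."""
    binds = SITE_FOR_DBD[protein.dbd] in sites_near_promoter
    return binds and protein.ad is not None


sp1_ctf1 = Activator(dbd="Sp1", ad="CTF1")     # chimera of Fig. 28-33b
lexa_gal4 = Activator(dbd="LexA", ad="Gal4p")  # LexA DNA-binding domain + Gal4p activation domain

if __name__ == "__main__":
    print(activates(sp1_ctf1, {"GC box"}))     # binds GC box, activates
    print(activates(lexa_gal4, {"UASG"}))      # cannot bind UASG
    print(activates(lexa_gal4, {"LexA site"})) # works once UASG is replaced by a LexA site
```

The model makes the independence of the two domains explicit: swapping the activation domain never changes which site is bound.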
Eukaryotic Gene Expression Can Be Regulated by Intercellular and Intracellular Signals

The effects of steroid hormones (and of thyroid and retinoid hormones, which have a similar mode of action) provide additional well-studied examples of the modulation of eukaryotic regulatory proteins by direct interaction with molecular signals (see Fig. 12-34). Unlike other types of hormones, steroid hormones do not have to bind to plasma membrane receptors. Instead, they can interact with intracellular receptors that are transcription activators. Steroid hormones too hydrophobic to dissolve readily in the blood (estrogen, progesterone, and cortisol, for example) travel on specific carrier proteins from their point of release to their target tissues. In the target tissue, the hormone passes through the plasma membrane by simple diffusion. Once inside the cell, the hormone interacts with one of two types of steroid-binding nuclear receptor (Fig. 28-34). In both cases, the hormone-receptor complex acts by binding to highly specific DNA sequences called hormone response elements (HREs), thereby altering gene expression. Acting at these sites, the receptors act as transcription activators, recruiting coactivators and Pol II (plus its associated transcription factors) to trigger transcription of the gene.

FIGURE 28-34 Mechanisms of steroid hormone receptor function. There are two types of steroid-binding nuclear receptors. (a) Monomeric type I receptors (NR) are found in the cytoplasm, in a complex with the heat shock protein Hsp70. Receptors for estrogen, progesterone, androgens, and glucocorticoids are of this type. When the steroid hormone binds, the Hsp70 dissociates and the receptor dimerizes, exposing a nuclear localization signal. The dimeric receptor, with hormone bound, migrates to the nucleus, where it binds to a hormone response element (HRE) and acts as a transcription activator.
The activity of the receptor can be repressed by binding to an lncRNA (such as GAS5), which competes directly with binding to the HRE. (b) Type II receptors, by contrast, are always in the nucleus, bound to an HRE in the DNA and to a corepressor that renders the receptor inactive. The thyroid hormone receptor (TR) is of this type. The hormone migrates through the cytoplasm and diffuses across the nuclear membrane. In the nucleus it binds to a heterodimer consisting of the thyroid hormone receptor and the retinoid X receptor (RXR). A conformation change leads to dissociation of the corepressor, and the receptor then functions as a transcription activator. The DNA sequences (HREs) to which hormone-receptor complexes bind are similar in length and arrangement for the various steroid hormones, but they differ in sequence. Each receptor has a consensus HRE sequence (Table 28-4) to which the hormone-receptor complex binds well, with each consensus consisting of two six-nucleotide sequences, either contiguous or separated by three nucleotides, in tandem or in a palindromic arrangement. The hormone receptors have a highly conserved DNA-binding domain with two zinc fingers (Fig. 28-35). The hormone-receptor complex binds to the DNA as a dimer, with the zinc finger domains of each monomer recognizing one of the six-nucleotide sequences. The ability of a given hormone to act through the hormone-receptor complex to alter the expression of a specific gene depends on the exact sequence of the HRE, its position relative to the gene, and the number of HREs associated with the gene. FIGURE 28-35 Typical steroid hormone receptors. These receptor proteins have a binding site for the hormone, a DNA-binding domain, and a region that activates transcription of the regulated gene. The highly conserved DNA-binding domain has two zinc fingers. The sequence shown here is that for the estrogen receptor, but the residues in bold type are common to all steroid hormone receptors. 
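Each consensus HRE consists of two six-nucleotide half-sites separated by a fixed number of arbitrary nucleotides, so it maps directly onto a regular expression. The sketch below assumes the notation of Table 28-4 typed in plain ASCII: `N3` for three arbitrary nucleotides, `N` alone for one, and `(A/T)` for either base. The function names and test sequences are hypothetical. Real receptors also bind sites that deviate from the consensus, which exact pattern matching ignores.

```python
# Minimal sketch: turn an HRE consensus (Table 28-4 notation, digits typed inline,
# e.g. "GGTACAN3TGTTCT") into a regular expression and scan DNA for matches.
import re


def consensus_to_regex(consensus: str) -> str:
    """Translate N, Nk, and (X/Y) notation into an equivalent regex."""
    out = []
    i = 0
    while i < len(consensus):
        ch = consensus[i]
        if ch == "(":                        # degenerate position such as (A/T)
            j = consensus.index(")", i)
            choices = consensus[i + 1:j].split("/")
            out.append("[" + "".join(choices) + "]")
            i = j + 1
        elif ch == "N":                      # spacer: N alone = 1, Nk = k nucleotides
            j = i + 1
            while j < len(consensus) and consensus[j].isdigit():
                j += 1
            count = int(consensus[i + 1:j]) if j > i + 1 else 1
            out.append("[ACGT]{%d}" % count)
            i = j
        else:                                # fixed base
            out.append(ch)
            i += 1
    return "".join(out)


def find_hres(dna: str, consensus: str) -> list[int]:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")  # lookahead allows overlaps
    return [m.start() for m in pattern.finditer(dna.upper())]


if __name__ == "__main__":
    dna = "TTGGTACAGCTGTTCTAA"  # hypothetical sequence: half-sites separated by 2 nt
    print(find_hres(dna, "GGTACAN3TGTTCT"))      # glucocorticoid spacing: no match
    print(find_hres(dna, "GG(A/T)ACAN2TGTTCT"))  # androgen spacing: match at 2
```

The example shows how the spacing between half-sites, not only their sequence, determines which receptor consensus a site satisfies.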
TABLE 28-4 Hormone Response Elements (HREs) Bound by Steroid-Type Hormone Receptors

Receptor | HRE consensus sequence bound (a)
Androgen | GG(A/T)ACAN₂TGTTCT
Glucocorticoid | GGTACAN₃TGTTCT
Retinoic acid (some) | AGGTCAN₅AGGTCA
Vitamin D | AGGTCAN₃AGGTCA
Thyroid hormone | AGGTCAN₄AGGTCA
RX (b) | AGGTCANAGGTCANAGGTCANAGGTCA

(a) N represents any nucleotide.
(b) Forms a dimer with the retinoic acid receptor or vitamin D receptor.

The ligand-binding region of the receptor protein, always at the carboxyl terminus, is specific to the particular receptor. For example, in the ligand-binding region, the glucocorticoid receptor is only 30% similar to the estrogen receptor and 17% similar to the thyroid hormone receptor. The size of the ligand-binding region varies dramatically; in the vitamin D receptor it has only 25 amino acid residues, whereas in the mineralocorticoid receptor it has 603 residues. Mutations that change one amino acid residue in these regions can result in loss of responsiveness to a specific hormone. Some humans unable to respond to cortisol, testosterone, vitamin D, or thyroxine have mutations of this type. The lncRNAs introduce another dimension to regulation by hormone receptors. An lncRNA called GAS5 (growth arrest specific 5) inhibits transcriptional activation by the glucocorticoid receptor by directly competing with DNA for receptor binding. GAS5 also inhibits activity of the closely related androgen, progesterone, and mineralocorticoid receptors. In addition, GAS5 interacts with and sequesters an miRNA called miR-21, which interacts with and inhibits the activity of some regulatory proteins that act as tumor suppressors. Expression of GAS5 is suppressed in a wide range of tumors, resulting in increased activity of steroid hormone receptors, higher levels of active miR-21, and faster tumor growth. Low GAS5 levels thus correlate with worsened outcomes for cancer patients, making this lncRNA a subject of intense ongoing investigation.
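The competition between GAS5 and DNA for the glucocorticoid receptor can be pictured with the standard equilibrium model of two ligands competing for one binding site. This is a sketch under stated assumptions. The receptor binds either an HRE or GAS5, never both at once. The receptor is scarce enough that free HRE and GAS5 concentrations equal their totals. The concentrations and dissociation constants in the example are arbitrary illustrative values, not measurements.

```python
# Minimal competitive-binding sketch: fraction of receptor bound to an HRE when
# GAS5 competes for the same DNA-binding surface. All quantities share one unit.


def fraction_on_hre(hre: float, kd_hre: float, gas5: float, kd_gas5: float) -> float:
    """Fraction of receptor occupying the HRE at equilibrium.

    The receptor partitions among three states: free, HRE-bound, GAS5-bound.
    Assumes the receptor is limiting, so free HRE and GAS5 equal their totals.
    """
    dna_term = hre / kd_hre     # relative weight of the HRE-bound state
    rna_term = gas5 / kd_gas5   # relative weight of the GAS5-bound state
    return dna_term / (1.0 + dna_term + rna_term)


if __name__ == "__main__":
    # Hypothetical values: HRE at its Kd; GAS5 from none to 10x its Kd.
    for gas5 in (0.0, 1.0, 10.0):
        frac = fraction_on_hre(hre=1.0, kd_hre=1.0, gas5=gas5, kd_gas5=1.0)
        print(gas5, round(frac, 3))
```

With these illustrative numbers, the HRE-bound fraction falls from 0.5 with no GAS5 to about 0.33 and then about 0.08 as GAS5 rises. Read in the other direction, the model shows why suppressed GAS5 expression leaves more receptor free to act on its target genes.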
Some hormone receptors, including the human progesterone receptor, activate transcription with the aid of a different lncRNA of ~700 nucleotides that acts as a coactivator: steroid receptor RNA activator (SRA). SRA is part of a ribonucleoprotein complex, but it is the RNA component that is required for transcription coactivation. The detailed set of interactions between SRA and other components of the regulatory systems for these genes remains to be worked out.

Regulation Can Result from Phosphorylation of Nuclear Transcription Factors

We noted in Chapter 12 that the effects of insulin on gene expression are mediated by a series of steps leading ultimately to the activation of a protein kinase in the nucleus that phosphorylates specific DNA-binding proteins, thereby altering their ability to act as transcription factors (see Fig. 12-22). This general mechanism mediates the effects of many nonsteroid hormones. For example, the β-adrenergic pathway that leads to elevated levels of cytosolic cAMP, which acts as a second messenger in both eukaryotes and bacteria (Fig. 28-18), also affects the transcription of a set of genes, each of which is located near a specific DNA sequence called a cAMP response element (CRE). The catalytic subunit of protein kinase A, released when cAMP levels rise (see Fig. 12-6), enters the nucleus and phosphorylates a nuclear protein, the CRE-binding protein (CREB). When phosphorylated, CREB binds to CREs near certain genes and acts as a transcription factor, turning on expression of these genes.

Many Eukaryotic mRNAs Are Subject to Translational Repression

Regulation at the level of translation assumes a much more prominent role in eukaryotes than in bacteria and is observed in a range of cellular situations. In contrast to the tight coupling of transcription and translation in bacteria, the transcripts generated in a eukaryotic nucleus must be processed and transported to the cytoplasm before translation.
This can impose a significant delay on the appearance of a protein. When a rapid increase in protein production is needed, a translationally repressed mRNA already in the cytoplasm can be activated for translation without delay. Translational regulation may play an especially important role in regulating certain very long eukaryotic genes (a few are measured in the millions of base pairs), for which transcription and mRNA processing can require many hours. Some genes are regulated at both the transcriptional and translational stages, with the latter playing a role in the fine-tuning of cellular protein levels. In some non-nucleated cells, such as reticulocytes (immature erythrocytes), transcriptional control is entirely unavailable and translational control of stored mRNAs becomes essential. As described below, translational controls can also have spatial significance during development, when the regulated translation of prepositioned mRNAs creates a local gradient of the protein product. Eukaryotes have at least four main mechanisms of translational regulation: 1. Translation initiation factors are subject to phosphorylation by protein kinases. The phosphorylated forms are often less active and cause a general depression of translation in the cell. 2. Some proteins bind directly to mRNA and act as translational repressors, many of them binding at specific sites in the 3′ untranslated region (3′UTR). So positioned, these proteins interact with other translation initiation factors bound to the mRNA, or with the 40S ribosomal subunit, to prevent translation initiation (Fig. 28-36). 3. Binding proteins, present in eukaryotes from yeast to mammals, disrupt the interaction between eIF4E and eIF4G (see Fig. 27-27). The mammalian versions are known as 4E-BPs (eIF4E binding proteins). When cell growth is slow, these proteins limit translation by binding to the site on eIF4E that normally interacts with eIF4G.
When cell growth resumes or increases in response to growth factors or other stimuli, the binding proteins are inactivated by protein kinase–dependent phosphorylation. 4. RNA-mediated regulation of gene expression often occurs at the level of translational repression, commonly through the binding of ncRNAs to mRNAs.

FIGURE 28-36 Translational regulation of eukaryotic mRNA. One of the most important mechanisms for translational regulation in eukaryotes is the binding of translational repressors (RNA-binding proteins) to specific sites in the 3′ untranslated region (3′UTR) of the mRNA. These proteins interact with eukaryotic initiation factors or with the ribosome to prevent or slow translation.

The variety of translational regulation mechanisms provides flexibility, allowing focused repression of a few mRNAs or global regulation of all cellular translation. Translational regulation has been particularly well studied in reticulocytes. One such mechanism in these cells involves eIF2, the initiation factor that binds to the initiator tRNA and conveys it to the ribosome; when Met-tRNA has bound to the P site, the factor eIF2B binds to eIF2, recycling it with the aid of GTP binding and hydrolysis. The maturation of reticulocytes includes destruction of the cell nucleus, leaving behind a plasma membrane packed with hemoglobin. Messenger RNAs deposited in the cytoplasm before the loss of the nucleus allow for the replacement of hemoglobin. When reticulocytes become deficient in iron or heme, the translation of globin mRNAs is repressed: a protein kinase called HCR (hemin-controlled repressor) is activated and catalyzes the phosphorylation of eIF2. When phosphorylated, eIF2 forms a stable complex with eIF2B that sequesters the eIF2, making it unavailable for participation in translation. In this way, the reticulocyte coordinates the synthesis of globin with the availability of heme.
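The reticulocyte mechanism is a short causal chain from heme supply to globin translation. The sketch below traces that chain step by step as Boolean states. It simplifies the biology: real responses are graded rather than all-or-none, and the function name and state labels are invented for this illustration.

```python
# Minimal Boolean sketch of heme-dependent control of globin translation in
# reticulocytes (all-or-none simplification of a graded response).


def reticulocyte_translation_state(heme_sufficient: bool) -> dict[str, bool]:
    """Trace each step from heme availability to globin mRNA translation."""
    hcr_active = not heme_sufficient              # HCR kinase is activated when heme runs short
    eif2_phosphorylated = hcr_active              # active HCR phosphorylates eIF2
    eif2_sequestered = eif2_phosphorylated        # phospho-eIF2 forms a stable complex with eIF2B
    eif2_recycled = not eif2_sequestered          # only free eIF2 is recycled for new initiations
    return {
        "HCR active": hcr_active,
        "eIF2 phosphorylated": eif2_phosphorylated,
        "eIF2 sequestered": eif2_sequestered,
        "globin translation": eif2_recycled,      # initiation needs eIF2 to deliver initiator tRNA
    }


if __name__ == "__main__":
    for heme in (True, False):
        print("heme sufficient" if heme else "heme deficient",
              reticulocyte_translation_state(heme))
```

Writing each intermediate as its own variable keeps the order of events explicit: heme deficiency acts on translation only indirectly, through HCR and the sequestration of eIF2.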
Posttranscriptional Gene Silencing Is Mediated by RNA Interference

In higher eukaryotes, including nematodes, fruit flies, plants, and mammals, microRNAs (miRNAs) mediate the silencing of many genes. In a phenomenon first described and explained by Craig Mello and Andrew Fire, the RNAs function by interacting with mRNAs, often in the 3′UTR, resulting in either degradation of the mRNA or inhibition of translation. In either case, the mRNA, and thus the gene that produces it, is silenced. This form of gene regulation controls developmental timing in at least some organisms. It is also used as a mechanism to protect against invading RNA viruses (particularly important in plants, which lack an adaptive immune system) and to control the activity of transposons. In addition, small RNA molecules may play a critical (as yet undefined) role in the formation of heterochromatin. Many miRNAs are present only transiently during development, and these are sometimes referred to as small temporal RNAs (stRNAs). Thousands of different miRNAs have been identified in higher eukaryotes, and they may affect the regulation of a third of mammalian genes. They are transcribed as precursor RNAs ~70 nucleotides long, with internally complementary sequences that form hairpinlike structures. Details of the pathway for processing of miRNAs are described in Fig. 26-26. The precursors are cleaved by endonucleases such as Drosha and Dicer to form short duplexes of 20 to 25 nucleotides. One strand of the processed miRNA is transferred to the target mRNA (or to a viral or transposon RNA), leading to inhibition of translation or degradation of the mRNA (Fig. 28-37a). Some miRNAs bind to and affect a single mRNA and thus affect expression of only one gene. Others interact with multiple mRNAs and form the mechanistic core of regulons that coordinate the expression of multiple genes.

FIGURE 28-37 Gene silencing by RNA interference.
(a) Small temporal RNAs (stRNAs, a class of miRNAs) are generated by Dicer-mediated cleavage of longer precursors that fold to create duplex regions. The stRNAs then bind to mRNAs, leading to degradation of mRNA or inhibition of translation. (b) Double-stranded RNAs designed to interact with a particular target and to function as Dicer substrates can be constructed and introduced into a cell. Dicer processes the duplex RNAs into small interfering RNAs (siRNAs), which interact with the target mRNA. Again, either the mRNA is degraded or translation is inhibited. This gene regulation mechanism has an interesting and very useful practical side. If an investigator introduces into an organism a duplex RNA molecule corresponding in sequence to virtually any mRNA, Dicer cleaves the duplex into short segments, called small interfering RNAs (siRNAs). These bind to the mRNA and silence it (Fig. 28-37b). The process is known as RNA interference (RNAi). In plants, almost any gene can be effectively shut down in this way. Nematodes can readily ingest entire functional RNAs, and simply introducing the duplex RNA into the worm’s diet produces very effective suppression of the target gene. The technique is an important tool in the ongoing efforts to study gene function, because it can disrupt gene function without creating a mutant organism. The procedure can be applied to humans as well. Laboratory-produced siRNAs have been used to block HIV and poliovirus infections in cultured human cells for a week or so at a time. The wider application of RNAi-based pharmaceuticals was initially stymied by the difficulty inherent in delivering RNAi molecules to their required target, given the many nucleases that degrade RNA in human tissues. With recent advances in delivery methods, there are now more than a dozen RNAi pharmaceuticals in advanced clinical trials to treat a range of conditions, from familial amyloidotic polyneuropathy to viral infections and cancer. 
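The practical logic of RNAi is that a duplex built from a target sequence yields short segments whose antisense (guide) strands are, by construction, complementary to the target mRNA. The toy sketch below illustrates that logic under explicit simplifications. It uses fixed 21-nucleotide segments (one length within the 20 to 25 nucleotide range given above). It ignores the short 3′ overhangs that real Dicer products carry, and it discards leftover nucleotides. The function names and the target sequence are hypothetical.

```python
# Toy sketch of RNAi design logic: dice a duplex made from a target sequence into
# fixed-length segments and return the guide (antisense) strand of each segment.
# Simplifications: 21-nt segments, no 3' overhangs, leftover nucleotides discarded.

RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return seq.translate(RNA_COMPLEMENT)[::-1]


def dice(target_mrna: str, size: int = 21) -> list[tuple[str, str]]:
    """Return (sense segment, guide strand) pairs for consecutive full-length segments."""
    pairs = []
    for start in range(0, len(target_mrna) - size + 1, size):
        sense = target_mrna[start:start + size]
        pairs.append((sense, reverse_complement_rna(sense)))
    return pairs


def guide_matches(guide: str, mrna: str) -> bool:
    """True if the guide is perfectly complementary to some stretch of the mRNA."""
    return reverse_complement_rna(guide) in mrna


if __name__ == "__main__":
    target = "ACGU" * 10 + "AA"  # hypothetical 42-nt target sequence
    for sense, guide in dice(target):
        print(sense, guide, guide_matches(guide, target))
```

Because every guide is derived from the target itself, each one pairs perfectly with the mRNA. That is why a duplex corresponding to virtually any mRNA can be used to silence it.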
RNA-Mediated Regulation of Gene Expression Takes Many Forms in Eukaryotes

All RNAs (regardless of their length) that do not encode proteins, including rRNAs and tRNAs, come under the general designation of ncRNAs. Mammalian genomes encode more ncRNAs than coding mRNAs. The ncRNAs in eukaryotes include miRNAs, described above; snRNAs, involved in RNA splicing (see Fig. 26-16); snoRNAs, involved in rRNA modification (see Fig. 26-24); and lncRNAs, already encountered in this chapter. Not surprisingly, additional functional classes of ncRNAs are still being discovered. Here we describe a few more examples of ncRNAs that participate in gene regulation; those longer than 200 nucleotides are designated lncRNAs. Heat shock factor 1 (HSF1) is an activator protein that, in nonstressed cells, exists as a monomer bound by the chaperone Hsp90. Under stress conditions, HSF1 is released from Hsp90 and trimerizes. The HSF1 trimer binds to DNA and activates transcription of genes encoding products required to deal with the stress. An lncRNA called HSR1 (heat shock RNA 1; ∼600 nucleotides) stimulates HSF1 trimerization and DNA binding. HSR1 does not act alone; it functions in a complex with the translation elongation factor eEF1A. Additional RNAs affect transcription in a variety of ways. A 331-nucleotide lncRNA called 7SK, abundant in mammals, binds to the Pol II transcription elongation factor pTEFb (see Table 26-2) and represses transcript elongation. The ncRNA B2 (∼178 nucleotides) binds directly to Pol II during heat shock and represses transcription. The B2-bound Pol II assembles into stable PICs, but transcription is blocked. The mechanism that allows HSF1-responsive genes to be expressed in the presence of B2 remains to be worked out. The recognized roles of ncRNAs in gene expression and in many other cellular processes are rapidly expanding. At the same time, the study of the biochemistry of gene regulation is becoming much less protein-centric.
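The lncRNA label is a length convention: any ncRNA longer than 200 nucleotides qualifies, whatever its function. The short sketch below applies that cutoff to approximate lengths quoted in this chapter. The function name and the "shorter ncRNA" label are invented for this example.

```python
# Apply the length convention from the text: ncRNAs longer than 200 nt are lncRNAs.
LNCRNA_CUTOFF_NT = 200  # strictly greater than this length -> lncRNA


def ncrna_size_class(length_nt: int) -> str:
    """Classify an ncRNA by length alone (says nothing about its function)."""
    return "lncRNA" if length_nt > LNCRNA_CUTOFF_NT else "shorter ncRNA"


# Approximate lengths quoted in this chapter
EXAMPLES = {
    "SRA": 700,
    "HSR1": 600,
    "7SK": 331,
    "B2": 178,
    "miRNA precursor": 70,
}

if __name__ == "__main__":
    for name, length in EXAMPLES.items():
        print(f"{name} (~{length} nt): {ncrna_size_class(length)}")
```

By this rule 7SK, HSR1, and SRA are lncRNAs, whereas B2 (about 178 nucleotides) is an ncRNA that falls below the cutoff, consistent with how the text names each of them.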
Development Is Controlled by Cascades of Regulatory Proteins
For sheer complexity and intricacy of coordination, the patterns of gene regulation that bring about development of a zygote into a multicellular animal or plant have no peer. Development requires transitions in morphology and protein composition that depend on tightly coordinated changes in expression of the genome. More genes are expressed during early development than in any other part of the life cycle. For example, in the sea urchin, an oocyte has about 18,500 different mRNAs, compared with about 6,000 different mRNAs in the cells of a typical differentiated tissue. The mRNAs in the oocyte give rise to a cascade of events that regulate the expression of many genes across both space and time. Several organisms have emerged as important model systems for the study of development, because they are easy to maintain in a laboratory and have relatively short generation times. These include nematodes, fruit flies, zebrafish, mice, and the plant Arabidopsis. Here, we provide a brief discussion of the development of fruit flies. Our understanding of the molecular events during development of Drosophila melanogaster is particularly well advanced and can be used to illustrate patterns and principles of general significance. The life cycle of the fruit fly includes complete metamorphosis during its progression from an embryo to an adult (Fig. 28-38). Among the most important characteristics of the embryo are its polarity (the anterior and posterior parts of the animal are readily distinguished, as are its dorsal and ventral surfaces) and its metamerism (the embryo body is made up of serially repeating segments, each with characteristic features). During development, these segments become organized into a head, thorax, and abdomen. Each segment of the adult thorax has a different set of appendages.
Development of this complex pattern is under genetic control, and a variety of pattern-regulating genes have been discovered that greatly affect the organization of the body. FIGURE 28-38 Life cycle of the fruit fly Drosophila melanogaster. Drosophila undergoes a complete metamorphosis, which means that the adult insect is radically different in form from its immature stages, a transformation that requires extensive alterations during development. By the late embryonic stage, segments have formed, each containing specialized structures from which the various appendages and other features of the adult fly will develop. The Drosophila egg, along with 15 nurse cells, is surrounded by a layer of follicle cells (Fig. 28-39). As the egg cell forms (before fertilization), mRNAs and proteins originating in the nurse and follicle cells are deposited in the egg cell, where some play a critical role in development. Once a fertilized egg is laid, its nucleus divides and the nuclear descendants continue to divide in synchrony every 6 to 10 min. Plasma membranes are not formed around the nuclei, which are distributed within the egg cytoplasm, forming a syncytium. Between the eighth and eleventh rounds of nuclear division, the nuclei migrate to the outer layer of the egg, forming a monolayer of nuclei surrounding the common yolk-rich cytoplasm; this is the syncytial blastoderm. After a few additional divisions, membrane invaginations surround the nuclei to create a layer of cells that form the cellular blastoderm. At this stage, the mitotic cycles in the various cells lose their synchrony. The developmental fate of the cells is determined by the mRNAs and proteins originally deposited in the egg by the nurse and follicle cells. FIGURE 28-39 Early development in Drosophila. During development of the egg, maternal mRNAs and proteins are deposited in the developing oocyte (unfertilized egg cell) by nurse cells and follicle cells.
After fertilization, the nuclei of the egg divide in synchrony within the common cytoplasm (syncytium), then migrate to the periphery. Membrane invaginations surround the nuclei to create a monolayer of cells at the periphery; this is the cellular blastoderm stage. During the early nuclear divisions, several nuclei at the far posterior become pole cells, which later become the germ-line cells. Proteins that, through changes in local concentration or activity, cause the surrounding tissue to take up a particular shape or structure are sometimes referred to as morphogens; they are the products of pattern-regulating genes. As defined by Christiane Nüsslein-Volhard, Edward B. Lewis, and Eric F. Wieschaus, three major classes of pattern-regulating genes (maternal, segmentation, and homeotic genes) function in successive stages of development to specify the basic features of the Drosophila embryo body. Maternal genes are expressed in the unfertilized egg, and the resulting maternal mRNAs remain dormant until fertilization. These provide most of the proteins needed in very early development, until the cellular blastoderm is formed. Some of the proteins encoded by maternal mRNAs direct the spatial organization of the developing embryo at early stages, establishing its polarity. Segmentation genes, transcribed after fertilization, direct the formation of the proper number of body segments. At least three subclasses of segmentation genes act at successive stages: gap genes divide the developing embryo into several broad regions; pair-rule genes, together with segment polarity genes, define 14 stripes that become the 14 segments of a normal embryo. Homeotic genes are expressed still later; they specify which organs and appendages will develop in particular body segments. If all cells divided to produce two identical daughter cells, multicellular organisms would never be more than a ball of identical cells.
A key event in very early development is establishment of mRNA and protein gradients along the body axes, producing asymmetric cell divisions and different cell fates. Some maternal mRNAs have protein products that diffuse through the cytoplasm to create an asymmetric distribution in the egg. Different cells in the cellular blastoderm therefore inherit different amounts of these proteins, setting the cells on different developmental paths. An example is the bicoid gene. The bicoid gene product is a major anterior morphogen. The mRNA from the bicoid gene is synthesized by nurse cells and deposited in the unfertilized egg near its anterior pole. Translated soon after fertilization, the Bicoid protein diffuses through the cell to create, by the seventh nuclear division, a concentration gradient radiating out from the anterior pole (Fig. 28-40). The Bicoid protein contains a homeodomain (p. 1062), encoded by a gene sequence motif called a homeobox and found in many proteins involved in regulating development. Bicoid is multifunctional: it is a transcription factor that activates the expression of several segmentation genes and also a translational repressor that inactivates certain mRNAs. The amount of Bicoid protein in various parts of the embryo increases or decreases the expression of other genes in a threshold-dependent manner. As its concentration varies along its gradient, interactions of the bicoid gene product with proteins and RNAs encoded by the nanos, pumilio, caudal, hunchback, and other regulatory genes also vary to produce different effects along the axis of the developing organism. This results in different developmental fates of cells in the blastoderm, depending on their location. FIGURE 28-40 Distribution of a maternal gene product in a Drosophila egg. (a) Micrograph of an immunologically stained egg (top), showing distribution of the bicoid (bcd) gene product. The graph shows stain intensity along the length of the egg.
This distribution is essential for normal development of the anterior structures in the larva (bottom). (b) If the bcd gene is not expressed by the mother (bcd−/bcd− mutant) and thus no bicoid mRNA is deposited in the egg, the resulting larva has two posteriors (and soon dies). [Republished with permission of Elsevier, from “The bicoid protein determines position in the Drosophila embryo in a concentration-dependent manner” by Wolfgang Driever and Christiane Nüsslein-Volhard, Cell 54:83–93, July 1, 1988; permission conveyed through Copyright Clearance Center, Inc.] Humans do not resemble fruit flies, but the genes and mechanisms involved in development are nevertheless highly conserved. This can be seen in the gene clusters encoding the homeotic or Hox genes, the latter term derived from homeobox. Drosophila has one such cluster, while humans have four (Fig. 28-41), with the genes within the clusters remarkably similar from nematodes to humans. FIGURE 28-41 The Hox gene clusters and their effects on development. (a) Each Hox gene in the fruit fly is responsible for the development of structures in a defined part of the body and is expressed in defined regions of the embryo, as labeled. (b) Drosophila has one Hox gene cluster; the human genome has four. Many of these genes are highly conserved in multicellular animals. Evolutionary relationships, as indicated by sequence alignments, between genes in the fruit fly Hox gene cluster and those in the mammalian Hox gene clusters are shown by dashed lines. Similar relationships among the four sets of mammalian Hox genes are indicated by vertical alignment. [(a) Information from F. R. Turner, University of Indiana, Department of Biology.] The many regulatory genes in these three classes direct the development of an adult fly, with a head, thorax, and abdomen, with the proper number of segments, and with the correct appendages on each segment. 
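The threshold-dependent readout of the Bicoid gradient can be sketched numerically. This sketch assumes a steady-state exponential gradient, C(x) = C0·exp(−x/λ). That form is a common idealization for a protein made at one pole that diffuses and is degraded uniformly. The length constant λ = 0.2 egg length and the thresholds are hypothetical illustrative values, not measurements taken from Fig. 28-40. The function names are likewise hypothetical.

```python
import math

# Illustrative parameters (hypothetical, not measured values).
LENGTH_CONSTANT = 0.2  # decay length, as a fraction of egg length
C0 = 1.0               # relative Bicoid level at the anterior pole


def bicoid_level(x: float, c0: float = C0, lam: float = LENGTH_CONSTANT) -> float:
    """Relative Bicoid level at position x (0 = anterior pole, 1 = posterior pole).

    Assumes a steady-state exponential gradient, C(x) = c0 * exp(-x / lam).
    """
    return c0 * math.exp(-x / lam)


def is_expressed(x: float, threshold: float, c0: float = C0) -> bool:
    """A hypothetical target gene is on wherever Bicoid meets its activation threshold."""
    return bicoid_level(x, c0) >= threshold


def boundary(threshold: float, c0: float = C0, lam: float = LENGTH_CONSTANT) -> float:
    """Posterior edge of the expression domain: solve c0 * exp(-x / lam) = threshold."""
    return lam * math.log(c0 / threshold)


if __name__ == "__main__":
    # A high-threshold target stays near the anterior tip; a low-threshold
    # target extends farther toward the posterior.
    for threshold in (0.5, 0.05):
        print(f"threshold {threshold}: expressed for x < {boundary(threshold):.2f}")
    # Doubling the maternal dose shifts each boundary posteriorly by lam * ln 2.
    print(f"shift on doubling c0: {boundary(0.5, c0=2.0) - boundary(0.5):.3f}")
```

In this model a target gene with a lower activation threshold is expressed over a broader anterior domain. Changing the maternal Bicoid dose shifts every boundary by the same distance, λ·ln(dose ratio).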
Although embryogenesis takes about a day to complete, all these genes are activated during the first four hours. Some mRNAs and proteins are present for only a few minutes at specific points during this period. Some of the genes code for transcription factors that affect the expression of other genes in a kind of developmental cascade. Regulation at the level of translation also occurs, and many of the regulatory genes encode translational repressors, most of which bind to the 3′UTR of the mRNA (Fig. 28-36). Because many mRNAs are deposited in the egg long before their translation is required, translational repression provides an especially important avenue for regulation in developmental pathways. Many of the principles of development outlined above apply to other eukaryotes, from nematodes to humans. Some of the regulatory proteins are conserved. For example, the homeodomains of the proteins encoded by the homeobox-containing genes HOXA7 in mouse and antennapedia in fruit fly differ in only one amino acid residue. Of course, although the molecular regulatory mechanisms may be similar, many of the ultimate developmental events are not conserved (humans do not have wings or antennae). The different outcomes are brought about by differences in the downstream target genes controlled by the Hox genes. The discovery of structural determinants with identifiable molecular functions is the first step in understanding the molecular events underlying development. As more genes and their protein products are discovered, the biochemical side of this vast puzzle will be elucidated in increasingly rich detail.
Stem Cells Have Developmental Potential That Can Be Controlled
If we can understand development, and the mechanisms of gene regulation behind it, we can control it. An adult human has many different types of tissues. Many of the cells are terminally differentiated and no longer divide. If an organ malfunctions due to disease, or a limb is lost in an accident, the tissues are not readily replaced.
Most cells, because of the regulatory processes in place, or even because of the loss of some or all of the genomic DNA, are not easily reprogrammed. Medical science has made organ transplants possible, but organ donors are a limited resource and organ rejection remains a major medical problem. If humans could regenerate their own organs or limbs or nervous tissue, rejection would no longer be an issue. Cures for kidney failure or neurodegenerative disorders could become reality. The key to tissue regeneration lies in stem cells, cells that have retained the capacity to differentiate into various tissues. In humans, after an egg is fertilized, the first few cell divisions create a ball of totipotent cells, called the morula, that have the capacity to differentiate individually into any tissue or even into a complete organism (Fig. 28-42). Continued cell division produces a hollow ball, the blastocyst. The outer cells of the blastocyst eventually form the placenta. The inner layers form the germ layers of the developing fetus: the ectoderm, mesoderm, and endoderm. These cells are pluripotent: they can give rise to cells of all three germ layers and can differentiate into many types of tissues. However, they cannot differentiate into a complete organism. Some of these cells are unipotent: they can develop into only one type of cell and/or tissue. It is the pluripotent cells of the blastocyst, the embryonic stem cells, that are currently used in embryonic stem cell research. FIGURE 28-42 Totipotent and pluripotent stem cells. Cells at the morula stage are totipotent and have the capacity to differentiate into a complete organism. The source of pluripotent embryonic stem cells is the cells in the cavity of the blastocyst. Pluripotent cells give rise to many tissue types but cannot form complete organisms. Stem cells have two functions: to replenish themselves and, at the same time, provide cells that can differentiate.
These tasks are accomplished in multiple ways (Fig. 28-43a). All or parts of the stem cell population can, in principle, be involved in replenishment, differentiation, or both.
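The balance between self-renewal and differentiation can be expressed as simple bookkeeping over the division patterns in Fig. 28-43a. The sketch below tracks expected cell numbers only. It rests on several hypothetical assumptions. Every stem cell divides once per round. Differentiated cells do not divide. Each division yields two stem cells, one stem cell and one differentiated cell, or two differentiated cells, each with a fixed probability. The function and parameter names are illustrative.

```python
def simulate_stem_pool(stem: float, p_renew: float, p_asym: float,
                       p_diff: float, rounds: int) -> list[tuple[float, float]]:
    """Expected (stem, differentiated) cell counts after each division round.

    p_renew: division gives two stem cells
    p_asym:  division gives one stem cell and one differentiated cell
    p_diff:  division gives two differentiated cells
    Deterministic expected values; assumes every stem cell divides once per
    round and differentiated cells do not divide (simplifying assumptions).
    """
    if abs(p_renew + p_asym + p_diff - 1.0) > 1e-9:
        raise ValueError("division-mode probabilities must sum to 1")
    differentiated = 0.0
    history = [(stem, differentiated)]
    for _ in range(rounds):
        # Each division contributes 2, 1, or 0 stem daughters...
        new_stem = stem * (2 * p_renew + p_asym)
        # ...and correspondingly 0, 1, or 2 differentiated daughters.
        differentiated += stem * (p_asym + 2 * p_diff)
        stem = new_stem
        history.append((stem, differentiated))
    return history


if __name__ == "__main__":
    scenarios = {
        "all asymmetric": (0.0, 1.0, 0.0),
        "balanced symmetric": (0.25, 0.5, 0.25),
        "renewal-biased": (0.3, 0.5, 0.2),
    }
    for name, (p_r, p_a, p_d) in scenarios.items():
        stem, diff = simulate_stem_pool(100.0, p_r, p_a, p_d, 5)[-1]
        print(f"{name}: {stem:.1f} stem, {diff:.1f} differentiated after 5 rounds")
```

Each round multiplies the stem pool by 2·p_renew + p_asym, which equals 1 + p_renew − p_diff. In this model, therefore, the pool is maintained only when symmetric renewal and symmetric differentiation occur equally often. Purely asymmetric division is the simplest way to meet that condition.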
FIGURE 28-43 Stem cell proliferation versus differentiation and development. Stem cells must strike a balance between self-renewal and differentiation. (a) Some possible cell division patterns that allow the replenishment of stem cells and production of some differentiated cells. Each cell may produce one stem cell and one differentiated cell, or two differentiated cells, or two stem cells in defined parts of the tissue or culture. Or a gradient of growth conditions can be established, with cell fates differing from one end of the gradient to the other. (b) Establishing a developmental niche through stem cell contact with a cell or group of cells. Molecular signals provided by the niche cells (in this case, in nematodes, a distal tip cell) help orient the mitotic spindle for stem cell division and ensure that one daughter cell retains stem cell properties. Other types of stem cells can potentially be used for medical benefit. In the adult organism, adult stem cells, as products of additional differentiation, have a more limited potential for further development than do embryonic stem cells. For example, the hematopoietic stem cells of bone marrow can give rise to many types of blood cells and also to cells with the capacity to regenerate bone. They are referred to as multipotent. However, these cells cannot differentiate into liver, kidney, or nerve cells. Adult stem cells are often said to have a niche, a microenvironment that promotes stem cell maintenance while allowing differentiation of some daughter cells as replacements for cells in the tissue they serve (Fig. 28-43b). Hematopoietic stem cells in the bone marrow occupy a niche in which signaling from neighboring cells and other cues maintain the stem cell lineage. At the same time, some daughter cells differentiate to provide needed blood cells.
Understanding the niche in which stem cells operate, and the signals the niche provides, is essential in efforts to harness the potential of stem cells for tissue regeneration. The identification and culturing of pluripotent stem cells from human blastocysts was reported by James Thomson and colleagues in 1998. This advance led to the long-term availability of established cell lines for research. All stem cells present problems for human medical applications. Adult stem cells have a limited capacity to regenerate tissues, are generally present in small numbers, and are hard to isolate from an adult human. Embryonic stem cells have much greater differentiation potential and can be cultured to generate large numbers of cells, but their use is accompanied by ethical concerns related to the necessary destruction of human embryos. Identifying a source of plentiful and medically useful stem cells that does not raise such concerns remains a major goal of medical research. Our ability to culture stem cells (i.e., maintain them in an undifferentiated state), and to manipulate them to grow and differentiate into particular tissues, is very much a function of our understanding of developmental biology. Thus far, mouse and human embryonic stem cells have been used for most research. Although both types of stem cells are pluripotent, they require very different culture conditions, optimized to allow cell division indefinitely without differentiation. Mouse embryonic stem cells are grown on a layer of gelatin and require the presence of leukemia inhibitory factor (LIF). Human embryonic stem cells are grown on a feeder layer of mouse embryonic fibroblasts and require basic fibroblast growth factor (bFGF, or FGF2). The use of a feeder cell layer implies that the mouse cells are providing a diffusible product or some surface signal, not yet known, that is needed by human stem cells to either promote cell division or prevent differentiation. 
A significant advance, reported in 2007, centers on success in reversing differentiation. In effect, skin cells — first from mice, then from humans — have been reprogrammed to take on the characteristics of pluripotent stem cells. The reprogramming involves manipulations to get the cells to express at least four transcription factors, Oct4, Sox2, Nanog, and Lin28, all of which are known to help maintain the stem cell–like state. Gradual improvements in this technology may make the harvesting of embryonic stem cells unnecessary and provide a source of stem cells that is genetically matched to a prospective patient. Our discussion of developmental regulation and stem cells brings us full circle, back to a biochemical beginning. Evolution appropriately provides the first and last words of this book. If evolution is to generate the kind of changes in an organism that would render it a different species, it is the developmental program that must be affected. Developmental and evolutionary processes are closely allied, each informing the other (Box 28-1). The continuing study of biochemistry has everything to do with enriching the future of humanity and understanding our origins. BOX 28-1 Of Fins, Wings, Beaks, and Things South America has several species of seed-eating finches, commonly called grassquits. About 3 million years ago, a small group of grassquits, of a single species, took flight from the continent’s Pacific coast. Perhaps driven by a storm, they lost sight of land and traveled nearly 1,000 km. Small birds such as these might easily have perished on such a journey, but the smallest of chances brought this group to a newly formed volcanic island in an archipelago later to be known as the Galápagos. It was a virgin landscape with untapped plant and insect food sources, and the newly arrived finches survived. Over the years, new islands formed and were colonized by new plants and insects — and by the finches. 
The birds exploited the new resources on the islands, and groups of birds gradually specialized and diverged into new species. By the time Charles Darwin stepped onto the islands in 1835, many different finch species were to be found on the various islands of the archipelago, feeding on seeds, fruits, insects, pollen, or even blood. The diversity of living creatures was a source of wonder for humans long before scientists sought to understand its origins. The extraordinary insight handed down to us by Darwin, inspired in part by his encounter with the Galápagos finches, provided a broad explanation for the existence of organisms with a vast array of appearances and characteristics. It also gave rise to many questions about the mechanisms underlying evolution. Answers to those questions have started to appear, first through the study of genomes and nucleic acid metabolism in the last half of the twentieth century, and more recently through an emerging field nicknamed evo-devo — a blend of evolutionary and developmental biology. In its modern synthesis, the theory of evolution has two main elements: mutations in a population generate genetic diversity; natural selection then acts on this diversity to favor individuals with more useful genomic tools and to disfavor others. Mutations occur at significant rates in every individual’s genome, in every cell (see Section 8.3). Advantageous mutations in single-celled organisms or in the germ line of multicellular organisms can be inherited, and they are more likely to be inherited (that is, passed on to greater numbers of offspring) if they confer an advantage. It is a straightforward scheme. But many have wondered whether it is enough to explain, say, the many different beak shapes in the Galápagos finches or the diversity of size and shape among mammals. 
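The selection half of this scheme can be made concrete with the standard deterministic recursion for a variant in a haploid population. In this model, carriers leave (1 + s) offspring for every one left by noncarriers. The sketch ignores genetic drift, recurrent mutation, and migration. The selection coefficient, the starting frequency, and the function name are illustrative assumptions.

```python
def allele_frequency_trajectory(p0: float, s: float, generations: int) -> list[float]:
    """Frequency of a variant with relative fitness 1 + s, generation by generation.

    Deterministic haploid selection: p' = p * (1 + s) / (1 + p * s).
    Ignores drift, new mutation, and migration (simplifying assumptions).
    """
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must be a frequency between 0 and 1")
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Mean fitness of the population is p*(1 + s) + (1 - p)*1 = 1 + p*s.
        p = p * (1 + s) / (1 + p * s)
        freqs.append(p)
    return freqs


if __name__ == "__main__":
    # A rare variant (0.1%) with a 1% reproductive advantage.
    traj = allele_frequency_trajectory(0.001, 0.01, 1000)
    for gen in (0, 250, 500, 750, 1000):
        print(f"generation {gen}: frequency {traj[gen]:.3f}")
```

In this model the odds p/(1 − p) grow by a factor of (1 + s) each generation. Even a 1% advantage therefore carries a rare variant to high frequency within about a thousand generations.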
Until recent decades, there were several widely held assumptions about the evolutionary process: that many mutations and new genes would be needed to bring about a new physical structure, that more-complex organisms would have larger genomes, and that very different species would have few genes in common. All of these assumptions were wrong. Modern genomics has revealed that the human genome contains fewer genes than expected — not many more than the fruit fly genome and fewer than some amphibian genomes. The genomes of every mammal, from mouse to human, are surprisingly similar in the number, types, and chromosomal arrangement of genes. Meanwhile, evo-devo is telling us how complex and very different creatures can evolve within these genomic realities. In the late nineteenth century, English biologist William Bateson studied animals with homeotic mutations — creatures with body parts growing in the wrong location. Bateson used his observations to challenge the Darwinian notion that evolutionary change would have to be gradual. Recent studies of the genes that control organismal development have put an exclamation point on Bateson’s ideas. Subtle changes in regulatory patterns during development, reflecting just one or a few mutations, can result in startling physical changes and fuel surprisingly rapid evolution. The Galápagos finches provide a wonderful example of the link between evolution and development. There are at least 14 (some specialists list 15) species of Galápagos finches, distinguished in large measure by their beak structure. The ground finches, for example, have broad, heavy beaks adapted to crushing large, hard seeds. The cactus finches have longer, slender beaks ideal for probing cactus fruits and flowers (Fig. 1). Clifford Tabin and colleagues carefully surveyed a set of genes expressed during avian craniofacial development. 
They identified a single gene, Bmp4, whose expression level correlated with formation of the more robust beaks of the ground finches. More-robust beaks were also formed in chicken embryos when high levels of Bmp4 were artificially expressed in the appropriate tissues, confirming the importance of Bmp4. In a similar study, the formation of long, slender beaks was linked to the expression of calmodulin (see Fig. 12-17) in particular tissues at appropriate developmental stages. Thus, major changes in the shape and function of the beak can be brought about by subtle changes in the expression of just two genes involved in developmental regulation. Very few mutations are required, and the needed mutations affect regulation. New genes are not required. FIGURE 1 Evolution of new beak structures to exploit new food sources. In the Galápagos finches, the different beak structures of the cactus finch and the large ground finch, which feed on different, specialized food sources, were produced to a large extent by a few mutations that altered the timing and level of expression of just two genes: those encoding calmodulin (CaM) and Bmp4. [Information from A. Abzhanov et al., Nature 442:563, 2006, Fig. 4.] The system of regulatory genes that guides development is remarkably conserved among all vertebrates. Elevated expression of Bmp4 in the right tissue at the right time leads to more-robust jaw parts in zebrafish. The same gene plays a key role in tooth development in mammals. The development of eyes is triggered by the expression of a single gene, Pax6, in fruit flies and in mammals. The mouse Pax6 gene will trigger the development of fruit fly eyes in the fruit fly, and the fruit fly Pax6 gene will trigger the development of mouse eyes in the mouse. In each organism, these genes are part of the much larger regulatory cascade that ultimately creates the correct structures in the correct locations in each organism. 
The cascade is ancient; for example, the Hox genes (described in the text) have been part of the developmental program of multicellular animals for more than 500 million years. Subtle changes in the cascade can have large effects on development, and thus on the ultimate appearance, of the organism. These same subtle changes can fuel remarkably rapid evolution. For example, the 400 to 500 described species of cichlids (spiny-finned fish) in Lake Malawi and Lake Victoria on the African continent are all derived from one or a few populations that colonized each lake in the past 100,000 to 200,000 years. The Galápagos finches simply followed a path of evolution and change that living creatures have been traveling for billions of years.
SUMMARY 28.3 Regulation of Gene Expression in Eukaryotes
In eukaryotes, large changes in chromatin structure accompany the expression of a gene. Transcriptionally inactive heterochromatin is opened up by chromatin remodeling proteins. These eject, replace, or modify nucleosomes to allow other proteins, mainly RNA polymerase components and regulators, to access sites required to initiate transcription. In eukaryotes, positive regulation is more common than negative regulation. Promoters for Pol II typically have a TATA box and Inr sequence, as well as multiple binding sites for transcription activators. The latter sites, sometimes located hundreds or thousands of base pairs away from the TATA box, are called upstream activator sequences in yeast and enhancers in higher eukaryotes. Regulation of transcriptional activity generally requires large complexes of proteins. These include basal transcription factors, activators, coactivators, architectural regulators, and the enzymes that modify and remodel chromatin. The effects of transcription activators on Pol II are facilitated by coactivator protein complexes such as Mediator.
The well-studied yeast genes involved in galactose metabolism provide examples of both positive and negative regulation in a eukaryote. The activators have a modular structure, with distinct activation and DNA-binding domains. Hormones affect the regulation of gene expression in one of two ways. Steroid hormones interact directly with intracellular receptors that are DNA-binding regulatory proteins; binding of the hormone has either positive or negative effects on the transcription of targeted genes. Nonsteroid hormones bind to cell surface receptors, triggering a signaling pathway that can lead to phosphorylation of a regulatory protein, affecting its activity. Translational regulation is particularly important in eukaryotes. Modulating the translation of an mRNA stored in the cytoplasm affords a more rapid response to cellular challenges than de novo assembly of transcription complexes and mRNA synthesis. MicroRNAs (miRNAs) are involved in gene silencing during development and as an antiviral defense. The pathway for processing miRNAs from larger precursors has been harnessed by researchers to develop the gene-silencing technology called RNA interference, or RNAi. Regulation mediated by ncRNAs plays an important role in eukaryotic gene expression, with known mechanisms including interactions with proteins, mRNA, and other ncRNAs. Development of a multicellular organism presents the most complex regulatory challenge. The fate of cells in the early embryo is determined by establishment of anterior-posterior and dorsal-ventral gradients of proteins that act as transcription activators or translational repressors, regulating the genes required for development of structures appropriate to a particular part of the organism. Sets of regulatory genes operate in temporal and spatial succession, transforming given areas of an egg cell into predictable structures in the adult organism.
The differentiation of stem cells into functional tissues can be controlled by extracellular signals and conditions.
Chapter Review
KEY TERMS
Terms in bold are defined in the glossary.
induction, repression, housekeeping genes, specificity factor, repressor, activator, noncoding RNA (ncRNA), long noncoding RNA (lncRNA), operator, negative regulation, positive regulation, architectural regulator, operon, helix-turn-helix, zinc finger, homeodomain, homeobox, RNA recognition motif (RRM), leucine zipper, basic helix-loop-helix, combinatorial control, cAMP receptor protein (CRP), regulon, transcription attenuation, translational repressor, stringent response, riboswitch, phase variation, chromatin remodeling, SWI/SNF, histone acetyltransferases (HATs), enhancers, upstream activator sequences (UASs), transcription activators, coactivators, basal transcription factors, preinitiation complex (PIC), high mobility group (HMG), Mediator, TATA-binding protein (TBP), hormone response element (HRE), RNA interference (RNAi), polarity, metamerism, maternal genes, maternal mRNAs, segmentation genes, gap genes, pair-rule genes, segment polarity genes, homeotic genes, totipotent, pluripotent, unipotent, embryonic stem cells
PROBLEMS
1. Effect of mRNA and Protein Stability on Regulation
E. coli cells are growing in a medium with glucose as the sole carbon source. After the sudden addition of tryptophan, the cells continue to grow and divide every 30 min. Describe (qualitatively) how the amount of tryptophan synthase activity in the cells changes with time under each condition:
a. The trp mRNA is stable (degrades slowly over many hours).
b. The trp mRNA degrades rapidly, but tryptophan synthase is stable.
c. The trp mRNA and tryptophan synthase both degrade rapidly.
2. The Lactose Operon
A researcher engineers a lac operon on a plasmid but inactivates all parts of the lac operator (lacO) and the lac promoter, replacing them with the binding site for the LexA repressor (which acts in the SOS response) and a promoter regulated by LexA.
She then introduces the plasmid into E. coli cells that have a lac operon with an inactive lacZ gene. Under what conditions will these transformed cells produce β-galactosidase?
3. Negative Regulation
Describe the probable effects on gene expression in the lac operon of each mutation:
a. Mutation in the lac operator that deletes most of O1
b. Mutation in the lacI gene that eliminates binding of repressor to operator
c. Mutation in the promoter near position −10 that increases its similarity to the E. coli consensus sequence
d. Mutation in the lacI gene that eliminates binding of repressor to lactose
e. Mutation in the promoter near position −10 that decreases its similarity to the E. coli consensus sequence
4. Specific DNA Binding by Regulatory Proteins
A typical bacterial repressor protein discriminates between its specific DNA-binding site (operator) and nonspecific DNA by a factor of 10⁴ to 10⁶. About 10 molecules of repressor per cell are sufficient to ensure a high level of repression. Assume that a very similar repressor existed in a human cell, with a similar specificity for its binding site. How many copies of the repressor would a human cell require to elicit a level of repression similar to that in the bacterial cell? (Hint: The E. coli genome contains about 4.6 million bp; the human haploid genome has about 3.2 billion bp.)
5. Repressor Concentration in E. coli
The dissociation constant for a particular repressor-operator complex is very low, about 10⁻¹³ M. An E. coli cell (volume 2 × 10⁻¹² mL) contains 10 copies of the repressor. Calculate the cellular concentration of the repressor protein. How does this value compare with the dissociation constant of the repressor-operator complex? What is the significance of this answer?
6. Catabolite Repression
E. coli cells are growing in a medium that contains lactose but no glucose.
Indicate whether each of the following changes or conditions would increase, decrease, or not change the expression of the lac operon. It may be helpful to draw a model depicting what is happening in each situation. a. Addition of a high concentration of glucose b. A mutation that prevents dissociation of the Lac repressor from the operator c. A mutation that completely inactivates β - galactosidase d. A mutation that completely inactivates galactoside permease e. A mutation that prevents binding of CRP to its binding site near the lac promoter 7. Transcription Attenuation How would each manipulation of the leader region of the trp mRNA affect transcription of the E. coli trp operon? a. Increasing the distance (number of bases) between the leader peptide gene and sequence 2 b. Increasing the distance between sequences 2 and 3 c. Removing sequence 4 d. Changing the two Trp codons in the leader peptide gene to His codons e. Eliminating the ribosome-binding site for the gene that encodes the leader peptide f. Changing several nucleotides in sequence 3 so that it can base-pair with sequence 4 but not with sequence 2 8. Repressors and Repression How would a mutation in the lexA gene that prevents autocatalytic cleavage of the LexA protein affect the SOS response in E. coli? 9. Regulation by Recombination In the phase variation system of Salmonella, what would happen to the cell if the Hin recombinase became more active and promoted recombination (DNA inversion) several times in each cell generation? 10. Initiation of Transcription in Eukaryotes A biochemist discovers a new RNA polymerase activity in crude extracts of cells derived from an exotic fungus. The RNA polymerase initiates transcription only from a single, highly specialized promoter. As the biochemist purifies the polymerase, its activity declines, and the purified enzyme is completely inactive unless he adds crude extract to the reaction mixture. Suggest an explanation for these observations. 11. 
Functional Domains in Regulatory Proteins A biochemist replaces the DNA-binding domain of the yeast Gal4 protein with the DNA-binding domain from the Lac repressor and finds that the engineered protein no longer regulates transcription of the GAL genes in yeast. Draw a diagram of the different functional domains you would expect to find in the Gal4 protein and in the engineered protein. Why does the engineered protein no longer regulate transcription of the GAL genes? What might be done to the DNA-binding site recognized by this chimeric protein to make it functional in activating transcription of GAL genes? 12. Nucleosome Modification during Transcriptional Activation To prepare genomic regions for transcription, cells acetylate and methylate certain histones in the resident nucleosomes at specific locations. Once transcription is no longer needed, cells need to reverse these modifications. In mammals, peptidylarginine deiminases (PADIs) reverse the methylation of Arg residues in histones. The reaction promoted by these enzymes does not yield unmethylated arginine. Instead, it produces citrulline residues in the histone. What is the other product of the reaction? Suggest a mechanism for this reaction. 13. Gene Repression in Eukaryotes Explain why repression of a eukaryotic gene by an RNA might be more efficient than repression by a protein repressor. 14. Inheritance Mechanisms in Development A Drosophila egg that is bcd−/bcd− may develop normally, but the adult fruit fly will not be able to produce viable offspring. Explain. DATA ANALYSIS PROBLEM 15. Engineering a Genetic Toggle Switch in E. coli Gene regulation is o en described as an “on or off” phenomenon: a gene is either fully expressed or not expressed at all. In fact, repression and activation of a gene involve ligand- binding reactions, so genes can show intermediate levels of expression when intermediate levels of regulatory molecules are present. For example, for the E. 
coli lac operon, consider the binding equilibrium of the Lac repressor, operator DNA, and inducer (see Fig. 28-8). Although this is a complex, cooperative process, it can be approximately modeled by the following reaction (R is repressor; IPTG is the inducer isopropyl-β -D-thiogalactoside): R + IPT G Kd = 10−4M ⇌ R ∙IPT G Free repressor, R, binds to the operator and prevents transcription of the lac operon; the R • IPTG complex does not bind to the operator, and thus transcription of the lac operon can proceed. a. Using Equation 5-8, we can calculate the relative expression level of the proteins of the lac operon as a function of [IPTG]. Use this calculation to determine over what range of [IPTG] the expression level would vary from 10% to 90%. b. Describe qualitatively the level of lac operon proteins present in an E. coli cell before, during, and a er induction with IPTG. You do not need to give the amounts at exact times — just indicate the general trends. Gardner, Cantor, and Collins (2000) set out to make a “genetic toggle switch” — a gene-regulatory system with two key characteristics, A and B, of a light switch. (A) It has only two states: it is either fully on or fully off; it is not a dimmer switch. In biochemical terms, the target gene or gene system (operon) is either fully expressed or not expressed at all; it cannot be expressed at an intermediate level. (B) Both states are stable: although you must use a finger to flip the light switch from one state to the other, once you have flipped it and removed your finger, the switch stays in that state. In biochemical terms, exposure to an inducer or some other signal changes the expression state of the gene or operon, and it remains in that state once the signal is removed. c. Explain how the lac operon lacks both characteristics A and B. 
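For part a, the calculation can be sketched in a few lines of Python. This is a minimal sketch under assumptions the problem does not spell out: that Equation 5-8 is the hyperbolic binding relation θ = [L]/([L] + K_d), that relative lac operon expression equals θ (the fraction of repressor present as R·IPTG, per the simplified noncooperative model above), and that IPTG is in large excess so free and total [IPTG] are about equal. The function names are hypothetical.

```python
"""Part a sketch: graded lac expression in a simple, noncooperative model.

Assumptions (not from the text): relative expression = theta, the fraction of
repressor bound to IPTG, with theta = [L] / ([L] + Kd) taken as the form of
Equation 5-8; IPTG is in large excess, so free [IPTG] ~ total [IPTG].
"""

KD_IPTG = 1e-4  # M; Kd for R + IPTG <=> R.IPTG (value given in the problem)


def fraction_bound(iptg_molar, kd_molar=KD_IPTG):
    """Fraction of repressor present as R.IPTG at a given [IPTG]."""
    return iptg_molar / (iptg_molar + kd_molar)


def iptg_for_fraction(theta, kd_molar=KD_IPTG):
    """Invert theta = L / (L + Kd) to get L = Kd * theta / (1 - theta)."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    return kd_molar * theta / (1.0 - theta)


if __name__ == "__main__":
    low = iptg_for_fraction(0.10)   # [IPTG] giving 10% expression
    high = iptg_for_fraction(0.90)  # [IPTG] giving 90% expression
    print(f"10% expression at [IPTG] = {low:.2e} M")
    print(f"90% expression at [IPTG] = {high:.2e} M")
    print(f"fold range of [IPTG]: {high / low:.0f}")
```

Under these assumptions, expression climbs from 10% at $K_d/9$ (about 11 µM IPTG) to 90% at $9K_d$ (900 µM), an 81-fold span of inducer concentration. A graded response of this kind is what characteristic A in part c rules out.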
To make their “toggle switch,” Gardner and coworkers constructed a plasmid from the following components:
ori: An origin of replication
ampR: A gene conferring resistance to the antibiotic ampicillin
OPlac: The operator-promoter region of the E. coli lac operon
OPλ: The operator-promoter region of λ phage
lacI: The gene encoding the lac repressor protein, LacI. In the absence of IPTG, this protein strongly represses OPlac; in the presence of IPTG, it allows full expression from OPlac.
repts: The gene encoding a temperature-sensitive mutant λ repressor protein, repts. At 37 °C, this protein strongly represses OPλ; at 42 °C, it allows full expression from OPλ.
GFP: The gene for green fluorescent protein (GFP), a highly fluorescent reporter protein (see Fig. 9-16)
T: Transcription terminator
The investigators arranged these components, as shown in the following figure, so that the two promoters were reciprocally repressed: OPlac controlled expression of repts, and OPλ controlled expression of lacI. The state of this system was reported by the expression level of GFP, which was also under the control of OPlac.
d. The constructed system has two states: GFP-on (high level of expression) and GFP-off (low level of expression). For each state, describe which proteins are present and which promoters are being expressed.
e. Treatment with IPTG would be expected to toggle the system from one state to the other. From which state to which? Explain your reasoning.
f. Treatment with heat (42 °C) would be expected to toggle the system from one state to the other. From which state to which? Explain your reasoning.
g. Why would this plasmid be expected to have characteristics A and B as described above?
To confirm that their construct did indeed exhibit these characteristics, Gardner and colleagues first showed that, once switched, the GFP expression level (high or low) was stable for long periods of time (characteristic B). Next, they measured the GFP level at different concentrations of the inducer IPTG, with the following results. They noticed that the average GFP expression level was intermediate at concentration X of IPTG. However, when they measured the GFP expression level in individual cells at [IPTG] = X, they found either a high level or a low level of GFP; no cells showed an intermediate level.
h. Explain how this finding demonstrates that the system has characteristic A. What is happening to cause the bimodal distribution of expression levels at [IPTG] = X?

Reference
Gardner, T.S., C.R. Cantor, and J.J. Collins. 2000. Construction of a genetic toggle switch in Escherichia coli. Nature 403:339–342.
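The reasoning in parts d through g can be explored with a small numerical sketch. It assumes a generic, deterministic mutual-repression model rather than the actual equations or parameters of Gardner and colleagues: LacI and repts are each made at a maximal rate ALPHA, repressed by the other with Hill coefficient N, and lost by first-order decay. IPTG is treated as fully inactivating LacI and 42 °C as fully inactivating repts, and GFP is taken to track repts because both are expressed from OPlac. The constants, the forward-Euler integrator, and the helper names `simulate` and `gfp_state` are illustrative assumptions.

```python
"""Deterministic sketch of a two-repressor toggle switch (illustrative only).

Assumed, not from the paper: dimensionless concentrations, equal maximal
synthesis rates, Hill-type repression, first-order loss, and all-or-none
inducers (IPTG inactivates LacI; 42 degrees C inactivates rep_ts).
GFP is taken to track rep_ts because both are expressed from OPlac.
"""

ALPHA = 10.0  # maximal synthesis rate of each repressor (assumed)
N = 2.0       # Hill coefficient for repression (assumed)
DT = 0.01     # forward-Euler time step (assumed)


def simulate(lac_i, rep_ts, duration, iptg=False, heat=False):
    """Integrate the model for `duration` time units; return (lac_i, rep_ts)."""
    for _ in range(round(duration / DT)):
        # IPTG-bound LacI cannot repress OPlac; heat-inactivated rep_ts cannot repress OPlambda.
        active_lac_i = 0.0 if iptg else lac_i
        active_rep_ts = 0.0 if heat else rep_ts
        # lacI is transcribed from OPlambda, which rep_ts represses.
        d_lac_i = ALPHA / (1.0 + active_rep_ts ** N) - lac_i
        # rep_ts (and GFP) are transcribed from OPlac, which LacI represses.
        d_rep_ts = ALPHA / (1.0 + active_lac_i ** N) - rep_ts
        lac_i += DT * d_lac_i
        rep_ts += DT * d_rep_ts
    return lac_i, rep_ts


def gfp_state(rep_ts):
    """Classify the switch by its OPlac output (rep_ts used as a GFP proxy)."""
    return "GFP-on" if rep_ts > ALPHA / 2 else "GFP-off"


if __name__ == "__main__":
    lac_i, rep_ts = simulate(ALPHA, 0.0, 50)  # start with LacI high
    print("initial state:", gfp_state(rep_ts))
    lac_i, rep_ts = simulate(lac_i, rep_ts, 20, iptg=True)  # IPTG pulse
    lac_i, rep_ts = simulate(lac_i, rep_ts, 50)             # IPTG removed
    print("after IPTG pulse:", gfp_state(rep_ts))
    lac_i, rep_ts = simulate(lac_i, rep_ts, 20, heat=True)  # 42 degrees C pulse
    lac_i, rep_ts = simulate(lac_i, rep_ts, 50)             # back to 37 degrees C
    print("after heat pulse:", gfp_state(rep_ts))
```

In this sketch a transient IPTG pulse leaves the system GFP-on after the inducer is removed, and a transient heat pulse returns it to GFP-off; each state persists without the signal, which mirrors characteristic B. Because the model is noise-free, it does not capture the cell-to-cell variation behind the bimodal distribution in part h.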