Evolving the Bacterial Flagellum Through Mutation and Cooption: Part V
In the previous essay (part IV), I attempted to present the non-teleological explanation for the origin of the bacterial flagellum, coincidental cooption of alternative functions (CCAF), in its strongest possible form. In Figure 2, the evolution of the bacterial flagellum was nothing more than the gradual addition of parts over time. Yet there is one thing that is somewhat misleading about Figure 2. It only presents a picture of the evolution of a six-part IC system. But we've seen that the Ur-IC state of the bacterial flagellum likely entailed around 20 gene products. As a result, a figure outlining the gradual cooption of parts should contain at least 20 components (A-E, G-U), 38 pre-flagellar functional states (F1-F38), and 19 cooption events. In other words, to propose a series of IC systems that gradually increase in complexity, step-by-step (a 3-part system becomes a 4-part system becomes a 5-part system, etc.), 38 non-flagellar functions are involved. One can attempt to cut down on the 38 functions by arbitrarily eliminating some (due to the ad hoc load of positing 38 unknown functions), but to do so, one must dip into simultaneous cooption (since we are constrained to explain the origin of these 20 functional sequences) and thus weaken the whole hypothesis [1].
When it is realized that the CCAF hypothesis is more involved than many think, various relevant insights will emerge. And the first important insight is that the EFM hypothesis fails to map to the gradual CCAF hypothesis. This is because the EFM hypothesis identifies only two cooption events, not nineteen. It also only attempts the identification of two preflagellar functions for flagellar components: a secreted protein (that somehow became the filament) and an ion channel (that somehow became the stator/motor). Two preflagellar functions is a long way from thirty-eight.
If we are to take the EFM hypothesis seriously, we should consider it in light of what we know about the Ur-IC flagellum and map the three basic components of the story to the players that did in fact come into existence. As such, the export machinery would be represented by the N-terminus of FliG and FliM, FliN, FliF, FlhB, FliQ, FliR, FliP, FliI, and FlhA. The "filament" would be represented by everything from the drive-shaft to the cap, including FliD, FliC, FlgE, FlgL, FlgK, FlgB, FlgC, FlgG, and FliE. Finally, the motor would be represented by MotA, MotB, and the C-terminus of FliG. Thus, the EFM hypothesis, as formulated, can offer us nothing more than a two-step simultaneous cooption scenario as shown in Figure 1.

Figure 1. The EFM Hypothesis. It begins with multi-component export machinery and invokes an initial cooption event to explain the origin of the filament. But because the "filament" of the flagellum is also a multi-component system, simultaneous, not gradual, cooption is being invoked. Its non-flagellar function is not provided. The second cooption event, where an ion channel is added to create the flagellum, invokes the same thing.
If we are to map the EFM hypothesis to the Ur-IC state, it fails as a gradual cooption scenario. To rescue the EFM hypothesis from the pit of simultaneous cooption (i.e., random assembly), one must begin to tease apart the yellow, blue, and pink boxes above and assign autonomous functional roles to each apart from the complex. Only such an effort will move us towards the gradual route of Figure 2 from the previous essay. Until such a successful effort is made, the EFM hypothesis remains a story about simultaneous cooption, where nine parts are being added to a nine-part machine, which finally accepts the addition of another three-part complex [2].
The IC Grip
Not only does the EFM hypothesis fail to map to a gradual CCAF scenario, but it never truly escapes the grip of irreducible complexity. To appreciate this, let us envision the gradual formation of the flagellum in light of the EFM hypothesis and CCAF. Let us assume that the construction of the flagellum roughly reflects its evolutionary assembly through cooption (a molecular version of ontogeny reflecting phylogeny). After all, the EFM hypothesis adopts this strategy.
The Rings
When the flagellum is constructed, the first thing that is laid down is the M-ring (also called the MS-ring) composed of the FliF gene product [2]. After the M-ring is formed in the inner membrane, FliG is added. Finally, FliM and FliN are added, but they need the help of FliG. The result is the formation of the C-ring/switch complex/rotor. Yet here is the key point: "the switch complex forms prior to the other cytoplasmic substructures, including the export apparatus." [3] In other words, while the EFM begins with the export apparatus, in flagella, the switch must be formed first in order to form the export apparatus. Since the switch and rotor function only as part of the rotary flagellum, from the start the flagellum is constructed with its motility function "in mind." [4]
At this point, we simply need to ask what function the M-ring served at the beginning. When one BLASTs through the bacterial, eukaryal, and archaeal genomes, FliF has no homolog anywhere. FliF does function as part of the type III system, but as explained earlier [5], it has significantly changed and is probably incapable of substituting for flagellar FliF. It likely remains as part of the type III system to facilitate the construction of the export apparatus and is not directly involved in export. The bottom line is that FliF not only has no homolog, but its function is flagellum-specific (if we ignore the more recently acquired role in type III secretion). Thus, the EFM hypothesis begins with a protein ring in the inner membrane that has no function.
And the same basic story holds true for FliG, FliN, and FliM. These proteins have no flagellum-independent homologs and their function is flagellum-specific. And the FliF-FliG/FliF-FliG-FliM/N complexes formed by hypothetical cooption events have no function; no function emerges as a consequence of turning a protein ring into a four-part system. But any non-teleological scenario must explain the origin of this four-part C-ring that anticipates the rotary flagellum before we can even turn to the export apparatus. Recall that the key feature of any cooption explanation is to provide a function to an IC component that is not dependent on the IC system. Thus, FliF, FliG, FliN, and FliM require alternative functions that predated the flagellum. Yet there are none. We can always assert the existence of some unknown functions in an ad hoc fashion, but if the only reason for this is to rationalize an a priori belief that the flagellum evolved, this would hardly be a convincing move. In fact, if the flagellum was indeed designed, we could nevertheless always imagine a CCAF explanation when asserting unknown functions in such an ad hoc manner (more on this later).
Thus far, we are left with an M-ring without function. Then a four-part C-ring without function. Without functions for the independent parts and their progressive "intermediates," we are merely left with a veiled appeal to non-Darwinian random assembly. In other words, several unselected sequential steps are being proposed.
Export
The next step would be to add the export machinery. Recall this was the first step in the EFM hypothesis, highlighting how it never really got off the ground. Yet the problem of IC also surfaces here. First, there are six conserved, independent type III components found in all flagella and type III secretory machines. This strongly suggests this is a six-part IC subsystem. In fact, this hypothesis has been supported by a recent genetic analysis of loss-of-function mutants involving all six components of the type III system from Salmonella enterica [6]. When the mutants were analyzed one at a time, it was determined that all six components are required for export function. Thus, the CCAF hypothesis must come up with alternative functions for all six of these components. And since there is no evidence that a subset of these six components can carry out any biological function, it appears that simultaneous cooption (i.e., random assembly) is again being tacitly postulated. Of course, we can again imagine unknown functional states for all six independent components, and their progressive partial assembly, but this is yet another ad hoc move.
So far we have a 10-part assembly. Yet taking into consideration all the scientific data available today, it is not until the tenth part is added that we finally get a functioning "export apparatus." This means that one has been invoking non-Darwinian random assembly (unselected sequential additions) all along to explain the origin of almost one-half of the flagellum.
The Driveshaft
Finally, it's time to co-opt our filament. But now we're back to the problems discussed in Part II of this series [7]. If we count everything from the base of the rod to the cap, we're dealing with nine more proteins. We might be able to shrink the number by invoking gene duplication, but even this move would be debatable.
Let's first consider FliE. A recent study has provided good evidence for the role and position of this component [8]. FliE appears to form a junction between the M-ring and FlgB, which is the proximal component of the driveshaft (or rod). Its primary role appears to be architectural. As the authors of this study note:
The axial proteins form a long, continuous, hollow cylindrical structure consisting of the rod, hook, hook-filament junction proteins, filament, and filament cap. Not only is this structure important for the function of the flagellum as a motor organelle but its central channel or lumen is the physical pathway by which axial protein subunits reach their assembly destination, the tip of the growing flagellum. The component substructures are all built with the following common theme. The subunits lie on the so-called basic helix of a cylindrical lattice. This underlying local helical symmetry (not to be confused with the macroscopic helicity of the flagellar filament) means that, in principle, subunits could be added indefinitely like the steps of a helical staircase. This is in contrast to the substructures of the basal body such as the MS ring, which have closed annular symmetry and thus a fixed number of subunits (thought to be about 26 in the case of the MS ring protein, FliF).
The rod and MS ring appear to abut each other closely. How do substructures with fundamentally different symmetries join together? A specialized zone might facilitate the junction. FliE seems a possible candidate for construction of such a junction zone.
...We propose that the primary role of FliE may be as a structural adapter between the annular symmetry of the MS ring and the helical symmetry of the rod and all subsequent axial structures.
That FliE functions to bridge the different symmetries of the M-ring and driveshaft makes sense in light of the flagellum as a functioning whole, and its possible role in a partially completed proto-filament, incorporated by cooption, is completely obscure. It wouldn't even span the periplasm. It's similar to explaining the function of lug nuts without the wheels of a car. FliE is also an unusual protein with respect to the other proteins that form the rod and beyond. It is the only gene in its transcriptional unit and alpha-helices run throughout the protein (alpha-helices in the other rod proteins are restricted to the terminal ends). All in all, the hypothetical cooption of FliE, essential for forming the remainder of the rod and filament, doesn't make any sense.
The interesting twist is that FliE should be the prime candidate for the "protein that fortuitously stuck" to the export apparatus as part of the EFM hypothesis. This is because it is the most proximal component of the rod, attached to the M-ring. Yet it does not form a filament. In fact, the authors in the study cited above note, "we propose that FliE not be called a rod protein, since it differs in so many ways from the rod and other axial proteins."
And it gets more interesting. FliE is not part of the type III secretion machinery. This indicates that as part of the evolutionary transition from flagellum to type-III secretion, it was lost. This, in turn, indicates that FliE function is flagellum-specific. The export machinery works fine without it. It is only needed in light of the flagellum as a functioning whole.
We now have an 11-part system that appears to have no functional advantage over the original 10-part "export system" strung together by random assembly. We can now begin to add the rod. FlgB is supposedly coopted and added to FliE and begins forming the "basic helix of a cylindrical lattice." But here things get tricky when trying to collapse the gram-negative bacteria with the gram-positive bacteria. The latter group merely possesses a thick peptidoglycan cell wall outside the membrane, while the former has a more complex arrangement including a second outer membrane (the periplasm being the space between the two membranes). The problem faced with coopting a drive-shaft in a gram-negative bacterium is that a partial rod that does not span the periplasm and penetrate the cell wall and outer membrane provides no obvious utility. This seems to be a thorny problem that will escape our analysis, since the Ur-IC scoring assumes all eubacteria are related through a common ancestor and thus factors out the gram-negative-specific features.
Regardless of the fuzziness that comes from treating gram-negative and gram-positive flagella as the same, let us consider the rod (driveshaft) composed of flgB, flgC, and flgG. Both flgB and flgC are small proteins with approximately 130 amino acids in all five of the distantly related bacteria used to score the Ur-IC state. FlgG, the most distal component of the rod, has an average size of about 260 amino acids in the same five species.
An interesting feature of all three proteins, among all five species, is that no cysteine residues are found anywhere. Cysteine is one of the rarer amino acids found in proteins. Yet, if one surveys the codon usage in these five bacteria, we would expect 1.1% of the amino acids in an "average protein" to be cysteine. There are 3156 amino acids among all 15 of these rod proteins (from the five species). Thus, we might expect to find 34 cysteines, but there are none. This is most interesting when we consider that all proteins have many amino acids that can mutate to cysteine with a single base pair substitution. In contrast, using the same codon usage data, the number of glycine residues we might expect to find is 217 (6.88%) and the 15 rod proteins actually contain 198 glycines. Thus, it would appear that the rod proteins are somewhat atypical as far as bacterial proteins go.
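The expected counts quoted above are simple products of the residue total and the codon-usage frequency. The following is a minimal Python sketch of that arithmetic. The function names are hypothetical, and the zero-cysteine probability rests on an assumption the essay does not make: that each residue is an independent draw at the average frequency, which real proteins do not strictly obey.

```python
# Rough check of the expected residue counts, using the figures quoted in the text.
TOTAL_RESIDUES = 3156      # residues across the 15 rod proteins (from the text)
CYS_FREQUENCY = 0.011      # 1.1% cysteine in an "average protein" (from the text)
GLY_FREQUENCY = 0.0688     # 6.88% glycine (from the text)


def expected_count(total_residues: int, frequency: float) -> float:
    """Expected number of residues of one type at a given average frequency."""
    return total_residues * frequency


def prob_zero(total_residues: int, frequency: float) -> float:
    """Chance of seeing no such residue at all.

    Assumption (not from the essay): residues are independent draws.
    """
    return (1.0 - frequency) ** total_residues


expected_cys = expected_count(TOTAL_RESIDUES, CYS_FREQUENCY)
expected_gly = expected_count(TOTAL_RESIDUES, GLY_FREQUENCY)
p_no_cys = prob_zero(TOTAL_RESIDUES, CYS_FREQUENCY)

print(f"expected cysteines: {expected_cys:.1f}")
print(f"expected glycines:  {expected_gly:.1f}")
print(f"P(no cysteine) under the independence assumption: {p_no_cys:.2e}")
```

Under that naive model the absence of cysteine would be very improbable by chance alone, but the model says nothing about why the residue is excluded.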
The stoichiometries of these three proteins tell an interesting story [9]. Both flgB and flgC are present at about 6 copies per flagellum. FlgG is present at about 12 copies per flagellum. If the known helical symmetry of the filament, at 5.5 subunits per turn, extends into the rod (and this seems quite plausible), then the three components of the rod are ordered such that the most proximal one, flgB, forms one turn, followed by flgC and another turn, followed by flgG and two turns. Since a helical structure has no inherent constraint on assembly ("in principle, subunits could be added indefinitely like the steps of a helical staircase"), this suggests some factor extrinsic to these proteins is regulating this assembly in such an ordered fashion. And this raises the issue of IC.
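To make the turn counts concrete, here is a small Python sketch that divides the approximate copy numbers by the 5.5 subunits per turn mentioned above. It assumes, as the paragraph does, that the filament's symmetry carries over into the rod. The names are hypothetical and the copy numbers are the approximate values from the text.

```python
# Approximate copies per flagellum, as quoted in the text.
ROD_COPIES = {"flgB": 6, "flgC": 6, "flgG": 12}

# Helical symmetry of the filament, assumed here to extend into the rod.
SUBUNITS_PER_TURN = 5.5


def helical_turns(copies: int, subunits_per_turn: float = SUBUNITS_PER_TURN) -> float:
    """Number of helical turns formed by a given number of subunits."""
    return copies / subunits_per_turn


rod_turns = {name: helical_turns(copies) for name, copies in ROD_COPIES.items()}

for name, turns in rod_turns.items():
    print(f"{name}: about {turns:.2f} turns")
```

The values come out slightly above one and two turns, consistent with the "one turn, one turn, two turns" reading given that the copy numbers are themselves approximate.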
It would seem there is no reason why the rod should be built around three proteins instead of simply one. Yet these three gene products are found in all flagella, dating back to the putative ancestral flagellum. This suggests one protein is not sufficient to form a functioning flagellar rod. Furthermore, the size of these proteins among these five distantly related bacteria has been held relatively constant (Fig 2), despite billions of years of experiencing very different selective pressures. It would seem some form of constraint or specification is at work, as natural selection will not tolerate too much deviation. And these size constraints map back to the last common ancestral flagellum, indistinguishable from the first flagellum.

Figure 2. Protein size among rod components in five distantly related bacteria.
The rod proteins all share the following features: they are secreted by the type III export machinery; they all lack cysteine; they have maintained a relatively uniform size despite long periods of different selective pressures; they have no function apart from the flagellum; and they show an intriguingly ordered arrangement in forming the rod. In other words, if we are to find a cooptable part, and look in a way that is informed by the data, we need proteins that satisfy these specifications.
Recall that the EFM hypothesis proposes that a previously secreted protein was coopted to become the filament. Since flgB is the most proximal component of the rod, it is the most likely candidate for the originally coopted part. To get a rough feel for whether or not these specifications would be satisfied by a typically secreted protein, I compared the size and cysteine content of flgB with 18 proteins secreted by the type III apparatus of Yersinia, Salmonella, and E. coli as listed by Hueck [10] (Figure 3).

Figure 3. FlgB and proteins secreted by type III export machinery. Avg. FlgB size shown in black. Proteins containing cysteine in red.
Since 8/18 virulence proteins contain cysteine, the exclusion of cysteine does not seem to be a prerequisite for export. The fact that flgB lacks cysteine may just be a common feature of small proteins, as no protein smaller than 200 amino acids contained cysteine. However, recall that flgG also lacks cysteine, although its average size is 260 amino acids, and secreted proteins ranging from 212-343 amino acids all had cysteine. In fact, the cysteine content among these five proteins is 0.92%, indicating they reflect the cysteine content of a typical bacterial protein. FlgG thus appears atypical in this respect.
When we turn to protein size, it appears that flgB is atypical of secreted proteins. Its average size among the five distantly related bacteria is 131 amino acids +/- 6. Only one of the 18 proteins falls in this size range, yopJ, which induces apoptosis in cultured mouse macrophages.
Thus far, we can see that although these 18 proteins share the ability to be exported by type III machinery, as a group they don't satisfy the specification for excluding cysteine, indicating that this specification is not a function of the export process itself. Neither is there any apparent bias towards protein size that would indicate flgB was coopted from a typical secreted protein. When this is coupled to the fact that flgB shows no homology to these 18 proteins, and that none of these secreted proteins form filaments (as far as I know), the hypothetical cooptable part that supposedly gave rise to flgB would appear to have had some rather flgB-specific properties.
Gene Duplication
A plausible explanation that could account for the rod proteins (flgB, flgC, and flgG) is gene duplication. That is, originally something like flgB may have been coopted and then expanded by gene duplication to give rise to these three gene products. This would have occurred prior to the last common ancestral flagellum, since all flagella have these three gene products. There are many lines of circumstantial evidence to support this hypothesis. These genes are found clustered together in the same operon. FlgB and FlgC are the same size, while flgG is essentially twice the size of either flgB or C (suggesting a gene fusion, followed by a duplication). And sequence similarities between all three have been previously proposed [11].
Yet despite all this evidence, the picture is rather ambiguous. The reason is that the similarities mentioned above may only reflect functional features rather than a historical origin. In fact, when researchers first sequenced these components, they expected similarities for these reasons alone:
Some degree of similarity among the sequences of the axial proteins would not be surprising, for two reasons. The first is that, since the axial proteins together form a continuous filamentous structure, we might expect them to have a similar lattice and to share common structural elements that determine the lattice....The second reason for looking for sequence similarities among the axial components concerns the manner in which they are thought to be exported across the cell membrane. [12]
Again, the similarities may simply be a consequence of the design and assembly of this type of rod-structure. While gene duplication is commonly used to explain the origin of IC (as in the case of the vertebrate blood clotting system), this may be a case when IC renders gene duplication an implausible explanation. Recall that flgB, flgC, and flgG are found in all bacterial flagella. Apparently, loss of one gene product cannot be compensated by the presence of the other structurally similar gene products. And recall the way they are laid down: six copies of flgB form one turn of the helical lattice, then six copies of flgC form the next turn, then twelve copies of flgG form two more turns. Is this crucial? Recall also that all subunits are needed to form a rod that can span the length of the periplasm. If a hypothetical ancestral rod was homogeneous, being composed of only one gene product, it is difficult to envision the selectable nature of turning it into such a highly ordered, heterogeneous structure. Without such a demonstration, there is no need to go beyond the default position that such similarities reflect functional constraints only.
If we turn to the sequence similarities themselves, the functional hypothesis is strengthened. The rod proteins tend to be most conserved at the N-terminal and C-terminal ends. This makes sense in light of their assumed assembly, where the C-terminal end of one protein interacts with the N-terminal end of the next protein. We can imagine that the C-terminal end of flgB must interact with the N-terminal end of flgC to form the helical lattice (where the transition from flgB to flgC occurs). However, the C-terminal end of flgB must also be able to interact with the N-terminal end of an adjacent flgB to form the first turn of the lattice. Thus, it is not surprising from an engineering perspective that similarities in structure and sequence would be found. Nevertheless, the sequence similarities are not that convincing.
First, I used E. coli sequence to BLAST bacterial genes. When I searched with flgB sequence, 33 similar proteins with E-values less than 10^-4 were scored (anything with a larger E-value is probably due to chance). All 33 were flgB homologs from other species. When I searched with sequence from E. coli flgC, 36 hits were uncovered, again all of them being flgC homologs.
I then used ClustalX to align the sequences of flgB and then flgC from the five species representing the Ur-IC state. ClustalX will align and score with three categories: completely invariant positions (the same amino acid in the same position); conserved with "strong" amino acids (sequences with an amino acid at the same position that belongs to a class with very similar biochemical properties); and conserved with "weak" amino acids (sequences with an amino acid at the same position with similar biochemical properties). Let's call these the Invariant Positions (IP), the Conserved with Strong Positions (CWSP), and the Conserved with Weak Positions (CWWP).
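For readers who want to see how such a three-way tally works, here is a minimal Python sketch that classifies each column of an already-aligned set of sequences. It is an illustration only, not the ClustalX program itself. The "strong" and "weak" residue groups are listed as I recall them from the Clustal documentation and should be treated as an assumption to verify. The toy sequences are hypothetical and are not flagellar proteins.

```python
# Residue groups as recalled from the Clustal documentation (assumption: verify before use).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def classify_column(column: str) -> str:
    """Label one alignment column as IP, CWSP, CWWP, or none."""
    residues = set(column)
    if "-" in residues:                      # gapped columns are not scored
        return "none"
    if len(residues) == 1:                   # same residue in every sequence
        return "IP"
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return "CWSP"
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "CWWP"
    return "none"


def score_alignment(aligned_sequences: list[str]) -> dict[str, int]:
    """Count IP, CWSP, and CWWP columns in equal-length aligned sequences."""
    counts = {"IP": 0, "CWSP": 0, "CWWP": 0}
    for column in zip(*aligned_sequences):
        label = classify_column("".join(column))
        if label in counts:
            counts[label] += 1
    return counts


# Hypothetical toy alignment, for illustration only.
TOY_ALIGNMENT = ["MSTKA", "MTTRC", "MATKS"]
toy_counts = score_alignment(TOY_ALIGNMENT)
print(toy_counts)
```

Checking the strong groups before the weak ones means a column is always assigned the most stringent category it satisfies.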
Table I. Clustal scoring with the first two flagellar rod proteins in Aquifex, Bacillus subtilis, E. coli, Thermotoga, and Treponema.
| Alignment | IP | CWSP | CWWP |
|---|---|---|---|
| flgB | 11 | 14 | 14 |
| flgC | 19 | 20 | 14 |
| flgB, flgC | 2 | 4 | 7 |
It can be clearly seen from Table I that while flgB and flgC show decent sequence conservation (about 30% in flgB and 40% in flgC), it is mostly lost when the two genes are aligned together (down to 10%). In other words, there is decent sequence conservation within the gene groups that is essentially lost between the gene groups. This again supports the IC hypothesis, indicating that flgB and C function is not redundant and entails separate sets of information not found in some stem gene at the base of a gene duplication expansion.
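The rough percentages quoted above can be reproduced by summing the three Clustal categories for each alignment and dividing by the protein length. This Python sketch does so under the assumption that the denominator is the approximate 130-residue length of flgB and flgC. The essay does not state the exact denominator, and the names below are hypothetical.

```python
# Counts from Table I: (IP, CWSP, CWWP) for each alignment.
TABLE_I = {
    "flgB": (11, 14, 14),
    "flgC": (19, 20, 14),
    "flgB, flgC": (2, 4, 7),
}

# Assumption: approximate length of flgB/flgC used as the denominator.
APPROX_LENGTH = 130


def conserved_percent(counts: tuple[int, int, int], length: int = APPROX_LENGTH) -> float:
    """Percentage of positions falling in any of the three conserved categories."""
    return 100.0 * sum(counts) / length


conservation = {name: conserved_percent(counts) for name, counts in TABLE_I.items()}

for name, percent in conservation.items():
    print(f"{name}: {percent:.0f}% conserved")
```

With this denominator, the within-gene alignments come out near 30% and 40%, and the combined alignment near 10%.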
If we turn to flgG, its larger size requires more than a simple duplication account. A possible explanation that takes advantage of the fact that flgG is twice the size of flgB/flgC is shown in Figure 4.

Figure 4. Hypothetical evolution of flgG from an flgB/C-like protein. We begin with a 130 amino acid flgB/C-like protein whose N-terminal and C-terminal ends are conserved. Then at step 1, gene duplication occurs and fuses the gene in tandem such that it is now expressed as a 260 amino acid protein. During step 2, the conserved residues of the C-terminal end of the first copy and the N-terminal sequence of the second copy are lost by mutation, leaving a larger protein with conserved C-terminal and N-terminal ends necessary for lattice formation.
The scenario outlined in Figure 4 would predict that the central region of flgG functions primarily as a spacer merely to create a larger protein. When one aligns flgB or flgC, most sequence similarity is found in the C-terminal and N-terminal 40 amino acids. Thus, we could roughly represent the sequence of both flgB and flgC (both being about 130 amino acids) as [40]-40-[40], where the brackets signify the conserved regions. According to the scenario in Figure 4, the original fusion would then look like [40]-40-[40]-[40]-40-[40] and then evolve into [40]-40-40-40-40-[40]. If flgG arose in this fashion, we would expect to find the sequence conservation retained in the two ends and no significant homology in the central 160 amino acids. However, when the five flgG sequences are aligned and scored, 22 invariant and conserved positions are found in the C-terminal 40 amino acids, 19 are found in the N-terminal 40, and 51 are found in the middle 160 amino acids. While the ends are more strongly conserved, the middle region still shows 32% sequence conservation, suggesting it is not simply a functionless spacer.
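As a quick check on how conservation is distributed along flgG, the following Python sketch divides the conserved-position counts given above by the length of each region. The region lengths and counts come from the text, and the names are hypothetical.

```python
# (conserved positions, region length) for flgG, as quoted in the text.
FLGG_REGIONS = {
    "N-terminal 40": (19, 40),
    "middle 160": (51, 160),
    "C-terminal 40": (22, 40),
}


def conservation_density(conserved: int, length: int) -> float:
    """Fraction of positions in a region that are invariant or conserved."""
    return conserved / length


flgg_density = {
    region: conservation_density(conserved, length)
    for region, (conserved, length) in FLGG_REGIONS.items()
}

for region, density in flgg_density.items():
    print(f"{region}: {density:.0%}")
```

The ends come out at roughly half of their positions conserved and the middle at roughly a third, which is the contrast the paragraph describes.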
Next, I decided to see whether these 51 conserved positions in flgG are explained by a fusion of flgB, flgC, or a fusion of both. There are four possible fusion products: the C-terminus of flgB fuses with the duplicated N-terminus of flgB, the C-terminus of flgC fuses with the duplicated N-terminus of flgC, the C-terminus of flgB fuses with the duplicated N-terminus of flgC, and the C-terminus of flgC fuses with the duplicated N-terminus of flgB. I attempted to align each possible fusion product with an alignment of the central 160 amino acids of flgG (flgG160). Nothing significant was found. Two alignments tried to fit the fusion between positions 20-110 of flgG160, one attempted to fit between regions 80-180 of flgG160, and the final one attempted to fit between regions 20-90 and 120-150. No alignment turned up any invariant positions or more than three "conserved" positions. Furthermore, around position 120 of flgG we find a consensus sequence of YTRDGSF. A search of flgB and flgC sequence with the overlapping three-residue fragments of this sequence (YTR, TRD, RDG, DGS, GSF) failed to turn up any matches.
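The five fragments searched for are simply every overlapping window of three residues in the consensus YTRDGSF. Here is a minimal Python sketch of how such fragments can be generated and looked up in a target sequence. The function names are hypothetical, and the target string is a made-up placeholder, not a real flgB or flgC sequence.

```python
def overlapping_fragments(sequence: str, size: int) -> list[str]:
    """Every contiguous window of the given size, in order."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]


def find_fragments(target: str, fragments: list[str]) -> dict[str, int]:
    """Map each fragment found in the target to its first 0-based position."""
    return {frag: target.find(frag) for frag in fragments if frag in target}


CONSENSUS = "YTRDGSF"               # consensus near position 120 of flgG (from the text)
fragments = overlapping_fragments(CONSENSUS, 3)
print(fragments)

# Hypothetical placeholder sequence, for illustration only.
PLACEHOLDER_TARGET = "MAAAYTRKK"
hits = find_fragments(PLACEHOLDER_TARGET, fragments)
print(hits)
```

An exact-match search like this is stricter than an alignment, so an empty result only rules out identical tripeptides, not more distant similarity.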
To summarize what this means, the central region of flgG contains roughly 50 conserved residues when the five genes from very distantly related bacteria are aligned. That they have been conserved for so long suggests they are functionally important. Yet this information is not found in flgB, flgC, or their possible fusion. What about the ends? They do appear similar to those of flgB and C. However, when either gene product is aligned against flgG, similarity scores are much worse compared to the alignments of the individual genes themselves (Table II).
Table II. ClustalX alignment scoring. IP = invariant position; CWSP = conserved with "strong" amino acids; CWWP = conserved with "weak" amino acids. Only residues in the C-terminal and N-terminal 40 amino acids are counted.
| Alignment | IP | CWSP | CWWP |
|---|---|---|---|
| FlgB | 11 | 13 | 14 |
| FlgC | 17 | 15 | 9 |
| FlgG | 12 | 23 | 7 |
| FlgG and FlgB | 3 | 9 | 11 |
| FlgG and FlgC | 1 | 11 | 9 |
To conclude, while there are similarities between these three gene products, there is no need to posit gene duplication to account for them, given that a purely functional explanation is sufficient.
NEXT: Continued analysis
Citations