Evolving the Bacterial Flagellum Through Mutation and Cooption: Part VI
by Mike Gene
The first five essays in this series have taken a critical look at the explanation of coincidental cooption of alternative functions as part of the EFM hypothesis to explain the origin of the bacterial flagellum. These were written a couple of years ago and since then, N. J. Matzke has provided a more lengthy defense of this hypothesis [1]. The most significant updates that Matzke adds to the story are the suggestions that 1) the Type III secretory machinery is homologous to the F0F1 ATP synthetase and 2) the flagellar motor is homologous to the ExbBD system. This essay will address the homology of the ATP synthetase and the flagellum. A later essay (or addendum to this one) will address the motor.
Homology, Design, and Analogy
Many critics of ID insist that we cannot reliably detect design without knowledge about the designers. If it is true that the ability to infer/detect design is dependent on previous knowledge about the identity, motives, and methods of the designer, then we have simply uncovered yet another limitation of science. This would mean that even if the flagellum was indeed designed, science would not be able to recover this truth. And since the human brain is quite uncomfortable with things that are left unexplained, science would be obligated to come up with an alternative explanation.
This takes us to historical narrative. Historians have long noted that, to one degree or another, the historical narrative is partly a function of the storyteller. Thus, if we tell a story about the distant past without the ability to detect design, it stands to reason that our story will not contain design. It’s not that design is ruled out. It’s just never figured into the narrative. There is nothing inherently wrong with this. But many people overlook this simple fact and often confuse the story with the actual past. The story may reflect the actual past. Then again, it may not. If the critics of design are correct, we have no way of knowing.
A nice illustration of this concerns the concept of homology as it relates to gene products in organisms whose origins are obscure. Homology, by definition, is a historical claim. It asserts that similarity is traced to common descent, that is, a historical lineage. Now, if we have no way of detecting design, then similarities can only be explained in the following ways: the outworking of natural law, convergent evolution, coincidence, or common descent (homology). To detect homology, we can simply rule out the three other alternatives. How do we do this? With an informal method not all that different from Dembski’s Explanatory Filter. We intuitively rule out coincidence and natural law/convergence if the similarities are too many (specification) amidst a large complexity. Thus, for example, Matzke thinks that the sequence similarity between FliI and the beta/alpha subunits of the F0F1 ATP Synthase “proves” homology. The similarities are too many to explain by coincidence and convergence, and natural law is not known to channel any sequence into a specific arrangement.
Yet homology is “proved” without consideration of design (for, if most of the critics of design are right, science cannot detect it). If we consider the possibility of design as an explanation, it’s hard to see how similarities amidst complexity rule out design. The logic of this would involve the notion that any designer would never reuse a design in any other form or any other context. The method used to detect homology in a non-teleological matrix would force us to assume that any designer is constrained to use completely unique designs in every instance. In other words, the very analysis we need in order to detect homology without consideration of design itself fails to distinguish homology from design. None of this invalidates the homology inference within the non-teleological matrix, but it is important to remember this consideration when weighing homology-claims used in a context of design vs. non design (something science does not address).
When we turn to Matzke’s argument, in many places he attempts to draw out various predictions from this hypothesis of homology. For example, he writes:
In FliH it
is known that the C-terminal region associates with N-terminal region of FliI
(Gonzalez-Pedrajo et al., 2002), but the region responsible for membrane
association is undetermined (Auvray et al., 2002); F0-b – FliH
homology would predict that the FliH N-terminus associates with the
membrane.
Yet how does homology predict this? Why does homology predict this? I see no reason why evolving FliH from F0b entails that the N-terminal membrane spanning region should remain intact or that the N-terminal region should still interact with the membrane. Is there a rigorous background of independent information about the evolution of F0b-like proteins that would lead us to predict this? Do we have independent evidence or reason to think there was a strong selective pressure that would work to retain such a sequence or association? Why couldn’t FliH have simply lost the N-terminal membrane spanning region if it was not needed? Or if suppressor-type mutations among interacting proteins rendered this function needless in the context of the newly evolving function?
Consider another example.
Since FliI has been “proven” to be homologous to alpha/beta, homology
predicts that the other members of the ATP synthetase would be represented in
the basal body of the flagellum. This
prediction would then serve as the driving impetus behind Matzke’s attempt to
explain the early steps in the evolution of the flagellum. But the transcription termination factor
It should be clear that Matzke is not using homology to make the prediction. He is using analogical reasoning, where if two things are similar in some respects, we conclude that they are probably also similar in some further respect. For example, if we return to the F0b/FliH prediction, since F0b has properties X, Y, and Z, and FliH has properties X and Y, Matzke is predicting it will also have property Z. Homology is neither needed for nor capable of making this prediction (due to the contingencies of evolution). Thus, the success of the prediction simply strengthens the analogy. From here, one can interpret the meaning of the analogy in different ways. Matzke would use the strengthened analogy to increase his sense of conviction about his homology inference. Yet that interpretation is not mandated by the analogy. Analogy also comes into play in relation to a design inference.
Return to the FliI gene product that shares 30% sequence identity with the alpha/beta subunits of the F0F1 ATP Synthase. Matzke is convinced this “proves” homology (see Table 5 from his analysis). I regret not making a public ID prediction earlier about FliI forming a wheel-like oligomer. Let me explain. At first, the essence of the similarity between the FliI and alpha/beta subunits appeared to revolve mostly around the same ATP-binding domain. Yet this doesn’t seem too useful in distinguishing between common design and common descent, as one might expect ATP-binding motifs to be reused in different contexts from either perspective. But what seemed to indicate a hodgepodge use of this motif was that it appeared to be used as a monomer in the flagellum. There didn’t seem to be any type of design logic behind its reuse, thus tipping the scale slightly towards common descent.
Looking back, it should be clear now. First, ATP binding on the F-ATPase takes place at the interface between the two subunits. Having FliI work as a monomer spoke to a jury-rigged solution.
Second, and more importantly, another bacterial protein complex shares the same
ATP-binding architecture –
The point here is that the use of analogy is clearly not out of place from the context of ID thinking. Thus, a strong analogy between the TTSS and F0F1 ATP Synthase would pose no serious obstacle for a design perspective.
However, analogy is also being used to propose a plausible precursor state that in turn is required for a scenario for the evolution of the flagellum. This fits one of Matzke’s main objectives behind the proposal of this scenario – to defeat the notion that the flagellum could not possibly evolve because it is irreducibly complex (my use of IC is different [3]):
For some time, advocates
of "Intelligent Design" (ID) have been promoting Mike Behe's
"irreducible complexity" argument. Behe argues that biological
systems with multiple required components could not have evolved gradually,
because intermediates lacking components would be nonfunctional. The argument
has been answered in general terms numerous times…and detailed treatments of
the evolution of specific "irreducibly complex" biochemical systems
are now available in the case of blood clotting….and the immune
system….However, the ID movement's favorite example of irreducible complexity,
the bacterial flagellum, has not received similar treatment. The bacterial flagellum
was only discussed briefly in Behe's Darwin's Black Box, but perhaps
because of the counterarguments and literature available on the evolution of
other systems, the bacterial flagellum soon became the favorite example of
irreducible complexity, and has ascended to near-iconic status for the ID
movement.
In this sense, analogy can be used to counter claims that precursor states could not possibly exist and thus call into question unevolvability claims. Those favoring the unevolvable position would have to demonstrate that Nic’s analogies are either irrelevant and/or fail to support claims of the flagellum’s evolvability.
Shaky Beginnings
Since the evidence indicates the TTSS evolved from the bacterial flagellum
[4] and Matzke essentially agrees, his scenario needs
another candidate for the necessary precursor.
He thus argues that the TTSS machinery embedded within the flagellum is also homologous to the F0F1 synthetase,
something that could plausibly predate the flagellum. Matzke begins his model with the assumption
that gram-negative bacteria are ancestral. His scenario then invokes an ancestral FliF
channel that functioned as a passive transporter and was then later converted
into a more specific channel with the subsequent association of a proto-FlhA
and/or FlhB. He
then posits the F1F0 ATP synthetase was co-opted into this FliF/FlhA/FlhB
channel in its entirety:
The hypothesis that the
entirety of a primitive F1F0-ATP synthetase may have been
coopted in toto into a primitive gated pore (proto-FliF and proto-FlhA/B) is
certainly provocative; it would explain at a stroke the origin of most of the
type III export apparatus and provide a phylogenetically basal precursor to the
flagellum even though clearly basal type III secretion systems remain
undiscovered.
This event was supposed to have converted the
passive transporter into an active transporter.
While the model up to this point is interesting, it is both vague and
unsupported. There is no evidence of any
homolog for FliF, FlhA, and FlhB, let alone that they ever originally
functioned as Matzke envisions. And the
important details outlining how an F-ATP synthetase gets plugged into the
FliF-ring, converting it into an “active transporter,” are missing. The F-ATP synthetase can actively transport
protons. And the vast majority of
passive transporters handle small molecules (such as monosaccharides, amino
acids, and ions). But Matzke’s model is
supposed to convert this into a protein
transporter. As such, one would think
that such an event might simply plug up the FliF ring (and whatever it was
transporting) and thus hinder the putative original transport events the
proto-FliF ring was supposed to provide.
Nevertheless, this is the first cooption event
invoked in Matzke’s model. Yet how do we
know this is not simply an ad hoc invocation of cooption? I have encountered several people who like to
argue that since cooption is a common evolutionary event, there is no problem
here. But that is not a scientific
argument. For example, just because
point mutations are even more common does not support the contention that FliF
arose from some globin gene by a series of point mutations retained by
selection (which is also just as common).
So let’s ponder the commonality of cooption as it
involves the ATP synthetases. They are
universally present among bacteria. So
too are the passive transporters. So,
was there something special about the cooption of the ATP synthetase into a
complex with FliF/FlhA/FlhB? Or have ATP
synthetases commonly formed complexes with all sorts of passive transporters
over evolutionary time?
A further problem with this part of the hypothesis
is that Matzke assumes a flagella-like M ring composed of FliF prior to the
evolution of anything like the flagellum.
This assumption is evident from his estimation that the F0F1 complex
could have fit into the proto-M ring and is illustrated in his Figure 7 (1c). FliF is a large protein, composed of over 500
amino acids, and approximately 26 copies associate with each other to form a
ring structure in the membrane [5]. Such
a large ring, with a patch of membrane, could probably house an ATP synthetase,
but why would the proto-FliF passive transporter form such an odd structure? Do passive transporters typically form such
large ring complexes in the membrane? Without such details, the model
is impossible to assess at this point.
What is assessable, however, is the claim that the
model explains “most of the type III export apparatus.” For the moment, let us ignore FliI, as it is
understandable how one might infer homology with the alpha/beta subunits. That leaves us FliF, FlhA, FlhB, FliP, FliR,
FliQ, FliH,
FliJ, and FliO. Since Matzke’s model
explains the last six of these nine players, one might get the impression that most
of the TTSS has been explained. But this
overlooks the fact that FliF, FlhA, and FlhB are the three largest components
among this list. Using the sequences
from the most distantly related bacteria with flagella (Aquifex, Bacillus subtilis, E. coli,
Thermotoga, and Treponema),
the average size for FliF, FlhA, and FlhB was 536, 686, and 368 amino acids,
respectively. In contrast, FliP, FliR,
FliQ, FliH, FliJ, and FliO are composed of an average of 245, 260, 89, 247,
140, and 136 amino acids respectively [for FliO, I used sequence from Yersinia, E. coli, Caulobacter, Nitrosomonas,
Vibrio, and Pseudomonas; for FliJ, I used sequence from B. subtilis, E. coli, Yersinia, Caulobacter, and Clostridium; for FliH, I used sequence from B. subtilis, E. coli, Clostridium, Treponema, and Thermotoga]. These three components (FliO, FliJ, and FliH) are not universally
present in bacteria with flagella. This would mean that Matzke’s model explains
only 40% of the sequence among these nine components. And if we consider only those components
among the IC core [6], it is worse.
FliH, FliJ, and FliO are not essential for flagellar synthesis and
function. FliH and FliJ null mutants are
still able to synthesize flagella and FliO is largely restricted to
proteobacteria. In this case, Matzke’s
model would only explain roughly 28% of the sequence.
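The arithmetic behind these two percentages can be sketched as follows. This is a minimal illustration that uses only the average lengths quoted above; the function and variable names are hypothetical, and the result depends on which sequences were averaged, so it should land close to (not exactly on) the rounded figures in the text.

```python
# Average lengths (amino acids) as quoted in the essay.
AVERAGE_LENGTH = {
    "FliF": 536, "FlhA": 686, "FlhB": 368,   # not accounted for by the model
    "FliP": 245, "FliR": 260, "FliQ": 89,
    "FliH": 247, "FliJ": 140, "FliO": 136,
}
UNEXPLAINED = {"FliF", "FlhA", "FlhB"}
NON_ESSENTIAL = {"FliH", "FliJ", "FliO"}     # dropped when only the core is considered


def explained_fraction(lengths, unexplained, excluded=frozenset()):
    """Fraction of total sequence length belonging to components the model explains."""
    considered = {name: n for name, n in lengths.items() if name not in excluded}
    total = sum(considered.values())
    explained = sum(n for name, n in considered.items() if name not in unexplained)
    return explained / total


all_nine = explained_fraction(AVERAGE_LENGTH, UNEXPLAINED)
core_only = explained_fraction(AVERAGE_LENGTH, UNEXPLAINED, NON_ESSENTIAL)
print(f"all nine components: {all_nine:.1%}")   # 41.3%
print(f"core components only: {core_only:.1%}")  # 27.2%
```

Weighting by sequence length rather than by component count is the design choice that matters here: six of nine components sounds like a majority, but those six are the smallest proteins on the list.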
Let us return to FliI. Matzke’s model begins with a functioning F0F1
ATP synthetase and looks to this complex as the source of the TTSS machinery
that includes FliI. Yet earlier, Matzke
explains the origin of FliI:
It diverges before the F1-α
and F1-β split in sequence similarity trees, and thus probably
also diverged prior to the cenancestor (Gogarten and Kibak, 1992).
However, it is more similar to the F1 subunits than the more
distantly related hexameric ATPases such as the RNA/DNA helicase termination
factor rho (Boyer, 1997), and therefore Gogarten and Kibak (1992) conclude that
the FliI family diverged specifically from a primitive F1-ATPase
prior to the cenancestor.
Something doesn't add up here. Traditional views envision FliI splitting from the ATPase subunits of the F1 complex prior to the duplication of such a subunit into the alpha and beta forms. As Matzke notes, “Gogarten and Kibak (1992) conclude that the FliI family diverged specifically from a primitive F1-ATPase prior to the cenancestor.” FliI is more similar to the beta form than the alpha form, with 29% identity with beta and 25% identity with alpha [7]. That is, FliI supposedly split off before the F0F1 ATP synthetase, as we know it, formed and retained a greater similarity with what would become the beta subunit (both having ATP hydrolysis activity). Yet Matzke also notes that the duplication/divergence into alpha and beta “is shared by all bacteria and is also found in the archaeal A-ATP synthase and eukaryote V-ATP synthase, so F1-α and F1-β are thought to have diverged before the cenancestor.”
When Matzke proposes that the F0F1 ATP synthetase was coopted “in toto into a primitive gated pore,” it is not clear
whether he envisions that the split into alpha and beta subunits has already
occurred. If it had, then we would
predict that the flagellar specific ATPase (FliI) would exist in these two
forms, forming a heterohexamer, and thus further strengthening the analogy
between the two systems. However, this
is not the case. This version of Matzke’s model would thus predict that the
homolog for the alpha subunit was lost (since FliI is more similar to the beta
subunit). The problem here is that the
F0F1 beta subunits, by themselves, have not been measured to have any ATPase
activity [8,9] and single point mutations in the alpha subunit can result in
dramatic declines in ATPase activity [10,11]. This
explains why both the alpha and beta subunits are universally present among all
F0F1 ATP synthetases, as apparently the alpha and beta interactions (that form
the active site) are so tightly fitted that even significant ATP hydrolysis
does not occur without both. Losing the
alpha subunit does not appear to be a persuasive route to take.
The more plausible scenario would be to propose
the cooption of a synthetase prior to the evolution of its alpha and beta
subunits (and their integration through subfunctionalization). But then, there is no reason to think a synthetase actually existed, as the alpha/beta split is a universal feature of synthetases. In other words, the candidate for cooption evaporates and there is no basis for expecting any homology between the synthetase and flagellum.
Yet Matzke’s model envisions
the flagellum evolving in a standard gram-negative bacterium with an F1F0 synthetase:
The present model will
begin with a reasonably complex bacterium, already possessing the general
secretory pathway and type II secretion system, as well as signal transduction,
a peptidoglycan cell wall, and F1F0-ATP synthetase.
Nevertheless, even with these points in mind, the real substance of Matzke’s model concerns his argument that the F-ATP synthetases are homologous to the TTSS. Let us consider his argument.
Weighing the Similarities
FliH/F0b
If FliI is a homolog of the alpha/beta subunit of the ATP synthase, might the rest of the synthase machinery also share homology with the TTSS? Matzke uncovers some circumstantial evidence that certainly suggests this might be the case. His best example is FliH, a protein that works to inhibit the ATPase activity of FliI. He proposes that FliH is a homolog of the b subunit of the ATP synthase (F0b). In the ATP synthase, the b subunit functions to tether the F0 membrane complex to the F1 catalytic complex. BLAST searches fail to turn up significant matches between FliH and F0b; nevertheless, Matzke is able to gather some suggestive circumstantial evidence:
1. Both FliH and F0b form dimers and exist as elongated structures. F0b does have a membrane spanning sequence to anchor it in the plasma membrane and FliH can associate with the membrane.
2. The F0b dimer forms a complex with the alpha(3)beta(3) hexamer while the FliH dimer forms a complex with FliI. Matzke observes, “If the FliH2 homodimer associates with the FliI6 complex in vivo, all of this begins to look suspiciously similar to the association between the F1F0-ATP synthetase F1-α3β3 and F0-b subunits: two elongated F0-b subunits form a dimer and interact with F1-α3β3.”
3. The C-terminal domain of F0b associates with the alpha(3)beta(3) hexamer and the C-terminal domain of FliH associates with FliI.
4. A search of NCBI’s CDART based on FliH does retrieve F0-b as a result with similar domain architecture (using the default e-value cutoff of 0.01).
5. The Yersinia pestis FliH homolog YscL has low but significant sequence similarity with the e subunit of the archaeal ATPase of Methanococcus jannaschii and the e subunit of the vacuolar ATPase of Desulfurococcus spp. (these subunits could be archaeal homologs of F0b).
It is understandable that Matzke kicks off his argument with this example. Not only does it make most sense to look for homologs among the players that might interact with FliI (given the hypothesis of homology that relates the two), but the evidence he is able to accumulate is the strongest for this example. Nevertheless, a closer look undercuts much of its persuasive appeal.
The cornerstone of this hypothesis of homology is the formation of the FliH/FliI heterotrimer, as Matzke notes that this arrangement looks suspiciously similar to the ATP synthetase arrangement of the alpha/beta subunits with the b subunit. Yet Matzke is envisioning a complex between six copies of FliI and two copies of FliH:
However, this observation
is equally well explained if FliJ is a required part of a FliI6FliH2
complex essential for export.
The problem is that the heterotrimer complex that Matzke builds on is actually between one monomer of FliI and two copies of FliH [13]. The heterotrimer that forms was measured to be 106 kD, a good match for a single copy of FliI (51 kD) and the FliH dimer (52 kD). The complex Matzke has in mind would measure over 350 kD. This is a huge oversight, as it means the appearance of similarity between the two systems has been seriously undercut.
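The mass comparison above is simple enough to check directly. The sketch below uses only the masses quoted in the text (51 kD for the FliI monomer, 52 kD for the FliH dimer, 106 kD measured); the function name is hypothetical, and summing component masses is an idealization that ignores measurement error.

```python
FLII_MONOMER_KD = 51.0   # FliI monomer, as quoted above
FLIH_DIMER_KD = 52.0     # FliH dimer, as quoted above
MEASURED_KD = 106.0      # measured mass of the complex, as quoted above


def complex_mass(flii_copies, flih_dimers=1):
    """Expected mass (kD) of a FliI/FliH complex, assuming masses simply add."""
    return flii_copies * FLII_MONOMER_KD + flih_dimers * FLIH_DIMER_KD


heterotrimer = complex_mass(1)      # one FliI plus one FliH dimer
hexamer_complex = complex_mass(6)   # six FliI plus one FliH dimer

print(heterotrimer)      # 103.0
print(hexamer_complex)   # 358.0
```

The one-FliI arrangement comes within a few kD of the measured value, while the six-FliI arrangement overshoots it by more than a factor of three.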
This difference may also be functional. The b subunit plays a role as a stator and has little effect on the ATP binding/hydrolysis activity of beta (that role is largely reserved for the gamma and epsilon subunits). FliH, on the other hand, reduces the ATPase activity of FliI over 10-fold when it binds to the monomer, and is thus considered a regulator of FliI activity [13]. It has been recently determined that FliI forms a homohexamer in an apparently ATP-dependent fashion [14]. This raises the possibility that FliH may be binding to FliI monomers to prevent their oligomerization until the complex is associated with the rest of the type III machinery. Upon interacting with the cytoplasmic domains of FlhA/FlhB [15], it could be dislodged, allowing FliI to oligomerize around the base of the flagellum to begin protein export through its 3 nm internal hole. This is consistent with the observation that overexpression of normal FliH has a powerful inhibitory effect on protein export that is relieved by overexpression of FliI [15]. Excess FliH could be trapping FliI in its monomer state and preventing formation of the FliI homohexamer.
A closer look at FliH and the b subunit may underscore the differences in complex formation and function. A significant difference between FliH and the b subunit is size. A survey of various distantly related bacteria [Aquificales (1); Bacillaceae (2); Clostridiales (1); Betaproteobacteria (1); Enterococcaceae (1); Epsilonproteobacteria (2); Gammaproteobacteria (5); Spirochaetaceae (2); Thermotogaceae (1)] showed FliH to have an average size of 261 amino acids (+/- 31) while the b subunit had an average size of 160 amino acids (+/- 10). Figure 1 illustrates that the size of the b subunit has been extremely conserved, while the size of FliH ranges from 208 to 316 amino acids.

Figure 1. Amino acid content of FliH (left) and the b subunit of the F0F1 ATP synthetase (right). Bacteria surveyed were (from left to right): Vibrio, Pseudomonas, Helicobacter, Nitrosomonas, Bacillus, Clostridium, Escherichia, Treponema (left)/Aquifex (right), Thermotoga, Borrelia (left)/Enterococcus (right), Campylobacter, Oceanobacillus, Salmonella, and Shewanella. Aquifex has no FliH gene, and neither Treponema nor Borrelia has the F0F1 ATPase.
FliH clearly has about 100 more amino acids not accounted for by the b subunit, weakening the hypothesis of homology. The most likely difference would be in the C-terminus. While both FliH and the b subunit are elongated dimers composed mostly of alpha helices, the C-terminus of FliH appears to have a spherical structure, where Macnab’s lab envisions FliH as a round flask with a long neck [15]. Secondary structure analysis [16] of a gram positive and gram negative bacterial sequence supports this difference (Figure 2).

Figure 2. Secondary structure analysis of the ATP synthetase b subunit and FliH in E. coli and B. subtilis. a. E. coli b subunit; b. B. subtilis b subunit; c. E. coli FliH; d. B. subtilis FliH. Blue hashes represent alpha helix, red hashes represent beta sheet, and purple represents coiled regions.
In the C-terminus of FliH (after position 100), there are two stretches of coil-beta-coil sequence completely lacking in the b subunit, consistent with a different tertiary structure for the former.
This difference is most significant given that it is the C-terminus of FliH that strongly binds to FliI [17]. In fact, all serial 10-amino-acid deletions after position 100 result in a failure of FliH to bind FliI [17]. What makes this most significant is that it is the C-terminus interaction of FliH with the N-terminus of FliI that is supposed to signal homology. Yet the FliH C-terminal sequence that mediates the interaction is least likely to originate from the b subunit of the ATP synthetase. Complementing this conclusion is a consideration of the N-terminal region of FliI, which Minamino et al. describe as the “flagellum-specific region” that “shows no similarity to beta” [18]. FliH-FliI interactions look very different from the b subunit interaction with alpha.
Two other facts underscore the different nature of FliH-FliI and F0b-F1
subunit binding. First, the C-terminus
of the b subunit has a distinct binding domain for the delta subunit [19]. While FliH alone forms a stable complex with
FliI, the b subunit must form a complex with delta in order to stably bind to
the beta-alpha heterohexamer. In one
experiment with the b subunit from E.
coli, removal of merely one or four amino acids from the C-terminus
resulted in a dramatic decrease in binding to the delta subunit and the F1
complex [20]. In another experiment,
liposomes with b subunits alone were unable to bind F1 complexes [21].
Secondly, given the extended structure of the b subunit, and its role as part
of the stator, it is not unexpected that it would make contact with the
beta-alpha heterohexamer. Yet
cross-linking studies have shown that most of the contacts are between the b
subunit and alpha subunit. On the b
subunit, position 92 is in close proximity to region 464-483 on the alpha
subunit, position 109-110 is in close proximity to alpha’s 102-106 or 212-250
region, and position 156 is close to positions 2 and 90 of the alpha subunit [22,23]. This is significant because FliI
is more similar to the beta form than the alpha form, with 29%
identity with beta and 25% identity with alpha [7].
Another obvious difference concerns the N-terminus. The b subunit has a distinct N-terminal membrane spanning unit that anchors it. Such an insertion probably compensates for the weak nature of the dimerization between the two b subunits, as their cytoplasmic domains would be held in proximity [19]. In contrast, FliH forms a stable dimer and has no obvious transmembrane region. For example, using the algorithm from Hofmann & Stoffel [24], distinct N-terminal transmembrane regions were scored using sequence from the b subunits of Bacillus, Escherichia, Aquifex, Thermotoga, and Enterococcus. In contrast, no transmembrane regions were scored using FliH sequence from Bacillus, Escherichia, Clostridium, Thermotoga, and Enterococcus. The difference can easily be appreciated from the GES hydrophobicity scales [25] of FliH and F0b from Bacillus (gm+) and Escherichia (gm-) as seen in Figures 3 and 4.

Figure 3. GES hydrophobic scale for amino acid sequence from
B. subtilis (a) and E. coli (b) F0b subunit. Blue line represents amino acids and red
dotted lines represent thresholds for membrane spanning regions.

Figure 4. GES
hydrophobic scale for amino acid sequence from B. subtilis (a) and E. coli (b)
FliH. Blue line represents amino
acids and red dotted lines represent thresholds for membrane spanning regions.
Clearly, the N-termini of F0b and FliH look very different (in fact, F0b is more hydrophilic over its entire length, with the exception around position 120).
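For readers who want to see how a hydrophobicity plot of this general kind flags a candidate membrane-spanning stretch, here is a minimal sliding-window sketch. It is not the Hofmann & Stoffel algorithm or the GES scale used for Figures 3 and 4: it swaps in the Kyte-Doolittle hydropathy scale with the conventional 19-residue window and 1.6 cutoff, and the two test sequences are made up for illustration rather than taken from any F0b or FliH protein.

```python
# Kyte-Doolittle hydropathy values (a stand-in for the GES scale used in the figures).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_means(sequence, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def has_candidate_tm_segment(sequence, window=19, threshold=1.6):
    """True if any window's mean hydropathy exceeds the threshold."""
    return any(mean > threshold for mean in window_means(sequence, window))


# Hypothetical toy sequences, not real protein sequences.
anchored = "MKR" + "L" * 20 + "DEKRDEKRDEKR"   # hydrophobic stretch near the N-terminus
soluble = "DEKRSTNQ" * 5                        # uniformly hydrophilic

print(has_candidate_tm_segment(anchored))  # True
print(has_candidate_tm_segment(soluble))   # False
```

A window average is used rather than per-residue scores because a membrane-spanning helix needs a sustained hydrophobic run; isolated hydrophobic residues in a soluble protein should not trigger a call.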
To summarize thus far, F0b has an N-terminal transmembrane region, a weak dimerization domain, and a C-terminal region that interacts with both F1delta and F1alpha (as part of a heterohexamer). FliH has no N-terminal transmembrane region, a strong dimerization domain, and a unique C-terminal region that interacts with a flagellum-specific N-terminal portion of the FliI monomer. When the lack of sequence similarity is added to these differences, the analogy (thus homology inference) looks very weak.
But what about the other similarities uncovered by Matzke? The CDART analysis based on FliH does retrieve F0-b as a result with similar domain architecture (using the default e-value cutoff of 0.01). Yet it retrieves the F0b sequence only from Enterococcus faecium. If you use this sequence (174 amino acids) to probe NCBI’s Conserved Domain Search, you’ll retrieve FliH because of an alignment between a consensus of FliH with positions 46-167 of E. faecium F0b (18% sequence identity). The FliH consensus sequence used is not a sequence of universally conserved residues among FliH, and neither the N-terminal nor the C-terminal region of FliH is used to match F0b with FliH. Might this match be a coincidence or example of convergence given that it is the extended regions of two helical dimers that are being compared?
F0b sequence from distantly related Vibrio, Pseudomonas, Helicobacter, Nitrosomonas, Bacillus, Salmonella, Escherichia, Aquifex, and Thermotoga was used to probe NCBI’s Conserved Domain Search and FliH was not retrieved in any case, even with an e-value cutoff of 1 (which is prone to pick up false positives). The weak match Matzke identifies and builds on appears to be specific to a particular b subunit in a particular gram-positive bacterium and a FliH consensus sequence. What is interesting is that with the less stringent cutoff value, intermediate filament proteins (such as laminins) and other eukaryotic cytoskeletal proteins were retrieved using F0b sequence. Such proteins often form elongated alpha-helical dimers involving coiled-coils. Thus, the CDART result Matzke relies on most likely speaks to coincidence or weak convergence stemming from a common structural theme of elongated proteins. It doesn’t support homology.
What of the YscL sequence similarities with the e subunit from the archaeal ATPases? First, it is not clear if the archaeal e subunits are indeed homologous to the bacterial b subunits as the two don’t share significant sequence similarity. Yet even if this is the case, the YscL/ M. jannaschii e subunit similarity has an e-value of 0.081 and involves YscL sequence from position 45-206. When YscL sequence is used in a BLAST analysis, the most closely related sequences are SctL from Photorhabdus luminescens (68% identity), LscL from Photorhabdus luminescens (67% identity) and PscL from Pseudomonas aeruginosa (56% identity). Yet when these sequences are used to BLAST archaeal sequence, none of them retrieve any of the subunits from the archaeal ATPases. Thus, there appears to be something specific to the YscL sequence that retrieves the e subunits. Given that the TTSS machinery has probably evolved from the flagellum and the type III sequence has diverged considerably from its flagellar homologs [26], which in turn would be far removed from the F0F1 machinery, which in turn are removed from the V-ATPase machinery, the similarity between an archaeal protein and a protein specific to Yersinia probably reflects coincidence or convergence.
FliJ/F1delta
The next candidates for homology are the delta subunit from the ATP synthetase and FliJ. The delta subunit interacts with the F1 heterohexamer and F0b, while FliJ interacts with both FliI and FliH. Both are similar in size and may be similar in structure (being largely alpha helical).
Yet there are many differences that seriously weaken the analogy. While the delta subunit is found in all F0F1 ATP synthetases and forms part of the stator, FliJ is not found in the deep branching bacteria (Aquifex and Thermotoga) or in the spirochaetes. Unlike the case with FliH, CDART analysis with FliJ sequence fails to retrieve any F1delta sequence, seriously weakening the claim for homologous structure. A problem similar to the one noted for the FliH/FliI interaction also exists, in that delta binds to the alpha subunit [27], while FliI is more similar to the beta subunit.
Comparing the physico-chemical properties of FliJ and F1delta, using the method of Aizawa [28], uncovered more differences between the two proteins. For F1delta, sequences from Aquifex, Thermotoga, and various gram positive bacteria and proteobacteria were used (listed in Figure 5). For FliJ, sequences from various gram positive bacteria and proteobacteria were used (listed in Figure 5), as FliJ is lacking in Aquifex and Thermotoga. While the proteins are similar in size, the size of the delta subunit is strongly conserved and about 30 amino acids longer than FliJ (Figure 5b). For delta, the average length is 180 +/- 2 amino acids, while for FliJ the average length is 146 +/- 10. More significant differences are seen in hydrophobic content. The aliphatic index (AI) is a measure of the relative volume of a protein occupied by alanine, valine, isoleucine, and leucine and may be thought of as a positive factor for the increase of thermostability of globular proteins [29]. For F1delta and FliJ, the average AI was 105.6 +/- 7.2 and 75 +/- 7.8, respectively. The GRAVY score is the average hydropathy score for all the amino acids in the protein [30]. The GRAVY scores are shown in Figure 5a, where F1delta has an average score of -0.118 +/- 0.164 and FliJ has an average score of -0.947 +/- 0.125. Compared to delta, FliJ is thus a smaller and much more hydrophilic protein. Finally there is the instability index, which is a measure of a protein’s stability based on the presence or absence of dipeptides that correlate with proteins known to be stable and unstable [31]. A score above 40 predicts the protein will be unstable, while a score below 40 predicts stability. Among the delta sequences analyzed, 10/11 had a score below 40 with an average value of 31.9, while among the FliJ sequences, 7/10 sequences had a score above 40 with an average value of 47.3. Thus in general, FliJ is predicted to be an unstable protein while delta is predicted to be stable.
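For readers who want to reproduce this kind of comparison, the two hydrophobicity measures can be sketched in a few lines of Python. This is a minimal illustration, not the tool used for the figures: it assumes the standard Kyte-Doolittle hydropathy scale for GRAVY and the usual aliphatic index weights (2.9 for valine, 3.9 for isoleucine and leucine), and the sequence `toy_seq` is a made-up placeholder rather than any real FliJ or delta sequence. The instability index is left out because it needs a full dipeptide weight table.

```python
# Kyte-Doolittle hydropathy values (assumed to be the scale behind GRAVY).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Mean hydropathy over all residues; more negative means more hydrophilic."""
    seq = seq.upper()
    return sum(KD[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Mole percent of Ala plus weighted mole percent of Val, Ile and Leu."""
    seq = seq.upper()

    def pct(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return pct("A") + 2.9 * pct("V") + 3.9 * (pct("I") + pct("L"))


# Hypothetical placeholder sequence, for illustration only.
toy_seq = "MKELLRQAEVIK"
print(round(gravy(toy_seq), 3), round(aliphatic_index(toy_seq), 1))
```

Averaging these two values over each set of species would give numbers comparable in kind to those quoted above.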

Figure 5. Physico-chemical comparison of F1delta and FliJ. Bacteria for F1delta analysis (left) include: Aquifex, Bacillus, Enterococcus, Escherichia, Thermotoga, Caulobacter, Nitrosomonas, Oceanobacillus, Salmonella, Shewanella, and Vibrio. Bacteria for FliJ analysis (right) include Bacillus, Escherichia, Yersinia, Caulobacter, Clostridium, Nitrosomonas, Oceanobacillus, Salmonella, Shewanella, and Vibrio. GRAVY scores (the more negative, the more hydrophilic) are shown in a and amino acid content is shown in b.
In the ATP synthetase, the N-terminal region of delta interacts with the N-terminal region of alpha and the C-terminal region of delta interacts with the C-terminal region of F0b. Matzke argues:
Regarding the FliJ-FliH2
interaction, Fraser et al. (2003) favor a model where FliJ interacts with
the N-terminal region of FliH2, but their data (Gonzalez-Pedrajo
et al., 2002) shows that deletions in either the N-terminus (perhaps the
region that associates with the membrane) or middle (dimerization region) of
FliH preclude FliJ binding; thus failure of FliJ binding could be due to
general malformation of FliH2 due to the failure of FliH to dimerize
(middle deletion) or associate with the membrane (N-terminal deletion).
Homology between F1-δ and FliJ would predict that FliJ-FliH
interaction is actually mediated through the C-terminal regions of each, but
that the association may be rather weak, as it is between F0-b2
and F1-δ (Weber and Senior, 2003).
Yet there are two problems with this argument. First, the same work by Gonzalez-Pedrajo et al. showed that three separate C-terminal 10-amino-acid deletions (140-150; 170-180; 200-210) in FliH did not disrupt FliH-FliJ complex formation. When this is coupled to the fact that removal of the first 10 N-terminal amino acids abolishes FliH-FliJ interaction, Fraser et al.’s model seems well-supported. Secondly, it has already been established that the C-terminal region of FliH interacts with the N-terminal region of FliI [15]. Gonzalez-Pedrajo et al. showed that every one of the FliH 10-amino-acid deletions downstream of position 110 failed to form a complex with FliI. For example, when amino acids 220-230 were deleted (FliH has 235 amino acids), no complex with FliI was formed. Given that small C-terminal deletions disrupt FliI association, but not FliJ association, it would seem quite unlikely that the C-terminus of FliH is interacting with FliJ. When this difference is coupled to the physico-chemical differences, the lack of sequence similarity, and the lack of structural similarity detectable by CDART, the case for FliJ-F1delta homology is very weak.
FliO/F1epsilon
The case for homology between FliO and F1epsilon is non-existent. Epsilon is an essential component of the central stalk of the F0F1 ATP synthetase. It is a protein that makes contact with all the core players. It is required to link the soluble F1 complex to the membrane-bound F0 complex [32]. Mutation and cross-linking data also indicate that the N-terminal region of epsilon (around position 31) interacts with the polar loop of the c subunit [33]. Epsilon is also well-known for forming a tight complex with the base of the gamma subunit [34]. And epsilon can be cross-linked to both the alpha and beta subunits of the F1 heterohexamer [35]. These multiple interactions allow epsilon to bind the F1 complex to the F0 complex as part of the central shaft, and then help transmit events within the F0 complex to the F1 complex. As such, epsilon plays two very interesting roles. First, it strongly inhibits the ATPase activity of both the F1 and F0F1 complex under low ATP concentrations [36]. Secondly, it has been recently shown that epsilon itself can specifically bind ATP, raising the possibility it also serves as an internal sensor of ATP levels [37].
The structure of the epsilon subunit has been solved and shown to consist of two domains: an N-terminal region with ten beta strands (about 80 amino acids) and a C-terminal region with a helical hairpin (Figure 6). The N-terminal region associates with the base of the gamma subunit and the polar loop of the c subunit. The C-terminal region undergoes a radical conformational switch. Figure 6 shows the arrangement in which the two alpha helices face each other, which is referred to as the “down state.” It is unlikely that this arrangement could interact with the stalk of the gamma subunit and the central cavity lined by the alpha and beta subunits. Neither is this arrangement capable of inhibiting the ATPase activity of the F1 complex. The second arrangement is shown in Figure 7, where the helices straighten out and extend in a fashion that would run parallel to the gamma central stalk. This conformation is known as the “up state.”

Figure 6. NMR structure of epsilon subunit from E. coli. Figure is adapted from [38].

Figure 7. Photoshop alteration of Figure 6 to illustrate what the up-state of the epsilon subunit may look like after conformational changes occur in the C-terminal helices.
There are profound functional implications for these two states, as the epsilon subunit behaves like a gear that allows the F0F1 complex to serve either as an ATP synthesis machine or a proton pump (driven by ATP hydrolysis). Put simply, when epsilon is in the up-state, the F1 complex cannot catalyze ATP hydrolysis but can catalyze ATP synthesis [39]. In the down-state, the F1 complex can catalyze ATP hydrolysis, but not ATP synthesis. Suzuki et al. recently showed that the epsilon subunit adopts the up-state conformation in the presence of ADP and the down-state conformation in the presence of ATP [40]. Even more interesting is the fact that the proton motive force, mediated by the F0 complex, causes epsilon to strongly favor the up-state arrangement irrespective of the ATP/ADP balance, suggesting that the contacts between the c subunit and epsilon communicate such changes as a function of the c subunit undergoing its rearrangements as it deals with the protons. Thus, when a proton motive force exists and ATP pools are low, the epsilon subunit essentially acts as a gear that enforces ATP synthesis. But when ATP levels are high and the proton motive force is low, the epsilon subunit behaves as a gear to turn the synthetase into a proton pump.
When we turn to FliO, there is nothing that would cause us to suspect homology with epsilon. They don’t share any significant sequence similarity and CDART fails to retrieve FliO with epsilon sequence (or epsilon with FliO sequence). FliO has a distinct N-terminal membrane spanning region and epsilon does not. Even the physico-chemical properties are distinctly different. Epsilon from E. coli, B. subtilis, Aquifex, Thermotoga, and Enterococcus has an average pI of 5.5 (+/- 0.6), a GRAVY score of -0.23 (+/- 0.1), and an instability index of 31.5 (making it a stable protein). In contrast, FliO from E. coli, Caulobacter, Nitrosomonas, Vibrio, and Pseudomonas (FliO is restricted to proteobacteria) has an average pI of 8.8 (+/- 1.6), a GRAVY score of 0.28 (+/- 0.1), and an instability index of 41.6 (making it unstable). And unlike epsilon, there is no report of FliO inhibiting the ATPase activity of FliI, nor of ATP binding to FliO. Finally, while the epsilon subunit is essential to the F0F1 ATP synthetase (being universally present among bacteria), FliO is apparently not essential, as FliO homologs are missing in Aquifex and Thermotoga (the deepest branching bacteria), gram-positive bacteria, and spirochaetes.
However, the two proteins are very similar in size. Using sequence from the same species listed above, epsilon is composed of 130 amino acids, while FliO is composed of 127 amino acids. Yet this similarity is misleading. Recall the function of the C-terminal alpha helices in epsilon as they relate to switching between ATP synthesis and ATP hydrolysis. Since it is unlikely that FliI acts as an ATP synthetase, this domain would not be needed if FliO was homologous. In fact, since ATP hydrolysis is required to fuel the transport of flagellar proteins through the export machinery, the domain would likely be troublesome and/or deleterious and we might expect selection to cut it out. What is interesting is that Chlorobium limicola lacks the C-terminal domain in its epsilon subunit. Suzuki et al. note these bacteria grow in anaerobic environments and the
F0F1 should work as an ATP
hydrolysis-driven proton pump. Because
the F0F1 with up-state epsilon is unable to mediate ATP hydrolysis-driven
proton pumping, these bacteria do not need, or had better delete, the C-terminal
domain of the epsilon subunit. [40]
Since the N-terminal domain of epsilon is roughly 80 amino acids, and epsilon in Chlorobium is 88 amino acids in length, the hypothesis of homology between FliO and epsilon would more plausibly predict that FliO would be composed of 80-90 amino acids, not 130 amino acids. Furthermore, the hypothesis of homology would predict that FliO should be composed primarily of beta strands and coiled regions, reflecting its affinity with the N-terminal domain of epsilon. Yet secondary structure analysis [16] of FliO sequence from the five bacteria mentioned above predicts FliO is composed of 44% alpha helices.
Finally, some of the epsilon contacts with the c subunit and the alpha/beta hexamer are well-defined. For example, it is known that a conserved glutamate at position 31 on epsilon interacts with a glutamine at position 42 on the c subunit. When CLUSTALW is used to align FliO and epsilon sequences (from the species mentioned above), the conserved glutamate is clear (Figure 8). Yet there are no conserved acidic residues in the same neighborhood of the FliO sequences.

Figure 8. Aligned sequences showing conserved acidic residues among epsilon subunits (highlighted in blue) at or near position 31 and no acidic residues at similar positions in FliO.
A more striking difference concerns the epsilon-beta interactions, where growing evidence indicates that the C-terminal regions of epsilon interact with a highly conserved DELSEED motif on the beta subunit [41]. In fact, it is within this region that epsilon exerts its ATPase inhibitory effect on beta. But if we compare the aligned sequences of beta and FliI (from Clostridium, B. subtilis, Thermotoga, Aquifex, and E. coli), it quickly becomes apparent that there is no DELSEED motif on FliI, clearly indicating FliI is not interacting with an epsilon-like subunit carrying out an epsilon-like function (Figure 9).

Figure 9. Aligned sequences of FliI and F1beta. The DELSEED motif is highlighted in gray.
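A motif check of this sort is easy to repeat on any set of sequences. The sketch below is only an illustration of the idea: it looks for exact occurrences of DELSEED and reports 1-based positions. Both sequences in `toy_sequences` are invented placeholders, not real beta or FliI sequences, and a real survey would also need to allow for conservative variants of the motif, which exact matching does not.

```python
def find_motif(seq, motif="DELSEED"):
    """Return the 1-based start position of every exact occurrence of motif."""
    seq, motif = seq.upper(), motif.upper()
    width = len(motif)
    return [i + 1 for i in range(len(seq) - width + 1)
            if seq[i:i + width] == motif]


# Made-up placeholder sequences, for illustration only.
toy_sequences = {
    "toy_beta": "GKTVLAADELSEEDKLVVAR",  # contains the motif
    "toy_FliI": "GKTVLAAQNLTPAEKLVVAR",  # does not contain the motif
}

for name, seq in toy_sequences.items():
    hits = find_motif(seq)
    print(name, hits if hits else "no DELSEED motif")
```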
The epsilon subunit from a thermophilic gram positive bacterium was subjected to mutagenesis, where basic residues in the C-terminal region were replaced by alanine. This led to the same loss of ATPase inhibitory activity from epsilon as mutagenesis of the DELSEED sequence on beta [41]. The position of the basic residues is conserved among gram positive bacteria, but not all bacteria. However, when I surveyed the last 45 amino acids of epsilon from E. coli, B. subtilis, and Aquifex, 20% of the residues were basic. In contrast, the C-terminal 45 amino acids of FliO showed only 9% basic residues, which maps closely to the value for the E. coli proteome (amino acid composition of roughly 10% lysine and arginine).
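The survey of C-terminal basic residues amounts to a simple count over the last 45 positions of each sequence. A minimal sketch follows, assuming that "basic" means lysine and arginine only (matching the proteome figure quoted above; histidine is excluded here by assumption). The function name and the test sequence are hypothetical.

```python
def basic_fraction(seq, window=45, basic="KR"):
    """Fraction of basic residues in the last `window` residues of seq.

    If the sequence is shorter than the window, the whole sequence is used.
    """
    tail = seq.upper()[-window:]
    return sum(tail.count(res) for res in basic) / len(tail)


# Made-up sequence: 50 alanines, then a 45-residue tail with 9 basic residues.
toy_seq = "A" * 50 + "K" * 5 + "A" * 36 + "R" * 4
print(f"{100 * basic_fraction(toy_seq):.0f}% basic residues in the C-terminal 45")
```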
When we combine all the various differences between FliO and epsilon, and couple them with the fact that homology would not predict these two proteins should be the same size (the only evidence in support of their homology), it becomes clear that FliO and epsilon are not homologs.
FliP/Gamma
Like FliO/epsilon, the case for homology between FliP and gamma is non-existent. Gamma functions as the central stalk that rotates within the hexameric cavity formed by the 3 alpha and 3 beta subunits. It interacts with the polar loop of the c subunit, epsilon, and the beta/alpha subunits. As was the case with epsilon, gamma is required to bind the F1 complex to the F0 complex. The gamma subunit is especially important from an IC machine perspective, as it is the main player that transmits the proton motive force (the input) to the ATP synthetase activity of the beta subunit (the output).
There is no reason to think gamma is homologous to FliP. They share no sequence similarity and CDART analysis with FliP sequence does not retrieve gamma (nor does gamma sequence retrieve FliP). The average size of gamma from E. coli, B. subtilis, Enterococcus, Thermotoga, and Aquifex is 308 amino acids, while FliP from E. coli, B. subtilis, Thermotoga, Aquifex, and Treponema averages 245 amino acids. GRAVY scores from the same species provide an average of -0.29 for gamma and 0.77 for FliP, indicating that FliP is a smaller, much more hydrophobic protein. This is not surprising given that FliP is a membrane protein. Using the algorithm from Hofmann & Stoffel [24], FliP from B. subtilis is found to have 3 TMHs, FliP from Thermotoga and Treponema 4 TMHs, and FliP from E. coli and Aquifex 5 TMHs. In striking contrast, the gamma subunits from Aquifex, B. subtilis, and Enterococcus have no TMHs, while those from Thermotoga and E. coli each score a single, weak TMH. According to this algorithm, a score of 500 is needed to indicate a TMH. In E. coli, a region of gamma from position 115-137 scores 605 and in gamma from Thermotoga, a region from 196-214 scores 501. In contrast, the FliP TMHs from E. coli have an average score of 1850. Thus, gamma probably has no true TMH and this is consistent with the structural data. Even if we assume the E. coli gamma subunit has a true transmembrane region, comparing it to FliP underscores the radical difference between the two. In E. coli, gamma is composed of 287 amino acids. If positions 115-137 are a true TMH, that means 8% of the amino acids are buried in the membrane. FliP is 245 amino acids in length and residues at positions 4-26, 45-64, 88-105, 193-213, and 223-241 form TMHs, such that 41% of the sequence is buried in the membrane.
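The membrane-buried percentages follow directly from the segment boundaries given above. As a quick check, the arithmetic can be written out as below; the function name is hypothetical, and the segment positions and protein lengths are the ones quoted in the text, with each segment counted inclusively.

```python
def buried_fraction(protein_length, segments):
    """Fraction of residues inside the given (start, end) segments, inclusive."""
    buried = sum(end - start + 1 for start, end in segments)
    return buried / protein_length


# Values quoted in the text for E. coli gamma and for FliP.
gamma = buried_fraction(287, [(115, 137)])
flip = buried_fraction(245, [(4, 26), (45, 64), (88, 105), (193, 213), (223, 241)])

print(f"gamma: {100 * gamma:.0f}%  FliP: {100 * flip:.0f}%")  # gamma: 8%  FliP: 41%
```

The five FliP segments sum to 101 residues, against 23 for the single putative segment in gamma.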
This difference is quite significant given what we know about gamma. In coupling the F0 and F1 complex, gamma extends the length of the 45 Å stalk [42] and the approximately 70 Å length of the central cavity of the alpha/beta hexamer [40]. In fact, structural data indicate the C-terminal alpha helix (from position 204-286) extends 118 Å from the base of the stalk to the crown of the hexamer [43]. When we turn to FliP, there is only one large hydrophilic loop that could possibly serve as an analog for the gamma stalk. This region is typically located between the third and fourth TMH. Its average size (using FliP from E. coli, B. subtilis, Thermotoga, Aquifex, and Treponema) is 85 amino acids in length. If we assume this region is largely alpha helical (because of its hypothetical homology with gamma), where each residue contributes 1.5 Å of length, it would take approximately 77 residues to travel 115 Å. At first, it seems like a good fit. But the 85-amino-acid hydrophilic portion of FliP is a loop region between two TMHs. The furthest out it could extend is roughly 42 amino acids (about 63 Å) before it has to turn and head back into the membrane. Given that almost half of FliP is buried in the membrane and the hydrophilic region is much too short to extend the length of gamma, it is highly unlikely that gamma and FliP are homologs.
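The length argument can be made explicit with a short calculation. This sketch simply uses the figures from the text (1.5 Å of rise per helical residue, a 115 Å span, an 85-residue loop) and assumes, as the text does, that a loop anchored at both ends in the membrane can reach out by at most half its length. The function names are hypothetical.

```python
RISE_PER_RESIDUE = 1.5  # Å per residue in an alpha helix, the value used in the text


def residues_needed(distance, rise=RISE_PER_RESIDUE):
    """Number of helical residues needed to span `distance` (in Å)."""
    return distance / rise


def max_loop_reach(loop_length, rise=RISE_PER_RESIDUE):
    """Furthest point (in Å) a fully helical loop anchored at both ends can reach."""
    return (loop_length / 2) * rise


print(f"residues needed for 115 Å: {residues_needed(115):.1f}")
print(f"maximum reach of an 85-residue loop: {max_loop_reach(85):.1f} Å")
```

On these assumptions the loop falls short of the gamma stalk by roughly half, even in the most favorable fully helical case.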
Two other features add to this conclusion. First, modeling data indicate that the hydrophilic loop of FliP is found in the periplasm and not the cytoplasm [44]. Since FliI (the analog of beta/alpha) is found on the cytoplasmic side of the membrane, the loop would be unable to interact with it. Secondly, genetic data have identified several residues important in gamma function. For example, a glutamine residue at position 269 has been determined to be crucial to the assembly of the gamma/beta/alpha complex [45]. This glutamine is conserved among the gamma subunits from Aquifex, Thermotoga, E. coli, B. subtilis, and Enterococcus. When CLUSTALW is used to align gamma with FliP, it is quickly apparent that FliP does not share a conserved residue at this position (Figure 10). In fact, the sequences of both FliP and gamma are conserved in the same region, but the actual sequences conserved are quite different. FliP demonstrates a consensus sequence enriched with hydrophobic residues (consensus being FVLVDGW), while the gamma region has a consensus sequence enriched in hydrophilic residues (YNKARQA). This should not be surprising as the FliP positions that align with the conserved glutamine from gamma are part of a TMH, indicating these regions are not homologous.

Figure 10. Aligned C-terminal regions of FliP and gamma. Conserved glutamine known to be important in assembly of the F0F1 ATP synthetase is highlighted in blue.
Other data show that a glutamic
acid at position 208 plays an important role in coupling the proton motive
force to the ATPase activity of the beta subunit, perhaps playing a role in releasing
the inhibitory activity of the epsilon subunit [46]. CLUSTALW alignment again shows this residue
to be completely conserved among very distantly related bacteria. But when FliP is aligned with gamma sequence,
there is a conserved proline residue at the equivalent position (Figure 11).

Figure 11. Aligned regions of FliP and gamma around position 208. Conserved glutamic acid known to be important in coupling F0 and F1 activities is highlighted in blue.
Furthermore, position 205 of gamma (occupied by tyrosine, phenylalanine, or leucine) is known to be important for interactions between gamma and the c subunit [47], and these residues are not seen at the equivalent position in FliP. The DELSEED motif found in the beta subunit is also known to have important functional interactions with the gamma subunit [49] through position 23 (occupied by a conserved methionine). Yet as mentioned above, FliI does not possess the DELSEED motif (Figure 9). And when FliP sequence is aligned with gamma, gamma’s position 23 lines up with a huge gap in 4 of the 5 FliP sequences.
Finally, in addition to all of the above considerations, we should note that experimental evidence indicates there are 4-5 copies of FliP per basal body structure, while there is only one gamma subunit per F0F1 complex [50].
There is no evidence to support the contention of homology between gamma/FliP and epsilon/FliO. This is significant from an IC perspective, as both gamma and epsilon play essential roles in structurally and functionally linking the rotary movement of the c subunits to the catalytic sites of the beta subunit. Without analogs of these components, there is no reason to think FliI in the flagellum is linked to a c subunit-like protein. Thus, the perspective of IC leads us to predict that the flagellar type III secretory machinery possesses no true homolog of the c subunit. This takes us to the next proposed homology.
FliQ/F0c
The c subunit of the F0F1 synthetase (F0c) forms the motor of the machine. It is estimated that approximately 12 copies of F0c form a ring within the membrane that is turned by the proton motive force. This rotary motion is then coupled to the epsilon and gamma subunits, which in turn interact with the catalytic domain of the beta subunit.
At first look, the similarities between FliQ and the c subunit of the ATP synthetase are intriguing. The average size of F0c from E. coli, B. subtilis, Enterococcus, Thermotoga, and Aquifex is 81 amino acids, while FliQ from E. coli, B. subtilis, Thermotoga, Aquifex, and Treponema is 89 amino acids in length. They both form a hairpin-like structure with two transmembrane regions, although CDART analysis fails to link the two proteins. And they have very similar physico-chemical properties. The average GRAVY score for F0c and FliQ is 1.1 and 1.2, respectively. And the average pI for F0c is 6.1, while it is 5.5 for FliQ.
However, a closer look indicates these proteins are behaving differently over evolutionary time. The length of FliQ is much more conserved than that of F0c. While F0c varies from 70-100 amino acids in the species mentioned above, FliQ varies in a much smaller range, from 88-94 residues (keeping in mind the immense phylogenetic distance of the species surveyed). Furthermore, FliQ is a much bulkier protein, to a degree not explained by its additional 8 amino acids. The average molecular weight of F0c is 8351 Da, while for FliQ it is 9916 Da. Thus, while the size of F0c is about 91% of FliQ in terms of amino acid content, it is less than 85% of FliQ in terms of mass. This led me to survey the average amino acid content of the two proteins (using sequence from the same species listed above) and a striking difference appears. The average alanine and glycine content of FliQ is 8% and 5%, respectively. This maps well to the average amino acid content of the proteomes from E. coli and B. subtilis (8.5% for alanine and 7% for glycine). Yet in F0c, the average alanine content is 16% and the glycine content is 14%, twice the levels expected from the average contents of proteomes. Since glycine and alanine are the two residues with the smallest side chains [51], and their disproportionate excess has been retained in the most distantly related bacteria, this probably signals a structural, thus functional property of F0c that is not shared by FliQ.
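Both parts of this comparison are simple to check. The sketch below recomputes the length and mass ratios from the averages quoted in the text and shows how a per-residue composition percentage would be tallied. The helper `composition_pct` and the sequence `toy_seq` are hypothetical, and the toy sequence is not a real F0c or FliQ sequence.

```python
def composition_pct(seq, residues="AG"):
    """Percentage of the sequence made up of each residue in `residues`."""
    seq = seq.upper()
    return {res: 100.0 * seq.count(res) / len(seq) for res in residues}


# Averages quoted in the text.
length_ratio = 81 / 89      # F0c length relative to FliQ
mass_ratio = 8351 / 9916    # F0c mass relative to FliQ
print(f"length ratio: {100 * length_ratio:.1f}%  mass ratio: {100 * mass_ratio:.1f}%")

# Made-up placeholder sequence, for illustration only.
toy_seq = "AAGGLLLLLL"
print(composition_pct(toy_seq))
```

The gap between the two ratios is what points to a difference in average residue size rather than in chain length.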
Since FliQ and F0c are very close
in size and share the same basic structure, we can obtain more meaningful
information from comparative sequence data in light of the hypothesis that these
two proteins share similar functions and a common origin. When the F0c subunits of the species
mentioned above are aligned by CLUSTALW, there are 7 positions with identical
residues, 22 conserved with strong groups, and 11 conserved with weak groups.
This means that about 49% of the sequence is identical or similar among the c
subunits. When FliQ sequence is aligned,
there are 15 identical residues, 24 conserved with strong groups, and 8
conserved with weak groups, yielding 52% sequence similarity. But when the FliQ sequences are aligned with
F0c, there are only 2 identical positions, 6 conserved with strong groups, and
3 with weak groups. That’s roughly 13%
sequence similarity, easily explained by chance, especially when considering
both proteins have two TMHs and the majority of the similarities involve
hydrophobic residues. Thus, while both
F0c and FliQ are relatively conserved among themselves, they are quite
different from each other.
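Counts like these can be read off the consensus line that CLUSTALW prints under each alignment block, where `*` marks an identical column, `:` a column conserved within a strong group, and `.` a column conserved within a weak group. The following is a minimal sketch of that tally, assuming the consensus line has been padded with spaces to the full alignment length. The string `toy_consensus` is invented for illustration and does not come from any alignment in this essay.

```python
def consensus_similarity(consensus):
    """Tally CLUSTAL consensus symbols and the fraction of conserved columns.

    `consensus` must be as long as the alignment, with a space for every
    column that is not conserved.
    """
    counts = {
        "identical": consensus.count("*"),
        "strong": consensus.count(":"),
        "weak": consensus.count("."),
    }
    conserved = sum(counts.values())
    return counts, conserved / len(consensus)


# Made-up consensus line, for illustration only.
toy_consensus = "*: .  *::   . *     "
counts, fraction = consensus_similarity(toy_consensus)
print(counts, f"{100 * fraction:.0f}% identical or similar")
```

Note that the percentage depends on the alignment length used as the denominator, which includes gap columns.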
But things get much worse for the hypothesis of homology. We know the essential amino acid of F0c: aspartate at position 61 (in E. coli). This residue is found about half way down the second transmembrane region. This is an acidic residue which finds itself in a special local environment, such that the pKa of its carboxyl side chain is 7.1 [52]. This is a value that is significantly higher than that expected for solvent exposed residues, meaning it is not exposed to solvent and is instead buried in a localized hydrophobic pocket provided by surrounding hydrophobic side chains. This explains the unique reactivity of this residue, allowing it to efficiently protonate and deprotonate to complete a cycle of proton translocation. As can be seen in the aligned sequence of distantly related bacteria (Figure 12), position 61 is absolutely conserved with acidic residues (red box), as this is the heart of the motor. Yet a survey of FliQ sequence aligned with F0c sequence shows no acidic residues at the same position. What’s worse, not only is the equivalent position in FliQ highly variable, there are no conserved acidic residues in the neighborhood. In a sense, FliQ is lacking the “catalytic site” that defines the c subunit.

Figure 12. F0c and FliQ subunits aligned with CLUSTALW. See text for significance of colored boxes.
However, experimental data has
shown that we can move the carboxyl group from position 61 to the first TMH at
position 24 (normally inhabited by alanine) and still retain proton
translocation function, indicating the two TMHs are in very close contact
around this region [53]. Might there be
an acidic residue at this position on FliQ?
Position 24 on F0c is highlighted with a gray box in Figure 12. As can be seen, there are also no acidic
residues at the equivalent position in FliQ nor are
there any in the neighborhood.
Only one position in FliQ is conserved for acidic residues – position 46 (purple box). Might this serve as the proton translocator? No. In the c subunit, this region is part of the cytosolic loop. In FliQ, this is a hydrophilic region (almost at the center of the protein) and Hofmann & Stoffel’s algorithm clearly scores this as the looping region between the two TMHs. A survey of the entire protein sequence thus shows that FliQ does not possess an analog of the conserved acidic residue from F0c.
If we focus on the hydrophilic loop region, further differences appear. There is evidence that the conformation of the loop changes when D61 is protonated [54]. Such conformational changes are probably transmitted to the epsilon and gamma subunits, which are also known to interact with this loop region. As such, the loop becomes the second crucial node in transmitting the machine’s input into output. Thus, it is not surprising to see some strong sequence conservation in this region: ARQPE. Several mutagenesis studies have been done. As mentioned above, Q42 is thought to interact with E31 on the epsilon subunit. And just as FliO had no conserved glutamate at its equivalent position, FliQ has no conserved glutamine at its equivalent position (as IC predicts). More significant is R41 on the c subunit (blue box). If this arginine is replaced with histidine, F0a subunits are not incorporated into the F0 complex. If the arginine is replaced with lysine (the most conservative change possible for arginine), F0 could link with F1, but the proton motive force was uncoupled from the ATPase activity of the beta subunits [55]. When we look to the FliQ sequence at the equivalent position, not only is arginine missing, but that position is conserved with phenylalanine, a huge, hydrophobic amino acid. In fact, the FliQ sequence aligned to the essential loop sequence of F0c has a distinctly different, yet equally well conserved sequence.
Finally, recall the preponderance of small amino acids (glycine and alanine) of F0c. It turns out that many of these are likewise important, not in direct interactions, but probably in maintaining the proper structure of the hairpin protein. For example, the alanines at position 24 (gray box) are important. In one study, this position was replaced with leucine, resulting in a loss of ATP synthetase function [56]. And what amino acid is completely conserved at the equivalent position in FliQ? Leucine. Replacement of the conserved glycine at position 27 (yellow box) with leucine abolishes proton transport [57], yet glycine is not conserved at this position in FliQ. There is also a conserved glycine at position 29 in F0c (green box). If this is replaced with a valine, proton transport is abolished [58]. The equivalent position on FliQ is conserved for residues that are even larger and more hydrophobic. Finally, replacing the glycine at position 58 (orange box) with aspartate results in loss of function [59], while the equivalent position in FliQ is conserved with methionine. These small side chains (and others) are probably essential in allowing the two TMHs to come into close contact to define the hydrophobic cavity surrounding D61 and in transmitting the proper conformational changes to the loop region.
Given that FliQ appears to lack all of the high resolution features that define the function of F0c, there is simply no reason to think the two proteins are homologous. Considering that the loop region of FliQ is also predicted to be periplasmic (as noted by Matzke), the hairpin structure is not even facing the right direction to interact with FliI in a manner that is analogous to the F0F1 ATP synthetase.
FliR/F0a
The a subunit of the ATP synthetase (F0a) is an integral membrane protein that works in conjunction with the c subunit. It sits on one side of the c subunit ring and is thought to translocate the protons to the D61 residue on the c subunit. F0a is the least understood subunit of the F0F1 complex. There is still controversy over how many transmembrane regions it contains. Some argue that the N-terminus is in the cytoplasm and the protein contains six TMHs [60], while others argue the N-terminus is in the periplasm and the protein contains five transmembrane regions [61]. F0a also forms a stable heterotrimer with the b subunit [62].
FliR is also an integral membrane protein of very similar size. Modeling data suggest the protein also has five or six transmembrane helices, but in this case, it is the most C-terminal helix that is in question: there are either five TMHs with the C-terminal region in the periplasm or six TMHs with the C-terminal region in the cytoplasm [44].
When using F0a sequence from E. coli, B. subtilis, Enterococcus, Thermotoga, and Aquifex and FliR from E. coli, B. subtilis, Thermotoga, Aquifex, and Treponema, it becomes apparent that the physico-chemical properties of the two proteins are also very similar. The average size of F0a is 250 amino acids and FliR is 260 amino acids in length. Both proteins have the same instability index (33). They also have similar GRAVY scores (0.83 for F0a and 1.13 for FliR). Yet such similarities might easily be explained by the fact that the two proteins are similarly sized integral membrane proteins with 5-6 TMHs. This can easily be appreciated by considering the physico-chemical properties of human aquaporin, a protein that channels water molecules through the membrane. This protein is 265 amino acids in length with six TMHs, an instability index of 33 and a GRAVY score of 0.65.
A closer look does uncover some interesting differences. F0a is a basic protein (with an average pI of 8.6 +/- 1.2), while FliR is mildly acidic (6.2 +/- 1.5). Furthermore, the size and hydrophobicity of FliR are much more strongly conserved than those of F0a. Among these distantly related bacteria, F0a ranges in size from 216 to 283 amino acids, while FliR ranges from 258 to 262 amino acids. And while the GRAVY scores of F0a vary from 0.69 to 1.01, those of the FliR sequences vary only from 1.11 to 1.16.
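For readers unfamiliar with the GRAVY score, the sketch below shows how such a value is conventionally obtained: the average of the Kyte-Doolittle hydropathy values over every residue in the sequence. It is only an illustration and assumes the standard Kyte-Doolittle scale; the tool actually used to produce the scores quoted above is not stated, and the two fragments are made-up toy sequences, not real FliR or F0a sequence.

```python
# Kyte-Doolittle hydropathy value for each of the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    residues = [aa for aa in sequence.upper() if not aa.isspace()]
    if not residues:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


# Hypothetical toy fragments (assumptions for illustration only).
hydrophobic_fragment = "LLVIAGLLFV"  # resembles a membrane-spanning stretch
polar_fragment = "KDESRNQTKE"        # resembles a soluble, charged loop

print(round(gravy(hydrophobic_fragment), 2))
print(round(gravy(polar_fragment), 2))
```

A positive score indicates a hydrophobic protein overall, which is why any integral membrane protein with 5-6 TMHs will tend to land in the same general range regardless of ancestry.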
CDART analysis with FliR sequence fails to retrieve F0a and vice versa. When we turn to the actual sequence data, both proteins are conserved to the same extent. The aligned sequences of F0a from the above-mentioned species show 17 identical positions, 51 positions conserved with strong groups, and 18 conserved with weak groups, meaning that 33% of the sequence is identical or similar. The FliR alignment shows 15 identical positions, 49 conserved with strong groups, and 28 with weak groups, yielding 35% identity or similarity. Yet if we align the FliR and F0a sequences together, there are no identical positions, 8 positions conserved with strong groups, and 11 conserved with weak groups, giving us 7% sequence identity or similarity. As with FliQ/F0c, the sequence similarity is far too weak to suggest homology, especially when the majority of the similarities are hydrophobic residues.
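The percentages above are simply the conserved columns (identical plus strong-group plus weak-group) divided by the number of aligned columns. The sketch below reproduces that arithmetic. The alignment length is not given in the text, so the value of 260 columns is an assumption chosen only because it is close to the protein sizes quoted earlier; the function name is likewise hypothetical.

```python
def percent_conserved(identical: int, strong: int, weak: int,
                      alignment_length: int) -> float:
    """Percent of aligned columns that are identical or similar."""
    return 100.0 * (identical + strong + weak) / alignment_length


# ASSUMPTION: the number of aligned columns is not stated in the essay.
ASSUMED_COLUMNS = 260

# (identical, strong-group, weak-group) counts as quoted in the text.
counts = {
    "F0a alone": (17, 51, 18),
    "FliR alone": (15, 49, 28),
    "FliR + F0a": (0, 8, 11),
}

for name, (identical, strong, weak) in counts.items():
    pct = percent_conserved(identical, strong, weak, ASSUMED_COLUMNS)
    print(f"{name}: {pct:.0f}% identical or similar")
```

Under that assumed length the three cases come out at roughly 33%, 35%, and 7%, matching the figures in the text. The point of the comparison is the drop from the within-family values to the between-family value.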
Although not much is known about F0a, there is widespread consensus that a conserved arginine at position 210 is essential to its function, as this arginine is thought to interact with the D61 residue on F0c. Mutagenesis studies show that when this arginine is replaced with lysine (the most conservative substitution for arginine), ATP-dependent proton translocation is lost. Since FliQ does not appear to have any acidic residues that would function in an F0c-like manner, the concept of irreducible complexity would lead us to predict that FliR would also lack a conserved basic residue in a position equivalent to R210 of F0a. Figure 13 shows the alignment that verifies this prediction. The equivalent position in FliR is occupied by a conserved leucine. Furthermore, the sequence around R210 is rather conserved in F0a (a consensus of TLGLRLFGN) and there is no hint of this sequence in FliR.

Figure 13. Aligned sequences of FliR and F0a from very distantly related bacteria. The conserved arginine at position 210 of F0a is shown in the blue box.
Finally, there is the stoichiometry of FliR. Experimental evidence indicates there are 1-3 copies per basal body [50]. There is also suggestive evidence that favors the basal body containing two copies of FliR (while F0F1 uses only one copy of F0a). Macnab proposes a conservative estimate of the basal body containing two copies of FlhB [63]. Yet he also notes that the recently sequenced genome from Clostridium shows a FliR-FlhB fusion protein, suggesting a 1:1 stoichiometry between FlhB and FliR. Thus, it would seem plausible to propose there are two copies of FliR per basal body.
The Utility of IC
The function of the F0F1 ATP synthetase is not simply to synthesize ATP, but to convert the energy of a proton gradient into the chemical form of ATP. Put simply, the ATP synthetase is an energy converter. To carry out this conversion, all members of the eight-part machine are required. However, we can outline a circuit of IC interactions involving the beta, gamma, epsilon, c, and a subunits. Figure 14 outlines the four IC nodes.

Figure 14. A subset of the F0F1 machinery highlighting some of the IC interactions among some of the components (see text for description).
If we begin in the beta subunit, there is a highly conserved DELSEED motif around positions 380-386. This motif forms a three-part interaction (IC node 1), where M23 on gamma and C-terminal basic residues on epsilon functionally interact with DELSEED. Both gamma (around position 205) and epsilon (around position 31) then interact with the hydrophilic loop of the c subunit, including an essential glutamine at position 42 (IC node 2). The polar loop undergoes conformational changes in response to the protonation of an acidic residue at position 61 on the second TMH of the c subunit. Such changes are mediated through the structure involving a series of conserved glycine and alanine residues within c (IC node 3). Finally, the acidic residue at position 61 interacts with a conserved arginine at position 210 on the a subunit (IC node 4).
This IC map is only a subset of the various nodes of interaction needed for function. Missing from the map are the residues/positions that connect R210 on the a subunit to the protons in the periplasm, the residues/positions on the gamma and epsilon subunits that connect the interactions between the c subunit and beta subunit, and the residues/positions that connect the catalytic site of the beta subunit to the DELSEED motif, gamma, and epsilon regions. Furthermore, the contributions from the alpha, delta, and b subunits are not included.
Nevertheless, even with this much simplified map, we can see how the hypothesis of TTSS-F0F1 homology fails at each and every IC node. As we saw, FliI lacks the DELSEED sequence. FliP lacks the conserved methionine at position 23 and FliO is not enriched with basic residues in its C-terminus. FliP has no conserved tyrosine at or near position 205 and FliO has no conserved glutamate at or near position 31. The polar loop of FliQ looks quite different from the polar loop of the c subunit and lacks a conserved glutamine at position 42. FliQ also has no acidic residue at or near position 61. And FliR lacks a conserved arginine at or near position 210.
The F0F1 machine thus has a set of IC interactions that are all completely missing from the hypothesized flagellar homologs. In other words, the functional essence of the ATP synthetase does not exist among the flagellar proteins, a finding that is best explained by the simple fact that the two machines are not homologous (a conclusion supported by other data listed above). And there is a certain irony in this conclusion. Homology/cooption are commonly cited as evolutionary solutions to the IC problem. Yet here is a case where the essence of IC turns back the hypothesis of homology/cooption. Since the F0F1 IC interactions are completely missing from the TTS machinery of the flagellum, it is unlikely that the F0F1 complex is homologous to the TTS machinery, and thus cooption of the F0F1 complex is not a plausible explanation.
“It Looks Like it Could Have Evolved”
A common criticism of design inferences is that they tend to boil down to the assertion of “it looks designed.” Yet it should be apparent that the major thrust of Matzke’s hypothesis is to present the bacterial flagellum in a manner where it looks like it could have evolved. To do this, Matzke offers the F0F1 ATP synthetase as something that looks like a precursor/homolog of the type III secretory machinery. However, given that the sequence data do not support this inference, Matzke turns to other data. CDART analysis succeeds in linking only one proposed example of homology (FliH/F0b), but even here, the link is extremely tenuous (as explained in the section on FliH). Thus, Matzke relies on other criteria: 1) protein size; 2) stoichiometry; 3) presence of TMHs; and 4) orientation of the C- and N-termini of proteins. Yet it is not clear that these criteria, even taken together, reliably signal homology. Consider protein size. Matzke’s candidates range from 72 to 495 amino acids. And if we omit the FliI/alpha/beta group, the range is narrowed significantly, from 72 to 279 amino acids. Yet how meaningful is this range? Figure 15 shows the distribution of proteins from E. coli scored by length.

Figure 15. Length distribution of proteins from the E. coli proteome [64]. The two vertical lines represent the range used by Matzke.
As can be seen clearly, the range used by Matzke draws from the most commonly sized proteins. Size might be a significant factor if we were talking about six proteins that were each over 500 amino acids in length, but not if we’re talking about 70-270 amino acids. And what’s more, Matzke tolerates rather significant ranges in size where, for example, F0b is only 65% the size of FliH.
Stoichiometry might be a suggestive clue, but the only solid connection Matzke has is the six-member rings of FliI and beta/alpha (and even there, the F1 ring is composed of two distinct members). The FliH/FliI stoichiometry remains to be established, as two monomers of FliH have been empirically detected to interact with a single monomer of FliI and not the FliI homohexamer (as explained above). Furthermore, as noted above, there is experimental evidence that indicates there are five copies of FliP (compared to one for gamma) and suggestive evidence that there are two copies of FliR (compared to one for F0a).
The presence or absence of TMHs is not a very powerful signal for homology. It has been estimated that 20-30% of ORFs encode membrane proteins with two or more TMHs [65]. If we begin with a membrane protein from the flagellar basal body and choose at random from any proteome, we’d pick another membrane protein roughly one time in three to five. But what about the number of TMHs? Again, Matzke draws from a crowded sample. The putative homologs contain one to six TMHs. Yet among the population of bacterial membrane proteins, approximately 2/3 contain 2-6 TMHs [65, figure 2]. And even here, Matzke does not rigorously hold to this criterion. FliH is scored as a homolog of F0b, when the former lacks an obvious TMH contained by the latter. FliP is scored as a homolog of gamma, where FliP contains four TMHs, while gamma has none. And FliO is scored as a homolog of epsilon, even though FliO contains a TMH lacking in epsilon. In other words, three of the six pairs that are matched don’t even conform to similarities revolving around TMHs.
Finally, there is the orientation of the C- and N-terminal ends of a protein. Yet here there is very little variability. In terms of protein-protein interactions, there are four possibilities: C-term/C-term; N-term/C-term; N-term/N-term; and C-term/N-term interactions. When it comes to membrane proteins, again there are four possibilities: N-term in/C-term in; N-term in/C-term out; N-term out/C-term in; N-term out/C-term out. However, in some cases, the number of possibilities is reduced further as a function of topological constraints. For example, among proteins with two TMHs, there are only two possibilities (N-term inside or N-term outside), as once the N-terminus is specified, the C-terminus is not free to vary. With such limited variability, this is not a reliable signal of homology.
However, what if we were to take all of these criteria and apply them as a whole? Let’s consider the FliR/F0a pair. The average size of F0a was 250 amino acids, while that of FliR was 260 amino acids. If we are to tolerate a 35% variance (as seen with the FliH/F0b pair), this means we’re considering proteins in the range of about 170-340 amino acids. But let’s narrow that down to 200-300 amino acids. If we consider all the proteins coded by the E. coli genome, there are roughly 800 proteins (out of 4279) that fall within this length distribution [66], or 19%. Since 20-30% of the proteins are also membrane proteins, we’ll use 25%. To make the similarity stronger, let’s assume both proteins have six TMHs with the N-terminus in the cytoplasm. Among the membrane proteins in E. coli, roughly 9% have six TMHs [65]. Finally, there is a 50% chance the N-terminus will be in the cytoplasm (the C-terminus must also be in the cytoplasm as a consequence of the N-terminus orientation and the presence of an even number of TMHs). Thus, if we started with FliR and were looking for a protein with similar properties, how common would such a protein be? Treating the four criteria as independent, the answer would be (0.19)(0.25)(0.09)(0.5) = 0.0021, or 1 out of every 476 proteins. In a genome with close to 4300 proteins, that frequency is far too high for such a match to be a reliable indicator of a relationship due to something other than chance.
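The arithmetic in this estimate can be checked with a few lines of Python. The sketch below uses only the rounded fractions quoted above and, like the estimate itself, treats the four criteria as independent; the variable names are illustrative and not drawn from any source.

```python
# Fractions quoted in the text (rounded values).
length_fraction = 0.19        # ~800 of 4279 E. coli proteins are 200-300 aa
membrane_fraction = 0.25      # midpoint of the 20-30% estimate
six_tmh_fraction = 0.09       # membrane proteins with six TMHs
n_term_in_fraction = 0.5      # N-terminus in the cytoplasm

# ASSUMPTION (as in the text): the criteria are independent,
# so the joint frequency is the simple product.
joint = (length_fraction * membrane_fraction
         * six_tmh_fraction * n_term_in_fraction)

genome_size = 4279            # E. coli protein count used in the text
expected_matches = joint * genome_size

print(f"joint frequency: {joint:.4f}")
print(f"about 1 in {1 / joint:.0f} proteins")
print(f"expected chance matches in the genome: {expected_matches:.1f}")
```

The unrounded product is 0.0021375, or about 1 in 468; the figure of 1 in 476 comes from rounding the product to 0.0021 first. Either way, roughly nine proteins in a genome of this size would be expected to match FliR on all four criteria by chance alone.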
As noted, the common objection against a design inference is that only a vague appearance of design is being posited, something that could easily be explained by reading perceptions of design into chance events. That is, seeing design in life is akin to seeing a face in the clouds. Yet might this very problem also apply to Matzke’s hypothesis? Instead of seeing a face in the clouds, might there be merely a perception of an ATP synthetase in the basal body of the flagellum? The various dissimilarities (some very profound) listed above, along with the weakness of the criteria for inferring homology, are only rendered more problematic by the seemingly arbitrary nature of the chosen matches. Recall that the type III secretory machinery shares ten components with the basal body of the flagellum: FliF, FlhA, FlhB, FliI, FliH, FliR, FliQ, FliP, FliG, and FliN. Yet the ATP synthetase homology hypothesis, at best, explains only half of these components. FliF, FlhB, and FlhA, which account for most of the mass of the basal body, are left to a vague speculative realm, and FliG/FliN are likewise vaguely added later in the picture. When the ATP synthetase is invoked to explain the other five members, two others come with it, forcing a terrible match between FliO and epsilon. Not only are these very different proteins, but FliO is found only in proteobacteria. The other match is between FliJ and delta, which is quite weak, not to mention that FliJ is lacking in deeply branching bacteria and spirochaetes. In other words, while FliJ and FliO are needed for the ATP synthetase-to-original flagellum story, both components appear to have been added much later in evolution. Thus, the match between the basal body and the ATP synthetase has a forced-fit feel to it.
All of this takes us back to the first essay in this series. In that essay, I noted how Ken Miller was able to explain the simple mousetrap in evolutionary terms. In other words, Miller (and others) have unintentionally demonstrated that the human mind can imagine evolutionary transitions when there were none. I noted:
What is interesting about this logic is that we already know that the mousetrap was intelligently designed. We also know that it did not first exist as a clipboard, then a tie clip. Thus, while it is logically possible to see the mousetrap as Miller does, that is, as a modified clipboard and tie clip, such perceptions are not tied to history nor the origin of the mousetrap. Thus, coming up with imaginary accounts that tap into our ability to imagine cooptional origins, by itself, is rather meaningless. If we can successfully come up with such explanations where they are known to be false (the mousetrap), how do we know that our ability to do likewise with things like the flagellum are not also inherently flawed?
References
1. Evolution in (Brownian) space: a model for the origin of the bacterial flagellum
2. The Wheels of Life, BR No.24