Introduction
It is well known that eukaryotic cells are more complex than prokaryotic cells. For example, the typical eukaryotic cell is 10-100 micrometers in diameter, contains numerous membranous organelles, has a cytoskeleton, and reproduces through mitosis. The typical bacterial cell, by contrast, is only 0.2-2.0 micrometers in diameter, lacks organelles and a cytoskeleton (or so it was thought), and reproduces through binary fission.
Yet the theme of enhanced complexity repeats itself at ever smaller scales, like a fractal image.
Consider, for example, the three basic universal processes of information transfer: DNA replication, transcription, and translation. In both bacteria and eukaryotes, the same building blocks are used, the same macromolecules are synthesized, and the processes are essentially the same. Yet in each case, the process is more complex among eukarya than in bacteria. For example, while bacteria replicate their single chromosome from a single origin point and possess five different DNA polymerases, eukaryotes initiate replication from multiple points on their multiple chromosomes (involving a process known as licensing) and contain at least 19 DNA polymerases.
If we turn to transcription, bacteria employ a small set of transcription (sigma) factors and use an RNA polymerase (RNAP) built from five subunits. Among eukaryotes, we find hundreds of different transcription factors, and the single RNAP has been expanded into three versions: RNAP I, RNAP II, and RNAP III. RNAP II is most similar to the bacterial version, yet if we focus just on this protein complex, we again find enhanced complexity: the eukaryotic version contains up to 15 subunits. And when we compare the shared core subunits, the eukaryotic versions even have additional domains (Cramer, Patrick. 2002. Multisubunit RNA polymerases. Current Opinion in Structural Biology 12:89-97).
And then there is the classic example of the bacterial and eukaryotic ribosomes. As can be seen from the table below, the eukaryotic ribosome has many more proteins (for both subunits) and longer ribosomal RNAs in each subunit.
Comparison of Ribosome Structure in Bacteria and Eukaryotes

| | Bacterial (70S) | Eukaryotic (80S) |
|---|---|---|
| Large Subunit | 50S | 60S |
| rRNAs (1 of each) | 23S (2904 nts) | 28S (4700 nts) |
| | 5S (120 nts) | 5S (120 nts) |
| | | 5.8S (160 nts) |
| Proteins | 33 | ~49 |
| Small Subunit | 30S | 40S |
| rRNA | 16S (1542 nts) | 18S (1900 nts) |
| Proteins | 20 | ~33 |

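As a quick arithmetic check on the ribosome table, the totals per ribosome can be tallied in a few lines of Python. This is only a minimal sketch built from the figures in the table; the names `RIBOSOMES` and `totals` and the dictionary layout are illustrative assumptions, and the eukaryotic protein counts are approximate.

```python
# Figures copied from the ribosome table; eukaryotic protein counts are approximate (~49, ~33).
RIBOSOMES = {
    "bacterial (70S)": {
        "rrna_nts": [2904, 120, 1542],       # 23S, 5S, 16S
        "proteins": [33, 20],                # large subunit, small subunit
    },
    "eukaryotic (80S)": {
        "rrna_nts": [4700, 120, 160, 1900],  # 28S, 5S, 5.8S, 18S
        "proteins": [49, 33],                # large subunit, small subunit
    },
}

def totals(ribosome):
    """Return (total rRNA nucleotides, total protein count) for one ribosome."""
    return sum(ribosome["rrna_nts"]), sum(ribosome["proteins"])

for name, data in RIBOSOMES.items():
    nts, proteins = totals(data)
    print(f"{name}: {nts} rRNA nucleotides, {proteins} proteins")
```

On these numbers, the eukaryotic ribosome carries about 6880 rRNA nucleotides and roughly 82 proteins, against 4566 nucleotides and 53 proteins in the bacterial ribosome.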
Finally, if we consider the entire proteome from Eukarya, Bacteria, and Archaea, the theme of enhanced complexity is ubiquitous (Brocchieri, L and Karlin, S. 2005. Protein length in eukaryotic and prokaryotic proteomes. Nucleic Acids Research 33: 3390–3400). The median length of annotated proteins is 361 amino acids in eukaryotes, but only 267 amino acids in bacteria and 247 amino acids in archaea. The same pattern holds across the various functional classes of proteins, as some examples in the table below show.
Median Length of Proteins (amino acids)

| Functional class | Bacterial | Eukaryotic |
|---|---|---|
| DNA replication and processing proteins | 315 | 723 |
| Transcription-related proteins | 240 | 444 |
| Translation-related proteins | 208 | 296 |
| Cell division and chromosome partitioning | 346 | 439 |
| Inorganic ion transport and metabolism | 314 | 538 |
| Signal transduction mechanisms | 323 | 605 |

(modified from Brocchieri and Karlin)
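To see how much longer the eukaryotic proteins are in each class, the median lengths can be turned into simple ratios. The sketch below uses only the six rows of the table; the names `MEDIAN_LENGTHS`, `length_ratio`, and `ratios` are illustrative assumptions, not anything taken from Brocchieri and Karlin.

```python
# Median protein lengths (amino acids) from the table: (bacterial, eukaryotic).
MEDIAN_LENGTHS = {
    "DNA replication and processing proteins": (315, 723),
    "Transcription-related proteins": (240, 444),
    "Translation-related proteins": (208, 296),
    "Cell division and chromosome partitioning": (346, 439),
    "Inorganic ion transport and metabolism": (314, 538),
    "Signal transduction mechanisms": (323, 605),
}

def length_ratio(bacterial, eukaryotic):
    """Eukaryotic median length divided by bacterial median length."""
    return eukaryotic / bacterial

ratios = {name: length_ratio(*pair) for name, pair in MEDIAN_LENGTHS.items()}

# List the classes from the largest to the smallest eukaryotic expansion.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {ratio:.2f}x")
```

Every class comes out above 1, ranging from about 1.27 for cell division and chromosome partitioning to about 2.30 for DNA replication and processing.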
The theme of enhanced complexity among eukarya is seen from many different perspectives: the global architecture of the cell, the number of steps involved in many basic processes, the number of components in any machine, and the size of proteins regardless of function. Enhanced complexity thus permeates the eukaryotic cell.
And this leaves us with some tantalizing questions. Why is the eukaryotic cell plan so much more complex than the bacterial cell plan? What does this increased complexity tell us about the eukaryotic cell plan relative to the bacterial version? Why does the theme of enhanced complexity reach into every aspect of the cell plan?
GFP Guides the Way
Green Fluorescent Protein (GFP), shown on the left, was originally isolated from the jellyfish Aequorea victoria. This little protein has become remarkably useful in biochemical research. As you can see from the picture, the amino acid chain folds into a structure known as the beta barrel. Three of the amino acids line up in the center of the barrel and interact to form a fluorophore. When the energy from blue light is absorbed by this fluorophore, it is then re-emitted as green light. Thus, the protein fluoresces green, akin to a molecular green lantern.
So why is this protein so useful in scientific research? Combined with the techniques of genetic engineering, scientists now have a non-toxic tag that allows them to see biological processes unfold in real time. Thus, GFP is commonly used to track events within the cell and within a developing embryo.
For example, let’s say that you want to watch how nervous tissue develops and spreads in an embryo. How are you going to do this? Prior to GFP, biologists had to kill and stain their specimens, giving them a static picture of events. But today, scientists can take the GFP gene and insert it into cells such that it will only be expressed in nervous tissue. This tissue turns green and development can thus be monitored. For example, the picture below is a neuron that has been engineered to express GFP.
GFP can also be used to study a more basic event. The green fluorescence depends on GFP folding into its proper shape. Thus, we can easily measure factors that influence protein folding by simply observing changes in fluorescence.
Researchers at the Max Planck Institute of Biochemistry (Chang, HC, Kaiser, CM, Hartl, FU, and Barral, JM. 2005. De novo folding of GFP fusion proteins: high efficiency in eukaryotes but not in bacteria. JMB 353: 397-409) used GFP in exactly this fashion. But there was a twist.
GFP, a jellyfish protein, does not fold efficiently in E. coli (only about one out of two synthesized proteins folds into a functional state). So what would happen if you fused the gene for GFP to a gene for another protein that does fold efficiently? When the gene was expressed, would this artificial two-domain protein fold? If so, the bacteria should turn green.
The researchers fused GFP to four different proteins: maltose binding protein, NusA, MreB, and enolase, as these are four proteins known to fold very efficiently in bacteria. The genes for these four fusion proteins were then expressed in E. coli. The result? The fusion proteins failed to fold and instead formed sticky aggregates. Furthermore, it did not matter whether the GFP was placed in front of the bacterial protein or behind it, as both versions failed to fold.
But what if we put the fusion proteins in baker’s yeast (a eukaryote)? They fold just fine and we get green yeast.
So why is it that this fusion protein can fold nicely in a simple, unicellular eukaryote but not in bacteria? Might this fact be related to the enhanced complexity we see in the eukaryotic cell?
Large, complex proteins are actually modular structures built from more than one simpler protein domain (fold). A domain is an “independently folding” structure that has a rather specific conformation and is often associated with particular functions. As a consequence, multi-domain proteins entail a division of labor, where different domains handle different jobs, all integrated into the overall function of the protein. For example, a complex protein could be composed of three domains, where one domain binds DNA, another domain binds and hydrolyzes ATP, and yet a third domain binds another protein. It is this modular design that allows this protein to link interactions with DNA and other proteins to energetic flow.
To make the artificial multi-domain protein mentioned above, we simply take the single-domain GFP and, as one example, link it to the single-domain enolase. In other words, we open our toolbox of a thousand or so protein folds, pull out the GFP-like fold, and connect it to the TIM-barrel fold (a common enzyme fold).
Or, for those who are more visual, you simply take this:
[Image: the GFP-like fold]
And use a short chain of amino acids to connect it to this:
[Image: the TIM-barrel fold]
Bottom line? Yeast, a simple eukaryote, can fold it while bacteria cannot. Why is that? Before answering that question, let’s consider whether this is an exception or a rule.
Researchers from the Stockholm Bioinformatics Center recently conducted an extensive survey of protein databases (Ekman, D, Bjorklund, AK, Frey-Skött, J, and Elofsson, A. 2005. Multi-domain proteins in the three kingdoms of life: orphan domains and other unassigned regions. JMB 348: 231-243). Their basic finding was that 65% of eukaryotic proteins have multiple domains, compared to only 40% of prokaryotic proteins. When the cut-off value for a domain is 100 amino acids, the following breakdown is obtained:
Eukaryotes: Single-domain (35%), Two-domain (20%), and Three or more domains (45%).
Prokaryotes: Single-domain (60%), Two-domain (20%), and Three or more domains (20%).
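The headline figures of 65% and 40% follow directly from this breakdown, as a short check confirms. This is a minimal sketch that assumes nothing beyond the percentages quoted above; the names `DOMAIN_BREAKDOWN` and `multi_domain_percent` are illustrative.

```python
# Percent of proteins by domain count, as quoted above (100-amino-acid domain cut-off).
DOMAIN_BREAKDOWN = {
    "eukaryotes": {"single": 35, "two": 20, "three_or_more": 45},
    "prokaryotes": {"single": 60, "two": 20, "three_or_more": 20},
}

def multi_domain_percent(breakdown):
    """Percent of proteins with two or more domains."""
    return breakdown["two"] + breakdown["three_or_more"]

for group, breakdown in DOMAIN_BREAKDOWN.items():
    # The three categories should account for every protein in the group.
    assert sum(breakdown.values()) == 100
    print(f"{group}: {multi_domain_percent(breakdown)}% multi-domain")
```

Note that the two-domain share is the same in both groups (20%), so the whole gap comes from proteins with three or more domains.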
Previously, we noted that eukaryotic proteins are typically larger than prokaryotic proteins and now we can see why: eukaryotes are just better at making multi-domain proteins. Thus, it is not surprising that they could synthesize the artificial two-domain GFP protein when bacteria could not. But there is more.
There is a particular type of multi-domain protein known as the multi-domain repeat. In this case, a single type of domain is repeated multiple times within the same amino acid chain. For example, imagine a protein that had three or more GFP-like folds all linked together. Not only are eukaryotes better at making this type of protein, but it is also more commonly found in multicellular organisms, where approximately one out of every ten proteins is a multi-domain repeat. In fact, by this criterion alone, unicellular yeast are more like bacteria than like humans.
But before chasing down that rabbit, let’s ponder why it is that eukaryotes seem to be better at making multi-domain proteins...