One of the great dividing lines in early evolution on this planet—the first great revolution, if you will—was the move from single-celled organisms like bacteria to multi-celled organisms like plants, animals, and the rest of us. That is, in the terms that biologists use, to go from prokaryotes to eukaryotes.1
The main difference in cellular structure between prokaryotes and eukaryotes is the presence of a nucleus. Bacteria have none. Their DNA is loosely scattered throughout the cellular protoplasm. This means their DNA gets randomly and continuously transcribed and translated as part of their basic lifestyle. All of their DNA, all the time, is templating the proteins needed to make up and maintain the bacterium. When times are good and the bacterium can absorb nutrients, it feeds and grows. And when the amount of proteinaceous material made by the translated DNA becomes too great for the cellular membrane to contain, the bacterium copies those loose strands of DNA, divides the membrane down the middle, and becomes two new cells.2
What the multi-celled eukaryotes had to learn through the random mutations of evolution—in order to solve a problem that the prokaryotes never encountered—was how to regulate all that transcribing and translating of internal DNA as their various cells acquired different functions and became different types of cells. The first question was how to keep order among the cells in a multi-celled organism, so that each one doesn’t keep swelling and dividing, swelling and dividing, until the whole organism becomes too big and ungainly to feed itself, manage its environment, and maintain its basic functions—and then swells up, chokes, and dies. The second question was how to use the encyclopedic DNA that comes down from the organism’s parents, as contributed by both egg and sperm and combined in the zygote, to create different types of cells in different types of tissues. Loose strands of DNA getting endlessly translated into all the potentially available proteins just doesn’t work in a multi-cellular context.3
But, almost from the beginning, it appears that geneticists never stopped to think about these problems. The discipline early adopted the “Central Dogma” of genetics: that DNA transcribes to RNA, which codes for proteins. The code written in deoxyribonucleic acid (DNA) inside the cell nucleus gets copied—almost at random, it would seem—into a similar type of coding in ribonucleic acid (RNA), which goes out into the cell body as messenger RNA (mRNA) to be captured and read by the ribosome—a structure made out of proteins and strands of RNA itself—which thumbs through the messenger RNA code and assembles proteins out of amino acids, which are bits of carbon molecules free-floating in the cell. How the transcribing of all this nuclear DNA was controlled, and how segments of it were chosen to produce only those proteins that the cell type needed—those were questions that the Central Dogma either failed to consider or put aside for later.
Now it’s later, and we have the answer. It’s an answer we probably could not have discovered without sequencing the entire human genome, which was only completed in draft form in the year 2000. And it would have taken longer if the Human Genome Project had stuck with its initial approach of fishing out genes by searching through the genome for the start codon that is common to every protein-coding gene, represented by the bases adenine, thymine, and guanine, in that order, ATG, which also codes for the amino acid methionine. The approach was simple: look for an ATG codon anywhere in the genome and sequence from there until you reach one of the “stop codons,” which in DNA notation are TAA, TGA, or TAG.
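For readers who like to see the logic spelled out, here is a minimal Python sketch of that "start at ATG, read to a stop codon" idea. It is a toy under stated assumptions, not the software any genome lab actually ran: the function name `find_orfs` and the `min_codons` parameter are my own inventions, and the scan reads only one strand of a short made-up string.

```python
# Toy scan for ATG...stop stretches on one strand of DNA.
# Assumption: find_orfs and min_codons are illustrative names, not a real tool's API.
STOP_CODONS = {"TAA", "TGA", "TAG"}

def find_orfs(dna, min_codons=1):
    """Return (start, end, sequence) for each stretch that begins at an ATG
    and runs, three bases at a time, to the first in-frame stop codon."""
    dna = dna.upper()
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Read codon by codon, in frame with the ATG, until a stop codon appears.
        for pos in range(start + 3, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if codon in STOP_CODONS:
                n_codons = (pos - start) // 3  # codons before the stop, ATG included
                if n_codons >= min_codons:
                    orfs.append((start, pos + 3, dna[start:pos + 3]))
                break
    return orfs

if __name__ == "__main__":
    for start, end, seq in find_orfs("CCATGAAATGGTAACC"):
        print(start, end, seq)
```

In the made-up string above, the second ATG never reaches an in-frame stop codon, so it yields no candidate. A real gene hunt would also have to check the opposite strand and cope with introns, which is part of why this simple approach only goes so far.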
That method would have been fruitful, except it would have run out of gene candidates way early. A few years after the Human Genome Project started, however, genetic scientist Craig Venter proposed another approach, which he called “shotgun sequencing.” His idea was to chop the entire human genome up into random strings of a couple hundred bases each, sequence them all, then toss the newly discovered A’s, C’s, G’s, and T’s into a supercomputer and let it sort them out. The computer did this by finding identical sequences, considering them to be possible overlaps, linking the overlapping strings together, and putting the whole thing into one long, coherent order. Venter’s rationale was, why fish for the genes individually when you could just drain the lake and pick up all the fish? The head of my former company, Applied Biosystems, maker of the dominant form of gene-sequencing equipment, backed Venter in this approach, creating a company called Celera—from the Latin for “fast”—to make its own run at the human genome. They sped up the process so much that the other labs in the Human Genome Project soon adopted shotgun sequencing and finished the program well ahead of time—about two years before their originally projected end date.
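The overlap-and-link step can be sketched in a few lines of Python. This is a greedy toy of my own devising, assuming error-free reads from a single strand and a tiny invented "genome"; the names `overlap`, `greedy_assemble`, and `min_len` are hypothetical, and the real assembly software had to cope with sequencing errors, repeats, and millions of fragments.

```python
# Toy greedy assembler: repeatedly merge the two reads with the longest overlap.
# Assumption: reads are error-free and come from one strand of a made-up sequence.

def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that matches a prefix of b,
    or 0 if no match is at least min_len bases long."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge reads until no overlaps remain; return the resulting contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # nothing overlaps any more; leave the pieces separate
        merged = reads[i] + reads[j][k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)] + [merged]
    return reads

if __name__ == "__main__":
    print(greedy_assemble(["GTTAGC", "ATGCGTAC", "GTACGTTA"]))
```

Fed three scrambled fragments, the sketch stitches them back into one string. Greedy merging like this can be fooled by repeated sequences, so treat it as an illustration of the idea rather than a description of what the supercomputer did.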
What they discovered then was that, of the three billion base pairs in the genome—all those sequenced A’s, C’s, G’s, and T’s—only about ten percent coded for proteins according to the Central Dogma. And the use that these protein-coding genes made of each sequence was highly complex, involving patches of expressed sequences (called “exons”) and patches of intervening sequences (called “introns”) that allowed each gene to be spliced together and interpreted in alternate ways to make different but related proteins. That was a really clever trick of evolution. But the rest of the genome, the other ninety percent, seemed kind of dumb: just nonsense coding. Scientists started calling it “junk DNA,” and imagined it represented genes that we humans no longer used—from our early heritage first as fish, then amphibians, then reptiles, and finally mammals. And now those discarded gene sequences were slowly mutating themselves into nonsense.
One day in 2002 or 2003, however, I was crossing the Applied Biosystems campus with one of our scientists and she told me, “I don’t believe in junk DNA.” Her reasoning was that copying all that excess DNA each time a cell divides consumes a lot of energy. The backbone of the DNA molecule is, after all, a series of phosphate bonds. These bonds are also part of the cell’s energy molecule, adenosine triphosphate. Since phosphorus is a relatively rare element in the human body, it makes no sense that we would store a lot of it in DNA sequences that have no meaning.
Along about this time, also, genetic scientists began isolating and studying short strands of RNA, only about twenty to fifty bases long, called “microRNAs.” These were found in the cell nucleus, never seemed to go out into the cell body for protein coding or any other work, and appeared to be associated with “gene silencing” in a process called “RNA interference.” At first, these microRNAs looked like some kind of accident that could occasionally turn off a gene. One of the earliest incidents studied was microRNA interference with a petunia’s ability to produce purple pigment, so that the flowers turned out white.
And finally, about 2004, I heard a presentation at Applied Biosystems by Eric Davidson from the California Institute of Technology.4 He and his colleagues had been studying sea urchin embryos—because, as he explained, they could start 10,000 embryos in the morning and sacrifice them in the afternoon, and nobody was likely to complain. You can’t do that with puppies or human babies.
What Davidson and his group were learning was that shortly after the fertilized egg begins dividing and the cells form a spherical structure called a blastula, the cells start differentiating based on their position within that sphere. Some become a bony spine that goes on to form the urchin’s skeleton. Others become skin or gut or nerve tissue. By sacrificing those 10,000 embryos in tightly spaced time frames—on the order of ten to fifteen minutes—and examining their nuclear DNA, Davidson and his colleagues discovered the mechanism whereby one patch of DNA begins to transcribe a bit of microRNA that stays inside the nucleus, finds a complementary promoter sequence somewhere else in the urchin’s genome, triggers the transcription of another bit of microRNA, which settles in another place, and this continues on and on, starting a cascade of differentiation effects. From the first transcription, the course of this cascade will alter depending on what quadrant of the blastula the cell originally occupied and the elapsed time since the union of egg and sperm.
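To make the idea of a cascade concrete, here is a small Python sketch. Everything in it is invented for illustration: the wiring in `NETWORK`, the element names, and the function `run_cascade` are my assumptions, not anything taken from the Caltech group's data. It shows only the general shape of the process, in which one active element switches on others at each time step and the outcome depends on where the cell started and how much time has passed.

```python
# Toy regulatory cascade. Assumption: NETWORK is a made-up wiring diagram,
# not the sea urchin network; each element switches on the ones listed after it.
NETWORK = {
    "signal_A": ["reg_1"],          # hypothetical starting cue in one quadrant
    "signal_B": ["reg_2"],          # hypothetical starting cue in another quadrant
    "reg_1": ["reg_3", "skeleton_genes"],
    "reg_2": ["gut_genes"],
    "reg_3": ["skin_genes"],
}

def run_cascade(start, steps, network=NETWORK):
    """Return a list of the active elements after each time step,
    beginning with only the starting signal switched on."""
    active = {start}
    frontier = [start]
    history = [set(active)]
    for _ in range(steps):
        next_frontier = []
        for element in frontier:
            for target in network.get(element, []):
                if target not in active:
                    active.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
        history.append(set(active))
    return history

if __name__ == "__main__":
    for step, state in enumerate(run_cascade("signal_A", 3)):
        print(step, sorted(state))
```

Starting the same function from `signal_B` instead of `signal_A` ends with a different set of genes switched on, which is the toy version of a cell's fate depending on its position in the blastula.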
By studying a similar process in other animals that haven’t shared a common ancestor with the sea urchin in millions of years, the Davidson group determined that this DNA to microRNA to DNA to microRNA action is highly conserved, which suggests that something very similar happens in other animals and in humans as well. In short, the Caltech lab uncovered a gene regulatory network that controls the development of each cell and its differentiation from other cells in the organism. It is why some cells become liver cells and produce only the proteins needed for liver cell functions, while others become skin, brain, or bone cells and produce only the proteins needed for their own functions. And, in the case of stem cells, the process goes only so far and lets the partially differentiated cell remain in an immature state until needed to repair damaged tissues—at which time the cell completes its differentiation.5
While the ten percent of the genome that codes for proteins is the body’s parts list, that other ninety percent—once thought to be “junk”—is actually the instruction manual for the body’s self-assembly. And that’s an even more clever trick of evolution.
This great revolution came in several parts, all of which more or less depended on each other. First, the DNA in the cell body had to be gathered inside a new structure called the nucleus. That way, the creation and deployment of these regulatory microRNAs could be contained and focused. Second, a whole new regimen of non-protein-coding sequences had to develop to govern gene promotion and regulation. These non-coding sequences had to become important to, and transmitted along with, the DNA load in each cell. And third, a new type of cell, the gamete, which carries only half the organism’s full DNA complement—based on the normal cell having two copies of each chromosome, and so two versions (called “alleles”) of each gene—had to be developed, had to arise through a new cell-division process called meiosis, which halves the chromosome content, and then had to be released as egg and sperm to form the next generation of offspring.6
That’s a lot of systems and structures to ask a one-celled creature to develop in short order. I would almost think the origination of all this complexity would need a divine hand or intelligent guidance—except that I can appreciate how evolution works. Mutations are always happening. Some are bad and kill their subject either immediately or soon after inception or maturation. And those mutations that aren’t lethal might actually be beneficial in forming new proteins that can take advantage of a changing environment or other molecular opportunities. But, more likely, any one mutation just hangs around in the organism’s genome until it’s either eliminated by accident or becomes combined with some other mutation, either in the same protein string or in another protein. Eventually, the modified protein may find a purpose. And if that happens often enough, the whole organism changes slowly over generations.
They say “rust never sleeps,” and neither does evolution. Its actions and effects are constantly shuffling genes and proteins in millions of individuals, in every species, and from one generation to the next. And remember that one-celled organisms go through hundreds of generations in the time that human babies are just forming in the womb, getting born, and starting to grow. The process pushes forward in all tissues, in all sorts of configurations, during every moment of time over a scale that goes back three and a half billion years, almost to the beginnings of this planet’s formation in a bombardment of hot rocks.
Seen that way, without regard for individual fortunes or for the thousands, nay, millions of embryos that Mother Nature herself hatches and sacrifices in every generation, you can imagine some pretty remarkable things occurring. Like a bacterium with its loosely organized DNA strands growing up to become all sorts of highly organized and differentiated creatures—including us humans with the organized brains to discover and start to think about this process.
1. Of course, there are single-celled eukaryotes, too. They are generally in the kingdom Protista and include many forms of algae and single-celled protozoa like the paramecium.
2. And when times aren’t good, when the environment becomes too cold or dry, or the food source goes away, then the bacterium stops processing its DNA, loses its internal water content, shrinks down to a hard little spore, and prepares for a long winter.
3. What follows is a story I’ve told before—see, for example, Gene Deserts and the Future of Medicine from December 5, 2010; The Flowering of Life from August 25, 2013; The Chemistry of Control from May 11, 2014; and Continuing Mysteries of the Genome from October 12, 2014. But the story bears repeating in this context.
4. Eric Davidson died this past September from a heart attack. We lost a great mind there.
5. And you’d better believe that genetics labs all over the world are searching for and studying those regulatory networks, trying to figure out how they can induce stem cells to grow outside the body and become complete new organs that we can use for implantation.
6. Of course, some eukaryotes like the paramecium reproduce asexually, without sharing genetic material with another organism. Most cases of this kind are found in one-celled protists, although in higher animals the process is called “parthenogenesis,” or virgin birth.