I find the mechanics of life itself endlessly fascinating. Consider just DNA and its ability to turn a coded string of molecular modules into a complete and functioning physical structure.1
Back in the early 1950s, when Watson, Crick, and Franklin were defining the DNA molecule’s helical structure through x-ray crystallography, concepts around the idea of inheritance were simple and straightforward. From the work of Gregor Mendel a century earlier, we understood that everyone has at least two sets of inheritable factors—one from the mother, another from the father—and this allowed children to turn out similar to, but not always identical with, either of their parents. And molecular biologists could readily link these factors to the two sets of chromosomes that were microscopically visible inside the nucleus of every cell. At that time the “Central Dogma” of genetics decreed that information flowed only in one direction: that DNA in the chromosomes inside the nucleus is transcribed into messenger RNA, which goes out into the cell body, where it is translated by molecular machines called ribosomes that assemble amino acids into proteins. It all seemed quite obvious.
A decade later, by the mid-1960s, and through the efforts of various researchers in universities and laboratories around the world, we had “cracked the code.” That is, we knew which DNA or RNA bases, arranged in groups of three called codons, specified which amino acids would go into a protein and in what order. And when that amino acid string was allowed to fold naturally, according to the arrangement of electrical charges and other non-covalent attractions inherent to its molecular structure, the protein would form three-dimensional bits of organic material that could build up the body’s cells, mediate its various chemical reactions, and carry signals back and forth throughout the organism.
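For readers who like to see the code as code, here is a minimal Python sketch of that codon lookup. It assumes the standard genetic code, written on the DNA side with T in place of RNA's U; the names CODON_TABLE and translate are my own, chosen for illustration only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per amino acid, "*" for a stop codon.
# The string is ordered to match product(BASES, repeat=3): TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Four bases read three at a time give 4 * 4 * 4 = 64 possible codons.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Read a coding sequence three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon: the protein is finished
            break
        protein.append(amino_acid)
    return "".join(protein)

print(len(CODON_TABLE))           # 64
print(translate("ATGGCCTGGTAA"))  # MAW (methionine, alanine, tryptophan)
```

Because sixty-four codons map onto only twenty amino acids plus a stop signal, several codons spell the same amino acid, which is why the table string above is full of repeated letters.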
After another decade, by the mid- to late-’70s, and again with the work of an ever-growing army of researchers, we were sequencing whole genomes, that is, reading off their base pairs in order. To be sure, we started small, with the microbes and simplest organisms. It would be another decade or so, up to the early ’90s, before researchers were ready to tackle the human genome, laid out in some 3.2 billion base pairs along each set of 23 chromosomes in a human cell’s nucleus.
The original Human Genome Project was established in research facilities around the world. Its researchers were able to pull genes out of the chromosomes because the genetic sequence always included a start code (ATG) and a stop code (which could be TAA, TGA, or TAG). Find a start code and a stop code, and anything in between them was a gene. It was like hooking fish in a lake: find one, reel it in, and go look for another. But the process was expensive and took a long time. Researchers figured they would need about fifteen years to piece together the entire human genome.2
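The fishing trip can be sketched in a few lines of Python. This is a toy version under loud assumptions: it reads only one strand, it knows nothing about introns, and the function name find_orfs is hypothetical. Real gene-finding software is far more elaborate.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}

def find_orfs(dna):
    """Return every stretch that runs from an ATG to the next in-frame stop codon."""
    orfs = []
    pos = 0
    while True:
        start = dna.find("ATG", pos)
        if start == -1:
            break  # no more start codes in the lake
        end = None
        # Step three bases at a time so we stay in the reading frame set by the ATG.
        for i in range(start + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                end = i + 3
                break
        if end is None:
            pos = start + 1  # no in-frame stop found; try the next ATG
        else:
            orfs.append(dna[start:end])
            pos = end  # reel it in and go look for another
    return orfs

print(find_orfs("CCATGAAATAGGGATGCCCTGATT"))
# ['ATGAAATAG', 'ATGCCCTGA']
```

Note that the first catch begins with the letters ATGA, which contain a TGA, but that TGA is out of frame and so does not count as a stop.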
Then along came Craig Venter, a biologically oriented entrepreneur who had a unique way of looking at things. He asked why you would bother to hook those fish one at a time. Instead, why not just drain the lake and pick up all the fish at once? He approached Applied Biosystems—a company that made tools for genetic analysis, including the first protein sequencers and later several types of gene sequencers and synthesizers—about starting an effort to drain the lake. The result was a sister company to Applied Biosystems called Celera, based on the Latin root for “fast.”
Essentially, Venter and his team chopped the entire human DNA into tiny, random pieces, each about 50 bases long. Then they sequenced all these fragments to determine each string’s base pairs, that is, the A’s, C’s, G’s, and T’s that make up the genetic code. Finally, they fed all those millions of tiny sequences into a supercomputer and let it mull over them. The computer was programmed to find all the duplicated and overlapping letter strings and put them together into longer and longer pieces. This approach worked so well that the research centers of the Human Genome Project had to adopt it or be left out of the running. Two years later, in the year 2000, when I joined Applied Biosystems, the first draft of the human genome had just been published, assembled from the DNA of five test subjects at a cost of about $4 million.3
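To give a feel for what that computer was doing, here is a toy Python sketch of a greedy overlap merge. It is an illustration only, not Celera's actual assembler: the function names and the three-letter minimum overlap are assumptions of mine, and a real genome full of repeated sequences would fool anything this simple.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the two fragments that share the longest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        # Discard fragments already contained inside a longer one.
        frags = [f for f in frags if not any(f != g and f in g for g in frags)]
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left overlaps; stop with whatever pieces remain
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags

reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT", "ATGGCGTA"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```

The three distinct reads overlap by five and four letters respectively, so two merges are enough to rebuild the original fifteen-letter string.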
The big surprise at the time was that not all the genome was made up of genes—if you defined a “gene” by the Central Dogma as a sequence that coded for a protein. Only about 10% of the 3.2 billion base pairs formed this kind of gene. The other 90% seemed to be nonsense or “junk.” The researchers at the time just assumed this junk was the crumbling sequences of genes from early in our evolutionary history—that is, genes left over from our microbe, fish, lizard, mouse, and primate ancestry, representing proteins that were no longer used in human biology and sequences that were slowly mutating into mush. Interestingly, if the original Human Genome Project had gone to completion with its fishing expedition using just start and stop codes, they might never have noticed this disproportion between genes and junk.
Another surprise was the ubiquitous use that the human genome makes of what’s called alternative splicing. Most genes consist of a promoter region upstream of the start code, and then after the start code comes the sequence that codes for the specified protein, followed by the stop code. But that coding sequence doesn’t always come in one piece. It often consists of an expressed part, called an exon, and a non-expressed part that intrudes between the coding parts, called an intron. At first, introns looked like just more junk interfering with the gene’s coding. But molecular biologists quickly figured out that many proteins are alike in having similar structural parts. By knitting together different patches of exons, presumably under the instruction of the introns, a single gene could be used to make many different but related proteins. Pretty damn clever. Efficient, too. Evolution—or, if you prefer, God—had invented the principle of modular construction long before human engineers ever thought of it.
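The modular idea is easy to sketch in Python. The version below is a deliberately simplified model that I am assuming for illustration: the first and last exons are always kept, any of the middle exons may be skipped, and the exon order never changes. Real splicing follows more rules than this, and the function name splice_isoforms is hypothetical.

```python
from itertools import combinations

def splice_isoforms(exons):
    """List every transcript made by keeping the end exons and any subset of the middle ones."""
    first, *middle, last = exons
    isoforms = []
    for r in range(len(middle) + 1):
        # combinations() preserves the original order of the exons it keeps.
        for kept in combinations(middle, r):
            isoforms.append("".join([first, *kept, last]))
    return isoforms

gene_exons = ["ATGGCC", "AAA", "GGG", "TGGTAA"]
for transcript in splice_isoforms(gene_exons):
    print(transcript)
# ATGGCCTGGTAA
# ATGGCCAAATGGTAA
# ATGGCCGGGTGGTAA
# ATGGCCAAAGGGTGGTAA
```

With two optional middle exons this one toy gene yields four related transcripts, and each additional optional exon doubles the count.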
But there still was the problem of all that “junk DNA” in the genome. I remember one day walking across the Applied Biosystems campus with one of our chemists, who said she flat-out didn’t believe in junk. The body spends too much energy copying those useless sequences every time a cell divides, she said. Those sequences had to mean something.4
Along about 2004 we began hearing about “microRNAs.” These were fragments of RNA only about 50 bases long that seemed to interfere with gene expression. A plant whose genetic code produced blue flowers might instead produce only white flowers if you added a certain microRNA sequence to its cell nucleus. So microRNAs had something to do with the regulation of genes. Researchers quickly determined that small fragments of RNA annealed to the promoter region upstream of a gene and to the introns inside the gene splices in order to tell the DNA when and how to express the messenger RNA strand that would go out into the cell body to make a protein. So the Central Dogma was stood on its head: sometimes DNA transcribes into little bits of RNA that go and tell other DNA sequences when to start making their proteins.
Not long after this, Eric Davidson at the California Institute of Technology demonstrated how the process of promoting genes functioned in differentiating cells and creating divergent tissues during the development of sea urchin embryos into complete organisms. And similar processes presumably function in all other animals and plants as well. Suddenly, it became clear that the 10% of the genome that codes for proteins is just the body’s parts list. The other 90% is the body’s interactive assembly manual.
A few years later we started to hear about the “epigenome.” DNA and RNA were not the whole story, it seems, because other chemicals—specifically, a methyl group, CH3—could become involved with the microRNA control of gene expression. Promoter regions that became clogged with methyl groups no longer accepted their microRNAs, and the genes would become inactive. This might have seemed like an accident, some kind of environmental contamination, except that the cell also produces an enzyme, methyltransferase, which copies the pattern of methylation from one DNA strand to the next as the cell divides. If blocking the expression of a gene is an accident, it’s one the body has an interest in preserving.
Having a particular DNA sequence in your genome is no guarantee that a particular gene will be activated or a particular protein will get produced. And this makes sense because the DNA in each cell nucleus is the same, but not every protein is needed by every cell and tissue type. Liver cells need to make proteins necessary to their function, but those proteins would be useless and perhaps even toxic if they began appearing in a brain or muscle cell. So acquiring methylation and producing only certain microRNAs and not others is the way cells differentiate and stay different during fetal development and on into childhood.5
And now—and as part of the reason for this meditation—we have just learned that inheriting a particular genetic sequence from your parents is not the only way you can acquire a mutation. A recent article in Science magazine, “Harmful Mutations Can Fly Under the Radar,” suggests that genetic mutations which occur while the embryo is still developing and its cells are differentiating may then appear in one or more parts of the body but not in every part. This is a process called “mosaicism,” because the distribution of mutated and non-mutated cells can resemble a patchwork, mosaic design throughout the body’s tissues. Why is this important? Because a simple cheek swab of cells that researchers then sequence to show your genome or to look for a particular set of mutations may not show everything going on inside your body. Mutations that could be causing a disease condition or susceptibility somewhere else in the body might not show up in your mouth. It also means that examining the genetic profiles of parents may not indicate the susceptibilities of their children, because no one is willing to take apart each egg and sperm to sequence it before those two germ cells combine to create a zygote, which then goes on to become an embryo, which finally grows up to become you.
Life and its coding are no longer simple and straightforward. The more we learn, the more wonderfully complex the process becomes. It’s way more complicated than just the sixty-four possibilities (four times four times four) inherent in four bases read three at a time as a codon, which selects the next amino acid in the growing protein. We are unique individuals, and within our bodies are cells that have become unique based on how they use their share of the genetic code. And now we learn that even the coding within those cells may sometimes be unique.
It’s a wonder that we humans are able to get born, grow up, walk around, draw breath, learn algebra, think great thoughts, and achieve great things for as long as sixty or seventy years at a time. Ain’t life grand!
1. For similar blogs along these lines, see The Chemistry of Control from May 11, 2014, and The Flowering of Life from August 25, 2013, among others.
2. The U.S. Congress originally funded the Human Genome Project in 1990 with an estimated total cost of about $3 billion and a projected finish date in 2005.
3. Today faster, smaller machines can sequence an individual’s entire genome in a couple of hours for about $1,000.
4. Why does copying DNA require any energy at all? Because DNA is a polymer made up of repeating deoxyribose sugar rings, which are cross-connected from one strand of the double helix to the other by their attached bases: adenine (A) linked to thymine (T), and cytosine (C) linked to guanine (G). But the backbone of each strand, connecting the sugar rings up and down the strand, is made of phosphate groups. Phosphate (a phosphorus atom bound with four oxygen atoms) is the energy currency inside the cell. Making adenosine triphosphate is the business of the organelles called mitochondria, which use the energy in your food to attach three phosphate groups to one molecule headed by an adenosine group. Breaking down those adenosine triphosphate molecules into adenosine diphosphate is how the rest of the cell extracts that energy. Anything that uses up phosphate groups, like copying one strand of DNA to make its complement during cell division, creates a drain on the cellular economy.
5. And figuring out how to strip that methylation and reactivate those microRNAs is one way to turn a fully developed and differentiated somatic cell back into a stem cell which keeps its options open and can be used to repair and replace many different kinds of tissues.