When the first draft of the human genome was announced in 2000, one of the big surprises for researchers was how few actual genes it contains, far fewer than previously thought possible. A gene as defined by the “central dogma” of molecular biology1 is the DNA that codes for proteins, the “building blocks of life.” The draft genome could only identify about 30,000 such genes, yet many more proteins than that go into a human being, an estimated 100,000 or so. In fact, it appears that only a small fraction of the 3 billion bases in our genome, roughly 1 to 2% by most estimates, actually codes for proteins.2
One answer to the riddle of too few genes involves alternative splicing. Most genes are not just an unbroken string of DNA bases beginning with a start codon. Researchers have long known that many genes consist of DNA patches called “exons” (for “expressed,” or used, regions) separated by patches called “introns” (for “intragenic,” or interrupting, regions). By combining different exons, the same gene can make different but related proteins. Researchers could guess that the message content of an intron might control how nearby exons were used, and that patches of DNA upstream of the start codon might regulate how and when the gene was expressed. But all that still left huge amounts of DNA unexplained.
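To make the exon-combining idea concrete, here is a minimal Python sketch. It is only a toy: the four exon sequences and the two "isoforms" are invented for illustration, the codon table holds just the eight codons the example needs (their amino-acid assignments are the standard ones), and real splicing happens on RNA under far more elaborate control.

```python
# Toy illustration only: the exon sequences and isoform choices are invented.
# The codon-to-amino-acid assignments are standard, but the table is partial.
CODON_TABLE = {
    "ATG": "M", "GCT": "A", "GGA": "G", "AAA": "K",
    "CTG": "L", "GAA": "E", "TGG": "W", "TAA": "*",  # "*" marks a stop codon
}

def splice(exons, keep):
    """Join the chosen exons in their genomic order, dropping the rest."""
    return "".join(exons[i] for i in sorted(keep))

def translate(coding_dna):
    """Read codons three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino = CODON_TABLE[coding_dna[i:i + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)

exons = ["ATGGCT", "GGAAAA", "CTGGAA", "TGGTAA"]

# Same gene, two different exon combinations, two related proteins.
isoform_a = translate(splice(exons, {0, 1, 3}))
isoform_b = translate(splice(exons, {0, 2, 3}))
print(isoform_a, isoform_b)
```

The two products share their first and last amino acids and differ only in the middle, which is the sense in which one gene yields "different but related" proteins.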
This excess DNA is often found in “gene deserts” spaced along the chromosomes. Human chromosomes 2 and 4 have some of the largest of these deserts. Because these sequences had no obvious purpose, the concept of “junk DNA” arose: that these were old bits of code, surviving from when our ancestors were amoebas or fish or reptiles, that just got carried along.
But this explanation troubled scientists. Nature could not be so inefficient. Every time a cell divides, all the DNA in the nucleus has to be replicated, or copied out, in its entirety. Since the backbone of the DNA strand is made up of sugar rings connected by phosphate bonds, it costs the body something to replicate DNA. The sugar rings are built from the same carbon compounds the cell burns for fuel, and phosphate bonds carry the energy that drives the cell, released when adenosine triphosphate (three phosphate groups in a row) is converted to adenosine diphosphate (two phosphate groups). You don’t go wasting this stuff commemorating a bunch of genes you no longer need.
Discoveries made in the decade since the genome was first drafted3 have indicated that almost all of this gene-less DNA is actually used in the organism’s development and ultimately for controlling gene expression. For example, a researcher at the California Institute of Technology, Eric Davidson, studied the development of sea urchin embryos on an almost minute-by-minute basis after fertilization. He found that once the blastula—the globe of cells resulting from divisions in the fertilized egg—was formed, cells in different parts of the globe began differentiating.4 Some became the calcium-secreting cells that would form the creature’s skeleton, others became the precursors to the gut, the outer skin, and other body parts. All of this was based on the cell’s original position in one or another quadrant of the blastula.
Further, Davidson studied the genetic activity inside those cells. He discovered that differentiation always started with one patch of DNA that transcribed a small RNA strand. This strand never left the nucleus but instead found and annealed to another part of the DNA. That triggered transcription of a secondary RNA strand, which then triggered a third RNA strand, and on and on. Depending on position in the blastula and timing, these RNA strands branched along different networks that defined the particular cell type and its activity. Davidson then studied embryo development in other animals that had not shared an ancestor with the sea urchin in about 300 million years and found the same early development networks. That means humans probably share the early stages of this development pattern, too, which then diverges depending on the DNA in our genome and the genes that are finally triggered to create cellular proteins.
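The branching cascade described above can be pictured as a small lookup table in which each activated element switches on the next, and the branch taken depends on where the cell sits. The Python sketch below is purely illustrative: the names `trigger`, `rna_a`, `rna_b`, `region_1`, and `region_2`, and the wiring between them, are hypothetical and are not taken from Davidson's actual networks.

```python
# Hypothetical wiring: every name below is made up for illustration.
# Each entry maps an activated element to what it switches on next,
# depending on the cell's position in the embryo.
CASCADE = {
    "trigger": {"region_1": ["rna_a"], "region_2": ["rna_b"]},
    "rna_a": {"region_1": ["skeleton_genes"]},
    "rna_b": {"region_2": ["skin_genes"]},
}

def run_cascade(start, position):
    """Follow the activation steps in order and return everything switched on."""
    activated, queue = [], [start]
    while queue:
        current = queue.pop(0)
        if current in activated:
            continue  # already switched on; avoid looping forever
        activated.append(current)
        queue.extend(CASCADE.get(current, {}).get(position, []))
    return activated

print(run_cascade("trigger", "region_1"))
print(run_cascade("trigger", "region_2"))
```

The same starting signal ends in different gene sets for the two positions, which is the point of the passage: the outcome is set by the network's wiring plus the cell's location, not by the starting signal alone.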
The human genome is not just a parts list, but the assembly manual as well. Consider that the parts list for a Jeep doesn’t require much information, just a column of part numbers. But you need a whole lot more information to tell you when, how, and where to assemble the engine, how to install the pistons and connecting rods, when to put on the piston rings, when to pour in the oil, where to wire up the ignition, when to turn the key. The “junk DNA” describes all of that.
Since the human genome was first drafted, researchers have been hard at work in two main areas. First, they are linking discovered genes and their expression to different body parts and their diseases: such-and-such gene is found to be active in liver cells, for example, and a certain mutation of that gene is found whenever they are diseased. Second, researchers are studying the underlying assembly manual represented by the gene-less regions and the regulation areas associated with genes. This work is largely done with stem cells, the partially developed cells in each domain of the body—bone, skin, muscle, connective tissue—that can develop into a particular cell type as needed.
The promise for future medicine is that by knowing the development networks in detail, we will be able to program the body’s own stem cells to repair and replace damaged or aged organs and tissues. Some of this work is already under way.5 With time and support, the many researchers now at work in the field will provide a detailed understanding of genes and the coding for their expression. And from that knowledge, we will literally be able to create new and improved organs and tissues. In the next 20 to 30 years medicine will make huge strides.
The end of injury and illness—even the end of debility and old age, if not death itself—is clearly on the horizon. This is truly a wonderful time to be alive.
1. The central dogma states that DNA is transcribed into messenger RNA, which leaves the nucleus and is translated by a cellular structure called the ribosome into strings of amino acids, which then fold into proteins. Each group of three DNA nucleotide bases, drawn from the adenine (A), cytosine (C), guanine (G), and thymine (T) “letters” of the code, calls for one of twenty amino acids or signals a stop. See The Genetic Code to unlock the translation process.
2. Early work on writing out the code depended on fishing out genes individually, using the translation’s universal “start” codon, ATG, as a clue to finding the next gene. By their own estimates, researchers in the Human Genome Project thought analyzing the entire genome this way would take about 15 years. Then J. Craig Venter and the team he assembled at Celera Genomics took a different approach: chop all the DNA into fragments about 500 bases long, sequence these fragments to determine their letter codes, then throw them all into a supercomputer to puzzle out the overlaps and write out the entire sequence. This was called the “shotgun” approach, or as one researcher put it: “Instead of fishing out genes, drain the lake and pick them up.” Celera made such progress with this technique that the publicly funded Human Genome Project quickly adopted it, and the first draft appeared about two years later. If researchers had relied solely on fishing for individual genes, the complete draft of the genome, with all of its “empty” coding areas, might never have been written.
3. Note that I don’t say “decoded.” We only have a draft of the DNA in terms that we can easily read: the sequence of A, C, G, and T nucleotides that make up the code itself. Decoding their meaning in terms of body structure and function is the process researchers are now undertaking.
4. For illustrations of this process, see Davidson College’s “Introduction to Sea Urchin Development” and Stanford University’s “Sea Urchin Embryology.”
5. See the case of a woman who received a new, bioengineered bronchial tube in 2008. Note that in the story she did not receive stem cells from a sacrificed human embryo but stem cells from her own body. Foreign stem cells would raise the problem of immune incompatibility and rejection, requiring a lifetime on immunosuppressive drugs. Embryonic stem cells have been very useful in studying the development process, but most if not all of these organ replacements will be done with the patient’s own cells.
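For readers curious how a computer can "puzzle out the overlaps" in the shotgun approach of note 2, here is a minimal Python sketch of greedy overlap assembly. It is a toy under stated assumptions, not Celera's actual software: the three fragments are invented, they are error-free and all read from the same strand, and the function names are my own. Real assemblers must cope with sequencing errors, repeated stretches, and hundreds of millions of fragments.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that is also a prefix of b.

    Returns 0 if the best match is shorter than min_len.
    """
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # nothing overlaps any more; leave the pieces separate
        merged = frags[i] + frags[j][size:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags

# Invented fragments, deliberately listed out of order.
reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]
print(greedy_assemble(reads))
```

Greedy merging is the simplest way to show the idea; it can go wrong when the same stretch of letters appears in more than one place, which is one reason production assemblers use more careful methods.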