Saturday, February 7, 2009

How to read DNA

A review of DNA sequencing technologies, from the paleolithic to the bleeding edge.

The discovery of the role and structure of DNA was one of the greatest of the last century, indeed of all time, but it did not amount to much in practical terms until methods were devised to read its code- its sequence. There has been a fascinating evolution in technologies to read DNA, and I have experienced a good share of it. Most methods depend on harnessing nature's own DNA-replicating enzymes in increasingly clever ways. The resulting flood of information will serve the age-old project of "know thyself".

DNA exists in almost endless lengths (bacterial genomes are typically circular, and the average human chromosome is 1.3E8 base pairs in length). So the first step in sequencing, in typical reductive fashion, is to break this linear structure into small pieces, place them into bacterial mini-genomic circles with independent replicative ability (plasmids or their relatives), and replicate/amplify them to large amounts that can be handled, sampled, sequenced, filed, bar-coded and stored.
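The fragmentation step can be pictured with a minimal Python sketch that cuts one copy of a sequence at random points. The function name `shear`, the toy sequence, and the use of uniformly random cut points are illustrative assumptions; in the lab, DNA is broken physically or enzymatically, and many copies are fragmented so that the pieces overlap.

```python
import random

def shear(sequence: str, n_cuts: int, seed: int = 0) -> list[str]:
    """Cut one copy of a sequence at n_cuts random, distinct positions (hypothetical helper).

    Returns the pieces in their original order; in a real experiment that
    order is lost once the pieces are cloned, which is why assembly is needed.
    """
    if not 0 <= n_cuts < len(sequence):
        raise ValueError("n_cuts must be between 0 and len(sequence) - 1")
    rng = random.Random(seed)
    # Interior cut points only, so no piece is empty.
    cuts = sorted(rng.sample(range(1, len(sequence)), n_cuts))
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[start:end] for start, end in zip(bounds, bounds[1:])]

toy_genome = "ATGCGTACCTGAAGTTCGGATCCAGTTACGATGGCATTAC"
pieces = shear(toy_genome, n_cuts=4, seed=1)
print(pieces)
```

Cutting a single copy gives non-overlapping pieces; shearing many copies at different random points is what produces the overlaps that reassembly relies on.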

The paleolithic method of sequencing (I've used it a few times) is based on chemistry instead of enzymes, and is called the Maxam-Gilbert method, after its developers. First, one cuts a large batch of DNA at a specific sequence site with what is called a "restriction" enzyme- a pair of molecular scissors. Then its ends are labeled with radioactive phosphorus (P32), and one of the two labeled ends is removed with yet another restriction cut. The remaining DNA is split into several pools, and each pool is treated lightly with quite hazardous chemicals that modify the DNA at certain bases (hydrazine at T and C, dimethyl sulfate at G, and formic acid at G and A). The individual units of DNA are called nucleotides, and their key parts are called bases- the A, G, C, and T of the genetic code, named for their chemically basic character.

These chemical reactions are only roughly base-specific and hit other bases as well, so the whole thing is woefully inefficient. The DNA is then further treated chemically to break its backbone at the modified bases, and the mixtures are separated electrophoretically on a gel that can distinguish fragments differing in length by a single nucleotide. The radioactive label on one end ensures that only those fragments spanning from the label to a random cut point appear on the X-ray film exposed to the gel.
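The logic of reading a Maxam-Gilbert gel can be sketched in Python under idealized assumptions: every modification is perfectly base-specific (which, as noted above, it is not), and a band of length n means the base at 0-based position n from the labeled end was cleaved. The sketch also assumes a fourth reaction that is standard in the method but not mentioned above, hydrazine in high salt, which hits C only; without it, C and T could not be told apart. The function names are hypothetical.

```python
# Which bases each idealized reaction lane cleaves at.
LANE_BASES = {"G": "G", "A+G": "AG", "C+T": "CT", "C": "C"}

def cleavage_lanes(template: str) -> dict[str, set[int]]:
    """Band positions in each lane: the labeled fragment of length n ends
    just before the modified base at index n (counted from the labeled end)."""
    return {lane: {i for i, base in enumerate(template) if base in bases}
            for lane, bases in LANE_BASES.items()}

def read_ladder(lanes: dict[str, set[int]], length: int) -> str:
    """Read from the shortest fragment (nearest the label) to the longest."""
    calls = []
    for n in range(length):
        if n in lanes["G"]:
            calls.append("G")      # band in G lane (and A+G lane)
        elif n in lanes["A+G"]:
            calls.append("A")      # band in purine lane only
        elif n in lanes["C"]:
            calls.append("C")      # band in C lane (and C+T lane)
        elif n in lanes["C+T"]:
            calls.append("T")      # band in pyrimidine lane only
        else:
            calls.append("N")      # no band: position unreadable
    return "".join(calls)

fragment = "GATCCTAGGA"
print(read_ladder(cleavage_lanes(fragment), len(fragment)))  # GATCCTAGGA
```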

All the other methods to sequence DNA use the magic of DNA replication enzymes (polymerases) to read sequence, building on methods devised by Fred Sanger (one of only three people to have received two Nobel prizes in science). They get the enzyme to incorporate occasional nucleotides with some special property: either the nucleotides stop chain elongation at random positions, so that the polymerase directly produces fragments like the ones described above, or they carry other modifications described below. The enzyme does the work of reading along the DNA, and the experimenter coaxes it to reveal which base it is seeing as it goes along.

The original Sanger method used radioactive tracers such as P32 or S35 to detect the resulting DNA fragments, but advances in fluorescence technology have revolutionized this aspect of biology, as so many others (the 2008 Nobel Prize in Chemistry went to the discovery and development of green fluorescent protein, widely used to label proteins).

How do these enzymes know where to start? The DNA is continuous, but just as with a book, a chapter or a page, you have to start somewhere. And since the text in this case is A, T, G, and C with no further punctuation, knowing where you are is quite a bit more difficult than in a book. Usually a "primer" is used to start off the DNA polymerase: a short DNA fragment, perhaps 20 nucleotides long, that can be made by pure chemistry and that hybridizes to its complementary sequence in the target DNA (after the target has been heated to separate its two strands). If the cloning was done in clever fashion (abutting the DNA fragment to be sequenced right up to a known part of the cloning plasmid), then the same primer can be used for an entire sequencing project.
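At its simplest, finding where a primer can anneal is a string search on both strands. Here is a minimal Python sketch assuming perfect base pairing only (real annealing tolerates some mismatches and depends on temperature and salt); the names `reverse_complement` and `find_primer_sites`, the 12-nucleotide primer and the toy plasmid are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G, then reversed)."""
    return seq.translate(COMPLEMENT)[::-1]

def find_primer_sites(template: str, primer: str) -> list[tuple[int, str]]:
    """Exact primer binding sites on either strand of a double-stranded template.

    '+': the primer's sequence appears on the given (top) strand, so it anneals
         to the bottom strand and extends rightward in top-strand coordinates.
    '-': the primer's reverse complement appears on the top strand, so it
         anneals to the top strand and extends leftward.
    """
    rc = reverse_complement(primer)
    sites = []
    for i in range(len(template) - len(primer) + 1):
        window = template[i:i + len(primer)]
        if window == primer:
            sites.append((i, "+"))
        if window == rc:
            sites.append((i, "-"))
    return sites

primer = "ACGTTGCAAGGT"
plasmid = "TTTT" + primer + "CCCCGGGG"
print(find_primer_sites(plasmid, primer))  # [(4, '+')]
```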

The original human genome project used a variation of this method, in which primed DNA polymerases on templates are fed a small proportion of nucleotides that both terminate the chain (dideoxynucleotides, which lack not only the 2' hydroxyl that DNA's deoxynucleotides lack but also the 3' hydroxyl needed to attach the next nucleotide) and carry fluorescent labels (a different one for each of the four bases). The full four-label reactions, with their synthesized fragments, are then run through an extremely narrow (capillary) electrophoretic gel, at the end of which a fluorescence detector reads off the labels from the size-sorted fragments as they travel past. This is done on expensive machines that run miniaturized reactions at very large scale, taking the work out of the hands of regular bench scientists.
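A toy simulation makes the chain-termination idea concrete: many copies are extended in parallel, each stopping at a random point where a dye-labeled dideoxynucleotide happens to be incorporated, and reading the terminal labels in order of fragment size recovers the sequence. This is a minimal Python sketch under stated assumptions: the template is written 3'->5' so the new strand is its base-by-base complement, the letter of the terminal base stands in for its dye color, and the termination probability is a made-up single number.

```python
import random

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sanger_fragments(template_3to5: str, p_terminate: float, n_copies: int,
                     seed: int = 0) -> list[tuple[int, str]]:
    """Simulate dye-terminator extension on many copies of one primed template.

    At each position the polymerase incorporates a labeled dideoxy terminator
    with probability p_terminate (and stops), otherwise a normal dNTP.
    Returns (length beyond the primer, dye-labeled terminal base) pairs;
    copies that run off the end carry no dye and are not reported.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_copies):
        for i, template_base in enumerate(template_3to5):
            if rng.random() < p_terminate:
                fragments.append((i + 1, PAIR[template_base]))
                break
    return fragments

def read_capillary(fragments: list[tuple[int, str]]) -> str:
    """Mimic the detector: fragments arrive shortest first, one color per size."""
    color_at_length = {}
    for length, base in fragments:
        color_at_length.setdefault(length, base)
    return "".join(color_at_length[n] for n in sorted(color_at_length))

frags = sanger_fragments("TACGGATC", p_terminate=0.1, n_copies=20000, seed=0)
print(read_capillary(frags))  # ATGCCTAG, the complement of the template
```

Keeping the terminator ratio low matters: too high and almost all copies stop near the primer, so the longer fragments (and the far end of the read) become too rare to detect.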

More recent technologies are the 454 and Illumina systems (each named for the company that offers it), which have finally dispensed altogether with the electrophoretic separation step that had been such a painful bottleneck.

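These systems lay single molecules of template on tiny islands on a glass slide (Illumina) or on beads (454), and do an in-place PCR amplification step to park a lot of copies at each location. Then the sequencing step is performed. In the 454 method, A, G, C and T are washed over all the template islands in turn, and a flash of light is registered wherever a step of incorporation takes place, before the next wash and the next step of polymerization, and so on. Illumina instead adds all four nucleotides at once, each carrying a different fluorescent label and a reversible block that limits each cycle to a single incorporation, and reads the color at each island before the block is removed.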
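The 454-style wash cycle can be sketched in Python as a "flowgram": each flow offers one nucleotide, and the signal equals the number of bases incorporated during that flow. One consequence worth seeing is that a run of identical bases (here GGG) gives a single, stronger flash rather than separate ones. This is an idealized, noise-free model; the flow order `TACG`, the cycle count and the function names are assumptions for illustration, and it does not model Illumina's chemistry.

```python
def flowgram(new_strand: str, flow_order: str = "TACG",
             n_cycles: int = 4) -> list[tuple[str, int]]:
    """Idealized pyrosequencing: one nucleotide per flow, signal = bases added in that flow.

    new_strand is the sequence the polymerase will synthesize on the template.
    """
    pos, signals = 0, []
    for _ in range(n_cycles):
        for nucleotide in flow_order:
            count = 0
            # The polymerase extends through every consecutive matching base.
            while pos < len(new_strand) and new_strand[pos] == nucleotide:
                count += 1
                pos += 1
            signals.append((nucleotide, count))  # light roughly proportional to count
    return signals

def decode(signals: list[tuple[str, int]]) -> str:
    """Turn noise-free integer signals back into sequence."""
    return "".join(nucleotide * count for nucleotide, count in signals)

signals = flowgram("TTACGGGA", n_cycles=2)
print(signals)
# [('T', 2), ('A', 1), ('C', 1), ('G', 3), ('T', 0), ('A', 1), ('C', 0), ('G', 0)]
```

With real, noisy signals, telling a flash worth five bases from one worth six is hard, which is why long runs of a single base are a known weak spot of this approach.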

The virtue of this system is its extreme miniaturization and large parallelism- many different molecules can be laid down, amplified, and sequenced in one experiment. However, the read length is paltry- only about 35 (Illumina) or 300 (454) nucleotides, compared to the 800 nucleotides regularly attainable with the gel-sorting methods above.

Read length is critically important, since the next step for all these technologies is the reverse of reductionism: the re-assembly of the sequence from all the individual sequence reads, like doing a jigsaw puzzle. The reads (for a whole genome, say) are all poured into a computer program which lines up sequences that overlap, building back up to the sequence of the entire source DNA as best it can. As with jigsaw puzzles, the bigger the pieces you start with, the easier the puzzle is to solve, to an almost exponential degree.
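The jigsaw step can be illustrated with a greedy overlap assembler. This is a deliberately simple strategy, not what production assemblers use (they build overlap or de Bruijn graphs). The minimal Python sketch below assumes error-free reads and a minimum overlap of 3 bases; the function names and toy reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that matches a prefix of b, if >= min_len; else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the longest overlap until none remain."""
    contigs = list(dict.fromkeys(reads))                 # drop exact duplicates
    contigs = [r for r in contigs                        # drop reads inside other reads
               if not any(r != other and r in other for other in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs                               # no overlaps left: done
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)

true_sequence = "ATGCGTACCTGAAGT"
reads = ["ATGCGTA", "GTACCTG", "CTGAAGT"]  # overlapping, error-free toy reads
print(greedy_assemble(reads))  # ['ATGCGTACCTGAAGT']
```

Repeated regions longer than the reads break this approach, because a read lying inside a repeat overlaps several places equally well; that is the practical reason longer pieces make the puzzle easier.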

Last, and most amazing, a recent report in Science introduces what is sure to be the next iteration: monitoring the production of a single strand of DNA by a single polymerase from a single template strand with an extremely miniaturized apparatus. Originating in the labs of Watt Webb (of which I am an exceedingly minor alumnus) and Harold Craighead at Cornell, this technique uses an odd optical property to peek into extremely tiny volumes of solution (on the order of a zeptoliter, 1E-21 liter).

It turns out that if you shine light through holes in a metal film whose diameters are less than half the light's wavelength, the light does not get very far into them. If a solution is put into those holes, you can look at the fluorescence of the super-tiny volume right at the floor of each hole (containing, in this case, a DNA polymerase with its template) without being distracted by the rest of the solution, which may contain a high concentration of other fluorescent compounds (the nucleotides). The fluorescence system looking into the bottom of the hole essentially just "sees" the occasional one or two fluorescent molecules bouncing along the bottom, or binding to the polymerase located there.
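Why the tiny volume matters comes down to simple arithmetic: the average number of molecules in a volume is concentration × volume × Avogadro's number. A short Python calculation, assuming illustrative concentrations (not the values used in the report) and, for comparison, a conventional focused-laser spot of roughly one femtoliter (1E-15 liter), which is an approximate figure:

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

def mean_occupancy(concentration_molar: float, volume_liters: float) -> float:
    """Average number of molecules in a volume: N = C * V * N_A."""
    return concentration_molar * volume_liters * AVOGADRO

ZEPTOLITER = 1e-21   # liters
FEMTOLITER = 1e-15   # liters, roughly a conventional confocal spot (approximate)

for conc in (1e-7, 1e-6, 1e-5):  # 0.1, 1 and 10 micromolar, illustrative values
    print(f"{conc:.0e} M: {mean_occupancy(conc, ZEPTOLITER):.1e} molecules per zeptoliter, "
          f"{mean_occupancy(conc, FEMTOLITER):.1e} per femtoliter")
```

At 1 micromolar, a femtoliter holds about 600 labeled nucleotides on average, swamping any single molecule, while a zeptoliter holds one only a tiny fraction of the time, so a nucleotide held in the polymerase's active site stands out.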

The sequencing method is then to add a solution of four different fluorescent nucleotides, each carrying a different color label on its outermost phosphate, which gets clipped off as the nucleotide is added to the growing chain. The polymerase attached to the bottom of the view-hole can incorporate these nucleotides with no problem, and fluorescence from each incoming nucleotide appears transiently while it is positioned in the enzyme's active site, before the reaction clips off the label and incorporates the rest of the nucleotide into the growing chain.

Thus the detector sees a parade of distinct fluorescence signals, one by one, as the lone polymerase does its work synthesizing a new DNA strand along the template. The tricky part is that this process happens stochastically. One incorporation event may go fast, the next slow, as diffusion of the nucleotides and even quantum effects come into play. Several incorporations of the same nucleotide may occur in succession on the template, requiring the observers to make sure they are tracking the pauses in fluorescence that occur between each step of the elongation reaction. Much of this uncertainty can be resolved technically, and also by doing a few replicates.
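The pause-tracking problem can be made concrete with a small Python sketch. It models the dark wait before each incorporation as exponentially distributed (a common simple model for a random single-molecule event, assumed here), samples the detector in fixed time frames, and calls one base per run of same-colored frames. When two identical bases are incorporated with no dark frame between their pulses, the caller merges them. All names, the 3-per-second rate, the pulse length and the frame time are illustrative assumptions.

```python
import random

def simulate_pulses(new_strand: str, rate_per_s: float = 3.0, pulse_s: float = 0.1,
                    seed: int = 0) -> list[tuple[float, float, str]]:
    """One (start, end, color) pulse per incorporated base, separated by
    exponentially distributed dark waits; the base letter stands for its dye color."""
    rng = random.Random(seed)
    t, pulses = 0.0, []
    for base in new_strand:
        t += rng.expovariate(rate_per_s)   # random dark wait before the next binding
        pulses.append((t, t + pulse_s, base))
        t += pulse_s                       # label visible while in the active site
    return pulses

def render_frames(pulses, frame_s: float, n_frames: int) -> list[str]:
    """Sample the detector at the middle of each frame: a color, or '-' if dark."""
    frames = []
    for k in range(n_frames):
        t = (k + 0.5) * frame_s
        frames.append(next((c for start, end, c in pulses if start <= t < end), "-"))
    return frames

def call_bases(frames: list[str]) -> str:
    """Call one base per run of identical non-dark frames."""
    calls, prev = [], "-"
    for f in frames:
        if f != "-" and f != prev:
            calls.append(f)
        prev = f
    return "".join(calls)

# Hand-made trace: the two T pulses touch, with no dark frame between them.
trace = [(0.0, 0.2, "G"), (0.3, 0.4, "G"), (0.4, 0.5, "C"), (0.6, 0.7, "T"), (0.7, 0.8, "T")]
print(call_bases(render_frames(trace, frame_s=0.1, n_frames=10)))  # GGCT: one T is lost

# Simulated trace; a dark wait shorter than one frame can still merge repeats.
pulses = simulate_pulses("GGCTTAG", seed=1)
frames = render_frames(pulses, frame_s=0.01, n_frames=int(pulses[-1][1] / 0.01) + 10)
print(call_bases(frames))
```

Faster frame rates shrink the window in which repeats merge, and replicate reads of the same template make the remaining misses easy to spot.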

One advantage of this method is that read lengths are substantially increased. The researchers (who have now duly set up shop in Silicon Valley) show an experiment using a circular template with alternating G and C halves, which allows a potentially infinitely long sequencing read. They report a rate of ~3 bases incorporated per second under their conditions, with clear alternation of C and G signals, up to 4,000 nucleotides in an hour's time. This is very promising for problems in genome sequencing such as repetitive regions, which are very difficult to piece together from short sequencing reads, and one may hope that these lengths can be extended and the polymerization sped up as the technique is further optimized.

All these advances mean that it will not be long before individuals can get their entire genomes sequenced at a reasonable price. The information will allow divination of the future, in the form of improved personal medical prognoses as we slowly learn more about how the genome works. And also divination of the past, since complete genomes will allow genealogical analysis of unprecedented detail and depth. Our long evolutionary inheritances reside in these ~3 billion base pairs, and bringing them into the light will generate great benefits, individually and collectively.

Incidental links:
Steven Pinker on his own genome.
Very basic TED talk on genomes by Barry Schuler.
Dire warnings about privacy issues.


  1. Since this elegant complexity -- simple and inherently beautiful -- is obviously "accidental" and inherently "meaningless" beyond itself, I guess I have, according to you, no one to thank but (fill in Name)...

    Fr. Lazarus

  2. Hi, Lazarus-

    Now you are getting the hang of it! No one to thank at all, of course. You can thank your parents, if you like. Do you thank anyone for making rivers? No, rivers just happen, out of the matrix of nature. And galaxies- do you thank anyone for making them? No- galaxies just happen too. And if there were someone responsible, what makes you think that she would care in the least whether you thank her or not? One gets the sense that the universe revolves around you and cares about you. It should be pretty clear from our glimpses of the universe that it doesn't.

    Anyhow, the ability to sequence DNA faster and cheaper isn't in itself such a testament to nature, but to man's ingenuity. No greater than the ingenuity it took to come up with Nestorianism, trinitarianism, monophysitism, and the rest, of course. So here's a cheer for us as rather clever apes!

    So, what you are saying is that you live and move and have your being in relationship with "the universe", which I would assume means me and all the other "clever chimps" out here in your audience, and that gratitude (otherwise known as saying thank you), a simple and pretty basic core value, has no place in your "cosmology". If gratitude does have a place, then when (at what cosmological point or level) does it "kick in"? Your cosmology, which leads to your ethics, does not include a "doctrine of gratitude"? Upon what do you base any gratitude or kindness whatsoever? The biggest and best "clever ape" in New York wins?

    Accidentally interested,
    Fr. Lazarus

  4. Hi, Lazarus-

    Aren't you making rather a leap here, not to say casting aspersions? What possible relationship is there between one's relations to galaxies, rocks, the universe at large, and one's "whatsoever" capability of kindness or gratitude? It is people whom we should be kind to and grateful for. Not torturing them, not hounding them for heresy, not dehumanizing them for not believing in peculiar supernatural cults- that is the proper locus of gratitude.

    I should ask what the point is of gratitude to the universe. There is no question that we feel the beauty of the cosmos and of nature, and that is one of our very best impulses. But that does not mean that we are getting replies in return, or that there is a recipient at all on the other end. The evolutionary process of which we are the products has conferred on us a keen appreciation, even awe, of our natural surroundings. There is no problem with that. The problem is in developing pained rationalizations in response that end up having us sacrifice chickens, pray, or whatever else it is you might do to raise a signal ... which never comes.

  5. Several things:
    1.I am not grateful to the river. I am grateful to the One who created the river, for the water and the environment for life that the river helps provide. I am grateful for the river. Are you grateful for existing? Does the mystery of science which you must bump up against every day in your career bring you to a halt and stir you deep within? It does me!! I am VERY grateful for the majesty and magnificence of the universe from the subatomic to the galactic. The answer you provide, that all this beauty and order and elegance is just an accident, is just not an adequate explanation given the raw data. I remember being given a science project in high school and told to draw conclusions based on the data I collected. I made a “C-”, because the teacher pointed out that I had clearly come to conclusions that were not supported by the “whole body of available evidence.” In other words, I did not consider enough evidence. I did not think deeply enough and consider connections or principles that were not obvious on the surface. I humbly submit that you are coming to conclusions regarding the meaning and inherent order of the universe that do not take into consideration “the whole body of evidence”.
    2.I wonder, what is your “gospel”? What is your “sacred text”? What are your “Ten Commandments”? What identifiable principles do you use to make everyday decisions? Upon what basis do you judge the value of a person or an enterprise? I am willing to identify those texts and principles, are you? I love “Pirates of the Caribbean.” There is a point in the movie where one of the officers that are pursuing Captain Jack Sparrow asks another officer, “Do you think he plans it all out, or just makes it up as he goes along?” Be clear and answer the question. What are the core principles that enable you to navigate human life? Why do you attach more significance to one encounter with another human than another? Why do one thing instead of another? Why??
    3.I believe that all life is precious. I value all living things and all non-living things. I believe that you are infinitely valuable and a wonderful mystery to be celebrated. I believe that you are unique. I believe you are precious. I believe you are in this world for a purpose. I do not believe you are a “clever ape” any more than I am, but rather, a wonderful expression of the One who created you and breathed life into your being. I am not afraid of the awe-filled responsibility and honor of being, as a part of the human community, the stewards of the earth with all the challenges and promise that office holds. I believe we are “in over our heads” with regard to that stewardship and will mess it up every time unless we cooperate with one another and the Creator. I, therefore, do not believe the universe is an accident. I believe we desperately struggle to reach for that “purpose” and we (all of us) will not rest until we find that which, outside of ourselves, speaks to us of an ultimate purpose and design.
    4.I do not presume to say that the “faith communities” of various brands have not committed atrocities “in the name of ______.” They have, and mine, the Christian Church, is unfortunately notorious. But just because adherents have “fallen short” of the tenets of their faith does not mean their faith is meaningless or false. You are the one who is “making rather a leap.” It would be like saying that just because some group of scientists used genetic technology to enable a country to construct a “clone army” to wipe out another country, we could conclude that the WHOLE scientific enterprise is wrong. That would be, I think you would agree, a less than satisfactory conclusion.

    Genuinely and purposefully interested,
    Fr. Lazarus

  6. Hi, Lazarus-

    In general, you are assuming that you and the tradition you speak for know what is going on with the cosmos- that there is an authorial being and so forth. But what the last several centuries of scientific investigation have made clear is that prior religious traditions had no idea whatsoever what was going on in the cosmos- not even that the sun was bigger than the earth or that we revolved around it rather than it revolving about us... not to mention knowing anything significant about the nature of life itself, which this week's anniversary of Darwin celebrates. You may retreat to vague and traditional notions of over-arching authorship, but there you are speaking of feelings, not facts or evidence. What you imagine is a matter of faith alone, not evidence.

    Now as to the "whole body" of evidence, I would ask you to be more specific. I think any honest theist acknowledges that evidence will not get one to the belief you have- a leap of faith is required. The scientist is humble enough to stick to the evidence and not go further.

    I appreciate your positive beliefs, but have to ask what would happen to them were you to hear that in point of fact, Mary was not a virgin, Jesus performed none of the miracles that legends attribute to him, that he was mortal, lives in no supernatural realm, and will never return. Here I am only stating the view of critical and reasoned historical analysis.

    I have no such brittle basis for my ethics and morality, but recognize that they are a matter of tradition, cultivation and reason- of human culture, in short, which we continually reshape and adjust to our current understandings of what is good and best. I could go on, but I appreciate your interest, and hope that this clarifies some of your questions. You may enjoy the book "Moral Clarity", by Susan Neiman.

  7. Burk,
    Thought of you when I read this…
    God Bless,
    Fr. Lazarus

    Mark 8.11-13

    The Pharisees came forward and began to argue with him, seeking from him a sign from heaven to test him. He sighed from the depth of his spirit and said, "Why does this generation seek a sign? Amen, I say to you, no sign will be given to this generation." Then he left them, got into the boat again, and went off to the other shore.
    "Why does this generation seek a sign?"

    Father Most Holy, God Almighty..., when I raise the faint light of my eyes towards the sky, how can I doubt it to be your heaven? When I contemplate the movement of the stars and their yearly cycle; when I see the Pleiades, Little Bear and Morning Star and consider how each of them shines in the watch assigned to it, then I understand, O God, that you are there in those stars beyond my understanding. When I see “the breakers of the sea” (Ps 93[92].4) I cannot grasp the origin of their waters or even what sets their ebb and flow in motion. And yet– impenetrable though it be for me – I believe there to be a cause to these facts of which I have no knowledge and there, too, I perceive your presence.

    If I turn my mind towards the earth which, by means of the energy of hidden forces, decomposes all the seeds it has received in its womb, slowly causes them to germinate and multiply, then enables them to grow, I see nothing in all this that I could understand with my intellect. But even this ignorance helps me to discern you since, if I have no knowledge of the nature placed at my service, yet I understand you by the mere fact that it is there for my use.

    And if I turn towards my own self, this experience tells me that I do not understand myself and I wonder at you all the more in that I am a stranger to myself. Indeed, even if I am unable to comprehend them, I have an experience of the movements of my mind as it judges, of its operations, of its life. And it is to you that I owe this experience, you who have given me a share in this sensible nature, which is my joy even if its origin is beyond the grasp of my intelligence. I do not understand my own self but it is in myself that I find you and, in finding you, adore you.

    Saint Hilary (c.315-367), Bishop of Poitiers, Doctor of the Church, The Trinity, Bk.12, 52-53

    Thank you for such a moving paean to ignorance and faith. No wonder the dark ages were dark.

    Funny thing about signs, though. What he seems to be saying is that I am correct above: that all of Jesus's signs (miracles) are legendary accretions for the benefit of the credulous. Jesus himself never engaged in them and denounced them in this passage. The fact is that spirituality is a matter of feeling, not of bad science. I honor the feeling, not the rationalizations attached to it.

    Best wishes!

  9. Burk, This entry was informative and nicely written. I would like to see more entries on genetics and evolution. 1) Example: How does the epigenetic layer of a gene affect genetics and gene expression? Since every cell has the same genes, something else must cause them to be so different (brain cell vs. muscle cell, etc.). What role does the gene’s epigenetic layer play in this? 2) The toolkit of developmental genes, including RNA. How many are there? How similar are they between animals? How does an essentially similar toolkit build so many different species? How do the developmental genes factor into evolution? When mutation takes place, does it not have to affect the germ cells so the change can be passed on? How likely is it that these mutations are favorable? 3) How did the current animal cell form from more primitive lifeforms?