Saturday, September 26, 2015

Who's Driving This Wreck?

How do you find the mutations in a messy cancer sample that actually drove the cancer to exist?

Cancer is a little like a car wreck. A mechanical defect or two may cause an accident, which then causes a lot of other damage to the vehicle and to others. How does the investigator figure out what was the first thing to go wrong? In cancer cells, an accumulation of mutations is part of the mechanism by which a cell escapes normal growth controls and becomes cancerous. Unleashing a slew of mutations makes it much more likely that the cell will find (and naturally select) the two or five more mutations that allow it to transition from pre-cancerous to malignant.

But along with those causal "driver" mutations, the unleashing process usually causes hundreds or thousands of innocent "passenger" mutations, even deleterious ones that kill off some of the cancer cell descendants. Taken as a whole, the mutations are all grist for the selective mill. But for the growing practice of precision medicine, these extra mutations muddy the waters presented by the DNA sequence of a tumor sample. Modern cancer drugs are only helpful when directed against the mutant proteins that caused the cancer and continue to drive its growth.

Sure, there are a few usual suspects to round up: p53, BRCA1, BRCA2, and others. But that is only guessing. A couple of recent papers tried to look more systematically through large sets of tumor sequences to find driver mutations, one using a popularity measure, and the other using a pathway effects measure. Unfortunately, these methods are not applied or applicable to single tumor samples, which is to say the clinical setting. Rather, they are academically oriented to the hunt for more genes and gene mutations to put into the hopper of possible cancer mutations, which can later be applied to clinical cases.

Simple statistics can tell you to a first approximation which mutations are more common in tumor samples than in control samples. For common mutations, like those in the p53 gene, this is fine. But this method has a hard time finding uncommon cancer-causing mutations, which, though individually uncommon, are in sum a large and important class. This quest is of interest both for clinical use in compiling a complete catalog of possible driver mutations for prognosis and treatment, and also academically as a hunt to find new genes that have roles in causing cancer.
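To make the "simple statistics" concrete, here is a minimal sketch of one way to ask whether a mutation is over-represented in tumors: a one-sided Fisher's exact test, written in plain Python. This is an illustration, not the method of any of the papers discussed here. The function name and all the sample counts are invented for the example.

```python
from math import comb

def enrichment_p_value(tumor_mut, tumor_total, control_mut, control_total):
    """One-sided Fisher's exact test (hypothetical helper).

    Returns the probability of seeing at least `tumor_mut` mutated tumor
    samples if the mutation were distributed among tumors and controls
    purely by chance (hypergeometric upper tail).
    """
    total = tumor_total + control_total
    mutated = tumor_mut + control_mut
    denominator = comb(total, tumor_total)
    upper = min(mutated, tumor_total)
    numerator = sum(
        comb(mutated, k) * comb(total - mutated, tumor_total - k)
        for k in range(tumor_mut, upper + 1)
    )
    return numerator / denominator

# Made-up counts: a common mutation (50 of 100 tumors, 2 of 100 controls)
# versus a rare one (2 of 100 tumors, 0 of 100 controls).
p_common = enrichment_p_value(50, 100, 2, 100)
p_rare = enrichment_p_value(2, 100, 0, 100)
print(f"common mutation: p = {p_common:.3g}")
print(f"rare mutation:   p = {p_rare:.3g}")
```

With these invented numbers the common mutation stands out overwhelmingly, while the rare one is statistically indistinguishable from chance, even if it were a genuine driver. That is the blind spot described above.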

A recent paper takes a step towards super-charging this search by combining DNA mutation data with RNA expression data. The idea is to ask the tumor cells which pathways are particularly active or deranged from a gene expression standpoint (and associated with cell growth and tumorigenesis), which then helps tremendously in focusing on genes that participate in those pathways as candidate tumor drivers. These would necessarily be a small fraction of the 22,000-odd genes in the whole genome.

Here is where biology starts to look a little like electrical engineering. The first step of the study is to create pathways out of the gene expression data that was drawn from their tumor samples and from other cells. Pathways are circuit diagrams of what gene regulates what other gene, in cascades of control that function everywhere in biology, especially in development, homeostasis, and environmental response, and which go haywire in cancer.

Conceptual molecular pathways that might be relevant to cancer. Misregulation of/by any gene can be detected by reading out the altered expression of targets at the bottom.

Specific example pathway, cartooning interrelations among some of the greatest hits (common driver genes mutated) in cancer biology.

An example pathway is labelled as cellular component organization, shown above. Genes like RB1, TP53, BRCA2, MYC are all well-known regulators involved in cancer. The point here is not that common cancer genes show up in such networks, but that elucidating a regulatory network should bring up all the actors in a process, including other lesser-known genes that might also play a role. Mutations in those genes are the target of this work that seeks to create a more complete catalog of known relevant genes and mutations in them that contribute to cancer. But ultimately, everything is connected with everything else, so a lot depends on how one calculates these networks. The authors seem to be relatively conservative in their scope, and cross-check their networks with those from a commercial source, Ingenuity, with which they largely agree. They also validated their final results, in terms of cancer genes and their driver mutations, using the same commercial source, rather than going into the lab to test another large batch of tumor samples, for instance, or generating transgenic mice or cell lines to evaluate the effect of each mutation.

Incidentally, even if the full set of cancer genes is known, identifying a relevant mutation, for cases like that of AURKA whose overexpression contributes to cancer, can be extremely difficult, since overexpression can be due to point mutations many thousands of bases away from the gene, in regulatory regions which are not well mapped or understood anyway. The researchers are interested, however, in simple correlation, taking many tumor samples and asking which mutations are correlated with the changes in pathway perturbation that are seen in the gene expression data. That simplifies the search somewhat.
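As a rough illustration of what "simple correlation" could mean here, the sketch below scores each candidate gene by how well its mutation status (0 or 1 per tumor sample) tracks a per-sample pathway perturbation score, using an ordinary Pearson correlation. This is not the paper's actual algorithm, which I have not reproduced. The gene names GENE_A and GENE_B, the scores, and the helper function are all hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

# Hypothetical perturbation score of one pathway in six tumor samples,
# as might be derived from gene expression data.
pathway_score = [0.1, 0.2, 0.9, 1.1, 0.15, 1.0]

# Hypothetical mutation calls (1 = mutated) for two genes in the same samples.
mutations = {
    "GENE_A": [0, 0, 1, 1, 0, 1],  # mutated exactly where the pathway is perturbed
    "GENE_B": [1, 0, 0, 1, 0, 0],  # mutated with no obvious relation to the pathway
}

# Rank candidate genes by the strength of their association with the pathway.
ranked = sorted(
    mutations,
    key=lambda gene: abs(pearson(mutations[gene], pathway_score)),
    reverse=True,
)
for gene in ranked:
    print(gene, round(pearson(mutations[gene], pathway_score), 3))
```

The appeal of this kind of approach is that it never needs to explain how a mutation alters expression. It only needs enough samples to see that the two travel together.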

Getting the data required for this combined analysis is not easy, yet technical advances make it possible. And the result is evidently quite powerful. The researchers claim many orders of magnitude improvement in (apparent) driver mutation detection, compared with prior algorithms, and compared with any algorithm run without pre-grouping the candidate genes by this empirical pathway-based method. Unfortunately, neither the text nor figures are very clear on this point, so I have to leave the data discussion there.

It is critically important to generate increasingly comprehensive models of cancer as part of mastering molecular biology in general. Each of our three billion DNA nucleotides is doing something, some much more than others. We have only cartoon pictures so far of a smattering of our molecular circuitry. Thankfully, nature is not coming up with new models every year, but understanding the current model of human, and the molecular accidents that befall it, is an enormous task that will keep us occupied for decades.


  • Inequality, economic sclerosis, and rent.
  • Rent in wage negotiations, by way of artificial austerity.
  • There is no sign that the Fed should be raising rates.
  • And why do bankers want higher rates anyway?
  • Ben Carson ... expertise in one area does not confer authority in all. Each case has to be made on its own terms.
  • The animated empire of Walt Disney.
  • Does the medical market work? Not for consumers.
  • Dune ... on the Afghan-Pakistani border.
  • Cringely on the cyber-arms race. All is lost.