Saturday, May 14, 2016

Dissection of an Enhancer

Enhancers provide complex, combinatorial control of gene expression in eukaryotes. Can we get past the cartoons?

How can humans get away with having no more genes than a nematode or a potato? It isn't about size, but how you use what you've got. And eukaryotes use their genes with exquisite subtlety, controlling them from DNA sequences called enhancers that can be up to a million base pairs away. Over the eons, countless levels of regulatory complexity have piled onto the gene expression system, more elements of which come to light every year. But the most powerful contol over genes comes from modular cassettes (called enhancers) peppered over the local DNA to which regulatory proteins bind to form complexes that can either activate or repress expression. These proteins themselves are expressed from yet other genes and regulatory processes that form a complex network or cascade of control.

When genome sequencing progressed to the question of what makes people different, and especially what accounts for differences in disease susceptibility, researchers quickly came up with a large number of mutations from GWAS, or genome-wide association studies, in data from large populations. But these mutations gave little insight into the diseases of interest, because the effect of each mutation was very weak. Otherwise the population would not be normal, as these were, typically, but afflicted. A slight change in disease susceptibility coming from a mutation somewhere in the genome is not likely to be informative until we have much more thorough understanding of the biological pathway of that disease.

This is one reason why biology is still going on, a decade and a half after the human genetic code was broken. The weak effect mutations noted above are often far away from any gene, and figuring out what they do is rather difficult, both because of their weakness, their perhaps uninformative position, and also because of the complexity of disease pathways and the relevant environmental effects.

Part of the problem comes down to a need to understand enhancers better, since they play such an important role in gene expression. Many sequencing projects study the exome, which comprises the protein-coding bits of the genome, and thus ignore regulatory regions completely. But even if the entire genome is studied, enhancers are maddening subjects, since they are so darned degenerate. Which is a technical term for being under-specified with lots of noise in the data. DNA-binding proteins tend to bind to short sites, typically of seven to ten nucleotides, with quite variable/noisy composition. But if helped by a neighbor, they may bind to a quite different site.. who knows? Such short sequences are naturally very common around the genome, so which ones are real, and which are decoys, among the tens or hundreds of thousands of basepairs around a gene? Again, who knows?

Thus molecular biologists have been content to do very crude analyses, deleting pieces of DNA around a specific gene, measuring a target gene's expression, and marking off sites of repression and enhancement using those results. Then they present a cartoon:

Drosophila Runt locus, with its various control regions (enhancers) mapped out a top on the genomic locus, and the proto-segmental stripes in the embryo within which each enhancer contributes to activate expression below. The locus spans 80,000 basepairs, of which the coding region is the tiny blue set of exons marked at top in blue with "Run".

This is a huge leap of knowlege, but is hardly the kind of quantative data that allows computational prediction and modeling of biology throughout the relevant regulatory pathways, let alone for other genes to which some of the same regulatory proteins bind. That would require a whole other level of data about protein-DNA binding propensities, effects from other interacting proteins, and the like, put on a quantitative basis. Which is what a recent paper begins to do.
"The rhomboid (rho) enhancer directs gene expression in the presumptive neuroectoderm under the control of the activator Dorsal, a homolog of NF-κB. The Twist activator and Snail repressor provide additional essential inputs"
A Drosophila early embryo, stained for gene expression of Rhomboid, in red. The expression patterns of the regulators Even-skipped (stripes) and Snail (ventral, or left) are both stained in green. The dorsal (back) direction is right, ventral (belly) is left, and the graph is of Rhomboid expression over the ventral->dorsal axis. The enhancer of the Rhomboid gene shown at top has its individual regulator sites colored as green (Dorsal), red (Snail) and yellow (Twist). 

Their analysis focused on one enhancer of one gene, the Rhomboid gene of the fruit fly, which directs embryonic gene expression just dorsal to the midline, shown above in red. The Snail regulator is a repressor of transcription, while Dorsal and Twist are both activators. A few examples of deleting some of these sites are shown below, along with plots of Rhomboid expression along the ventral/dorsal axis.

Individual regulator binding sites within the Rhomboid enhancer (B, boxes), featuring different site models (A) for each regulator. The fact that one regulator such as Dorsal can bind to widely divergent site, such as DL1 and DL2/3, suggests the difficulty of finding such sites computationally in the genome. B shows how well the models match the actual sequence at sites known to be bound by the respective regulators.

Plots of ventral-> dorsal expression of Rhomboid after various mutations of its Dorsal / Twist/ Snail enhancer. Black is the wild-type case, blue is the mutant data, and red is the standard error.

It is evident that the Snail sites, especially the middle one, plays an important role in restricting Rhomboid expression to the dorsal side of the embryo. This makes sense from the region of Snail expression shown previously, which is restricted to the ventral side, and from Snail's activity, which is repression of transcription.
"Mutation of any single Dorsal or Twist activator binding site resulted in a measurable reduction of peak intensity and retraction of the rho stripe from the dorsal region, where activators Dorsal and Twist are present in limiting concentrations. Strikingly, despite the differences in predicted binding affinities and relative positions of the motifs, the elimination of any site individually had similar quantitative effects, reducing gene expression to approximately 60% of the peak wild-type level"

However, when they removed pairs of sites and other combinations, the effects became dramatically non-linear, necessitating more complex modelling. In all they tested 38 variations of this one enhancer by taking out various sites, and generated 120 hypothetical models (using a machine learning system) of how they might cooperate in various non-linear ways.
"Best overall fits were observed using a model with cooperativity values parameterized in three 'bins' of 60 bp (scheme C14) and quenching in four small 25 or 35 bp bins (schemes Q5 and Q6)."
Example of data from some models (Y-axis) run on each of the 38 mutated enhancer data (X-axis). Blue is better fit between the model and the data.

What they found was that each factor needed to be modelled a bit differently. The cooperativity of the Snail repressor was quite small. While the (four) different sites differ in their effect on expression, they seem to act independently. In contrast, the activators were quite cooperative, an effect that was essentially unlimited in distance, at least over the local enhancer. Whether cooperation can extend to other enhancer modules, of which there can be many, is an interesting question.

Proof of their pudding was in the extension of their models to other enhancers, using the best models they came up with in a general form to predict expression from other enhancers that share the same regulators.

Four other enhancers (Ventral nervous system defective [vnd], Twist,  and Rhomboid from two other species of Drosophila, are scored for the modeled expression (red) over the dorsal-ventral axis, and actual expression in black.

The modeling turns out pretty decent, though half the cases are the same Rhomboid gene enhancer from related Drosophila species, which do not present a very difficult test. Could this model be extended to other regulators? Can their conclusion about the cooperativity of repressors vs activators be generalized? Probably not, or not very strongly. It is likely that similar studies would need to be carried out for most major classes of regulators to accumulate the basic data that would allow more general and useful prediction.

And that leaves the problem of finding the sites themselves, which this paper didn't deal with, but which is increasingly addressable with modern genomic technologies. There is a great deal yet to do! This work is a small example of the increasing use of modeling in biology, and the field's tip-toeing progress towards computability.

  • Seminar on the genetics of Parkinson's.
  • Whence conservatism?
  • Krugman on the phony problem of the debt.
  • Did the medievals have more monetary flexibility?
  • A man for our time: Hume, who spent his own time in "theological lying".
  • Jefferson's moral economics.
  • Trump may be an idiot, just not a complete idiot.
  • Obama and Wall Street, cont...
  • The deal is working.. a little progress in Iran.
  • More annals of pay for performance.
  • Corruption at the core of national security.
  • China's investment boom came from captive savings, i.e. state financial control.

No comments: