Saturday, June 4, 2016

Modeling Gene Regulatory Circuitry

The difficult transition from cartoons to quantitative analysis of gene regulation

As noted a few weeks ago, gene regulation is a complicated field, typically with cartoonish views developed from small amounts of data. Mapping out the basic parameters is one thing, but creating quantitative models of how regulation happens in a dynamic environment is something quite different, and still extremely rare. A recent paper uses yeast genetics to develop a more thorough way to model gene regulation, and to decide among and refine such models.

A cartoon of glutamine (nitrogen) source regulation in yeast cells. Glutamine is a good food, and if available outside, turns off the various genes needed to synthesize it. Solid lines are known interactions, and dashed lines are marginal or hypothesized interactions. Dal80 and Gzf3 both form dimers, which act more strongly (as inhibitor and activator, respectively) than single proteins.
When times are good for yeast cells, in nitrogen terms, an upstream signaling system inhibits the gene activators Gat1 and Gln3, leaving the repressors Dal80 and Gzf3 present and active to repress the various target genes that contribute to the synthesis of the key nitrogen-containing molecule glutamine, since it is available as food. All these regulators bind similar sequences, the GATA motif, near their target genes (which number about 90), so the presence of the repressors can block the activity of the activators as well as shutting off gene expression directly. Conversely, when times are bad and no glutamine is coming in as food, the suite of glutamine synthesis genes is turned on by Gat1 and Gln3.
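The competition for shared GATA sites can be made concrete with a textbook equilibrium-binding sketch. This is only an illustration, not the paper's model: the function name, the concentrations, and the dissociation constants are all hypothetical, and it assumes a single site that at most one protein can occupy at a time.

```python
def site_occupancy(activator, repressor, kd_activator=1.0, kd_repressor=1.0):
    """Equilibrium occupancy of one binding site by two competing proteins.

    Concentrations and dissociation constants share the same arbitrary
    units; every value used here is hypothetical, for illustration only.
    Returns (fraction bound by activator, fraction bound by repressor).
    """
    a = activator / kd_activator   # statistical weight of the activator-bound state
    r = repressor / kd_repressor   # statistical weight of the repressor-bound state
    z = 1.0 + a + r                # partition sum: empty + activator + repressor
    return a / z, r / z

# Same activator level, with and without a competing repressor.
alone, _ = site_occupancy(activator=1.0, repressor=0.0)
competed, _ = site_occupancy(activator=1.0, repressor=4.0)
print(f"activator occupancy without repressor: {alone:.2f}")
print(f"activator occupancy with repressor:    {competed:.2f}")
```

Even with the activator level unchanged, adding the repressor pushes the activator off the site, which is the sense in which a repressor can block activation simply by sitting on the same motif.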

Binding site preferences for each regulatory protein discussed. One can tell that they are not always very well-defined.
But things are not so simple. Evolution is free to come up with any old system and is always tinkering with its inheritance, so there are feedback loops in several places, which exist at least in part to provide a robust on/off switch out of this analog logic. In fact, the GAT1, DAL80, and GZF3 genes each have the GATA motif in their own regulatory regions. Even with such a small system, arrows are going every which way, and soon it is very difficult to come up with a defensible, intuitive understanding of how the network behaves.
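How a feedback loop can turn analog logic into an on/off switch is easier to see in a toy simulation. The sketch below is not taken from the paper: it assumes a single gene whose product cooperatively activates its own expression, and the rate equation and every parameter value are hypothetical, chosen so that the switch-like behavior shows up.

```python
def simulate(x0, basal=0.05, vmax=1.0, k=0.5, decay=1.0, dt=0.01, steps=5000):
    """Euler integration of dx/dt = basal + vmax*x^2/(k^2 + x^2) - decay*x.

    x is the level of a self-activating regulator (arbitrary units).
    All parameter values are hypothetical.
    """
    x = x0
    for _ in range(steps):
        production = basal + vmax * x**2 / (k**2 + x**2)
        x += dt * (production - decay * x)
    return x

# Same gene, same parameters, two different starting levels.
low = simulate(x0=0.0)
high = simulate(x0=1.0)
print(f"settles at {low:.3f} when starting off, {high:.3f} when starting on")
```

With these numbers the system has two stable levels separated by a threshold at 0.25, so the gene remembers whether it started off or on. Real circuits with several interlocking loops are much harder to reason about by eye, which is the motivation for formal modeling.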

Edging towards a model. Individual aspects of the known or hypothesized interactions are encoded in computable form.
The data behind the work are a collection of mRNA abundance (i.e. gene expression) studies run under various conditions, especially in mutants of the various genes, and in nitrogen-rich or nitrogen-poor media. Panels of the abundance of all mRNAs of interest can be easily run; the problem really is interpretation, and the design of mutants and environmental conditions that make for informative perturbations.

This is where modelling comes into play. The authors set up the known and hypothesized interactions, each as its own equation, whose parameters could vary. Though the number of elements is small, the large number of interactions and equations meant that the models (with 5 interactions, 13 states, and 41 parameters), given a partial set of data, could not be solved analytically, but were rather approximated by Monte Carlo methods, which is to say, by repeated random sampling of candidate parameter values. Models with various hypothesized interactions were compared with each other in performance over perturbation, where the model is given a change in conditions, such as a switch to low-nitrogen medium, or an inactivating mutation in one component. The model comparison method was Bayesian, in that it started from prior knowledge, such as the established interactions and their key parameter levels, wherever known, and weighed each model by how well it then accounted for the data.
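To give a feel for what comparing models by Monte Carlo means, here is a deliberately tiny sketch. It is not the authors' algorithm, which handles far larger models with more sophisticated sampling; it uses the most naive approach, averaging the likelihood over parameters drawn from a prior. The two model forms, the uniform prior range, the noise level, and the "observed" values are all hypothetical.

```python
import math
import random

# Hypothetical steady-state expression of one target gene at five activator levels.
SIGNALS = [0.25, 0.5, 1.0, 2.0, 4.0]
OBSERVED = [0.0, 0.06, 0.5, 0.94, 1.0]

def likelihood(predicted, observed, sigma=0.1):
    """Gaussian likelihood of the observations given model predictions."""
    log_l = 0.0
    for p, o in zip(predicted, observed):
        log_l += -0.5 * ((o - p) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
    return math.exp(log_l)

def simple_model(params, signal):
    """Hypothesis A: non-cooperative activation, one parameter k."""
    (k,) = params
    return signal / (k + signal)

def cooperative_model(params, signal):
    """Hypothesis B: adds a cooperativity exponent n as a second parameter."""
    k, n = params
    return signal**n / (k**n + signal**n)

def evidence(model, n_params, rng, n_samples=20000):
    """Average the likelihood over parameters drawn from a uniform prior."""
    total = 0.0
    for _ in range(n_samples):
        params = [rng.uniform(0.1, 5.0) for _ in range(n_params)]
        predicted = [model(params, s) for s in SIGNALS]
        total += likelihood(predicted, OBSERVED)
    return total / n_samples

rng = random.Random(0)  # fixed seed so the sketch is reproducible
ev_simple = evidence(simple_model, 1, rng)
ev_cooperative = evidence(cooperative_model, 2, rng)
bayes_factor = ev_cooperative / ev_simple
print(f"Bayes factor (cooperative vs simple): {bayes_factor:.3g}")
```

The ratio of the two averages is a Bayes factor. Because each average runs over the whole prior, a model with extra parameters is only rewarded if those parameters buy a clearly better fit, which is how this kind of comparison can reject a hypothesized interaction as well as support one.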

Given a model, its ability to match the experimental data from the mRNA expression profiles under various conditions can be measured, the model adjusted, and the process repeated. Many models can be compared, and eventually a competitive process reveals which models work better. This is informative if the models are sufficiently detailed, and there is enough detailed data to measure them on, which is one of the strong points of this well-studied regulatory system. Whether this method can be extended to other systems with far less data is questionable.

In this case, one hypothesized interaction stood out as always contributing to more successful models: the inhibition of Gzf3 by Dal80, its close relative. In further selections, hypothesis 2, the auto-activation of Gat1 (probably by binding to its own promoter), was also strongly supported. On the other hand, models that were missing the hypothesized interactions 1, 3, and 5 were the top performers, indicating that these (auto-inhibition of Dal80, inhibition of Dal80 by Gzf3, and cooperative binding by Gln3 and Gat1) are probably not real, or at least not significant under the measured conditions.

Lastly, the authors do a bit of model validation by creating new experiments against which to measure model predictions. Using their best model, the expression of Dal80 (Y-axis) under various perturbations is reasonably well-fit.

New experiments support model predictions reasonably well. In this case, the perturbation (a, b) was shifting from a poor to a rich (glutamine) food source, thereby inducing the repressor regulators such as Dal80, and repressing the glutamine synthetic genes. In c, d, the perturbation was the reverse, moving cells from a rich source to a drug which directly shuts off the signaling of rich conditions, thereby releasing repression.
And given a model, one can isolate individual aspects of interest, such as the predicted occupancy of target promoters/binding sites by the regulatory factors, which the authors do in great detail. In the end, they complain that much remains unknown about this system (give us more funding!). But the far more pressing question is what to do about the thousands of other networks and species with far more complication and less data. How can they be modelled usefully, and what is the minimal amount of data needed to do so?

  • More on regulatory logic.
  • The state can work effectively.
  • A little pacifism: "Our government has roughly eight hundred foreign military bases."
  • While we have been stagnating, the rest of the world has been catching up and doing better.
  • ECB and helicopter money, but not for Greece.
  • Pakistan is not the only one playing a double game in Afghanistan.
  • Fed, on the wrong track.
  • Every day is opposite day. Do gun nuts know anything about Christianity? "Collectivism: humanity's oldest disease."
  • Methods of a con artist.
  • Abenomics looks a lot more like austerity.