Biophilia

Molecular biology needs better modeling.

Molecular biologists think in cartoons. It takes a great deal of work to establish even the simplest points, such as that two identifiable proteins interact with each other, or that one phosphorylates the other with some sort of activating effect. So biologists have been satisfied to achieve such critical identifications and move on to other parts of the network. With 20,000 genes in humans, expressed in hundreds of cell types, regulated states, and disease settings, work at this level has plenty of scope to fill years of research.

But the last few decades have brought larger scale experimentation, such as chips that can determine the levels of all proteins or mRNAs in a tissue, or the sequences of all the mRNAs expressed in a cell. And more importantly, the recognition has grown that any scientific field that claims to understand its topic needs to be able to model it, in comprehensive detail. We are not at that point in molecular biology, at all. Our experiments, even those done at large scale and with the latest technology, are in essence qualitative, not quantitative. They are also crudely interventionistic, maybe knocking out a gene entirely to see what happens in response. For a system as densely networked as the eukaryotic cell, it will take a lot more to understand and model it.

One might imagine that this is a highly detailed model of cellular responses to outside stimuli. But it is not. Some of the connections are much less important than others. Some may take hours to have the indicated effect, while others happen within seconds or less. Some labels hide vast sub-systems with their own dynamics. Important items may still be missing, or assumed into the background. Some connections may be contingent on (or even reversed by) other conditions that are not shown. This kind of cartoon is merely a suggestive gloss and far from a usable computational (or true) model of how a biological regulatory system works.

The field of biological modeling has grown communities interested in detailed modeling of metabolic networks, up to whole cells. But these remain niche activities, mostly because of a lack of data. Experiments remain steadfastly qualitative, given the difficulty of performing them at all, and the vagaries of the subjects being interrogated. So we end up with cartoons, which lack not only quantitative detail on the relative levels of each molecule, but also critical dynamics of how each relationship develops in time, whether in a time scale of seconds or milliseconds, as might be possible for phosphorylation cascades (which enable our vision, for example), or a time scale of minutes, hours, or days- the scale of changes in gene expression and longer-term developmental changes in cell fate.

These time and abundance variables are naturally critical to developing dynamic and accurate models of cellular activities. But how to get them? One approach is to work with simple systems- perhaps a bacterial cell rather than a human cell, or a stripped down minimal bacterial cell rather than the E. coli standard, or a modular metabolic sub-network. Many groups have labored for years to nail down all the parameters of such systems, work which remains only partially successful at the organismal scale.

Another approach is to assume that co-expressed genes are yoked together in expression modules, or regulated by the same upstream circuitry. This is one of the earliest forms of analysis for large scale experiments, but it ignores all the complexity of the network being observed, and indeed hardly counts as modeling at all. All the activated genes are lumped together on one side, and all the down-regulated genes on the other, perhaps filtered by biggest effect. The resulting collections are clustered by some annotation of those genes' functions, thereby helping the user infer what general cell function was being regulated in her experiment or perturbation. This could perhaps be regarded as the first step on a long road from correlation analysis of gene activities to a true modeling analysis that operates with awareness of how individual genes and their products interact throughout a network.
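To make the lumping-and-annotating procedure concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the gene names, the fold changes, the annotation terms, and the cutoff are invented for illustration, and the hypergeometric test is just one common way of asking whether an annotation is over-represented in a gene set, not necessarily what any particular tool does.

```python
from math import comb

def split_by_direction(log2_fold_changes, threshold=1.0):
    """Lump genes into activated and down-regulated sets by effect size."""
    up = {g for g, lfc in log2_fold_changes.items() if lfc >= threshold}
    down = {g for g, lfc in log2_fold_changes.items() if lfc <= -threshold}
    return up, down

def enrichment_p_value(hits, annotated, universe_size):
    """Hypergeometric upper tail: the chance of seeing at least the observed
    overlap if `hits` had been drawn at random from all assayed genes."""
    overlap = len(hits & annotated)
    n_hits, n_annotated = len(hits), len(annotated)
    upper = min(n_hits, n_annotated)
    favorable = sum(
        comb(n_annotated, i) * comb(universe_size - n_annotated, n_hits - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / comb(universe_size, n_hits)

# Hypothetical toy data: log2 fold changes after some perturbation.
log2_fold_changes = {
    "geneA": 2.1, "geneB": 1.6, "geneC": 1.2, "geneD": -1.8,
    "geneE": -2.3, "geneF": 0.1, "geneG": -0.2, "geneH": 0.4,
}
# Hypothetical functional annotations.
annotations = {
    "stress response": {"geneA", "geneB", "geneC"},
    "cell cycle": {"geneD", "geneE", "geneF"},
}

up, down = split_by_direction(log2_fold_changes)
universe = len(log2_fold_changes)
for label, genes in (("up", up), ("down", down)):
    for term, members in annotations.items():
        p = enrichment_p_value(genes, members, universe)
        print(f"{label:>4}  {term:<16}  p = {p:.4f}")
```

Note what the sketch never touches: which gene acts on which, how strongly, or how fast. That is the sense in which this kind of analysis hardly counts as modeling.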

Another approach is to resort to a lot of fudge factors while attempting to make a detailed model of the cell and its components. Assume a stable network, and fill in all the values that could get you there, given the initial cartoon version of molecule interactions. Simple models thus become heuristic tools to hunt for missing factors that affect the system, which are then progressively filled in, hopefully by doing new experiments. Such factors could be new components, or could be unsuspected dynamics or unknown parameters of those already known. This is, incidentally, of intense interest to drug makers, whose drugs are intended to tweak just the right part of the system in order to send it to a new state- say, from cancerous back to normal, well-behaved quiescence.
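As a toy illustration of the fill-in-the-values idea, the sketch below takes a one-gene cartoon (a protein that represses its own production), assumes the system sits at steady state, and searches for the one unknown parameter that makes the model land on the observed level. The model form, the parameter names (`a`, `d`, `K`), and all the numbers are assumptions made up for this example, not anything taken from a real system.

```python
def steady_state(K, a=10.0, d=1.0, dt=0.01, steps=5000):
    """Integrate dx/dt = a / (1 + x/K) - d*x by forward Euler from x = 0
    and return the level the system settles at."""
    x = 0.0
    for _ in range(steps):
        x += dt * (a / (1.0 + x / K) - d * x)
    return x

def fit_K(x_observed, lo=0.1, hi=100.0, iterations=50):
    """Bisection on the repression constant K. Weaker self-repression
    (larger K) gives a higher steady-state level, so the search is monotone."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if steady_state(mid) < x_observed:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Pretend the "experiment" was run on a system whose true K we do not know.
K_true = 2.0
x_observed = steady_state(K_true)

K_fit = fit_K(x_observed)
print(f"observed steady state: {x_observed:.4f}")
print(f"fitted K: {K_fit:.6f}")
```

With one unknown and one measurement the fit is unique. Real networks have many unknowns per measurement, so many parameter sets reproduce the same steady state, and that is where the fudge comes in.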

A recent paper offered a version of this approach, modular response analysis (MRA). The authors use perturbation data from other labs, such as the inhibition of 1000 different genes in separately assayed cells, combined with a tentative model of the components of the network, and then deploy mathematical techniques to infer / model the dynamics of how that cellular system works in the normal case. What is observed in either case- the perturbed version, or the wild-type version- is typically a system (cell) at steady state, especially if the perturbation is something like knocking out a gene or stably expressing an inhibitor of its mRNA message. Thus, figuring out the (hidden) dynamic in between- how one stable state gets to another one after a discrete change in one or more components- is the object of this quest. Molecular biologists and geneticists have been doing this kind of thing off-the-cuff forever (with mutations, for instance, or drugs). But now we have technologies (like siRNA silencing) to do this at large scale, altering many components at will and reading off the results.

This paper extends one of the relevant mathematical methods (modular response analysis, MRA) to this large scale, and finds that, with a bit of extra data and some simplifications, it is competitive with other methods (mutual information) in creating dynamic models of cellular activities, at the scale of a thousand components, which is apparently unprecedented. At the heart of MRA are, as its name implies, modules, which break down the problem into manageable portions and allow variable amounts of detail / resolution. For their interaction model, they use a database of protein interactions, which is a reasonably comprehensive, though simplistic, place to start.
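For a feel of what MRA does at small scale, here is a sketch of the classic relation between the global response matrix R (how the steady-state level of module i shifts when module j alone is perturbed) and the local response matrix r (the direct effect of module j on module i, with the diagonal fixed at -1 by convention): r = -[dg(R^-1)]^-1 R^-1. This is the textbook form of MRA, not the large-scale extension the paper describes, and the three-module network and perturbation strengths below are invented for illustration. It assumes NumPy is available.

```python
import numpy as np

def local_from_global(R):
    """Classic MRA: recover direct (local) interaction strengths from
    steady-state (global) responses to one perturbation per module."""
    R_inv = np.linalg.inv(R)
    # Scale each row so the diagonal of the result is -1.
    return -np.diag(1.0 / np.diag(R_inv)) @ R_inv

# Hypothetical ground truth: entry [i, j] is the direct effect of module j
# on module i. Here 0 activates 1, 1 activates 2, and 2 inhibits 0.
r_true = np.array([
    [-1.0,  0.0, -0.5],
    [ 0.8, -1.0,  0.0],
    [ 0.0,  0.6, -1.0],
])

# Hypothetical direct effect of each perturbation on its own module.
D = np.diag([-0.3, -0.5, -0.2])

# Forward direction: at steady state r @ R = D, so these are the global
# responses an experimenter would measure.
R = np.linalg.inv(r_true) @ D

r_est = local_from_global(R)
print(np.round(r_est, 3))
```

The inversion needs one clean perturbation per module and a well-conditioned R. Both get harder to come by as the network grows, which is presumably why the large-scale version leans on modularization and prior interaction data.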

What they find is that they can assemble an effective system that handles both real and simulated data, creating quantitative networks from their inputs of gene expression changes upon inhibition of large numbers of individual components, plus a basic database of protein relationships. And they can do so at reasonable scale, though that is dependent on the ability to modularize the interaction network, which is dangerous, as it may ignore important interactions. As a state of the art molecular biology inference system, it is hardly at the point of whole cell modeling, but is definitely a few steps ahead of the cartoons we typically work with.

The authors offer this as one result of their labors. Grey nodes are proteins, and colored lines (edges) are activating or inhibiting interactions. Compared to the drawing above, it is decidedly more quantitative, with strengths of interactions shown. But timing remains a mystery, as do many other details, such as the mechanisms of the interactions.

Fiscal contraction + interest rate increase + trade deficit = recession.
The lies come back to roost.
Status of carbon removal.
A few notes on stuttering.
A pious person, on shades of abortion.
Discussion on the rise of China.

Saturday, May 14, 2022

Tangling With the Network