Showing posts with label genetics. Show all posts

Saturday, June 18, 2022

Balancing Selection

Human signatures of balancing selection, one form and source of genomic variation.

We generally think of selection as an inexorable force towards greater fitness, eliminating mutations and less fit forms in favor of those more successful. But there is a lot else going on. For one thing, much mutation is meaningless, or "neutral". For another, our lives and traits are so complicated that interactions can lead to hilly adaptive landscapes where many successful solutions exist, rather than just one best solution. One form of adaptive and genetic complexity is balancing selection, which happens when two alleles (i.e. mutants or variants) of one gene have distinct roles in the whole organism or ecological setting, each significant, and thus each is maintained over time. 

A quick example is color in moths. Dark colors work well as camouflage in dirty urban environments, while lighter colors work better in the countryside. Since both conditions exist, and moths move around between them, both color schemes are selected for, resulting in a population that is persistently mixed for this trait. Indeed, the capacity of predators to learn these colors may also lead to an automatic advantage for the less frequent color, another form of balancing selection. Heterozygotes may also have an intrinsic advantage, as is so clearly the case for the sickle cell mutation in hemoglobin, against malaria. These are all classic examples. But to bring it home, a society has only so much capacity for people like Donald Trump. Insofar as sociopathy is genetic, there will necessarily be a frequency-dependent limit, where this trait (and other antisocial traits) may be highly successful at (extremely) low frequency, but terminally destructive at high frequencies.
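Heterozygote advantage, the sickle-cell case above, is the easiest of these mechanisms to put in numbers. The sketch below is a standard one-locus textbook model, not anything taken from the paper discussed further on. It assumes hypothetical genotype fitnesses AA = 1 - s, Aa = 1, aa = 1 - t, random mating and an infinite population, and iterates the allele frequency until it settles at the balanced value t / (s + t), whichever allele starts out rare.

```python
def next_freq(p, s, t):
    """Frequency of allele A after one generation of selection.

    Hypothetical fitnesses: AA = 1 - s, Aa = 1, aa = 1 - t (heterozygote
    advantage when s and t are both positive). Assumes random mating.
    """
    q = 1.0 - p
    w_bar = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)  # mean fitness
    w_a = p * (1 - s) + q                                  # marginal fitness of allele A
    return p * w_a / w_bar


def equilibrium(s, t):
    """Stable polymorphic frequency of A under heterozygote advantage."""
    return t / (s + t)


def run(p0, s, t, generations=500):
    """Iterate next_freq from a starting frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, s, t)
    return p


# Whether A starts rare or common, both runs settle near t / (s + t) = 0.8.
print(run(0.01, s=0.2, t=0.8), run(0.99, s=0.2, t=0.8), equilibrium(0.2, 0.8))
```

Neither allele can take over: whichever one becomes rare mostly sits in heterozygotes, which are the fittest genotype, so selection pushes it back up.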


Schematic selective landscapes. Sometimes selection just optimizes an existing trait by intensifying it (1), or moving it along trait space to a new optimum (2). But other times, multiple forms (i.e. variants, or mutations) of a given locus each have some useful / beneficial characteristic, and may be selected either discretely for particular effects (3), or generally for their diversity (4).

One laborious method to find such sites of balancing selection in a genome is to compare it to genomes of other species. If the same variants exist in each species over long periods of divergence, that argues that such conserved sites of diversity are maintained by balancing selection. Studies of humans and chimpanzees have found some such sites, but not many. But these methods are known to be very conservative, and likely miss most cases.

A recent paper offered a slightly more sensitive way to find signs of balancing selection in the human genome, and found quite a lot of them. (Some background here.) It is based, as many investigations of selection are, on a special property of protein-coding genes, due to the degeneracy of the genetic code, that some mutations are "synonymous" and lead to no change in the coded protein, and others are "non-synonymous" and do change the protein. The latter would be assumed to be visible to selection, and sometimes give significant signals of conservation (i.e. low rates of change between species and populations, and few variants maintained in a population). This embedded signal/control pairing of information helps to insulate against many problems in analysis, and can tell us pretty directly how severe selection is on such sites. 
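The synonymous / non-synonymous distinction can be made concrete with the standard genetic code. This minimal sketch compares two aligned, in-frame coding sequences codon by codon and labels each differing codon as synonymous (same amino acid) or non-synonymous. It is only the counting step: it treats a codon with several differences as one event, and it omits the normalization by numbers of synonymous and non-synonymous sites that real dN/dS estimates use. It is not meant to reproduce the paper's method.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_differences(seq1, seq2):
    """Count differing codons that are synonymous vs non-synonymous.

    Assumes two aligned, in-frame DNA coding sequences of equal length.
    A codon differing at several positions is counted once, by its net effect.
    """
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 == c2:
            continue
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            synonymous += 1      # protein unchanged
        else:
            nonsynonymous += 1   # amino acid changed: exposed to selection
    return synonymous, nonsynonymous


# GCT -> GCC keeps alanine; AAA -> AGA swaps lysine for arginine.
print(classify_differences("ATGGCTAAA", "ATGGCCAGA"))  # (1, 1)
```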

It is worth adding that each basepair in the human genome has its own selective constraints. One position may code for the active site of some enzyme and be extremely well conserved, while the next may be a "synonymous" site that has very few or no selective constraints, and another may lie in junk DNA that doesn't code for anything or regulate anything, is effectively neutral, and can be changed with no effect. The system is in this sense massively parallel, and able to experience evolution individually at each site concurrently. On the other hand, selection on one site affects the frequencies at nearby sites, since selective "sweeps" through that area of the genome drag the nearby regions of DNA (and whatever variants they may harbor) along, whether positively if the site is increasing in frequency, or negatively if it is deleterious and causing death of its bearers. The reach of this "linkage" effect depends on the recombination frequency, which is relatively low, leading to the moderate stability (and linkage) of relatively large "haplotypes" in our genomes.
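A textbook illustration of how recombination limits the reach of linkage: under random mating, with no selection, drift or mutation, the linkage disequilibrium D between two sites shrinks each generation by a factor of (1 - r), where r is the recombination fraction between them, so D after t generations is D0 (1 - r)^t. The sketch below is that idealized formula only; real haplotype blocks also reflect sweeps, drift and population history.

```python
import math


def ld_after(d0, r, generations):
    """Linkage disequilibrium left after t generations of random mating: D0 * (1 - r)^t."""
    return d0 * (1.0 - r) ** generations


def ld_half_life(r):
    """Generations for D to halve; r is the recombination fraction, 0 < r <= 0.5."""
    return math.log(0.5) / math.log(1.0 - r)


# Unlinked sites (r = 0.5) lose half their association every generation;
# tightly linked sites (r = 0.01, roughly 1 cM apart) take about 69 generations.
for r in (0.5, 0.01, 0.0001):
    print(r, round(ld_half_life(r), 1))
```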

At any rate, as the methods for detecting selection improve, more selection is detected, which is the lesson of this paper. These authors claim that while their method still significantly under-estimates balancing selection, they find evidence for the existence of hundreds of sites in humans, when comparing genomes between different geographic regions of the world. A couple hundred of these sites are in the MHC region- the immunological area of the genome that codes for the HLA proteins, which display fragments of pathogens to immune cells, along with related immune proteins. This region is well-known to be a hotspot both for diversity and for the ongoing selective arms race vs pathogens (as we have recently experienced vs Covid). Seeing a lot of balancing selection there makes complete sense, naturally. 

The authors note that their focus on coding regions of the genome, and other technical limitations such as the need to find these sites through population comparisons, argue strongly that their estimate is a severe undercount. Thus one can assume that there will be at least several thousand sites of balancing selection in humans. This is quite apart from the many more sites of ongoing unidirectional selection, mostly purifying against problem mutations, but also towards positive characteristics. This accounting, over the vast amounts of variation we harbor, is only starting to get going. So we live in a dynamic world, inside and out.


  • Green fuel for airplanes... really?
  • Barr is not the good guy here.
  • Free speech- not entirely free.
  • Court to workers: drop dead.
  • Islam and the megadrought.
  • Is crypto this cycle's subprime black hole?

Saturday, May 14, 2022

Tangling With the Network

Molecular biology needs better modeling.

Molecular biologists think in cartoons. It takes a great deal of work to establish the simplest points, like that two identifiable proteins interact with each other, or that one phosphorylates the other, which has some sort of activating effect. So biologists have been satisfied to achieve such critical identifications, and move on to other parts of the network. With 20,000 genes in humans, expressed in hundreds of cell types, regulated states and disease settings, work at this level has plenty of scope to fill years of research.

But the last few decades have brought larger scale experimentation, such as chips that can determine the levels of all proteins or mRNAs in a tissue, or the sequences of all the mRNAs expressed in a cell. And more importantly, the recognition has grown that any scientific field that claims to understand its topic needs to be able to model it, in comprehensive detail. We are not at that point in molecular biology, at all. Our experiments, even those done at large scale and with the latest technology, are in essence qualitative, not quantitative. They are also crudely interventionistic, maybe knocking out a gene entirely to see what happens in response. For a system as densely networked as the eukaryotic cell, it will take a lot more to understand and model it.

One might imagine that this is a highly detailed model of cellular responses to outside stimuli. But it is not. Some of the connections are much less important than others. Some may take hours to have the indicated effect, while others happen within seconds or less. Some labels hide vast sub-systems with their own dynamics. Important items may still be missing, or assumed into the background. Some connections may be contingent on (or even reversed by) other conditions that are not shown. This kind of cartoon is merely a suggestive gloss and far from a usable computational (or true) model of how a biological regulatory system works.


The field of biological modeling has grown communities interested in detailed modeling of metabolic networks, up to whole cells. But these remain niche activities, mostly because of a lack of data. Experiments remain steadfastly qualitative, given the difficulty of performing them at all, and the vagaries of the subjects being interrogated. So we end up with cartoons, which lack not only quantitative detail on the relative levels of each molecule, but also critical dynamics of how each relationship develops in time, whether in a time scale of seconds or milliseconds, as might be possible for phosphorylation cascades (which enable our vision, for example), or a time scale of minutes, hours, or days- the scale of changes in gene expression and longer-term developmental changes in cell fate.

These time and abundance variables are naturally critical to developing dynamic and accurate models of cellular activities. But how to get them? One approach is to work with simple systems- perhaps a bacterial cell rather than a human cell, or a stripped down minimal bacterial cell rather than the E. coli standard, or a modular metabolic sub-network. Many groups have labored for years to nail down all the parameters of such systems, work which remains only partially successful at the organismal scale.

Another approach is to assume that co-expressed genes are yoked together in expression modules, or regulated by the same upstream circuitry. This is one of the earliest forms of analysis for large scale experiments, but it ignores all the complexity of the network being observed, and indeed hardly counts as modeling at all. All the activated genes are lumped together on one side, and all the down-regulated genes on the other side, perhaps filtered by biggest effect. The resulting collections are clustered by some annotation of those genes' functions, thereby helping the user infer what general cell function was being regulated in her experiment / perturbation. This could be regarded perhaps as the first step on a long road from correlation analysis of gene activities to a true modeling analysis that operates with awareness of how individual genes and their products interact throughout a network.
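The lump-and-annotate procedure just described can be sketched in a few lines. This is a minimal version, assuming a hypertgeometric over-representation test of the kind commonly used for Gene Ontology enrichment; the fold-change cutoff, gene names and annotations are hypothetical, and there is no multiple-testing correction, which any real analysis would need.

```python
from collections import defaultdict
from math import comb


def split_by_direction(log2_fold_change, threshold=1.0):
    """Lump genes into up- and down-regulated sets by a fold-change cutoff."""
    up = {g for g, fc in log2_fold_change.items() if fc >= threshold}
    down = {g for g, fc in log2_fold_change.items() if fc <= -threshold}
    return up, down


def hypergeom_tail(k, big_n, big_k, n):
    """P(X >= k) when n genes are drawn from big_n, of which big_k carry the term."""
    hits = sum(comb(big_k, i) * comb(big_n - big_k, n - i)
               for i in range(k, min(big_k, n) + 1))
    return hits / comb(big_n, n)


def enrichment(gene_set, annotations, universe):
    """Map each annotation term seen in gene_set to a hypergeometric p-value."""
    term_members = defaultdict(set)
    for gene in universe:
        for term in annotations.get(gene, ()):
            term_members[term].add(gene)
    results = {}
    for term, members in term_members.items():
        k = len(members & gene_set)
        if k:
            results[term] = hypergeom_tail(k, len(universe), len(members), len(gene_set))
    return dict(sorted(results.items(), key=lambda item: item[1]))


# Hypothetical toy experiment: ten genes, three strongly induced, one repressed.
fold = {f"g{i}": 0.1 for i in range(10)}
fold.update({"g0": 2.0, "g1": 2.5, "g2": 1.8, "g3": -1.5})
annotations = {"g0": {"ribosome"}, "g1": {"ribosome"}, "g2": {"ribosome"},
               "g3": {"stress"}, "g4": {"stress"}}
up, down = split_by_direction(fold)
print(enrichment(up, annotations, set(fold)))
```

Note how little of the network survives this treatment: each gene is reduced to a direction and a label, which is exactly the limitation the paragraph above points out.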

Another approach is to resort to a lot of fudge factors, while attempting to make a detailed model of the cell / components. Assume a stable network, and fill in all the values that could get you there, given the initial cartoon version of molecule interactions. Simple models thus become heuristic tools to hunt for missing factors that affect the system, which are then progressively filled in, hopefully by doing new experiments. Such factors could be new components, or could be unsuspected dynamics or unknown parameters of those already known. This is, incidentally, of intense interest to drug makers, whose drugs are intended to tweak just the right part of the system in order to send it to a new state- say, from cancerous back to normal, well-behaved quiescence.

A recent paper offered a version of this approach, modular response analysis (MRA). The authors use perturbation data from other labs, such as the inhibition of 1000 different genes in separately assayed cells, combined with a tentative model of the components of the network, and then deploy mathematical techniques to infer / model the dynamics of how that cellular system works in the normal case. What is observed in either case- the perturbed version, or the wild-type version- is typically a system (cell) at steady state, especially if the perturbation is something like knocking out a gene or stably expressing an inhibitor of its mRNA message. Thus, figuring out the (hidden) dynamic in between- how one stable state gets to another one after a discrete change in one or more components- is the object of this quest. Molecular biologists and geneticists have been doing this kind of thing off-the-cuff forever (with mutations, for instance, or drugs). But now we have technologies (like siRNA silencing) to do this at large scale, altering many components at will and reading off the results.

This paper extends one of the relevant mathematical methods (modular response analysis, MRA) to this large scale, and finds that, with a bit of extra data and some simplifications, it is competitive with other methods (mutual information) in creating dynamic models of cellular activities, at the scale of a thousand components, which is apparently unprecedented. At the heart of MRA are, as its name implies, modules, which break down the problem into manageable portions and allow variable amounts of detail / resolution. For their interaction model, they use a database of protein interactions, which is a reasonably comprehensive, though simplistic, place to start.
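For readers curious about what MRA computes, here is the classic small-scale relation as it is usually stated in the MRA literature, not the authors' large-scale, modularized extension. Given a global response matrix R, whose column j holds the steady-state changes in every module after perturbing module j alone, the local response matrix r (entry r_ij: how strongly module i responds directly to module j) is r = -[dg(R^-1)]^-1 R^-1, where dg keeps only the diagonal. The example network is hypothetical, and a small pure-Python inverse stands in for a numerical library.

```python
def invert(m):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(m)
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def local_response(global_response):
    """MRA: r = -[dg(R^-1)]^-1 R^-1, so every diagonal entry of r is -1."""
    inv = invert(global_response)
    n = len(inv)
    return [[-inv[i][j] / inv[i][i] for j in range(n)] for i in range(n)]


# Hypothetical two-module network: module 2 activates module 1 (r_12 = 0.5),
# and the two perturbations have arbitrary strengths 2 and 3.
global_response = [[2.0, 1.5],
                   [0.0, 3.0]]
print(local_response(global_response))  # approximately [[-1, 0.5], [0, -1]]
```

The perturbation strengths cancel out of the result, which is what lets MRA work from knockdowns of unknown efficiency; the price is that every module must be perturbed separately, which is why scaling to a thousand components is a real achievement.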

What they find is that they can assemble an effective system that handles both real and simulated data, creating quantitative networks from their inputs of gene expression changes upon inhibition of large numbers of individual components, plus a basic database of protein relationships. And they can do so at reasonable scale, though that is dependent on the ability to modularize the interaction network, which is dangerous, as it may ignore important interactions. As a state of the art molecular biology inference system, it is hardly at the point of whole cell modeling, but is definitely a few steps ahead of the cartoons we typically work with.

The authors offer this as one result of their labors. Grey nodes are proteins, colored lines (edges) are activating or inhibiting interactions. Compared to the drawing above, it is decidedly more quantitative, with strengths of interactions shown. But timing remains a mystery, as do many other details, such as the mechanisms of the interactions.


  • Fiscal contraction + interest rate increase + trade deficit = recession.
  • The lies come back to roost.
  • Status of carbon removal.
  • A few notes on stuttering.
  • A pious person, on shades of abortion.
  • Discussion on the rise of China.

Saturday, April 2, 2022

E. O. Wilson, Atheist

Notes on the controversies of E. O. Wilson.

E. O. Wilson was one of our leading biologists and intellectuals, combining a scholarly career of love for the natural world (particularly ants) with a cultural voice of concern about what we as a species are doing to it. He was also a dedicated atheist, perched in his ivory tower at Harvard and tilting at various professional and cultural windmills. I feature below a long quote from one of his several magnum opuses, Sociobiology (1975). This was putatively a textbook by which he wanted to establish a new field within biology- the study of social structures and evolution. This was a time when molecular biology was ascendant, in his department and in biology broadly, and he wanted to push back and assert that truly important and relevant science was waiting to be done at higher levels of biology, indeed the highest level- that of whole societies. It is a vast tome, where he attempted to synthesize everything known in the field. But it met with significant resistance across the board, even though most of its propositions are now taken as a matter of course ... that our social instincts and structures are heavily biological, and have evolved just as our physical features have.

Saturday, January 1, 2022

Eugenics is All the Rage

Animal breeders have no qualms directing intensive systems of artificial selection.

Eugenics is defined with reference to humans, as any consideration or implementation of artificial selection. There is little doubt that it would be effective, but there is some disagreement about what an "improvement" would represent. We are not cattle to be bred to specification, but organisms with dignity and freedom- specifically freedom from meddling by others in our reproduction. Wild animals have this freedom as well, by default. But domestic animals- that is a different story. For all our "humane" societies and pampering of some, our treatment of others is distinctly undignified. And that includes their breeding. 

Across the domestic animals, from racing horses and show dogs to dairy cows and chickens, breeding these days is carried on at unprecedented intensity, with the most advanced scientific and statistical techniques. For farm animals, this has led to inbreeding and alarming malformations, such as chickens that can't walk, and cows with chronic udder infections. For dogs, the creation of fundamentally malformed breeds (short snouts, short legs) also leads to chronic suffering, as does a lack of care in breeding for temperamental health.


These animals have serious problems, of a genetic nature.


Animal breeding has progressed through three major stages. First is the traditional approach, using hunches and personal judgements- using the best animals, and perhaps cross-breeding with animals from other farms to retain diversity, if any directed breeding is done at all. With a relaxed approach, this led to generally good results, establishing the great dog breeds and other livestock, where hardiness and health were always prominent values. But in pigeon, cat, dog, and other casual breeding since Victorian times, amateur breeding like this can also go rather astray. 

In modern livestock breeding, this was superseded by the use of Estimated Breeding Value, or EBV, which is a systematized way to account for the genetic, rather than phenotypic, trait quality in animals, by accounting for their relatives, as far as they have been measured, and also by accounting for uncertainties around heritability and systematic and environmental effects on the trait of interest. This concept puts breeding on a far more scientific basis, with quantification of traits, and of pedigrees. One result is that the breeding value can be estimated for animals who do not even have the trait, such as male dairy cattle. Another has been that animal breeding has been even more relentlessly driven to meet commercial and consumer objectives, even ones that shift over time as tastes change.
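To make the idea concrete, here are the two simplest textbook EBV formulas. They are selection-index regressions, not the BLUP animal models that real national evaluations use: an animal's own record regressed by the heritability h², and a sire's value inferred from the average of n half-sib daughters, which is how a bull gets a dairy EBV without ever giving milk. The herd figures and heritability are hypothetical, and the formulas assume unrelated dams, one record per daughter, and no systematic environmental effects.

```python
def ebv_own_record(phenotype, herd_mean, h2):
    """EBV from a single record on the animal itself: h^2 * (P - mean)."""
    return h2 * (phenotype - herd_mean)


def ebv_from_progeny(progeny_mean, herd_mean, h2, n):
    """EBV of a sire from the mean record of n half-sib daughters.

    The regression b = 2n h^2 / (4 + (n - 1) h^2) rises toward 2 as n grows,
    because each daughter inherits a random half of her sire's genes.
    """
    b = 2 * n * h2 / (4 + (n - 1) * h2)
    return b * (progeny_mean - herd_mean)


# Hypothetical bull: 100 daughters averaging 8300 kg of milk in a herd averaging
# 8000 kg, with an assumed heritability of 0.3.
print(round(ebv_from_progeny(8300, 8000, h2=0.3, n=100)))  # 534
```

With few daughters the estimate is shrunk heavily toward the herd mean; with many, it approaches twice the daughters' advantage, reflecting the half of the genome each daughter received from him.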

Naturally, the EBV method has now been supplemented by DNA-based evaluations in more recent times. The ability to "see" into the genome by sequencing some or all of it, thereby establishing a landmark map based on variants distributed throughout, allows the traits (if linked to such landmarks) to be tracked in all individuals, regardless of phenotype, and even in individual gametes and fetuses. This dramatically reduces the lottery that otherwise is genetics. However, its value is significantly bounded by the fact that most interesting and desirable traits are usually not genetically simple (like, say, eye color), but are complex, influenced in very small amounts by many different loci / genes. 
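A minimal sketch of the marker-based idea, under loudly simplified assumptions: each marker's allele count (0, 1 or 2) is multiplied by a per-marker effect and summed, as in a genomic breeding value or polygenic score. The effects are random, hypothetical numbers standing in for effects that would be estimated from a measured reference population. With a thousand small effects, no single locus accounts for more than a sliver of the genetic variance, illustrating why complex traits are so hard to track one gene at a time.

```python
import random


def genomic_value(genotype, effects):
    """Sum of allele counts (0, 1 or 2) times per-marker effects."""
    return sum(x * b for x, b in zip(genotype, effects))


def largest_locus_share(effects, p=0.5):
    """Share of additive variance due to the single biggest locus.

    Assumes every locus has allele frequency p and loci are independent,
    so locus j contributes 2 p (1 - p) b_j^2 to the variance.
    """
    contributions = [2 * p * (1 - p) * b * b for b in effects]
    return max(contributions) / sum(contributions)


rng = random.Random(1)
n_loci = 1000
effects = [rng.gauss(0.0, 0.01) for _ in range(n_loci)]  # hypothetical tiny effects
animal = [(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(n_loci)]
print(genomic_value(animal, effects), largest_locus_share(effects))
```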

This is a frontier for animal rights and humane policy development, that animals not only should be treated well, but bred well. In livestock breeding, European countries have some relatively aspirational standards and laws; the US lacks even that. The "standards" used by such organizations as the American Kennel Club are worse than nothing, as they drive breeding for looks alone, and welcome the most obscure and unhealthy breeds, regardless of grave malformations, temperamental disasters, and inbreeding. While health of the animal needs to be paramount, other issues such as the ability of animals to live without special care and infrastructure, and genetic diversity, also need to be addressed, if we are going to be serious stewards of animals in our care.


Saturday, November 13, 2021

Group Selection

Every new form of biological organization becomes a new unit of natural selection

Group selection has been a controversial topic in evolutionary studies. Indeed, the whole matter of where selection operates has been a confusing mess. Richard Dawkins battled his way to fame by arguing that genes were the target of selection, and that we as animal bodies were merely automata driven to unwittingly propagate them by various unconscious means. When considering the unit of selection, one could go even to the individual nucleotide, which is ultimately what is extinguished or propagated by the action of mutation and selection, plying its tiny oar towards the survival of its gene, its genome, its cell, its organism, its society, ... its blessed plot, this earth, this realm, this England!

Traditionally, the individual organism has been viewed as the main unit of selection. But can groups, when they form societies like bee hives or human tribes, be objects of selection as well? A paper reviewing the mathematics of evolution and selection makes the crucial distinction between the mechanism underlying heritability of traits, which might be a gene or nucleotide, and the unit of selection, which is the level of biological organization that exhibits traits upon which natural selection acts. The color of our eyes may be a cellular and organ-level trait, based on genes and nucleotides, but the unit of selection remains the individual, since that is where selection- via mate choice, disease, and whatever other ramifications eye color may have- acts directly to promote or inhibit reproduction. Likewise, social traits such as altruism, cooperation, detection and policing of cheaters, etc. may in large degree be relevant and selected at the individual level, but at least some of their power and selectivity comes in the competition between groups, i.e. group selection.

It should be clear that selection happens at all sorts of levels, indeed at every level where a new form of biological organization emerges. The "unit of selection" is not singular, but manifold, and is defined, not absolutely, but by the level and properties of the trait being considered. We who inhabit multicellular bodies have pretty definitively ended competition / natural selection among the cells that compose us- those cells are not individual units of selection, since they do not persist after we are gone (even in the case of cancer where their replication has gone haywire). The closest might be competition among sperm cells, which evidently do compete in their final voyage, though not to the extent of taking up arms against each other. Thus generally, our genes are only indirectly targets of selection, in that they generate traits that manifest on the cellular, individual, and indeed group level, with consequent selection at those levels and differential reproduction that changes gene frequencies in the future.

This is called multi-level selection. The socio-biologists got into hot water back in the 1970s by asserting that group traits are at least in part biologically based, as are individual psychological traits, and thus that groups must act as units of selection. This did not sit well with the politically correct of the day, who wanted as a matter of principle to believe that humans (and especially subgroups such as ethnicities and races) are all created equal, and that any talk of heritability of traits such as intelligence, aggressiveness, altruism, etc. was, if not wrong, at least socially divisive and certainly damaging to a proper communist / constructivist view of the malleability of the human condition. While constructivist views of our social psychology, relations and conflicts certainly have significant truth, they can be taken too far, such as the arch-feminist idea that male-ness is purely a social construction, and that some counter-programming is all it would take to make a utopian, de-gendered world.

I'll scratch your back ...

But that is all in the past, and not only are social and group traits increasingly recognized as biological and to some degree heritable, but our evolutionary history is unthinkable without a lot of specific socially relevant traits being encoded, evolved, and put to the test in group-group competition, whether via direct competition or just relative success of independent groups without direct interaction. A set of papers made a review of this field and developed a general mathematical treatment of multi-level selection (MLS), postulating that any biological entity or level of organization can be a unit of selection- when traits can be defined pertaining to that level. This is especially relevant to emergent traits that can not be defined at lower levels of organization. 

Alcoholism, for instance, is hard to define at the cellular or single gene level, but can be easily defined at the organismal level. So it is selected at that level, where individuals suffer and die due to its effects and impair the lives of others along the way. While it necessarily has genetic components and heritability, and those genes can be thought of as being selected for or against, they often drag along many other genes, and have complex relations with other genes in the trait's expression, leaving the definition of the trait and its interaction with natural selection at the individual level. The unit of selection is a separate concept from the genetic and developmental processes that generate the trait. In alcoholism, the adult is the unit of selection, constituting a collection of characteristics that develop out of genes and other sources, whose frequencies may change based on that selection. 

"The genetical theory of MLS ... describes the action of group selection in terms of change in a genetical character. As discussed in the previous section, a genetical score may be assigned to any biological entity that contains genes – such as an entire population – and change in this genetical score can be computed, irrespective of how that population is subdivided into groups and individuals, or the biological level of organization at which the corresponding phenotype actually manifests. ... the theory of natural selection is ‘genetical’: this adjective pertains to the medium by which characters are inherited, rather than to the unit of selection itself."

 

It may be that all this is just a matter of convenience and book-keeping, as traits are defined (by us) on a macro basis. A gene's-eye view of the situation would focus on its own gains and losses in the rough and tumble of life. But in that case, we could not speak of alcoholism as a trait, but would have to speak of the gene's-eye view of all the pressures it finds itself under, which would range widely over molecular and cellular territories and beyond, and violate our basic conceptions of a trait that is under natural selection. That is why a trait is defined at the particular level of organization where that characteristic becomes manifest, rather than at the gene level. There is no gene for alcoholism, though the trait is composed of / developed out of many heritable elements.

Imagine, in contrast, that alcoholism had no genetic component at all, but was purely random in genetic terms, not even affected by, say, genetic susceptibility to advertising blandishments. Such a trait would be subject to natural selection (i.e. death and other forms of debility). But all that selection on the trait would have no effect on the next and future generations, due to its lack of heritability. It would have no genetic implications, by definition. So the unit of selection and trait being selected are separate issues from the genetic elements that might underpin it, particularly the degree or lack thereof of its genetic basis. 
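The standard quantitative-genetics shorthand for this point is the breeder's equation, R = h²S: the change in the next generation's mean (R) equals the heritability (h²) times the selection differential (S), the gap between the parents that actually reproduced and the whole population. It is an approximation that assumes additive genetic effects. The numbers below are hypothetical; the point is that the same selection yields no response at all when h² is zero.

```python
from statistics import fmean


def selection_differential(phenotypes, survived):
    """S: mean phenotype of the parents that reproduced minus the population mean."""
    parents = [p for p, s in zip(phenotypes, survived) if s]
    return fmean(parents) - fmean(phenotypes)


def response_to_selection(h2, s):
    """Breeder's equation R = h^2 * S (change in mean of the next generation)."""
    return h2 * s


# Hypothetical: only the two highest-scoring individuals reproduce.
phenotypes = [1.0, 2.0, 3.0, 4.0]
survived = [False, False, True, True]
s = selection_differential(phenotypes, survived)  # 3.5 - 2.5 = 1.0
print(response_to_selection(0.4, s), response_to_selection(0.0, s))
```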

While we are discussing this particular trait, it might be worth noting that in group terms, affinity to alcohol might be considered a positive trait, contributing to group bonding through the ages. Thus alcoholism might be a matter of stabilizing selection, trading off between its individual harms and its group benefits, particularly in the prehistoric setting where alcohol concentrations tended to be low, social controls strong, and alcoholism proper quite hard to develop.

This discussion, based on the paper series, is all based on the Price equation, which apparently underlies the field and is an extremely general statement / definition of natural selection. It contains basically two terms, which provide for a separation between the aspects of biological change derived from natural selection, and all the rest of the sources of change- drift, environmental change, etc. The selection portion it expresses as co-variation between traits in two populations (such as in successive generations) and the success of individuals (or other units of selection) carrying that trait. The whole equation rests on four key concepts, none of which are explicitly genetic:

  • The unit of selection- the biological organization that exhibits the trait, whether an individual, group, etc.
  • The arena of selection- the population of units within which selection and evolution take place.
  • The character under selection- the trait at issue, at whatever appropriate level of organization.
  • The target of selection- the quantity (fitness) by which the character / trait is either good or bad, thus being selected.
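In its usual form the Price equation reads: mean fitness times the change in mean trait equals Cov(w, z) + E(w Δz), where w is each unit's fitness, z its trait value, and Δz the difference between a unit's offspring and itself. The covariance is the selection term; the expectation collects transmission effects such as mutation or environmental change. This minimal sketch computes both terms for a hypothetical four-parent population and checks that they add up to the directly measured change.

```python
from statistics import fmean


def price_terms(z, w, z_offspring):
    """Split the change in mean trait into Price's two terms.

    z[i]           trait of parent i
    w[i]           fitness of parent i (number of offspring)
    z_offspring[i] mean trait of parent i's offspring (ignored when w[i] == 0)
    Returns (selection, transmission, delta); selection + transmission == delta.
    """
    w_bar, z_bar = fmean(w), fmean(z)
    cov_wz = fmean(wi * zi for wi, zi in zip(w, z)) - w_bar * z_bar
    e_w_dz = fmean(wi * (zo - zi) for wi, zi, zo in zip(w, z, z_offspring))
    # Mean trait of the next generation, weighting each parent by its offspring count.
    z_bar_next = sum(wi * zo for wi, zo in zip(w, z_offspring)) / sum(w)
    return cov_wz / w_bar, e_w_dz / w_bar, z_bar_next - z_bar


# Hypothetical population: carriers of the trait (z = 1) leave more offspring,
# but one of them transmits it imperfectly.
selection, transmission, delta = price_terms([0, 1, 1, 0], [1, 3, 2, 0], [0, 1, 0.5, 0])
print(selection, transmission, delta)  # roughly 0.33, -0.17 and 0.17
```

Nothing in the calculation refers to genes: the "unit" here could equally be an individual or a group, which is the generality the paper series exploits.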

As far as the unit of selection and the trait that pertains to that unit, any level will do, as long as it corresponds with a unit, or trait, that is definable to us and selectable in nature. 

"Between-group selection is directly analogous to standard, individual-level natural selection, but with the group taking on the role of the unit of selection, the group's phenotype acting as the character under selection and group fitness being the target of selection."    

"... by framing selection in its full generality from the outset, Price's equation reveals that kin and group selection are components of natural selection, and we obtain their dynamics by drawing them out of—rather than adding them into—the basic form of Price's equation. Moreover, by showing how the kin selection and group selection viewpoints both emerge from the mathematics of natural selection, Price's equation shows that these are not competing hypotheses for the evolution of social behaviour but simply different ways of conceptualizing the very same evolutionary process—and that a fierce, decades-long debate had been largely over nothing."


"For group selection to overcome selection within groups, less than one successfully reproducing migrant may be exchanged per two populations per population lifetime. ... Indeed, if groups are long lived, successful migrants must be very rare, and within-group inbreeding intense, for group selection to prevail over equally intense within-group selection."


Each level of selection can operate on many different traits, however, some of which may not directly conflict. So leaving aside the direct competition between individual and group interests, there is a rich field of action for group selection. This observation of the great sensitivity of group benefits to the rate of migration, especially for traits that conflict between individual and group benefits, gives us a clue about the origins of tribalism, which makes a practice of accentuating infinitesimal differences (or entirely imaginary ones) and using them to justify xenophobia, war, and genocide. It is a key legacy of evolution, particularly group evolution, and one that we struggle to overcome.
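The tug-of-war between group and individual interests can be seen by splitting the Price selection term into a between-group and a within-group part. For equal-sized groups and population covariances, the covariance of fitness and trait over all individuals equals the covariance across group means plus the average within-group covariance. The fitness model here is a hypothetical public-goods game: each individual's fitness is 1 + b × (fraction of altruists in its group) - c × (its own altruism), with made-up values b = 1 and c = 0.5.

```python
from statistics import fmean


def pop_cov(xs, ys):
    """Population covariance (divides by n, as in the Price equation)."""
    mx, my = fmean(xs), fmean(ys)
    return fmean((x - mx) * (y - my) for x, y in zip(xs, ys))


def multilevel_selection(groups):
    """groups: list of (fitnesses, traits) pairs; equal group sizes assumed.

    Returns (between, within, total) with between + within == total.
    """
    group_w = [fmean(w) for w, _ in groups]
    group_z = [fmean(z) for _, z in groups]
    between = pop_cov(group_w, group_z)
    within = fmean(pop_cov(w, z) for w, z in groups)
    all_w = [x for w, _ in groups for x in w]
    all_z = [x for _, z in groups for x in z]
    return between, within, pop_cov(all_w, all_z)


def altruism_groups(trait_lists, b=1.0, c=0.5):
    """Hypothetical fitness: altruists (z = 1) pay c, everyone gains b * group fraction."""
    groups = []
    for z in trait_lists:
        frac = fmean(z)
        groups.append(([1.0 + b * frac - c * zi for zi in z], z))
    return groups


between, within, total = multilevel_selection(altruism_groups([[1, 1, 1, 0], [1, 0, 0, 0]]))
print(between, within, total)
```

In this toy case altruist-rich groups do better (positive between-group term), but within every group the altruists lose out, and the within-group term wins overall. Tilting the balance toward the group level requires larger benefits or groups kept distinct, which is where limited migration comes in.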

So group selection is perfectly consistent with evolutionary theory (though some rather testy controversies remain). Does that mean that racism is OK? Do group differences justify tribalism and oppression? Well, our instinct for tribalism is certainly testament to a long evolutionary history of group selection, with its tireless focus on tiny, or even nonexistent, differences. The fact is that among humans, group differences are always swamped by within-group variation. We also do not generally discriminate so harshly against the differently abled and neuro-diverse within tribes as we do against those we perceive outside them. So the practical and moral basis of discrimination and oppression is very poorly founded. True group selection is also virtually powerless against high migration rates, which we have throughout the modern world in any case. Thus the tribal instinct, which is now so flexibly deployed for such nebulous groupings as nation states or sports teams, is totally out of its natural element, were we even inclined to mount some new eugenic project of any nature, whether individual or group.


Saturday, October 30, 2021

Genetics and Non-Genetics of Temperament

Some fish are shy, some honeybees are outgoing. What makes individuals out of a uniform genetic background?

Do flies have personalities? Apparently so. Drosophila have a long and storied history as perhaps the greatest model organism for genetic research. They have brains, intricate development, complex bodies and behaviors, but also rapid generation time, relatively easy handling, and mass rearing. A new paper describes a quest to define their personalities- behavioral traits that vary despite a uniform genetic background. Personality is a trait that may be genetically influenced, but may just as well have environmental or sporadic causes (sporadic meaning not determined by any outside factor). Importantly, this kind of trait tends to recur in a population, indicating that while it may not be determined, it follows certain canalized pathways in development, which might themselves be amenable to genetic investigation. Human personality studies have a long history, with various systems trying to make sense of the typical forms and range of variation.

The researchers did a massive screen of uniformly inbred flies for personality variations. Computerization and automation have revolutionized the animal screening field, as they have so many others, so now flies can be individually put through a battery of tests with minimal human effort, measuring their individual responses to light, maze choices, spontaneous activity, circadian preferences, sensitivity to odors, etc. These tests were compiled for hundreds of genetically identical flies from birth to death, followed by sequencing of their mRNA to see which genes were active. Another batch of more genetically diverse wild-type flies was tested as well, to compare what variable genetic influences might be afoot.

Firstly, the differences they observed in these flies were stable over time. They represent true "types" of behavior, despite the lack of genetic input. Secondly, the differences are correlated across tests: flies that were more active in one test tended to be more active in other tests as well. So the variations in behavior seem to flow from deep-seated categorical types that follow typical patterns within fly development. Which tests should yield correlated scores, and which others should be more orthogonal, is a little hard to figure out and a matter of subjective taste, so these conclusions about widespread correlations among disparate behaviors reflecting personality types are based largely on these researchers knowing their flies on a pretty intimate basis.

A matrix of videos of flies just strolling along, captured by these researchers. Not all flies walk the same way.

For example, they emphasize correlations where they would not have expected them- between, say, light sensitivity and overall activity- and non-correlations where they would have expected correlation- say, between activity measures of maze walking and free activity. The main observation is that there was a great deal of variation among these identical-twin flies. So, just as identical twins can have different personalities, sensitivities, and outlooks, so can flies.

Is there anything one can say about this genetically? The behavioral variations were themselves not genetically based, but rather due to alternate paths taken down developmental pathways, via either sporadic or experience-based differences. The flies were raised in the same homes, so to speak, but as we know from humans, however similar things may seem on the outside, the individual subjective experience can be very different. At any rate, the developmental pathways leading to the variations are themselves genetically determined, so this exercise was really about learning about how they work, and what range of variation they support/allow.

This analysis of course boils down to how informative the behavioral traits are that the researchers are testing. And obviously, they were not very informative- how does one connect a propensity to turn left when going down a maze with some developmental process? These researchers threw a bunch of statistics at their data, including from the gene expression analysis performed in the sacrificed flies after their mortal trials were over. For instance, among known molecular pathways, metabolic pathway gene expression correlated with activity assays of behavior- not a big surprise. Expression of phototransduction-related genes also correlated with response to light. The biggest correlation was between oxidative phosphorylation gene expression (i.e. mitochondrial activity) and their various activity measurements, which were, after all, the essence of all their assays. In humans, some people are just high-energy, which informs everything they do.

"We found that in all cases, behavioral variation has high dimensionality, that is, many independent axes of variation."

In the end, they conclude that, yes, flies of identical genetic background grow up to have distinct behavioral profiles, or one can say, personalities. Many of these behavioral profiles or traits are independent of each other, indicating several, or even numerous, axes of development where such differences can arise. The researchers estimate 27 dimensions of trait variability, in fact, just from this smattering of tests. But others vary together, forming a sort of personality type, though the choice of assays was obviously very influential in these cross-correlations. These results give a very rough start to the project of figuring out where animal development is less than fully determined, and can thus give rise to the non-genetic variation that provides rich fodder for environmental and social adaptation / specialization. While genes are not directly responsible for this variation, they are responsible for the available range, and thus set the parameters of possible adaptation.
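The paper's exact dimensionality method is not described here, so what follows is only a generic sketch of one common way to count independent axes of behavioral variation: take a flies-by-assays score matrix, compute the assay correlation matrix, and count how many principal components are needed to explain most (here, an assumed 90%) of the variance. The simulated data, with three hypothetical latent "personality" axes driving 20 hypothetical assays, are invented for illustration, and the sketch assumes NumPy is available.

```python
import numpy as np


def n_dimensions(scores: np.ndarray, var_threshold: float = 0.9) -> int:
    """Principal components needed to explain var_threshold of the variance
    in a (flies x assays) score matrix, with each assay standardized."""
    corr = np.corrcoef(scores, rowvar=False)   # assay-by-assay correlations
    eigvals = np.linalg.eigvalsh(corr)[::-1]   # eigenvalues, largest first
    explained = np.cumsum(eigvals) / eigvals.sum()
    # First index where the cumulative share reaches the threshold, plus one.
    return int(np.searchsorted(explained, var_threshold) + 1)


def simulate_flies(n_flies: int = 500, seed: int = 0) -> np.ndarray:
    """Hypothetical data: 3 latent axes drive 20 assays (7, 7, and 6 per axis)."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n_flies, 3))
    loadings = np.zeros((3, 20))
    loadings[0, :7] = 1.0
    loadings[1, 7:14] = 1.0
    loadings[2, 14:] = 1.0
    noise = 0.1 * rng.normal(size=(n_flies, 20))
    return latent @ loadings + noise


if __name__ == "__main__":
    print("structured data:", n_dimensions(simulate_flies()))
    noise_only = np.random.default_rng(1).normal(size=(500, 20))
    print("pure noise:     ", n_dimensions(noise_only))
```

On the simulated data the count comes out at three, the number of built-in axes; on pure noise, where every assay varies independently, most of the 20 components are needed. Real assay batteries fall somewhere in between, and the count depends heavily on which assays happen to be chosen.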

It is sadly typical that these researchers disposed of about 1/3 of their flies at the outset of the study for being insufficiently active. While they are surely correct that these flies would continue to be less active through the rest of the assays, thus giving less data to their automated tests, they did not ask themselves why some flies might choose to think before they leap - so to speak. Were they genetically defective? The flies were identical save for a handful of single nucleotide variations. If inbreeding was a problem, all the flies would have been equally affected. So it is likely that one of the most significant personality traits was summarily excluded out of raw institutionalized bias against the more introverted fly, conveniently veiled by claims of technical limitations. Hey hey, ho ho!

  • Yes, they have a brain.
  • Technical talk on SARS COV2 evolution, which has been, obviously, rapid and devastating.
  • And a story about its endemic fate as a regular cold virus among us.
  • Manchin isn't a slouch in the corruption department either.
  • We need a lot more electricity.
  • The price of fish.
  • If you think facebook is bad here, it is worse for other countries.
  • I was thinking about oculus. But now, maybe not.
  • A little bit of wonderfulness from the Muppets.

Saturday, October 9, 2021

Alzheimer's: Wnt or Lose

A molecular exploration of the causes of Alzheimer's disease.

What causes Alzheimer's disease remains a bit of a mystery, since there is no simple and single molecular explanation, as there is with, say, Huntington's disease, which is caused by a single gene defect. There is one leading candidate, however, which is the amyloid protein, one of the accumulated molecular signatures of the disease in post-mortem brains. Some genetic forms of Alzheimer's start with defects in the gene that encodes this protein, APP (amyloid precursor protein). And a protease processing system that cleaves out the toxic amyloid beta protein from the much larger original APP protein is also closely involved with Alzheimer risk. So while there are many other genetic risk factors and possible causes relating to the APP and other systems, this seems to be the dominant causal element in Alzheimer's disease.

The naming of this protein is rather backwards, focusing on the pathological roles of defective forms, rather than on what the normal protein does. But we don't really know what that normal function is yet, so we have had little choice. A recent paper described one new function for the normal APP protein, which is as a receptor for a family of proteins called WNT (for wingless integration site, an obscure derivation combining findings from fly and mouse genetics). APP had long been known to interact with WNT functions, and a reduction of WNT signaling is one of the pathologic (and possibly pathogenic) hallmarks of Alzheimer's, but this seems to be the first time it has been tabbed as a direct receptor for WNT.

What is WNT? These proteins track back to the dawn of multicelled animals, where they first appear in order to orchestrate the migration and communication of cells of the blastopore. This is the invagination that performs the transition (gastrulation) from an egg-derived ball of cells to the sheets of what will become the endoderm and mesoderm on the inside, and the ectoderm on the outside. The endoderm becomes the gut and respiratory organs, the mesoderm becomes the skeleton, muscles, blood, heart, and connective tissue, and the ectoderm becomes the skin and nervous system. WNT proteins are the ligands expressed in one set of cells, and their receptors (Frizzled and a few other proteins) are expressed on other cells which are nearby and need to relate for some developmental / migration / identification, or other purpose. One other family, the NOTCH receptors and their cell-surface ligands, has a similar evolutionary history and likewise functions as a core developmental cell-cell signaling and identification system.

Rough structure of the APP protein. The membrane-spanning portion is in teal at the bottom, showing also some key secretase protease cleavage sites, which liberate alpha and beta portions of the protein. The internal segment is at the very bottom, and functions, when cleaved from the rest of the protein, as a nuclear transcription activator. Above are various extracellular domains, including one for "ligand binding", which is thought by at least one research group to bind WNT. The dimerization domain can bind other APP proteins on other cells, and heparin, another binding partner, is a common component of the extracellular environment.

Fast forward a billion years, and WNT family members are deeply involved in many decisions during animal development and afterwards, particularly in the brain, where they control nerve cell branching and synapse formation in adults. WNT, NOTCH, and APP each belong to ligand+receptor systems, where a ligand from one cell or in soluble form binds to a receptor on the surface of another cell, which "receives" the signal and can do a multitude of things in response. The usual receptors for WNT are a family of Frizzled proteins plus a bunch of other helper proteins; NOTCH is itself a receptor, whose ligands include the Jagged proteins; and the APP protein is also a receptor whose ligand has till now been unclear, though it can homodimerize, detecting APP on other cells. APP is a large protein, and one of its responses to signals is to be cleaved in several ways. Its short cell-interior tail can be cleaved (by gamma secretase), upon which that piece travels to the nucleus and with other proteins acts as a transcription regulator, activating, among other genes, its own gene, APP. Another possible cleavage is done by alpha secretase, causing the release of soluble APP alpha (sAPPα), which has pro-survival activities for neurons and protects them against excessive activity (excitotoxicity). Lastly, beta secretase can cleave APP on the way to the toxic amyloid beta (Aβ), which in tiny amounts is also neuroprotective, but in larger amounts is highly toxic to neurons, starting the spiral of death which characterizes the hollowing out of the brain in Alzheimer's disease.

The cleavages by alpha secretase and beta secretase are mutually exclusive- the cleavage sites and products overlap, so cleavage by one prevents cleavage by the other, or destroys its product. And WNT signaling plays an important role in which route is chosen. WNT signals by two methods, called canonical or non-canonical, depending on which receptor and which ligand meet. Canonical signaling is neuroprotective, opposed to Alzheimer development, and leads to alpha secretase cleavage. Non-canonical signaling tends to the opposite, leading to internalization of APP from the surface, and beta secretase cleavage, which needs the acidic conditions found in the internal endosomes where APP ends up. So the balance of WNT "tone" is critical, and is part of the miscellaneous other risk factors that make up the background for Alzheimer's disease. Additionally, cleavage by gamma secretase is needed following cleavage by beta secretase in order to make the final forms of APP beta. The catalytic subunit of gamma secretase is presenilin-1, encoded by PSEN1, mutations in which are the most common cause of inherited, early-onset Alzheimer's disease. Yet these mutations have no clear relation with the activity of the resulting gamma secretase or the accumulation of particular APP cleaved forms, so this area of causality research remains open and active.

But getting back to the WNT story, if APP is itself a WNT receptor, then that reinforces the centrality of WNT signaling in this syndrome. Indeed, attempts to treat Alzheimer's by reducing the toxic amyloid (APP beta) build-up in various ways have not been successful, so researchers have been looking for causal factors antecedent to that stage. One clue is that blocking a key WNT inhibitor, DKK (for Dickkopf, a name from developmental genetics, which has had some prominent German practitioners), has been an effective experimental therapy in mice with a model form of Alzheimer's. DKK is an inhibitor of the canonical WNT pathway (via the LRP6 co-receptor of Frizzled), shunting it towards more non-canonical signaling. This balance, or "tone" of WNT signaling seems to have broad effects in promoting neurite outgrowth and synapse formation, or the reverse. Once this balance is lost, APP beta induces the production of more DKK, which starts a non-virtuous feedback cycle that may form the core of Alzheimer's pathology. This cycle could be started by numerous genetic defects and influenced by other environmental risk factors, leading to the confusing nature of the syndrome (no pun intended!). And of course the cycle starts long before symptoms are apparent and even longer before autopsy can verify what happened, so getting to the bottom of this story has been hugely frustrating.
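The idea of a self-reinforcing cycle can be sketched as a toy model with one lumped variable x standing for amyloid beta (and, through it, DKK). A sigmoidal feedback term represents the hypothesized loop in which more amyloid beta induces more DKK, weakening canonical WNT signaling and shifting APP toward beta secretase cleavage, which in turn makes more amyloid beta. Every function name and parameter value here is hypothetical, chosen only to show qualitative behavior; nothing is fitted to data.

```python
def dx_dt(x: float, basal: float = 0.05, gain: float = 1.0, half_max: float = 1.0,
          hill: int = 4, clearance: float = 0.5) -> float:
    """Rate of change of a lumped amyloid-beta/DKK level x (toy units)."""
    # Sigmoidal self-reinforcement standing in for the hypothesized loop:
    # amyloid beta -> DKK -> less canonical WNT -> more beta cleavage.
    feedback = gain * x**hill / (half_max**hill + x**hill)
    # Baseline production plus feedback, minus first-order clearance.
    return basal + feedback - clearance * x


def settle(x0: float, t_end: float = 100.0, dt: float = 0.01) -> float:
    """Integrate dx/dt with forward Euler steps and return the final level."""
    x = x0
    for _ in range(round(t_end / dt)):
        x += dt * dx_dt(x)
    return x


if __name__ == "__main__":
    for start in (0.5, 1.2):
        print(f"start {start} -> settles near {settle(start):.2f}")
```

With these made-up parameters the system has two stable states, near 0.1 and near 2, separated by a threshold around 0.9: starting at 0.5 it settles back to the low state, while starting at 1.2 it runs away to the high one. Bistability of this kind is one way a cycle could be tipped by many different genetic or environmental nudges and then persist on its own.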


  • Even Forbes is covering these molecular details these days.
  • A new low for the US- as a sleazy tax haven.
  • No hypocrisy at the Bible museum!
  • Senator from coal is now in control.
  • Facebook has merely learned from its colleagues at FOX- the Sith network.
  • But does add its own wrinkles.
  • Bill Mitchell on the Australian central bank accounts.

Saturday, July 31, 2021

RAD51 and the DNA Hokey-Pokey

DNA repair and recombination rely on homology search between separate DNA molecules, one of which is double-stranded. How is that done?

BRCA2 is one of the more significant cancer-causing genes, when mutated. It is a huge protein of 3,418 amino acids, with lots of interactions, and functions that are not, even at this late date, very well understood. Like many eukaryotic proteins, it does a lot of facilitation and organization of other proteins, roles which have clearly snowballed over evolutionary time. But its core function seems to be to bind right at the site of DNA breaks, and load the recombination protein RAD51 onto the ragged single stranded end. RAD51 then coats the remaining single stranded DNA and does the important work of helping it to find matching DNA elsewhere in the nucleus, which can then be copied to properly repair the break.

It is clear that DNA repair is a critical and highly regulated process, thus the continuing elaboration of proteins like BRCA2 which have managerial roles. But RAD51 has the more fascinating structural role to play. How does it enable a job that seems impossible- to search efficiently through a whole genome of 3 billion basepairs, crammed in a crowded and jostling nucleus, and wound into double-stranded form on nucleosomes and other chromosomal proteins, to find the exact partner with which to pair and perform the dance of filling in the missing bit of DNA?

RAD51 is, unlike BRCA2, highly conserved, from bacteria to humans. Due to the different genetic methods used to find it, it is named RecA in bacteria (for a specifically recombination-oriented screen), but is called RAD51 in eukaryotes, following a screen done in yeast cells for all sorts of mutants sensitive to high-energy radiation. Work over the last couple of decades has clarified the structure of RecA/RAD51 and thus how it functions.

Schematic of a DNA break, after processing, searching and finding a homolog to complete the repair. Not mentioned in this post, but the two ends need to be held in a coordinated way to facilitate repair across the break, even while the single stranded ends engage in a nucleus-wide homology search.

RAD51/RecA coating DNA, in scanning electron microscopy. Note how linear and stiff it is. Comparison is with similar DNA coated with another protein, single-strand binding protein, which imposes much less structure.

As mentioned above, RAD51 coats the single stranded end left after a DNA break has been detected and processed / cleaned up by the initial enzymes, and after BRCA2 binds to the recessed junction where the single strand starts. RAD51 forms a stiff and bulky filament that holds the DNA in a stretched conformation; the coated DNA is a thousand times stiffer than bare single stranded DNA, and 20 times stiffer than double stranded DNA. Interestingly, the single stranded DNA is held deep within the RAD51 filament, quite hard to see from the outside. Only the bases peep out, in triplet sets, amongst the protein structure that holds it so tightly. RAD51 is an ATPase, using the energy of ATP to polymerize and construct the filament, and also to deconstruct it, but not for the searching operations.

Structure of a RAD51/RecA filament- macro above, and micro below. The single stranded DNA whose homolog is being sought is in orange, tucked deep within the protein filament. In closeup, a slight opening of the incoming double stranded DNA (blue) allows its bases to sample a little bit of the target. The pinkish blobs are positively charged lysines / arginines, ready to mate with the negatively charged incoming DNA backbone.

So much for the single strand doing the homology search. What about the double stranded DNA being searched against? The RAD51 filament makes provision for that as well, binding it lightly (in the proper directional orientation) and additionally having local splaying interactions that encourage its strands to separate slightly, binding the non-searching single strand, and allowing the searching strand to pair with the triplets peeking out from the core RAD51 filament. At this atomic scale, there is a lot of brownian motion / jostling- the DNA does breathe a bit naturally- so this is not very hard to do in a rapid way. But RAD51 obviously facilitates this in an optimized way.

Another structural view of the core sampling interaction, emphasizing the DNA strands. In brown is the target single strand DNA. In green is the slightly opened strand from the incoming double stranded DNA doing the sampling of one target triplet (with its single strand complement in red held off a little to the side). Note how the target DNA is held in very stretched form, with triplets of bases separated by slight gaps, which are RAD51 protein residues.

The binding of the invading double strand DNA is then very heavily dependent on how well it pairs with the single strand triplets. Pairing with three exposed bases is not a big deal. But pairing with eight consecutive bases stabilizes the match, and pairing with 26 or more seals the deal to be a long-lived match, which can induce de-polymerization of RAD51 and the arrival of repair polymerases. It is clear that RAD51 coordinates a complex dance of on-off sampling of nearby double stranded DNAs, including non-specific capture of local DNA, detailed sampling by encouraging strand opening, as well as linear back and forth shifting, allowing some linear scanning as well. These diffusion mechanisms somehow add up to a thorough search of the nucleus for the right partner.
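The length thresholds in that paragraph can be captured in a deliberately crude sketch: align a probe against a candidate sequence base for base, find the longest run of consecutive matches, and classify it using the 8- and 26-base cutoffs quoted above. Real RAD51 sampling proceeds in triplet steps, with energetics and filament dynamics that this ignores; the function names and outcome labels are illustrative only. Comparing the probe with the homologous strand of the target, rather than computing complements, is just a convenience, since identity with one strand means complementarity with the other.

```python
def longest_identical_run(probe: str, target: str) -> int:
    """Longest stretch of consecutive matching bases when probe and target
    are aligned base-for-base (same strand sense, no gaps)."""
    best = run = 0
    for a, b in zip(probe, target):
        run = run + 1 if a == b else 0   # a mismatch resets the run
        best = max(best, run)
    return best


def pairing_outcome(run_length: int) -> str:
    """Map a contiguous match length onto the thresholds quoted in the text."""
    if run_length >= 26:
        return "long-lived match"
    if run_length >= 8:
        return "stabilized"
    return "transient"


if __name__ == "__main__":
    probe = "ACGT" * 8                       # a 32-base single strand
    one_off = probe[:10] + "A" + probe[11:]  # single mismatch at position 10
    for target in (probe, one_off):
        run = longest_identical_run(probe, target)
        print(run, pairing_outcome(run))
```

A single mismatch at position 10 of the 32-base probe leaves a 21-base run: by these thresholds, enough to stabilize the pairing but short of a long-lived match.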

In bacteria, with genomes of a few million base pairs, sequences of 15 nucleotides are usually unique. In a genome of three billion bases, longer sequences are needed to be sure of true homology, the nuclear volume is much larger, and there is more complex chromatin to deal with. Yet the homology search time is not much longer- about an hour. Why this is is not yet really clear. In eukaryotes, homologous chromosomes may typically reside close to each other in a semi-stable nuclear architecture. Or other aspects of the chromatin milieu may facilitate the search, paradoxically. And how damaging is an incorrect match? If a closely related sequence is chosen (sequences which in eukaryotes are common due to replication errors, recombination errors, gene amplification and duplication, and repetitive sequences of many other kinds), it may not matter at all, depending on the size of the repair span being copied from the intact homolog. Tract lengths repaired by copying from the other homolog are typically between 50 and 800 nucleotides long.
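The point about uniqueness can be checked with back-of-envelope arithmetic. In a random genome of length G with equal base frequencies, a particular k-base sequence is expected to occur by chance about G/4^k times on one strand, so one can ask for the smallest k that pushes that expectation below some cutoff. The 1% cutoff and the round genome sizes below are assumptions, and real genomes, full of repeats and duplicated genes, need longer matches than this idealized estimate suggests.

```python
def min_unique_kmer(genome_length: int, max_chance_hits: float = 0.01) -> int:
    """Smallest k for which a given k-mer is expected to occur by chance
    fewer than max_chance_hits times elsewhere in a random genome.

    Assumes uniform, independent bases, so the expected number of chance
    occurrences on one strand is roughly genome_length / 4**k.
    """
    k = 1
    while genome_length / 4**k >= max_chance_hits:
        k += 1
    return k


if __name__ == "__main__":
    for label, size in (("bacterium", 5_000_000), ("human", 3_000_000_000)):
        print(label, min_unique_kmer(size))
```

For a 5-million-base bacterial genome this gives 15, in line with the figure above, while a 3-billion-base genome needs about 20.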

An even more focused view of the evolving match between a RAD51-bound single strand (red) and an incoming DNA from a double-stranded sequence match (blue).


Saturday, July 10, 2021

Sneaky Eating

An evolutionary perspective on overeating syndromes.

Most animals have a simple problem in life- find enough food to live and survive. But social animals, if they are even slightly advanced, share food, and thus alter this basic equation. They have to find ways to store and share food in a way that sustains the group, whether that is starving the old, or feeding the helpless larvae that can not feed themselves. Humans have always faced this dilemma, but don't have the rigid programming that insects do.

Humans can lie, and steal, and then lie some more. It isn't pretty, but sometimes it gets the job done. Humans can regard rules as optional, a flexibility that is a perpetual threat to institutions, norms, cultural patterns, and ultimately to group success. We recently went through an administration that regarded norms as suggestions, laws as annoyances, and then wondered why their behavior attracted so much hatred, and such low historical esteem.

This dynamic comes to mind more concretely in the case of overeating syndromes, which exemplify the conflict between the individual and the group. In a prehistoric setting, food was almost always scarce and precious. In all native cultures there are elaborate practices of public food sharing and eating, which contribute to surveillance by the community of what everyone is eating. Anyone who violates such social structures must have been severely penalized.

Public, communal eating is a fundamental human practice.

Imagine then that someone feels a compulsion to eat more than their share. Such a compulsion would be highly advantageous- if successful- to enable survival when the others in the group might be starving or malnourished. Some extra weight might well mean the difference of making it through the next winter or not. But being caught could dramatically alter the calculus. Primitive societies had harsh punishments for violating critical norms, including ostracism or execution. What then? 

I would suggest that this background sets the stage for overeating syndromes that commonly combine secret eating, often at night, stealth, and stealing. In today's world of plenty, it is stigmatized and medicalized, and due to the abundance of food, relatively easy to indulge, and thus easy to gain weight from. But pre-historically, it would have been far more fraught and challenging, and probably less likely to result in easily observable weight gains. Like other issues in social life, this conflict would take the form of an arms race between cheaters and rule-enforcers. It would be a cognitive battle between effective surveillance and punishment, vs stealth and the intelligence required to not get caught. So one can view it as one impetus among many other evolutionary forces that shaped human intelligence, and in light of its considerable incidence in modern populations, an arms race that was never resolved. Indeed, it is the type of trait that comes under balancing (specifically, negative frequency-dependent) selection, where a high incidence in a population would be self-defeating, while a low incidence yields a much more successful outcome.
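The self-limiting logic of that last sentence is the classic signature of negative frequency-dependent selection. A minimal haploid sketch, with entirely hypothetical fitness values, makes it concrete: rule-followers have fitness 1, while covert over-eaters gain a benefit that shrinks as they become more common (more surveillance, less food to pilfer). The population then settles where the two fitnesses are equal, at a frequency of benefit/cost, regardless of where it starts.

```python
def next_frequency(p: float, benefit: float = 0.05, cost: float = 0.5) -> float:
    """One generation of haploid selection on the frequency p of sneaky eaters.

    Hypothetical fitnesses: rule-followers 1.0; sneaky eaters
    1 + benefit - cost * p, so their advantage shrinks, and then becomes a
    handicap, as they grow more common.
    """
    w_sneaky = 1.0 + benefit - cost * p
    mean_fitness = p * w_sneaky + (1.0 - p) * 1.0
    return p * w_sneaky / mean_fitness


def equilibrate(p0: float, generations: int = 2000, **params: float) -> float:
    """Iterate next_frequency from the starting frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, **params)
    return p


if __name__ == "__main__":
    for start in (0.001, 0.9):
        print(f"start {start:.3f} -> {equilibrate(start):.4f}")
```

With the assumed benefit of 0.05 and cost of 0.5, both a rare start (0.1%) and a dominant one (90%) converge on a stable 10% of sneaky eaters.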


  • Satire- not so funny when you are the target.
  • Making every home a part of the energy solution.
  • Constitution? Who ever heard of enforcing it?

Saturday, June 12, 2021

Mitochondria and the Ratchet of Doom

How do mitochondria escape Muller's ratchet, the genetic degradation of non-mating cells?

Muller's ratchet is one of the more profound concepts in genetics and evolution. Mutations build up constantly, and are overwhelmingly detrimental. So a clonal population of cells which simply divide and live out their lives will all face degradation, and no matter how intense the selection, will eventually end up mutated in some essential function or set of functions, and die out. This gives rise to an intense desire for organisms to exchange and recombine genetic information. This shuffling process can, while producing a lot of bad hands, also deal out some genetically good hands, purifying away deleterious mutations and combining beneficial ones.
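The ratchet is easy to watch in a small simulation. This sketch is a standard asexual Wright-Fisher model with multiplicative fitness, in which each genome picks up a Poisson number of new deleterious mutations every generation and, lacking recombination and back mutation, can never shed them. It tracks the mutation count of the least-loaded individuals, which can only stay put or click upward. Population size, mutation rate, and selection coefficient are hypothetical values chosen so that the ratchet turns quickly.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw from a Poisson distribution (Knuth's method; fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def mullers_ratchet(pop_size: int = 200, mut_rate: float = 0.5,
                    sel_coeff: float = 0.02, generations: int = 500,
                    seed: int = 1) -> list[int]:
    """Return the smallest mutation load present in each generation."""
    rng = random.Random(seed)
    loads = [0] * pop_size                 # start mutation-free
    history = [min(loads)]
    for _ in range(generations):
        # Selection: parents drawn in proportion to multiplicative fitness.
        weights = [(1.0 - sel_coeff) ** k for k in loads]
        parents = rng.choices(loads, weights=weights, k=pop_size)
        # Clonal reproduction plus new mutations. With no recombination and
        # no back mutation, no offspring carries fewer mutations than its parent.
        loads = [k + poisson(rng, mut_rate) for k in parents]
        history.append(min(loads))
    return history


if __name__ == "__main__":
    h = mullers_ratchet()
    print("least-loaded class every 100 generations:", h[::100])
```

Re-running with different parameters shows the usual pattern: larger populations, fewer new mutations per generation, or stronger selection slow the clicks, but in this model nothing ever reverses them.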

This is the principle behind the meiotic sex of eukaryotes with large genomes, and also the widespread genetic exchange done by bacterial cells, via conjugation and other means. In this way, bacteria can stave off genetic obsolescence, and also pick up useful tricks like antibiotic resistance. But what about our mitochondria? These are also, in origin and essence, bacterial cells with tiny genomes which are critically essential to our well-being. They are maternally inherited, which means that the mitochondria from sperm cells, which could have provided new genetic diversity, are, without the slightest compunction, thrown away. This seriously limits opportunities for genetic exchange and improvement, for a genome that is roughly 16 thousand bases long and codes for 37 genes, many of which are central to our metabolism.

One solution to the problem has been to move genes to the nucleus. Most bacteria have a few thousand genes, so the 37 genes of the mitochondrial genome are a small remnant, specialized to keep local regulation intact, while the vast majority of needed proteins are encoded in the nucleus and imported through rather ornate mechanisms into one of the organelle's various compartments- inner matrix, inner membrane, intermembrane space, or outer membrane.

The more intriguing solution, however, has been to perform constant and intensive quality control (with recombination) on mitochondria via a fission and fusion cycle. It turns out that mitochondria are constantly dividing and re-fusing into large networks in our cells. And there are a lot of them- typically thousands in our cells. Mitochondria are also capable of recombination and gene conversion, where parts of one DNA are over-written by copying another DNA molecule. This allows a modicum of gene shuffling among mitochondria in our cells. 

The fusion and fission cycle of mitochondria, where fissioned mitochondria are subject to evaluation for function, and disposal.

Lastly, there is a tight control process that eliminates poorly functioning mitochondria, called mitophagy. Since mitochondria function like little batteries, their charge state is a fundamental measure of health. A nuclear-encoded protein called PINK1 is continually imported into mitochondria, where it is normally degraded; if the charge state is poor, import stalls and PINK1 remains on the outer membrane to recruit other proteins, including parkin and ubiquitin, which jointly mark the defective mitochondrion for degradation through mitophagy. That means that it is engulfed in an autophagosome and fused with a lysosome, which are the garbage disposal / recycling centers of the cell, filled with acidic conditions and degradative enzymes.

The key point is that during the fission / fusion cycle of mitochondria, which happens over tens of minutes, the fissioned state allows individual or small numbers of genomes to be evaluated, and if defective, disposed of. Meanwhile, the fused state allows genetic recombination and shuffling, to recreate genetic diversity from the ambient mutation rate. Since mitochondria are the centers of metabolism, especially redox reactions, they are especially prone to high rates of mutation. So this surveillance is particularly essential. If all else fails, the whole cell may be disposed of via apoptosis, which is also quite sensitive to the mitochondrial state.
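The logic of that cycle can be illustrated with a toy model. Mitochondrial genomes are either wild type or mutant; each cycle, some wild-type genomes mutate, the network fissions into small random units, units whose wild-type fraction (a stand-in for membrane charge, as read out by PINK1 and parkin) falls below a threshold are discarded, and fusion plus replication restore the copy number. Unit size, threshold, mutation rate, and copy number are all hypothetical, and fusion is modeled only as mixing, without the recombination described above. Turning quality control off gives a comparison case.

```python
import random


def mutant_fraction_after(cycles: int = 100, n_genomes: int = 5000,
                          unit_size: int = 4, min_wt_fraction: float = 0.75,
                          mut_rate: float = 0.01, quality_control: bool = True,
                          seed: int = 2) -> float:
    """Toy fission/fusion quality-control model (all parameters hypothetical).

    Returns the fraction of mutant genomes after the given number of cycles.
    """
    rng = random.Random(seed)
    genomes = [False] * n_genomes  # False = wild type, True = mutant
    for _ in range(cycles):
        # New mutations arise in wild-type genomes; there is no back mutation.
        genomes = [g or rng.random() < mut_rate for g in genomes]
        if quality_control:
            # Fission: shuffle, then split the network into small units.
            rng.shuffle(genomes)
            survivors = []
            for start in range(0, len(genomes), unit_size):
                unit = genomes[start:start + unit_size]
                # Wild-type fraction stands in for membrane charge; units
                # below the threshold are removed, as mitophagy would do.
                if unit.count(False) / len(unit) >= min_wt_fraction:
                    survivors.extend(unit)
            if not survivors:
                raise RuntimeError("every unit was removed; the cell is lost")
            genomes = survivors
        # Fusion and replication: restore the copy number by resampling.
        genomes = rng.choices(genomes, k=n_genomes)
    return sum(genomes) / n_genomes


if __name__ == "__main__":
    print("with QC:   ", mutant_fraction_after(quality_control=True))
    print("without QC:", mutant_fraction_after(quality_control=False))
```

In this setup, quality control holds the mutant fraction low, at a few percent, because small units carrying two or more mutants are culled each cycle; without it, mutations simply accumulate, and after 100 cycles the mutant fraction is typically around ten times higher.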

In oocytes, mitochondria appear to go through a particularly stringent period of fission, allowing a high level of quality control at this key point. Additionally, mitochondria then go through exponential growth and energy generation to make the oocyte, at which point further quality control discards the oocytes that are not up to snuff.

All this adds up to a pretty thorough method of purifying selection. Admittedly, little or no genetic material comes from outside the clonal maternal genetic lineage, but mutations are probably common enough that beneficial mutations arise occasionally, and one can imagine that there may be additional levels of selection for more successful mitochondria over less successful ones, in addition to the charge-dependent rough cut made by this mitophagy selection.

As the penetrating reader may guess, parkin is related to Parkinson's disease, as one of its causal genes when defective. Neurons are particularly prone to mitochondrial dysfunction, due to their sprawled-out geography. First, the nuclear genes needed for mitochondria are made only in the cell body / nucleus, and their products (either as proteins, or sometimes as mRNAs) have to be ferried out to the axonal and dendritic periphery to supply their targets with new materials. Neurons have very active transport systems to do this, but still it is a significant challenge. Second, the local population of mitochondria in outlying processes of neurons is going to be small, making the fission/fusion cycle much less effective and less likely to eliminate defective genes and individual mitochondria, or make up for their absence if they are eliminated, leading to local energetic crises.

Cross-section of a neuronal synapse, with a sprinkling of mitochondria available locally to power local operations.



  • Get back to work. A special, CEO-sponsored cartoon from Tom Tomorrow.
  • They are everywhere.
  • Shouldn't taxes be even a little bit fair?
  • The economics of shame.

Sunday, January 24, 2021

Tale of an Oncogene

Research on a key oncogene of melanoma, MITF, moves from seeing it as a rheostat to seeing it as a supercomputer.

The war on cancer was declared fifty years ago, yet effective therapies are only now trickling in. And very few of them can be characterized as cures. What has been going on, and why is the fight so slow? Here I discuss one example, of melanoma and one of its drivers and central players, the gene MITF.

Melanocytes are not really skin cells, but neural crest cells, i.e. originating in the embryonic neural tube and giving rise to various peripheral neural structures in the spine, gut, and head. One sub-population migrates off into the epidermis to become melanocytes, which generate skin pigment in melanosome packets, which they distribute around to local keratinocytes. Evolutionarily, these cells are apparently afterthoughts, after originally having developed as part of photoreceptor systems. This history, of unusual evolution and extensive developmental migration and eventual invasion into foreign tissues, has obvious implications for their capacity to form cancers later in life, if mutations re-activate their youthful propensities.

 

Above is a sketch of some genes known to play roles in melanoma, and the key pathways in which they act. In red are oncogenes known to suffer activating mutations that promote cancer progression. In grey are additional cancer genes whose cancer-promoting mutations are simpler loss-of-function, not gain-of-function, events (i.e. tumor suppressors rather than oncogenes in the strict sense). And green marks ancillary proteins in these pathways that have not (yet) been implicated as cancer genes of any sort. MITF is a transcription regulator that drives many genes needed for melanocyte development and melanosome formation. It also influences cell cycle control, as well as cytoskeletal and cell surface features relevant to migration and invasion of other tissues. This post is based mostly on reviews of the molecules active in melanoma, and on the more focused story of MITF.

MITF binds to DNA near target genes, often in concert with other proteins, and activates transcription of the local gene (in most cases, though it represses some targets as well). The evidence linking MITF with melanoma and melanocytes is mostly genetic. It is an essential gene, so complete deletions are lethal. But a wide variety of "mi" mutations in mice and in humans lead to unusual phenotypes like white hair color, loss of hearing, large head formation, small blue eyes, osteopetrosis, and much else. Originally researchers thought several different genes were involved, but they all resolved down to one complex locus, now called MITF, for microphthalmia-associated transcription factor ("mi" being short for microphthalmia, i.e. small eyes). Certain hereditary mutations also predispose to melanoma, as do some spontaneous mutations. And the dose of MITF correlates with how active and aggressive a melanoma is. Together, these findings show that MITF is central to melanocyte fate and behavior, and one of the most central players in the disease of melanoma.



The MITF gene spreads over 229,000 base pairs, though it codes for a protein of only 419 amino acids. The gene contains nine alternate transcription start sites, 18 exons (the segments retained in the mature mRNA), and five alternate translation start sites, as sketched above. This structure allows dozens of different forms of the protein to be produced in different tissues and settings, via alternative promoters and alternative splicing. The 1M form (above, bottom) is the main one made in melanocytes. Since the gene is essential, mutations that have the phenotypes mentioned above tend to be very small, affecting one amino acid or one splice site, or perhaps truncating translation near the end of the protein. Upstream of the MITF gene and in some of its introns, there are dozens of DNA sites that bind other regulators, which either activate or repress MITF transcription in response to developmental or environmental cues. For example, a LEF1/TCF site binds the protein LEF1, which relays signals from WNT1, a central developmental regulator that drives the proliferation and differentiation of melanocytes from neural crest stem cells.
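To put the size mismatch in numbers, here is a quick back-of-the-envelope calculation using the figures above. It assumes three nucleotides per codon plus one stop codon, and it applies only to the 419-amino-acid melanocyte form; other isoforms are somewhat longer.

```python
GENE_LENGTH_BP = 229_000   # span of the MITF locus, from the text
PROTEIN_LENGTH_AA = 419    # length of the melanocyte (1M) isoform, from the text

# 3 nucleotides per codon, plus one 3-nucleotide stop codon (assumption).
coding_bp = PROTEIN_LENGTH_AA * 3 + 3
fraction = coding_bp / GENE_LENGTH_BP

print(coding_bp)           # 1260
print(f"{fraction:.2%}")   # 0.55%
```

In other words, only about half a percent of the locus ends up as protein-coding sequence in this isoform; the rest is introns, alternative first exons, untranslated regions, and regulatory DNA.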

That is just the beginning of MITF's complexity, however. The protein's sequence contains sites for a wide array of modifications: phosphorylation by regulatory protein kinases (which attach phosphate groups), as well as SUMOylation and ubiquitination. Key cellular regulators like the GSK3, AKT, RSK, ERK2, and TAK kinases each attach phosphates that affect MITF's activity. Additionally, MITF interacts with at least a dozen proteins, some of which also bind DNA and alter its target gene specificity, while others cooperate to activate or repress transcription. One of the better-known signaling inputs comes indirectly from the kinase BRAF1, which is the target of the first precision drugs against melanoma. BRAF1 is mutated to a hyperactive form in about half of melanoma cases. It is a kinase generally responsive to growth factors, and activates a core growth-inducing (MAP) kinase cascade (as shown above), among other pathways. BRAF1 has several effects on MITF through these pathways, but the dominant one seems to be its phosphorylation and activation of PAX3, a DNA-binding regulator that activates the MITF gene (and which is, notably, absent from the summary figure above, showing how dynamic this field remains). Thus inhibition of BRAF1 by these precision drugs effectively reduces MITF expression, most of the time.
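The logic of that last step, where blocking BRAF1 lowers MITF through a chain of activators, can be sketched as a toy cascade. This is a hypothetical illustration only: the linear BRAF1 to MAP kinase to PAX3 to MITF chain, the Hill-function form, and the parameters `k` and `n` are all assumptions chosen for clarity, not measured values, and the real network has many more inputs.

```python
def hill(x, k=0.5, n=2):
    """Activating Hill function: output rises from 0 toward 1 as x increases."""
    return x**n / (k**n + x**n)

def mitf_expression(braf_activity, inhibitor=0.0):
    """Toy steady-state MITF level for a hypothetical linear cascade.

    braf_activity: relative BRAF1 activity (1.0 = hyperactive mutant, arbitrary scale)
    inhibitor: fraction of BRAF1 activity blocked by a drug, from 0 to 1
    """
    braf = braf_activity * (1.0 - inhibitor)
    mapk = hill(braf)    # output of the MAP kinase cascade
    pax3 = hill(mapk)    # PAX3 activation downstream of the cascade
    return hill(pax3)    # MITF transcription driven by PAX3

untreated = mitf_expression(1.0)
treated = mitf_expression(1.0, inhibitor=0.9)
print(f"untreated MITF: {untreated:.3f}, treated MITF: {treated:.5f}")
```

Because each stage is a sigmoidal activator, a 90% cut in BRAF1 activity is amplified down the chain, leaving the toy MITF output orders of magnitude lower. Real cells are not so tidy, which is one reason the text says "most of the time".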

Then there are the gene targets of MITF, of which there are thousands, including dozens known to have significant developmental, cell cycle, pigment synthesis, cytoskeletal, and metabolic effects. All this is to say that this one gene participates in a bewilderingly complex network of activities, only some of which are recognized to date, and none of which are understood at the kind of quantitative level that would allow critical modeling and computation of the system. What has been found so far has led to a "switch", or rheostat, hypothesis. One of the maddening aspects of melanoma is its resistance to therapy. This is thought to be due in part to this dynamic rheostat, which allows levels of MITF to vary widely and send individual cancer cells reversibly into several different states. At high levels of MITF, cancer cells are pigmented and proliferative (and sensitive to BRAF1 inhibition). But at medium levels of MITF, they revert toward their early migratory behavior, and become metastatic and invasive. So melanoma benefits from a diversity of cell types and states, dynamically switching among states that differ in their susceptibility to therapies like BRAF1 inhibitors, and that together maximize damage through proliferation and invasion (diagrammed below).
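The rheostat idea, and why it breeds drug resistance, can be made concrete with a toy simulation. This is a hypothetical sketch, not a model from the reviews: the thresholds `HIGH` and `MEDIUM`, the size of the random drift, and the assumption that a BRAF1 inhibitor kills only high-MITF cells are all invented for illustration. The post describes only the high and medium states, so everything below `MEDIUM` is lumped into an uncharacterized "low" bin.

```python
import random

HIGH = 0.6    # hypothetical MITF level above which cells are proliferative
MEDIUM = 0.3  # hypothetical MITF level above which cells are invasive

def mitf_state(level):
    """Map a relative MITF level (0..1) to a cell state, per the rheostat idea."""
    if level >= HIGH:
        return "proliferative"  # pigmented, sensitive to BRAF1 inhibition
    if level >= MEDIUM:
        return "invasive"       # migratory, metastatic
    return "low"                # not characterized in this post

def census(levels):
    """Count how many cells fall into each state."""
    counts = {"proliferative": 0, "invasive": 0, "low": 0}
    for level in levels:
        counts[mitf_state(level)] += 1
    return counts

rng = random.Random(0)
population = [rng.random() for _ in range(1000)]
print("before drug:", census(population))

# Hypothetical drug: kills only the proliferative (high-MITF) cells.
survivors = [x for x in population if mitf_state(x) != "proliferative"]
print("after drug:", census(survivors))

# Reversible switching: each survivor's MITF level drifts, clamped to 0..1.
regrown = [min(1.0, max(0.0, x + rng.gauss(0, 0.3))) for x in survivors]
print("after switching:", census(regrown))
```

The drug empties the proliferative bin, but because states are reversible rather than fixed, some survivors drift back above `HIGH` and the proliferative population reappears.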




The theme that comes out of all this is enormous complexity, a complexity that only deepens the more one studies this field. It is typical of biology, however, and can be explained by the fact that we are a product of 4 billion years of evolution. The resulting design is far from intelligent; rather, it is a compendium of messy contraptions, historical compromises, and accreted mechanisms. We are very far from having the data to construct proper models that would critically analyze these systems and accurately predict their behavior. It is not really a computational issue, but a data issue, given the vast complexity we face. Scientists in these fields are still thinking in cartoons, not in equations. 

But there are shortcuts of various kinds. One promising method is to analyze those patients who respond unusually well to one of the new precision treatments. They typically carry some hereditary alteration in another pathway, one that in most people generates resistance or backup activity to the pathway being drug-treated. If their genomes are fully sequenced and analyzed in depth, they can provide insight into what other pathway(s) may need to be targeted to achieve effective combination treatment. This is a lesson from the HIV and tuberculosis treatment experiences: the redundancy and responsiveness of biological systems call for multiple targets and multiple treatments to meet complex disease challenges.
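The classic arithmetic behind that HIV and tuberculosis lesson can be sketched in a few lines. This is a simplified illustration under a loud assumption: resistance to each drug arises independently, at a fixed per-cell rate. The rate of one in a million and the tumor size of a billion cells are hypothetical round numbers, and real backup pathways are often not independent.

```python
def expected_resistant(population, *per_drug_rates):
    """Expected number of cells resistant to every drug at once.

    Assumes (hypothetically) that resistance to each drug arises
    independently, so the per-cell probabilities multiply.
    """
    p = 1.0
    for rate in per_drug_rates:
        p *= rate
    return population * p

cells = 1e9  # hypothetical tumor size
print(expected_resistant(cells, 1e-6))        # about 1,000 cells: one drug
print(expected_resistant(cells, 1e-6, 1e-6))  # about 0.001 cells: two drugs
```

Under independence, a single drug leaves a pool of pre-resistant cells to regrow, while a two-drug combination makes it unlikely that even one cell escapes both. Redundant backup pathways in cancer behave more like correlated resistance, which is exactly why identifying the second pathway matters.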