
Saturday, December 23, 2023

How Does Speciation Happen?

Niles Eldredge and the theory of punctuated equilibrium in evolution.

I have been enjoying "Eternal Ephemera", which is an end-of-career memoir/intellectual history from a leading theorist in paleontology and evolution, Niles Eldredge. In this genre, often of epic proportions and scope, the author takes stock of the historical setting of his or her work and tries to put it into the larger context of general intellectual progress, (yes, as pontifically as possible!), with maybe some gestures towards future developments. I wish more researchers would write such personal and deeply researched accounts, of which this one is a classic. It is a book that deserves to be in print and more widely read.

Eldredge's claim to fame is punctuated equilibrium, the theory (or, perhaps better, observation) that evolution occurs much more haltingly than in the majestic gradual progression that Darwin presented in "Origin of Species". This is an observation that comes straight out of the fossil record. And perhaps the major point of the book is that the earliest biologists, even before Darwin, but also including Darwin, knew about this aspect of the fossil record, and were thus led to concepts like catastrophism and "etagen". Only Lamarck had a steadfastly gradualist view of biological change, which Darwin eventually took up, while replacing Lamarck's mechanism of intentional/habitual change with that of natural selection. Eldredge unearths tantalizing and, to him, supremely frustrating, evidence that Darwin was fully aware of the static nature of most fossil series, and even recognized the probable mechanism behind it (speciation in remote, peripheral areas), only to discard it for what must have seemed a clearer, more sweeping theory. But along the way, the actual mechanism of speciation got somewhat lost in the shuffle.

Punctuated equilibrium observes that most species recognized in the fossil record do not gradually turn into their descendants, but are replaced by them. Eldredge's subject of choice is trilobites, which have a long and storied record spanning almost 300 million years, featuring replacement after replacement, with species averaging a few million years' duration each. It is a simple fact, but one that is a bit hard to square with the traditional / Darwinian and even molecular account of evolution. DNA is supposed to act like a "clock", with constant mutational change through time. And natural selection likewise acts everywhere and always... so why the stasis exhibited by species, and why the apparently rapid evolution in between replacements? That is the conundrum of punctuated equilibrium.

There have been a lot of trilobites. This comes from a paper about their origin during the Cambrian explosion, arguing that only about 20 million years was enough for their initial speciation (bottom of image).

The equilibrium part, also termed stasis, is seen in the current / recent world as well as in the fossil record. We see species such as horses, bison, and lions that are identical to those drawn in cave paintings. We see fossils of animals like wildebeest that are identical to those living, going back millions of years. And we see unusual species in recent fossils, like saber-toothed cats, that have gone extinct. We do not typically see animals that have transformed over recent geological history from one (morphological) species into another, or really, into anything very different at all. A million years ago, wildebeest seem to have split off a related species, the black wildebeest, and that is about it.

But this stasis is only apparent. Beneath the surface, mutations are constantly happening and piling up in the genome, and selection is relentlessly working to ... do something. But what? This is where the equilibrium part comes in, positing that wide-spread, successful species are so hemmed in by the diversity of ecologies they participate in that they occupy a very narrow adaptive peak, which selection works to keep the species on, resulting in apparent stasis. It is a very dynamic equilibrium. The constant gene flow among all parts of the population that keeps the species marching forward as one gene pool, despite the ecological variability, makes it impossible to adapt to new conditions that do not affect the whole range. Thus, paradoxically, the more successful the species, and the more prominent it is in the fossil record, the less change will be apparent in those fossils over time.

The punctuated part is that these static species in the fossil record eventually disappear and are replaced by other species that are typically similar, but not the same, and do not segue from the original in a gradual way that is visible in the fossil record. No, most species and locations show sudden replacement. How can this be so if evolution by natural selection is true? As above, wide-spread species are limited in what selection can do. Isolated populations, however, are more free to adapt to local conditions. And if one of those local conditions (such as arctic cold) happens to be what later happens to the whole range (such as an ice age), then it is more likely that a peripherally (pre-)adapted population will take over the whole range, than that the resident species adapts with sufficient speed to the new conditions. Range expansion, for the peripheral species, is easier and faster than adaptation, for the wide-ranging originating species.

The punctuated equilibrium proposition came out in the 1970's, and naturally followed theories of speciation by geographic separation that had previously come out (also resurrected from earlier ideas) in the 1930's to 1950's, but which had not made much impression (!) on paleontologists. Paleontologists are always grappling with the difficulties of the record, which is partial, and does not preserve a lot of what we would like to know, like behavior, ecological relationships, and mutational history. But they did come to agree that species stasis is a real thing, not just, as Darwin claimed, an artifact of the incomplete fossil record. Granted- if we had fossils of all the isolated and peripheral locations, which is where speciation would be taking place by this theory, we would see the gradual change and adaptation taking place. So there are gaps in the fossil record, in a way. But as long as we look at the dominant populations, we will rarely see speciation taking place before our eyes, in the fossils.

So what does a molecular biologist have to say about all this? As Darwin insisted early in "Origin", we can learn quite a bit from domesticated animals. It turns out that wild species have a great amount of mostly hidden genetic variation. This is apparent whenever one is domesticated and bred for desired traits. We have bred dogs, for example, to an astonishingly wide variety of traits. At the same time, we have bred them out to very low genetic diversity. Many breeds are saddled with genetic defects that can not be resolved without outbreeding. So we have in essence exchanged the vast hidden genetic diversity of a wild species for great visible diversity in the domesticated species, combined with low genetic diversity.

What this suggests is that wild species have great reservoirs of possible traits that can be selected for the purposes of adaptation under selective conditions. Which suggests that speciation in range edges and isolated environments can be very fast, as the punctuated part of punctuated equilibrium posits. And again, it reinforces the idea that during equilibrium with large populations and ranges, species have plenty of genetic resources to adapt and change, but spend those resources reinforcing / fine tuning their core ecological "franchise", as it were.

In population genetics, it is well known that neutral mutations arise and fix (that is, spread to 100% of the population on both alleles) at the same rate no matter how large the population, in theory. That is to say- bigger populations generate more mutations, but correspondingly hide them better in recessive form (if deleterious), and for neutral mutations, take much longer to allow any individual mutation to drift to either extinction or fixation. Selection against deleterious mutations is more relentless in larger populations, while relaxed selection and higher drift can allow smaller populations to explore wider ranges of adaptive space, perhaps finding globally higher (fitness) peaks than the parent species could find.
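As a toy illustration of that classical result (my own sketch, not from the book or any paper cited here), a short Wright-Fisher simulation shows why: larger populations supply proportionally more new neutral mutations, but each one has a proportionally smaller chance of drifting all the way to fixation, so the long-run fixation rate per generation converges on the mutation rate regardless of population size.

```python
import numpy as np

rng = np.random.default_rng(42)

def neutral_fixations_per_generation(N, mu, generations=20000):
    """Toy haploid Wright-Fisher sketch. Each generation, ~N*mu new neutral
    mutations enter as single copies; every segregating mutation then drifts
    by binomial resampling until it is lost (0 copies) or fixed (N copies).
    Classical expectation: fixations per generation ~= mu, independent of N."""
    fixed = 0
    copies = np.empty(0, dtype=int)   # copy numbers of segregating mutations
    for _ in range(generations):
        new = np.ones(rng.poisson(N * mu), dtype=int)
        copies = np.concatenate([copies, new])
        copies = rng.binomial(N, copies / N)        # one generation of neutral drift
        fixed += int((copies == N).sum())
        copies = copies[(copies > 0) & (copies < N)]
    return fixed / generations

for N in (100, 1000):
    rate = neutral_fixations_per_generation(N, mu=0.05)
    print(f"N = {N:5d}: ~{rate:.3f} fixations per generation (mutation rate 0.05)")
```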

Eldredge cites some molecular work that claims that at least twenty percent of sequence change in animal lineages is due specifically to punctuational events of speciation, and not to the gradual background accumulation of mutations. What could explain this? The actual mutation rate is not at issue, (though see here), but the numbers of mutations retained, perhaps due to relaxed purifying selection in small populations, and founder effects and positive selection during the speciation process. This kind of phenomenon also helps to explain why the DNA "clock" mentioned above is not at all regular, but quite variable, making an uneven guide to dating the past.

Humans are another good example. Our species is notoriously low in genetic diversity, compared to most wild species, including chimpanzees. It is evident that our extremely low population numbers (over prehistoric time) have facilitated speciation, (that is, the fixation of variants which might be swamped in bigger populations), which has resulted in a bewildering branching pattern of different hominid forms over the last few million years. That makes fossils hard to find, and speciation hard to pinpoint. But now that we have taken over the planet with a huge population, our bones will be found everywhere, and they will be largely static for the foreseeable future, as a successful, wide-spread species (barring engineered changes). 

I think this all adds up to a reasonably coherent theory that reconciles the rest of biology with the fossil record. However, it remains frustratingly abstract, given the nature of fossils that rarely yield up the branching events whose rich results they record.


Saturday, December 9, 2023

The Way We Were: Origins of Meiosis and Sex

Sex is as foundational for eukaryotes as are mitochondria and internal membranes. Why and how did it happen?

Sexual reproduction is a rather expensive proposition. The anxiety, the dating, the weddings- ugh! But biologically as well, having to find mates is no picnic for any species. Why do we bother, when bacteria get along just fine dividing in two? This is a deep question in biology, with a lot of issues in play. And it turns out that bacteria do have quite a bit of something-like-sex: they exchange DNA with each other in small pieces, for similar reasons we do. But the eukaryotic form of sex is uniquely powerful and has supported the rapid evolution of eukaryotes to be by far the dominant domain of life on earth.

A major enemy of DNA-encoded life is mutation. Despite the many DNA replication accuracy and repair mechanisms, some rate of mutation still occurs, and is indeed essential for evolution. But for larger genomes, the per-generation input of damaging mutations grows, and in a strictly clonal lineage the genomes carrying the fewest such mutations are eventually lost by chance and cannot be rebuilt, so the damage accumulates irreversibly and the lineage will inevitably die out without some help. This process is called Muller's ratchet, and is why all organisms appear to exchange DNA with others in their environment, either sporadically, like bacteria, or systematically, like eukaryotes.
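A toy simulation can make the ratchet concrete (my own sketch, with arbitrary parameters, not taken from the book): a clonal population slowly loses its least-mutated genomes to drift and its mean fitness decays, while an otherwise identical population with free recombination settles at a stable mutation-selection balance.

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_fitness(N=200, loci=100, U=0.5, s=0.05, sex=False, generations=400):
    """Toy Muller's ratchet sketch. Each individual is a vector of deleterious
    mutation counts per locus; fitness is (1 - s) ** (total mutation count).
    Each generation: fitness-weighted parent sampling, optional free
    recombination between two parents, then new mutations at genome-wide rate U."""
    pop = np.zeros((N, loci), dtype=int)
    for _ in range(generations):
        w = (1.0 - s) ** pop.sum(axis=1)
        p = w / w.sum()
        if sex:
            moms = pop[rng.choice(N, size=N, p=p)]
            dads = pop[rng.choice(N, size=N, p=p)]
            pick = rng.integers(0, 2, size=(N, loci)).astype(bool)
            pop = np.where(pick, moms, dads)        # free recombination, locus by locus
        else:
            pop = pop[rng.choice(N, size=N, p=p)]   # clonal reproduction
        pop = pop + rng.poisson(U / loci, size=(N, loci))
    return ((1.0 - s) ** pop.sum(axis=1)).mean()

print("clonal mean fitness      :", round(mean_fitness(sex=False), 3))
print("recombining mean fitness :", round(mean_fitness(sex=True), 3))
```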

An even worse enemy of the genome is unrepaired damage like complete (double strand) breaks in the DNA. These stop replication entirely, and are fatal. These also need to be repaired, and again, having extra copies of a genome is the way to allow these to be fixed, by processes like homologous recombination and gene conversion. So having access to other genomes has two crucial roles for organisms- allowing immediate repair, and allowing some way to sweep out deleterious mutations over the longer term.

Our ancestors, the archaea, which are distinct from bacteria, typically have circular, single-molecule genomes, in multiple copies per cell, with frequent gene conversions among the copies and frequent exchange with other cells. They routinely have five to twenty copies of their genome, and can easily repair any immediate damage using those other copies. They do not hide mutant copies the way we do in a recessive allele, but rather, by gene conversion (which means replicating parts of one chromosome copy into the others, piecemeal), make each genome identical over time, so that it (and the cell) is visible to selection despite their polyploid condition. Similarly, taking in DNA from other, similar cells exploits the donor cells' status as live cells (also visible to selection) to ensure that the recipients are getting high-quality DNA that can repair their own defects or correct minor mutations. All this ensures that their progeny are set up with viable genomes, instead of genomes riddled with defects. But it comes at various costs as well, such as a constant race between getting a lethal mutation and finding the DNA that might repair it.

Both mitosis and meiosis were eukaryotic innovations. In both, the chromosomes all line up for orderly segregation to descendants. But meiosis engages in two divisions, and features homolog synapsis and recombination before the first division of the parental homologs.

This is evidently a precursor to the process that led, very roughly 2.5 billion years ago, to eukaryotes, but it is all done on a piecemeal basis, nothing like what we do now as eukaryotes. To get to that point, the following innovations needed to happen:

  • Linearized genomes, with centromeres and telomeres, and more than one chromosome.
  • Mitosis to organize normal cellular division, where multiple chromosomes are systematically lined up and distributed 1:1 to daughter cells, using extensive cytoskeletal rearrangements and regulation.
  • Mating with cell fusion, where entire genomes are combined, recombined, and then reduced back to a single complement, and packaged into progeny cells.
  • Synapsis, as part of meiosis, where the parental homologs are all lined up and deliberately broken to initiate DNA repair and crossing-over.
  • Meiosis division one, where the now-recombined parental homologs are separated.
  • Meiosis division two, which largely follows the same mechanisms as mitosis, separating the reshuffled and recombined sister chromatids.

This is a lot of novelty on the path to eukaryogenesis, and is just a portion of the many other innovations that happened in this lineage. What drove all this, and what were some plausible steps in the process? The advent of true sex generated several powerful effects:

  1. A definitive solution to Muller's ratchet, by exposing every locus in a systematic way to partial selection and sweeping out deleterious mutations, while protecting most members of the population from those same mutations. Continual recombination of the parental genomes allows beneficial mutations to separate from deleterious ones and be differentially preserved.
  2. Mutated alleles are partially, yet systematically, hidden as recessive alleles, allowing selection when they come into homozygous status, but also allowing them to exist for a limited time to buffer the mutation rate and to generate new variation. This vastly increases accessible genetic variation.
  3. Full genome-length alignment and repair by crossing over is part of the process, correcting various kinds of damage and allowing accurate recombination across arbitrarily large genomes.
  4. Crossing over during meiotic synapsis mixes up the parental chromosomes, allowing true recombination among the parental genomes, beyond just the shuffling of the full-length chromosomes. This vastly increases the power of mating to sample genetic variation across the population, and generates what we think of as "species", which represent more or less closed interbreeding pools of genetic variants that are not clones but diverse individuals.

The time point of 2.5 billion years ago is significant because this is the general time of the great oxidation event, when cyanobacteria were finally producing enough oxygen by photosynthesis to alter the geology of earth. (However, our current level of atmospheric oxygen did not come about until almost two billion years later, with the rise of land plants.) While this mainly prompted the logic of acquiring mitochondria, either to detoxify oxygen or use it metabolically, some believe that it is relevant to the development of meiosis as well.

There was a window of time when oxygen was present, but the ozone layer had not yet formed, possibly generating a particularly mutagenic environment of UV irradiation and reactive oxygen species. Such higher mutagenesis may have pressured the archaea mentioned above to get their act together- to not distribute their chromosomes so sporadically to offspring, to mate fully across their chromosomes, not just pieces of them, and to recombine / repair across those entire mated chromosomes. In this proposal, synapsis, as seen in meiosis I, had its origin in a repair process that solved the problem of large genomes under mutational load by aligning them more securely than previously. 

It is notable that one of the special enzymes of meiosis is Spo11, which induces the double-strand breaks that lead to crossing-over, recombination, and the chiasmata that hold the homologs together during the first division. This DNA damage happens at quite high rates all over the genome, and is programmed, via the structures of the synaptonemal complex, to favor crossing-over between (parental) homologs vs duplicate sister chromatids. Such intensive repair, while now aimed at ensuring recombination, may have originally had other purposes.

Alternately, others suggest that it is larger genome size that motivated this innovation. This origin event involved many gene duplication events that ramified the capabilities of the symbiotic assemblage. Such gene duplications would naturally lead to recombinational errors in traditional gene conversion models of bacterial / archaeal genetic exchange, so there was pressure to generate a more accurate whole-genome alignment system that confined recombination to the precise homologs of genes, rather than to any similar relative that happened to be present. This led to the synapsis that currently is part of meiosis I, but it is also part of "parameiosis" systems in some eukaryotes, which, while clearly derived, might resemble primitive steps to full-blown meiosis.

It has long been apparent that the mechanisms of meiosis division one are largely derived from (or related to) the mechanisms used for mitosis, via gene duplications and regulatory tinkering. So these processes (mitosis and the two divisions of meiosis) are highly related and may have arisen as a package deal (along with linear chromosomes) during the long and murky road from the last archaeal ancestor to the last common eukaryotic ancestor, which possessed a much larger suite of additional innovations, from mitochondria to nuclei, mitosis, meiosis, cytoskeleton, introns / mRNA splicing, peroxisomes, other organelles, etc.

Modeling of different mitotic/meiotic features. All cells modeled have 18 copies of a polyploid genome, with a newly evolved process of mitosis. Green = addition of crossing over / recombination of parental chromosomes, but no chromosome exchange. Red = chromosome exchange, but no crossing over. Blue = both crossing over and chromosome exchange, as occurs now in eukaryotes. The Y axis is fitness / survival and the X axis is time in generations after start of modeling.

A modeling paper points to the quantitative benefits of mitosis when combined with the meiotic suite of innovations. They suggest that in a polyploid archaean lineage, the establishment of mitosis alone would have had revolutionary effects, ensuring accurate segregation of all the chromosomes, and that this would have enabled differentiation among those polyploid chromosome copies, since each would be faithfully transmitted individually to offspring (assuming all, instead of one, were replicated and transmitted). Thus they could develop into different chromosomes, rather than remain copies. This would, as above, encourage meiosis-like synapsis over the whole genome to align all the (highly similar) genes properly.

"Modeling suggests that mitosis (accurate segregation of sister chromosomes) immediately removes all long-term disadvantages of polyploidy."

Additional modeling of the meiotic features of chromosome shuffling, and recombination between parental chromosomes, indicates (shown above) that these are highly beneficial to long-term fitness, which can rise instead of decaying with time, per the various benefits of true sex as described above. 

The field has definitely not settled on one story of how meiosis (and mitosis) evolved, and these ideas and hypotheses are tentative at this point. But the accumulating findings that the archaea that most closely resemble the root of the eukaryotic (nuclear) tree have many of the needed ingredients, such as active cytoskeletons, a variety of molecular antecedents of ramified eukaryotic features, and now extensive polyploidy to go with gene conversion and DNA exchange with other cells, make the momentous gap from archaea to eukaryotes somewhat narrower.


Saturday, May 20, 2023

On the Spectrum

Autism, broader autism phenotype, temperament, and families. It turns out that everyone is on the spectrum.

The advent of genomic sequencing and the hunt for disease-causing mutations has been notably unhelpful for most mental diseases. Possible or proven disease-causing mutations pile up, but they do little to illuminate the biology of what is going on, and even less towards treatment. Autism is a prime example, with hundreds of genes now identified as carrying occasional variants with causal roles. The strongest of these variants affect synapse formation among neurons, and a second class affects long-term regulation of transcription, such as turning genes durably on or off during developmental transitions. Very well- that all makes a great deal of sense, but what have we gained?

Clinically, we have gained very little. What is affected are neural developmental processes that can't be undone, or switched off in later life with a drug. So while some degree of understanding slowly emerges from these studies, translating that to treatment remains a distant dream. One aspect of the genetics of autism, however, is highly informative, which is the sheer number of low-effect and common mutations. Autism can be thought of as coming in two types, genetically- those due to a high effect, typically spontaneous or rare mutation, and those due to a confluence of common variants. The former tends to be severe and singular- an affected child in a family that is otherwise unaffected. The latter might be thought of as familial, where traits that have appeared (mildly) elsewhere in the family have been concentrated in one child, to a degree that it is now diagnosable.

This pattern has given rise to the very interesting concept of the "Broader Autism Phenotype", or BAP. This stems from the observation that in families of autistic children, it is more often the case that ... "the parents, grandparents, and collaterals are persons strongly preoccupied with abstractions of a scientific, literary, or artistic nature, and limited in genuine interest in people." Thus there is not just a wide spectrum of autism proper, based on the particular confluence of genetic and other factors that lead to a diagnosis and its severity, but there is also, outside of the medical spectrum, quite another spectrum of traits or temperaments which tend toward autism and comprise various eccentricities, but have not, at least to date, been medicalized.


The common nature of these variants leads to another question- why are they persistent in the population? It is hard to believe that such a variety and number of variations are exclusively deleterious, especially when the BAP seems to have, well, rather positive aspects. No, I would suggest that an alternative way to describe BAP is "an enhanced ability to focus", and develop interests in salient topics. Ever meet people who are technically useless, but warm-hearted? They are way off on the non-autistic part of the spectrum, while the more technically inclined, the fixers of the world and scholars of obscure topics, are more towards the "ability to focus" part of the spectrum. Only when such variants are unusually concentrated by the genetic lottery do children appear with frank autistic characteristics, totally unable to deal with social interactions, and given to obsessive focus and intense sensitivities.

Thus autism looks like a more general lens on human temperament and evolution, being the tip of a very interesting iceberg. As societies, we need the politicians, backslappers, networkers, and con men, but we also need, indeed increasingly as our societies and technologies have developed over the centuries, people with the ability and desire to deal with reality- with technical and obscure issues- without social inflection, but with highly focused attention. Militaries are a prime example, fusing critical needs of managing and motivating people with a modern technical base of vast scope, reliant on an army of specialists devoted to making all the machinery work. Why does there have to be this tradeoff? Why can't everyone be James Bond, both technically adept and socially debonair? That isn't really clear, at least to me, but one might speculate that in the first place, dealing with people takes a great deal of specialized intelligence, and there may not be room for everything in one brain. Secondly, the enhanced ability to focus on technical or artistic topics may actively require, as is implicit in doing science and as was exemplified by Mr. Spock, an intentional disregard of social niceties and motivations, if one is to fully explore the logic of some other, non-human, world.


Saturday, February 11, 2023

A Gene is Born

Yes, genes do develop out of nothing.

The "intelligent" design movement has long made a fetish of information. As science has found, life relies on encoded information for its genetic inheritance and the reliable expression of its physical manifestations. The ID proposition is, quite simply, that all this information could not have developed out of a mindless process, but only through "design" by a conscious being. Evidently, Darwinian natural selection still sticks on some people's craw. Michael Behe even developed a pseudo-mathematical theory about how, yes, genes could be copied mindlessly, but new genes could never be conjured out of nothing, due to ... information.

My understanding of information science equates information to a reduction of entropy, and sets minimal physical costs on handling it- the Shannon limits on how much information a noisy channel can carry, and the Landauer limit on the energy needed to erase or reset a bit. A quite different concept comes from physics, in the form of information conservation in places like black holes. This form of information is really the implicit information of the wave functions and states of physical matter, not anything encoded or transmitted in the sense of biology or communication. Physical state information may be indestructible (and un-create-able) on this principle, but coded information is an entirely different matter.
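For concreteness, the two textbook limits alluded to here, in their standard forms (my addition, not from the post):

```latex
% Shannon entropy of a source with symbol probabilities p_i (bits per symbol):
H = -\sum_i p_i \log_2 p_i

% Landauer bound: minimum energy dissipated to erase one bit at temperature T:
E_{\min} = k_B T \ln 2
```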

In a parody of scientific discussion, intelligent design proponents are hosted by the once-respectable Hoover Institution for a discussion about, well, god.

So the fecundity that life shows in creating new genes out of existing genes, (duplications), and even making whole-chromosome or whole-genome duplications, has long been a problem for creationists. Energetically, it is easy to explain as a mere side-effect of having plenty of energy to work with, combined with error-prone methods of replication. But creationistically, god must come into play somewhere, right? Perhaps it comes into play in the creation of really new genes, like those that arise from nothing, such as at the origin of life?

A recent paper discussed genes in humans that have over our recent evolutionary history arisen from essentially nothing. It drew on prior work in yeast that elegantly laid out a spectrum or life cycle of genes, from birth to death. It turns out that there is an active literature on the birth of genes, which shows that, just like duplication processes, it is entirely natural for genes to develop out of humble, junky precursors. And no information theory needs to be wheeled in to show that this is possible.

Yeast provides the tools to study novel genes in some detail, with rich genetics and lots of sequenced relatives, near and far. Here is portrayed a general life cycle of a gene, from birth out of non-gene DNA sequences (left), through the key step of translation, and on to being a subject of normal natural selection ("Exposed") for some function. But if that function decays or is replaced, the gene may also die, by mutation, becoming a pseudogene, and eventually just some more genomic junk.

The death of genes is quite well understood. The databases are full of "pseudogenes" that are very similar to active genes, but are disabled for some reason, such as a truncation somewhere or loss of reading frame due to a point mutation or splicing mutation. Their annotation status is dynamic, as they are sometimes later found to be active after all, under obscure conditions or to some low level. Our genomes are also full of transposons and retroviruses that have died in this fashion, by mutation.

Duplications are also well-understood, some of which have over evolutionary time given rise to huge families of related proteins, such as kinases, odorant receptors, or zinc-finger transcription factors. But the hunt for genes that have developed out of non-gene materials is a relatively new area, due to its technical difficulty. Genome annotators were originally content to pay attention to genes that coded for a hundred amino acids or more, and ignore everything else. That became untenable when a huge variety of non-coding RNAs came on the scene. Also, occasional cases of very small genes that encoded proteins came up from work that found them by their functional effects.

As genome annotation progressed, it became apparent that, while a huge proportion of genes are conserved between species, (or members of families of related proteins), other genes had no relatives at all, and would never provide information by this highly convenient route of computer analysis. They are orphans, and must have either been so heavily mutated since divergence that their relationships have become unrecognizable, or have arisen recently (that is, since their evolutionary divergence from related species that are used for sequence comparison) from novel sources that provide no clue about their function. Finer analysis of ever more closely related species is often informative in these cases.

The recent paper on human novel genes makes the finer point that splicing and export from the nucleus constitute the major threshold between junk genes and "real" genes. Once an RNA gets out of the nucleus, any reading frame it may have will be translated and exposed to selection. So the acquisition of splicing signals is a key step, in their argument, to get a randomly expressed bit of RNA over the threshold.

A recent paper provided a remarkable example of novel gene origination. It uncovered a series of 74 human genes that are not shared with macaque, (which they took as their reference), have a clear path of origin from non-coding precursors, and some of which have significant biological effects on human development. They point to a gradual process whereby promiscuous transcription from the genome gave rise by chance to RNAs that acquired splice sites, which piped them into the nuclear export machinery and out to the cytoplasm. Once there, they could be translated, over whatever small coding region they might possess, after which selection could operate on their small protein products. A few appear to have gained enough function to encourage expansion of the coding region, resulting in growth of the gene and entrenchment as part of the developmental program.

Brain "organoids" grown from genetically manipulated human stem cells. On left is the control, in middle is where ENSG00000205704 was deleted, and on the right is where ENSG00000205704 is over-expressed. The result is very striking, as an evolutionarily momentous effect of a tiny and novel gene.

One gene, "ENSG00000205704" is shown as an example. Where in macaque, the genomic region corresponding to this gene encodes at best a non-coding RNA that is not exported from the nucleus, in humans it encodes a spliced and exported mRNA that encodes a protein of 107 amino acids. In humans it is also highly expressed in the brain, and when the researchers deleted it in embryonic stem cells and used those cells to grow "organoids", or clumps of brain-like tissue, the growth was significantly reduced by the knockout, and increased by the over-expression of this gene. What this gene does is completely unknown. Its sequence, not being related to anything else in human or other species, gives no clue. But it is a classic example of gene that arose from nothing to have what looks like a significant effect on human evolution. Does that somehow violate physics or math? Nothing could be farther from the truth.


Saturday, February 4, 2023

How Recessive is a Recessive Mutation?

Many relationships exist between mutation, copy number, and phenotype.

The traditional setup of Mendelian genetics is that an allele of a gene is either recessive or dominant. Blue eyes are recessive to brown eyes, for the simple reason that blue arises from the absence of an enzyme, due to a loss of function mutation. So having some of that enzyme, from even one "brown" copy of that gene, is dominant over the defective "blue" copy. You need two "blue" alleles to have blue eyes. This could be generalized to most genes, especially essential genes, where lacking both copies is lethal, while having one working copy will get you through, and cover for a defective copy. Most gene mutations are, by this model, recessive. 

But most loci and mutations implicated in disease don't really work like that. Some recent papers delved into the genetics of such mutations, and observed that their recessiveness was all over the map- a spectrum, really, of effects from fully recessive to dominant, with most in the middle ground. This is informative for clinical genetics, but also for evolutionary studies, suggesting that evolution is not, after all, blind to the majority of mutations, which are mostly deleterious, exist most of the time in the heterozygous (one-copy) state, and would be wholly recessive by the usual assumption.

The first paper describes a large study of the Finnish population, which benefited from several advantages. First, Finns have a good health system with thorough records, which are housed in a national biobank. The study used 177,000 health records and 83,000 variants in coding regions of genes collected from sequencing studies. Second, the Finnish population is relatively small and has experienced bottlenecks from smaller founding populations, which amplifies the prevalence of variants that those founders had. That allows those variants to rise to higher rates of appearance, especially in the homozygous state, which generally causes more noticeable disease phenotypes. Both the detectability and the statistics were powered by this higher incidence of some deleterious mutations (while others, naturally, would have been more rare than the world-wide average, or absent altogether).

Third, the authors emphasize that they searched for various levels of recessive effect, which is contrary to the usual practice of just assuming a linear effect. A linear model says that one copy of a mutation has half the effect of two copies- which is true sometimes, but not most of the time, especially in more typical cases of recessive effect where one copy has a good deal less effect, if not zero. Returning to eye color, if one looks in detail, there are many shades of eyes, even of blue eyes, so it is evident that the alleles that affect eye color are various, and express to different degrees (have various penetrance, in the parlance). While complete recessiveness happens frequently, it is not the most common case, since we do not routinely express excess amounts of proteins from our genes, making loss of one copy noticeable most of the time, to some degree. This is why the lack of a whole chromosome, or an excess of a whole chromosome, has generally devastating consequences. Trisomies of only three chromosomes are viable (that is, not lethal), and those confer various severe syndromes.

A population proportion plot vs age of disease diagnosis for three different diseases and an associated genetic variant. In blue is the normal ("wild-type") case, in yellow is the heterozygote, and in red the homozygote with two variant alleles. For "b", the total lack of XPA causes skin cancer with juvenile onset, and the homozygotic case is not shown. The Finnish data allowed detection of rather small recessive effects from variations that are common in that population. For instance, "a" shows the barely discernible advancement of age of diagnosis for a disease (hearing loss) that in the homozygotic state is universal by age 10, caused by mutations in GJB2.

The second paper looked more directly at the fitness cost of variations over large populations, in the heterozygous state. They looked at loss-of-function (LOF) mutations of over 17,000 genes, studying their rate of appearance and loss from human populations, as well as in pedigrees. These rates were turned, by a modeling system, into fitness costs, which are stated in percentage terms, vs wild type. A fitness cost of 1% is pretty mild, (though highly significant over longer evolutionary time), while a fitness cost of 10% is quite severe, and one of 100% is immediately lethal and would never be observed in the population. For example, a mutation that is seen rarely, and in pedigrees only persists for a couple of generations, implies a fitness cost of over 10%.

They come up with a parameter "hs", which is the fitness cost "s" of losing both copies of a gene, multiplied by "h", a measure of the dominance of the mutation in a single copy.
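In the standard population-genetics notation (a textbook convention, not anything specific to these papers), the three genotype fitnesses are 1, 1 - hs, and 1 - s. A minimal sketch, with the particular numbers chosen only for illustration:

```python
def genotype_fitness(h, s):
    """Standard dominance parameterization (textbook convention).
    s = fitness cost of losing both copies of the gene (homozygote)
    h = dominance of the mutant allele when present in one copy:
        h = 0    fully recessive (heterozygote pays nothing)
        h = 0.5  additive, the "linear" model mentioned earlier
        h = 1    fully dominant
    The compound hs is what the heterozygous carrier actually pays."""
    return {"wild type": 1.0, "heterozygote": 1.0 - h * s, "homozygote": 1.0 - s}

print(genotype_fitness(h=0.5, s=0.10))   # additive: one copy costs 5%
print(genotype_fitness(h=0.05, s=0.10))  # nearly recessive: one copy costs 0.5%
```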


In these graphs, human genes are stacked up on the Y axis, sorted by their computed "hs" fitness cost in the heterozygous state. Error bars are in blue, showing that this is naturally a rather error-prone exercise of estimation. But what is significant is that most genes are somewhere on the spectrum, with very few having negligible effects (bottom), and many having highly significant effects (top). Genes on the X chromosome are naturally skewed to much higher significance when mutated, since in males there is no other copy, and even in females, one X chromosome is (randomly) inactivated to provide dosage compensation- that is, to match the male dosage of production of X genes- which results in much higher penetrance for females as well.


So the bottom line is that while diploidy helps to hide a lot of variation in sexual organisms, and in humans in particular, it does not hide it completely. We are each estimated to receive, at birth, about 70 new mutations, of which 1/1000 are the kind of total loss of gene function studied here. This work then estimates that 20% of those mutations have a severe fitness effect of >10%, meaning that about one in seventy zygotes carries such a new mutation, not counting what it has inherited from its parents, and will suffer ill effects immediately, even though it has a wild-type copy of that gene as well.
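The arithmetic behind that one-in-seventy figure, spelled out with the numbers quoted above:

```python
new_mutations_per_zygote = 70        # average de novo mutations at birth
fraction_total_lof       = 1 / 1000  # complete loss-of-function class studied here
fraction_severe          = 0.20      # estimated to carry a >10% fitness cost

per_zygote = new_mutations_per_zygote * fraction_total_lof * fraction_severe
print(per_zygote, "->", f"about 1 in {round(1 / per_zygote)}")  # 0.014 -> about 1 in 71
```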

Humans, like other organisms, have a large mutational load that is constantly under surveillance by natural selection. The fact that severe mutations routinely still have significant effects in the heterozygous state is both good and bad news. Good in the sense that natural selection has more to work with and can gradually whittle down their frequency without necessarily waiting for the chance of two meeting in an unfortunate homozygous state. But bad in the sense that it adds to our overall phenotypic variation and health difficulties a whole new set of deficiencies that, while individually and typically minor, are also legion.


Saturday, January 7, 2023

A New Way of Doing Biology

Structure prediction of proteins is now so good that computers can do a lot of the work of molecular biology.

There are several royal roads to knowledge in molecular biology. First, and most traditional, is purification and reconstitution of biological molecules and the processes they carry out, in the test tube. Another is genetics, where mutational defects, observed in whole-body phenotypes or individually reconstituted molecules, can tell us about what those gene products do. Over the years, genetic mapping and genomic sequencing allowed genetic mutations to be mapped to precise locations, making them increasingly informative. Likewise, reverse genetics became possible, where mutational effects are not generated randomly by chemical or radiation treatment of organisms, but are precisely engineered to find out what a chosen mutation in a chosen molecule could reveal. Lastly, structural biology contributed the essential ground truth of biology, showing how detailed atomic interactions and conformations lead to the observations made at higher levels- such as metabolic pathways, cellular events, and diseases. The paradigmatic example is DNA, whose structure immediately illuminated its role in genetic coding and inheritance.

Now the protein structure problem has been largely solved by the newest generations of artificial intelligence, allowing protein sequences to be confidently modeled into the three dimensional structures they adopt when mature. A recent paper makes it clear that this represents not just a convenience for those interested in particular molecular structures, but a revolutionary new way to do biology, using computers to dig up the partners that participate in biological processes. The model system these authors chose to show this method is the bacterial protein export process, which was briefly discussed in a recent post. They are able to find and portray this multi-step process in astonishing detail by relying on a lot of past research including existing structures and the new AI searching and structure generation methods, all without dipping their toes into an actual lab.

The structure revolution has had two ingredients. First is a large corpus of already-solved structures of proteins of all kinds, together with oceans of sequence data of related proteins from all sorts of organisms, which provide a library of variations on each structural theme. Second is the modern neural networks from Google and other institutions that have solved so many other data-intensive problems, like language translation and image matching / searching. They are perfectly suited to this problem of "this thing is like something else, but not identical". This resulted in the AlphaFold program, which has pretty much solved the problem of determining the 3D structure of novel protein sequences.

"We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14), demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods."

The current authors realized that the determination of protein structures is not very different from the determination of complex structures- the structure of interfaces and combinations between different proteins. Many already-solved structures are complexes of several proteins, and more fundamentally, the way two proteins interact is pretty much the same as the way that a protein folds on itself- the same kinds of detailed secondary motif and atomic complementarity take place. So they used the exact AlphaFold core to create AF2Complex, which searches specifically through a corpus of protein sequences for those that interact in real life.

This turned out to be a very successful project, (though a supercomputer was required), and they now demonstrate it for the relatively simple case of bacterial protein export. The corpus they are working with is about 1500 E. coli periplasmic and membrane proteins. They proceed step by step, asking what interacts with the first protein in the sequence, then what interacts with the next one, etc., till they hit the exporter on the outer membrane. While this sequence has been heavily studied and several structures were already known, they reveal several new structures and interactions as they go along. 
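The screening logic itself is simple enough to sketch (my own illustration; predict_interface_score below is a hypothetical stand-in for an AF2Complex-style predictor, not a real API from the paper):

```python
def predict_interface_score(seq_a: str, seq_b: str) -> float:
    """Hypothetical stand-in for an AlphaFold/AF2Complex-style predictor that
    models the complex formed by two sequences and returns an interface
    confidence score (higher = more likely to be a real interaction)."""
    raise NotImplementedError("plug in whatever structure predictor is available")

def rank_partners(query_name: str, corpus: dict, top_n: int = 10):
    """Score one query protein against every protein in the corpus and return
    the best-scoring putative partners, mimicking the step-by-step
    'what interacts with this one next?' walk described above."""
    query_seq = corpus[query_name]
    scores = {name: predict_interface_score(query_seq, seq)
              for name, seq in corpus.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# With a dictionary of the ~1500 periplasmic / membrane protein sequences, a call
# like rank_partners("SurA", corpus) would be expected to surface SurA itself
# (it dimerizes) and BamA, as the post describes for the authors' actual program.
```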

Getting proteins from inside the cell to outside is quite complicated, since they have to traverse two membranes and the intermembrane space (periplasm), all without getting fouled up or misdirected. This is done by an organized sequence of chaperone and transport proteins that hand the new proteins off to each other. Proteins are recognized by this machinery by virtue of sequence-encoded signals, typically at their front/leading ends. This "export signal" is recognized, in some instances, right as it comes out of the ribosome, and the protein is captured by the SecA/B/E/Y/G machinery at the inner bacterial membrane. But most exported proteins are not recognized right away, but only after they are fully synthesized.

The inner membrane (IM) is below, and the outer membrane (OM) is above, showing the steps of bacterial protein export to the outer membrane. The target protein being transported is the yellow thread (OmpA), and the various exporting machines are shown in other colors, either in cartoon form or in ribbon structures from the authors' computer predictions. Notably, SurA is the main chaperone that carries OmpA in partially unfolded form across the periplasm to the outer membrane.

SecA is the ATP-using pump that forces the new protein through the SecY channel, which has several other accessory partners. SecB, for example, is thought to be mostly responsible for recognizing the export signal on the target protein. The authors start with a couple of accessory chaperones, PpiD and YfgM, which were strongly suspected to be part of the SecA/B/E/Y/G complex, and which their program easily identifies as interacting with each other, and gives new structures for. PpiD is an important chaperone, a proline isomerase, which helps proline amino acids twist around- something they do not naturally do- helping the exported proteins fold correctly as they emerge. It also interacts with SecY, providing chaperone assistance (that is, helping proteins fold correctly) right as proteins pass out of SecY and into the periplasm. The second step the authors take is to ask what interacts with PpiD, and they find DsbA, with its structure. This is a disulfide isomerase, which performs another vital function: shuffling the cysteine bonds of proteins coming into the periplasmic space, (which is less reducing than the cytoplasm), allowing stable cysteine bonds to form. This is one more essential chaperone-like function needed for relatively complicated secreted proteins. Helping these bonds form at the right places is the role of DsbA, which transiently docks right at the exit port from SecY.

The authors (or rather, their computers) generate structures for the interactions of the Sec complex with PpiD, YfgM, and the disulfide isomerase DsbA, illuminating their interactions and respective roles. DsbA helps refold proteins right when they come out of the transporter pore, from the cytoplasm.

Once the target protein has all been pumped through the SecY complex pore, it sticks to PpiD, which does its thing and then dissociates, allowing two other proteins to approach, the signal peptidase LepB, which cleaves off the export signal, and then SurA, which is the transporting chaperone that wraps the new protein around itself for the trip across the periplasm. Specific complex structures and contacts are revealed by the authors for all these interactions. Proteins destined for the outer membrane are characterized by a high proportion of hydrophobic amino acids, some of which seem to be specifically recognized by SurA, to distinguish them from other proteins whose destination is simply to swim around in the periplasm, such as the DsbA protein mentioned above. 

The authors (or rather, their computers) spit out a ranking of predicted interactions using SurA as a query, and find SurA itself as one interacting protein (it forms a dimer), and also BamA, which is the central part of the outer membrane transporting pore. Nothing was said about the other high-scoring interacting proteins identified, which may not have had immediate interest.

"In the presence of SurA, the periplasmic domain [of transported target protein OmpA] maintains the same fold, but remarkably, the non-native β-barrel region completely unravels and wraps around SurA ... the SurA/OmpA models appear physical and provide a hypothetical basis for how the chaperone SurA could prevent a polypeptide chain from aggregating and present an unfolded polypeptide to BAM for its final assembly."

At the other end of the journey, at the outer membrane, there is another channel protein called BamA, where SurA docks, as was also found by the authors' interaction-hunting program. BamA is part of a large channel complex that evidently receives many other proteins via its other periplasmic-facing subunits, BamB, C, and D. The authors went on to do a search for proteins that interact with BamA, finding BepA, a previously unsuspected partner, which, by their model, wedges itself in between BamC and BamB. BepA, however, turns out to have a crucial function in quality control. Conduction of target proteins through the Bam complex seems to be powered only by diffusion, not by ATP or ion gradients. So things can get fouled up and stuck pretty easily. BepA is a protease, and appears, from its structure, to have a finger that gets flipped and turns the protease on when a protein transiting through the pore goes awry / sideways.


The authors (or rather, their computers) provide structures of the outer membrane Bam complex, where SurA binds with its cargo. The cargo, unstructured, is not shown here, but some of the detailed interface between SurA and BamA is shown at bottom left. The beta-barrel of BamA provides the obvious route out of the cell, or in some cases sideways into the membrane.

While filling in some new details of the outer membrane protein export system is interesting, what was really exciting about this paper was the ease with which this new way of doing biology went forth. Intimate physical interactions among proteins and other molecules are absolutely central to molecular biology, as this example illustrates. To have a new method that not only reveals such interactions in a reliable way, from sequences of novel proteins, but also presents structurally detailed views of them, is astonishing. Extending this to bigger genomes and collections of targets, vs the relatively small 1500 periplasmic-related proteins tested here remains a challenge, but doubtless one that more effort and more computers will be able to solve.


Saturday, September 17, 2022

Death at the Starting Line- Aneuploidy and Selfish Centromeres

Mammalian reproduction is unusually wasteful, due to some interesting processes and tradeoffs.

Now that we have settled the facts that life begins at conception and abortion is murder, a minor question arises. There is a lot of murder going on in early embryogenesis, and who is responsible? Probably god. Roughly two-thirds of embryos that form are aneuploid (have an extra chromosome or lack a chromosome) and die, usually very soon. Those that continue to later stages of pregnancy cause a high rate of miscarriages- about 15% of pregnancies. A recent paper points out that these rates are unusual compared with most eukaryotes. Mammals are virtually alone in exhibiting such high wastefulness, and the author proposes an interesting explanation for it.

First, some perspective on aneuploidy. Germ cells go through a two-stage process of meiosis where their DNA is divided two ways, first by homolog pairs, (that is, the sets inherited from each parent, with some amount of crossing-over that provides random recombination), and second by individual chromosomes. In more primitive organisms (like yeast) this is an efficient, symmetrical, and not-at-all wasteful process. Any loss of genetic material would be abhorrent, as the cells are putting every molecule of their being into the four resulting spores, each of which is viable.

A standard diagram of meiosis. Note that the microtubules (yellow) engage in a gradual and competitive process of capturing centromeres of each chromosome to arrive at the final state of regular alignment, which can then be followed by even division of the genetic material and the cell.


In animals, on the other hand, meiosis of egg cells is asymmetric, yielding one ovum / egg and three polar bodies, which  have various roles in some species to assist development, but are ultimately discarded. This asymmetric division sets up a competition between chromosomes to get into the egg, rather than into a polar body. One would think that chromosomes don't have much say in the matter, but actually, cell division is a very delicate process that can be gamed by "strong" centromeres.

Centromeres are the central structures on chromosomes that form attachments to the microtubules forming the mitotic spindle. This attachment process is highly dynamic and even competitive, with microtubules testing out centromere attachment sites, and using tension ultimately as the mark of having a properly oriented chromosome with microtubules from each side of the dividing cell (i.e. each microtubule organizing center) attached to each of the centromeres, holding them steady and in tension at the midline of the cell. Well, in oocytes, this does not happen at the midline, but lopsidedly towards one pole, given that one of the product cells is going to be much larger than the others. 

In oocytes, cell division is highly asymmetric with a winner-take-all result. This opens the door to a mortal competition among chromosomes to detect which side is which and to get on the winning side. 

One of the mysteries of biology is why the centromere is a highly degenerate, and also a speedily evolving, structure. They are made up of huge regions of monotonously repeated DNA, which have been especially difficult to sequence accurately. Well, this competition to get into the next generation can go some way to explain this structure, and also why it changes rapidly, (on evolutionary time scales), as centromeric repeats expand to capture more microtubules and get into the egg, and other portions of the machinery evolve to dampen this unsociable behavior and keep everyone in line. It is a veritable arms race. 

But the funny thing is that it is only mammals that show a particularly wasteful form of this behavior, in the form of frequent aneuploidy. The competition is so brazen that some centromeres force their way into the egg when there is already another copy there, generating at best a syndrome like Down, but for chromosomes other than #21, certain death. This seems rather self-defeating. Or does it?

The latest paper observes that mammals devote a great deal of care to their offspring, making them different from fish, amphibians, and even birds, which put most of their effort into producing the very large egg, and relatively less (though still significant amounts) into care of infants. This huge investment of resources means that causing a miscarriage or earlier termination is not a total loss at all, for the rudely trisomic extra chromosome. No, it allows resource recovery in the form of another attempt at pregnancy, typically quite soon thereafter, at which point the pushy chromosome gets another chance to form a proper egg. It is a classic case of extortion at the molecular scale. 



Saturday, September 10, 2022

Sex in the Brain

The cognitive effects of gonadotropin-releasing hormone.

If you watch the lesser broadcast TV channels, there are many ads for testosterone- elixir of youth, drive, and manliness, sold with blaring sales pitches. Is it any good? Curiously, taking testosterone can cause a lot of sexual dysfunction, due to feedback loops that carefully tune its concentration. So generally no, it isn't much good. But that is not to say that it isn't a powerful hormone. A cascade of other events and hormones leads to the production of testosterone, and a recent paper (review) discussed the cognitive effects of one of its upstream inducers, gonadotropin-releasing hormone, or GnRH.

The story starts on the male Y chromosome, which carries the gene SRY. This is a transcription activator that (working with and through a blizzard of other regulators and developmental processes) is ultimately responsible for switching the primitive gonad to the testicular fate, from its default, which is female / ovarian. This newly hatched testis contains Sertoli cells, which secrete anti-Mullerian hormone (AMH, a gene that is activated by SRY directly), which in the embryo drives the regression of female characteristics. At the same time, testosterone from testicular Leydig cells drives development of male physiology. The initial Y-driven setup of testosterone is quickly superseded by hormones of the gonadotropin family, one form of which is provided by the placenta, and gonadotropins continue to be essential through development and life to maintain sexual differentiation. The placental source declines by the third trimester, by which time the pituitary has formed and takes over gonadotropin secretion. It secretes two gonadotropin family members, follicle-stimulating hormone (FSH) and luteinizing hormone (LH), each of which, despite their names, has key roles in male as well as female reproductive development and function. After birth, testosterone levels decline and everything is quiescent until puberty, when the hormonal axis driven by the pituitary reactivates.

Some of the molecular/genetic circuitry leading to very early sex differentiation. Note the leading role of SRY in driving male development. Later, ongoing maintenance of this differentiation depends on the gonadotropin hormones.

This pituitary secretion is in turn stimulated by gonadotropin releasing hormone (GnRH), which is the subject of the current story. GnRH is produced by neurons that, in embryogenesis, originate in the nasal / olfactory epithelium and migrate to the hypothalamus, close enough to the pituitary to secrete directly into its blood supply. This circuit is what revs up in puberty and continues in fine-tuned fashion throughout life to maintain normal (or declining) sex functions, getting feedback from the final sex hormones like estrogen and testosterone in general circulation. The interesting point that the current paper brings up is that GnRH is not just generated by neurons pointing at the pituitary. There is a whole other set of neurons in the hypothalamus that also secrete GnRH, but which project into, and secrete GnRH within, the cortex and hippocampus- higher regions of the brain. What are these neurons, and this hormone, doing there?

The researchers note that people with Down Syndrome characteristically have both cognitive and sexual defects resembling incomplete development (among many other issues), the latter of which resemble or reflect a lack of GnRH, suggesting a possible connection. Puberty is a time of heightened cognitive development, and they guessed that this is perhaps what is missing in Down Syndrome. Down Syndrome also typically ends in early-onset Alzheimer's disease, which is likewise characterized by a lack of GnRH, as is menopause, and perhaps other conditions. After going through a bunch of mouse studies, the researchers supplemented seven men affected by Down Syndrome with extra GnRH via miniature pumps to their brains, aimed at the target areas of this hormone in the cortex. It is noteworthy that GnRH secretion is highly pulsatile, with a roughly 2 hour period, which they found to be essential for a positive effect.

Results from the small-scale intervention with GnRH injection. Subjects with Down Syndrome had higher cortical connectivity (left) and could draw from a 3-D model marginally more accurately.

The result (also seen in mouse models of Down Syndrome and of Alzheimer's disease) was that the infusion significantly raised cognitive function over the ensuing months. It is an amazing and intriguing result, indicating that GnRH drives significant development and supports ongoing higher function in the brain, which is quite surprising for a hormone thought to be confined to sexual functions. Whether it can improve cognition in fully developed adults without such developmental syndromes remains to be seen. Such a finding would be quite unlikely, though, since the GnRH circuit is presumably part of the normal program that establishes the full adult potential of each person, which evolution has strained to refine to the highest possible level. It is not likely to be a magic controller that can be dialed beyond "max" to create super-cognition.

Why does this occur in Down Syndrome? The authors devote a good bit of the paper to an interesting further series of experiments, focusing on regulatory micro-RNAs, several of which are encoded in the genomic regions duplicated in Down Syndrome. MicroRNAs are typically repressors of gene expression, which would explain how this whole circuitry of normal development, now including key brain functions, is under-activated in those with Down Syndrome.

The authors offer a subset of the regulatory circuitry, focusing on micro-RNA repressors, several of which are encoded in the trisomic chromosome regions.

"HPG [hypothalamus / pituitary / gonadal hormone] axis activation through GnRH expression at minipuberty (P12; [the phase of testoserone expression in late mouse gestation critical for sexual development]) is regulated by a complex switch consisting of several microRNAs, in particular miR-155 and the miR-200 family, as well as their target transcriptional repressor-activator genes, in particular Zeb1 and Cebpb. Human chromosome 21 and murine chromosome 16 code for at least five of these microRNAs (miR-99a, let-7c, miR-125b-2, miR-802, and miR-155), of which all except miR-802 are selectively enriched in GnRH neurons in WT mice around minipuberty" - main paper

So, testosterone (or estrogen, for that matter) isn't likely to unlock better cognition, but a hormone a couple of steps upstream just might- GnRH. And it acts not through the bloodstream, but through direct delivery into key areas of the brain, both during development and on an ongoing basis through adulthood. Biology, as a product of evolution, comprises systems that are highly integrated, not to say jury-rigged, which makes biology as a science difficult: it is a quest to separate all the variables and delineate what each component and process is doing.


Saturday, June 18, 2022

Balancing Selection

Human signatures of balancing selection, one form and source of genomic variation.

We generally think of selection as an inexorable force towards greater fitness, eliminating mutations and less fit forms in favor of those more successful. But there is a lot else going on. For one thing, much mutation is meaningless, or "neutral". For another, our lives and traits are so complicated that interactions can lead to hilly adaptive landscapes where many successful solutions exist, rather than just one best solution. One form of adaptive and genetic complexity is balancing selection, which happens when two alleles (i.e. mutants or variants) of one gene have distinct roles in the whole organism or ecological setting, each significant, and thus each is maintained over time. 

A quick example is color in moths. Dark colors work well as camouflage in dirty urban environments, while lighter colors work better in the countryside. Since both conditions exist, and moths move around between them, both color schemes are selected for, resulting in a population that is persistently mixed for this trait. Indeed, the capacity of predators to learn these colors may also lead to an automatic advantage for the less frequent color, another form of balancing selection. Heterozygotes may also have an intrinsic advantage, as is so clearly the case for the sickle cell mutation in hemoglobin, against malaria. These are all classic examples. But to bring it home, a society has only so much capacity for people like Donald Trump. Insofar as sociopathy is genetic, there will necessarily be a frequency-dependent limit, where this trait (and other antisocial traits) may be highly successful at (extremely) low frequency, but terminally destructive at high frequencies.
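The heterozygote-advantage case can be made quantitative with the textbook result that, if the two homozygotes suffer fitness costs s and t relative to the heterozygote, selection settles at an internal equilibrium allele frequency of s/(s+t) instead of driving either allele to fixation. A minimal sketch, with rough numbers loosely in the spirit of the sickle cell example (not taken from any particular dataset):

    # Heterozygote advantage: fitnesses w(AA) = 1 - s, w(AS) = 1, w(SS) = 1 - t.
    # Standard result: the equilibrium frequency of the S allele is s / (s + t).

    def next_generation(q, s, t):
        """One generation of selection on the frequency q of S, random mating."""
        p = 1 - q
        w_bar = p*p*(1 - s) + 2*p*q + q*q*(1 - t)       # mean fitness
        return (p*q + q*q*(1 - t)) / w_bar              # S-bearing gametes after selection

    s, t = 0.10, 0.90        # illustrative: ~10% malaria cost to AA, near-lethal SS
    print(s / (s + t))       # predicted equilibrium: 0.10

    q = 0.01                 # start the S allele rare
    for _ in range(300):
        q = next_generation(q, s, t)
    print(round(q, 3))       # converges to ~0.10 despite S being nearly lethal when homozygous

Starting the iteration from a high frequency instead brings it back down to the same value- that stability around an intermediate frequency is what "balancing" means here.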


Schematic selective landscapes. Sometimes selection just optimizes an existing trait by intensifying it (1), or moving it along trait space to a new optimum (2). But other times, multiple forms (i.e. variants, or mutations) of a given locus each have some useful / beneficial characteristic, and may be selected either discretely for particular effects (3), or generally for their diversity (4).

One laborious method for finding such sites of balancing selection in a genome is to compare it to the genomes of other species. If the same variants exist in each species over long periods of divergence, that argues that such conserved sites of diversity are maintained by balancing selection. Studies of humans and chimpanzees have found some such sites, but not many. These methods are known to be very conservative, however, missing what are likely to be the majority of cases.

A recent paper offered a slightly more sensitive way to find signs of balancing selection in the human genome, and found quite a lot of them. (Some background here.) It is based, as many investigations of selection are, on a special property of protein-coding genes: due to the degeneracy of the genetic code, some mutations are "synonymous" and lead to no change in the encoded protein, while others are "non-synonymous" and do change the protein. The latter are assumed to be visible to selection, and sometimes give significant signals of conservation (i.e. low rates of change between species and populations, and few variants maintained within a population). This embedded signal/control pairing of information helps insulate the analysis against many problems, and can tell us pretty directly how severe selection is on such sites.
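As a concrete illustration of the synonymous / non-synonymous distinction (my own sketch, using just a small hand-picked slice of the standard genetic code), a single base change can leave the protein untouched or alter it, depending on where in the codon it falls:

    # A few entries from the standard genetic code, enough for illustration.
    CODE = {
        "CTT": "Leu", "CTC": "Leu", "CTA": "Leu", "CTG": "Leu",
        "ATT": "Ile", "ATC": "Ile", "ATA": "Ile",
        "GTT": "Val", "GTC": "Val",
    }

    def classify(codon_before, codon_after):
        """Synonymous if the encoded amino acid is unchanged, else non-synonymous."""
        return "synonymous" if CODE[codon_before] == CODE[codon_after] else "non-synonymous"

    print(classify("CTT", "CTC"))  # third-position change, still Leu: synonymous
    print(classify("CTT", "ATT"))  # first-position change, Leu -> Ile: non-synonymous

Comparing rates of change at the non-synonymous sites against the synonymous "control" sites in the same gene is what provides the signal/control pairing described above.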

It is worth adding that each basepair in the human genome has its own selective constraints. One position may code for the active site of some enzyme and be extremely well conserved, while the next may be a "synonymous" position with few or no selective constraints, and another may lie in junk DNA that doesn't code for anything or regulate anything, is effectively neutral, and can be changed with no effect. The system is in this sense massively parallel, able to experience evolution at each site individually and concurrently. On the other hand, selection on one site affects allele frequencies at nearby sites, since selective "sweeps" through that area of the genome drag the nearby regions of DNA (and whatever variants they may harbor) along, whether positively if the site is increasing in frequency, or negatively if it is deleterious and causing the death of its bearers. The reach of this "linkage" effect depends on the recombination frequency, which is relatively low, leading to the moderate stability (and linkage) of relatively large "haplotypes" in our genomes.
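The reach of linkage can be put in rough quantitative terms with the standard result that the statistical association (linkage disequilibrium, D) between two sites decays by a factor of (1 - r) each generation, where r is the recombination frequency between them. A quick sketch with illustrative distances:

    # Linkage disequilibrium decays as D_t = D_0 * (1 - r)**t under random mating,
    # where r is the per-generation recombination frequency between two sites.

    def ld_remaining(r, generations, d0=1.0):
        return d0 * (1 - r) ** generations

    for r in (0.5, 0.01, 0.0001):            # unlinked, ~1 cM apart, ~0.01 cM apart
        print(r, round(ld_remaining(r, 100), 3))
    # After 100 generations: unlinked sites have lost essentially all association,
    # sites ~1 cM apart keep about a third of it, and very tightly linked sites
    # keep nearly all of it- hence the stable haplotype blocks mentioned above.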

At any rate, as the methods for detecting selection improve, more selection is detected, which is the lesson of this paper. The authors claim that, while their method still significantly under-estimates balancing selection, they find evidence for hundreds of such sites in humans when comparing genomes from different geographic regions of the world. A couple hundred of these sites are in the MHC region- the immunological part of the genome that codes for the antigen-presenting proteins (HLA, in humans) central to immune recognition. This region is well known to be a hotspot both for diversity and for the ongoing selective arms race against pathogens (as we have recently experienced with Covid). Seeing a lot of balancing selection there makes complete sense.

The authors note that their focus on the coding regions of the genome, and other technical limitations such as the need to find these sites through population comparisons, argue strongly that their estimate is a severe undercount. One can thus assume that there are at least several thousand sites of balancing selection in humans. This is quite apart from the many more sites of ongoing unidirectional selection, mostly purifying selection against problem mutations, but also positive selection toward beneficial characteristics- an accounting that is only starting to get going, over the vast amounts of variation we harbor. So we live in a dynamic world, inside and out.


  • Green fuel for airplanes... really?
  • Barr is not the good guy here.
  • Free speech- not entirely free.
  • Court to workers: drop dead.
  • Islam and the megadrought.
  • Is crypto this cycle's subprime black hole?