Saturday, March 28, 2026

Death and Resurrection ... Of a Gene

The SLAMF9 gene became non-functional in the human lineage, and then later was re-activated. Why?

Biology is amazingly intricate, but it is often also needlessly complex, which is evidence for the haphazard, if eventually pointed, mechanisms of the evolutionary process. We will take up the discussion of "junk" DNA again next week, but molecular biology is full of redundant and excessive processes, which should certainly be mystifying from a "design" perspective. At the frontier of natural selection are neutral and near-neutral genetic elements, which change over time due to chance, lacking selection pressure towards conservation. Pseudogenes (of which we have about 20,000, almost as many as functional genes) are one form of neutral element. They are typically remnants of functional genes that have been duplicated and then inactivated by mutation. They are a lively area of genome annotation because it is hard to be sure that they are really dead. Despite what looks like an inactivating mutation, they typically still produce RNA transcripts, and may produce partial or alternative proteins as well. The literature is full of experiments finding products and activities from genes annotated elsewhere as pseudogenes. And what looks like a pseudogene in one sample might just be an allele, with the same gene being whole and active in other people.

So, it is hard to know what any particular genetic region is doing without a lot of evolutionary, functional, and even population analysis. A recent paper looked deeply at one such gene, one that seems to have flipped back and forth between functional and non-functional states in the human lineage. It is a rare example of a gene coming back from what is usually a one-way trip into mutational oblivion once its function, and thus the selective pressure for its conservation, has disappeared.

SLAMF9 is one of a family (signaling lymphocyte activation molecule family) of surface receptors that occur in many cells of the immune system, help activate responses in these cells, and also recognize some viruses and bacteria. They bind to each other and to other components of the immune system, creating complex signaling networks. Genes involved in our immune systems are commonly subject to rapid evolution, the arms race against our many pathogens being relentless. Sometimes that takes the form of gene inactivation, if a particular receptor, for instance, has been turned against us by a pathogen that uses it for binding and cell entry. 

This week's authors were facing a conundrum. They were studying SLAMF9, and found the mouse version easy to clone and express in the lab. But the human version ... that was another story, frustratingly impossible to express in usable amounts. When they looked at the protein sequence, they were in for a big surprise:

At the front end of SLAMF9, there is very strong conservation across mammals... except when it comes to humans! The signal peptide is what directs this protein to be inserted into the plasma membrane, and is cleaved off the mature protein. Highlighted in red is the region that is starkly different in humans, which naturally affects (not in a good way) the signal cleavage process. "a" and "b" point to important domains of the cytoplasmic side of the final protein, which are just barely conserved in the human form.

This alignment among various mammalian versions (orthologs) of SLAMF9 shows that they are all pretty much the same... except for the human version. All the way from mouse to chimpanzee, nothing has changed at the front end of this protein. That is amazing in itself, showing very strong conservation. But then, after our lineage split from chimpanzees, something weird happened. A small segment at the front of this protein is totally different. This area is important because it carries the cleavage site of the signal sequence. The signal sequence directs the protein to be sent to the membrane (as this is a trans-membrane receptor), and in the human version this cleavage site is poor, explaining why the authors' attempt to express this protein went so badly. It might be enough for modest expression in the natural setting, but not enough for their investigations.

At the DNA level, it is clear that what happened to the protein was a double frame shift in translation, out of frame at the front, then recovered frame at the second mutation. The mutations must have been independent events, but the order of their occurrence is not known. The first intron trails off to the left, while the coding sequence tails off to the right.

When they looked at the DNA sequence, the reason for this change in the protein sequence became clearer. There was a frame shift, with only small changes in the DNA sequence that led to the bigger change in the protein sequence. On the left, there is a shift in the splice site at the end of the first intron (splice acceptor). This shifts the mRNA product by four bases (vs the start site of translation), creating a frame shift in translation, as portrayed in the amino acid codes given. On the right, there is a one nucleotide deletion, causing another frame shift that brings the translation back into the normal frame. 
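The frame arithmetic can be made concrete with a small sketch. Everything in it is invented for illustration: the sequence is a toy, not the real SLAMF9 gene, and the positions and the four extra bases are hypothetical. It also assumes that the shifted splice site adds four bases to the mRNA, which is the direction that makes the numbers work, since +4 followed by -1 nets +3, a whole codon.

```python
from itertools import product

# Standard genetic code, built in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate from the first base, ignoring any incomplete final codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


# Toy "ancestral" coding sequence (hypothetical, NOT the real SLAMF9 sequence).
ancestral = "ATGGCTCTGCTGTGGCTCCTGGCAGGATCCGAAGTGACC"

SPLICE_POS = 9   # hypothetical exon 1 / exon 2 junction in the mRNA
EXTRA = "ACAG"   # hypothetical 4 bases gained when the splice acceptor shifts
DEL_POS = 24     # hypothetical downstream single-base deletion (ancestral coordinates)

# Event 1 alone: the splice shift adds 4 bases, so everything after it is out of frame.
shifted = ancestral[:SPLICE_POS] + EXTRA + ancestral[SPLICE_POS:]

# Both events: deleting 1 base downstream nets +3 bases, restoring the frame.
d = DEL_POS + len(EXTRA)
rescued = shifted[:d] + shifted[d + 1:]

# Event 2 alone: the deletion without the splice shift also breaks the frame.
deletion_only = ancestral[:DEL_POS] + ancestral[DEL_POS + 1:]

print(translate(ancestral))      # MALLWLLAGSEVT
print(translate(shifted))        # MALTAVAPGRIRSD
print(translate(rescued))        # MALTAVAPGRSEVT
print(translate(deletion_only))  # MALLWLLADPK*
```

In this toy, either mutation alone scrambles everything downstream of it (the deletion alone even runs into a stop codon), while the two together leave only a short stretch of altered amino acids between them, with the original sequence resuming afterwards. That is the same pattern as the short divergent human segment in the alignment.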

They checked all the available ancient genomes from the human lineage, Neanderthals and Denisovans, and each was the same as the current human sequence. So, whatever happened did so between the split from chimpanzees and the advent of these available Homo species. And what happened were two distinct events: the second frame shift and the first frame shift are independent genetic mutations.

Which happened first? That is uncertain, but the authors show that the right-most frame shift (called g.621delT) did not influence the change in the splice site. The splice site change was caused by a series of about six mutations within the first intron (not shown), which shifted the pattern of mRNA self-hybridization that helps direct splice site selection. So it is likely that the splice site change happened first, essentially killing the gene, and that the downstream frame shift happened later on to rescue it in a partial, not very well-expressed way. However, either mutation could have happened first and functionally killed off this gene, with the other mutation(s) then recovering its function. In any case, both events happened within the roughly six-million-year time span that generated our immediate lineage, becoming firmly fixed as the only version of this gene now in our collective genome.

What might have caused these events? It all goes back to the function of SLAMF9. As shown above, it is very highly conserved. But, being part of the immune system and the interface we show to pathogens, it is also on the front line of the bio-warfare arms race. As humans started ranging far beyond their original habitats, they doubtless encountered many new pathogens. It seems likely that killing off this gene might have resolved one such fight, at least for a little while, perhaps by removing a pathogen entry point. But later on, it became beneficial to recover it, which is to say that new mutations that restored its function even a little bit were evidently selected for, and spread in the population. There was a race at this point between the accumulation of more (now neutral) mutations that would have permanently inactivated this gene, and the advent of that one special mutation that could save it. The overall conservation of SLAMF9 argues that saving it must have conferred significant benefits.