Showing posts with label evolution. Show all posts

Saturday, April 11, 2026

Pumping Calcium

An ornate ion pump manages rapid outflow of calcium.

In the beginning, the egg cell experienced a wave of calcium release, triggered by union with a sperm cell. This blocked other sperm from entering, and prepared the egg to become a zygote and embark on embryogenesis. It is but one example of the pervasive role of calcium signaling among animals. Another is the muscle activation cycle, which relies on calcium release from the specialized sarcoplasmic reticulum (in response to a nerve activation) to get the cell as a whole contracting. Generally, calcium is kept very low in the cytoplasm, and high in the endoplasmic reticulum and outside the cell. Thus, channels gated by electrical activation or other signals can cause rapid cytoplasmic calcium spikes and signal widely within a cell. 

On the flip side, there have to be pumps that keep the cytoplasmic concentration low, and a recent paper elucidates the structure of one such pump that is remarkably fast, while also closely regulated. It is an impressive machine. PMCA2 is an ATP-using calcium pump that sits in the plasma membrane and carries out what is called the Post-Albers cycle. This is a flip-switch mechanism for pumping ions, where ATP drives conformational switches alternately exposing ion binding sites to each side of the membrane. When the pore is open to the cytoplasm, there is no competition from higher concentrations outside, so the active site can bind one internal calcium, given a high-affinity site. Then, after the conformational switch, the pore is exposed to the outside, and at the same time the site is reconfigured to be lower-affinity, releasing the calcium ion into a high concentration environment. Neurons especially use calcium signaling extensively to operate synapses and regulate growth and development. Their rapid and frequent signaling requires a pump that has especially high capacity. PMCA2 operates at a maximal rate of several thousands of Ca2+ ions pumped per second.

Cartoon of the Post-Albers cycle, which is shared by a large family of active ATP-using pumps that transfer ions against their chemical concentration gradient. M is the main transmembrane domain of the pump, where the ions traverse the membrane. The N, P, and A domains are regulatory, especially binding and cleaving ATP at an interface between the N, P, and A domains. The cycle links power steps (1,2) with conformational changes that carefully gate the pumping process.
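To make the flip-switch logic concrete, here is a minimal sketch of the cycle as a four-state loop. The state names (E1, E1P, E2P, E2) follow common usage for this pump family, but the ToyPump class, its counters, and the bookkeeping of one ATP and one calcium per turn are illustrative assumptions, not anything taken from the paper.

```python
from dataclasses import dataclass

# Four-state toy version of an alternating-access (Post-Albers) cycle.
# State names and the one-ion-per-ATP bookkeeping are illustrative assumptions.
CYCLE = [
    ("E1", "open to cytoplasm, high-affinity site binds one Ca2+"),
    ("E1P", "ATP spent, site occluded"),
    ("E2P", "open to outside, low-affinity site releases Ca2+"),
    ("E2", "phosphate released, pump resets toward E1"),
]


@dataclass
class ToyPump:
    step: int = 0        # index into CYCLE; starts at E1
    atp_used: int = 0
    ca_exported: int = 0

    def advance(self) -> str:
        """Move to the next state and update the counters."""
        self.step = (self.step + 1) % len(CYCLE)
        name = CYCLE[self.step][0]
        if name == "E1P":
            self.atp_used += 1      # entering E1P costs one ATP
        elif name == "E2P":
            self.ca_exported += 1   # entering E2P releases the ion outside
        return name


def run_cycles(n: int) -> ToyPump:
    """Run n complete turns of the cycle from a fresh pump."""
    pump = ToyPump()
    for _ in range(n * len(CYCLE)):
        pump.advance()
    return pump
```

Calling run_cycles(3) leaves the pump back at E1 with matching ATP and calcium counts, which is the whole point of the gating: no ion crosses without a paid conformational switch.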

And that is not all. Since calcium has a charge of 2+ and this pump does not intend to alter charge across the membrane, the pump simultaneously has binding sites for counter-ions (generally two OH-) that are transferred in the opposite direction from the calcium. Not only that, but every pump of this kind requires regulation of various kinds. PMCA2 is activated by phosphatidylinositol 4,5-bisphosphate (PIP2), which is another important signaling molecule generated by specific PI kinases in response to activation of G-protein coupled receptors or protein kinase C, which may respond to external signals. In very general terms, these tend to be pro-growth or stress-induced pathways. These regulatory processes can tune the overall rate of recovery from rapid Ca2+ signaling events, by adjusting the level and activity of pumps like PMCA2.

ATP binds at the N/P/A domain interface, and its hydrolysis (and loss of ADP) generates extensive shape changes, including into the transmembrane M domain. At the very bottom, the calcium ion is shown in green, bound inside the M domain pumping channel. The motions here are subtle, but enough to dramatically reshape the calcium channel.

The authors, using various substrate variants and other tricks, were able to develop structures of PMCA2 in several steps of the pumping cycle, using cryo-electron microscopy. The ATPase site in the N domain (red) is far from the channel that conducts the calcium ion (brown, far bottom). They show extensive shape changes from binding or losing the ATP molecule, though they mostly concern the intracellular domains (red, blue, yellow). The effects on the transmembrane pore domain are rather subtle, shown on right. The authors claim that, compared to other pumps of this large family, the structural changes are significantly smaller, suggesting that evolution for speed has caused the mechanism to become more efficient, with less wasted motion per conduction event, at least in the channel region itself.

Relation of the PIP2 binding domain (orange/red stick figures) to the calcium core binding site. PIP2 appears to be essential for rapid pump operation. At bottom are shown some schematics of the gating provided by PIP2 in bound and unbound states, especially via the D873 side chain (negatively charged aspartic acid).


They also find that the activating molecule PIP2 is neatly parked right next to the main calcium binding and conduction region, and is more or less essential for enzyme activity. In the graph above (e), they show that several single mutations made in the calcium binding high affinity site, for example switching the negatively charged D873 for the positively charged K (lysine), kills ion pumping activity. Mutation of the PIP2 binding pocket (KKQ->TLL, around position 347) likewise kills enzyme activity.

Relation of the counter-ion channel (red dots) with the calcium channel. Both are essential parts of the mechanism. Closeups with the coordinating protein side chains shown on the right.

The whole mechanism is alluded to in the last figure, where the central calcium binding site is shown, with the general direction of calcium pumping. The counter-ion transport area is shown nearby as a flurry of red dots (standing for water molecules, which at this scale are interchangeable with OH ions). Specific single mutations in either area, either changing negatively charged E412 to positively charged lysine at the calcium binding pocket, or changing polar S877 in the water/hydroxy binding area to the bulky and hydrophobic F (phenylalanine), each kill pumping activity (graph). 

While it would be ideal to have a more dynamic representation of what is going on, the new structures give tremendous detail, including the associated ATP, PIP2, calcium, and water molecules. The mutations also nail down several functional points. Obviously a rather intricate and well-oiled machine that keeps its bit of cellular calcium homeostasis on an even keel. It is hard to believe that the sum of thousands of machines like this one is life, but the deeper we look the more true that appears to be.


Saturday, April 4, 2026

Not Every Transcript is Golden

Reflections on junk DNA, and junk transcripts.

Some time ago, a large project in molecular biology determined that most regions of the genome are transcribed. The authors and most observers took this to mean that most regions are functional, quite in contrast to the reigning theory up to that point, that our genomes host a smattering of genes floating in a sea of "junk" DNA. That theory was based on the now-ancient observations of reannealing curves for bulk DNA from humans and other species which found that most of our DNA re-anneals very quickly, due to the fact that it is repetitive. Most of our genomes (60%) are taken up with LINE repeats, SINE repeats, old retro-transposons, stray duplications, and other repetitive material that, at a first glance, seems like junk. There has been a battle ever since, between proponents of junk DNA and those who see function around every corner. As we learn more about the genome, many more functions have indeed come to light, like distant enhancers and regulatory RNAs of many flavors. But overall, there still seems to be a lot of junk. 

A recent paper took an oblique shot at this field, looking at the profusion of alternative gene transcripts, which can number into the hundreds for a single gene. (This was also reviewed.) These are generally called isoforms, and arise due to variable ways one gene's RNA products can be initiated, terminated, and spliced. So not only are most regions of the genome transcribed in some form, actively transcribed regions can be transcribed and processed in myriad ways to lead to different RNA products. Here again, there has been an analogous argument, about whether every such isoform has a function, or whether isoforms might arise from more or less sporadic processes, often as unintended and non-functional sparks coming out of the machinery. The importance of isoforms is very well documented in many cases, so the possibility of function, sometimes highly conserved, is not in question. Only the importance of every last variation in combinatorial collections of isoforms that can number into the hundreds.

Here is an image from the first page (of about six pages) of RNA transcripts coming off the notorious BRCA1 gene, which is intensely studied for its role in breast cancer. Each line is a distinct mRNA transcript. Each darker bar is an exon, which are separated by introns. The darker colored exons are in the protein coding region, while the lighter exons signify the untranslated upstream and downstream ends. I count about 315 transcripts described for this genetic locus. The idea that each of these has some evolutionarily constrained and important function is, on the face of it, absurd.

The authors took an interesting evolutionary approach, reasoning that species with larger population sizes experience more stringent purifying selection, and thus should (in theory) show tighter control over stray genomic products such as isoforms, if most transcript isoforms are neutral (or even deleterious) accidents, rather than intentional and functional forms. Thankfully, animals come in a wide range of population sizes, from insects to crocodiles and primates; very large to very small. While population size is hard to calculate, several convenient proxies are known, like lifespan, body size, etc. When they totted everything up, they saw clear correlations between these proxies and the number of alternative RNA products per gene- also termed transcript diversity. They sliced up the data by organ where the RNA was expressed, and by the source of the RNA variation- either different initiation, different termination, or different splicing. In all cases the trend was the same. In species with larger population sizes, the diversity of transcripts was lower, agreeing with their hypothesis that when greater selective force is available, the slop from the transcription and transcript processing machinery declines.

The authors draw correlations between alternative splicing (AS) diversity in an organism's cells and its population size.
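For a feel of what such a correlation involves, here is a small sketch of a rank (Spearman) correlation between a population-size proxy and transcript diversity. The statistical method is my choice of a common, simple one and may not be what the authors used, and the five data points are invented purely for illustration; they are not values from the paper.

```python
def ranks(values):
    """Return 1-based ranks, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical numbers, for illustration only. Longer lifespan stands in
# for smaller population size, as in the proxy logic described above.
lifespan_years = [0.1, 2.0, 12.0, 40.0, 70.0]
isoforms_per_gene = [1.8, 2.1, 2.0, 3.2, 3.9]
rho = spearman(lifespan_years, isoforms_per_gene)
```

A positive rho here would mean that longer-lived (smaller-population) species carry more isoforms per gene, which is the direction of the trend described above. Using ranks rather than raw values keeps a few extreme species from dominating the result.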

The authors additionally note that there is a similar relationship between alternative splice site usage and expression level of a gene. That is, the higher the gene expression, the less likely that minor splice sites are used, indicating that here again, higher selective pressure helps to clear out non-functional off-products of the transcription apparatus.

The correlations found here are only that- correlations. While significant, they are not terribly strong, let alone stark. So it is evident that our gene expression machinery has a lot of play in it, and this falls on a spectrum from deleterious to critically functional. It is, after all, machinery, not divine. It is also grist for evolution itself- it is useful to have some slop so that there is always some diversity in the gearing to accommodate new selective pressures. But the idea that just because a distinct transcript exists, it is biologically functional, or that, similarly, because a genomic region is transcribed, it is a "gene" rather than junk DNA.. that does not hold water. Every nucleotide in the genome has its own unique selective constraints, and for many of them, that constraint is zero.


  • The world order, and our position in it, is crumbling.
  • Whence Hungary?
  • Another AI tax, as if gobbling up power wasn't bad enough.
  • Mindless.

Saturday, March 28, 2026

Death and Resurrection ... Of a Gene

The SLAMF9 gene became non-functional in the human lineage, and then later was re-activated. Why?

Biology is amazingly intricate, but it is often also needlessly complex- evidence for the haphazard, if eventually pointed, mechanisms of the evolutionary process. We will take up the discussion of "junk" DNA again next week, but molecular biology is full of redundant and excessive processes, which should certainly be mystifying from a "design" perspective. At the frontier of natural selection are neutral and near-neutral genetic elements, which change over time due to chance, lacking selection pressure towards conservation. Pseudogenes (of which we have about 20,000- almost as many as functional genes) are one form of neutral element. They are typically remnants of functional genes that have been duplicated and inactivated by mutation. They are a lively area of genome annotation because it is hard to be sure that they are really dead. Despite what looks like an inactivating mutation, they typically still produce RNA transcripts, and may produce partial or alternative proteins as well. The literature is full of experiments finding products and activities from genes annotated elsewhere as pseudogenes. And what looks like a pseudogene from one sample might just be an allele, the same gene being whole and active in other people.

So, it is hard to know what any particular genetic region is doing without a lot of evolutionary, functional, and even population analysis. A recent paper looked deeply at one gene- a gene that seems to have flipped back and forth between functional and non-functional states in the human lineage. It is a rare example of a gene coming back from what is usually a one-way trip into mutational oblivion, once its function- and thus selective pressure for conservation- have disappeared.

SLAMF9 is one of a family (signaling lymphocyte activation molecule family) of surface receptors that occur in many cells of the immune system, help activate responses in these cells, and also recognize some viruses and bacteria. They bind to each other and to other components of the immune system, creating complex signaling networks. Genes involved in our immune systems are commonly subject to rapid evolution, the arms race against our many pathogens being relentless. Sometimes that takes the form of gene inactivation, if a particular receptor, for instance, has been turned against us by a pathogen that uses it for binding and cell entry. 

This week's authors were facing a conundrum. They were studying SLAMF9, and found the mouse version easy to clone and express in the lab. But the human version ... that was another story, frustratingly impossible to express in usable amounts. When they looked at the protein sequence, they were in for a big surprise:

At the front end of SLAMF9, there is very strong conservation across mammals... except when it comes to humans! The signal peptide is what directs this protein to be inserted into the plasma membrane, and is cleaved off the mature protein. In red is highlighted the region starkly different in humans, which naturally affects (not in a good way) the signal cleavage process. "a" and "b" point to important domains of the cytoplasmic side of the final protein, which are just barely preserved/conserved in the human form.

This alignment among various mammalian versions (orthologs) of SLAMF9 shows that they are all pretty much the same... except for the human version. All the way from mouse to chimpanzee nothing has changed at the front end of this protein. That is amazing in itself, showing very strong conservation. But then after our lineage split from chimpanzees, something weird. A small segment at the front of this protein is totally different. This area is important because it carries the cleavage site of the signal sequence. The signal sequence directs the protein to be sent to the membrane (as this is a trans-membrane receptor), and this cleavage site is bad, explaining why the authors' attempt to express this protein went so poorly. It might be enough for modest expression in the natural setting, but not enough for their investigations.

At the DNA level, it is clear that what happened to the protein was a double frame shift in translation, out of frame at the front, then recovered frame at the second mutation. The mutations must have been independent events, but the order of their occurrence is not known. The first intron trails off to the left, while the coding sequence tails off to the right.

When they looked at the DNA sequence, the reason for this change in the protein sequence became clearer. There was a frame shift, with only small changes in the DNA sequence that led to the bigger change in the protein sequence. On the left, there is a shift in the splice site at the end of the first intron (splice acceptor). This shifts the mRNA product by four bases (vs the start site of translation), creating a frame shift in translation, as portrayed in the amino acid codes given. On the right, there is a one nucleotide deletion, causing another frame shift that brings the translation back into the normal frame. 
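The reading-frame arithmetic can be checked with a toy example. Two assumptions of mine: the splice shift adds four bases to the mRNA (a four-base shift followed by a one-base deletion only restores the frame if the first change is an addition, since 4 - 1 = 3), and the short sequence below is made up for illustration, not the real SLAMF9 sequence.

```python
CODON_LEN = 3


def codons(seq, start=0):
    """Split seq into complete codons, beginning at index start."""
    usable = len(seq) - (len(seq) - start) % CODON_LEN
    return [seq[i:i + CODON_LEN] for i in range(start, usable, CODON_LEN)]


# Invented 24-base coding sequence: ATG GCT GCA GGT CTG TTC AAA GAC
ancestral = "ATGGCTGCAGGTCTGTTCAAAGAC"

# Event 1 (assumed): a shifted splice acceptor adds four bases after ATG.
insertion_only = ancestral[:3] + "CAGT" + ancestral[3:]

# Event 2: a single-base deletion further downstream (ancestral index 12).
both_events = ancestral[:3] + "CAGT" + ancestral[3:12] + ancestral[13:]

# Net change of +4 - 1 = +3 bases, a whole codon, so the frame is restored.
net_shift = len(both_events) - len(ancestral)
frame_restored = net_shift % CODON_LEN == 0

# Codons downstream of both events match the ancestral ones again, while
# the stretch between the two events is read as different codons.
tail_matches = codons(both_events)[-3:] == codons(ancestral)[-3:]
tail_broken = codons(insertion_only)[-3:] != codons(ancestral)[-3:]
```

With only the first event, everything downstream is read out of frame. With both, only the segment between the two events is scrambled, which matches the picture of a short, totally different stretch at the front of an otherwise conserved protein.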

They sampled all the available archeological samples from the human lineage- Neanderthals and Denisovans, and each was the same as the current human sequence. So, whatever happened did so between the split from chimpanzees and the advent of these available Homo species. And what happened were two distinct events- the second frame shift and the first frame shift are independent genetic mutations.

Which happened first? That is uncertain, but the authors show that the right-most frame shift (called g.621delT) did not influence the change in the splice site. The splice site change was caused by a series of about six mutations within the first intron (not shown), which shifted the pattern of mRNA self-hybridization that helps direct splice site selection. So it is likely that the splice site change happened first, essentially killing the gene. And then the downstream frameshift happened later on to rescue it in a partial, not very well-expressed way. However, either mutation could have happened first to functionally kill off this gene, with further mutation(s) then recovering its function. In any case, both events happened within this roughly six-million-year time span that generated our immediate lineage, becoming firmly fixed as the only version of this gene now in our collective genome.

What might cause these events? It all goes back to the function of SLAMF9. As shown above, it is very highly conserved. But, being part of the immune system and the interface we show to pathogens, it is also on the front line of the bio-warfare arms race. As humans started ranging far beyond their original habitats, they doubtless encountered many new pathogens. It seems likely that killing off this gene might have resolved one such fight, at least for a little while, perhaps by removing a pathogen entry point. But later on, it became beneficial to recover it, which is to say that new mutations that restored its function even a little bit were evidently selected for, and spread in the population. There was a race at this point between the accumulation of more (now neutral) mutations that would have permanently inactivated this gene, and the advent of that one special mutation that could save it. The overall conservation of SLAMF9 argues that saving it must have conferred significant benefits.


Saturday, March 7, 2026

How 5S rRNA Gets Into the Ribosome

For a minor component, it gets a lot of molecular love.

As mentioned several times in this space, the ribosome, which synthesizes proteins according to mRNA instructions, is an extremely ancient and complicated machine. Its core, including the catalytic site, is RNA. This marks it as a hold-over from the RNA world, as the thing that made proteins, (probably tiny proteins at first), before proteins had become a thing. But boy has there been a lot of duct-taping since then. In humans, there are four ribosomal RNAs, eighty proteins pasted on the outside, and hundreds of other proteins or RNAs involved in assembling the ribosome, not to mention dozens of initiation factors and other regulators that help during translation.

A recent paper discussed the maturation of 5S ribosomal RNA, which is the smallest rRNA, and one whose function is more peripheral than the large central 16S and 23S rRNAs. It is present in all life forms, though ribosomes inside mitochondria do without it. Its processing is an interesting case study of the complexity that has accumulated over the eons. Exactly what the 5S rRNA does remains a bit unclear, though it clearly contributes to the dynamics of the large ribosomal subunit, and occupies the "central protuberance". One group ligated it into the large subunit 23S rRNA, showing that translation still worked quite well with the 5S portion stably tacked into the structure. But then they also found that these ribosomes fell with high frequency into an unproductive locked state, suggesting that the independent nature of the 5S rRNA plays an important role in the dynamics of the ribosome. 

At any rate, the assembly of 5S into the rest of the structure is a story in itself. There are multiple steps involved, some involving ATP-using helicases. As it comes off the gene, 5S rRNA is bound by two proteins- the TFIIIA regulatory factor that activates its transcription, and also La protein (aka La antigen), which is a storage protein, named after systemic lupus, for which it is one immune target. To be incorporated into ribosomes, the RNA is next bound by a complex of Rpl5 and Rpl11, which will remain with the 5S RNA and become part of the eventual ribosome. Next come Rpf2 and Rrs1, which are two assembly facilitators that bind as a complex. Then comes Rsa4, which is similarly an assembly protein that helps the whole mess bind to the proper place on the (immature) large ribosomal subunit. Lastly, Rea1 (called MDN1 in humans) is an ATP-driven RNA helicase that wrenches the whole 5S-containing protuberance into its final and quite different position.

The authors provide a scheme for the stepwise processing and assembly of 5S rRNA into the ribosome, involving numerous assembly factors, ribosomal proteins, and a helicase. 

It is quite an amazing story of progressive assembly, all to attach an element of the ribosome that is hardly central, but is rather a relatively late accretion on the machinery. Nevertheless, it evidently deserves specialized attention for correct placement. 

A less schematic view of various steps heading toward ribosomal assembly. 5S rRNA is in teal, and the helicase Rea1 is in dark gold, mounted like a wrench at the top of the (late) structure.

  • We are strangling Ukraine. Why?
  • Building more housing reduces housing shortages.

Saturday, February 21, 2026

Bad Faith

"Skepticism" about vaccines, or about evolution... isn't skepticism at all.

This was going to be a post about the Ediacaran epoch, which is a fascinating time, spanning the hundred million years before the Cambrian, when animals began to appear in the fossil record. First tentatively, as sessile sheets of tissue, then later as beautiful motile segmented discs (below), and later still as something a bit more aggressively shaped. All this is yet another (as though more were needed) justification of the overall scheme of evolution developed by Charles Darwin, who knew about what in his time was already a distinct and puzzling lack of animal fossils below the Cambrian strata. If one seeks data in good faith, and while pursuing a well-reasoned hypothesis, one is bound to find something interesting! Incidentally, David Attenborough and the BBC gave an excellent treatment of this era.

The beautiful, and slightly motile, Dickinsonia, from the mid-Ediacaran, about 560 million years ago.

Yet there are some who don't see things that way. Over at the Discovery Institute, they don't discover anything, but they do write blogs taking potshots at science. No fossil is enough, no explanation is acceptable. There is always a question left to answer, a gap that has not been filled, wherein God can be squeezed. It is the essence of bad faith, arguing not from reason and evidence, but from a pre-existing truth that must be defended at all costs, particularly against an inveterate enemy which ignores them so assiduously. An enemy who publishes, and publishes, and for whom the God hypothesis is not even worth mentioning in the quest to explain how things happen. Who regards them and their arguments as beneath contempt.

Well, needless to say, they were all in for Trump. If one likes to argue in bad faith, why not vote for the master? The new administration doesn't seem to have much stomach for the Darwinian evolution culture war of decades past (though there are three years to go.. who knows!?). But they have been doing their best to destroy science in the US and put China in the lead for good. Whether it is climate science, medical science, space science, it is all being gutted. Diplomats and other experts? Who needs them? We have discarded our allies, and our best friends are now Viktor Orban, Vladimir Putin, Mohammed bin Salman, and Bibi Netanyahu. Fairness and legality at the Justice Department? Who needs that? The theme is kicking intelligent, moral people in the teeth, to bring in a callous, greedy and corrupt new dispensation for the US. One of the more alarming and consequential elements of this campaign has been against vaccines. All the real scientists were thrown off the advisory panels, replaced with vaccine "skeptics".

We in the US have a long history with vaccines. The Revolutionary War featured solicitations for soldiers already vaccinated against (or recovered from) smallpox, the most frightening scourge of the army's camps. Later on, Washington approved mass vaccination for the army. Vaccination in that day (variolation) was no easy matter. It involved introducing someone else's smallpox into an opened vein, and taking all the risks of infection. It was lethal 1-2% of the time, but those were better odds than the 25% death rate from normally contracted smallpox. These people knew how to assess risks, and took the vaccination risk to forestall the greater risk. Then there was the polio vaccine, in the 1950s, that wiped out another scourge. One would think that getting a vaccine made in record time that promptly resolved the Covid pandemic and saved millions of lives would have been greeted with a bit more appreciation. But no, it was all too easy to take that science for granted, and adopt bad faith arguments against that and other vaccines.

Today's vaccines still carry risk, though minuscule compared to variolation. The odds are incredibly good. It takes no brains at all to judge vaccines worthwhile. But we have "skeptics". Why do we have them? For the same reason that we have Donald Trump, or have believers in religion. Humans are gullible and seek easy truths over hard ones. It is easy to feel squeamish about getting an injection. It is easy to believe that a con man with enormous self-confidence and wealth has some degree of competence. And it is easy to believe that we as humans are special with some special someone running the universe who cares about our fate. Those are the easy truths, archetypally ingrained, apparently. It takes higher intellectual standards and discipline to look at reality and accept that many easy truths are not true at all. 

For all their religious faith, (and indeed because of it), the religious people who pervade the current administration, Supreme Court, and culture war strongholds routinely argue in bad faith. They are pre-selected for their ability to believe stupid and wrong things. They are not sincere in looking for evidence, or making logical arguments. They put forth laughable legal arguments. They hardly even bother to explain why the "endangerment finding" on CO2 is wrong on the merits. They care about loyalty ahead of competence. And do not even understand what might be systematically wrong about raging corruption. When faith (in their charismatic leader, or their warped religious convictions) comes first, intellect takes a back seat. That is the fundamental rule of this new dispensation in the US, and it will be damaging us for a very long time.

Saturday, February 14, 2026

We Have Rocks in Our Heads ... And Everywhere Else, Too

On the evolution and role of iron-sulfur complexes.

Some of the more persuasive ideas about the origin of life have it beginning in the rocks of hydrothermal vents. Here was a place with plenty of energy, interesting chemistry, and proto-cellular structures available to host it. Some kind of metabolism would by this theory have come first, followed by other critical elements like membranes and RNA coding/catalysis. This early earth lacked oxygen, so iron was easily available, not prone to oxidation as now. Thus life at this early time used many minerals in its metabolic processes, as they were available for free. Now, on today's earth, they are not so free, and we have complex processes to acquire and manage them. One of the major minerals we use is the iron-sulfur complex, (similar to pyrite), which comes in a variety of forms and is used by innumerable enzymes in our cells. 

The more common iron-sulfur complexes, with sulfur in yellow, iron in orange.


The principal virtue of the iron-sulfur complex is its redox flexibility. With the relatively electronically "soft" sulfur, iron forms semi-covalent-style bonds, while being able to absorb or give up an electron safely, without destroying nearby chemicals as iron alone typically does. Depending on the structure and liganding, the voltage potential of such complexes can be tuned all over the (reduction potential) map, from -600 to +400 mV. Many other cofactors and metals are used in redox reactions, but iron-sulfur is the most common by far.

Reduction potentials (ability to take up an electron, given an electrical push) of various iron-sulfur complexes.
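To put that voltage range in energy terms, the standard relation between a potential difference and free energy (delta G = -n F delta E) can be sketched in a few lines. The function name and the choice to pair the two extremes of the range are my own illustration, and the calculation treats the quoted potentials as given, ignoring concentration effects.

```python
FARADAY = 96485.332  # Faraday constant, coulombs per mole of electrons


def delta_g_kj_per_mol(e_donor_mv, e_acceptor_mv, n_electrons=1):
    """Free energy change (kJ/mol) for electrons moving donor -> acceptor.

    Potentials are reduction potentials in millivolts. A negative result
    means the transfer runs downhill (spontaneously).
    """
    delta_e_volts = (e_acceptor_mv - e_donor_mv) / 1000.0
    return -n_electrons * FARADAY * delta_e_volts / 1000.0


# One electron passed across the full quoted range, -600 mV to +400 mV.
full_span = delta_g_kj_per_mol(-600, 400)
```

One electron dropping across the full one-volt span releases a bit under 100 kJ/mol, so a chain of clusters tuned to intermediate potentials can hand an electron along in small, safe steps rather than one large drop.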

Researchers had assumed that, given the abundance of these elements, iron-sulfur complexes were essentially freely acquired until the great oxidation event, about two to three billion years ago, when free oxygen started rising and free iron (and sulfur) disappeared, salted away into vast geological deposits. Life faced a dilemma- how to reliably construct minerals that were now getting scarce. The most common solution was a three enzyme system in mitochondria that 1) strips a sulfur from the amino acid cysteine, a convenient source inside cells, 2) scaffolds the construction of the iron-sulfur complex, with iron coming from carrier proteins such as frataxin, and 3) employs several carrier proteins to transfer the resulting complexes to enzymes that need them. 

But a recent paper described work that alters this story, finding archaeal microbes that live anaerobically and make do with only the second of these enzymes. A deep phylogenetic analysis shows that the (#2) assembly/scaffold enzymes are the core of this process, and have existed since the last common ancestor of all life. So they are incredibly ancient, and it turns out that iron-sulfur complexes cannot just be gobbled up from the environment, at least not by any reasonably advanced life form. Rather, these complexes need to be built and managed under the care of an enzyme.

The presented structures of the dimer of SmsB (orange) and SmsC (blue) that dimerize again to make up a full iron-sulfur scaffolding and production enzyme in the archaean Methanocaldococcus jannaschii. Note the reaction scheme where ATP comes in and evicts the iron-sulfur cluster. On right is shown how ATP fits into the structure, and how it nudges the iron-sulfur binding area (blue vs green tracing).

A recent paper from this group extended their analysis to the structure of the assembly/scaffold enzyme. They find that, though it is a symmetrical dimer of a complex of two proteins, it only deals with one iron-sulfur complex at a time. It also binds and cleaves ATP. But ATP seems to have more of an inhibitory role than one that stimulates assembly directly. The authors suggest that high levels of ATP signal that less iron-sulfur complex is needed to sustain the core electron transport chains of metabolism, making this ATP inhibition an allosteric feedback control mechanism in these archaeal cells. I might add, however, that ATP binding may well also have a role in extricating the assembled iron-sulfur cluster from the enzyme, as that complex is quite well coordinated, and could use a push to pop out into the waiting arms of target enzymes.

"These ancestral systems were kept in archaea whereas they went through stepwise complexification in bacteria to incorporate additional functions for higher Fe-S cluster synthesis efficiency leading to SUF, ISC and NIF." - That is, the three-component systems present in eukaryotes, which come in three types.

In the authors' structure, the iron-sulfur complex is liganded by three cysteines within the SmsC protein. But note how, facing the viewer, the complex is quite exposed, ready to be taken up by some other enzyme that has a nice empty spot for it.

Additionally, these archaea, with this simple one-step iron cluster formation pathway, get their sulfur not from cysteine, but from ambient elemental sulfur. Which is possible, as they live only in anaerobic environments, such as deep sea hydrothermal vents. So they represent a primitive condition for the whole system as may have occurred in the last common ancestor of all life. This ancestor is located at the split between bacteria and archaea, so was a fully fledged and advanced cell, far beyond the earlier glimmers of abiogenesis, the iron sulfur world, and the RNA world.


Saturday, January 24, 2026

Jonathan Singer and the Cranky Book

An eminent scientist at the end of his career writes out his thoughts and preoccupations.

Jonathan Singer was a famous scientist at my graduate school. I did not interact with him, but he played a role in attracting me to the program, as I was interested in biological membranes at the time. Singer himself studied with Linus Pauling, and they were the first to identify a human mutation in a specific gene as a cause for a specific disease- sickle cell disease. After further notable work in electron microscopy, he reached a career triumph by developing, in 1972, the fluid mosaic model of biological membranes. This revolutionized and clarified the field, showing that cells are bounded by something incredibly simple- a bilayer of phospholipids that naturally order themselves into a remarkably stable sheet, (a bubble, one might say), all organized by their charged headgroups and hydrophobic fatty tails. This model also showed that proteins would be swimming around freely in this membrane, and could be integrated in various ways, either lightly attached on one side, or spanning it completely, thereby enabling complex channel and transporter functions. The model implied the typical length of a protein alpha helix that, by virtue of its hydrophobic side chains, would naturally be able to do this spanning function- a prediction that was spot-on. He could have easily won a Nobel for this work.

I was intrigued when I learned recently that Singer had written a book near the end of his career. It is just the kind of thing that a retired professor loves to do in the sunset of his career, sharing the wisdom and staving off the darkness by taking a stab at the book biz. And Singer's is a classic of the form- highly personal, a bit stilted, and ultimately meandering. I will review some of its high points, and then take a stab of my own at knitting together some of the interesting themes he grapples with.

For at base, Singer turns out to be a spiritual compadre of this blog. He claims to be a rationalist, in a world where, as he has it, no more than 9% of people are rational. Definition? It is the poll question of whether one believes that god created man, rather than the other way around. Singer recognizes that the world around him is crazy, and that the communities he has been a part of have been precious oases amid the general indifference and grasping of the world. But changing it? He is rather fatalistic about that, recognizing that reason is up against overwhelming forces.

His specific themes cover a great deal of biology, and then some more mystical reflections on balance and diversity in biology, and later, in capitalism and politics. He points out that the nature/nurture debate has been settled by twin studies. Nature, which is to say, genetics, is the dominant influence on human characteristics, including a wide variety of psychological traits, including intelligence. Environment and nurture is critical for reaching one's highest potential, and for using it in socially constructive ways, but the limits of that potential are largely set by one's genes. Singer does not, however, draw the inevitable conclusion from these observations, which is that some kind of long-term eugenic approach would be beneficial to our collective future, assuming machines do not replace us forthwith. Biologists know that very small selective coefficients can have big effects, so nothing drastic is needed. But what criteria to use- that is the sticky part. Just as success in the capitalist system hardly signals high moral or personal qualities, nor does incarceration by the justice system always show low ones. It is virtually an insoluble problem, so we muddle along, destined probably for continued cycles of Spenglerian civilizational collapse.

Turning to social affairs, Singer settles on "structural chaos" as his description of how the scientific enterprise works, and how capitalism at large works. With a great deal of waste, and misdirected effort, it nevertheless ends up providing good results- better than those that top-down direction can provide. He seems to sigh a little that "scientific" methods of social organization, such as those in Soviet Russia, were so ineffective, and that the best we can do is to muddle along with the spontaneous entrepreneurship and occasional flashes of innovation that push the process along. Not to mention the "monstrous vulgarity" of advertising, etc. Likewise, democracy is a mess, with most people totally incapable of making the reasoned decisions needed to maintain it. Again, the chaos of democracy is sadly the best we can do, and the duty of rational people, in Singer's view, is to keep alive the flame of intellectual freedom while outside pressures constantly threaten.

Art, and science.

What can we do with this? I think that the unifying thread that Singer was groping for was competition. One can frame competition as a universal principle that shapes the physical, biological, and social worlds. Put two children on a teeter-totter, and you can see how physical forces (e.g. gravitation) compete all the time, subtly producing equilibria that characterize the universe. Chemical equilibria are likewise a product of constant competition, even including the perpetual jostling of phospholipids to find their lowest energy configuration amidst the biological membrane bilayer, which has the side-effect of creating such a stable, yet highly flexible, structure. With Darwin, competition reaches its apotheosis- the endless proliferation, diversification, and selection of organisms. Singer marvels at the fragility of individual life, at the same time that life writ large is so incredibly durable and prolific. Well, the mechanism behind that is competition. And naturally, economics of any free kind, including capitalism and grant-making in science, are based on competition as well- the natural principle that selects which products are useful, which employees are productive, and which technologies are helpful. Waste is part of the process, as diversity amidst excess production is the essential ingredient for subsequent selection. 

And yet.. something is missing. The earth's biosphere would still be a mere bacterial soup if competition were the only principle at work. Bacteria (and their viruses) are the most streamlined competition machines- battlebots of the living world. It took cooperation between a bacterial cell and an archaeal cell to make a revolutionary new entity- the eukaryotic cell. It then took some more cooperation for eukaryotic cells to band together into bodies, making plants and animals. And among animals, cooperation in modest amounts provides for reproduction, family structure, flock structures, and even complex insect societies. It is with humans that cooperation and competition reach their most complex heights, for we are able to regulate ourselves, rationally. We make rules. 

Without rules, human society is anarchic mayhem- a trumpian, dystopian and corrupt nightmare. With them, it (ideally) balances competition with cooperation to harness the benefits of each. Our devotion to sports can be seen as a form of rule worship, and explicit management of the competitive landscape. Can there be too many rules? Absolutely, there are dangers on both sides. Take China as an example. In the last half-century, it revamped its system of rules to lower the instability of political competition, harness the power of economic competition, and completely transform its society. 

The most characteristic and powerful human institution may be the legislature, which is our ongoing effort to make rational rules regulating how the incredibly powerful motive force of competition shapes our lives. Our rules, in the US, were authored, at the outset, by the founders, who were- drumroll please- rationalists. To read the Federalist Papers is to see exquisite reasoning drawing on wide historical precedent, and particularly on the inspirations of the rationalist enlightenment, to formulate a new set of rules mediating between cooperation and competition. Not only were they more fair than the old rules, but they were designed for perpetual improvement and adjustment. The founding was, at base, a rationalist moment, when characters like Franklin, Hamilton, Madison, and Jefferson- deists at best and rationalists through and through- led the new country into a hopeful, constitutional future. At the current moment, two hundred and fifty years on, as our institutions are being wantonly destroyed and anything resembling reason, civility, and truth is under particularly vengeful attack, we should appreciate and own that heritage, which informs a true patriotism against the forces of darkness.


Saturday, December 13, 2025

Mutations That Make Us Human

The ongoing quest to make biologic sense of genomic regions that differentiate us from other apes.

Some people are still, at this late date, taken aback by the fact that we are animals, biologically hardly more than cousins to fellow apes like the chimpanzee, and descendants through billions of years of other life forms far more humble. It has taken a lot of suffering and drama to get to where we are today. But what are those specific genetic endowments that make us different from the other apes? That, like much of genetics and genetic variation, is a tough question to answer.

At the DNA level, we are roughly one percent different from chimpanzees. A recent sequencing of great apes provided a gross overview of these differences. There are inversions, and larger changes in junk DNA that can look like bigger differences, but these have little biological importance, and are not counted in the sequence difference. A difference of one percent is really quite large. For a three gigabase genome, that works out to 30 million differences. That is plenty of room for big things to happen.
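
As a quick check of that arithmetic, here is a minimal sketch in Python. The genome size and the divergence are the rounded figures from the paragraph above, not precise measurements, and the function name is purely illustrative.

```python
# Back-of-envelope count of human-chimpanzee sequence differences.
# Both inputs are the rounded values quoted in the text, not measured data.
GENOME_SIZE_BP = 3_000_000_000  # roughly three billion base pairs
DIVERGENCE_PERCENT = 1          # roughly one percent single-base divergence


def expected_differences(genome_size_bp: int, divergence_percent: int) -> int:
    """Number of differing positions implied by a whole-percent divergence."""
    return genome_size_bp * divergence_percent // 100


differences = expected_differences(GENOME_SIZE_BP, DIVERGENCE_PERCENT)
print(f"{differences:,} differing positions")
```

Integer arithmetic is used so that the round numbers stay exact.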

Gross alignment of one chromosome between the great apes. [HSA- human, PTR- chimpanzee, PPA- bonobo, GGO- gorilla, PPY- orangutan (Borneo), PAB- orangutan (Sumatra)]. Fully aligned regions (not showing smaller single nucleotide differences) are shown in blue. Large inversions of DNA order are shown in yellow. Other junk DNA gains and losses are shown in red, pink, purple. One large-scale jump of a DNA segment is shown in green. One can see that there has been significant rearrangement of genomes along the way, even as most of this chromosome (and others as well) is easily alignable and traceable through the evolutionary tree.


But most of those differences are totally unimportant. Mutations happen all the time, and most have no effect, since most positions (particularly the most variable ones) in our DNA are junk, like transposons, heterochromatin, telomeres, centromeres, introns, intergenic space, etc. Even in protein-coding genes, a third of the positions are "synonymous", with no effect on the coded amino acid, and even when an amino acid is changed, that protein's function is frequently unaffected. The next biggest group of mutations have bad effects, and are selected against. These make up the tragic pool of genetic syndromes and diseases, from mild to severe. Only a tiny proportion of mutations will have been beneficial at any point in this story. But those mutations have tremendous power. They can drag along their local DNA regions as they are positively selected, and gain "fixation" in the genome, which is to say, they are sufficiently beneficial to their hosts that they outcompete all others, with the ultimate result that the mutation becomes universal in the population- the new standard. This process happens in parallel, across all positions of the genome, all at the same time. So a process that seems painfully slow can actually add up to quite a bit of change over evolutionary time, as we see.

So the hunt was on to find "human accelerated regions" (HAR), which are parts of our genome that were conserved in other apes, but suddenly changed on the way to humans. There are roughly three thousand such regions, but figuring out what they might be doing is quite difficult, and there is a long tail from strong to weak effects. There are two general rationales for their occurrence. First, selection was lost over a genomic region, if that function became unimportant. That would allow faster mutation and divergence from the progenitors. Or second, some novel beneficial mutation happened there, bringing it under positive selection and to fixation. Some recent work found, interestingly, that clusters of mutations in HAR segments often have countervailing effects, with one major mutation causing one change, and a few other mutations (vs the ancestral sequence) causing opposite changes, in a process hypothesized to amount to evolutionary fine tuning.

A second property of HARs is that they are overwhelmingly not in coding regions of the genome, but in regulatory areas. They constitute fine tuning adjustments of timing and amount of gene regulation, not so much changes in the proteins produced. That is, our evolution was more about subtle changes in management of processes than of the processes themselves. A recent paper delved in detail into HAR5, one of the strongest such regions, (that is, strongest prior conservation, compared with changes in human sequence), which lies in the regulatory regions upstream of Frizzled8 (FZD8). FZD8 is a cell surface receptor, which receives signals from a class of signaling molecules called WNT (wingless and int). These molecules were originally discovered in flies, where they signal body development programs, allowing cells to know where they are and when they are in the developmental program, in relation to cells next door, and then to grow or migrate as needed. They have central roles in embryonic development, in organ development, and also in cancer, where their function is misused.

For our story, the WNT/FZD8 circuit is important in fetal brain development. Our brains undergo massive cell division and migration during fetal development, and clearly this is one of the most momentous and interesting differences between ourselves and all other animals. The current authors made mutations in mice that reproduce some of the HAR5 sequences, and investigated their effects. 

Two mouse brains at three months of age, one with the human version of the HAR5 region. Hard to see here, but the latter brain is ~7% bigger.

The authors claim that these brains, one with native mouse sequence, and the other with the human sequences from HAR5, have about a seven percent difference in mass. Thus the HAR5 region, all by itself, explains about one fourteenth of the gross difference in brain size between us and chimpanzees. 

HAR5 is a 619 base-pair region with only four sequence differences between ourselves and chimpanzees. It lies 300,000 bases upstream of FZD8, in a vast region of over a million base pairs with no genes. While this region contains many regulatory elements, (generally called enhancers or enhancer modules, only some of which are mapped), it is at the same time an example of junk DNA, where most of the individual positions in this vast sea of DNA are likely of little significance. The multifarious regulation by all these modules is of course important because this receptor participates in so many different developmental programs, and has doubtless been fine-tuned over the millennia not just for brain development, but for every location and time point where it is needed.

Location of the FZD8 gene, in the standard view of the genome at NIH. I have added an arrow that points to the tiny (in relative terms) FZD8 coding region (green), and a star at the location of HAR5, far upstream among a multitude of enhancer sequences. One can see that this upstream region is a vast area (of roughly 1.5 million bases) with no other genes in sight, providing space for extremely complicated and detailed regulation, little of which is as yet characterized.

Diving into the HAR5 functions in more detail, the authors show that it directly increases FZD8 gene expression, (about 2 fold, in very rough terms), while deleting the region from mice strongly decreases expression. Of the four individual base changes in the HAR5 region, two have strong (additive) effects increasing FZD8 expression, while the other two have weaker, but still activating, effects. Thus, no compensatory regulation here.. it is full speed ahead at HAR5 for bigger brain size. Additionally, a variant in human populations that is responsible for autism spectrum disorders also resides in this region, and the authors show that this change decreases FZD8 expression about 20%. Small numbers, sure, but for a process that directs cell division over many cycles in early brain development, this kind of difference can have profound effects.


The HAR5 region causes increased transcription of FZD8, in mice, compared to the native version and a deletion.

The HAR5 region causes increased cell proliferation in embryonic day 14.5 brain areas, stained for neural markers.

"This reveals Hs-HARE5 modifies radial glial progenitor behavior, with increased self-renewal at early developmental stages followed by expanded neurogenic potential. ... Using these orthogonal strategies we show four human-specific variants in HARE5 drive increased enhancer activity which promotes progenitor proliferation. These findings illustrate how small changes in regulatory DNA can directly impact critical signaling pathways and brain development."

So there you have it. The nuts and bolts of evolution, from the molecular to the cellular, the organ, and then the organismal, levels. Humans do not just have bigger brains, but better brains, and countless other subtle differences all over the body. Each of these is directed by genetic differences, as the combined inheritance of the last six million years since our divergence from chimpanzees. Only with the modern molecular tools can we see Darwin's vision come into concrete focus, as particular, even quantum, changes in the code, and thus biology, of humanity. There is a great deal left to decipher, but the answers are all in there, waiting.


Saturday, November 22, 2025

Ground Truth for Genetic Mutations

Saturation mutagenesis shows that our estimates of the functional effect of uncharacterized mutations are not so great.

Human genomes can now be sequenced for less than $1,000. This technological revolution has enabled a large expansion of genetic testing, used for cancer tissue diagnosis and tracking, and for genetic syndrome analysis both of embryos before birth and affected people after birth. But just because a base among the 3 billion of the genome is different from the "reference" genome, that does not mean it is bad. Judging whether a variant (the modern, more neutral term for mutation) is bad takes a lot of educated guesswork.

A recent paper described a deep dive into one gene, where the authors created and characterized the functional consequence of every possible coding variant. Then they evaluated how well our current rules of thumb and prediction programs for variant analysis compare with what they found. It was a mediocre performance. The gene is CDKN2A, one of our more curious oddities. This is an important tumor suppressor gene that inhibits cell cycle progression and promotes DNA repair- it is often mutated in cancers. But it encodes not one, but two entirely different proteins, by virtue of a complex mRNA splicing pattern that uses distinct exons in some coding portions, and parts of one sequence in two different frames, to encode these two proteins, called p16 and p14. 

One gene, two proteins. CDKN2A has a splicing pattern (mRNA exons shown as boxes at top, with pink segments leading to the p14 product, and the blue segments leading to the p16 product) that generates two entirely different proteins from one gene. Each product has tumor suppressing effects, though via distinct mechanisms.

Regardless of the complex splicing and protein coding characteristics, the authors generated all possible variants in every possible coded amino acid (156 amino acids in all, as both produced proteins are relatively short). Since the primary roles of these proteins are in cell cycle and proliferation control, it was possible to assay function by their effect when expressed in cultured pancreatic cells. A deleterious effect on the protein was revealed as, paradoxically, increased growth of these cells. They found that about 600 of the 3,000 different variants in their catalog had such an effect, or 20%.
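
The round figure of 3,000 can be reconstructed with a short count. This is a sketch under one stated assumption: that the catalog holds exactly the 19 alternative amino acids at each of the 156 positions and nothing else. That assumption is an inference from the total of 2964 variants quoted in the figure caption further down, not something spelled out here, and the count of 600 deleterious variants is the approximate figure from the paragraph above.

```python
# Size of a saturation missense catalog for a 156-residue target.
N_POSITIONS = 156            # coded amino acids assayed (from the text)
N_STANDARD_AMINO_ACIDS = 20  # the standard genetic code

# Assumption: every position is swapped for each of the other 19 amino acids.
alternatives_per_position = N_STANDARD_AMINO_ACIDS - 1
total_missense = N_POSITIONS * alternatives_per_position

DELETERIOUS_APPROX = 600     # "about 600" in the text, a rounded figure
fraction_deleterious = DELETERIOUS_APPROX / total_missense

print(f"{total_missense} possible missense variants")
print(f"{fraction_deleterious:.1%} scored as deleterious")
```

Under that assumption the product comes to 2964, which is where the "3,000" and the roughly 20% come from.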

This is an expected rate of effect, on the whole. Most positions in proteins are not that important, and can be substituted by several similar amino acids. For a typical enzyme, for instance, the active site may be made up of a few amino acids in a particular orientation, and the rest of the protein is there to fold into the required shape to form that active site. Similar folding can be facilitated by numerous amino acids at most positions, as has been richly documented in evolutionary studies of closely-related proteins. These p16 and p14 proteins interact with a few partners, so they need to maintain those key interfacial surfaces to be fully functional. Additionally, the assay these researchers ran, of a few generations of growth, is far less sensitive than a long-term true evolutionary setting, which can sift out very small effects on a protein, so they were setting a relatively high bar for seeing a deleterious effect. They did a selective replication of their own study, and found a reproducibility rate of about 80%, which is not great, frankly.

"Of variants identified in patients with cancer and previously reported to be functionally deleterious in published literature and/or reported in ClinVar as pathogenic or likely pathogenic (benchmark pathogenic variants), 27 of 32 (84.4%) were functionally deleterious in our assay"

"Of 156 synonymous variants and six missense variants previously reported to be functionally neutral in published literature and/or reported in ClinVar as benign or likely benign (benchmark benign variants), all were characterized as functionally neutral in our assay "

"Of 31 VUSs previously reported to be functionally deleterious, 28 (90.3%) were functionally deleterious and 3 (9.7%) were of indeterminate function in our assay."

"Similarly, of 18 VUSs previously reported to be functionally neutral, 16 (88.9%) were functionally neutral and 2 (11.1%) were of indeterminate function in our assay"
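
The percentages in these quotes follow directly from the counts, as a short Python tabulation shows. The counts are copied from the quotes above; the dictionary labels and the helper function are illustrative names, not the authors' terminology.

```python
# Agreement between prior classifications and the assay results.
# Each entry is (variants with the stated assay outcome, variants in the group).
benchmarks = {
    "benchmark pathogenic, assay deleterious": (27, 32),
    "benchmark benign, assay neutral": (156 + 6, 156 + 6),
    "VUS reported deleterious, assay deleterious": (28, 31),
    "VUS reported deleterious, assay indeterminate": (3, 31),
    "VUS reported neutral, assay neutral": (16, 18),
    "VUS reported neutral, assay indeterminate": (2, 18),
}


def percent(hits: int, total: int) -> float:
    """Share of a group, as a percentage rounded to one decimal place."""
    return round(100 * hits / total, 1)


rates = {label: percent(hits, total) for label, (hits, total) in benchmarks.items()}

for label, rate in rates.items():
    print(f"{label}: {rate}%")
```

Note that none of the previously characterized variants flipped outright between deleterious and neutral in the VUS groups; the disagreements there all landed in the indeterminate bin.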

Here we get to the key issues. Variants are generally classified as benign, pathogenic/deleterious, or "variant of unknown/uncertain significance". The latter are particularly vexing to clinical geneticists. The whole point of sequencing a patient's tumor or genomic DNA is to find causal variants that can illuminate their condition, and possibly direct treatment. Seeing lots of "VUS" in the report leaves everyone in the dark. The authors pulled in all the common prediction programs that are officially sanctioned by the ACMG- American College of Medical Genetics, which is the foremost guide to clinical genetics, including the functional prediction of otherwise uncharacterized sequence variants. There are seven such programs, including one driven by AI, AlphaMissense, which is related to the Nobel prize-winning AlphaFold.

These programs strain to classify uncharacterized mutations as "likely pathogenic", "likely benign", or, if unable to make a conclusion, VUS/indeterminate. They rely on many kinds of data, like amino acid similarity, protein structure, evolutionary conservation, and known effects in proteins of related structure. They can be extensively validated against known mutations, and against new experimental work as it comes out, so we have a pretty good idea of how they perform. Thus they are trusted to some extent to provide clinical judgements, in the absence of better data. 

Each of seven programs (on bottom) gives estimations of variant effect over the same pool of mutations generated in this paper. This was a weird way to present simple data, but each bar contains the functional results the authors developed in their own data (numbers at the bottom, in parentheses, vertical). The bars were then colored with the rate of deleterious (black) vs benign (white) prediction from the program. The ideal case would be total black for the first bar in each set of three (deleterious) and total white in the third bar in each set (benign). The overall lineup/accuracy of all program predictions vs the author data was then overlaid by a red bar (right axis). The PrimateAI program was specially derived from comparison of homologous genes from primates only, yielding a high-quality dataset about the importance of each coded amino acid. However, it only gave estimates for 906 out of the whole set of 2964 variants. On the other hand, cruder programs like PolyPhen-2 gave less than 40% accuracy, which is quite disappointing for clinical use.

As shown above, the algorithms gave highly variable results, from under 40% accurate to over 80%. It is pretty clear that some of the lesser programs should be phased out. Of programs that fielded all the variants, the best were AlphaMissense and VEST, which each achieved about 70% accuracy. This is still not great. The issue is that, if a whole genome sequence is run for a patient with an obscure disease or syndrome, and variants vs the reference sequence are seen in several hundred genes, then a gene like CDKN2A could easily be pulled into the list of pathogenic (and possibly causal) variants, or be left out, on very shaky evidence. That is why even small increments in accuracy are critically important in this field. Genetic testing is a classic needle-in-a-haystack problem- a quest to find the one mutation (out of millions) that is driving a patient's cancer, or a child's inherited syndrome.
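
To put rough numbers on the haystack problem, here is a hedged sketch of how many calls would be wrong at each accuracy level. It rests on loud assumptions that go beyond the paper: a hypothetical patient with 300 uncharacterized variants (one reading of "several hundred genes"), accuracy figures measured on CDKN2A carrying over to other genes, and every call being equally likely to be wrong. The accuracy values are the round numbers from the paragraph above.

```python
# Expected number of wrong pathogenicity calls in one hypothetical genome report.
N_VARIANTS = 300  # hypothetical count of uncharacterized variants in a patient


def expected_miscalls(n_variants: int, accuracy: float) -> float:
    """Expected wrong calls if each call is right with probability `accuracy`."""
    return n_variants * (1.0 - accuracy)


# Round accuracy levels quoted in the text; labels are for display only.
accuracy_levels = [
    ("weakest programs, about 40%", 0.40),
    ("AlphaMissense and VEST, about 70%", 0.70),
    ("best reported, about 80%", 0.80),
]

for label, accuracy in accuracy_levels:
    wrong = expected_miscalls(N_VARIANTS, accuracy)
    print(f"{label}: roughly {wrong:.0f} of {N_VARIANTS} calls wrong")
```

Even at the better accuracy levels, the expected number of wrong calls dwarfs the one causal variant being sought, which is why each increment in accuracy matters.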

Still outstanding is the issue of non-coding variants. Genes are not just affected by mutations in their protein coding regions (indeed many important genes do not code for proteins at all), but by regulatory regions nearby and far. This is a huge area of mutation effects that are not really algorithmically accessible yet. As a prediction problem, it is far more difficult than predicting effects on a coded protein. It will require modeling of the entire gene expression apparatus, much of which remains shrouded in mystery.