Saturday, February 4, 2023

How Recessive is a Recessive Mutation?

Many relationships exist between mutation, copy number, and phenotype.

The traditional setup of Mendelian genetics is that an allele of a gene is either recessive or dominant. Blue eyes are recessive to brown eyes, for the simple reason that blue arises from the absence of an enzyme, due to a loss of function mutation. So having some of that enzyme, from even one "brown" copy of that gene, is dominant over the defective "blue" copy. You need two "blue" alleles to have blue eyes. This could be generalized to most genes, especially essential genes, where lacking both copies is lethal, while having one working copy will get you through, and cover for a defective copy. Most gene mutations are, by this model, recessive. 

But most loci and mutations implicated in disease don't really work like that. Some recent papers delved into the genetics of such mutations, and observed that their recessiveness was all over the map, a spectrum, really, of effects from fully recessive to dominant, with most in the middle ground. This is informative for clinical genetics, but also for evolutionary studies, suggesting that evolution is not, after all, blind to the majority of mutations, which are mostly deleterious, exist most of the time in the heterozygous (one-copy) state, and would be wholly recessive by the usual assumption.

The first paper describes a large study over the Finnish population, which benefited from several advantages. Finns have a good health system with thorough records which are housed in a national biobank. The study used 177,000 health records and 83,000 variants in coding regions of genes collected from sequencing studies. Second, the Finnish population is relatively small and has experienced bottlenecks from smaller founding populations, which amplifies the prevalence of variants that those founders had. That allows those variants to rise to higher rates of appearance, especially in the homozygous state, which generally causes more noticeable disease phenotypes. Both the detectability and the statistics were powered by this higher incidence of some deleterious mutations (while others, naturally, would have been more rare than the world-wide average, or absent altogether).

Thirdly, the authors emphasize that they searched for various levels of recessive effect, which is contrary to the usual practice of just assuming a linear effect. A linear model says that one copy of a mutation has half the effect of two copies- which is true sometimes, but not most of the time, especially in more typical cases of recessive effect where one copy has a good deal less effect, if not zero. Returning to eye color, if one looks in detail, there are many shades of eyes, even of blue eyes, so it is evident that the alleles that affect eye color are various, and express to different degrees (have various penetrance, in the parlance). While complete recessiveness happens frequently, it is not the most common case, since we generally do not routinely express excess amounts of proteins from our genes, making loss of one copy noticeable most of the time, to some degree. This is why the lack of a whole chromosome, or an excess of a whole chromosome, has generally devastating consequences. Trisomies in only three chromosomes are viable (that is, not lethal), and confer various severe syndromes.
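To make the idea of a recessiveness spectrum concrete, one can ask what fraction of the two-copy effect already shows up with one copy. The sketch below is only an illustration, not the method used in the paper; the function name and the risk numbers are made up.

```python
def dominance_coefficient(effect_0, effect_1, effect_2):
    """Fraction of the two-copy effect already seen with one copy.

    0.0 means fully recessive, 0.5 means linear (additive),
    and 1.0 means fully dominant.
    """
    full_effect = effect_2 - effect_0
    if full_effect == 0:
        raise ValueError("two copies show no effect; dominance is undefined")
    return (effect_1 - effect_0) / full_effect


# Made-up disease risks for people with 0, 1, and 2 copies of a variant.
linear = dominance_coefficient(0.02, 0.11, 0.20)            # one copy gives half the effect
mostly_recessive = dominance_coefficient(0.02, 0.04, 0.20)  # one copy gives a small effect
print(f"linear case: {linear:.2f}, mostly recessive case: {mostly_recessive:.2f}")
```

A linear model amounts to assuming this number is 0.5 for every variant, which is the assumption the authors set out to relax.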

A population proportion plot vs age of disease diagnosis for three different diseases and an associated genetic variant. In blue is the normal ("wild-type") case, in yellow is the heterozygote, and in red the homozygote with two variant alleles. For "b", the total lack of XPA causes skin cancer with juvenile onset, and the homozygotic case is not shown. The Finnish data allowed detection of rather small recessive effects from variations that are common in that population. For instance, "a" shows the barely discernible advancement of age of diagnosis for a disease (hearing loss) that in the homozygotic state is universal by age 10, caused by mutations in GJB2.

The second paper looked more directly at the fitness cost of variations over large populations, in the heterozygous state. They looked at loss-of-function (LOF) mutations of over 17,000 genes, studying their rate of appearance and loss from human populations, as well as in pedigrees. These rates were turned, by a modeling system, into fitness costs, which are stated in percentage terms, vs wild type. A fitness cost of 1% is pretty mild, (though highly significant over longer evolutionary time), while a fitness cost of 10% is quite severe, and one of 100% is immediately lethal and would never be observed in the population. For example, a mutation that is seen rarely, and in pedigrees only persists for a couple of generations, implies a fitness cost of over 10%.

They come up with a parameter "hs", which is the fitness cost "s" of losing both copies of a gene, multiplied by "h", a measure of the dominance of the mutation in a single copy.
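As a rough illustration of how "h" and "s" combine, here is a minimal sketch using the standard textbook convention for a single gene, where the wild type has fitness 1, the heterozygote 1 - hs, and the double loss 1 - s. I am assuming the paper follows this convention; the function name and the example values are hypothetical.

```python
def genotype_fitness(s, h):
    """Relative fitness of the three genotypes at one gene.

    s: fitness cost of losing both copies (0 = harmless, 1 = lethal).
    h: dominance, the share of that cost paid with a single lost copy.
    """
    hs = h * s
    return {
        "wild_type": 1.0,
        "heterozygote": 1.0 - hs,
        "homozygote": 1.0 - s,
    }


# Hypothetical gene: losing both copies is lethal (s = 1.0), and losing
# one copy carries a tenth of that cost (h = 0.1).
fitness = genotype_fitness(s=1.0, h=0.1)
hs_percent = 100 * (1.0 - fitness["heterozygote"])
print(f"heterozygous fitness cost: {hs_percent:.0f}%")
```

Under this convention, a fully recessive mutation has h = 0 and costs nothing in one copy, while the linear model corresponds to h = 0.5.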


In these graphs, human genes are stacked up in the Y axis sorted by their computed "hs" fitness cost in the heterozygous state. Error bars are in blue, showing that this is naturally a rather error-prone exercise of estimation. But what is significant is that most genes are somewhere on the spectrum, with very few having negligible effects, (bottom), and many having highly significant effects (top). Genes on the X chromosome are naturally skewed to much higher significance when mutated, since in males there is no other copy, and even in females, one X chromosome is (randomly) inactivated to provide dosage compensation- that is, to match the male dosage of production of X genes- which results in much higher penetrance for females as well.


So the bottom line is that while diploidy helps to hide a lot of variation in sexual organisms, and in humans in particular, it does not hide it completely. We are each estimated to receive, at birth, about 70 new mutations, of which 1/1000 are the kind of total loss of gene function studied here. This work then estimates that 20% of those mutations have a severe fitness effect of >10%, meaning that about one in seventy zygotes carries such a new mutation, not counting what it has inherited from its parents, and will suffer ill effects immediately, even though it has a wild-type copy of that gene as well.
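The one-in-seventy figure follows directly from the three numbers quoted above. A quick check of the arithmetic, with variable names of my own choosing:

```python
new_mutations_per_birth = 70   # new mutations per zygote, as quoted above
lof_fraction = 1 / 1000        # share that fully knock out a gene
severe_fraction = 0.20         # share of those with a >10% fitness cost

# Expected number of new, severe loss-of-function mutations per zygote.
severe_lof_per_zygote = new_mutations_per_birth * lof_fraction * severe_fraction

# Since this number is small, it is roughly the chance of carrying one.
one_in = 1 / severe_lof_per_zygote
print(f"{severe_lof_per_zygote:.3f} per zygote, or about 1 in {one_in:.0f}")
```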

Humans, as other organisms, have a large mutational load that is constantly under surveillance by natural selection. The fact that severe mutations routinely still have significant effects in the heterozygous state is both good and bad news. Good in the sense that natural selection has more to work with and can gradually whittle down on their frequency without necessarily waiting for the chance of two meeting in an unfortunate homozygous state. But bad in the sense that it adds to our overall phenotypic variation and health difficulties a whole new set of deficiencies that, while individually and typically minor, are also legion.


Saturday, January 28, 2023

Building the Middle Class

Why are poor people in the US enslaved to tyrannical, immiserating institutions?

Santa Claus brought an interesting gift this Christmas, Barbara Ehrenreich's "Nickel and Dimed". This is a memoir of her experiment as a low wage worker. Ehrenreich is a well-educated scientist, feminist, journalist, and successful writer, so this was a dive from very comfortable upper middle class circumstances into the depths both of the low-end housing market and the minimum wage economy. While she brings a great deal of humor to the story, it is fundamentally appalling, an affront to basic decency. Our treatment of the poor should be a civil rights issue.

The first question is why we have a minimum wage at all. What is the lowest wage that natural economic conditions would bear, and what economic and social principles bear on this bottom economic rung? In ancient times, slavery was common, which meant a wage of zero. This was replicated in the ante-bellum American South- minimum wage of zero. So as far as natural capitalism is concerned, there is no minimum wage needed and people can rather easily be coerced by various social and violent means to work for the barest subsistence. The minimum wage is entirely a political and social concept, designed to express a society's ideas of minimal economic, civic, and social decency. Maybe that is why, as with so many other things, the US reached a high point in its real minimum wage in the late 1960's, 66% higher than what it is now.

Real minimum wage in the US, vs nominal.

The whole economy of low wage work is very unusual. One would think that supply and demand would operate here, and that difficult work would be rewarded by higher pay. But it is precisely the most difficult work- the most grinding, alienating, dispiriting work that is paid least. There is certainly an education effect on pay, but the social structure of low end work is mostly one of power relations, where desperate people are faced with endlessly greedy employers, who know that the less they pay, the more desperate their workers will be to get even that little amount. It is remarkable what we have allowed this sector to do in the name of "free" capitalism- the drug tests, the uniforms, the life-destroying scheduling chaos, the wage theft, the self-serving corporate propaganda, the surveillance.

Is it a population issue, that there is always an excess of low-wage workers? I think it is really the other way around, that there is a highly flexible supply of low-wage work, thanks to the petty-tyrannical spirit of "entrepreneurs". No one needs the eighth fast food restaurant, the fifteenth nail salon, or the third maid cleaning service. We use and abuse low wage labor because it is there, not because these are essential jobs. If a shortage of low-wage workers really starts to crimp an important industry, it has recourse to far more effective avenues of redress, such as importing workers from abroad, outsourcing the work, or if all else fails, automating it. What people are paid is largely a social construct in the minds of us, the society of employers who couldn't imagine paying decently for the work / servitude of others. To show an exception that illustrates the rule, nurses during the pandemic did in some cases, if they were willing to travel and negotiate, make out like bandits. But nurses who stayed put, played by the rules, and truly cared for those around them, were routinely abused, forced into extra work and bad conditions by employers who did not care about them and had .. no choices. In exceptional cases where true need exists, supply and demand can move the needle. But social power plays a very large role.

Some states, such as California, have raised their minimum wage to $15. This is a more realistic wage, though the state has astronomic housing and other costs as well. Has our economy collapsed here? No. It has had zero discernible effect on the provision of local services, and the low wage economy sails on at a new, and presumably more humane, level. When I first envisioned this essay, I thought that a much more substantial increase in the minimum wage would be the proper answer. But then I found that $15 per hour provides an annual income that is almost at the US level of median income, 34k annually for an individual. The average income in the US is only 53k. So there is not a lot of wiggle room there. We are a nation of the poorly paid, on average living practically hand-to-mouth. On the household level, things may look better if one has the luck to have two or more solid incomes.
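For anyone who wants to check the comparison, here is the back-of-envelope arithmetic. The full-time schedule of 40 hours a week for 52 weeks is my assumption; the wage and income figures are the ones quoted above.

```python
hourly_wage = 15.00
hours_per_week = 40       # assumed full-time schedule
weeks_per_year = 52       # assumed year-round work, no unpaid time off

annual_income = hourly_wage * hours_per_week * weeks_per_year

median_individual_income = 34_000   # figure quoted above
mean_individual_income = 53_000     # figure quoted above

share_of_median = annual_income / median_individual_income
share_of_mean = annual_income / mean_individual_income
print(f"${annual_income:,.0f} per year: "
      f"{share_of_median:.0%} of the median, {share_of_mean:.0%} of the average")
```

Fewer hours or unpaid weeks would lower the annual figure, so this is closer to a ceiling than a typical outcome.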


My own individual incomes analysis, drawn from reported Social Security data.

At any rate, a livable wage is not much different from the median wage, and even that is too low in many economically hot areas where real estate is unbearably expensive. This is, incidentally, another large dimension of US poverty, that the stand-pat, NIMBY, no-growth zoning practices of what is now a majority of the country have sentenced the poor and the young to an even lower standard of living than what the income statistics would indicate, as they fork over their precious earnings to the older, richer, and socially settled landlords among us.

So what is the answer? I would advocate for a mix of deep policy change. First is a minimum wage that is livable, which means $15 nationwide, indexed for inflation, and higher as needed in more high-cost states. It should be a basic contract with the citizenry and workers of all types that working should pay decently, and not send you to a food pantry. All those jobs and businesses that can not survive without poorly paid workers... we don't need them. Second would be a government employer of last resort system that would offer a job to anyone who wants one. This would be paid at the minimum wage, and put people to work doing projects of public significance- cleaning up roadways, building schools, offering medical care, checkups, crossing guards, etc. We can, as a society and as civil governments, do a better job employing the poor in a useful way than can the much-vaunted entrepreneurs. Instead of endless strip malls of bottom-feeding commerce, let local governments sweep up available labor for cleaning the environment, instead of fouling it. Welfare should be, instead of a demeaning odyssey through DMV-like bureaucracies, a straight payment to anyone not employed, at half the minimum wage.

Third, we need more public services. Transit should be totally free. Medical care should be completely free. Education should be free. And incidentally, secondary education should be all public, with private schools up to 12th grade banned. When we wonder why our country and politics have become so polarized, a big reason is the physical and spiritual separation between the rich and poor. While the speaker in the video linked below advocates for free housing as well, that would be perhaps a bridge too far, though housing needs to be addressed urgently by forcing governments to zone for their actual population and taking homelessness as a policy-directing index of the need to zone and build more housing.

Fourth, the rich need to be taxed more. The corrosion of our social system is not only evident at the bottom where misery and quasi-slavery is the rule, but at the top, where the rich contribute less and less to positive social values. The recent Twitter drama showed in an almost mythical way the incredible narcissism and callous ethics that pervade the upper echelons (... if the last administration hadn't shown this already). The profusion of philanthropies is mere performative narcissism and white-washing, while the real damage is being done by the flood of money that flows from the rich into anti-democratic and anti-government projects across the land.

And what is all this social division accomplishing? It is not having any positive eugenic effect, if one takes that view of things. Reproduction is not noticeably affected, despite the richness at the top or the abject poverty at the bottom. It is not having positive social effects, as the rich wall themselves off with increasingly hermetic locations and technologies. They thought, apparently, that cryptocurrencies would be the next step of unshackling the Galtian entrepreneurs of the world from the oppression of national governments. Sadly, that did not work out very well. The rich can not be rich without a society to sponge off. The very idea of saving money presupposes an ongoing social and economic system from which that money can be redeemed by a future self. Making that future society (not to mention the future environment) healthy and cohesive should be our most fervent goal.


Sunday, January 22, 2023

One Tough Molecule: Cholesterol

In praise of cholesterol.

Membranes are an underappreciated aspect of biology. The recent pandemic was caused by a virus that has a very sophisticated system to commandeer many aspects of our cellular apparatus, including our membrane systems, creating complicated vesicular bodies in which to develop and hide. Membranes may not have participated in the very origin of life, (which seems to have involved energy-rich mineral systems), but were essential at the origin of cells, as all cells are surrounded by a classic bilayer membrane, composed of two-faced molecules with water-soluble heads and fatty tails, the latter of which make up the middle of the bilayer.

Membranes everywhere. Eukaryotic cells are filled with membrane-bound compartments. Here, Covid-causing virus (black arrows) hides out in vesicles enclosed within additional membranes. These are post-mortem samples, examined by electron microscopy. In E, from lung cells, asterisks mark the presence of viral particles, while the number sign marks another lamellar structure of membranes involved in lung surfactant synthesis and secretion.

Membranes were also central to the next greatest innovation in life, the eukaryotic cell. Not only are eukaryotes full of membrane-bound compartments, like mitochondria, endosomes, lysosomes, endoplasmic reticulum, Golgi apparatus, and others, but their membrane composition changed as well, with the advent of sterol-related molecules. Plants use phytosterols, while animals use cholesterol as an additive to their membranes. Cholesterol has gotten decades of bad press due to its association with atherosclerosis and the whole bad/good (LDL/HDL) story, about the particles that carry cholesterol around the body. But cholesterol is an essential and amazing molecule, painstakingly developed through evolution to strengthen our membranes and provide special nano-localization services.

Cholesterol (right) compared with a normal phospholipid that makes up the bulk of most membranes. Hydrophilic areas are in red/purple/blue, while hydrophobic areas are gray. The phospholipid is sphingomyelin, which appears to be fully saturated, meaning it has no double bonds or kinks in its hydrophobic tails. These on their own tend to be highly floppy, while cholesterol is far more structurally stable.

Cholesterol is a shockingly complex and expensive molecule to make. Its synthesis requires 37 steps, lots of molecular oxygen, and a hundred molecules of ATP. No wonder few bacteria make anything like it in such vast amounts. At the same time, there must be simpler chemicals that could afford similar functions- cholesterol is probably a relic from a lengthy exploration of membrane additives, to find one that is empirically ideal. Historically, cholesterol seems to have arisen after the general oxygenation event, enabling its peculiar synthesis, the symbiosis with mitochondria, and the evolution of eukaryotes generally. Our cells can still all make their own cholesterol, and our bodies have extensive means to regulate amounts, though evidently these mechanisms don't always work optimally for modern, aging humans. 

At any rate, it is now realized that dietary cholesterol has relatively little impact on internal levels or health outcomes. In our cells, cholesterol concentrations are rigorously controlled and highly diverse, being as high as ~40% of all lipids on the external face of the plasma membrane, while only 5% in the mitochondrial membranes. The reasons for this distribution are not entirely understood, but our genomes encode numerous proteins devoted to transferring cholesterol and phospholipids to various places and sides of membranes. A recent paper discussed the fact that cholesterol significantly strengthens membranes, allowing eukaryotes to attain the amoeboid lifestyle, rather than having to grow exoskeletons (i.e. cell walls) as bacteria generally do. 

Cholesterol makes membranes significantly stronger, less bendable, more viscous, and yet does not impair lateral fluidity.

The surface area per lipid goes down drastically (and strength and stiffness go up) as cholesterol is added to a regular phospholipid membrane. This is less meaningful than portrayed in the paper, however, since cholesterol counts as a lipid in this calculation, and with only one fat tail vs two slender tails, it is likely that the reduction in surface area arises as much from cholesterol's smaller cross-section (see cartoon above) as from its organizing / ordering effects on the neighboring phospholipids. 

Not only does it make membranes tougher, but it alters their thickness (by straightening up the phospholipid tails) and selectively prefers to bind certain partner phospholipids (sphingolipids), thereby creating nano-domains. These domains are called "lipid rafts" and at 50 nanometers across, they are exceedingly small, given that membranes are about 5 nanometers thick. These rafts are the preferred places for many hormone and immune system receptors to operate, which, when bound to their partners, lead to greater raft agglomerations that facilitate signaling and particularly the separation of some signals from others. This is just one example of the many roles that cholesterol has gained in cell and molecular biology.

Some reviewers note that while we often imagine nano-tech and nano-bots to be machines of metal, essentially miniaturized versions of our macro-tech, with tiny gears, etc., real nano tech may more properly lie in soft materials that are resilient at this scale, adapted to its challenges of constant thermal motion and mutable structure. Reeds that bend in the wind, not rocks that slowly break down in it. Membranes are being used in the form of liposomes as drug and vaccine delivery vehicles, and deserve a greater appreciation from both biological and technical perspectives.

This video, produced by detailed atomic computer simulation, illustrates how frenetic Brownian motion is. The membrane molecules (teal) are in constant motion, fending off the water molecules (red/white). The adoption of a second membrane component that intercalates, strengthens, and imposes some order here is a highly significant advancement.


  • Maybe giving in to nuclear bluffing and blackmail is not a good idea.

Saturday, January 14, 2023

Evolution of Dogs, and Dog Brains

Deeper genetic studies of the history of dogs reveal causal genes and pathways.

Do traits run in families? Are mental and behavioral attributes heritable? Of course they are, though well-intentioned liberals tend to argue otherwise, that everyone is the same by nature, and education, social services, and perhaps psychotherapy are the only things holding anyone back from limitless potential. Well, there is a place for both nurture and nature, but plain observation and mountains of science, such as twin studies, show that nature plays a dominant role, especially in relatively stable societies where nurture is not grossly deficient. While plenty of evidence exists for this in humans, it is particularly evident in model animals, such as those we have bred to have certain dispositions, like dogs. 

A recent landmark study on the genetics of dogs delves into some of the genetic and molecular detail of these traits. The authors find clear lineage differences between groups of dogs bred for different purposes, and dredge up telling details about where those differences lie in the dog genome. First off, they have a wealth of data to draw from- full genomes sequenced for hundreds of dogs, and mutation variation panels for many more. They claim data from 4,261 individual dogs and 226 breeds, running the gamut from pure bred to village mutts. Wild dogs, wolves and coyotes were also added as outgroup references.

The second big advance was to use a highly refined method of data reduction. The scale of this data is huge, so how does one pull the needles of meaningful, breed- or trait-correlated variation from the haystack of background variation? Most of the variation they find was already present in wolves, meaning that while some new mutations occurred during domestication, humans mostly spent their time selecting desirable combinations out of a very rich trove of natural variation already present from the start. The traditional way to do this is by principal component analysis (PCA), which treats the data as points in a high dimensional space, finds the two orthogonal axes along which the data vary the most, and projects the data onto those two axes for visualization.
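For readers who want to see what PCA amounts to, here is a minimal sketch in plain Python. It finds the two axes of greatest variance by power iteration on the covariance matrix, which is one simple way to do it and not necessarily how any genomics package does. The toy data and all function names are invented for illustration and have nothing to do with the dog data.

```python
def center(rows):
    """Subtract each column's mean so the data cloud sits on the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_axis(matrix, iterations=500):
    """Power iteration: returns (variance, unit vector) for the dominant axis."""
    d = len(matrix)
    # Fixed starting vector; assumes it is not orthogonal to the dominant axis.
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(a * b for a, b in zip(v, mv)), v


def pca_2d(rows):
    """Project rows onto their two orthogonal axes of greatest variance."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    axes = []
    for _ in range(2):
        variance, axis = top_axis(cov)
        axes.append(axis)
        # Deflation: remove the axis just found so the next one is orthogonal.
        cov = [[cov[i][j] - variance * axis[i] * axis[j] for j in range(d)]
               for i in range(d)]
    projected = [[sum(x * a for x, a in zip(r, axis)) for axis in axes]
                 for r in centered]
    return projected, axes


# Toy data: 12 samples with 3 measurements each (made up).
toy = [[float(t), 0.5 * t + (1.0 if t % 2 else -1.0), 0.05 * (t % 3)]
       for t in range(12)]
projected, axes = pca_2d(toy)
print(projected[0], axes[0])
```

Each sample ends up as a pair of coordinates that can be put on a scatter plot. Because the axes are straight lines through the data, structure that curves or branches in the original space can get flattened out.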

That is pretty simple, and crude, and a recent paper showed that a more sensitive way (named PHATE) to explore high dimensional data is able to uncover far more structure from it. It is just the kind of thing that these genomic scientists needed to wring more meaning from their huge data set.

Comparison of different dimensional reduction methods, from the same data set, in this case gene expression from embryonic cell types. One can easily see that PCA analysis is far less effective in revealing structure than is the newer PHATE technique.

This method, used over the dog data, yielded extremely clear differentiation between the major lineages, such as herding dogs vs retrievers vs scent hounds vs pointing dogs. As expected, the mutts, village dogs, and wolves clustered near the middle, not having traveled very far from the ancestral condition (except for one ramification along with "sight hounds", like greyhounds and other hunters, shared with Middle Eastern village dogs). Conversely, lineages like terriers formed a clearly separated path from the ancestral condition to more exquisitely bred extremes, at the ends of the distribution. Incidentally, their geographic view of this data showed that the ends of their distributions consistently were occupied by dogs bred in Britain, stemming from the virtual mania for animal husbandry and breeding (not to say eugenics) prevalent in Victorian times. Darwin was fascinated by this as well, devoting much of his "Origin" to the variation and breeding of pigeons.

Structured differences found in the genomic and other variation data gathered from thousands of dogs, of hundreds of breeds and geographic origins. The genomic data naturally fall into the breeds and types of dogs we are familiar with, while wild and feral dogs tend more to the central, ancestral areas.

This data treatment was not just done for visual clarity, but provided the clean classification that these authors could then use to search for the differentiating mutations in genomes separated by these breeding histories. They also do a bit of psychoanalysis, correlating the various lineages with major trait dimensions, such as trainability, aggressiveness, predatory drive, fear, and energy. This helped to give some rationale to aspects that various lineages might share, despite their separation in the main axes. For example, terriers had high levels of predatory chasing, while herders showed high levels of fear. This just buttresses the point that the dimensional reduction analysis (done on genomes) uncovered real dimensions of dog mentality, not just labeled by conventional breed types, but also by correlation with imputed general traits. What was the headline of this lineage analysis?

"Lineage-associated variants are largely non-coding regions implicated in neurodevelopment"

There are two very interesting aspects to unpack here. First is that the vast majority of the mutations (aka variants) were non-coding. They state that of 16,250 variants that passed some threshold of statistical significance with regard to lineage divergences, only 76 were protein coding changes with any significant impact. So instead of changing proteins being made in the body, the story is one of control- the regulation over where, when, and how much of these proteins gets made. This is significant, as many genetic tests for humans are still focused on what is called the "exome", which is to say, the protein-coding parts of our genomes, where certainly many devastating mutations exist.  But it isn't where the vast majority of interesting variations occur, either for disease or particularly for normal trait variation. Those happen in the far larger and murkier regions around each gene that are strung with regulatory control sites. Mutations there can have very subtle effects.

Secondly, of course, is that they found brain and neural development genes to dominate the analysis. This only makes sense for our breeding efforts, which have had to firstly tame what was once a wolf, and then develop its talents in very particular, and sometimes peculiar directions. For instance, they note that scent / blood hounds have relatively low trainability, since they were bred to lead the way and follow their noses, not so much their humans. While the official dog shows focus on looks, coats, and colors, the much harder, and more significant job has clearly been to remake the mind of the dog to serve us. Nothing shows this more clearly than the border collie and related herders, whose ability to work with experienced handlers on difficult tasks is legendary.

The figure below gives an overview of what they found. At the top is the dog genome, with scoring of differential herding dog variants on the Y axis. Highlighted in green are genes that are mentioned below (panel C) as being quite densely involved in neural development and maintenance. Many of these are indeed very highly scoring in the genome graph, but others are less so. The authors are evidently being quite selective in calling out genes of interest, and there are many genes at least equally significant that are not being discussed. For instance, while there are by my count about 50 genes that rise to the "10" level in the graph, only seven or eight were called out for presentation in this neural pathways collection. And there are easily hundreds if not a thousand that satisfy the "5" level in the graph, making the selection of genes like SRGAP3, which has a score in this range, somewhat willful.

Distinctive variations of sheepdogs are heavily involved in brain development, with a selection illustrated at bottom. At top is a graph of dispersion scores vs genomic location, with some genes involved in neural function called out (green). In the middle, a few of these genes are blown up to show that the variants do not generally occur in the coding regions of these genes, but in surrounding regulatory areas. At bottom is shown an overlay of the genes found and called out above, laid over an independently curated/assembled diagram depicting molecular details of neuronal guidance, from KEGG.

At any rate, the middle panel of this diagram provides a few magnified examples of where the variations are relative to the coding regions of their respective genes. The coding regions are depicted at top with an arrow showing the start of transcription, and tiny vertical lines showing each "protein-coding" exon fragment, interspersed with large non-coding introns. Clearly the variations are clustered in the regulatory regions near, but not in, these genes.

And at bottom is a curated pathway, assembled from huge amounts of work from many labs, of some molecular aspects of axon guidance- the process by which neurons send axons out from where they start in embryogenesis to the targets, sometimes very far away in the brain, where they synapse with other neurons to make up our (or here the dog's) brain anatomy. The concentration of relevant variations in such genes speaks volumes about what has been going on in this process of rather rapid, directed evolution. The domestication of dogs is thought to have begun, very roughly, about 30 thousand years ago. The speed of this process and its resulting variety suggest (as it did to Darwin, and countless others) that evolution by natural selection has had plenty of time to work the biological wonders we see around us.


  • Somewhat boring lecture on axon guidance mechanisms that allow organized brain development and maintenance.
  • Social capital and social climbing.
  • Eugenics, Israeli-style.
  • Brothers at arms.
  • Yes, genes can arise from junk DNA. And they are important genes.

Saturday, January 7, 2023

A New Way of Doing Biology

Structure prediction of proteins is now so good that computers can do a lot of the work of molecular biology.

There are several royal roads to knowledge in molecular biology. First, and most traditional, is purification and reconstitution of biological molecules and the processes they carry out, in the test tube. Another is genetics, where mutational defects, observed in whole-body phenotypes or individually reconstituted molecules, can tell us about what those gene products do. Over the years, genetic mapping and genomic sequencing allowed genetic mutations to be mapped to precise locations, making them increasingly informative. Likewise, reverse genetics became possible, where mutational effects are not generated randomly by chemical or radiation treatment of organisms, but are precisely engineered to find out what a chosen mutation in a chosen molecule could reveal. Lastly, structural biology contributed the essential ground truth of biology, showing how detailed atomic interactions and conformations lead to the observations made at higher levels- such as metabolic pathways, cellular events, and diseases. The paradigmatic example is DNA, whose structure immediately illuminated its role in genetic coding and inheritance.

Now the protein structure problem has been largely solved by the newest generations of artificial intelligence, allowing protein sequences to be confidently modeled into the three dimensional structures they adopt when mature. A recent paper makes it clear that this represents not just a convenience for those interested in particular molecular structures, but a revolutionary new way to do biology, using computers to dig up the partners that participate in biological processes. The model system these authors chose to show this method is the bacterial protein export process, which was briefly discussed in a recent post. They are able to find and portray this multi-step process in astonishing detail by relying on a lot of past research including existing structures and the new AI searching and structure generation methods, all without dipping their toes into an actual lab.

The structure revolution has had two ingredients. First is a large corpus of already-solved structures of proteins of all kinds, together with oceans of sequence data of related proteins from all sorts of organisms, which provide a library of variations on each structural theme. Second are the modern neural networks from Google and other institutions that have solved so many other data-intensive problems, like language translation and image matching / searching. They are perfectly suited to this problem of "this thing is like something else, but not identical". This resulted in the AlphaFold program, which has pretty much solved the problem of determining the 3D structure of novel protein sequences.

"We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14), demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods."

The current authors realized that the determination of protein structures is not very different from the determination of complex structures- the structure of interfaces and combinations between different proteins. Many already-solved structures are complexes of several proteins, and more fundamentally, the way two proteins interact is pretty much the same as the way that a protein folds on itself- the same kinds of detailed secondary motif and atomic complementarity take place. So they used the exact AlphaFold core to create AF2Complex, which searches specifically through a corpus of protein sequences for those that interact in real life.

This turned out to be a very successful project, (though a supercomputer was required), and they now demonstrate it for the relatively simple case of bacterial protein export. The corpus they are working with is about 1500 E. coli periplasmic and membrane proteins. They proceed step by step, asking what interacts with the first protein in the sequence, then what interacts with the next one, etc., till they hit the exporter on the outer membrane. While this sequence has been heavily studied and several structures were already known, they reveal several new structures and interactions as they go along. 
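The step-by-step search described above can be pictured as a simple loop: take a query protein, score it against everything else in the corpus, keep the best hits, and then use a hit as the next query. The following is only a toy sketch of that idea, not the authors' AF2Complex pipeline. The `score_pair` function and the score table are hypothetical stand-ins for a structure-based interface score, and the numbers are invented purely for illustration.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical toy scores standing in for a predicted interface score.
# These values are invented for illustration; they are not from the paper.
TOY_SCORES: Dict[FrozenSet[str], float] = {
    frozenset({"SecY", "PpiD"}): 0.8,
    frozenset({"PpiD", "YfgM"}): 0.9,
    frozenset({"PpiD", "DsbA"}): 0.6,
    frozenset({"SurA", "BamA"}): 0.7,
}


def score_pair(a: str, b: str) -> float:
    """Hypothetical scorer: look up a toy score, 0.0 if the pair is unknown."""
    return TOY_SCORES.get(frozenset({a, b}), 0.0)


def rank_partners(
    query: str,
    corpus: List[str],
    scorer: Callable[[str, str], float],
    cutoff: float = 0.5,
) -> List[Tuple[str, float]]:
    """Score the query against every other protein; return hits above cutoff, best first."""
    hits = [(p, scorer(query, p)) for p in corpus if p != query]
    hits = [(p, s) for p, s in hits if s >= cutoff]
    return sorted(hits, key=lambda h: h[1], reverse=True)


corpus = ["SecY", "PpiD", "YfgM", "DsbA", "SurA", "BamA"]
for query in ["SecY", "PpiD", "SurA"]:
    print(query, "->", rank_partners(query, corpus, score_pair))
```

In the real work, each pairwise score would presumably cost a full structure prediction, which is why a corpus of about 1500 proteins already called for a supercomputer.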

Getting proteins from inside the cell to outside is quite complicated, since they have to traverse two membranes and the intermembrane space, (periplasm), all without getting fouled up or misdirected. This is done by an organized sequence of chaperone and transport proteins that hand the new proteins off to each other. Proteins are recognized by this machinery by virtue of sequence-encoded signals, typically at their front/leading ends. This "export signal" is recognized, in some instances, right as it comes out of the ribosome and captured by the SecA/B/E/Y/G machinery at the inner bacterial membrane. But most exported proteins are not recognized right away, but after they are fully synthesized.

The inner membrane (IM) is below, and the outer membrane (OM) is above, showing the steps of bacterial protein export to the outer membrane. The target protein being transported is the yellow thread, (OmpA), and the various exporting machines are shown in other colors, either in cartoon form or in ribbon structures from the authors' computer predictions. Notably, SurA is the main chaperone that carries OmpA in partially unfolded form across the periplasm to the outer membrane.

SecA is the ATP-using pump that forces the new protein through the SecY channel, which has several other accessory partners. SecB, for example, is thought to be mostly responsible for recognizing the export signal on the target protein. The authors start with a couple of accessory chaperones, PpiD and YfgM, which were strongly suspected to be part of the SecA/B/E/Y/G complex, and which their program easily identifies as interacting with each other, and gives new structures for. PpiD is an important chaperone that helps proline amino acids twist around, (a proline isomerase), which they do not naturally do, helping the exporting proteins fold correctly as they emerge. It also interacts with SecY, providing chaperone assistance (that is, helping proteins fold correctly) right as proteins pass out of SecY and into the periplasm. The second step the authors take is to ask what interacts with PpiD, and they find DsbA, with its structure. This is a disulfide isomerase, which performs another vital function of shuffling the cysteine bonds of proteins coming into the periplasmic space, (which is less reducing than the cytoplasm), and allows stable cysteine bonds to form. This is one more essential chaperone-kind of function needed for relatively complicated secreted proteins. Helping them form at the right places is the role of DsbA, which transiently docks right at the exit port from SecY. 

The authors' (computers) generate structures for the interactions of the Sec complex with PpiD, YfgM, and the disulfide isomerase DsbA, illuminating their interactions and respective roles. DsbA helps refold proteins right when they come out of the transporter pore, from the cytoplasm.

Once the target protein has all been pumped through the SecY complex pore, it sticks to PpiD, which does its thing and then dissociates, allowing two other proteins to approach, the signal peptidase LepB, which cleaves off the export signal, and then SurA, which is the transporting chaperone that wraps the new protein around itself for the trip across the periplasm. Specific complex structures and contacts are revealed by the authors for all these interactions. Proteins destined for the outer membrane are characterized by a high proportion of hydrophobic amino acids, some of which seem to be specifically recognized by SurA, to distinguish them from other proteins whose destination is simply to swim around in the periplasm, such as the DsbA protein mentioned above. 

The authors' (computers) spit out a ranking of predicted interactions using SurA as a query, and find SurA itself as one protein that interacts (it forms a dimer), and also BamA, which is the central part of the outer membrane transporting pore. Nothing was said about the other high-scoring interacting proteins identified, which may not have had immediate interest.

"In the presence of SurA, the periplasmic domain [of transported target protein OmpA] maintains the same fold, but remarkably, the non-native β-barrel region completely unravels and wraps around SurA ... the SurA/OmpA models appear physical and provide a hypothetical basis for how the chaperone SurA could prevent a polypeptide chain from aggregating and present an unfolded polypeptide to BAM for its final assembly."

At the other end of the journey, at the outer membrane, there is another channel protein called BamA, where SurA docks, as was also found by the authors' interaction hunting program. BamA is part of a large channel complex that evidently receives many other proteins via its other periplasmic-facing subunits, BamB, C, and D. The authors went on to do a search for proteins that interact with BamA, finding BepA, a previously unsuspected partner, which, by their model, wedges itself in between BamC and BamB. BepA, however, turns out to have a crucial function in quality control. Conduction of target proteins through the Bam complex seems to be powered only by diffusion, not by ATP or ion gradients. So things can get fouled up and stuck pretty easily. BepA is a protease, and appears, from its structure, to have a finger that gets flipped and turns the protease on when a protein transiting through the pore goes awry / sideways.


The authors' (computers) provide structures of the outer membrane Bam complex, where SurA binds with its cargo. The cargo, unstructured, is not shown here, but some of the detailed interface between SurA and BamA is shown at bottom left. The beta-barrel of BamA provides the obvious route out of the cell, or in some cases sideways into the membrane.

While filling in some new details of the outer membrane protein export system is interesting, what was really exciting about this paper was the ease with which this new way of doing biology went forth. Intimate physical interactions among proteins and other molecules are absolutely central to molecular biology, as this example illustrates. To have a new method that not only reveals such interactions in a reliable way, from sequences of novel proteins, but also presents structurally detailed views of them, is astonishing. Extending this to bigger genomes and collections of targets, vs the relatively small 1500 periplasmic-related proteins tested here, remains a challenge, but doubtless one that more effort and more computers will be able to solve.


Saturday, December 31, 2022

Hand-Waving to God

A decade on, the Discovery Institute is still cranking out skepticism, diversion, and obfuscation.

A post a couple of weeks ago mentioned that the Discovery Institute offered a knowledgeable critique of the lineages of the Ediacaran fauna. They have raised their scientific game significantly, and so I wanted to review what they are doing these days, focusing on two of their most recent papers. The Discovery Institute has a lineage of its own, from creationism. It has adapted to the derision that entailed, by retreating to "intelligent design", which is creationism without naming the creators, nailing down the schedule of creation, or providing any detail of how and from where creation operates. Their review of the Ediacaran fauna raised some highly skeptical points about whether these organisms were animals or not. Particularly, they suggested that cholesterol is not really restricted to animals, so the chemical traces of cholesterol that were so clearly found in the Dickinsonia fossil layers might not really mean that these were animals- they might also be unusual protists of gigantic size, or odd plant forms, etc. While the critique is not unreasonable, it does not alter the balance of the evidence which does indeed point to an animal affinity. These fauna are so primitive and distant that it is fair to say that we can not be sure, and particularly we can not be sure that they had any direct ancestral relationship to any later organisms of the ensuing Cambrian period, when recognizable animals emerged.

Fair enough. But what of their larger point? The Discovery Institute is trying to make the point, I believe, about the sudden-ness of early Cambrian evolution of animals, and thus its implausibility under conventional evolutionary theory. But we are traversing tens of millions of years through these intervals, which is a long time, even in evolutionary terms. Secondly, the Ediacaran period, though now represented by several exquisite fossil beds, spanned a hundred million years and is still far from completely characterized paleontologically, even supposing that early true animals would have fossilized, rather than being infinitesimal and very soft-bodied. So the Cambrian biota could easily have predecessors in the Ediacaran that have or have not yet been observed- it is as yet not easy to say. But what we can not claim is the negative, that no predecessors existed before some time X- say the 540 MYA point at the base of the Cambrian. So the implication that the Discovery Institute is attempting to suggest has very little merit, particularly since everything that they themselves cite about the molecular and paleontological sequence is so clearly progressive and in proper time sequence, in complete accord with the overall theory of evolution.

For we should always keep in mind that an intelligent designer has a free hand, and can make all of life in a day (or in six, if absolutely needed). The fact that this designer works in the shadows of slightly altered mutation rates, or in a few million years rather than twenty million, and never puts fossils out of sequence in the sedimentary record, is an acknowledgement that this designer is a bit dull, and bears a strong resemblance to evolution by natural selection. To put it in psychological terms, the institute is in the "negotiation" stage of grief- over the death of god.

Saturday, December 24, 2022

Brain Waves: Gaining Coherence

Current thinking about communication in the brain: the Communication Through Coherence framework.

Eyes are windows to the soul. They are visible outposts of the brain that convey outwards what we are thinking, as they gather in the riches of our visible surroundings. One of their less appreciated characteristics is that they flit from place to place as we observe a scene, never resting in one place. This is called a saccade, and it represents an involuntary redirection of attention all over a visual scene that we are studying, in order to gather high resolution impressions from places of interest. Saccades happen at a variety of rates, centered around 0.1 second. And just as the raster scanning of a TV or monitor can tell us something about how it or its signal works, the eye saccade is thought, by the theory presented below, to represent a theta rhythm in the brain that is responsible for resetting attention- here, in the visual system.

That theory is Communication Through Coherence (CTC), which appears to be the dominant theory of how neural oscillations (aka brain waves) function. (This post is part of what seems like a yearly series of updates on the progress in neuroscience in deciphering what brain waves do, and how the brain works generally.) This paper appeared in 2014, but it expressed ideas that were floating around for a long time, and has since been taken up by numerous other groups that provide empirical and modeling support. A recent paper (titled "Phase-locking patterns underlying effective communication in exact firing rate models of neural networks") offers full-throated support from a computer modeling perspective, for instance. But I would like to go back and explore the details of the theory itself.

The communication part of the theory is how thoughts get communicated within the brain. Communication and processing are simultaneous in the brain, since it is physically arranged to connect processing chains (such as visual processing) together as cells that communicate consecutively, for example creating increasingly abstract representations during sensory processing. While the anatomy of the brain is pretty well set in a static way, it is the dynamic communication among cells and regions of the brain that generates our unconscious and conscious mental lives. Not all parts can be talking at the same time- that would be chaos. So there must be some way to control mental activity to manageable levels of communication. That is where coherence comes in. The theory (and a great deal of observation) posits that gamma waves in the brain, which run from about 30 Hz upwards all the way to 200 Hz, link together neurons and larger assemblages / regions into transient co-firing coalitions that send thoughts from one place to another, precisely and rapidly, insulated from the noise of other inputs. This is best studied in the visual system which has a reasonably well-understood and regimented processing system that progresses from V1 through V4 levels of increasing visual field size and abstraction, and out to cortical areas of cognition.

The basis of brain waves is that neural firing is rapid, and is followed by a refractory period where the neuron is resistant to another input, for a few milliseconds. Then it can fire again, and will do so if there are enough inputs to its dendrites. There are also inhibitory cells all over the neural system, dampening down the system so that it is tuned to not run to epileptic extremes of universal activation. So if one set of cells entrains the next set of cells in a rhythmic firing pattern, those cells tend to stay entrained for a while, and then get reset by way of slower oscillations, such as the theta rhythm, which runs at about 4-8 Hz. Those entrained cells are, at their refractory periods, also resistant to inputs that are not synchronized, essentially blocking out noise. In this way trains of signals can selectively travel up from lower processing levels to higher ones, over large distances and over multiple cell connections in the brain.

An interesting part of the theory is that frequency is very important. There is a big difference between slower and faster entraining gamma rhythms. Ones that run slower than the going rate do not get traction and die out, while those that run faster hit the optimal post-refractory excitable state of the receiving cells, and tend to gain traction in entraining them downstream. This sets up a hierarchy where increasing salience, whether established through intrinsic inputs, or through top-down attention, can be encoded in higher, stronger gamma frequencies, winning this race to entrain downstream cells. This explains to some degree why EEG patterns of the brain are so busy and chaotic at the gamma wave level. There are always competing processes going on, with coalitions forming and reforming in various frequencies of this wave, chasing their tails as they compete for salience.

There are often bidirectional processes in the brain, where downstream units talk back to upstream ones. While originally imagined to be bidirectionally entrained in the same gamma rhythm, the CTC theory now recognizes that the distance / lag in signaling would make this impossible, and separates them as distinct streams, observing that the cellular targets of backwards streams are typically not identical to those generating the forward streams. So a one-cycle offset, with a few intermediate cells, would account for this type of interaction, still in gamma rhythm.

Lastly, attention remains an important focus of this theory, so to speak. How are inputs chosen, if not by their intrinsic salience, such as flashes in a visual scene? How does a top-down, intentional search of a visual scene, or a desire to remember an event, work? CTC posits that two other wave patterns are operative. First is the theta rhythm of about 4-8 Hz, which is slow enough to encompass many gamma cycles and offer a reset to the system, overpowering other waves with its inhibitory phase. The idea is that salience needs to be re-established each theta cycle freshly, (such as in eye saccades), with maybe a dozen gamma cycles within each theta that can grow and entrain necessary higher level processing. Note how this agrees with our internal sense of thoughts flowing and flitting about, with our attention rapidly darting from one thing to the next.
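The "dozen gamma cycles within each theta" figure is easy to check with a little arithmetic. This is a minimal sketch using the frequency bands quoted in this post (theta at about 4-8 Hz, gamma from about 30 to 200 Hz); the 60 Hz value is just an illustrative mid-range pick of mine, not a number from the paper.

```python
THETA_BAND_HZ = (4.0, 8.0)     # theta range quoted in the text
GAMMA_BAND_HZ = (30.0, 200.0)  # gamma range quoted in the text


def period_ms(freq_hz: float) -> float:
    """Length of one oscillation cycle, in milliseconds."""
    return 1000.0 / freq_hz


def gamma_cycles_per_theta(gamma_hz: float, theta_hz: float) -> float:
    """How many gamma cycles fit inside a single theta cycle."""
    return gamma_hz / theta_hz


for theta_hz in THETA_BAND_HZ:
    # 60 Hz is an assumed, illustrative mid-range gamma frequency.
    for gamma_hz in (GAMMA_BAND_HZ[0], 60.0, GAMMA_BAND_HZ[1]):
        n = gamma_cycles_per_theta(gamma_hz, theta_hz)
        print(
            f"theta {theta_hz:g} Hz ({period_ms(theta_hz):g} ms), "
            f"gamma {gamma_hz:g} Hz: {n:g} gamma cycles per theta cycle"
        )
```

With a 4 Hz theta (a 250 ms window), a gamma rhythm near 48 Hz gives exactly twelve cycles per window, so "a dozen" sits toward the slower end of the gamma band; faster gamma rhythms would fit many more.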

"The experimental evidence presented and the considerations discussed so far suggest that top-down attentional influences are mediated by beta-band synchronization, that the selective communication of the attended stimulus is implemented by gamma-band synchronization, and that gamma is rhythmically reset by a 4 Hz theta rhythm."

Attention itself, as a large-scale backward flowing process, is hypothesized to operate in the alpha/beta bands of oscillations, about 8 - 30 Hz. It reaches backward over distinct connections (indeed, distinct anatomical layers of the cortex) from the forward connections, into lower areas of processing, such as locations in the visual scene, or colors sought after, or a position on a page of text. This slower rhythm could entrain selected lower level regions, setting some to have in-phase and stronger gamma rhythms vs other areas not activated in this way. Why the theta and the alpha/beta rhythms have dramatically different properties is not dwelt on by this paper. One can speculate that each can entrain other areas of the brain, but the theta rhythm is long and strong enough to squelch ongoing gamma rhythms and start many off at the same time in a new competitive race, while the alpha/beta rhythms are brief enough, and perhaps weak enough and focused enough, to start off new gamma rhythms in selected regions that quickly form winning coalitions heading upstream.

Experiments on the nature of attention. The stimulus shown to a subject (probably a monkey) is in A. In E, the monkey was trained to attend to the same spots as in A, even though both were visible. V1 refers to the lowest level of the visual processing area of the brain, which shows activity when stimulated (B, F) whether or not attention is paid to the stimulus. On the other hand, V4 is a much higher level in the visual processing system, subject to control by attention. There, (C, G), the gamma rhythm shows clearly that only one stimulus is being fielded.

The paper discussing this hypothesis cites a great deal of supporting empirical work, and much more has accumulated in the ensuing eight years. While plenty of loose ends remain and we can not yet visualize this mechanism in real time, (though faster MRI is on the horizon), this seems the leading hypothesis that both explains the significance and prevalence of neural oscillations, and goes some distance to explaining mental processing in general, including abstraction, binding, and attention. Progress has not been made by great theoretical leaps by any one person or institution, but rather by the slow process of accumulation of research that is extremely difficult to do, but of such great interest that there are people dedicated enough to do it (with or without the willing cooperation of countless poor animals) and agencies willing to fund it.


  • Local media is a different world now.
  • Florida may not be a viable place to live.
  • Google is god.

Saturday, December 17, 2022

The Pillow Creatures That Time Forgot

Did the Ediacaran fauna lead to anything else, or was it a dead end?

While to a molecular biologist, the evolution of the eukaryotic cell is probably the greatest watershed event after the advent of life itself, most others would probably go with the rise of animals and plants, after about three billion years of exclusively microbial life. This event is commonly located at the base of the Cambrian, (i.e. the Cambrian explosion), which is where the fossils that Darwin and his contemporaries were familiar with began, about 540 million years ago. Darwin was puzzled by this sudden start of the fossil record, from apparently nothing, and presciently held (as he did in the case of the apparent age of the sun) that the data were faulty, and that the ancient character of life on earth would leave other traces much farther back in time.

That has indeed proved to be the case. There are signs of microbial life going back over three billion years, and whole geologies in the subsequent time dependent on its activity, such as the banded iron formations prevalent around two billion years ago that testify to the slow oxygenation of the oceans by photosynthesizing microbes. And there are also signs of animal life prior to the Cambrian, going back roughly to 600 million years ago that have turned up, after much deeper investigations of the fossil record. This immediately pre-Cambrian period is labeled the Ediacaran, for one of its fossil-bearing sites in Australia. A recent paper looked over this whole period to ask whether the evolution of proto-animals during this time was a steady process, or punctuated by mass extinction event(s). They conclude that, despite the patchy record, there is enough to say that there was a steady (if extremely slow) march of ecological diversification and specialization through the time, until the evolution of true animals in the Cambrian literally ate up all the Ediacaran fauna. 

Fossil impression of Dickinsonia, with trailing impressions that some think might be a trail from movement. Or perhaps just friends in the neighborhood.
 
For the difference between the Ediacaran fauna and that of the Cambrian is stark. The Ediacaran fauna is beautiful, but simple. There are no backbones, no sensory organs. No mouth, no limbs, no head. In developmental terms, they seem to have had only two embryological cell layers, rather than our three, which makes all the difference in terms of complexity. How they ate remains a mystery, but they are assumed to have simply osmosed nutrients from their environment, thanks to their apparently flat forms. A bit like sponges today. As they were the most complex animals at the time, (and some were large, up to 2 meters long), they may have had an easy time of it, simply plopping themselves on top of rich microbial mats, oozing with biofilms and other nutrients.

The paper provides a schematic view of the ecology at single locations, and also of longer-term evolution, from a sequence of views (i.e. fossils) obtained from different locations around the world of roughly ten million year intervals through the Ediacaran. One noticeable trend is the increasing development or prevalence of taller fern-like forms that stick up into the water over time, versus the flatter bottom-dwelling forms. This may reflect some degree of competition, perhaps after the bottom microbial mats have been over-"grazed". A second trend is towards slightly more complexity at the end of the period, with one very small form (form C (a) in the image below) even marked by shell remains, though what its animal inhabitant looked like is unknown. 

Schematic representation of putative animals observed during the Ediacaran epoch, from early, (A, ~570 MYA, Avalon assemblage), middle, (B, ~554 MYA, White River and other assemblages), and late (C, ~545 MYA, Nama assemblage). The A panel is also differentiated by successional forms from early to mature ecosystems, while the C panel is differentiated by ocean depth, from shallow to deep. The persistence of these forms is quite impressive overall, as is their common simplicity. But lurking somewhere among them are the makings of far more complicated animals.

Very few of these organisms have been linked to actual animals of later epochs, so virtually all of them seem to have been superseded by the wholly different Cambrian fauna- much of which itself remains perplexing. One remarkable study used mass-spec chemical analysis on some Dickinsonia fossils from the late Ediacaran to determine that they bore specific traces of cholesterol, marking them as probable animals, rather than overgrown protists or seaweed. But beyond that, there is little that can be said. (Note a very critical and informed review of all this from the Discovery Institute, of all places.) Their preservation is often remarkable, considering the age involved, and they clearly form the sole fauna known from pre-Cambrian times.

But the core question of how the Cambrian (and later) animals came to be remains uncertain, at least as far as the fossil record is concerned. One relevant observation is that there is no sign of burrowing through the sediments of the Ediacaran epoch. So the appearance of more complex animals, while it surely had some kind of precedent deep in the Ediacaran, or even before, did not make itself felt in any macroscopic way then. It is evident that once the triploblastic developmental paradigm arose, out of the various geologic upheavals that occurred at the bases of both the Ediacaran and the Cambrian, its new design including mouths, eyes, spines, bones, plates, limbs, guts, and all the rest that we are now so very familiar with, utterly over-ran everything that had gone before.

Some more fine fossils from Canada, ~ 580 MYA.


  • A video tour of some of the Avalon fauna.
  • An excellent BBC podcast on the Ediacaran.
  • We need to measure the economy differently.
  • Deep dive on the costs of foreign debt.
  • Now I know why chemotherapy is so horrible.
  • Waste on an epic scale.
  • The problem was not the raids, but the terrible intelligence... by our intelligence agency.

Saturday, December 10, 2022

Mechanics of the ATP Synthesizing Machine

ATP synthase is a generator with two rotors, just like any other force-transducing generator.

Protein structural determination has progressed tremendously, with the advent of cryo-electron microscopy which allows much faster determinations of more complex structures than previously. One beneficiary is the enzyme at the heart of the mitochondrion that harnesses the proton motive force (pmf; difference of pH and charge across the inner mitochondrial membrane) to make ATP. The pmf is created by the electron transport chains of respiration, powered by the breakdown of our food, and ATP is the most general currency of energy in our cells. And in bacteria as well. The work discussed today was all done using E. coli, which in this ancient and highly conserved respect is a very close stand-in for our own biology.

The ATP synthase is a rotary device. Just like a water wheel has one wheel that harnesses a running stream, linked by gears or other mechanism to a second wheel that grinds corn, or generates electricity, the ATP synthase has one wheel that is powered by protons flowing inwards, linked to another wheel that synthesizes ATP. The second wheel doesn't turn. Rather, the linking rotor from the proton wheel (called Fo) has an asymmetric cam at the end that pokes into the center of the ATP synthase wheel, (called F1), and deforms that second wheel as it rotates around inside. The deformations are what induce the ATP synthase to successively (1) bind ADP and phosphate, (2) close access and join them together into ATP, and lastly (3) release the ATP back out. This wheel has three sections, thus one turn yields three ATPs, and it takes 120 degrees of turn to create one ATP. This mechanism is nicely illustrated in a few videos.
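The three-step cycle can be sketched as a tiny state machine driven by the rotor angle. This is a cartoon under stated assumptions, not a model from the papers: which site starts in which state at zero degrees is an arbitrary choice of mine, and the function names are hypothetical.

```python
from typing import List

# The three successive states named in the text.
STATES = ("bind ADP + phosphate", "join into ATP", "release ATP")
DEGREES_PER_ATP = 120  # one third of a turn per ATP, as stated in the text


def site_states(rotor_angle_deg: float) -> List[str]:
    """State of each of the three catalytic sections at a given rotor angle.

    Assumption: at 0 degrees, sections 0, 1 and 2 sit in states 0, 1 and 2,
    and every 120 degree step advances each section to its next state.
    """
    step = int(rotor_angle_deg // DEGREES_PER_ATP) % 3
    return [STATES[(step + site) % 3] for site in range(3)]


def atp_made(rotor_angle_deg: float) -> int:
    """Number of ATP molecules completed after turning by the given angle."""
    return int(rotor_angle_deg // DEGREES_PER_ATP)


for angle in (0, 120, 240, 360):
    print(angle, site_states(angle), "ATP so far:", atp_made(angle))
```

At any angle the three sections occupy three different states, which is the point of the cam arrangement: one section is always releasing an ATP while the others are loading or synthesizing.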

The ATP synthase has several parts. The top rotor (yellow, orange; proton rotor, or "c" rotor) is embedded in the inner mitochondrial membrane, and rotates as it conducts protons from outside (top) inwards. The center rotor (white, red) is attached to it and also rotates as it sticks into the bottom ATP synthesizing subunits (green, khaki). That three-fold symmetric protein complex is static, (held in place by the non-moving stator subunits (blue, teal)), and synthesizes ATP as its conformation is progressively banged around by the rotor. At the bottom are diagrams of the ATP generating strokes (three per rotation), with pauses (green) reflecting the strain of synthesizing ATP. All this was detected from the single molecules tracked by polarized light coming from the polarizing gold rods attached to the proton rotor (AuNR- gold nano rod).


Some recent papers focus on the other end of the machine, the proton rotor. It has ten subunits (termed "c", so this is also called the c rotor), each of which binds a proton. Thus the ultimate stoichiometry is that 10 protons yield 3 ATP, for an efficiency of 3.33 protons per ATP. (The pH difference needs to be about 3 units, or 1000 to 5000 fold in proton concentration, to create sufficient pmf.) But there are certain asymmetries involved. For one, there is a "stator" that holds the ATP synthase stable vs the proton rotor and spans across them, attaching stably to the former and gliding along the rotations of the latter. This stator creates some variation in how the rotors at both ends operate. Also, the 10:3 ratio means that some power strokes that force the ATP synthase along will behave differently, either with more power at the beginning or at the end of the 120 degree arc.
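The arithmetic behind the 10:3 mismatch is easy to check. The sketch below is only an illustration, with made-up function names, and it assumes the proton rotor starts exactly lined up with the edge of one 120 degree arc, which is an arbitrary choice; a different starting offset shifts which arc gets the extra step. It also shows that a pH difference of 3 units corresponds to a 1000 fold concentration difference.

```python
from fractions import Fraction

C_SUBUNITS = 10   # proton-binding "c" subunits in the rotor
ATP_PER_TURN = 3  # sections in F1, one ATP per 120 degrees

def protons_per_atp():
    """Protons consumed per ATP, kept as an exact fraction."""
    return Fraction(C_SUBUNITS, ATP_PER_TURN)

def proton_steps_per_arc():
    """Count how many 36-degree proton steps finish inside each 120-degree arc.

    Assumption: the rotor starts aligned with the start of the first arc.
    Angles are counted in ticks of 12 degrees (30 ticks per turn) so that
    everything stays in whole numbers: one proton step is 3 ticks and one
    ATP arc is 10 ticks.
    """
    counts = [0] * ATP_PER_TURN
    for step in range(1, C_SUBUNITS + 1):
        end_tick = step * ATP_PER_TURN        # where this proton step ends
        arc = (end_tick - 1) // C_SUBUNITS    # arc i covers ticks 10*i+1 .. 10*(i+1)
        counts[arc] += 1
    return counts

def concentration_fold(delta_ph):
    """Fold difference in proton concentration for a given pH difference."""
    return 10.0 ** delta_ph

if __name__ == "__main__":
    print("protons per ATP:", protons_per_atp(), "=", round(float(protons_per_atp()), 2))
    print("proton steps in each 120-degree arc:", proton_steps_per_arc())
    print("fold difference at 3 pH units:", concentration_fold(3))
```

Since 120 is not a multiple of 36, the ten proton steps cannot be divided evenly among the three arcs, which is why the power strokes for successive ATPs cannot all look alike.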

These papers posit that there is enough flexibility in the linkage to smooth out these ebbs and flows. Within the stator is a critical subunit ("a") which conducts the protons in both directions, both from outside onto the "c" rotor, and then off the "c" rotor and into the mitochondrial matrix. Interestingly, the proton rotor of "c" subunits ferries those protons all the way around, so that they come in and go back off at nearly the same point, at the "a" subunit surface. This means that they are otherwise stably bound to the proton rotor as it flies around in the membrane, a hydrophobic environment that presumably offers no encouragement for those protons to leave. So in summary, the protons from outside (the intermembrane space of the mitochondrion) enter by the outer "a" channel, then land on one of the proton rotor's "c" subunits, take one trip around the rotor, and then exit via the inner "a" channel.

One question is the nature of these channels. There are, elsewhere in biology, channels that find ways to conduct protons in specific fashion, despite their extremely small size and similarity to other cations like sodium and potassium. But a more elegant way has been devised, called the Grotthuss mechanism. The current authors conduct extensive analysis of key mutations in these channels to show that this mechanism is used by the "a" subunit of the Fo protein. By this mechanism, a chain of water molecules is very carefully lined up through the protein. The natural hydrogen exchange property of water, by which the pH character and so many other properties of water occur, then allows an incoming proton to create a chain reaction of protonations and de-protonations along the water chain (nicely illustrated on the Wikipedia page) that, without really moving any of the water molecules (or requiring much movement of the protons either), effectively conducts a net proton inwards with astonishing efficiency.
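The relay idea can be made concrete with a toy sketch. It is a cartoon only: it assumes a single-file chain of waters, labels every proton so they can be told apart, and ignores the hydrogen-bond reorientation and energetics of the real process. The function name and the labels are invented for the example.

```python
def grotthuss_relay(n_waters, incoming="H_in"):
    """Pass one proton's worth of charge along a fixed chain of waters.

    Each water is a list of two labeled protons and never changes position.
    Returns the label of the proton that exits the far end, plus the chain.
    """
    chain = [[f"H{i}a", f"H{i}b"] for i in range(n_waters)]
    carried = incoming
    for water in chain:
        water.append(carried)   # the water briefly holds three protons
        carried = water.pop(0)  # and hands a different one to its neighbor
    return carried, chain

if __name__ == "__main__":
    out, chain = grotthuss_relay(4)
    print("proton leaving the far end:", out)
    print("first water now holds:", chain[0])
```

The proton that leaves the far end is not the one that entered. The incoming proton stays on the first water, yet a net charge has crossed the whole chain.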

It is evident that the interface of the "a" and "c" subunits is such that a force-fed sequence of protons creates power that induces the rotation and eventually, through the rotor linkage, supplies the energy to synthesize ATP against its concentration gradient. It should be said parenthetically that this enzyme complex can be driven in reverse, and E. coli do occasionally use up ATP in reverse to re-establish their pmf gradient, which is used for many other processes.

One technical note is of interest. The authors of the main paper used single molecules of the whole ATP synthase, embedded in nano-membranes that they could observe optically and treat with different pH levels on each side to drive their activity. They also attached tiny gold bars (35 × 75 nm) to the top of each proton rotor to track its rotation by polarized light. This allowed very fine observations, which they used to look at the various pauses induced by the jump of each ATP synthesis event, and of each proton as it hopped on/off. Then they mutated selected amino acids in the supposed water channels that conduct protons through the "a" subunit, which created greater delays, diagnostic of the Grotthuss mechanism. The channel is not lined with ions or ionizable groups, but is simply polar to accommodate a string of waters threading through the membrane and the "a" protein. Additionally, they estimate an "antenna" of considerable size, composed of a "b" subunit and some of the "a" subunit of Fo, that is exposed to the outside and by its negatively charged nature attracts and lines up plenty of protons, ready to transit through the rotor.

Another presentation of the proton rotor behavior. The stator "a" subunit is orange, and the "c" subunits are circles arranged in a rotor, seen from the top. The graph at right shows some of the matches or mismatches between the three-fold ATP synthesizing rotor (F1) and the ten-fold symmetric proton rotor (Fo, or "c"), leading to quite variable coupling of their power strokes. Yet there is enough elastic give in their coupling to allow continuous and reasonably rapid rotation (100 / sec).

In the end, incredible technical feats of optics, chemistry, and molecular biology are needed to decipher increasing levels of detail about the incredible feat of evolution that is embodied in this tiny powerhouse.