
Saturday, November 22, 2025

Ground Truth for Genetic Mutations

Saturation mutagenesis shows that our estimates of the functional effects of uncharacterized mutations are not so great.

Human genomes can now be sequenced for less than $1,000. This technological revolution has enabled a large expansion of genetic testing, used for cancer tissue diagnosis and tracking, and for genetic syndrome analysis both of embryos before birth and affected people after birth. But just because a base among the 3 billion of the genome is different from the "reference" genome, that does not mean it is bad. Judging whether a variant (the modern, more neutral term for mutation) is bad takes a lot of educated guesswork.

A recent paper described a deep dive into one gene, where the authors created and characterized the functional consequence of every possible coding variant. Then they evaluated how well our current rules of thumb and prediction programs for variant analysis compare with what they found. It was a mediocre performance. The gene is CDKN2A, one of our more curious oddities. This is an important tumor suppressor gene that inhibits cell cycle progression and promotes DNA repair- it is often mutated in cancers. But it encodes not one, but two entirely different proteins, by virtue of a complex mRNA splicing pattern that uses distinct exons in some coding portions, and parts of one sequence in two different frames, to encode these two proteins, called p16 and p14. 

One gene, two proteins. CDKN2A has a splicing pattern (mRNA exons shown as boxes at top, with pink segments leading to the p14 product and blue segments leading to the p16 product) that generates two entirely different proteins from one gene. Each product has tumor suppressing effects, though via distinct mechanisms.

Despite the complex splicing and protein coding arrangement, the authors generated every possible variant at every coded amino acid position (156 amino acids in all, as both protein products are relatively short). Since the primary roles of these proteins are in cell cycle and proliferation control, it was possible to assay function by their effect when expressed in cultured pancreatic cells. A deleterious effect on the protein was revealed as, paradoxically, increased growth of these cells. They found that about 600 of the roughly 3,000 variants in their catalog, or 20%, had such an effect.
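A quick sanity check of the catalog size: 156 positions, each changeable to 19 other amino acids, gives the roughly 3,000 missense variants described.

```python
# Sanity check of the variant catalog size: each of the 156 coded amino
# acid positions can be changed to 19 other amino acids.
POSITIONS = 156
ALTERNATIVES = 19          # 20 amino acids minus the wild-type residue

total_missense = POSITIONS * ALTERNATIVES
print(total_missense)      # 2964, the "about 3,000" variants

# About 600 scored as deleterious in the growth assay:
print(f"{600 / total_missense:.0%} deleterious")
```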

This is an expected rate of effect, on the whole. Most positions in proteins are not that important, and can be substituted by several similar amino acids. For a typical enzyme, for instance, the active site may be made up of a few amino acids in a particular orientation, and the rest of the protein is there to fold into the required shape to form that active site. Similar folding can be facilitated by numerous amino acids at most positions, as has been richly documented in evolutionary studies of closely-related proteins. These p16 and p14 proteins interact with a few partners, so they need to maintain those key interfacial surfaces to be fully functional. Additionally, the assay these researchers ran, of a few generations of growth, is far less sensitive than a long-term true evolutionary setting, which can sift out very small effects on a protein, so they were setting a relatively high bar for seeing a deleterious effect. They did a selective replication of their own study, and found a reproducibility rate of about 80%, which is not great, frankly.

"Of variants identified in patients with cancer and previously reported to be functionally deleterious in published literature and/or reported in ClinVar as pathogenic or likely pathogenic (benchmark pathogenic variants), 27 of 32 (84.4%) were functionally deleterious in our assay"

"Of 156 synonymous variants and six missense variants previously reported to be functionally neutral in published literature and/or reported in ClinVar as benign or likely benign (benchmark benign variants), all were characterized as functionally neutral in our assay "

"Of 31 VUSs previously reported to be functionally deleterious, 28 (90.3%) were functionally deleterious and 3 (9.7%) were of indeterminate function in our assay."

"Similarly, of 18 VUSs previously reported to be functionally neutral, 16 (88.9%) were functionally neutral and 2 (11.1%) were of indeterminate function in our assay"

Here we get to the key issues. Variants are generally classified as benign, pathogenic/deleterious, or "variant of unknown/uncertain significance" (VUS). The latter are particularly vexing to clinical geneticists. The whole point of sequencing a patient's tumor or genomic DNA is to find causal variants that can illuminate their condition, and possibly direct treatment. Seeing lots of "VUS" in the report leaves everyone in the dark. The authors pulled in all the common prediction programs that are officially sanctioned by the ACMG (American College of Medical Genetics), which is the foremost guide to clinical genetics, including the functional prediction of otherwise uncharacterized sequence variants. There are seven such programs, including one driven by AI, AlphaMissense, which is related to the Nobel prize-winning AlphaFold.

These programs strain to classify uncharacterized mutations as "likely pathogenic", "likely benign", or, if unable to make a conclusion, VUS/indeterminate. They rely on many kinds of data, like amino acid similarity, protein structure, evolutionary conservation, and known effects in proteins of related structure. They can be extensively validated against known mutations, and against new experimental work as it comes out, so we have a pretty good idea of how they perform. Thus they are trusted to some extent to provide clinical judgements, in the absence of better data. 

Each of seven programs (bottom) gives estimates of variant effect over the same pool of mutations generated in this paper. This was a weird way to present simple data, but each bar contains the functional results the authors developed in their own data (numbers at the bottom, in parentheses, vertical). The bars were then colored with the rate of deleterious (black) vs benign (white) predictions from each program. The ideal case would be total black for the first bar in each set of three (deleterious) and total white for the third bar in each set (benign). The overall accuracy of each program's predictions vs the author data was then overlaid as a red bar (right axis). The PrimateAI program was specially derived from comparison of homologous genes from primates only, yielding a high-quality dataset about the importance of each coded amino acid. However, it only gave estimates for 906 of the full set of 2,964 variants. On the other hand, cruder programs like PolyPhen-2 gave less than 40% accuracy, which is quite disappointing for clinical use.

As shown above, the algorithms gave highly variable results, from under 40% accurate to over 80%. It is pretty clear that some of the lesser programs should be phased out. Of programs that fielded all the variants, the best were AlphaMissense and VEST, which each achieved about 70% accuracy. This is still not great. The issue is that, if a whole genome sequence is run for a patient with an obscure disease or syndrome, and variants vs the reference sequence are seen in several hundred genes, then a gene like CDKN2A could easily be pulled into the list of pathogenic (and possibly causal) variants, or be left out, on very shaky evidence. That is why even small increments in accuracy are critically important in this field. Genetic testing is a classic needle-in-a-haystack problem- a quest to find the one mutation (out of millions) that is driving a patient's cancer, or a child's inherited syndrome.
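There is a coverage/accuracy tradeoff lurking here: a predictor that abstains on hard variants can look accurate while deciding very little. The 906-of-2,964 coverage figure is from the figure description; the 90% accuracy used below for the abstaining program is a hypothetical placeholder, not a number from the paper.

```python
# Coverage vs accuracy: what fraction of all submitted variants actually
# receives a correct call? (90% accuracy for the abstainer is hypothetical.)
def correct_call_rate(coverage, accuracy):
    """Fraction of all submitted variants that get a correct call."""
    return coverage * accuracy

total = 2964
primate_coverage = 906 / total
print(f"PrimateAI coverage: {primate_coverage:.1%}")        # ~30.6%

# A program fielding everything at 70% accuracy still resolves more
# variants correctly than a hypothetical 90%-accurate abstainer:
print(f"full coverage @ 70%: {correct_call_rate(1.0, 0.70):.2f}")
print(f"~31% coverage @ 90%: {correct_call_rate(primate_coverage, 0.90):.2f}")
```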

Still outstanding is the issue of non-coding variants. Genes are not just affected by mutations in their protein coding regions (indeed many important genes do not code for proteins at all), but by regulatory regions nearby and far. This is a huge area of mutation effects that are not really algorithmically accessible yet. As a prediction problem, it is far more difficult than predicting effects on a coded protein. It will require modeling of the entire gene expression apparatus, much of which remains shrouded in mystery.


Saturday, October 18, 2025

When the Battery Goes Dead

How do mitochondria know when to die?

Mitochondria are the energy centers within our cells, but they are so much more. They are primordial bacteria that joined with archaea to collaborate in the creation of eukaryotes. They still have their own genomes, RNA transcription and protein translation. They play central roles in the life and death of cells, they divide and coalesce, they motor around the cell as needed, kiss other organelles to share membranes, and they can get old and die. When mitochondria die, they are sent to the great garbage disposal in the sky, the autophagosome, which is a vesicle that is constructed as needed, and joins with a lysosome to digest large bits of the cell, or of food particles from the outside.

The mitochondrion spends its life (only a few months) doing a lot of dangerous reactions and keeping an electric charge elevated across its inner membrane. It is this charge, built up from metabolic breakdown of sugars and other molecules, that powers the ATP-producing rotary enzyme. And the decline of this charge is a sign that the mitochondrion is getting old and tired. A recent paper described how one key sensor protein, PINK1, detects this condition and sets off the disposal process. It turns out that the membrane charge powers not only ATP synthesis, but protein import into the mitochondrion as well. Over the eons, most of the mitochondrion's genes have been taken over by the nucleus, so all but a few of the mitochondrion's proteins arrive via import- about 1,500 different proteins in all. And this is a complicated process, since mitochondria have inner and outer membranes (just as many bacteria do), and proteins can be destined to any of four compartments- either membrane, the inside (the matrix), or the inter-membrane space.

Figure 12-26. Protein import by mitochondria.
Textbook representation of mitochondrial protein import, with a signal sequence (red) at the front (N-terminus) of the incoming protein (green), helping it bind successively to the TOM and TIM translocators. 

The outer membrane carries a protein import complex called TOM, while the inner membrane carries an import complex called TIM. These can dock to each other, easing the whole transport process. The PINK1 protein is a somewhat weird product of evolution, spending its life being synthesized, transported across both mitochondrial membranes, and then partially chopped up in the mitochondrial matrix before its remains are exported again and fully degraded. That is when everything is working correctly! When the mitochondrial charge declines, PINK1 gets stuck, threaded through TOM, but unable to transit the TIM complex. PINK1 is a kinase, which phosphorylates itself as well as ubiquitin. So when it is stuck, two PINK1 kinases meet on the outside of the outer membrane, activate each other, and ultimately activate another protein, PARKIN, whose name derives from its importance in Parkinson's disease, which can be caused by an excess of defective mitochondria in sensitive tissues, specifically certain regions and neurons of the brain. PARKIN is a ubiquitin ligase, which attaches the degradation signal ubiquitin to many proteins on the surface of the aged mitochondrion, thus signaling the whole mess to be gobbled up by an autophagosome.
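The PINK1 logic described above amounts to a voltage-gated switch, which can be sketched schematically; the threshold value and the returned strings are illustrative only, not quantities from the paper.

```python
# Schematic sketch of the PINK1/PARKIN decision logic described above.
# The threshold and return strings are illustrative, not from the paper.
def mitochondrion_fate(membrane_charge, import_threshold=0.5):
    """What happens to PINK1 at a given inner-membrane charge."""
    if membrane_charge >= import_threshold:
        # Healthy: PINK1 transits TOM and TIM, is cleaved in the matrix,
        # re-exported, and degraded. No alarm is raised.
        return "PINK1 degraded; mitochondrion spared"
    # Depolarized: import stalls with PINK1 threaded through TOM. Two
    # stuck kinases meet, trans-activate, phosphorylate ubiquitin, and
    # activate PARKIN, which flags the organelle for mitophagy.
    return "PINK1 stalled and dimerized; PARKIN active; mitophagy triggered"

print(mitochondrion_fate(0.9))
print(mitochondrion_fate(0.1))
```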

A data-rich figure 1 from the paper shows purification of the tagged complex (top), and then the EM structure at bottom. While the purification (B, C) shows the presence of TIM subunits, they did not show up in the EM structures, perhaps because they were not stable enough or frequent enough in proportion to the TOM subunits. But the PINK1-TOM-VDAC2 structures are stunning, helping explain how PINK1 dimerizes so easily when its translocation is blocked.

The current authors found that PINK1 had convenient cysteine residues that allowed it to be experimentally crosslinked in the paired state, and thus freeze the PARKIN-activating conformation. They isolated large amounts of such arrested complexes from human cells, and used electron microscopy to determine the structure. They were amazed to see not just PINK1 and the associated TOM complex, but also VDAC2, which is the major transporter that lets smaller molecules easily cross the outer membrane. The TOM complexes were beautifully laid out, showing the front end (N-terminus) of PINK1 threaded through each TOM complex, specifically the TOM40 ring structure.

What was missing, unfortunately, was any of the TIM complex, though some TIM subunits did co-purify with the whole complex. Nor was PARKIN or ubiquitin present, leaving out a good bit of the story. So what is VDAC2 doing there? The authors really don't know, though they note that reactive oxygen byproducts of mitochondrial metabolism would build up during loss of charge, acting as a second signal of mitochondrial age. These byproducts are known to encourage dimerization of VDAC channels, which, via the complex seen here, naturally leads to dimerization and activation of the PINK1 protein. Additionally, VDACs are very prevalent in the outer membrane and are prominent ubiquitination targets for autophagy signaling.

To actually activate PARKIN ubiquitination, PINK1 needs to dissociate again, a process that the authors speculate may be driven by binding of ubiquitin by PINK1, which might be bulky enough to drive the VDACs apart. This part was quite speculative, and the authors promise further structural studies to figure out this process in more detail. In any case, what is known is quite significant- that the VDACs template the joining of two PINK1 kinases in mid-translocation, which, when the inner membrane charge dies away, prompts the stranded PINK1 kinases to activate and start the whole disposal cascade. 

Summary figure from the authors, indicating some speculative steps, such as where reactive oxygen species excreted by VDAC2 sensitize PINK1, perhaps by dimerizing the VDAC channel itself, and where ubiquitin binding by PINK1 and/or VDAC prompts dissociation, allowing PARKIN to come in, get activated by PINK1, and spread the death signal around the surface of the mitochondrion.

It is worth returning briefly to the PINK1 life cycle. This is a protein whose whole purpose, as far as we know, is to signal that mitochondria are old and need to be given last rites. But it has a curiously inefficient way of doing that, being synthesized, transported, and degraded continuously in a futile and wasteful cycle. Evolution could hardly have come up with a more cumbersome, convoluted way to sense the vitality of mitochondria. Yet there we are, doubtless trapped by some early decision which was surely convenient at the time, but results today in a constant waste of energy, only made possible by the otherwise amazingly efficient and finely tuned metabolic operations of PINK1's target, the mitochondrion.



Sunday, October 5, 2025

Cycles of Attention

A little more research about how attention affects visual computation.

Brain waves are of enormous interest, and their significance has gradually resolved over recent decades. They appear to represent synchronous firing of relatively large populations of neurons, and thus the transfer of information from place to place in the brain. They also induce other neurons to entrain with them. The brain is an unstable apparatus, never entraining fully with any one particular signal (that way lies epilepsy). Rather, the default mode of the brain is humming along with a variety of transient signals and thus brain waves as our thoughts, both conscious and unconscious, wander over space and time.

A recent paper developed this growing insight a bit further, by analyzing forward and backward brainwave relations in visual perception. Perception takes place in a progressive way at the back of the brain in the visual cortex, which develops the raw elements of a visual scene (already extensively pre-processed by the retina) into more abstract, useful representations, until we ... see a car, or recognize a face. At the same time, we perceive very selectively, only attending to very small parts of the visual scene, always on the go to other parts and things of interest. There is a feedback process, once things in a scene are recognized, to either attend to them more, or go on to other things. The "spotlight of attention" can direct visual processing, not just by filtering what comes out of the sausage grinder, but actually reaching into the visual cortex to direct processing to specific things. And this goes for all aspects of our cognition, which are likewise a cycle of search, perceive, evaluate, and search some more.

Visual processing generates gamma waves of information in an EEG, directed to, among other areas, the frontal cortex, which does more general evaluation of visual information. Gamma waves are the highest frequency brain oscillations (about 50-100 Hz), and thus are the most information-rich per unit time. This paper also confirmed that top-down oscillations, in contrast, are in the alpha/beta frequencies (about 5-20 Hz). What they attempted was to show that the top-down beta oscillations entrain and control the bottom-up gamma oscillations. The idea was to literally close the loop on attentional control over visual processing. This was all done in humans, using EEG to measure oscillations all over the brain, and TMS (transcranial magnetic stimulation) to experimentally induce top-down currents from the frontal cortex as their subjects looked at visual fields.

Correlation of frontal beta frequencies onto gamma frequencies from the visual cortex, while visual stimulus and TMS stimulation are both present. At top left is the overall data, showing how gamma cycles from the hind brain fall into various portions of a single beta wave, (bottom), after TMS induction on the forebrain. There is strong entrainment, a bit like AM radio amplitude modulation, where the higher frequency signal (one example top right) sits within the lower-frequency beta signal (bottom right). 
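The AM-radio analogy in the figure can be made concrete with a toy signal: a gamma "carrier" whose amplitude rides on a beta envelope. The 70 Hz and 15 Hz values are illustrative picks from the ranges quoted in the text, not from the paper.

```python
import numpy as np

fs = 1000                           # sampling rate, Hz
t = np.arange(0, 1, 1 / fs)         # one second of signal
beta = np.sin(2 * np.pi * 15 * t)   # top-down beta rhythm (~15 Hz)
gamma = np.sin(2 * np.pi * 70 * t)  # bottom-up gamma rhythm (~70 Hz)

# Gamma amplitude entrained by the beta phase, as in AM radio:
coupled = (1 + 0.8 * beta) * gamma

# Gamma bursts are largest near beta peaks (phase-amplitude coupling).
print(f"peak amplitude: {np.abs(coupled).max():.2f}")
```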

I cannot really speak to the technical details and quality of this data, but it is clear that the field is settling into this model of what brain waves are and how they reflect what is going on under the hood. Since we are doing all sorts of thinking all the time, it takes a great deal of sifting and analysis to come up with the kind of data shown here, out of raw EEG from electrodes merely placed all over the surface of the skull. But it also makes a great deal of sense, first that the far richer information of visual bottom-up data comes in higher frequencies, while the controlling information takes lower frequencies. And second, that brain waves are not just a passive reflection of passing reflections, but are used actively in the brain to entrain some thoughts, accentuating them and bringing them to attention, while de-emphasizing others, shunting them to unconsciousness, or to oblivion.


Saturday, September 27, 2025

Dopamine: Get up and Go, or Lie Down and Die

The chemistry of motivation.

A recent paper got me interested in the dopamine neurotransmitter system. There are a limited number of neurotransmitters, (roughly a hundred), which are used for all communication at synapses between neurons. The more common transmitters are used by many cells and anatomical regions, making it hazardous in the extreme to say that a particular transmitter is "for" something or other. But there are themes, and some transmitters are more "niche" than others. Serotonin and dopamine are specially known for their motivational valence and involvement in depression, schizophrenia, addiction, and bipolar disorder, among many other maladies.

This paper described the reason why cancer patients waste away- a syndrome called cachexia. This can happen in other settings, like extreme old age, and in other illnesses. The authors ascribe cachexia (using mice implanted with tumors) to the immune system's production of IL6, one of scores of cytokines, the signaling proteins that manage the vast distributed organ that is our immune system. IL6 is pro-inflammatory, promoting inflammation, fever, and the production of antibody-producing B cells, among many other things. These authors find that it binds to the area postrema in the brain stem, where many other blood-borne signals are sensed by the brain- signals that are generally blocked by the blood-brain barrier system.

The binding of IL6 at this location then activates a series of neuronal connections that these authors document, ending up inhibiting dopamine signaling out of the ventral tegmental area (VTA) in the lower midbrain, ultimately reducing dopamine action in the nucleus accumbens, where it is traditionally associated with reward, addiction, and schizophrenia. These authors use optically driven engineered neurons at an intermediate location, the parabrachial nucleus, (PBN), to reproduce how neuron activation there drives inhibition downstream, as the natural IL6 signal also does.  

Schematic of the experimental setup and anatomical locations. The graph shows how dopamine is strongly reduced under cachexia, consequent to the IL6 circuitry the authors reveal.

What is the rationale of all this? When we are sick, our body enters a quite different state- lethargic, barely motivated, apathetic, and resting. All this is fine if our immune system has things under control, uses our energy for its own needs, and returns us to health forthwith, but it is highly problematic if the illness goes on longer. This work shows in a striking and extreme way what had already been known- that prominent dopamine-driven circuits are core micro-motivational regulators in our brains. For an effective review of this area, one can watch a video by Robert Lustig, outlining at a very high level the relationship of the dopamine and serotonin systems.

Treatment of tumor-laden mice with an antibody to IL6 that reduces its activity relieves them of cachexia symptoms and significantly extends their lifespans.

It is something that the Buddhists understood thousands of years ago, and which the Rolling Stones and the advertising industry have taken up more recently. While meditation may not grant access to the molecular and neurological details, it seems to have convinced the Buddha that we are on a treadmill of desire, always unsatisfied, always reaching out for the next thing that might bring us pleasure, but which ultimately just feeds the cycle. Controlling that desire is the surest way to avoid suffering. Nowhere is that clearer than in addiction- real, clinical addictions that are all driven by the dopamine system. No matter what your drug of choice- gambling, sugar, alcohol, cocaine, heroin- the pleasure that they give is fleeting and alerts the dopamine system to motivate the user to seek more of the same. There are a variety of dopamine pathways, including those affecting Parkinson's and reproductive functions, but the ones at issue here are the mesolimbic and mesocortical circuits, that originate in the midbrain VTA and extend respectively to the nucleus accumbens in the lower forebrain, and to the cerebral cortex. These are integrated with the rest of our cognition, enabling motivation to find the root causes of a pleasurable experience, and raise the priority of actions that repeat those root causes. 

So, if you gain pleasure from playing a musical instrument, then the dopamine system will motivate you to practice more. But if you gain pleasure from cocaine, the dopamine system will motivate you to seek out a dealer, and spend your last dollar for the next fix. And then steal some more dollars. This system specifically shows the dampening behavior that is so tragic in addictions. Excess activation of dopamine-driven neurons can be lethal to those cells, so they adjust to keep activation in an acceptable range. That is, they keep you unsatisfied, in order to allow new stimuli to motivate you to adjust to new realities. No matter how much pleasure you give yourself, and especially the more intense that pleasure, it is never enough, because this system always adjusts the baseline to match. One might think of dopamine as the micro-manager, always pushing for the next increment of action, no matter how much you have accomplished before, no matter how rosy or bleak the outlook. It gets us out of bed and moving through our day, from one task to the next.
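The baseline-adjustment described above is commonly modeled in computational neuroscience as reward prediction error: the dopamine-like signal is the gap between received and expected reward, and the expectation drifts toward whatever becomes routine. This is a minimal sketch of that standard model, not anything from the papers discussed; the learning rate `alpha` is an arbitrary choice.

```python
# Minimal reward-prediction-error sketch: dopamine-like signal = received
# minus expected reward; the baseline (expectation) resets toward the
# new normal at an arbitrary learning rate alpha.
def simulate(rewards, alpha=0.3):
    expected, signals = 0.0, []
    for r in rewards:
        delta = r - expected        # prediction error: the "high"
        signals.append(delta)
        expected += alpha * delta   # baseline drifts toward the new normal
    return signals

# A constant large reward: the high fades even though the dose doesn't.
print([round(s, 2) for s in simulate([10.0] * 10)])  # 10.0, 7.0, 4.9, ...
```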

In contrast, the serotonin system is the macro-manager, conveying feelings of general contentment, after a life well-lived and a series of true accomplishments. Short-circuiting this system with SSRIs like Prozac carries its own set of hazards, like lack of general motivation and emotional blunting, but it does not have the risk of addiction, because serotonin, as Lustig portrays it, is an inhibitory neurotransmitter, with no risk of over-excitement. The brain does not re-set the baseline of serotonin the same way that it continually resets the baseline of dopamine.

How does all this play out in other syndromes? Depression is, like cachexia, at least in part a syndrome of insufficient dopamine. Conversely, bipolar disorder in its manic phase appears to involve excess dopamine, causing hyperactivity and wildly excessive motivation, flitting from one task to the next. But what have dopamine antagonists like haloperidol and clozapine been used for most traditionally? As anti-psychotics in the treatment of schizophrenia. And that is a somewhat weird story.

Everyone knows that the medication of schizophrenia is a haphazard affair, with serious side effects and limited efficacy- a tradeoff between therapeutic effects and others that make the recipient worse off. A paper from a decade ago outlined why this may be the case- the causal issues of schizophrenia do not lie in the dopamine system at all, but in circuits far upstream. These authors suggest that ultimately schizophrenia may derive from chronic stress in early life, as do so many other mental health maladies. It is a trail of events that raises the stress hormone cortisol, which diminishes cortical inhibition of hippocampal stress responses, and specifically diminishes the GABA (another neurotransmitter) inhibitory interneurons in the hippocampus.

It is the ventral hippocampus that has a controlling influence over the VTA, which in turn originates the relevant dopamine circuitry. The theory is that the ventral hippocampus sets the contextual (emotional) tone for the dopamine system, on top of which episodic stimulation takes place from other, more cognitive and perception-based sources. Over-activity of this hippocampal regulation raises the gain of the other signals, raising dopamine far more than appropriate, and also lowering it at other times. Thus treating schizophrenia with dopamine antagonists counteracts the extreme highs of the dopamine system, which in the nucleus accumbens can lead to hallucinations, delusions, paranoia, and manic activity. But it is a blunt instrument, also impairing general motivation, and aggravating the cognitive, affective, parkinsonian, and other problems caused by the low dopamine that occurs during schizophrenia in other systems, such as the mesocortical and nigrostriatal dopamine pathways.

Manipulation of neurotransmitters is always going to be a rough job, since they serve diverse cells and pathways in our brains. Wikipedia routinely shows tables of binding constants for drugs (clozapine, for instance) to dozens of different neurotransmitter receptors. Each drug has its own profile, hitting some receptors more and others less, sometimes in curious, idiosyncratic patterns, and (surprisingly) across different neurotransmitter types. While some of these may occasionally hit a sweet spot, the biology and its evolutionary background has little relation to our current needs for clinical therapies, particularly when we have not yet truly plumbed the root causes of the syndromes we are trying to treat. Nor is precision medicine in the form of gene therapies or single-molecule tailored drugs necessarily the answer, since the transmitter receptors noted above are not conveniently confined to single clinical syndromes either. We may in the end need specific, implantable and computer-driven solutions or surgeries that respect the anatomical complexity of the brain.


Saturday, September 13, 2025

Action at the Heart of Action

How myosin works as a motor against actin to generate motion.

We use our muscles a lot, but do we know how they work? No one does, fully, but quite a bit is known. At the core is a myosin motor protein, which levers against actin filaments that are ordered in almost crystalline arrays inside muscle cells. This system long predates the advent of muscles, however, since all of our cells contain actin and myosin, which jointly help cells move around, and move cargoes around within cells. Vesicles, for instance, often traffic to where they are needed on roads of actin. The human genome encodes forty different forms of myosin, specialized for all sorts of different tasks. For example, hearing (and balance) depends on tiny rod-like projections of hair cells, filled with tight bundles of actin. Several myosin genes have variants associated with severe hearing loss, because they have important developmental roles in helping these structures form. Actin/myosin is one of the ancient transportation systems of life (the other is the dynein motor and microtubules).

Myosin uses ATP to power motion, and a great deal of work has gone into figuring out how this happens. A recent paper took things to a new level by slowing down the action significantly. They used a mutant form of myosin that is specifically slower in the power stroke. And they used a quick mix and spray method that cut times between adding actin to the cocked myosin, and getting it frozen in a state ready for cryo-electron microscopy, down to 10 milliseconds. The cycle of the myosin motor goes like this:

  • End of power stroke, myosin bound to actin
  • ATP binds to myosin, which then unbinds from actin
  • Lever arm of myosin cocks back to a primed state, as ATP is hydrolyzed to ADP + Pi
  • ADP is present, and myosin binds to actin again
  • Actin binding triggers both power stroke of the lever, and release of Pi and ADP
  • End of power stroke, myosin bound to actin
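The cycle above can be sketched as a tiny state machine; the state and transition labels simply restate the bullet list.

```python
# The myosin cycle above as a minimal state machine; labels restate
# the bullet list. The motor returns to its starting state each turn.
CYCLE = [
    ("post-power stroke, actin-bound", "ATP binds; myosin releases actin"),
    ("detached, ATP-bound",            "lever cocks; ATP -> ADP + Pi"),
    ("primed, holding ADP + Pi",       "myosin rebinds actin"),
    ("actin-bound, primed",            "power stroke; Pi and ADP released"),
]

state = 0
for _ in range(6):                     # a turn and a half around the cycle
    name, transition = CYCLE[state]
    print(f"{name} --[{transition}]-->")
    state = (state + 1) % len(CYCLE)
```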

A schematic of the myosin/actin cycle. Actin is in pink, myosin in gray and green, with cargoes (if any, or bundle of other myosins as in muscle) linked below the green lever.

The structure that these researchers came up with is:

Basic structure of myosin (colors) with actin (gray), in two conformations- primed or post-power stroke. The blue domain at top (converter) is where the lever extension is attached, and is the place where the motion/force is focused. But note how the rest of the myosin structure (lavender, green, yellow, red) also shifts subtly to assist the motion.

They also provide a video of these transformations, based on molecular dynamics simulations.

Sampling times between 10 milliseconds and 120 milliseconds, they saw structures in each of the before and after configurations, but none in intermediate states. That indicates that the motor action is very fast, and the cocking/priming event puts the enzyme in an unstable configuration. The power stroke may not look like much, but the converter domain is typically hitched to a long element that binds to cargos, leading (below) to quite a bit of motion per stroke and per ATP. About 13 actin units can be traversed along the filament in a single bound, in fact. It is also noteworthy that this mechanism is very linear. The converter domain flips in the power stroke without twisting much, so that cargoes progress linearly along the actin road, without much loss of energy from side-to-side motion.
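A quick check on the "13 actin units in a single bound" figure, using the ~2.75 nm axial rise per actin subunit, a textbook filament parameter rather than a number from this paper.

```python
# Rough step-length arithmetic for a 13-subunit step along an actin
# filament. The ~2.75 nm axial rise per subunit is a textbook value,
# not a measurement from this paper.
RISE_PER_SUBUNIT_NM = 2.75
SUBUNITS_PER_STEP = 13                 # "13 actin units in a single bound"

step_nm = SUBUNITS_PER_STEP * RISE_PER_SUBUNIT_NM
print(f"step length ~ {step_nm:.1f} nm")   # ~36 nm: one actin half-repeat
```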

Fuller picture of how myosin (colored) with its lever extensions (blue) walks along actin (gray) by large steps that cross up to 13 actin subunits at a time. The inset describes the very small amount of twist that happens, small enough that myosin walks in a rather straight line and easily finds the next actin landing spot without a lot of feeling around.

Finally, these authors delved into a few more details about the big structural transition of the power stroke. Each of these shows subtle shifts in the structure that help the main transition along. In f/g the HCM loop dips down to bind actin more tightly. In h/i the black segment already bound to actin squinches down into a new loop, probably swinging myosin slightly over to the right. This segment is at the base of the green segment, and so has strong transmission effects on the power stroke. In j/k the ATP binding site, now holding ADP and Pi, loses the phosphate Pi, and there are big re-arrangements of all the surrounding loops- green, lavender, and blue. These images do not really do justice to the whole motion, nor really communicate how the ATP site sends power through the green domain to the converter (top, blue) domain, which flips for the power stroke. The video referenced above gives more details, though without much annotation.

Detailed closeups of the before/after power stroke structures. Coloring is consistent with the structures above.



Saturday, September 6, 2025

How to Capture Solar Energy

Charge separation is handled totally differently by silicon solar cells and by photosynthetic organisms.

Everyone comes around sooner or later to the most abundant and renewable form of energy, which is the sun. The current administration may try to block the future, but solar power is the best power right now and will continue to gain on other sources. Likewise, life started by using some sort of geological energy, or pre-existing carbon compounds, but inevitably found that tapping the vast powers streaming in from the sun was the way to really take over the earth. But how does one tap solar energy? It is harder than it looks, since it so easily turns into heat and is lost. Some kind of separation and control is required, to isolate the power (that is to say, the electron that was excited by the photon of light) and harness it to do useful work.

Silicon solar cells and photosynthesis represent two ways of doing this, and are fundamentally, even diametrically, different solutions to this problem. So I thought it would be interesting to compare them in detail. Silicon is a semiconductor, torn between trapping its valence electrons in silicon atoms, or distributing them around in a conduction band, as in metals. With elemental doping, silicon can be manipulated to bias these properties, and that is the basis of the solar cell.

Schematic of a silicon solar cell. A static voltage exists across the N-type to P-type boundary, sweeping electrons freed by the photoelectric effect (light) up to the conducting electrode layer.


Solar cells have one side doped N-type, with the bulk doped P-type. The bulk material is neutral on both sides, but at the boundary a static charge layout is set up, in which electrons are attracted into the P-side and removed from the N-side. This static voltage has very important effects on electrons that are excited by incoming light and freed from their silicon atoms. These high-energy electrons enter the conduction band of the material and can migrate. Due to the prevailing field, they get swept toward the N side, and thus are separated and can be siphoned off with wires. The current thus set up can exert a pressure of about 0.6 volts. That is not much, nor is it anywhere near the 2 to 3 electron volts received from each visible photon. So a great deal of energy is lost as heat.
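
Using only the figures quoted above (0.6 V extracted per electron versus 2-3 eV delivered per photon), the per-electron loss can be tallied directly:

```python
# Per-electron bookkeeping for a silicon cell, using the figures from
# the text: ~0.6 eV extracted per electron vs. 2-3 eV per visible photon.
CELL_VOLTAGE_EV = 0.6

for photon_ev in (2.0, 3.0):
    captured = CELL_VOLTAGE_EV / photon_ev
    print(f"{photon_ev:.0f} eV photon: {captured:.0%} captured, "
          f"{1 - captured:.0%} lost as heat")
```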

Solar cells do not care about capturing each energized electron in detail. Their purpose is to harvest a bulk electrical voltage + current with which to do some work in our electrical grids. Photosynthesis takes an entirely different approach, however. This may be mostly for historical and technical reasons, but also because part of its purpose is to do chemical work with the captured electrons. Biology tends to take a highly controlling approach to chemistry, using precise shapes, functional groups, and electrical environments to guide reactions to exact ends. While some of the power of photosynthesis goes toward pumping protons out of the membrane, setting up a gradient later used to make ATP, about half is used for other things like splitting water to replace lost electrons, and making reducing chemicals like NADPH.

A portion of a poster about the core processes of photosynthesis. It provides a highly accurate portrayal of the two photosystems and their transactions with electrons and protons.

In plants, photosynthesis is a chain of processes focused around two main complexes, photosystems I and II, and all occurring within membranes- the thylakoid membranes of the chloroplast. Confusingly, photosystem II comes first, accepting light, splitting water, pumping some protons, and sending out a pair of electrons on mobile plastoquinones, which eventually find their way to photosystem I, which jacks up their energy again using another quantum of light, to produce NADPH. 

Photosystem II is full of chlorophyll pigments, which are what get excited by visible photons. But most of them are "antenna" chlorophylls, passing the excitation along to a pair of centrally located chlorophylls. Note that the light energy is at this point passed as a molecular excitation, not as a free electron. This passage may happen by Förster resonance energy transfer, but is so fast and efficient that stronger Redfield coupling may be involved as well. Charge separation only happens at the reaction center, where an excited electron is popped out to a chain of recipients. The chlorophylls are organized so that the pair at the reaction center have a slightly lower energy of excitation, and thus serve as a funnel for excitation energy from the antenna system. These transfers are extremely rapid, on the picosecond time scale.

It is interesting to note, tangentially, that only red-light energy is used. Chlorophylls have two excitation states, excited by red light (680 nm = 1.82 eV) and blue light (400-450 nm, ~2.76 eV) (note the absence of green absorbance). The significant extra energy from blue light is wasted, dissipated as heat as the excited electron relaxes to the lower excitation state, which is then passed through the antenna complex as though it had come from red light. 
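
The quoted electron-volt figures follow from E = hc/λ; with hc ≈ 1239.84 eV·nm, the numbers in the text check out:

```python
# Photon energy from wavelength: E = hc / wavelength,
# with hc ≈ 1239.84 eV·nm.
HC_EV_NM = 1239.84

def photon_energy_ev(wavelength_nm):
    return HC_EV_NM / wavelength_nm

print(round(photon_energy_ev(680), 2))  # 1.82 eV, the red excitation
print(round(photon_energy_ev(450), 2))  # 2.76 eV, the blue excitation
```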

Charge separation is managed precisely at the photosystem II reaction center through a series of pigments of graded energy capacity, sending the excited electron first to a neighboring chlorophyll, then to a pheophytin, then to a pair of iron-coordinated quinones, which then pass two electrons to a plastoquinone that is released to the local membrane, to float off to the cytochrome b6f complex. In photosystem II, another two photons of light are separately used to power the splitting of one water molecule (giving two electrons and pumping two protons). So the whole process, just within photosystem II, yields four protons pumped from one side of the membrane to the other per four light quanta. Since the ATP synthase uses about three protons per ATP, this nets just over one ATP per four photons. 
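
The proton and ATP arithmetic in this paragraph can be laid out explicitly, using the text's figure of about three protons per ATP (measured stoichiometries vary):

```python
# Photon -> proton -> ATP bookkeeping for photosystem II, using the
# text's figures: 4 photons yield 4 pumped protons, and the ATP
# synthase spends about 3 protons per ATP (measured values vary).
PHOTONS = 4
PROTONS_PUMPED = 4
PROTONS_PER_ATP = 3

atp = PROTONS_PUMPED / PROTONS_PER_ATP
print(f"{atp:.2f} ATP per {PHOTONS} photons")  # 1.33 ATP, "just over one"
```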

Some of the energetics of photosystem II. The orientations and structures of the reaction center paired chlorophylls (Pd1, Pd2), the neighboring chlorophyll (Chl), and then the pheophytin (Ph) and quinones (Qa, Qb) are shown in the inset. Energy of the excited electron is sacrificed gradually to accomplish the charge separation and channeling, down to the final quinone pairing, after which the electrons are released to a plastoquinone and sent to another complex in the chain.

So the principles of silicon and biological solar cells are totally different in detail, though each gives rise to a delocalized field, one of electrons flowing with a low potential, and the other of protons used later for ATP generation. Each energy system must have a way to pop off an excited electron in a controlled, useful way that prevents it from recombining with the positive ion it came from. That is why there is such an ornate conduction pathway in photosystem II to carry that electron away. Overall, points go to the silicon cell for elegance and simplicity, and we in our climate crisis are the beneficiaries, if we care to use it. 

But the photosynthetic enzymes are far, far older. A recent paper pointed out that not only are photosystems II and I clearly cousins of each other, but it is likely that, contrary to the consensus heretofore, photosystem II is the original version, at least of the various photosystems that currently exist. All the other photosystems (including those in bacteria that lack oxygen stripping ability) carry traces of the oxygen evolving center. It makes sense that getting electrons is a fundamental part of the whole process, even though that chemistry is quite challenging. 

That in turn raises a big question- if oxygen evolving photosystems are primitive (originating very roughly with the last common ancestor of all life, about four billion years ago) then why was earth's atmosphere oxygenated only from two billion years ago onward? It had been assumed that this turn in Earth history marked the evolution of photosystem II. The authors additionally point out that there is evidence for the respiratory use of oxygen from these extremely early times as well, despite the lack of free oxygen. Quite perplexing (and the authors decline to speculate), but one gets the distinct sense that possibly life, while surprisingly complex and advanced from early times, was not operating at the scale it does today. For example, colonization of land had to await the buildup of sufficient oxygen in the atmosphere to provide a protective ozone layer against UV light. It may have taken the advent of eukaryotes, including cyanobacterial-harnessing plants, to raise overall biological productivity sufficiently to overcome the vast reductive capacity of the early earth. On the other hand, speculation about the evolution of early life based on sequence comparisons (as these authors do) is notoriously prone to artifacts, since what evolves at vanishingly slow rates today (such as the photosystem core proteins) must have originally evolved at quite a rapid clip to attain the functions now so well conserved. We simply cannot project ancient ages (at the four billion year time scale) from current rates of change.


Saturday, August 23, 2025

Why Would a Bacterium Commit Suicide?

Our innate immune system, including suicide of infected cells, has antecedents in bacteria.

We have a wide variety of defenses against pathogens, from our skin with its coating of RNase and antimicrobial peptides, to the infinite combinatorial firepower of the adaptive immune system, which is what vaccines prime. In between is something called the innate immune system, which is built-in and static rather than adaptive, but is very powerful nonetheless. It is largely built around particular proteins that recognize common themes in pathogens, like the free RNA and DNA of viral genomes, or the lipopolysaccharide that coats most bacteria. There are also internal damage signals, such as cellular proteins that have leaked out and are visible to wandering immune cells, that raise similar alarms. The alarms lead to inflammation, the gathering of immune cells, and hopefully to resolution of the problem. 

One powerful defensive strategy our cells have is apoptosis, or cellular suicide. If the signals from an incoming infection are too intense, a cell, in addition to activating its specific antiviral defenses, goes a few steps further and assembles a massive inflammasome that rounds up and turns on a battery of proteases that chew up the cell, destroying it from the inside. The pieces are then strewn around to be picked up by the macrophages and other cleanup crews, which hopefully can learn something from the debris about the invading pathogen. Particular targets of these proteases are the gasdermins, which are activated by this proteolysis and then assemble into huge pores that plant themselves into the plasma and mitochondrial membranes, rapidly killing the cell by collapsing all the ion gradients across these membranes. 

A human cell committing apoptosis, and falling apart.

A recent paper showed that key parts of this apparatus are present in bacteria as well. It was both technically interesting, since the authors relied on a lot of AI tools to discern the rather distant relationships of bacterial receptors for pathogens (that is to say, phages- the viruses of bacteria), and generally intriguing, because suicide is usually thought of as a civilized behavior of cells in multicellular organisms, protecting the rest of the body from spread of the pathogen. Bacteria, despite living in mucky biofilms and other kinds of colonies, are generally thought to be loners, out only for their own reproduction. Why would they kill themselves? Well, anytime they are in a community, that community is almost certainly composed of relatives, probably identical clones of a single founding cell. So it would be a highly related community indeed, and well worth protecting in this way. 

A bacterial gasdermin outruns phages infecting the cell. Two kinds of cells are mixed together here, ones without a gasdermin (green) and ones with (black). All are infected at zero time, and a vital dye is added (pink) that only gets into cells through large pores, like the gasdermin pore. At 45 minutes and after, the green (control) cells are dying and getting blown apart by escaping phages. On the other hand, the gasdermin+ cells develop pores and get stained pink, showing that they are dead too. But they don't blow up, indicating that they have shut down phage propagation.

The researchers heard that some bacteria have gasdermins, so they wondered whether they have the other parts of the system- the proteases and the sensor proteins. And indeed, they do. While traditional sequence similarity analysis didn't say so, structural comparison courtesy of the AlphaFold program showed that a protease in the same operon as gasdermin had CARD domains. These domains are signatures of caspases and of caspase interacting proteins, like the sensor proteins in the human innate immune system. They bind other CARD domains, thus mediating assembly of the large complexes that lead to inflammation and apoptosis.

Structure of the bacterial CARD domain, courtesy of AlphaFold, showing some similarity with a human CARD domain, which was not very apparent on the sequence level.

The operon of this bacterium, which encodes the whole system- gasdermin, protease (two of them), and sensor.

The researchers then raised their AI game by using another flavor of AlphaFold to predict interactions that the bacterial CARD/protease protein might have. This showed an interaction with another protein in the same operon, with similarity to NLR sensor proteins in humans, which they later confirmed happened in vitro as well. This suggests that this bacterium, and many bacteria, have the full circuit of sensor for incoming phage, activatable caspase-like protease, and then cleavable gasdermin as the effector of cell suicide.

A comparison of related operons from several other bacteria.

Looking at other bacteria, they found that many have similar systems. Some link to other effectors, rather than a pore-forming gasdermin. But most share a similar sensor-to-protease circuit that is the core of this defense system. Lastly, they also asked what triggers this whole system from the incoming phage. The answer, in this case, is a phage protein called rIIB. Unfortunately, it is not clear either what rIIB does for the phage, or whether it triggers the CARD/gasdermin system by activating the bacterial NLR sensor protein, as would be assumed. What is known, however, is that rIIB has a function in defending phage against another bacterial defense system called RexAB. Thus it looks as though this particular arms race has ramified into a complicated back-and-forth as bacteria try as best they can to insure themselves against mass infection.


Saturday, August 9, 2025

A Wonderland of RNA

A snoRNA mates with the 7SL RNA and mRNA to promote protein secretion.

As molecular biologists wander through the wilderness of the cell, they keep stumbling across RNAs. From early on, the ribosomal RNA (rRNA) and amino acid transfer RNA (tRNA) were obviously incredibly abundant, in their somewhat inefficient job of carrying on translation. Messenger RNAs (mRNA) were less abundant, but recognized from the start for their key role relaying information from the genome. But over the decades, more and more types of RNA kept popping up. Here is one tabulation of genes by type in humans:

One big step in the realization of the prevalence of RNA was the ENCODE project, done as part of the human genome project. They found that most of the genome is transcribed to RNA, one way or another. Not all those products are important, or abundant, but just the fact that all this RNA is floating around was startling. This does not mean that there isn't junk DNA, (or junk RNA), but it does mean that a lot of potential function lurks waiting to be found. And the last couple of decades have seen many such finds. 

From the list above, microRNAs are small fragments that bind to matching mRNAs and repress their translation to protein. They have wide-ranging networks of regulation, mostly of a fine-tuning nature, but sometimes quite decisive and relevant to human biology and pathology. snRNAs are small nuclear RNAs, some of which function in RNA splicing. snoRNAs are small nucleolar RNAs, some of which mate with various sections of the ribosomal RNA as it is being assembled in the nucleolus and guide chemical modifications made by enzymes, such as attachment of methyl groups and conversion of uridine to pseudouridine. The non-coding (nc) RNAs are typically products of protein-coding genes that, due to splicing or altered start sites, happen to not code for anything, and occasionally have significant regulatory roles. 

In general, RNAs may have a few different mechanisms of action: guide characteristics, where they mate with their antisense sequence in a target RNA and direct some other process like sequestration, cleavage, or chemical modification. Or they may bind to specific proteins, such as the RNAs that bind to chromatin and regulate X-linked dosage compensation. Or they have structural, even catalytic roles, like the ribosomal and spliceosomal RNAs. 

What should be clear is that there are many more genes recognizable by sequence than genes we understand. Only a couple hundred snoRNA genes are understood in terms of their targets and activity. But there are well over a thousand in the genome. What do the rest do? A recent paper took on this quest, devising a novel way to isolate these snoRNAs and their partners from the welter of other material. They did this by crosslinking everything, ligating the RNAs locally to each other (which linked the snoRNAs to their targets), and then reverse-transcribing the RNAs before capturing them individually with custom antisense DNA probes, one per gene. It was a complicated procedure, but far more productive than trying to capture them directly as RNA with antisense RNA probes, since these snoRNAs are intensely structured (lots of hairpins and other duplexes) and expected to be tightly bound to other things.
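
The probe design step comes down to reverse complementation, writing the RNA target back out in DNA bases. A minimal sketch, with an invented example sequence (the paper's actual probes are gene-specific):

```python
# Sketch of the capture-probe idea described above: an antisense DNA
# probe is the reverse complement of its RNA target, written in DNA
# bases. The example sequence is invented for illustration.

RNA_TO_DNA_COMPLEMENT = str.maketrans("AUGC", "TACG")

def antisense_dna_probe(rna_segment):
    """Reverse-complement an RNA stretch into a DNA capture probe (5'->3')."""
    return rna_segment.translate(RNA_TO_DNA_COMPLEMENT)[::-1]

snorna_fragment = "AUGGCUAACGGAUCCUAGGC"   # hypothetical snoRNA stretch
print(antisense_dna_probe(snorna_fragment))
```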

Taking the most abundant snoRNAs, these researchers then looked for novel partners and functions. After seeing that they recovered plenty of the known interactions, the most interesting novel interaction they came up with was of a gene called SNORA73. This was found linked to two other RNAs, 7SL RNA and various mRNAs. 

Just another holdover from the RNA world. The SRP particle (in red) is built around the 7SL snRNA (helix). This particle detects the signal peptide (green) of the nascent protein emerging from the ribosome (beige, blue), and clamps on (right) to arrest translation. Translation is later resumed after the whole complex has successfully docked with the membrane receptor, allowing the SRP to be released, and the peptide to be threaded through the membrane. 

Funny story ... 7SL RNA is yet another snRNA that has a key role in translation. It is the core of the signal recognition particle (SRP), which binds to "signal" sequences in proteins as they come off the ribosome. These are special code segments at the start that say "I want to be secreted across (or into) a membrane, not just located in the cytoplasm". The SRP captures this signal segment, and then sticks its head into the ribosome, stalling its translation. Then the whole mess goes off to the membrane (the endoplasmic reticulum in eukaryotes, or the plasma membrane in bacteria), where it docks with the SRP receptor complex. This is the signal for translation to restart, the SRP to come off, and the nascent protein to thread its way through the membrane to the other side. 

Incidentally, it is notable also that SRP is scaffolded by a large RNA, with a few proteins stuck on for decoration / specificity. This makes sense as an echo of early evolution, where not only did RNAs likely arise before proteins ever existed, but those RNAs had gotten quite large while the earliest proteins were still relatively small. The genetic code appears to have started as a two letter code, before the third letter was munged onto the end, vastly expanding the chemical repertoire of proteins and making them premier catalysts. 

A few results, indicating that knockdown of SNORA73 (with the anti-RNAs LNA-1 and LNA-2) dramatically decreases secretion of the proteins CLU and LGALS3BP. On the left are signals from proteins isolated from inside and outside the cells, as indicated. On the right is a graph of the same data. Neither the mRNA levels nor the protein levels are changed; only the level of secretion is altered.

So the implication of all this was that SNORA73 affects protein translation/secretion. This was indeed the case when these authors assayed the secretion of one of the proteins encoded by a SNORA73-bound mRNA in the presence of an inhibitor of SNORA73 (above). The mechanism is that SNORA73 serves as a special glue between the 7SL snRNA and the translating mRNA, with parts of its RNA sequence complementary both to a segment of the 7SL snRNA and to a small, 10-base-long segment of the mRNA. The mRNA segment hangs off the ribosome while the beginning of the message is being translated. The whole setup helps the SRP find these mRNAs efficiently and hold on to them effectively, increasing not their translation rate, but their secretion rate.
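
The kind of pairing involved can be illustrated as a search for a short perfect duplex between two RNAs. All sequences below are invented; the real SNORA73 and MBM sequences are in the paper:

```python
# Illustrative search for the kind of short duplex described above: a
# 10-nt window of an mRNA that is perfectly complementary (antiparallel)
# to a window of a snoRNA. All sequences here are invented.

PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def revcomp(rna):
    return "".join(PAIR[b] for b in reversed(rna))

def find_duplex(sno, mrna, k=10):
    """Return (sno_start, mrna_start) of the first perfect k-nt duplex, else None."""
    sno_windows = {sno[i:i + k]: i for i in range(len(sno) - k + 1)}
    for j in range(len(mrna) - k + 1):
        i = sno_windows.get(revcomp(mrna[j:j + k]))
        if i is not None:
            return i, j
    return None

sno = "GGAUCCAAGUCUAGGCAUGG"          # hypothetical snoRNA
mrna = "UUUUAUGCCUAGACAAAA"           # carries the complement of sno[8:18]
print(find_duplex(sno, mrna))         # (8, 4)
```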

Models of the structures of SNORA73 (which is made by a pair of similar genes, A and B), as they bind to the 7SL snRNA and the target mRNAs. These binding areas are far apart, to allow the mRNA tail (that is not yet in the ribosome) to reach the MBM binding site. The psi pocket is of uncertain function here, but in other snoRNAs it directs the conversion of uridine to pseudouridine in the target rRNA.

The mRNAs that have this 10-base (MBM) signal that binds to SNORA73 are a subset of those that express secreted proteins, though it is not really clear from this work what kind of a subset this is. Perhaps this mechanism makes up for weak signal sequences, or some other defect in the protein's access to the secretion machinery. Whatever the logic, we have here a conjunction of four RNAs (the 7SL snRNA, the SNORA73 snoRNA, the mRNA target, and the ribosomal RNA structure), all collaborating to promote the secretion of a target protein. This is just one of thousands of uncharacterized and conserved RNAs visible in our genome. It is startling to think what else might be going on.


Saturday, August 2, 2025

The Origin of Life

What do we know about how it all began? Will we ever know for sure?

Of all the great mysteries of science, the origin of life is maybe the one least likely to ever be solved. It is a singular event that happened four billion years ago in a world vastly different from ours. Scientists have developed a lot of ideas about it and increased knowledge of this original environment, but in the end, despite intense interest, the best we will be able to do is informed speculation. Which is, sure, better than uninformed speculation (aka theology), but still unsatisfying. 

A recent paper about sugars and early metabolism (and a more fully argued precursor) piqued my interest in this area. It claimed that there are non-enzymatic ways to generate most or all of the core carbohydrates of glycolysis and of CO2 fixation around pentose sugars, which lie at the heart of metabolism and supply sugars like ribose that form RNA, ATP, and other key compounds. The general idea is that at the very beginning of life, there were no enzymes and proteins, so our metabolism is patterned on reactions that originally happened naturally, with some kind of kick from environmental energy sources and mineral catalysts, like iron, which was very abundant. 

That is wonderful, but first, we had better define what we mean by life, and figure out what the logical steps are to cross this momentous threshold. Life is any chemical process that can accomplish Darwinian evolution. That is, it replicates in some fashion, and it has to encode those replicated descendants in some way that is subject to mutation and selection. With those two ingredients, we are off to the races. Without them, we are merely complex minerals. Crystals replicate, sometimes quite quickly, but they do not encode descendent crystals in a way that is complex at all- you either get the parent crystal, or you get a mess. This general theory is why the RNA world hypothesis was, and remains, so powerful. 
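
The definition above (replication, plus heritable encoding subject to mutation and selection) can be shown in a toy simulation. This is purely illustrative: the alphabet, target string, and parameters are all arbitrary choices, with no chemistry implied:

```python
import random

# Toy demonstration of Darwinian evolution as defined above: strings
# "replicate" with occasional mutation, and selection keeps the copies
# closest to an arbitrary target. Purely illustrative.
random.seed(0)
ALPHABET = "AUGC"
TARGET = "GGAUACCUGG"                 # an arbitrary 10-letter "environment"

def fitness(s):
    return sum(a == b for a, b in zip(s, TARGET))

def replicate(s, mu=0.1):
    """Copy a string, mutating each position with probability mu."""
    return "".join(random.choice(ALPHABET) if random.random() < mu else b
                   for b in s)

# Random founders, then rounds of replication-with-mutation plus selection.
pop = ["".join(random.choice(ALPHABET) for _ in TARGET) for _ in range(50)]
for _ in range(100):
    offspring = [replicate(s) for s in pop for _ in range(2)]
    pop = sorted(offspring, key=fitness, reverse=True)[:50]

print(fitness(pop[0]), "of", len(TARGET), "positions match the target")
```

Heritable variation plus selection is enough to climb toward the target; remove either ingredient and the population stays random, which is the point of the definition.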

The RNA world hypothesis holds that RNA was likely the first genetic material, before DNA (which is about 200 times more stable) was devised. RNA also has catalytic capabilities, so it could encode in its own structure some of the key mechanisms of life, thereby embodying both of the critical characteristics of life specified above. The fact that some key processes remain catalyzed by RNA today, such as ribosomal synthesis of proteins, spliceosomal re-arrangement of RNAs, and cutting of RNAs by RNase P, suggests that proteins (as well as DNA) were the Johnny-come-latelies of the chemistry of life, after RNA had, in its lumbering, inefficient way, blazed the trail. 


In this image of the ribosome, RNA is gray, proteins are yellow. The active site is marked with a bright light. Which came first here- protein or RNA?


But what kind of setting would have been needed for RNA to appear? Was metabolism needed? Does genetics come first, or does metabolism come first? If one means a cyclic system of organic transformations encoded by protein or RNA enzymes, then obviously genetics had to come first. But if one means a mess of organic chemicals that allowed some RNA to be made and provide modest direction to its own chemical fate, and to a few other reactions, then yes, those chemicals had to come first. A great deal of work has been done speculating what kind of peculiar early earth conditions might have been conducive to such chemistries. Hydrothermal vents, with their constant input of energy, and rich environment of metallic catalysts? Clay particles, with their helpful surfaces that can faux-crystalize formation of RNAs? Warm ponds, hot ponds, UV light.... the suggestions are legion. The main thing to realize is that early earth was surely highly diverse, had a lot of energy, and had lots of carbon, with a CO2-rich atmosphere. UV would have created a fair amount of carbon monoxide, which is the feedstock of the Fischer-Tropsch reactions that create complex organic compounds, including lipids, which are critical for formation of cells. Early earth very likely had pockets that could produce abundant complex organic molecules.

Thus early life was surely heterotrophic, taking in organic chemicals that were given by the ambient conditions for free. And before life really got going, there was no competition- there was nothing else to break those chemicals down, so in a sort of chemical pre-Darwinian setting, life could progress very slowly (though RNA has some instability in water, so there are limits). Later, when some of the scarcer chemicals were eaten up by other already-replicating life forms, then the race was on to develop those enzymes, of what we now recognize as metabolism, which could furnish those chemicals out of more common ingredients. Onwards the process then went, hammering out ever more extensive metabolic sequences to take in what was common and make what was precious- those ribose sugars, or nucleoside rings that originally had arrived for free. The first enzymes would have been made of RNA, or metals, or whatever was at hand. It was only much later that proteins, first short, then longer, came on the scene as superior catalysts, extensively assisted by metals, RNAs, vitamins, and other cofactors.

Where did the energy for all this come from? To cross the first threshold, only chemicals (which embodied outside energy inputs) were needed, not a separate energy source. Energy requirements accompanied the development of metabolism, as the complex chemicals became scarcer and needed to be made internally. Only when the problem of making complex organic chemicals from simpler ones presented itself did it also become important to find some separate energy source to do that organic chemistry. Of course, the first complex chemicals absolutely needed were copies of the original RNA molecules. How that process was promoted, through some kind of activated intermediates, remains particularly unclear.

All this happened long before the last universal common ancestor, termed "LUCA", which was already an advanced cell just prior to the split into the archaeal and bacterial lineages, (much later to rejoin to create the most amazing form of life- eukaryotes). There has been quite a bit of analysis of LUCA to attempt to figure out the basic requirements of life, and what happened at the origin. But this ("top-down") approach is not useful. The original form of life was vastly more primitive, and was wholly re-written in countless ways before it became the true bacterial cell, and later still, LUCA. Only the faintest traces remain in our RNA-rich biochemistry. Just think about the complexity of the ribosome as an RNA catalyst, and one can appreciate the ragged nature of the RNA world, which was probably full of similar lumbering catalysts for other processes, each inefficient and absurdly wasteful of resources. But it could reproduce in Darwinian fashion, and thus it could improve. 

Today we find on earth a diversity of environments, from the bizarre mineral-driven hydrothermal vents under the ocean to the hot springs of Yellowstone. The geology of earth is wondrously varied, making it quite possible to credit one or more of the many theories of how complex organic molecules may have become a "soup" somewhere on the early Earth. When that soup produces ribose sugars and the other rudiments of RNA, we have the makings of life. The many other things that have come to characterize it, such as lipid membranes and metabolism of compounds, are fundamentally secondary, though critically important for progress beyond that so-pregnant moment.