Showing posts with label naturalism. Show all posts

Saturday, March 15, 2025

Eccentricity, Obliquity, Precession, and Glaciation

The glacial cycles of the last few million years were highly determined by earth's orbital mechanics.

Naturalism as a philosophy came into its own when Newton explained the heavens as a machine, not a pantheon. It was stunning to realize that age-old mysteries were thoroughly explicable and that, if we kept at it with a bit of diligence and intellectual openness, we could attain ever-widening vistas of understanding, which now extend to the farthest reaches of the universe.

In our current day, the mechanics of Earth's climate have become another example of this expansion of understanding, and, sadly, another example of resistance to naturalism, to scientific understanding, and ultimately to the stewardship of our environment. It has dawned on the scientific community (and anyone else willing to look) over the last few decades that our industrial production of CO2 is heating the climate, and that it needs to stop if the biosphere is to be saved from an ever-more degrading crisis. But countervailing excuses and interests abound, and we are now ruled by an administration in the US whose values run toward lies and greed, and which naturally cannot abide moral responsibility.

The Cenozoic, our present age after the demise of the dinosaurs, has been characterized by falling levels of CO2 in the atmosphere. This has led to a progression from very warm climates 50 mya (million years ago) to ice ages beginning roughly 3 mya. The reasons for this are not completely clear. There has been a marked lack of volcanism, which is one of the main ways CO2 gets back into the atmosphere. This contrasts strongly with ages of extreme volcanism like the Permian-Triassic boundary and extinction events, about 250 mya. It makes one think that the earth may be storing up a mega-volcanic event for the future. Yet plate tectonics has kept plugging along, and has sent continents to the poles, whereas they previously hung out in more equatorial locations. That makes ice ages possible, giving glaciers something to glaciate, rather than letting ocean circulation keep the poles temperate. Additionally, the uplift of the Himalayas has dramatically increased rock exposure and weathering, which is the main driver of CO2 burial, by carbonate formation. And on top of all that has been the continued evolution of plant life, particularly the grasses, which have extra mechanisms to extract CO2 out of the atmosphere.

CO2 in the atmosphere has been falling through most of the Cenozoic.

All this has led to the very low levels of CO2 in the atmosphere, which have been stable at about 300 ppm over the last million years, very gradually declining prior to that time. Now we are pushing 420 ppm and beyond, which the biosphere has not seen for ten million years or more, and doing so at speeds that no amount of evolution can accommodate. The problem is clear enough, once the facts are laid out.

But what about those glaciations, which have been such a dramatic and influential feature of Earth's climate over the last few million years? They have followed a curious periodicity, advancing and retreating repeatedly over this time. Does that have anything to do with CO2? It turns out that it does not, and we have to turn our eyes to the heavens again for an explanation. It was Milankovitch, a century ago, who first solidified the theory that the changing orbital parameters of Earth, and particularly the intensity of sunlight in the Northern hemisphere, where most of the land surface of Earth lies, cause this repetitive climatic behavior.

Cycles of orbital parameters and glaciation, over a million years.

It was in 1976 that a more refined analysis put a mathematical model and better data behind the Milankovitch cycles, showing that one major element of our orbit around the sun- the variation of eccentricity- had the greatest overall effect on the 100,000 year periodicity of recent glacial cycles. Eccentricity is how skewed our orbit is from round-ness, which varies slightly over time, due to interactions with other planets. Secondly, the position of the Earth's tilt at various points of this elliptical orbit, whether closer to the sun in northern summer, or farther away, has critical effects on net solar input and on glaciation. The combined measure is called the precessional index, expressing the earth-sun distance in June. The eccentricity itself has a period of about 93,000 years, and the precessional index has a periodicity of 21,000 years. As glacial cycles over the last 800,000 years have had a strong 100,000 year periodicity, it is clearly the eccentricity alone that has the strongest single effect.
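To get a feel for the numbers, the link between eccentricity and sunlight can be sketched in a few lines of Python. This is my own back-of-envelope illustration, not the model from the 1976 paper. It assumes only the inverse-square law and the standard ellipse geometry, where the sun distance ranges from a(1 - e) to a(1 + e). The function name and the sample eccentricities are chosen for illustration (Earth's current value is roughly 0.017).

```python
def perihelion_to_aphelion_flux_ratio(e: float) -> float:
    """Ratio of sunlight intensity at closest vs. farthest approach.

    For an orbit with eccentricity e, the sun distance ranges from
    a*(1 - e) to a*(1 + e), and intensity falls off as 1/distance**2,
    so the semi-major axis a cancels out of the ratio.
    """
    if not 0.0 <= e < 1.0:
        raise ValueError("eccentricity must be in [0, 1)")
    return ((1.0 + e) / (1.0 - e)) ** 2


# Illustrative eccentricities: a perfect circle, roughly today's value,
# and a more stretched orbit.
for e in (0.0, 0.017, 0.05):
    ratio = perihelion_to_aphelion_flux_ratio(e)
    extra = 100.0 * (ratio - 1.0)
    print(f"e = {e:.3f}: sunlight at closest approach is {extra:.1f}% stronger")
```

At an eccentricity near 0.017 the difference between closest and farthest approach comes to about 7%. Whether that extra sunlight arrives in northern summer or northern winter is what the precessional index keeps track of.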

Lastly, there is also the tilt of the Earth, called obliquity, which varies slightly with a 40,000 year cycle. A recent paper made a major claim that they had finally solved the whole glaciation cycle in more detail than previously, by integrating all these cycles into a master algorithm for when glaciations start/end. They were curious about exactly what drives the deglaciation phase, within the large eccentricity-driven energetic cycle. The rule they came up with, again using better data and more complicated algorithms, is that it reaches its maximum rate when, after a minimum of eccentricity, the precession parameter (the purple line, below) has reached a peak, and the obliquity parameter (the green line, below) is rising. That is, when the Earth's degree of tilt and closeness to the sun in Northern summer are mutually reinforcing. There are also lags built into this, since it takes one or two thousand years for these orbital effects to build heat up in the climate system, a bit like spring happening annually well after the equinox.

"We find that the set of precession peaks (minima) responsible for terminations since 0.9 million years ago is a subset of those peaks that begin (i.e., the precession parameter starts decreasing) while obliquity is increasing. Specifically, termination occurs with the first of these candidate peaks to occur after each eccentricity minimum."

 

 

Summary diagram from Barker, et al. At the very top is a synopsis of the orbital variables. At bottom are the glacial cycles, marked with yellow dots (maximum slope of deglaciation), red dots (maximum extent of deglaciation) and blue dots (maximum slope of reglaciation, also called inception). Above this graph is an analysis of the time spans between the yellow and red dots, showing the strength of each deglaciation (gray double arrows). They claim that this strength is proportional to an orbital parameter illustrated above with the T-designation of each glacial cycle. This parameter is precession lagged by obliquity. Finally in the upper graph, the orbital cycles are shown directly, especially including eccentricity in gray, and the time points of the yellow nodes are matched here with purple nodes, lagged with the preceding (by ~2,000 years) rising obliquity as an orange node. The green vertical bars were applied by me to ease the clear correlation of eccentricity maxima vs deglaciation maxima.
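The termination rule quoted above can be written down as a short procedure. What follows is a toy sketch of my reading of that rule, not the authors' code. The function names are mine, plain cosine waves stand in for the real orbital curves, and the one-to-two thousand year climate lags are left out. Following the quote, a candidate "begins" where the precession parameter starts decreasing, which is a local maximum of that parameter.

```python
import math


def local_extrema(series, kind):
    """Indices of interior local maxima ('max') or minima ('min')."""
    sign = 1 if kind == "max" else -1
    return [
        i
        for i in range(1, len(series) - 1)
        if sign * series[i] > sign * series[i - 1]
        and sign * series[i] >= sign * series[i + 1]
    ]


def candidate_onsets(precession, obliquity):
    """Points where the precession parameter starts decreasing
    (a local maximum) while obliquity is increasing."""
    return [
        i
        for i in local_extrema(precession, "max")
        if obliquity[i + 1] > obliquity[i - 1]
    ]


def predicted_terminations(eccentricity, precession, obliquity):
    """First candidate onset at or after each eccentricity minimum."""
    candidates = candidate_onsets(precession, obliquity)
    picks = []
    for minimum in local_extrema(eccentricity, "min"):
        later = [c for c in candidates if c >= minimum]
        if later:
            picks.append(later[0])
    return picks


# Toy stand-ins for the orbital curves, sampled every 1,000 years.
# The periods (100, 21 and 40 kyr) are the round numbers from the text;
# real work would use a computed orbital solution instead of cosines.
times_kyr = list(range(0, 801))
ecc = [math.cos(2 * math.pi * t / 100) for t in times_kyr]
prec = [math.cos(2 * math.pi * t / 21) for t in times_kyr]
obl = [math.cos(2 * math.pi * t / 40) for t in times_kyr]

for i in predicted_terminations(ecc, prec, obl):
    print(f"candidate termination onset near t = {times_kyr[i]} kyr")
```

Even with toy inputs, the structure of the claim is visible: eccentricity sets the roughly 100,000 year window, and the faster precession and obliquity cycles decide exactly when within that window the melt gets going.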

I have to say that the communication of this paper is not crystal clear, and the data a bit iffy. The T5 deglaciation, for instance, which is relatively huge, comes after a tiny minimum of eccentricity and at a tiny peak of precession, making the scale of the effect hard to understand from the scale of the inputs. T3 shows the opposite, with large inputs yielding a modest, if extended, deglacial cycle. And the obliquity values that are supposed to drive the deglaciation events are quite scattered over their respective cycle. But I take their point that ultimately, it is slight variations in the solar inputs that drive these cycles, and we just need to tease out / model the details to figure out how it works.

There is another question in the field, which is that, prior to 800,000 years ago, glacial cycles were much less dramatic, and had a faster cadence of about 40,000 years. This is clearly more lined up with the obliquity parameter as a driver. So while obliquity is part of the equation in the recent period, involved in triggering deglaciation, it was the primary driver a million years ago, when CO2 levels were perhaps slightly higher and the system didn't need the extra push from eccentricity to cycle milder glaciations. Lastly, why are the recent glacial cycles so pronounced, when the orbital forcing effects are so small and take thousands of years to build up? Glaciation is self-reinforcing, in that higher reflectivity from snow / ice drives down warming. Conversely, retreat of glaciers can release large amounts of built-up methane and other forms of carbon from permafrost, continental shelves, the deep ocean, etc. So there may be some additional cycle, such as a smaller CO2 or methane cycle, that halts glaciation at its farthest extent- that aspect remains a bit unclear.

Overall, the earlier paper of Hays et al. found that summer insolation varies by at most 10% over Earth's various orbital cycles. That is not much, yet it drives glaciation of ice sheets thousands of feet thick, and reversals back to deglaciation that uncovers bare rock all over the far north. It shows that Earth's climate is extremely sensitive to small effects. The last time CO2 was as high as it is now, (~16 mya), Greenland was free of ice. We are heading in that direction very rapidly now, in geological terms. Earth has experienced plenty of catastrophes in the past, even some caused biologically, such as the oxygenation of the atmosphere. But this, what we are doing to the biosphere now, is something quite new.


  • That new world order we were working on...
  • Degradation and corruption at FAA.. what could go wrong?
  • Better air.
  • Congress has the power, should it choose to use it.
  • Ongoing destruction, degradation.
  • Oh, Canada!

Saturday, March 1, 2025

The Train Tracks of Synapsis

Structures that align and tether the chromosomes in meiosis are now understood in some molecular detail.

It has been one of the wonders of biology- the synaptonemal complex that aligns homologous chromosomes during meiosis. While chromosomes regularly line up in the middle of the cell during mitosis, so that they can be evenly divided between the daughter cells, in this process they only have to join at their centromeres, which get dragged to the midline of the cell, and then pulled back apart at cell division. In meiosis, on the other hand, not only do the sister chromosomes that have just replicated stick together at their centromeres, but the homologous chromosomes, which have never bothered about each other since sperm fused with egg, suddenly seek each other out and pair up in an elaborate dance of DNA breakage, alignment, cross-over, and repair. Then in the first division, these cross-over-joined homologs line up at the midline and get pulled apart as their crossovers are repaired. The second division follows, much more like mitosis, where the duplicated sister chromosomes line up at the midline based on their centromere attachments, and then separate into haploid gametes.

Comparison of mitosis vs meiosis, which goes through an extra division and alternate chromosome pairing and separation processes in the first division.

The two divisions are fundamentally different, with the first involving novel chromosome pairings and attachments. The opening act of all this, which I won't go into further, is a sprinkling of ~400 DNA strand breaks induced specifically all over the genome, which sets up a repair process at each site, where the chromosomes (using Rad51) seek out good copies of the damaged DNA- that is, another, matching, DNA molecule. There are specific processes that appear to prevent use of the recently replicated "sister", which would be the most closely identical copy that could be used. Instead, there is a bias to use the "homologous" copy from the other parent. But these homologous chromosomes have just been replicated as well. How to line all this up so that the chromosomes all line up neatly and separate neatly during the first meiotic division? The answer is the synaptonemal complex.

Schematic of the synaptonemal complex joining two homologous chromosomes. The lateral elements are on each side, and the central element lines up the center. Crossing the gap are the transverse elements, now known to be composed of the SYCP1 protein. At bottom is a diagram from its atomic structure of how SYCP1 coils together, and how its ends join to zip up the synaptonemal gap.

This is a train track of connecting proteins between the homologous chromosomes. It is evident that the DNA breaks come first, followed by the search for matching homologs, followed by the radiating and progressive assembly of the synaptonemal complex out from the break repair sites. The components of its major structures have been mostly characterized- the lateral element where the DNA loops line up; the transverse element that spans the gap between the homologous chromosomes, and the central element, proteins at the midline that help the transverse elements assemble. A paper from 2023 characterized the transverse element protein, SYCP1, which is a long coil of a protein that dimerizes to make a strong coil, and then dimerizes again head-to-head to create the symmetric bridge over the whole width of the synaptonemal complex, which is about 100 nanometers.

These authors then focus on a series of experiments using key mutations at the dimer-dimer head-to-head interaction area, to demonstrate how this head-to-head zippering works in detail. Mutating just two amino acids in this contact region eliminates the head-to-head interaction, making synapsis impossible. In these cases, the homologous chromosomes (from mice) remain in proximity, especially at crossover sites, but are no longer zippered up and closely aligned.

Spreads of mouse meiotic chromosomes, labeled as shown with antibodies against two synaptonemal proteins. From the top, wild-type SYCP1, then single individual mutations in the end-joining region, and at bottom SYCP1 with two point mutations that eliminate its function entirely. The chromosomes at the bottom are aligned only by virtue of their crossover points, but not by a zippered up synaptonemal complex. Needless to say, mice like this are not fertile.


Thus what was once a hazy mystery in the highest power microscopes has been defined in molecular terms, highlighting once again the power of curiosity, and the essentially moral aim of truth-seeking- to reveal what is true, rather than dictate it. But who cares about all that? Truth, knowledge, science... these values are now not only in question, but under active attack. Who is making America great, and who is diminishing it? Those in our institutions of power who have a voice will hopefully see the consequences and act on them, before our history and values are entirely corrupted.


  • Sociopaths at work.
  • Evidently the model is that we become a version of China/Russia, and make a tripolar world. Not a little Orwellian. And who knows, perhaps we will offer Russia a deal to partition Canada. That is, after we get done partitioning Ukraine.
  • A black day.
  • Oh, wait, the next day was even worse.
  • Shades of Stalin, with a sad sartorial hat-tip to Steve Jobs.
  • Unlawful and vindictive destruction at the NIH, and of biological research in general.
  • And all for love.

Saturday, February 15, 2025

Cloudy, With a Chance of RNA

Long RNAs play structural and functional roles in regulation of chromosome replication and expression.

One of the wonderful properties of the fruit fly as a model system of genetics and molecular biology has been its polytene chromosomes. These are hugely expanded bundles of chromosomes, replicated thousands of times, which have been observed microscopically since the late 1800's. They exist in the larval salivary gland, where huge amounts of gene expression are needed, thus the curious evolutionary solution of expanding the number of templates, not only of the gene needed, but of the entire genome. 

These chromosomes were closely mapped and investigated, almost like runic keys to the biology of the fly, especially in the day before molecular biology. Genetic translocations, loops, and other structural variations could be directly observed. The banding patterns of light, dark, expanded, and compressed regions were mapped in excruciating detail, and mapped to genetic correlates and later to gene expression patterns. These chromosomes provided some of the first suggestions of heterochromatin- areas of the genome whose expression is shut down (repressed). They may have genes that are shut off, but they may also be structural components, such as centromeres and telomeres. These latter areas tend to have very repetitive DNA sequences, inherited from old transposons and other junk.

A diagram of polytene chromosomes, bunched up by binding at the centromeres. The banding pattern is reproducible and represents differences in proteins bound to various areas of the genome, and gene activity.

It has become apparent that RNA plays a big role in managing these areas of our chromosomes. The classic case is the XIST RNA, which is a long (17,000 bases) non-coding RNA that forms a scaffold by binding to lots of "heterogeneous" RNA-binding proteins, and most importantly, stays bound near the site of its creation, on the X chromosome. Through a regulatory cascade that is only partly understood, the XIST RNA is turned off on one of the X chromosomes, and turned on on the other one (in females), leading the XIST molecule to glue itself to its chromosome of origin, and then progressively coat the rest of that chromosome and turn it off. That is, one entire X is turned into heterochromatin by a process that requires XIST scaffolding all along its length. That results in "dosage compensation" in females, where one X is turned off in all their cells, allowing dosage (that is, the gene expression) of its expressed genes to approximate those of males, despite the presence of the extra X chromosome. Dosage is very important, as shown by Down Syndrome, which originates from a duplication of one of the smallest human chromosomes, creating imbalanced gene dosage.

A recent paper described work on "ASAR" RNAs, which similarly arise from highly repetitive areas of human chromosomes, are extremely long (180,000 bases), and control expression and chromosome replication in an allele-specific way on (at least) several non-X chromosomes. These RNAs, again, like XIST, specifically bind a bunch of heteronuclear binding proteins, which is presumably central to their function. Indeed, these researchers dissected out the 7,000 base segment of ASAR6 that is densest in protein binding sites, and found that, when transplanted into a new location, this segment has dramatic effects on chromosome condensation and replication, as shown below.

The intact 7,000 base core of ASAR6 was transplanted into chromosome 5, and mitotic chromosomes were spread and stained. The blue is a general DNA stain. The green is a stain for newly synthesized DNA, and the red is a specific probe for the ASAR6 sequence. One can see on the left that this chromosome 5 is replicating more than any other chromosome, and shows delayed condensation. In contrast, the right frame shows a control experiment where an anti-sense version of the ASAR6 7,000 base core was transplanted to chromosome 5. The antisense sequence not only does not have the wild-type function, but also inhibits any molecule that does by tightly binding to it. Here, the chromosome it resides on (arrows) is splendidly condensed, and hardly replicating at all (no green color).


Why RNA? It has become clear over the last two decades that our cells, and particularly our nuclei, are swimming with RNAs. Most of the genome is transcribed in some way or other, despite a tiny proportion of it coding for anything. 95% of the RNAs that are transcribed never get out of the nucleus. There has been a growing zoo of different kinds of non-coding RNAs functioning in translational control, ribosomal maturation, enhancer function, and here, in chromosome management. While proteins tend to be compact bundles, RNAs can be (as these ASARs are) huge, especially in one dimension, and thus capable of physically scaffolding the kinds of structures that can control large regions of chromosomes.

Chromosomes are sort of cloudy regions in our cells, long a focus of observation and clearly also a focus of countless proteins and now RNAs that bind, wind, disentangle, transcribe, replicate, and congregate around them. What all these RNAs and especially the various heteronuclear proteins actually do remains pretty unclear. But they form a sort of organelle that, while it protects and manages our DNA, remarkably also allows access to it for sequence-specific binding proteins and the many processes that plow through it.

"In addition, recent studies have proposed that abundant nuclear proteins such as HNRNPU nonspecifically interact with ‘RNA debris’ that creates a dynamic nuclear mesh that regulates interphase chromatin structure."


Saturday, February 8, 2025

Sugar is the Enemy

Diabetes, cardiovascular health, and blood glucose monitoring.

Christmas brought a book titled "Outlive: The Science and Art of Longevity". Great, I thought- something light and quick, in the mode of Gwyneth Paltrow or Deepak Chopra. I have never been into self-help or health fad and diet books. Much to my surprise, however, it turned out to be a rather rigorous program of preventative medicine, with a side of critical commentary on our current medical system. A system that puts various thresholds, such as blood sugar and blood pressure, at levels that represent serious disease, and cares little about what led up to them. Among the many recommendations and areas of focus, blood glucose levels stand out, both for their pervasive impact on health and aging, and also because there are new technologies and science that can bring its dangers out of the shadows.

Reading: 

Where do cardiovascular problems, the biggest source of mortality, come from? Largely from metabolic problems in the control of blood sugar. Diabetics know that uncontrolled blood sugar is lethal, in both the acute and the long term. But the rest of us need to realize that the damage done by swings in blood sugar is more insidious and pervasive than commonly appreciated. Both microvascular damage (what is commonly associated with diabetes, in the form of problems with the small vessels of the kidney, legs, and eyes) and macrovascular damage (atherosclerosis) are due to high and variable blood sugar. The molecular biology of this was impressively unified in 2005 in the paper above, which argues that excess glucose clogs the mitochondrial respiration mechanisms. Their membrane voltage maxes out, reactive forms of oxygen accumulate, and glucose intermediates pile up in the cell. This leads to at least four different and very damaging consequences for the cell, including glucose modification (glycation) of miscellaneous proteins, a reduction of redox damage repair capacity, inflammation, and increased fatty acid export from adipocytes to endothelial (blood vessel) cells. Not good!

Continuous glucose monitored concentrations from three representative subjects, over one day. These exemplify the low, moderate, and severe variability classes, as defined by the Stanford group. Line segments are individually classed as to whether they fall into those same categories. There were 57 subjects in the study, of all ages, none with an existing diagnosis of diabetes. Yet five of them had diabetes by traditional criteria, and fourteen had pre-diabetes by those criteria. By this scheme, 25 had severe variability as their "glucotype", 25 had moderate variability, and only 7 had low variability. As these were otherwise random subjects selected to not have diabetes, this is not great news about our general public health, or the health system.

Additionally, a revolution has occurred in blood glucose monitoring, where anyone can now buy a relatively simple device (called a CGM) that gives continuous blood glucose monitoring to a cell phone, and associated analytical software. This means that the fasting blood glucose level that is the traditional test is obsolete. The recent paper from Stanford (and the literature it cites) suggests, indeed, that it is variability in blood glucose that is damaging to our tissues, more so than sustained high levels.
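As a rough illustration of what "variability" means in numbers, here is a sketch that summarizes a run of CGM readings by mean, standard deviation, and coefficient of variation. This is not the Stanford group's method, which classifies stretches of the trace into "glucotypes". The coefficient of variation is a simpler stand-in that I am using for illustration, the function name is mine, and both sets of readings are invented.

```python
from statistics import mean, pstdev


def glucose_summary(readings_mg_dl):
    """Return (mean, standard deviation, coefficient of variation in %)."""
    if len(readings_mg_dl) < 2:
        raise ValueError("need at least two readings")
    avg = mean(readings_mg_dl)
    sd = pstdev(readings_mg_dl)
    return avg, sd, 100.0 * sd / avg


# Two invented traces (mg/dL) constructed to share the same average.
steady = [95, 100, 105, 100, 95, 100, 105, 100]
spiky = [70, 150, 80, 140, 60, 130, 75, 95]

for label, trace in (("steady", steady), ("spiky", spiky)):
    avg, sd, cv = glucose_summary(trace)
    print(f"{label}: mean {avg:.0f} mg/dL, SD {sd:.1f}, CV {cv:.1f}%")
```

The two traces are built to have the same average, so any summary based on a typical level would call them equivalent. Only a measure of spread tells them apart, which is the kind of information a continuous monitor provides and a single fasting measurement does not.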

One might ask why, if blood glucose is such a damaging and important mechanism of aging, evolution hasn't developed tighter control over it. Other ions and metabolites are kept under much tighter ranges. Sodium ranges from 135 to 145 mM, and calcium from 8.8 to 10.7 mg/dL (about 2.2 to 2.7 mM). Well, glucose is our food, and our need for glucose internally is highly variable. Our livers are tiny brains that try very hard to predict what we need, based on our circadian rhythms, our stress levels, our activity both current and expected. It is a difficult job, especially now that stress rarely means physical activity, nor does travel, in our automobiles. But mainly, this is a problem of old age, so evolution cares little about it. Getting a bigger spurt of energy for a stressful event when we, in our youth, are in crisis may, in the larger scheme of things, outweigh the slow decay of the cardiovascular system in old age. Not to mention that traditional diets were not very generous at all, certainly not in sugar and refined carbohydrates.


Saturday, February 1, 2025

Proving Evolution the Hard Way

Using genomes and codon ratios to estimate selective pressures was so easy... why is it not working?

The fruits of evolution surround us with abundance, from the tallest tree to the tiniest bacterium, and the viruses of that bacterium. But the process behind it is not immediately evident. It was relatively late in the enlightenment before Darwin came up with the stroke of insight that explained it all. Yet that mechanism of natural selection remains an abstract concept requiring an analytical mind and due respect for very inhuman scales of the time and space in play. Many people remain dumbfounded, and in denial, while evolutionary biology has forged ahead, powered by new discoveries in geology and molecular biology.

A recent paper (with review) offered a fascinating perspective, both critical and productive, on the study of evolutionary biology. It deals with the opsin protein that hosts the visual pigment 11-cis-retinal, by which we see. The retinal molecule is the same across all opsins, but different opsin proteins can "tune" the light wavelength of greatest sensitivity, creating the various retinal-opsin combinations for all visual needs, across the cone cells and rod cells. This paper considered the rhodopsin version of opsin, which we use in rod cells to perceive dim light. They observed that in fish species, the sensitivity of rhodopsin has been repeatedly adjusted to accommodate light at different depths of the water column. At shallow levels, sunlight is similar to what we see, and rhodopsin is tuned to about 500 nm, while deeper down, when the light is more blue-ish, rhodopsin is tuned towards about 480 nm maximum sensitivity. There are also special super-deep fish who see by their own red-tinged bioluminescence, and their rhodopsins are tuned to 526 nm. 

This "spectrum" of sensitivities of rhodopsin has a variety of useful scientific properties. First, the evolutionary logic is clear enough, matching the fish's vision to its environment. Second, the molecular structure of these opsins is well-understood, the genes are sequenced, and the history can be reconstructed. Third, the opsin properties can be objectively measured, unlike many sequence variations which affect more qualitative, difficult-to-observe, or impossible-to-observe biological properties. The authors used all this to carefully reconstruct exactly which amino acids in these rhodopsins were the important ones that changed between major fish lineages, going back about 500 million years.

The authors' phylogenetic tree of fish and other species they analyzed rhodopsin molecules from. Note how mammals occupy the bottom small branch, indicating how deeply the rest of the tree reaches. The numbers in the nodes indicate the wavelength sensitivity of each (current or imputed) rhodopsin. Many branches carry the authors' inference, from a reconstructed and measured protein molecule, of what precise changes happened, via positive selection, to get that lineage.

An alternative approach to evolutionary inference is a second target of these authors. That is a codon-based method that evaluates the rate of change of DNA sites under selection versus sites not under selection. In protein coding genes (such as rhodopsin), every amino acid is encoded by a triplet of DNA nucleotides, per the genetic code. With 64 codons for ~20 amino acids, it is a redundant code where many DNA changes do not change the protein sequence. These changes are called "synonymous". If one studies the rate of change of synonymous sites in the DNA, (which form sort of a control in the experiment), compared with the rate of change of non-synonymous sites, one can get a sense of evolution at work. Changing the protein sequence is something that is "seen" by natural selection, and especially at important positions in the protein, some of which are "conserved" over billions of years. Such sites are subject to "negative" selection, which is to say rapid elimination due to the deleterious effect of that DNA and protein change.

Mutations in protein coding sequence can be synonymous, (bottom), with no effect, or non-synonymous (middle two cases), changing the resulting protein sequence and having some effect that may be biologically significant, thus visible to natural selection.


This analysis has been developed into a high art, also being harnessed to reveal "positive" selection. In this scenario, if the rate of change of the non-synonymous DNA sites is higher than that of the synonymous sites, or even just higher than one would expect by random chance, one can conclude that these non-synonymous sites were not just not being selected against, but were being selected for, an instance of evolution establishing change for the sake of improvement, instead of avoiding change, as usual.
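The distinction between synonymous and non-synonymous changes is easy to make concrete. Here is a minimal sketch that builds the standard genetic code and sorts the codon differences between two aligned coding sequences into the two classes. It is only the first step of a real codon-based analysis, which also normalizes by the number of possible synonymous and non-synonymous sites and corrects for repeated changes at the same position. The two short sequences are invented for illustration, and the function name is mine.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_changes(seq_a, seq_b):
    """Count differing codons between two aligned coding sequences.

    Returns (synonymous, non_synonymous). Codons that are identical
    in both sequences are ignored.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    synonymous = non_synonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous


# Invented example: ATG GAA CTG AAA versus ATG GAG CTG CAA.
ancestral = "ATGGAACTGAAA"
derived = "ATGGAGCTGCAA"
syn, nonsyn = classify_changes(ancestral, derived)
print(f"synonymous: {syn}, non-synonymous: {nonsyn}")
```

In this example GAA and GAG both encode glutamate, so that change is invisible at the protein level, while AAA to CAA swaps lysine for glutamine and is the kind of change that selection can act on.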

Now back to the rhodopsin study. These authors found that a very small number of amino acids in this protein, only 15, were the ones that influenced changes to the spectral sensitivity of these protein complexes over evolutionary time. Typically only two or three changes occurred over a shift in sensitivity in a particular lineage, and would have been the ones subject to natural selection, with all the other changes seen in the sequence being unrelated, either neutral or selected for other purposes. It is a tour de force of structural analysis, biochemical measurement, and historical reconstruction to come up with this fully explanatory model of the history of piscine rhodopsins.

But then they went on to compare what they found with what the codon-based methods had said about the matter. And they found that there was no overlap whatsoever. The amino acids identified by the "positive selection" codon-based methods were completely different than the ones they had found by spectral analysis and phylogenetic reconstruction over the history of fish rhodopsins. The accompanying review is particularly harsh about the pseudoscientific nature of this codon analysis, rubbishing the entire field. There have been other, less drastic, critiques as well.

But there is method to all this madness. The codon-based methods were originally conceived in the analysis of closely related lineages. Specifically, various Drosophila (fly) species that might have diverged over a few million years. On this time scale, positive selection has two effects. One is that a desirable amino acid (or other) variation is selected for, and thus swept to fixation in the population. The other, and corresponding effect, is that all the other variations surrounding this desirable variation (that is, which are nearby on the same chromosome) are likewise swept to fixation (as part of what is called a haplotype). That dramatically reduces the neutral variation in this region of the genome. Indeed, the effect on neutral alleles (over millions of nearby base pairs) is going to vastly overwhelm the effect from the newly established single variant that was the object of positive selection, and this imbalance will be stronger the stronger the positive selection. In the limit case, the entire genomes of those without the new positive trait/allele will be eliminated, leaving no variation at all.

Yet, on the longer time scale, over hundreds of millions of years, as was the scope of visual variation in fish, all these effects on the neutral variation level wash out, as mutation and variation processes resume, after the positively selected allele is fixed in the population. So my view of this tempest in an evolutionary teapot is that these recent authors (and whatever other authors were deploying codon analysis against this rhodopsin problem) are barking up the wrong tree, mistaking the proper scope of these analyses. Those analyses, after all, focus on the ratio between synonymous and non-synonymous change in the genome, and thus intrinsically on recent change, not deep change in genomes.
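To make the synonymous versus non-synonymous distinction concrete, here is a minimal sketch in Python. It assumes two aligned, in-frame coding sequences with no gaps, and it only counts codons that differ. That is a crude tally, not the site-normalized dN/dS statistic that the real codon-based methods estimate. The function name and the toy sequences are made up for illustration; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Hypothetical helper: assumes the two sequences are already aligned,
    in frame, gap-free, and contain only A, C, G, T.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue  # identical codons carry no information here
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1      # DNA changed, protein did not
        else:
            nonsynonymous += 1   # DNA change altered the amino acid
    return synonymous, nonsynonymous


# Toy example: GCT -> GCC keeps alanine, AAA -> AGA swaps lysine for arginine.
print(count_codon_changes("ATGGCTAAA", "ATGGCCAGA"))
```

An excess of non-synonymous over synonymous change is what these methods read as positive selection, which is why they depend so heavily on the neutral (synonymous) background still carrying a recent signal.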


  • That all-American mix of religion, grift, and greed.
  • Christians are now in charge.
  • Mechanisms of control by the IMF and the old economic order.
  • A new pain med, thanks to people who know what they are doing.

Saturday, January 18, 2025

Eking Out a Living on Ammonia

Some archaeal microorganisms have developed sophisticated nano-structures to capture their food: ammonia.

The earth's nitrogen cycle is a bit unheralded, but critical to life nonetheless. Gaseous nitrogen (N2) is all around us, but inert, given its extraordinary chemical stability. It can be broken down by lightning, but little else. It must have been very early in the history of life that the nascent chemical-biological life forms tapped out the geologically available forms of nitrogen, despite being dependent on nitrogen for countless critical aspects of organic chemistry, particularly of nucleic acids, proteins, and nucleotide cofactors. The race was then on to establish a way to capture it from the abundant, if tenaciously bound, dinitrogen of the air. It was thus very early bacteria that developed a way (heavily dependent, unsurprisingly, on catalytic metals like molybdenum and iron) to fix nitrogen, meaning breaking up the triple N≡N bond, and making ammonia, NH3 (or ammonium, NH4+). From there, the geochemical cycle of nitrogen is all down-hill, with organic nitrogen being oxidized to nitric oxide (NO), nitrite (NO2-), nitrate (NO3-), and finally denitrification back to N2. Microorganisms obtain energy from all of these steps, some living exclusively on either nitrite or nitrate, oxidizing them as we oxidize carbon with oxygen to make CO2.
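The "down-hill" run from ammonia can be followed by tracking the formal oxidation state of nitrogen in each species. The sketch below uses the usual bookkeeping convention (hydrogen counted as +1, oxygen as -2, and the total must equal the net charge). The function name and the way formulas are passed in are just illustrative choices.

```python
# Conventional oxidation states assumed for the non-nitrogen atoms.
OXIDATION = {"H": +1, "O": -2}


def nitrogen_oxidation_state(n_atoms, others, charge=0):
    """Formal oxidation state of N, given counts of H and O and the net charge."""
    other_sum = sum(OXIDATION[element] * count for element, count in others.items())
    return (charge - other_sum) / n_atoms


species = [
    ("NH3",  1, {"H": 3}, 0),
    ("NH4+", 1, {"H": 4}, +1),
    ("N2",   2, {}, 0),
    ("NO",   1, {"O": 1}, 0),
    ("NO2-", 1, {"O": 2}, -1),
    ("NO3-", 1, {"O": 3}, -1),
]

for name, n_atoms, others, charge in species:
    print(name, nitrogen_oxidation_state(n_atoms, others, charge))
```

Ammonia and ammonium sit at the reduced end (-3) and nitrate at the oxidized end (+5), with nitrite at +3 in between. An ammonia oxidizer like the one discussed in this post makes its living on the step from -3 up to +3.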

Nitrosopumilus, as imaged by the authors, showing its corrugated exterior, a layer entirely composed of ammonia collecting elements (can be hexameric or pentameric). Insets show an individual hexagonal complex, in face-on and transverse views. Note also the amazing resolution of other molecules, such as the ribosomes floating about.

A recent paper looked at one of these denizens beneath our feet, an archaeal species that lives on ammonia, converting it to nitrite, NO2-. It is a dominant microbe in its field, in the oceans, in soils, and in sewage treatment plants. The irony is that after we spend prodigious amounts of fossil fuels fixing huge amounts of nitrogen for fertilizer, most of which is wasted, and which today exceeds the entire global budget of naturally fixed nitrogen, we are faced with excess and damaging amounts of nitrogen in our effluent, which is then processed in complex treatment plants by our friends the microbes down the chain of oxidized states, back to gaseous N2.

Calculated structure of the ammonia-attracting pore. At right are various close-up views including the negatively charged amino acids (D, E) concentrated at the grooves of the structure, and the pores where ammonium can transit to the cell surface. 

The Nitrosopumilus genus is so successful because it has a remarkable way to capture ammonia from the environment, a way that is roughly two hundred times more efficient than that of its bacterial competitors. Its surface is covered by a curious array of hexagons, which turn out to be ammonia capture sites. In effect, its skin is a (relatively) enormous chemical antenna for ammonia, which is naturally at low concentration in sea water. These authors do a structural study, using the new methods of particle electron microscopy, to show that these hexagons have intensely negatively charged grooves and pores, to which positively charged ammonium ions are attracted. Within this outer shell, but still outside the cell membrane, enzymes at the cell surface transform the captured ammonium to other species such as hydroxylamine, which enforces the ammonium concentration gradient towards the cell surface, and which are then pumped inside.

Cartoon model of the ammonium attraction and transit mechanisms of this cell wall. 

It is a clever nano-material and micro-energetic system for concentrating a specific chemical- a method that might inspire human applications for other chemicals that we might need- chemicals whose isolation demands excessive energy, or whose geologic abundance may not last forever.


Saturday, January 4, 2025

Drilling Into the Transcriptional Core

Machine learning helps to tease out the patterns of DNA at promoters that initiate transcription.

One of the holy grails of molecular biology is the study of transcriptional initiation. While there are many levels of regulation in cells, the initiation of transcription is perhaps, of all of them, the most powerful. An organism's ability to keep the transcription of most genes off, and turn on genes that are needed to build particular tissues, and regulate others in response to other urgent needs, is the very soul of how multicellular organisms operate. The decision to transcribe a gene into its RNA message (mRNA) represents a large investment, as that transcript can last hours or more and during that time be translated into a great many protein copies. Additionally, this process identifies where, in the otherwise featureless landscape of genomic DNA, genes are located, which is another significant process, one that it took molecular biologists a long time to figure out.

Control over transcription is generally divided into two conceptual and physical regions- enhancers and promoters. Enhancers are typically far from the start site of transcription, and are modules of DNA sequences that bind innumerable regulatory proteins which collectively tune, in fine and rough ways, initiation. Promoters, in contrast, are at the core and straddle the start site of transcription (TSS, for short). They feature a much more limited set of motifs in the DNA sequence. The promoter is the site where the proteins bound to the various enhancers converge and encourage the formation of a "preinitiation complex", which includes the RNA polymerase that actually carries out transcription, plus a lot of ancillary proteins. The RNA polymerase can not initiate on its own or find a promoter on its own. It requires direction by the regulatory proteins and their promoter targets before finding its proper landing place. So the study of promoter initiation and regulation has a very long history, as a critical part of the central flow of information in molecular biology, from DNA to protein.

A schematic of a promoter, where initiation of transcription of Gene A happens, with the start site (+1) right at the boundary of the orange and green colors. At this location, the RNA polymerase will melt the DNA strands, and start synthesizing an RNA strand using the (bottom) template strand of the DNA. Regulatory proteins bound to enhancers far away in the genomic DNA bend through space to activate proteins bound at the core promoter to load the polymerase and initiate this process.

A recent paper provided a novel analysis of promoter sequences, using machine learning to derive a relatively comprehensive account of the relevant sequences. Heretofore, many promoters had been dissected in detail and several key features found. But many human promoters had none of them, showing that our knowledge was incomplete. This new approach started strictly from empirical data- the genome sequence, plus large experimental compilations of nascent RNAs, as they are expressed in various cells, and mapped to the precise base where they initiated from- that is, their respective TSS. These were all loaded into a machine learning model that was supplemented with explanatory capabilities. That is, it was not just a black box, but gave interpretable results useful to science, in the form of small sequence signatures that it found are needed to make particular promoters work. These signatures presumably bind particular proteins that are the operational engines of regulatory integration and promoter function.

The TATA motif, found about 30 base pairs upstream of the transcription start site in many promoters. This is a motif view, where the statistical prevalence of the base is reflected in the height of the letter (top, in color) and its converse is reflected below in gray. Regular patterns like this found in DNA usually mean that some protein typically binds to this site, in this case TFIID.


For example, the grand-daddy of them all is the TATA box, which dates back to bacteria / archaea and was easily dug up by this machine learning system. The composition of the TATA box is shown above in a graphical form, where the probability of occurrence (of a base in the DNA) is reflected in the height of the base over the axis line. A few G/C bases surround a central motif of T/A, and the TSS is typically 30 base pairs downstream. What happens here is that one of the central proteins of the RNA polymerase positioning complex, TFIID, binds strongly to this sequence, and bends the DNA here by ninety degrees, forming a launchpad of sorts for the polymerase, which later finds and opens DNA at the transcription start site. TFIID and the TATA box are well known, so it certainly is reassuring that this algorithmic method recovered it. TATA boxes are common at regulated promoters, being highly receptive to regulation by enhancer protein complexes. This is in contrast to more uniformly expressed (housekeeping) genes which typically use other promoter DNA motifs, and incidentally tend to have much less precise TSS positions. They might have start sites that range over a hundred base pairs, more or less stochastically.
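For readers curious how letter heights in a motif picture like this can be computed, here is a minimal sketch of the classic sequence-logo calculation: each column gets a total height equal to its information content (2 bits minus the column's entropy), split among the bases by frequency. This is an assumption about the general technique, not a claim about how the paper's own figure was scaled, and the example sites are invented.

```python
import math
from collections import Counter


def logo_heights(sites):
    """Per-column letter heights (in bits) for equal-length aligned DNA sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    columns = []
    for pos in range(length):
        counts = Counter(site[pos].upper() for site in sites)
        total = sum(counts.values())
        freqs = {base: counts.get(base, 0) / total for base in "ACGT"}
        # Shannon entropy of the column; a perfectly conserved column has 0.
        entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
        information = 2.0 - entropy  # 2 bits is the maximum for four bases
        columns.append({base: p * information for base, p in freqs.items()})
    return columns


# Hypothetical TATA-like sites, purely for illustration.
toy_sites = ["TATAAA", "TATAAA", "TATATA", "TATAAA"]
for pos, heights in enumerate(logo_heights(toy_sites)):
    print(pos, {base: round(h, 2) for base, h in heights.items() if h > 0})
```

Fully conserved positions come out as a single letter 2 bits tall, while positions that tolerate variation shrink, which is the visual signature of a protein binding site that cares about some bases more than others.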

The main advance of this paper was to find more DNA sites, and new types of sites, which collectively account for the positioning and activation of all promoters in humans. Instead of the previously known three or four factors, they found nine major DNA sequences, and a smattering of weaker patterns, which they combine into a predictive model that matches empirical data. Most of these DNA sequences were previously known, but not as part of core promoters. For example, one is called YY1, because it binds the YY1 protein, which has long been appreciated to be a transcriptional repressor, from enhancer positions. But now it turns out to also be a core promoter participant, identifying and turning on a class of promoters that, as for most of the new-found sequence elements, tend to operate genes that are not heavily regulated, but rather universally expressed and with delocalized start sites.

Motifs and initiator elements found by the current work. Each motif, presumably matched by a protein that binds it, gets its own graph of relation of the motif location (at 0 on the X axis) vs the start site of transcription that it directs, which for TATA is about 30 base pairs downstream. Most of the newly discovered motifs are bi-directional, directing start sites and transcription both upstream and downstream. This wastes a lot of effort, as the upstream transcripts are typically quickly discarded. The NFY motif has an interesting pattern of 10.5 bp periodicity of its directed start sites, which suggests that the protein complex that binds this site hugs one side of the DNA quite closely, setting up start sites on that side of the helix.

Secondly, these authors find that most of the new sequences they identify have bidirectional effects. That is, they set up promoters to fire in both directions, both typically about forty base pairs downstream and also upstream from their binding site. This explains a great deal of transcription data derived from new sequencing technologies, which shows that many promoters fire in both directions, even though the "upstream" or non-gene side transcript tends to be short-lived.


Overview of the new results, summarized by type of DNA sequence pattern. The total machine learning prediction was composed of predictions for larger motifs, which were the dominant pattern, plus a small contribution from "initiators", which comprise a few patterns right at the start site, plus a large but diffuse contribution from tiny trinucleotide patterns, such as the CG pattern known to mark active genes and carry activating DNA methylation marks.


A third finding was the set of trinucleotide motifs that serve as the sort of fudge factor for their machine learning model, filling in details to make the match to empirical data come out better. The length was set more or less arbitrarily, but they play a big part in the model fit. They note that one common example is the CG pattern, which is one of the stronger trinucleotide motifs. This pattern is known as CpG, and is the target of chemical methylation of DNA by regulatory enzymes, which helps to mark and regulate genes. The current work suggests that there may be more systems of this kind yet to be discovered, which play a modulating role in gene/promoter selection and activation.

The accuracy of this new learning and modeling system exemplifies some of the strengths of AI, of which machine learning is a sub-discipline. When there is a lot of data available, and a problem that is well defined and on the verge of solution (like the protein folding problem), then AI, or these machine learning methods, can push the field over the edge to a solution. AI / ML are powerful ways to explore a defined solution space for optimal results. They are not "intelligent" in the normal sense of the word (at least not yet), which would imply having generalized world models that would allow them to range over large areas of knowledge, solve undefined problems, and exercise common sense.


Saturday, December 21, 2024

Inside the Process of Speciation

Adaptive radiations are messy, so no wonder we have a hard time reconstructing them.

Darwin drew a legendary diagram in his great book, of lineage trees tracing speciation from ancestors to descendants. It was just a sketch, and naturally had clear fork points where one species turns into two. But in real life, speciation is messier, with range overlaps, inter-breeding, and difficulties telling species apart. Ornithologists are still lumping and splitting species to this day, as more data come in about ranges, genetics, sub-populations, breeding behavior, etc. And if defining existing species is difficult, defining exactly where they split in the distant past is even harder.

Darwin's notebook sketch of speciation, from ancestors ... to descendants.

The advent of molecular data from genomes gave a tremendous boost to the amount of information on which to base phylogenetic inferences. It gave us a whole new domain of life, for one thing. And it has helped sharpen countless phylogenies that had not been fully specified by fossil and morphological data. But still, difficulties remain. The deepest and most momentous divergences, like the origin of life itself, and the origin of eukaryotes, remain shrouded in hazy and inconclusive trees, as do many other lineages, such as the origin of birds. It seems to be a rule that when a group of organisms undergoes rapid evolution / speciation, the tree they are on (as reconstructed by us from contemporary data) becomes correspondingly unclear and unresolved, difficult to trace through that tumultuous time. In part this is simply a matter of timing. If dramatic events happened within a few million years a billion years ago, our ability to resolve the sequence of those events is going to be weak in any case, compared to the same events spread out over a hundred million years.

A recent paper documented some of this about phylogeny in general, by correlating times of morphological change with times of phylogenetic haziness, which they term "gene-tree conflict". That is to say, if one samples genes across genomes to draw phylogenetic trees, different genes will give different trees. And this phenomenon increases right when there are other signs of rapid evolutionary change, i.e. changing morphology.

"One insight gleaned from phylogenomics is that gene-tree conflict, frequently caused by population-level processes, is often rampant during the origin of major lineages."

They identify three mechanisms behind this observation: incomplete lineage sorting (ILS), hybridization, and rapid evolution. Obviously, these need to be unpacked a bit. ILS is a natural consequence of the fact that species arise not from single organisms, but from populations. Gene mutations that differentiate the originating and future species happen all over the respective genomes, and enter the future lineage at different times. Some may happen well after the putative speciation event, and become fixed (that is, prevalent) later in that species. Others may have happened well before the speciation event, and die off in most of the descending lineages. The fact is that not every gene is going to march in lock step with the speciation event, in terms of its variants. So phylogenetic inference is best done using lots of genes plus statistical methods to arrive at the most likely explanation of the diverse individual gene trees.
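As a toy illustration of what "gene-tree conflict" means in practice, the sketch below represents small rooted trees as nested tuples, reduces each to its set of clades, and reports the fraction of gene trees whose clades disagree with a reference species tree. This is a deliberately simple stand-in, not the statistical machinery used by either paper, and the taxa and trees are invented.

```python
def clades(tree):
    """Return the set of clades (frozensets of leaf names) in a nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])  # a single tip
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)            # every internal node defines a clade
        return members

    leaves(tree)
    return found


def conflict_fraction(species_tree, gene_trees):
    """Fraction of gene trees whose clade set differs from the species tree."""
    reference = clades(species_tree)
    conflicting = sum(1 for tree in gene_trees if clades(tree) != reference)
    return conflicting / len(gene_trees)


# Hypothetical three-taxon example: the species tree groups A with B.
species_tree = (("A", "B"), "C")
gene_trees = [
    (("A", "B"), "C"),   # agrees
    (("A", "C"), "B"),   # conflicts, as incomplete lineage sorting could produce
    ("C", ("B", "A")),   # agrees (same clades, written in a different order)
    (("B", "C"), "A"),   # conflicts
]
print(conflict_fraction(species_tree, gene_trees))
```

With real genomes the same comparison is made across hundreds or thousands of genes or genomic windows, and it is the rise in this kind of disagreement that marks the turbulent stretches of a lineage's history.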

Graphs drawn from different sources relating gene conflicts in lineage estimation, (top), versus rate of morphological change from the fossil record, (bottom), in birds, and over time on the X axis. There are dramatic upticks in all metrics going back towards the end-Cretaceous extinction event.


Similarly, hybridization means that proto-species are still occasionally interbreeding with their ancestors or other relatives, (think of Neanderthals), thereby mixing up the gene trees relative to the overall speciation tree. This can even happen by gene transfer mediated by viruses. "Rapid evolution" is not defined by these authors, and comes dangerously close to using the conclusion (of high morphological change during periods of "gene-tree conflict") to describe their premise. But generally, this would mean that some genes are evolving rapidly, due to novel selective pressures, thus deviating from the general march of neutral evolution that affects most loci more evenly. This rate change can mess up phylogenetic inferences, lengthening some (gene) tree branches versus others, and making a unitary tree (that is, for the species or lineage as a whole) hard to draw.

But these are all rather abstract ideas. How does this process look on the ground? A wonderful paper on the tomato gives us some insight. This group traced the evolutionary history of a genus of tomato (Solanum sect. Lycopersicon) in the South American Andes (plus Galapagos islands just off-shore, interestingly enough). These form a tight group of about thirteen species that evolved from a single ancestor over the last two million years, before jumping onto our lunch plates via intensive breeding by native South Americans. This has been a rapid process of evolution, and phylogenies have been difficult to draw, for all the reasons given above. The tomatoes are mostly reproductively isolated, but not fully, and have various specializations for their microhabitats. So are they real species? And how can they evolve and specialize if they do not fully isolate from each other?

Gene-based phylogenetic tree of Andean tomato species. The consensus tree is in black at the right, while alternate trees (cloud) are drawn from 2,745 windows of 100 kb across the tomato genomes, clearly giving diverse views of the lineage tree. Lycopersicon are the species under study, while Lycopersicoides is an "outgroup" genus used as a control / comparison.

In the graph above, there is, as they say, rampant discord among genomic segments, versus the overall consensus tree that they arrived at:

"However, these summary support measures conceal rampant phylogenetic complexity that is evident when examining the evolutionary history of more defined genomic partitions."

For one thing, much of the sequence diversity in the ancestor survives in the descendant lineages. The founders were not single plants, by any means. Second, there has been a lot of "introgression", which is to say, breeding / hybridization between lineages after their putative separation.

Lastly, they find a high rate of novel mutations, often subject to clear positive selection. Ten enzymes in the carotenoid biosynthesis pathway, which affects fruit color in a group that has evolved red fruits, carry novel mutations. A UV light damage repair gene shows strong signs of positive selection, in high-altitude species. Others show novel mutations in a temperature stress response gene, and selection on genes defending plants against heavy metals in the soil.

Their conclusion (as that of the previous paper) is that adaptive radiations are characterized by several components that scramble normal phylogenetic analysis, including variably preserved diversity from the originating species, post-divergence gene flow (i.e. mating), and rapid adaptation to new conditions along with strong environmental selection over the pre-existing diversity. All of these mechanisms are happening at the same time, and each position in the genome is being affected at the same time, so this is a massively parallel process that, while slow in human time, can be very rapid in geologic time. They note how tomato speciation compares with some other well-known cases:

"Nonetheless, based on our crude estimates within each analysis, we infer that relatively small yet substantial fractions of the euchromatic genome are implicated in each source of genetic variation. We find little evidence that one of these processes predominates in its contribution, although our estimates suggest that de novo mutation might be relatively more influential and cross-species introgression relatively less so. This latter observation is in interesting contrast with several recent studies of animal adaptive radiations, including in Darwin’s Finches [18], Equids [14], and fish [13], where evidence suggests that hybridization and introgression might be much more pervasive and influential than previously suspected, and more abundant than we detect in Solanum."

Naturally, neither of these studies goes back in time to nail down exactly what happened during these evolutionary radiations, nor what caused them. They only give hints about causation. Why the stasis of some species, and the rapid niche-finding and filling by others? Was the motive force natural selection, or god? The latter paper gives some clear hints about possible selective pressures and rationales that were at work in the Andes and Galapagos on the genus of Solanum. But it is always frustratingly a matter of abstract reasoning, in the manner of Darwin, that paints the forces at work, however detailed the genetic and biogeographic analyses and however convincing the analogous laboratory experiments on model, usually microbial, organisms. We have to think carefully, and within the discipline of known forces and mechanisms, to arrive at intellectually honest answers to these questions, insofar as they can be answered at all.


Saturday, December 7, 2024

Cranking Up DNA, One Gyration at a Time

The mechanism of DNA gyrase, which supercoils bacterial DNA.

Imagine that you have a garden hose that is thirty miles long. How would you keep it from getting tangled? That is unlikely to be easy. Now add randomly placed heavy machinery that actively twists that hose as it travels / pulls along, causing it to wind up ahead, and unwind behind. And that machinery can be placed in either direction, often getting into head-on conflicts, not to mention going at quite different speeds. That is the problem our cells have, managing their DNA. 

They use a set of topoisomerases to manage the topology of DNA- that is, its twist-i-ness. One easy method is to nick the DNA on one of its two strands, allowing it to relax by spinning around the remaining phosphate bond, before resealing it back to a double strand and sending it on its way. But what if you encounter coils or knots that can't be resolved that way? The next level is to cut one entire DNA molecule, not just one side/strand of it, and pass the conflicting one through it. All organisms contain topoisomerases of both kinds, and they are essential.

How DNA gets twisted. While most topoisomerases relax DNA (top) to resolve the many twisty problems posed by transcription and replication, gyrase increases twist by grabbing and holding a quasi-positive twist, then cutting and resolving it, as shown at bottom.

Bacteria have an additional enzyme that we do not have, called gyrase, to crank up the supercoiling of their DNA, to make it easier to open for transcription. Gyrase works just like a type II topoisomerase that cuts a double-stranded DNA and lets another DNA through, but it does so in a special way that puts a twist on the DNA first, so instead of relaxing the DNA, it increases the stress. How exactly that works has been a bit mysterious, though gyrases and the general principles they operate under have been clear for decades. Gyrase uses ATP, and grabs onto two parts of a DNA molecule, one of which is pre-twisted into a coil, after which one is cut and the other passed through to create a change (-2) in the twisting number of that DNA.

A general model of gyrase action. The G segment of DNA is firmly held by the gyrase dimer in the center.  The same DNA is forcibly twisted about, around the pinwheel structures, and bent back around to enter through the N-gate (as the T segment). Then, the N gate closes, paving the way for the G-segment to be cut and separated (step 3). ATP is the energy source behind all this structural drama. The T-segment then passes through the cut, enters the C-gate, and the cycle is complete.

A recent paper determined the structure of active gyrase complexes, and was able to trace the pre-twisted conformation. This, combined with a lot of past work on the ATPase and cleavage functions of gyrase, allows a reasonably full picture of how this enzyme works. It is a symmetric dimer of a two-subunit protein, so there are four protein chains in all. There are three major regions of the full structure: the N-gate at top, where one segment (the T-segment) of DNA binds; the central DNA gate, where the other (G-segment) DNA binds and is later cut to let the T-segment through; and the C-gate, where the T-segment ends up and is released at the end of the cycle.

Focus on the pinwheel structure that dramatically pre-twists the DNA around between the G and T segments, pre-positioning the complex for strand passage and increased supercoiling.

The magic is that the T-segment and the G-segment of DNA are parts of the same DNA molecule, by being wrapped around the ears of the protein, which are also called pinwheels. That is what the newest structure solves in greatest detail. These pinwheels essentially allow the enzyme to yank an otherwise normal DNA strand into a pre-knotted (positive supercoil) form that, when cut and resolved as shown, results in a negative increment of supercoiling or twist. If they mutated the pinwheels away, the enzyme could still hold, cut, and relax DNA, but it could not increase its supercoiling. It is the ability of the pinwheel structures to set up a pre-twisted structure onto the DNA that makes this enzyme a machine to increase negative supercoiling, and thus ease other DNA transactions. 
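The bookkeeping behind that -2 step can be sketched in a few lines. The code below assumes a closed circular DNA that starts out relaxed, a helical repeat of about 10.5 base pairs per turn for relaxed DNA, and no other enzymes undoing the work. The function name and the example numbers are illustrative only.

```python
BP_PER_TURN = 10.5  # assumed helical repeat of relaxed B-form DNA


def supercoil_density(n_bp, gyrase_cycles):
    """Supercoiling density (sigma) of a circular DNA after some gyrase cycles.

    Assumes the DNA starts fully relaxed and that each completed gyrase
    cycle changes the linking number by -2.
    """
    relaxed_linking_number = n_bp / BP_PER_TURN
    delta_linking_number = -2 * gyrase_cycles
    return delta_linking_number / relaxed_linking_number


# Hypothetical 10,500 bp plasmid after 30 gyrase cycles.
print(supercoil_density(10_500, 30))
```

Each cycle costs ATP, which is the price the cell pays to keep its DNA under-wound and easier to open.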

Topoisomerase enzymes through evolution, from gyrase (left) to human topoII on the right. Note how the details of the protein structure are virtually unrecognizable, while the overall shape and DNA-binding stays the same.

Bacteria also have more normal type II topoisomerases that cut DNA merely to relax it, so one might wonder how these two enzymes get along. Well, gyrase is responsible for the overall negative supercoiling of the bacterial genome, while the other topoisomerases have more localized roles to relieve transient knots and over-twisting. Indeed, if you negatively twist DNA enough, you can separate its strands entirely, which is not usually desirable. Further research shows that too much of either topoisomerase is lethal, and that they are kept in balance by transcriptional controls over the amount of each topoisomerase. This suggests a futile cycle of DNA winding and unwinding, as the optimal condition in bacterial cells when both are present in just the right amounts.