Showing posts with label molecular biology. Show all posts
Showing posts with label molecular biology. Show all posts

Saturday, March 1, 2025

The Train Tracks of Synapsis

Structures that align and tether the chromosomes in meiosis are now understood in some molecular detail.

It has been one of the wonders of biology- the synaptonemal complex that aligns homologous chromosomes during meiosis. While chromosomes regularly line up in the middle of the cell during mitosis, so that they can be evenly divided between the daughter cells, in this process they only have to join at their centromeres, which get dragged to the midline of the cell, and then pulled back apart at cell division. In meiosis, on the other hand, not only do the sister chromosomes that have just replicated stick together at their centromeres, but the homologous chromosomes, which have never bothered about each other since sperm fused with egg, suddenly seek each other out and pair up in an elaborate dance of DNA breakage, alignment, cross-over, and repair. Then in the first division, these cross-over-joined homologs line up at the midline and get pulled apart as their crossovers are repaired. The second division follows, much more like mitosis, where the duplicated sister chromosomes line up at the midline based on their centromere attachments, and then separate into haploid gametes.

Comparison of mitosis vs meiosis, which goes through an extra division and alternate chrosomosome pairing and separation processes in the firsts division.

The two divisions are fundamentally different, with the first involving novel chromosome pairings and attachments. The opening act of all this, which I won't go into further, is a sprinkling of ~400 DNA strand breaks induced specifically all over the genome, which sets up a repair process at each site, where the chromosomes (using Rad51) seek out good copies of the damaged DNA- that is, another, matching, DNA molecule. There are specific processes that appear to prevent use of the recently replicated "sister", which would be the most closely identical copy that could be used. Instead, there is a bias to use the "homologous" copy from the other parent. But these homologous chromosomes have just been replicated as well. How to line all this up so that the chromosomes all line up neatly and separate neatly during the first meiotic division? The answer is the synaptonemal complex.

Schematic of the synaptonemal complex joining two homologous chromosomes. The lateral elements are on each side, and the central element lines up the center. Crossing the gap is the transverse elements, now known to be composed of the SYCP1 protein. At bottom is a diagram from its atomic structure of how SYCP1 coils together, and how its ends join to zip up the synaptonemal gap. 

This is a train track of connecting proteins between the homologous chromosomes. It is evident that the DNA breaks come first, followed by the search for matching homologs, followed by the radiating and progressive assembly of the synaptonemal complex out from the break repair sites. The components of its major structures have been mostly characterized- the lateral element where the DNA loops line up; the transverse element that spans the gap between the homologous chromosomes, and the central element, proteins at the midline that help the transverse elements assemble. A paper from 2023 characterized the transverse element protein, SYCP1, which is a long coil of a protein that dimerizes to make a strong coil, and then dimerizes again head-to-head to create the symmetric bridge over the whole width of the synaptonemal complex. Which is about 100 nanometers in width. 

These authors then focus on a series of experiments using key mutations at the dimer-dimer head-to-head interaction area, to demonstrate how this head-to-head zippering works in detail. Mutating just two amino acids in this contact region eliminates the head-to-head interaction, making synapsis impossible. In these cases, the homologous chromosomes (from mice) remain in proximity, especially at crossover sites, but are no longer zippered up and closely aligned.

Spreads of mouse meiotic chromosomes, labeled as shown with antibodies against two synaptonemal proteins. From the top, wild-type SYCP1, then single individual mutations in the end-joining region, and at bottom SYCP1 with two point mutations that eliminate its function entirely. The chromosomes at the bottom are aligned only by virtue of their crossover points, but not by a zippered up synaptonemal complex. Needless to say, mice like this are not fertile.


Thus what was once a hazy mystery in the highest power microscopes has been defined in molecular terms, highlighting once again the power of curiosity, and the essentially moral aim of truth-seeking- to reveal what is true, rather than dictate it. But who cares about all that? Truth, knowledge, science... these values are now not only in question, but under active attack. Who is making America great, and who is diminishing it? Those in our institutions of power who have a voice will hopefully see the consequences and act on them, before our history and values are entirely corrupted.


  • Sociopaths at work.
  • Evidently the model is that we become a version of China/Russia, and make a tripolar world. Not a little Orwellian. And who knows, perhaps we will offer Russia a deal to partition Canada. That is, after we get done partitioning Ukraine.
  • A black day.
  • Oh, wait, the next day was even worse.
  • Shades of Stalin, with a sad sartorial hat-tip to Steve Jobs.
  • Unlawful and vindictive destruction at the NIH, and of biological research in general.
  • And all for love.

Saturday, February 15, 2025

Cloudy, With a Chance of RNA

Long RNAs play structural and functional roles in regulation of chromosome replication and expression.

One of the wonderful properties of the fruit fly as a model system of genetics and molecular biology has been its polytene chromosomes. These are hugely expanded bundles of chromosomes, replicated thousands of times, which have been observed microscopically since the late 1800's. They exist in the larval salivary gland, where huge amounts of gene expression are needed, thus the curious evolutionary solution of expanding the number of templates, not only of the gene needed, but of the entire genome. 

These chromosomes where closely mapped and investigated, almost like runic keys to the biology of the fly, especially in the day before molecular biology. Genetic translocations, loops, and other structural variations could be directly observed. The banding patterns of light, dark, expanded, and compressed regions were mapped in excruciating detail, and mapped to genetic correlates and later to gene expression patterns. These chromosomes provided some of the first suggestions of heterochromatin- areas of the genome whose expression is shut down (repressed). They may have genes that are shut off, but they may also be structural components, such as centromeres and telomeres. These latter areas tend to have very repetitive DNA sequences, inherited from old transposons and other junk. 

A diagram of polytene chromosomes, bunched up by binding at the centromeres. The banding pattern is reproducible and represents differences in proteins bound to various areas of the genome, and gene activity.

It has become apparent that RNA plays a big role in managing these areas of our chromosomes. The classic case is the XIST RNA, which is a long (17,000 bases) non-coding RNA that forms a scaffold by binding to lots of "heterogeneous" RNA-binding proteins, and most importantly, stays bound near the site of its creation, on the X chromosome. Through a regulatory cascade that is only partly understood, the XIST RNA is turned off on one of the X chromosomes, and turned on the other one (in females), leading the XIST molecule to glue itself to its chromosome of origin, and then progressively coat the rest of that chromosome and turn it off. That is, one entire X is turned into heterochromatin by a process that requires XIST scaffolding all along its length. That results in "dosage compensation" in females, where one X is turned off in all their cells, allowing dosage (that is, the gene expression) of its expressed genes to approximate those of males, despite the presence of the extra X chromosome. Dosage is very important, as shown by Down Syndrome, which originates from a duplication of one of the smallest human chromosomes, creating imbalanced gene dosage.

A recent paper described work on "ASAR" RNAs, which similarly arise from highly repetitive areas of human chromosomes, are extremely long (180,000 bases), and control expression and chromosome replication in an allele-specific way on (at least) several non-X chromosomes. These RNAs, again, like XIST, specifically bind a bunch of heternuclear binding proteins, which is presumably central to their function. Indeed, these researchers dissected out the 7,000 base segment of ASAR6 that is densest in protein binding sites, and find that, when transplanted into a new location, this segment has dramatic effects on chromosome condensation and replication, as shown below.

The intact 7,000 base core of ASAR6 was transplanted into chromosome 5, and mitotic chromosomes were spread and stained. The blue is a general DNA stain. The green is a stain for newly synthesized DNA, and the red is a specific probe for the ASAR6 sequence. One can see on the left that this chromosome 5 is replicating more than any other chromosome, and shows delayed condensation. In contrast, the right frame shows a control experiment where an anti-sense version of the ASAR6 7,000 base core was transplanted to chromosome 5. The antisense sequence not only does not have the wild-type function, but also inhibits any molecule that does by tightly binding to it. Here, the chromosome it resides on (arrows) is splendidly condensed, and hardly replicating at all (no green color).


Why RNA? It has become clear over the last two decades that our cells, and particularly our nuclei, are swimming with RNAs. Most of the genome is transcribed in some way or other, despite a tiny proportion of it coding for anything. 95% of the RNAs that are transcribed never get out of the nucleus. There has been a growing zoo of different kinds of non-coding RNAs functioning in translational control, ribosomal maturation, enhancer function, and here, in chromosome management. While proteins tend to be compact bundles, RNAs can be (as these ASARs are) huge, especially in one dimension, and thus capable of physically scaffolding the kinds of structures that can control large regions of chromosomes.

Chromosomes are sort of cloudy regions in our cells, long a focus of observation and clearly also a focus of countless proteins and now RNAs that bind, wind, disentangle, transcribe, replicate, and congregate around them. What all these RNAs and especially the various heteronuclear proteins actually do remains pretty unclear. But they form a sort of organelle that, while it protects and manages our DNA, remarkably also allows access to it for sequence-specific binding proteins and the many processes that plow through it.

"In addition, recent studies have proposed that abundant nuclear proteins such as HNRNPU nonspecifically interact with ‘RNA debris’ that creates a dynamic nuclear mesh that regulates interphase chromatin structure."


Saturday, February 1, 2025

Proving Evolution the Hard Way

Using genomes and codon ratios to estimate selective pressures was so easy... why is it not working?

The fruits of evolution surround us with abundance, from the tallest tree to the tiniest bacterium, and the viruses of that bacterium. But the process behind it is not immediately evident. It was relatively late in the enlightenment before Darwin came up with the stroke of insight that explained it all. Yet that mechanism of natural selection remains an abstract concept requiring an analytical mind and due respect for very inhuman scales of the time and space in play. Many people remain dumbfounded, and in denial, while evolutionary biology has forged ahead, powered by new discoveries in geology and molecular biology.

A recent paper (with review) offered a fascinating perspective, both critical and productive, on the study of evolutionary biology. It deals with the opsin protein that hosts the visual pigment 11-cis-retinal, by which we see. The retinal molecule is the same across all opsins, but different opsin proteins can "tune" the light wavelength of greatest sensitivity, creating the various retinal-opsin combinations for all visual needs, across the cone cells and rod cells. This paper considered the rhodopsin version of opsin, which we use in rod cells to perceive dim light. They observed that in fish species, the sensitivity of rhodopsin has been repeatedly adjusted to accommodate light at different depths of the water column. At shallow levels, sunlight is similar to what we see, and rhodopsin is tuned to about 500 nm, while deeper down, when the light is more blue-ish, rhodopsin is tuned towards about 480 nm maximum sensitivity. There are also special super-deep fish who see by their own red-tinged bioluminescence, and their rhodopsins are tuned to 526 nm. 

This "spectrum" of sensitivities of rhodopsin has a variety of useful scientific properties. First, the evolutionary logic is clear enough, matching the fish's vision to its environment. Second, the molecular structure of these opsins is well-understood, the genes are sequenced, and the history can be reconstructed. Third, the opsin properties can be objectively measured, unlike many sequence variations which affect more qualitative, difficult-to-observe, or impossible-to-observe biological properties. The authors used all this to carefully reconstruct exactly which amino acids in these rhodopsins were the important ones that changed between major fish lineages, going back about 500 million years.

The authors' phylogenetic tree of fish and other species they analyzed rhodopsin molecules from. Note how mammals occupy the bottom small branch, indicating how deeply the rest of the tree reaches. The numbers in the nodes indicate the wavelength sensitivity of each (current or imputed) rhodopsin. Many branches carry the author's inference, from a reconstructed and measured protein molecule, of what precise changes happened, via positive selection, to get that lineage.

An alternative approach to evolutionary inference is a second target of these authors. That is a codon-based method, that evaluates the rate of change of DNA sites under selection versus sites not under selection. In protein coding genes (such as rhodopsin), every amino acid is encoded by a triplet of DNA nucleotides, per the genetic code. With 64 codons for ~20 amino acids, it is a redundant code where many DNA changes do not change the protein sequence. These changes are called "synonymous". If one studies the rate of change of synonymous sites in the DNA, (which form sort of a control in the experiment), compared with the rate of change of non-synonymous sites, one can get a sense of evolution at work. Changing the protein sequence is something that is "seen" by natural selection, and especially at important positions in the protein, some of which are "conserved" over billions of years. Such sites are subject to "negative" selection, which to say rapid elimination due to the deleterious effect of that DNA and protein change.

Mutations in protein coding sequence can be synonymous, (bottom), with no effect, or non-synonymous (middle two cases), changing the resulting protein sequence and having some effect that may be biologically significant, thus visible to natural selection.


This analysis has been developed into a high art, also being harnessed to reveal "positive" selection. In this scenario, if the rate of change of the non-synonymous DNA sites is higher than that of the synonymous sites, or even just higher than one would expect by random chance, one can conclude that these non-synonymous sites were not just not being selected against, but were being selected for, an instance of evolution establishing change for the sake of improvement, instead of avoiding change, as usual.

Now back to the rhodopsin study. These authors found that a very small number of amino acids in this protein, only 15, were the ones that influenced changes to the spectral sensitivity of these protein complexes over evolutionary time. Typically only two or three changes occurred over a shift in sensitivity in a particular lineage, and would have been the ones subject to natural selection, with all the other changes seen in the sequence being unrelated, either neutral or selected for other purposes. It is a tour de force of structural analysis, biochemical measurement, and historical reconstruction to come up with this fully explanatory model of the history of piscene rhodopsins. 

But then they went on to compare what they found with what the codon-based methods had said about the matter. And they found that there was no overlap whatsover. The amino acids identified by the "positive selection" codon based methods were completely different than the ones they had found by spectral analysis and phylogenetic reconstruction over the history of fish rhodopsins. The accompanying review is particularly harsh about the pseudoscientific nature of this codon analysis, rubbishing the entire field. There have been other, less drastic, critiques as well.

But there is method to all this madness. The codon based methods were originally conceived in the analysis of closely related lineages. Specifically, various Drosophia (fly) species that might have diverged over a few million years. On this time scale, positive selection has two effects. One is that a desirable amino acid (or other) variation is selected for, and thus swept to fixation in the population. The other, and corresponding effect, is that all the other variations surrounding this desirable variation (that is, which are nearby on the same chromosome) are likewise swept to fixation (as part of what is called a haplotype). That dramatically reduces the neutral variation in this region of the genome. Indeed, the effect on neutral alleles (over millions of nearby base pairs) is going to vastly overwhelm the effect from the newly established single variant that was the object of positive selection, and this imbalance will be stronger the stronger the positive selection. In the limit case, the entire genomes of those without the new positive trait/allele will be eliminated, leaving no variation at all.

Yet, on the longer time scale, over hundreds of millions of years, as was the scope of visual variation in fish, all these effects on the neutral variation level wash out, as mutation and variation processes resume, after the positively selected allele is fixed in the population. So my view of this tempest in an evolutionary teapot is that these recent authors (and whatever other authors were deploying codon analysis against this rhodopsin problem) are barking up the wrong tree, mistaking the proper scope of these analyses. Which, after all, focus on the ratio between synonymous and non-synonymous change in the genome, and thus intrinsically on recent change, not deep change in genomes.


  • That all-American mix of religion, grift, and greed.
  • Christians are now in charge.
  • Mechanisms of control by the IMF and the old economic order.
  • A new pain med, thanks to people who know what they are doing.

Saturday, January 18, 2025

Eeking Out a Living on Ammonia

Some archaeal microorganisms have developed sophisticated nano-structures to capture their food: ammonia.

The earth's nitrogen cycle is a bit unheralded, but critical to life nonetheless. Gaseous nitrogen (N2) is all around us, but inert, given its extraordinary chemical stability. It can be broken down by lightning, but little else. It must have been very early in the history of life that the nascent chemical-biological life forms tapped out the geologically available forms of nitrogen, despite being dependent on nitrogen for countless critical aspects of organic chemistry, particularly of nucleic acids, proteins, and nucleotide cofactors. The race was then on to establish a way to capture it from the abundant, if tenaciously bound, dinitrogen of the air. It was thus very early bacteria that developed a way (heavily dependent, unsurprisingly, on catalytic metals like molybdenum and iron) to fix nitrogen, meaning breaking up the triple N≡N bond, and making ammonia, NH3 (or ammonium, NH4+). From there, the geochemical cycle of nitrogen is all down-hill, with organic nitrogen being oxidized to nitric oxide (NO), nitrite (NO2-), nitrate (NO3), and finally denitrification back to N2. Microorganisms obtain energy from all of these steps, some living exclusively on either nitrite or nitrate, oxidizing them as we oxidize carbon with oxygen to make CO2. 

Nitrosopumilus, as imaged by the authors, showing its corrugated exterior, a layer entirely composed of ammonia collecting elements (can be hexameric or pentameric). Insets show an individual hexagonal complex, in face-on and transverse views. Note also the amazing resolution of other molecules, such as the ribosomes floating about.

A recent paper looked at one of these denizens beneath our feet, an archaeal species that lives on ammonia, converting it to nitrite, NO2. It is a dominant microbe in its field, in the oceans, in soils, and in sewage treatment plants. The irony is that after we spend prodigious amounts of fossil fuels fixing huge amounts of nitrogen for fertilizer, most of which is wasted, and which today exceeds the entire global budget of naturally fixed nitrogen, we are faced with excess and damaging amounts of nitrogen in our effluent, which is then processed in complex treatment plants by our friends the microbes down the chain of oxidized states, back to gaseous N2.

Calculated structure of the ammonia-attracting pore. At right are various close-up views including the negatively charged amino acids (D, E) concentrated at the grooves of the structure, and the pores where ammonium can transit to the cell surface. 

The Nitrosopumilus genus is so successful because it has a remarkable way to capture ammonia from the environment, a way that is roughly two hundred times more efficient than that of its bacterial competitors. Its surface is covered by a curious array of hexagons, which turn out to be ammonia capture sites. In effect, its skin is an (relatively) enormous chemical antenna for ammonia, which is naturally at low concentration in sea water. These authors do a structural study, using the new methods of particle electron microscopy, to show that these hexagons have intensely negatively charged grooves and pores, to which positively charged ammonium ions are attracted. Within this outer shell, but still outside the cell membrane, enzymes at the cell surface transform the captured ammonium to other species such as hydroxylamine, which enforces the ammonium concentration gradient towards the cell surface, and which are then pumped inside.

Cartoon model of the ammonium attraction and transit mechanisms of this cell wall. 

It is a clever nano-material and micro-energetic system for concentrating a specific chemical- a method that might inspire human applications for other chemicals that we might need- chemicals whose isolation demands excessive energy, or whose geologic abundance may not last forever.


Saturday, January 4, 2025

Drilling Into the Transcriptional Core

Machine learning helps to tease out the patterns of DNA at promoters that initiate transcription.

One of the holy grails of molecular biology is the study of transcriptional initiation. While there are many levels of regulation in cells, the initiation of transcription is perhaps, of all of them, the most powerful. An organism's ability to keep the transcription of most genes off, and turn on genes that are needed to build particular tissues, and regulate others in response to other urgent needs, is the very soul of how multicellular organisms operate. The decision to transcribe a gene into its RNA message (mRNA) represents a large investment, as that transcript can last hours or more and during that time be translated into a great many protein copies. Additionally, this process identifies where, in the otherwise featureless landscape of genomic DNA, genes are located, which is another significant process, one that it took molecular biologists a long time to figure out.

Control over transcription is generally divided into two conceptual and physical regions- enhancers and promoters. Enhancers are typically far from the start site of transcription, and are modules of DNA sequences that bind innumerable regulatory proteins which collectively tune, in fine and rough ways, initiation. Promoters, in contrast, are at the core and straddle the start site of transcription (TSS, for short). They feature a much more limited set of motifs in the DNA sequence. The promoter is the site where the proteins bound to the various enhancers converge and encourage the formation of a "preinitiation complex", which includes the RNA polymerase that actually carries out transcription, plus a lot of ancillary proteins. The RNA polymerase can not initiate on its own or find a promoter on its own. It requires direction by the regulatory proteins and their promoter targets before finding its proper landing place. So the study of promoter initiation and regulation has a very long history, as a critical part of the central flow of information in molecular biology, from DNA to protein.

A schematic of a promoter, where initiation of transcription of Gene A, happens, with the start site (+1) right at the boundary of the orange and green colors. At this location, the RNA polymerase will melt the DNA strands, and start synthesizing an RNA strand using the (bottom) template strand of the DNA. Regulatory proteins bound to enhancers far away in the genomic DNA bend through space to activate proteins bound at the core promoter to load the polymerase and initiate this process.

A recent paper provided a novel analysis of promoter sequences, using machine learning to derive a relatively comprehensive account of the relevant sequences. Heretofore, many promoters had been dissected in detail and several key features found. But many human promoters had none of them, showing that our knowledge was incomplete. This new approach started strictly from empirical data- the genome sequence, plus large experimental compilations of nascent RNAs, as they are expressed in various cells, and mapped to the precise base where they initiated from- that is, their respective TSS. These were all loaded into a machine learning model that was supplemented with explanatory capabilities. That is, it was not just a black box, but gave interpretable results useful to science, in the form of small sequence signatures that it found are needed to make particular promoters work. These signatures presumably bind particular proteins that are the operational engines of regulatory integration and promoter function.

The TATA motif, found about 30 base pairs upstream of the transcription start site in many promoters. This is a motif view, where the statistical prevalence of the base is reflected in the height of the letter (top, in color) and its converse is reflected below in gray. Regular patterns like this found in DNA usually mean that some protein typically binds to this site, in this case TFIID.


For example, the grand-daddy of them all is the TATA box, which dates back to bacteria / archaea and was easily dug up by this machine learning system. The composition of the TATA box is shown above in a graphical form, where the probability of occurrence (of a base in the DNA) is reflected in height of the base over the axis line. A few G/C bases surround a central motif of T/A, and the TSS is typically 30 base pairs downstream. What happens here is that one of the central proteins of the RNA polymerase positioning complex, TFIID, binds strongly to this sequence, and bends the DNA here by ninety degrees, forming a launchpad of sorts for the polymerase, which later finds and opens DNA at the transcription start site. TFIID and the TATA box are well known, so it certainly is reassuring that this algorithmic method recovered it. TATA boxes are common at regulated promoters, being highly receptive to regulation by enhancer protein complexes. This is in contrast to more uniformly expressed (housekeeping) genes which typically use other promoter DNA motifs, and incidentally tend to have much less precise TSS positions. They might have start sites that range over a hundred base pairs, more or less stochastically.

The main advance of this paper was to find more DNA sites, and new types of sites, which collectively account for the positioning and activation of all promoters in humans. Instead of the previously known three or four factors, they found nine major DNA sequences, and a smattering of weaker patterns, which they combine into a predictive model that matches empirical data. Most of these DNA sequences were previously known, but not as part of core promoters. For example, one is called YY1, because it binds the YY1 protein, which has long been appreciated to be a transcriptional repressor, from enhancer positions. But now it turns out to also be core promoter participant, identifying and turning on a class of promoters that, as for most of the new-found sequence elements, tend to operate genes that are not heavily regulated, but rather universally expressed and with delocalized start sites. 

Motifs and initiator elements found by the current work. Each motif, presumably matched by a protein that binds it, gets its own graph of relation of the motif location (at 0 on the X axis) vs the start site of transcription that it directs, which for TATA is about 30 base pairs downstream. Most of the newly discovered motifs are bi-directional, directing start sites and transcription both upstream and downstream. This wastes a lot of effort, as the upstream transcripts are typically quickly discarded. The NFY motif has an interesting pattern of 10.5 bp periodicity of its directed start sites, which suggests that the protein complex that binds this site hugs one side of the DNA quite closely, setting up start sites on that side of the helix.

Secondly, these authors find that most of the new sequences they identify have bidirectional effects. That is, they set up promoters to fire in both directions, both typically about forty base pairs downstream and also upstream from their binding site. This explains a great deal of transcription data derived from new sequencing technologies, which shows that many promoters fire in both directions, even though the "upstream" or non-gene side transcript tends to be short-lived.


Overview of the new results, summarized by type of DNA sequence pattern. The total machine learning prediction was composed of predictions for larger motifs, which were the dominant pattern, plus a small contribution from "initiators", which comprise a few patterns right at the start site, plus a large but diffuse contribution from tiny trinucleotide patterns, such as the CG pattern known to mark active genes and carry activating DNA methylation marks.


A third finding was the set of trinucleotide motifs that serve as the sort of fudge factor for their machine learning model, filling in details to make the match to empirical data come out better. The length was set more or less arbitrarily, but they play a big part in the model fit. They note that one common example is the CG pattern, which is one of the stronger trinucleotide motifs. This pattern is known as CpG, and is the target of chemical methylation of DNA by regulatory enzymes, which helps to mark and regulate genes. The current work suggests that there may be more systems of this kind yet to be discovered, which play a modulating role in gene/promoter selection and activation.

The accuracy of this new learning and modeling system exemplifies some of the strengths of AI, of which machine learning is a sub-discipline. When there is a lot of data available, and a problem that is well defined and on the verge of solution (like the protein folding problem), then AI, or these machine learning methods, can push the field over the edge to a solution. AI / ML are powerful ways to explore a defined solution space for optimal results. They are not "intelligent" in the normal sense of the word, (at least not yet), which would imply having generalized world models that would allow them to range over large areas of knowledge, solve undefined problems, and exercise common sense.


Saturday, December 7, 2024

Cranking Up DNA, One Gyration at a Time

The mechanism of DNA gyrase, which supercoils bacterial DNA.

Imagine that you have a garden hose that is thirty miles long. How would you keep it from getting tangled? That is unlikely to be easy. Now add randomly placed heavy machinery that actively twists that hose as it travels / pulls along, causing it to wind up ahead, and unwind behind. And that machinery can be placed in either direction, often getting into head-on conflicts, not to mention going at quite different speeds. That is the problem our cells have, managing their DNA. 

They use a set of topoisomerases to manage the topology of DNA- that is, its twist-i-ness. One easy method is to nick the DNA on one of its two strands, allowing it to relax by spinning around the remaining phosphate bond, before resealing it back to a double strand and sending it on its way. But what if you encounter coils or knots that can't be resolved that way? The next level is to cut one entire DNA molecule, not just one side/strand of it, and pass the conflicting one though it. All organisms contain topoisomerases of both kinds, and they are essential.

How DNA gets twisted. While most topoisomerases relax DNA (top) to resolve the many twisty problems posed by transcription and replication, gyrase increases twist by grabbing and holding a quasi-positive twist, then cutting and resolving it, as shown at bottom.

Bacteria have an additional enzyme that we do not have, called gyrase, to crank up the supercoiling of their DNA, to make it easier to open for transcription. Gyrase works just like a type II topoisomerase that cuts a double-stranded DNA and lets another DNA through, but it does so in a special way that puts a twist on the DNA first, so instead of relaxing the DNA, it increases the stress. How exactly that works has been a bit mysterious, though gyrases and the general principles they operate under have been clear for decades. Gyrase uses ATP, and grabs onto two parts of a DNA molecule, one of which is pre-twisted into coil, after which one is cut and the other passed through to create a change (-2) in the twisting number of that DNA.

A general model of gyrase action. The G segment of DNA is firmly held by the gyrase dimer in the center.  The same DNA is forcibly twisted about, around the pinwheel structures, and bent back around to enter through the N-gate (as the T segment). Then, the N gate closes, paving the way for the G-segment to be cut and separated (step 3). ATP is the energy source behind all this structural drama. The T-segment then passes through the cut, enters the C-gate, and the cycle is complete.

A recent paper determined the structure of active gyrase complexes, and was able to trace the pre-twisted conformation. This, combined with a lot of past work on the ATPase and cleavage functions of gyrase, allows a reasonably full picture of how this enzyme works. It is a symetric dimer of a two-subunit protein, so there are four protein chains in all. There are three major regions of the full structure. The N-gate at top where one segment (the T-segment) of DNA binds, then the central DNA gate, where the other (G-segment) DNA binds and is later cut to let the T-segment through, and the C-gate, where the T segment ends up and is released at the end of the cycle. 

Focus on the pinwheel structure that dramatically pre-twists the DNA around between the G and T segments, pre-positioning the complex for strand passage and increased supercoiling.

The magic is that the T-segment and the G-segment of DNA are parts of the same DNA molecule, by being wrapped around the ears of the protein, which are also called pinwheels. That is what the newest structure solves in greatest detail. These pinwheels essentially allow the enzyme to yank an otherwise normal DNA strand into a pre-knotted (positive supercoil) form that, when cut and resolved as shown, results in a negative increment of supercoiling or twist. If they mutated the pinwheels away, the enzyme could still hold, cut, and relax DNA, but it could not increase its supercoiling. It is the ability of the pinwheel structures to set up a pre-twisted structure onto the DNA that makes this enzyme a machine to increase negative supercoiling, and thus ease other DNA transactions. 

Topoisomerase enzymes through evolution, from gyrase (left) to human topoII on the right. Note how the details of the protein structure are virtually unrecognizable, while the overall shape and DNA-binding stays the same.

Bacteria also have more normal type II topoisomerases that cut DNA merely to relax it, so one might wonder how these two enzymes get along. Well, gyrase is responsible for the overall negative supercoiling of the bacterial genome, while the other topoisomerases have more localized roles to relieve transient knots and over-twisting. Indeed, if you negatively twist DNA enough, you can separate its strands entirely, which is not usually desirable. Further research shows that too much of either topoisomerase is lethal, and that they are kept in balance by transcriptional controls over the amount of each topoisomerase. This suggests a futile cycle of DNA winding and unwinding, as the optimal condition in bacterial cells when both are present in just the right amounts. 


Saturday, November 9, 2024

Rings of Death

We make pore-forming proteins that poke holes in cells and kill them. Why?

Gasdermin proteins are parts of the immune system, and exist in bacteria as well. It was only in 2016 that their mechanism of action was discovered, as forming unusual pores. The function of these pores was originally assumed to be offensive, killing enemy cells. But it quickly became apparent that they more often kill the cells that make them, as the culmination of a process called pyroptosis, a form of (inflammatory) cell suicide. Further work has only deepened the complexity of this system, showing that gasdermin pores are more dynamic and tunable in their action than originally suspected.

The structure is quite striking. The protein starts as an auto-inhibited storage form, sitting around in the cell. When the cell comes under attack, a cascade of detection and signaling occurs that winds up expressing a family of proteases called caspases. Some of these caspases can cut the gasdermin proteins, removing their inhibitory domain and freeing them to assemble into multimers. About 26 to 32 of these activated proteins can form a ring on top of a membrane (let's say the plasma membrane), which then cooperatively jut down their tails into the membrane and make a massive hole in it.

Overall structure of assembled gasdermin protein pores.


Simulations of pore assembly, showing how the trapped membrane lipids would pop out of the center, once pore assembly is complete.


These holes, or pores, are big enough to allow small proteins through, and certainly all sorts of chemicals. So one can understand that researchers thought that these were lethal events. And gasdermins are known to directly attack bacterial cells, being responsible in part for defense against Shigella bacteria, among others. But then it was found that gasdermins are the main way that important cytokines like the highly pro-inflammatory IL-1β get out of the cell. This was certainly an unusual mode of secretion, and the gasdermin D pore seems specifically tailored, in terms of shape and charge, to conduct the mature form of IL-1β out of the cell. 

It also turned out that gasdermins don't always kill their host cells. Indeed, they are far more widely used for temporary secretion purposes than for cell killing. And this secretion can apparently be regulated, though the details of that remain unclear. In structural terms, gasdermins can apparently form partial and mini-pores that are far less lethal to their hosts, allowing, by way of their own expression levels, a sensitive titration of the level of response to whatever danger the cell is facing.

Schematic of how lower concentrations of gasdermin D (lower path, blue) allow smaller pores to form with less lethality.

Equally interesting, the bacterial forms of gasdermin have just begun to be studied. While they may have other functions, they certainly can kill their host cell in a suicide event, and researchers have shown that they can shut down phage infection of a colony or lawn of bacterial cells. That is, if a phage-infected cell can signal and activate its gasdermin proteins fast enough, it can commit suicide before the phage has time to fully replicate, beating the phage at its own race of infection and propagation. 

Bacteria committing suicide for the good of the colony or larger group? That introduces the theme of group selection, since committing suicide certainly doesn't do the individual bacterium any good. It is only in a family group, clonal colony, or similar community that suicide for the sake of the (genetically related) group makes sense. We, as multicellular organisms, are way past that point. Our cells are fully devoted to the good of the organism, not themselves. But to see this kind of heroism among bacteria is, frankly, remarkable.

Bacteria have even turned around to attack the attacker. The Shigella bacteria mentioned above, which are directly killed by gasdermins, have evolved an enzymatic activity that tags gasdermin with ubiquitin, sending it to the cellular garbage disposal and saving themselves from destruction. It is an interesting validation of the importance of gasdermins and the arms race that is afoot, within our bodies.


  • A tortured ballot.
  • Great again? Corruption and degradation is our lot.
  • We may be in a (lesser) Jacksonian age. Populism, bad taste, big hair, and mass deportation.
  • Beautiful Jupiter.
  • Bill Mitchell on our Depression job guarantee: "So for every $1 outlaid the total societal benefits were around $6 over the lifetime of the participant."
  • US sanctions are scrambling our alliances and the financial system.
  • Solar works for everyone.


Saturday, October 26, 2024

A Hunt for Causes of Atherosclerosis

Using the most advanced tools of molecular biology to sift through the sands of the genome for a little gold.

Blood vessels have a hard life. Every time you put on shoes, the vessels in your feet get smashed and smooshed, for hours on end. And do they complain? Generally, not much. They bounce back and make do with the room you give them. All through the body, vessels are subject to the pumping of the heart, and variations in blood volume brought on by our salt balance. They have to move when we do, and deal with it whenever we sit or lie on them. Curiously, it is the veins in our legs and calves, that are least likely to be crushed in daily life, that accumulate valve problems and go varicose. Atherosclerosis is another, much more serious problem in larger vessels, also brought on by age and injury, where injury and inflammation of the lining endothelial cells can lead to thickening, lipid/cholesterol accumulation, necrosis, calcification, and then flow restriction and fragmentation risk. 

Cross-section of a sclerotic blood vessel. LP stands for lipid pool, while the box shows necrotic and calcified bits of tissue.

The best-known risk factors for atherosclerosis are lipid-related, such as lack of liver re-capture of blood lipids, or lack of uptake around the body, keeping cholesterol and other lipid levels high in the blood. But genetic studies have found hundreds of areas of the genome with risk-conferring (or risk-reducing) variants, most of which are not related to lipid management. These genome-wide association studies (or GWAS) look for correlations between genetic markers and disease in large populations. So they pick up a lot of low-impact genetic variations that are difficult to study, due to their large number and low impact, which can often imply peripheral / indirect function. High-impact variations (mutations) tend to not survive in the population very long, but when found tend to be far more directly involved and informative.

A recent paper harnessed a variety of modern tools and methods to extract more from the poor information provided by GWAS. They come up with a fascinating tradeoff / link between atherosclerosis and cerebral cavernous malformation (CCM), which is distinct blood vessel syndrome that can also lead to rupture and death. The authors set up a program of analysis that was prodigious, and only possible with the latest tools. 

The first step was to select a cell line that could model the endothelial cells at issue. Then they loaded these cells with custom expression-reducing RNA regulators against each one of the ~1600 genes found in the neighborhood of the mutations uncovered by the GWAS analyses above, plus 600 control genes. Then they sequenced all the RNA messages from these single cells, each of which had received one of these "knock-down" RNA regulators. This involved a couple hundred thousand cells and billions of sequencing reads- no simple task! The point was to gather comprehensive data on what other genes were being affected by the genetic lesion found in the GWAS population, and then to (algorithmically) assemble them into coherent functional groups and pathways which could both identify which genes were actually being affected by the original mutations, and also connect them to the problems resulting in atherosclerosis.

Not to be outdone, they went on to harness the AlphaFold program to hunt for interactions among the proteins participating in some of the pathways they resolved through this vast pipeline, to confirm that the connections they found make sense.

They came up with about fifty different regulated molecular programs (or pathways), of which thirteen were endothelial cell specific. Things like angiogenesis, wound healing, flow response, cell migration, and osmoregulation came up, and are naturally of great relevance. Five of these latter programs were particularly strongly connected to coronary artery disease risk, and mostly concerned endothelial-specific programs of cell adhesion. Which makes sense, as the lack of strong adhesion contributes to injury and invasion by macrophages and other detritus from the blood, and adhesion among the endothelial cells plays a central role in their ability / desire to recover from injury, adjust to outside circumstances, reshape the vessel they are in, etc.

Genes near GWAS variations and found as regulators of other endothelial-related genes are mapped into a known pathway (a) of molecular signaling. The color code of changed expression refers to the effect that the marked gene had on other genes within the five most heavily disease-linked programs/pathways. The numbers refer to those programs, (8=angiogenesis and osmoregulation, 48=cell adhesion, 35=focal adhesion, related to cell adhesion, 39=basement membrane, related to cell polarity and adhesion, 47=angiogenesis, or growth of blood vessels). At bottom (c) is a layout of 41 regulated genes within the five disease-related programs, and how they are regulated by knockdown of the indicated genes on the X axis. Lastly, in d, some of these target genes have known effects on atherosclerosis or vascular barrier syndromes when mutated. And this appears to generally correlate with the regulatory effects of the highlighted pathway genes.

"Two regulators of this (CCM) pathway, CCM2 and TLNRD1, are each linked to a CAD (coronary artery disease) risk variant, regulate other CAD risk genes and affect atheroprotective processes in endothelial cells. ... Specifically, we show that knockdown of TLNRD1 or CCM2 mimics the effects of atheroprotective laminar blood flow, and that the poorly characterized gene TLNRD1 is a newly identified regulator in the CCM pathway."

On the other hand, excessive adhesiveness and angiogenesis can be a problem as well, as revealed by the reverse correlation they found with CCM syndrome. The interesting thing was that the gene CCM2 came up as one of strongest regulators of the five core programs associated with atherosclerosis risk mutations. As can be guessed from its name, it can harbor mutations that lead to CCM. CCM is a relatively rare syndrome (at least compared with coronary artery disease) of localized patches of malformed vessels in the brain, which are prone to rupture, which can be lethal. CCM2 is part of a protein complex, with KRIT1 and PDCD10, and part of a known pathway from fluid flow sensing receptors to transcription regulators (TFs) that turn on genes relevant to the endothelial cells. As shown in the diagram above, this pathway is full of genes that came up in this pathway analysis, from the atherosclerosis GWAS mutations. Note that there is a repression effect in the diagram above (a) between the CCM complex and the MAP kinase cascade that sends signals downstream, accounting for the color reversal at this stage of the diagram.

Not only did they find that this known set of three CCM gene are implicated in the atherosclerosis mutation results, but one of the genes they dug up through their pipeline, TLNRD1, turned out to be a fourth, hitherto unknown, member of the CCM complex, shown via the AlphaFold program to dock very neatly with the others. It is loss of function mutations of genes encoding this complex, which inhibits the expression of endothelial cell pro-cell adhesion and pro-angiogenesis sets of genes, that cause CCM, unleashing these angiogenesis genes to do too much. 

The logic of this pathway overall is that proper fluid flow at the cell surface, as expected in well-formed blood vessels, activates the pathway to the CCM complex, which then represses programs of new or corrective angiogenesis and cell adhesion- the tissue is OK as it is. Conversely, when turbulent flow is sensed, the CCM complex is turned down, and its target genes are turned up, activating repair, revision, and angiogenesis pathways that can presumably adjust the vessel shape to reduce turbulence, or simply strengthen it.

Under this model, malformations may occur during brain development when/where turbulent flow occurs, reducing CCM activation, which is abetted by mutations that help the CCM complex to fall apart, resulting (rarely) in run-away angiogenesis. The common variants dealt with in this paper, that decrease risk of cardiovascular disease / atherosclerosis, appear to have similar, but much weaker effects, promoting angiogenesis, including recovery from injury and adhesion between endothelial cells. In this way, they keep the endothelium tighter and more resistant to injury, invasion by macrophages, and all the downstream sequelae that result in atherosclerosis. Thus strong reduction of CCM gene function is dangerous in CCM syndrome, but more modest reductions are protective in atherosclerosis, setting up a sensitive evolutionary tradeoff that we are clearly still on the knife's edge of. I won't get into the nature of the causal mutations themselves, but they are likely to be diffuse and regulatory in the latter case.

Image of the CCM complex, which regulates response to blood flow, and whose mutations are relevant both to CCM and to atherosclerosis. The structures of TLNRD1 and the docking complex are provided by AlphaFold. 


This method is particularly powerful by being unbiased in its downstream gene and pattern finding, because it samples every expressed gene in the cell and automatically creates related pathways from this expression data, given the perturbations (knockdown of expression) of single target genes. It does not depend on using existing curated pathways and literature that would make it difficult to find new components of pathways. (Though in this case the "programs" it found align pretty closely with known pathways.) On the other hand, while these authors claim that this method is widely applicable, it is extremely arduous and costly, as evidenced by the contribution of 27 authors at top-flight institutions, an unusually large number in this field. So, for diseases and GWAS data sets that are highly significant, with plenty of funding, this may be a viable method of deeper analysis. Otherwise, it is beyond the means of a regular lab.

  • A backgrounder on sedition, treason, and insurrection.
  • And why it matters.
  • Jan 6 was an attempted putsch.
  • Trumpies for Putin.
  • Solar is a no-brainer.
  • NDAs are blatantly illegal and immoral. One would think we would value truth over lies.

Saturday, October 12, 2024

Pumping DNA

Arnold has nothing on the DNA pumps that load phages.

DNA is a very unwieldy molecule. Elegant in concept, but as organisms accumulated more features and genes, it got extremely long and twisty. So a series of management proteins arose, such as helicases and gyrases to relieve the torsional tension, and topoisomerases to cut and pass strands through each other to resolve knots. Another class is DNA pumps, which can forcefully travel along DNA to thread it into useful spaces, like the head of a phage, or a domain in our nucleus, to facilitate transcriptional isolation or organized recombination and synapsis. While other motors, acting on actin and microtubules, manage DNA segregation during mitosis, cell division, and cell movement, true DNA motors deal directly with DNA.

An iconic electron micrograph of a phage with its head blown open. The previously enclosed DNA is splayed about, suggesting the capsid's great capacity for DNA, and great pressure it was under. Inset shows an intact phage. Note the landing tentacles, which attach to the target bacterium.

There are several types of DNA pump, the lower-powered of which I have reviewed previously. The champions in terms of force, however, are the pumps that fill phage heads. Phages are viruses that infect bacteria, and they operate under a variety of limitations. Size is one- they have to be small and have small genomes, due to the small size of their targets, the brevity of their life cycle, and the mathematics of scattered propagation. Bacterial cells are under turgor pressure, of about three atmospheres, and have strong cell walls to hold everything in. So their infecting phages have several barriers to overcome. One solution is to be under even higher pressure themselves, up to about sixty atmospheres. That way, once the injection system has cut through the cell wall and inner membrane, the phage genome, which is pretty much the only thing in the phage head (or capsid), can shoot out rapidly and take over the cell. 

Schematic of late phage development, where the motor (blue) docks to the phage head and fills it with DNA, after which the tail assembly is attached.

How does the DNA loading pump work? It is closely docked into the phage head structure, has a pentagonal structure attached to the phage head, and a loosely attached, 12-sided inner rosette that they describe as a sort of bearing or ball-race. The outer pentagon has an ATPase at each vertex, and these fire sequentially during the pumping mechanism. Each ATP advances the DNA by about two base pairs. Presumably the head has a structure that guides the DNA into regular loops around its inside walls. 

Structure of the dodecameric portion of the phage DNA pump, without the ATPase pentameric portion. Obviously, the DNA threads through the center.

In the diagram below (reference), three steps are shown. First, (a, top), the "I" ATPase node (red) is linked to the "J" and "A" rosette nodes. "A" is where the rosette hooks into the DNA (red). Next, the rosette is expanded a bit, bringing "A" out of register from "I" and "C" into register with "II". At the same time, "C" links to the DNA two base pairs down from where "A" latched into it. In the third step, the rosette squashes again, the DNA ends up raised by two base pairs, and the process can start all over. It is a bit of a sleeve/ratchet mechanism. They do not speculate at this point which of these steps is the power stroke- were the ATP is hydrolyzed. Getting only two base pairs into the head per ATP doesn't seem very efficient, but it is evidently at the end of packaging, when the pressure rises to extreme levels, where this pump shines. And it can get a 19,000 bp genome into a phage head in three minutes, (~100 bp per second), so it isn't a slouch when it comes to speed, either. 

Model of how this pump works. See text above for details.


Not only is this pump an amazing and powerful bit of biotechnology, able to compress DNA to sixty atmospheres, but it is a fourth fundamental type of motor, in addition to the rotary motors as found in flagella, the linear motors found along actin and microtubules, and the DNA threading/looping motors of condensin/cohesin.


  • The 2024 Nobel prizes show the close nexus between computers and molecular biology. The original finding of miRNA complementarity could not have been made without a computerized sequence search.
  • When truth is a gaffe, and lies are routine.
  • Could crypto be any worse or more corrupting?

Saturday, September 28, 2024

Dangerous Memories

Some memory formation involves extracellular structures, DNA damage, and immune component activation / inflammation.

The physical nature of memories in the brain is under intensive scrutiny. The leading general theory is that of positive reinforcement, where neurons that are co-activated strengthen their connections, enhancing their ability to co-fire and thus to express the same pattern again in the future. The nature of these connections has been somewhat nebulous, assumed to just be the size and stability of their synaptic touch-points. But it turns out that there is a great deal more going on.

A recent paper started with a fishing expedition, looking at changes in gene expression in neurons at various time points after the mice were subjected to a fear learning regimen. They took this out to much longer time points (up to a month) than had been contemplated previously. At short times, a bunch of well-known signals and growth-oriented gene expression happened. At the longest time points, organization of a structure called the perineural net (PNN) was read out of the gene expression signals. This is a extracellular matrix sheath that appears to stabilize neuronal connections and play a role in long-term memory and learning. 

But the real shocker came at the intermediate time point of about four days. Here, there was overexpression of TLR9, which is an immune system detector of broken / bacterial DNA, and inducer in turn of inflammatory responses. This led the authors down a long rabbit hole of investigating what kind of DNA fragmentation is activating this signal, how common this is, how influential it is for learning, and what the downstream pathways are. Apparently, neuronal excitation, particularly over-excitation that might be experienced under intense fear conditions, isn't just stressful in a semiotic sense, but is highly stressful to the participating neurons. There are signs of mitochondrial over-activity and oxidative stress, which lead to DNA breakage in the nucleus, and even nuclear perforation. It is a shocking situation for cells that need to survive for the lifetime of the animal. Granted, these are not germ cells that prioritize genomic stability above all else, but getting your DNA broken just for the purpose of signaling a stress response that feeds into memory formation? That is weird.

Some neuronal cell bodies after fear learning. The red dye is against a marker of DNA repair proteins, which form tight dots around broken DNA. The blue is a general DNA stain, and the green is against a component of the nuclear envelope, showing here that nuclear envelopes have broken in many of these cells.

The researchers found that there are classic signs of DNA breakage, which are what is turning on the TLR9 protein, such as seeing concentrated double-strand DNA repair complexes. All this stress also turned on proteases called caspases, though not the cell suicide program that these caspases typically initiate. Many of the DNA break and repair complexes were, thanks to nuclear perforation, located diffusely at the centrosome, not in the nucleus. TLR9 turns on an inflammatory response via NFKB / RELA. This is clearly a huge event for these cells, not sending them into suicide, but all the alarms short of that are going off.

The interesting part was when the researchers asked whether, by deleting the TLR9 or related genes in the pathway, they could affect learning. Yes, indeed- the fear memory was dependent on the expression of this gene in neurons, and on this cell stress pathway, which appears to be the precondition of setting up the perineural net structures and overall stabilization. Additionally, the DNA damage still happened, but was not properly recognized and repaired in the absence of TLR9, creating an even more dangerous situation for the affected neurons- of genomic instability amidst unrepaired DNA.

When TRL9 is knocked out, DNA repair is cancelled. At bottom are wild-type cells, and at top are mouse neurons after fear learning that have had the gene TLR9 deleted. The red dye is against DNA repair proteins, as is the blue dye in the right-most frames. The top row is devoid of these repair activities.

This paper and its antecedent literature are making the case that memory formation (at least under these somewhat traumatic conditions- whether this is true for all kinds of memory formation remains to be seen) has commandeered ancient, diverse, and quite dangerous forms of cell stress response. It is no picnic in the park with madeleines. It is an all-hands-on-deck disaster scene that puts the cell into a permanently altered trajectory, and carries a variety of long-term risks, such as cancer formation from all the DNA breakage and end-joining repair, which is not very accurate. They mention in passing that some drugs have been recently developed against TLR9, which are being used to dampen inflammatory activities in the brain. But this new work indicates that such drugs are likely double-edged swords, that could impair both learning and the long-term health of treated neurons and brains.