Showing posts with label molecular biology. Show all posts

Saturday, December 7, 2024

Cranking Up DNA, One Gyration at a Time

The mechanism of DNA gyrase, which supercoils bacterial DNA.

Imagine that you have a garden hose that is thirty miles long. How would you keep it from getting tangled? That is unlikely to be easy. Now add randomly placed heavy machinery that actively twists that hose as it travels / pulls along, causing it to wind up ahead, and unwind behind. And that machinery can be placed in either direction, often getting into head-on conflicts, not to mention going at quite different speeds. That is the problem our cells have, managing their DNA. 

They use a set of topoisomerases to manage the topology of DNA, that is, its twist-i-ness. One easy method is to nick the DNA on one of its two strands, allowing it to relax by spinning around the remaining phosphate bond, before resealing it back to a double strand and sending it on its way. But what if you encounter coils or knots that can't be resolved that way? The next level is to cut one entire DNA molecule, not just one side/strand of it, and pass the conflicting one through it. All organisms contain topoisomerases of both kinds, and they are essential.

How DNA gets twisted. While most topoisomerases relax DNA (top) to resolve the many twisty problems posed by transcription and replication, gyrase increases twist by grabbing and holding a quasi-positive twist, then cutting and resolving it, as shown at bottom.

Bacteria have an additional enzyme that we do not have, called gyrase, to crank up the supercoiling of their DNA, to make it easier to open for transcription. Gyrase works just like a type II topoisomerase that cuts a double-stranded DNA and lets another DNA through, but it does so in a special way that puts a twist on the DNA first, so instead of relaxing the DNA, it increases the stress. How exactly that works has been a bit mysterious, though gyrases and the general principles they operate under have been clear for decades. Gyrase uses ATP, and grabs onto two parts of a DNA molecule, one of which is pre-twisted into a coil, after which one segment is cut and the other passed through it to create a change (-2) in the linking number (the net twisting) of that DNA.
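To make that -2 step concrete, here is a minimal bookkeeping sketch in Python. It is a toy, not anything from the paper: the figure of about 10.5 base pairs per helical turn for relaxed DNA is a textbook approximation, the 4,000 bp plasmid is hypothetical, and the function names are made up for illustration. The quantity that changes by -2 per cycle is usually called the linking number (Lk).

```python
# Toy bookkeeping of DNA linking number under gyrase action.
# ASSUMPTIONS (not from the post): ~10.5 bp per helical turn for relaxed
# B-DNA, and a hypothetical 4,000 bp circular plasmid. A real closed circle
# has an integer linking number; fractions are tolerated here for simplicity.

BP_PER_TURN = 10.5  # textbook approximation for relaxed B-DNA


def relaxed_linking_number(length_bp: int) -> float:
    """Linking number (Lk0) of a relaxed circular DNA of the given length."""
    return length_bp / BP_PER_TURN


def after_gyrase_cycles(lk: float, cycles: int) -> float:
    """Each gyrase strand-passage cycle changes Lk by -2."""
    return lk - 2 * cycles


def supercoiling_density(lk: float, lk0: float) -> float:
    """Sigma = (Lk - Lk0) / Lk0; negative values mean negative supercoiling."""
    return (lk - lk0) / lk0


plasmid_bp = 4000  # hypothetical plasmid size
lk0 = relaxed_linking_number(plasmid_bp)
lk = after_gyrase_cycles(lk0, cycles=10)
sigma = supercoiling_density(lk, lk0)
print(f"Lk0 = {lk0:.1f}, Lk after 10 cycles = {lk:.1f}, sigma = {sigma:.4f}")
```

Under these assumptions, ten cycles on a plasmid of this size already give a supercoiling density of roughly -0.05, which suggests how few enzyme turnovers a small DNA circle needs before it is substantially underwound.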

A general model of gyrase action. The G segment of DNA is firmly held by the gyrase dimer in the center.  The same DNA is forcibly twisted about, around the pinwheel structures, and bent back around to enter through the N-gate (as the T segment). Then, the N gate closes, paving the way for the G-segment to be cut and separated (step 3). ATP is the energy source behind all this structural drama. The T-segment then passes through the cut, enters the C-gate, and the cycle is complete.

A recent paper determined the structure of active gyrase complexes, and was able to trace the pre-twisted conformation. This, combined with a lot of past work on the ATPase and cleavage functions of gyrase, allows a reasonably full picture of how this enzyme works. It is a symmetric dimer of a two-subunit protein, so there are four protein chains in all. There are three major regions of the full structure: the N-gate at top, where one segment (the T-segment) of DNA binds; the central DNA gate, where the other (G-segment) DNA binds and is later cut to let the T-segment through; and the C-gate, where the T-segment ends up and is released at the end of the cycle.

Focus on the pinwheel structure that dramatically pre-twists the DNA around between the G and T segments, pre-positioning the complex for strand passage and increased supercoiling.

The magic is that the T-segment and the G-segment are parts of the same DNA molecule, which is wrapped around the ears of the protein, also called pinwheels. That is what the newest structure solves in greatest detail. These pinwheels essentially allow the enzyme to yank an otherwise normal stretch of DNA into a pre-knotted (positive supercoil) form that, when cut and resolved as shown, results in a negative increment of supercoiling or twist. When the pinwheels are mutated away, the enzyme can still hold, cut, and relax DNA, but it cannot increase its supercoiling. It is the ability of the pinwheel structures to impose a pre-twisted structure on the DNA that makes this enzyme a machine to increase negative supercoiling, and thus ease other DNA transactions.

Topoisomerase enzymes through evolution, from gyrase (left) to human topoII on the right. Note how the details of the protein structure are virtually unrecognizable, while the overall shape and DNA-binding stays the same.

Bacteria also have more normal type II topoisomerases that cut DNA merely to relax it, so one might wonder how these two enzymes get along. Well, gyrase is responsible for the overall negative supercoiling of the bacterial genome, while the other topoisomerases have more localized roles to relieve transient knots and over-twisting. Indeed, if you negatively twist DNA enough, you can separate its strands entirely, which is not usually desirable. Further research shows that too much of either topoisomerase is lethal, and that they are kept in balance by transcriptional controls over the amount of each topoisomerase. This suggests a futile cycle of DNA winding and unwinding, as the optimal condition in bacterial cells when both are present in just the right amounts. 


Saturday, November 9, 2024

Rings of Death

We make pore-forming proteins that poke holes in cells and kill them. Why?

Gasdermin proteins are parts of the immune system, and exist in bacteria as well. It was only in 2016 that their mechanism of action was discovered, as forming unusual pores. The function of these pores was originally assumed to be offensive, killing enemy cells. But it quickly became apparent that they more often kill the cells that make them, as the culmination of a process called pyroptosis, a form of (inflammatory) cell suicide. Further work has only deepened the complexity of this system, showing that gasdermin pores are more dynamic and tunable in their action than originally suspected.

The structure is quite striking. The protein starts as an auto-inhibited storage form, sitting around in the cell. When the cell comes under attack, a cascade of detection and signaling occurs that winds up expressing a family of proteases called caspases. Some of these caspases can cut the gasdermin proteins, removing their inhibitory domain and freeing them to assemble into multimers. About 26 to 32 of these activated proteins can form a ring on top of a membrane (let's say the plasma membrane), which then cooperatively jut down their tails into the membrane and make a massive hole in it.

Overall structure of assembled gasdermin protein pores.


Simulations of pore assembly, showing how the trapped membrane lipids would pop out of the center, once pore assembly is complete.


These holes, or pores, are big enough to allow small proteins through, and certainly all sorts of chemicals. So one can understand that researchers thought that these were lethal events. And gasdermins are known to directly attack bacterial cells, being responsible in part for defense against Shigella bacteria, among others. But then it was found that gasdermins are the main way that important cytokines like the highly pro-inflammatory IL-1β get out of the cell. This was certainly an unusual mode of secretion, and the gasdermin D pore seems specifically tailored, in terms of shape and charge, to conduct the mature form of IL-1β out of the cell. 

It also turned out that gasdermins don't always kill their host cells. Indeed, they are far more widely used for temporary secretion purposes than for cell killing. And this secretion can apparently be regulated, though the details of that remain unclear. In structural terms, gasdermins can apparently form partial and mini-pores that are far less lethal to their hosts, allowing, by way of their own expression levels, a sensitive titration of the level of response to whatever danger the cell is facing.

Schematic of how lower concentrations of gasdermin D (lower path, blue) allow smaller pores to form with less lethality.

Equally interesting, the bacterial forms of gasdermin have just begun to be studied. While they may have other functions, they certainly can kill their host cell in a suicide event, and researchers have shown that they can shut down phage infection of a colony or lawn of bacterial cells. That is, if a phage-infected cell can signal and activate its gasdermin proteins fast enough, it can commit suicide before the phage has time to fully replicate, beating the phage at its own race of infection and propagation. 

Bacteria committing suicide for the good of the colony or larger group? That introduces the theme of group selection, since committing suicide certainly doesn't do the individual bacterium any good. It is only in a family group, clonal colony, or similar community that suicide for the sake of the (genetically related) group makes sense. We, as multicellular organisms, are way past that point. Our cells are fully devoted to the good of the organism, not themselves. But to see this kind of heroism among bacteria is, frankly, remarkable.

Bacteria have even turned around to attack the attacker. The Shigella bacteria mentioned above, which are directly killed by gasdermins, have evolved an enzymatic activity that tags gasdermin with ubiquitin, sending it to the cellular garbage disposal and saving themselves from destruction. It is an interesting validation of the importance of gasdermins and the arms race that is afoot, within our bodies.


  • A tortured ballot.
  • Great again? Corruption and degradation is our lot.
  • We may be in a (lesser) Jacksonian age. Populism, bad taste, big hair, and mass deportation.
  • Beautiful Jupiter.
  • Bill Mitchell on our Depression job guarantee: "So for every $1 outlaid the total societal benefits were around $6 over the lifetime of the participant."
  • US sanctions are scrambling our alliances and the financial system.
  • Solar works for everyone.


Saturday, October 26, 2024

A Hunt for Causes of Atherosclerosis

Using the most advanced tools of molecular biology to sift through the sands of the genome for a little gold.

Blood vessels have a hard life. Every time you put on shoes, the vessels in your feet get smashed and smooshed, for hours on end. And do they complain? Generally, not much. They bounce back and make do with the room you give them. All through the body, vessels are subject to the pumping of the heart, and variations in blood volume brought on by our salt balance. They have to move when we do, and deal with it whenever we sit or lie on them. Curiously, it is the veins in our legs and calves, that are least likely to be crushed in daily life, that accumulate valve problems and go varicose. Atherosclerosis is another, much more serious problem in larger vessels, also brought on by age and injury, where injury and inflammation of the lining endothelial cells can lead to thickening, lipid/cholesterol accumulation, necrosis, calcification, and then flow restriction and fragmentation risk. 

Cross-section of a sclerotic blood vessel. LP stands for lipid pool, while the box shows necrotic and calcified bits of tissue.

The best-known risk factors for atherosclerosis are lipid-related, such as lack of liver re-capture of blood lipids, or lack of uptake around the body, keeping cholesterol and other lipid levels high in the blood. But genetic studies have found hundreds of areas of the genome with risk-conferring (or risk-reducing) variants, most of which are not related to lipid management. These genome-wide association studies (or GWAS) look for correlations between genetic markers and disease in large populations. So they pick up a lot of low-impact genetic variations that are difficult to study, due to their large number and low impact, which can often imply peripheral / indirect function. High-impact variations (mutations) tend to not survive in the population very long, but when found tend to be far more directly involved and informative.

A recent paper harnessed a variety of modern tools and methods to extract more from the poor information provided by GWAS. The authors came up with a fascinating tradeoff / link between atherosclerosis and cerebral cavernous malformation (CCM), which is a distinct blood vessel syndrome that can also lead to rupture and death. They set up a program of analysis that was prodigious, and only possible with the latest tools.

The first step was to select a cell line that could model the endothelial cells at issue. Then they loaded these cells with custom expression-reducing RNA regulators against each one of the ~1600 genes found in the neighborhood of the mutations uncovered by the GWAS analyses above, plus 600 control genes. Then they sequenced all the RNA messages from these single cells, each of which had received one of these "knock-down" RNA regulators. This involved a couple hundred thousand cells and billions of sequencing reads- no simple task! The point was to gather comprehensive data on what other genes were being affected by the genetic lesion found in the GWAS population, and then to (algorithmically) assemble them into coherent functional groups and pathways which could both identify which genes were actually being affected by the original mutations, and also connect them to the problems resulting in atherosclerosis.
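A quick back-of-envelope calculation gives a feel for the scale of this screen. The gene counts come from the description above; the cell count of 200,000 is only my reading of "a couple hundred thousand", so treat the result as a rough order of magnitude, and the variable names as illustrative.

```python
# Rough scale of the single-cell knockdown screen.
n_target_genes = 1600    # "~1600 genes" near the GWAS variants (from the text)
n_control_genes = 600    # control genes (from the text)
n_cells = 200_000        # ASSUMPTION: stand-in for "a couple hundred thousand"

# One knock-down RNA regulator per gene, one regulator per cell.
n_perturbations = n_target_genes + n_control_genes
cells_per_perturbation = n_cells / n_perturbations

print(f"{n_perturbations} perturbations, "
      f"about {cells_per_perturbation:.0f} cells per perturbation")
```

So each knockdown is represented by something on the order of a hundred cells, assuming even coverage, which is what lets the effects of one gene's knockdown be averaged out of noisy single-cell data.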

Not to be outdone, they went on to harness the AlphaFold program to hunt for interactions among the proteins participating in some of the pathways they resolved through this vast pipeline, to confirm that the connections they found make sense.

They came up with about fifty different regulated molecular programs (or pathways), of which thirteen were endothelial cell specific. Things like angiogenesis, wound healing, flow response, cell migration, and osmoregulation came up, and are naturally of great relevance. Five of these latter programs were particularly strongly connected to coronary artery disease risk, and mostly concerned endothelial-specific programs of cell adhesion. Which makes sense, as the lack of strong adhesion contributes to injury and invasion by macrophages and other detritus from the blood, and adhesion among the endothelial cells plays a central role in their ability / desire to recover from injury, adjust to outside circumstances, reshape the vessel they are in, etc.

Genes near GWAS variations and found as regulators of other endothelial-related genes are mapped into a known pathway (a) of molecular signaling. The color code of changed expression refers to the effect that the marked gene had on other genes within the five most heavily disease-linked programs/pathways. The numbers refer to those programs, (8=angiogenesis and osmoregulation, 48=cell adhesion, 35=focal adhesion, related to cell adhesion, 39=basement membrane, related to cell polarity and adhesion, 47=angiogenesis, or growth of blood vessels). At bottom (c) is a layout of 41 regulated genes within the five disease-related programs, and how they are regulated by knockdown of the indicated genes on the X axis. Lastly, in d, some of these target genes have known effects on atherosclerosis or vascular barrier syndromes when mutated. And this appears to generally correlate with the regulatory effects of the highlighted pathway genes.

"Two regulators of this (CCM) pathway, CCM2 and TLNRD1, are each linked to a CAD (coronary artery disease) risk variant, regulate other CAD risk genes and affect atheroprotective processes in endothelial cells. ... Specifically, we show that knockdown of TLNRD1 or CCM2 mimics the effects of atheroprotective laminar blood flow, and that the poorly characterized gene TLNRD1 is a newly identified regulator in the CCM pathway."

On the other hand, excessive adhesiveness and angiogenesis can be a problem as well, as revealed by the reverse correlation they found with CCM syndrome. The interesting thing was that the gene CCM2 came up as one of the strongest regulators of the five core programs associated with atherosclerosis risk mutations. As can be guessed from its name, it can harbor mutations that lead to CCM. CCM is a relatively rare syndrome (at least compared with coronary artery disease) of localized patches of malformed vessels in the brain, which are prone to rupture, which can be lethal. CCM2 is part of a protein complex, with KRIT1 and PDCD10, and part of a known pathway from fluid flow sensing receptors to transcription regulators (TFs) that turn on genes relevant to the endothelial cells. As shown in the diagram above, this pathway is full of genes that came up in this pathway analysis, from the atherosclerosis GWAS mutations. Note that there is a repression effect in the diagram above (a) between the CCM complex and the MAP kinase cascade that sends signals downstream, accounting for the color reversal at this stage of the diagram.

Not only did they find that this known set of three CCM genes is implicated in the atherosclerosis mutation results, but one of the genes they dug up through their pipeline, TLNRD1, turned out to be a fourth, hitherto unknown, member of the CCM complex, shown via the AlphaFold program to dock very neatly with the others. It is loss-of-function mutations of genes encoding this complex, which inhibits the expression of endothelial cell pro-cell adhesion and pro-angiogenesis sets of genes, that cause CCM, unleashing these angiogenesis genes to do too much.

The logic of this pathway overall is that proper fluid flow at the cell surface, as expected in well-formed blood vessels, activates the pathway to the CCM complex, which then represses programs of new or corrective angiogenesis and cell adhesion- the tissue is OK as it is. Conversely, when turbulent flow is sensed, the CCM complex is turned down, and its target genes are turned up, activating repair, revision, and angiogenesis pathways that can presumably adjust the vessel shape to reduce turbulence, or simply strengthen it.

Under this model, malformations may occur during brain development when/where turbulent flow occurs, reducing CCM activation, which is abetted by mutations that help the CCM complex to fall apart, resulting (rarely) in run-away angiogenesis. The common variants dealt with in this paper, that decrease risk of cardiovascular disease / atherosclerosis, appear to have similar, but much weaker effects, promoting angiogenesis, including recovery from injury and adhesion between endothelial cells. In this way, they keep the endothelium tighter and more resistant to injury, invasion by macrophages, and all the downstream sequelae that result in atherosclerosis. Thus strong reduction of CCM gene function is dangerous in CCM syndrome, but more modest reductions are protective in atherosclerosis, setting up a sensitive evolutionary tradeoff that we are clearly still on the knife's edge of. I won't get into the nature of the causal mutations themselves, but they are likely to be diffuse and regulatory in the latter case.

Image of the CCM complex, which regulates response to blood flow, and whose mutations are relevant both to CCM and to atherosclerosis. The structures of TLNRD1 and the docking complex are provided by AlphaFold. 


This method is particularly powerful by being unbiased in its downstream gene and pattern finding, because it samples every expressed gene in the cell and automatically creates related pathways from this expression data, given the perturbations (knockdown of expression) of single target genes. It does not depend on using existing curated pathways and literature that would make it difficult to find new components of pathways. (Though in this case the "programs" it found align pretty closely with known pathways.) On the other hand, while these authors claim that this method is widely applicable, it is extremely arduous and costly, as evidenced by the contribution of 27 authors at top-flight institutions, an unusually large number in this field. So, for diseases and GWAS data sets that are highly significant, with plenty of funding, this may be a viable method of deeper analysis. Otherwise, it is beyond the means of a regular lab.

  • A backgrounder on sedition, treason, and insurrection.
  • And why it matters.
  • Jan 6 was an attempted putsch.
  • Trumpies for Putin.
  • Solar is a no-brainer.
  • NDAs are blatantly illegal and immoral. One would think we would value truth over lies.

Saturday, October 12, 2024

Pumping DNA

Arnold has nothing on the DNA pumps that load phages.

DNA is a very unwieldy molecule. Elegant in concept, but as organisms accumulated more features and genes, it got extremely long and twisty. So a series of management proteins arose, such as helicases and gyrases to relieve the torsional tension, and topoisomerases to cut and pass strands through each other to resolve knots. Another class is DNA pumps, which can forcefully travel along DNA to thread it into useful spaces, like the head of a phage, or a domain in our nucleus, to facilitate transcriptional isolation or organized recombination and synapsis. While other motors, acting on actin and microtubules, manage DNA segregation during mitosis, cell division, and cell movement, true DNA motors deal directly with DNA.

An iconic electron micrograph of a phage with its head blown open. The previously enclosed DNA is splayed about, suggesting the capsid's great capacity for DNA, and great pressure it was under. Inset shows an intact phage. Note the landing tentacles, which attach to the target bacterium.

There are several types of DNA pump, the lower-powered of which I have reviewed previously. The champions in terms of force, however, are the pumps that fill phage heads. Phages are viruses that infect bacteria, and they operate under a variety of limitations. Size is one- they have to be small and have small genomes, due to the small size of their targets, the brevity of their life cycle, and the mathematics of scattered propagation. Bacterial cells are under turgor pressure, of about three atmospheres, and have strong cell walls to hold everything in. So their infecting phages have several barriers to overcome. One solution is to be under even higher pressure themselves, up to about sixty atmospheres. That way, once the injection system has cut through the cell wall and inner membrane, the phage genome, which is pretty much the only thing in the phage head (or capsid), can shoot out rapidly and take over the cell. 

Schematic of late phage development, where the motor (blue) docks to the phage head and fills it with DNA, after which the tail assembly is attached.

How does the DNA loading pump work? It is closely docked into the phage head structure, has a pentagonal structure attached to the phage head, and a loosely attached, 12-sided inner rosette that they describe as a sort of bearing or ball-race. The outer pentagon has an ATPase at each vertex, and these fire sequentially during the pumping mechanism. Each ATP advances the DNA by about two base pairs. Presumably the head has a structure that guides the DNA into regular loops around its inside walls. 

Structure of the dodecameric portion of the phage DNA pump, without the ATPase pentameric portion. Obviously, the DNA threads through the center.

In the diagram below (reference), three steps are shown. First, (a, top), the "I" ATPase node (red) is linked to the "J" and "A" rosette nodes. "A" is where the rosette hooks into the DNA (red). Next, the rosette is expanded a bit, bringing "A" out of register from "I" and "C" into register with "II". At the same time, "C" links to the DNA two base pairs down from where "A" latched into it. In the third step, the rosette squashes again, the DNA ends up raised by two base pairs, and the process can start all over. It is a bit of a sleeve/ratchet mechanism. They do not speculate at this point which of these steps is the power stroke, where the ATP is hydrolyzed. Getting only two base pairs into the head per ATP doesn't seem very efficient, but it is evidently at the end of packaging, when the pressure rises to extreme levels, where this pump shines. And it can get a 19,000 bp genome into a phage head in three minutes (~100 bp per second), so it isn't a slouch when it comes to speed, either.
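The numbers quoted above can be checked with a little arithmetic. This sketch uses only the figures in the text (a 19,000 bp genome, three minutes, two base pairs per ATP, five ATPases); the per-subunit rate at the end additionally assumes the five ATPases share the work equally and fire strictly in turn, which is my simplification.

```python
# Arithmetic on the phage DNA packaging motor, using figures from the text.
GENOME_BP = 19_000           # genome length packaged
PACKAGING_SECONDS = 3 * 60   # "three minutes"
BP_PER_ATP = 2               # DNA advance per ATP hydrolyzed
N_ATPASES = 5                # one ATPase at each vertex of the pentagon

bp_per_second = GENOME_BP / PACKAGING_SECONDS
atp_total = GENOME_BP / BP_PER_ATP
atp_per_second = bp_per_second / BP_PER_ATP

# ASSUMPTION: equal, strictly sequential sharing among the five subunits.
firings_per_subunit_per_second = atp_per_second / N_ATPASES

print(f"average speed: {bp_per_second:.1f} bp/s")
print(f"ATP per genome: {atp_total:.0f}")
print(f"ATP per second: {atp_per_second:.1f}")
print(f"firings per ATPase per second: {firings_per_subunit_per_second:.1f}")
```

The average comes out a little over 100 bp per second, at a cost of 9,500 ATP per genome, or roughly ten firings per second for each ATPase under the equal-sharing assumption. These are averages over the whole run; the text implies the motor slows as pressure builds toward the end of packaging.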

Model of how this pump works. See text above for details.


Not only is this pump an amazing and powerful bit of biotechnology, able to compress DNA to sixty atmospheres, but it is a fourth fundamental type of motor, in addition to the rotary motors as found in flagella, the linear motors found along actin and microtubules, and the DNA threading/looping motors of condensin/cohesin.


  • The 2024 Nobel prizes show the close nexus between computers and molecular biology. The original finding of miRNA complementarity could not have been made without a computerized sequence search.
  • When truth is a gaffe, and lies are routine.
  • Could crypto be any worse or more corrupting?

Saturday, September 28, 2024

Dangerous Memories

Some memory formation involves extracellular structures, DNA damage, and immune component activation / inflammation.

The physical nature of memories in the brain is under intensive scrutiny. The leading general theory is that of positive reinforcement, where neurons that are co-activated strengthen their connections, enhancing their ability to co-fire and thus to express the same pattern again in the future. The nature of these connections has been somewhat nebulous, assumed to just be the size and stability of their synaptic touch-points. But it turns out that there is a great deal more going on.

A recent paper started with a fishing expedition, looking at changes in gene expression in neurons at various time points after mice were subjected to a fear learning regimen. They took this out to much longer time points (up to a month) than had been contemplated previously. At short times, a bunch of well-known signals and growth-oriented gene expression happened. At the longest time points, organization of a structure called the perineuronal net (PNN) was read out of the gene expression signals. This is an extracellular matrix sheath that appears to stabilize neuronal connections and play a role in long-term memory and learning.

But the real shocker came at the intermediate time point of about four days. Here, there was overexpression of TLR9, which is an immune system detector of broken / bacterial DNA, and inducer in turn of inflammatory responses. This led the authors down a long rabbit hole of investigating what kind of DNA fragmentation is activating this signal, how common this is, how influential it is for learning, and what the downstream pathways are. Apparently, neuronal excitation, particularly over-excitation that might be experienced under intense fear conditions, isn't just stressful in a semiotic sense, but is highly stressful to the participating neurons. There are signs of mitochondrial over-activity and oxidative stress, which lead to DNA breakage in the nucleus, and even nuclear perforation. It is a shocking situation for cells that need to survive for the lifetime of the animal. Granted, these are not germ cells that prioritize genomic stability above all else, but getting your DNA broken just for the purpose of signaling a stress response that feeds into memory formation? That is weird.

Some neuronal cell bodies after fear learning. The red dye is against a marker of DNA repair proteins, which form tight dots around broken DNA. The blue is a general DNA stain, and the green is against a component of the nuclear envelope, showing here that nuclear envelopes have broken in many of these cells.

The researchers found that there are classic signs of DNA breakage, which are what is turning on the TLR9 protein, such as seeing concentrated double-strand DNA repair complexes. All this stress also turned on proteases called caspases, though not the cell suicide program that these caspases typically initiate. Many of the DNA break and repair complexes were, thanks to nuclear perforation, located diffusely at the centrosome, not in the nucleus. TLR9 turns on an inflammatory response via NFKB / RELA. This is clearly a huge event for these cells, not sending them into suicide, but all the alarms short of that are going off.

The interesting part was when the researchers asked whether, by deleting the TLR9 or related genes in the pathway, they could affect learning. Yes, indeed: the fear memory was dependent on the expression of this gene in neurons, and on this cell stress pathway, which appears to be the precondition of setting up the perineuronal net structures and overall stabilization. Additionally, the DNA damage still happened, but was not properly recognized and repaired in the absence of TLR9, creating an even more dangerous situation for the affected neurons, one of genomic instability amidst unrepaired DNA.

When TLR9 is knocked out, DNA repair is cancelled. At bottom are wild-type cells, and at top are mouse neurons after fear learning that have had the gene TLR9 deleted. The red dye is against DNA repair proteins, as is the blue dye in the right-most frames. The top row is devoid of these repair activities.

This paper and its antecedent literature are making the case that memory formation (at least under these somewhat traumatic conditions- whether this is true for all kinds of memory formation remains to be seen) has commandeered ancient, diverse, and quite dangerous forms of cell stress response. It is no picnic in the park with madeleines. It is an all-hands-on-deck disaster scene that puts the cell into a permanently altered trajectory, and carries a variety of long-term risks, such as cancer formation from all the DNA breakage and end-joining repair, which is not very accurate. They mention in passing that some drugs have been recently developed against TLR9, which are being used to dampen inflammatory activities in the brain. But this new work indicates that such drugs are likely double-edged swords, that could impair both learning and the long-term health of treated neurons and brains.

Sunday, September 15, 2024

Road Rage Among the Polymerases

DNA polymerase is faster than RNA polymerase. RNA polymerase also leaves detritus in its wake. What happens when they collide?

DNA is a country road: one lane, two directions. Yet in our cells it can be extremely busy, with transcription (RNA synthesis) happening all the time, and innumerable proteins hanging on as signposts, chemical modifications, and even RNA hybridized into sections, creating separated DNA structures called R-loops. When it is time for DNA replication, what happens when all these things collide? One might think that biology had worked all this out by now, but these collisions can be quite dangerous, sending the RNA polymerase careering into the other (new) DNA strand, causing the DNA polymerase to stall or miss sections, and causing DNA breaks, which set off loud cellular alarm bells and lead to mutations.

Despite decades of work, this area of biology is still not very well understood, since the conditions are difficult to reproduce and study. So I can only give a few hints of what is going on from current work in the field. A couple of decades ago, a classic experiment showed that in bacteria, DNA polymerases can be stopped cold by a collision with an RNA polymerase going in the opposite direction. However, this stall is alleviated by a DNA helicase enzyme, which can pry apart the DNA strands and anything attached, and the DNA replication complex sails through, after a pause of a couple of seconds. The RNA polymerase, meanwhile, is not thrown off completely, but switches its template from the complementary strand it was using previously to the newly synthesized DNA strand just made by the passing DNA polymerase. This was an amazing result, since the elongating RNA polymerase is a rather tightly attached complex. But here, it jumps ship to the new DNA strand, even though the old DNA strand remains present, and will shortly be replicated by the lagging strand DNA polymerase complex.

General schematic of encounters between replication forks and RNA polymerases (pink, RNAP). Only co-directional, not head-on, collisions are shown here. Ribosomes (yellow) in bacteria operate directly on the nascent mRNA, and can helpfully nudge the RNA polymerase along. In this scheme, DNA damage happens after the nascent RNA is used as a primer by a new DNA polymerase (bottom), which will require special repair. 

The ability of the RNA polymerase to switch template strands, along with the nascent RNA it was making, suggests very intriguing flexibility in the system. Indeed, DNA polymerases that come up from behind the RNA polymerase (using the same strand as their template) have a much easier time of it, passing with hardly a pause, and only temporarily displacing the RNA polymerase. But things are different when the RNA polymerase has just found an error and has back-tracked to fix it. Then, the DNA polymerase complex is seriously impeded. It may even use the nascent RNA hanging off the polymerase and hybridized to the local DNA as a primer to continue synthesis, after it has bumped off the RNA polymerase that made it. This leads in turn to difficulties in repair and double strand breaks in that DNA, which is the worst kind of mutation. 

The presence of RNA in the mix, in the form of single strands of RNA hybridized to one of the DNA strands, (that is, R-loops), turns out to be a serious problem. These can arise either from nascent transcription, as above, or from hybridization of non-coding RNAs that are increasingly recognized as significant gene regulators. RNA forms a slightly stronger hybrid with DNA than DNA itself does, in fact. Such R-loops (displacing one DNA strand) are quite common over active genomes, and apparently present a block to replication complexes. One would think that such fork complexes would be supplied with the kinds of helicases that could easily plow through such structures, but that is not quite the case. R-loops cause replication complex stalling, and can invoke DNA damage responses, for reasons that are not entirely clear yet. 

A recent paper that piqued my interest in all this studied an ATPase motor protein that occurs at stalled replication forks and helps them restart, presumably by acting as a DNA or RNA pump of some kind, and forcing the replication complex through obstructions. It is named WRNIP1, for WRN interacting protein, since it also interacts with the Werner syndrome protein (WRN), another interesting protein at the replication fork. WRN is another ATPase that is a helicase and also a backwards 3' -> 5' exonuclease that cleans up DNA ends around DNA repair sites, helping to remove mismatched and damaged DNA so the repair can be as accurate as possible. As one can guess, mutations in the WRN gene cause Werner Syndrome, a striking progeria syndrome of early aging and susceptibility to cancer. 

While the details of R-loop toxicity and repair are still being worked out, it is fascinating that such conflicts still exist after several billion years to figure them out. It is apparent that the design of DNA, while exceedingly elegant, results in intrinsic conflicts between expression and replication that are resolved amicably most of the time. But when either process gets overly congested, or encounters unexpected roadblocks, then tempers can flare, and an enormous apparatus of DNA damage signaling and repair is called in, sirens blaring, to do what it can to cut through the mess.


  • Who really believes in climate change?
  • The very strong people of the GOP. 
  • The ancient Easter Islanders mixed with South Americans.

Sunday, August 11, 2024

Modeling Cell Division

Is molecular biology ready to use modeling to inform experimental work?

The cell cycle is a holy grail of biology. The first mutants that dissected some of its regulatory apparatus, the CDC mutants of Saccharomyces cerevisiae (yeast), electrified the field and led to a Nobel prize. These were temperature sensitive mutants, making only small changes to the protein sequence that rendered that protein inactive at high temperature (thus inducing a cell cycle arrest phenotype), while allowing wild-type growth at normal temperatures. In the fifty years since, a great deal of the circuitry has been worked out, with the result that it is now possible, as a recent paper describes, to make a detailed mathematical model of the process that claims to be useful in the sense of explaining existing findings in a unified model and making predictions of places to look for additional actors.

At the center of this regulatory scheme are transcription activators, SBF/MBF, that are partly controlled by, and in turn control the synthesis of, a series of cyclins. Cyclins are proteins that were observed (another Nobel prize) to have striking variations in abundance during the cell cycle. There are characteristic cyclins for each phase of the cell cycle, which goes from G1, a resting phase, to S, which is DNA replication, to G2, a second resting phase, and then M, which is mitosis, which brings us back to G1. Cyclins work by binding to a central protein kinase, Cdc28, which, as regulated by each distinct cyclin, phosphorylates and thus regulates distinct sets of target proteins. The key decision a cell has to make is whether to commit to DNA replication, i.e. S phase. No cell wants to run out of energy during this process, so its size and metabolic state needs to be carefully monitored. That is done by Cyclin 3 (Cln3), Whi5, and Bck2, which each influence whether the SBF/MBF regulators are active. 

Some highly simplified elements of the yeast cell cycle. Cyclins (Cln and Clb) are regulators of a central protein kinase, Cdc28, that direct it to regulate appropriate targets at each stage of the cell cycle. Cyclins themselves are regulated by transcriptional control (here, the activators SBF and MBF), and then destroyed at appropriate times by proteolysis, rendering them abundant only at specific times during the cell cycle. Focusing on the "START" process that starts the process from rest (G1 phase) to new bud formation and DNA replication (S phase), Cln3 and Bck2 respond to upstream nutritional and size cues, and each activate the SBF/MBF transcription activator.

As outlined in the figure above, Cyclin 3 is the G1 cyclin, which, in complex with Cdc28, phosphorylates Whi5, turning it off. Whi5 is an inhibitor that binds to SBF/MBF, so the Cyclin 3 activation turns these regulators on, and thus starts off the cell cycle under the proper conditions. Incidentally, the mammalian version of Whi5, Rb (for retinoblastoma), is a notorious tumor suppressor gene, that, when mutated, releases cells from regulatory control over cell division. SBF and MBF bind to genes for the next series of cyclins, Cln1, Cln2, Clb5, Clb6. The first two are further G1 cyclins that orchestrate the end of G1. They induce phosphorylation and inactivation of Sic1 and Cdc6, which are inhibitors of Clb5 and Clb6. These latter two are then the initiators of S phase and DNA replication. Meanwhile, Cln3 stays around till M phase, but is then degraded in definitive fashion by the proteases that end M phase. Starvation conditions lead to rapid degradation of Cln3 at all times, and thus to no chance of starting a new cell cycle.

Charts of the abundance of some cyclins through the cell cycle. Each one has its time to shine, after which it is ubiquitinated and sent off to the recycling center / proteasome.

Bck2 is another activator of SBF/MBF that is unrelated to the Cln3/Whi5 system, but also integrates cell size and metabolic status information. Null mutants of Cln3 (or Bck2) are viable, if altered in cell cycle, while double null mutants of Cln3 and Bck2 are dead, indicating that these regulators are each important, in a complementary way, in cell cycle control. Given that little is known about Bck2, the modelers in this paper assume various properties and hope for the best down the line, predicting that cell size (at the key transition to S phase) is more affected in the Cln3 null mutant than in the Bck2 null mutant, since in the former, excess active Whi5 soaks up most of the available SBF/MBF, and requires extra-high and active levels of Bck2 to overcome this barrier and activate the G1 cyclins and other genes.

The modelers are working from the accumulated, mostly genetic data, and in turn validate their models against the same genetic data, plus a few extra mutants they or others have made. The models are mathematical representations of how each node (i.e. protein, or gene) in the system responds to the others, but since there are a multitude of unknowns, (such as what really regulates Bck2 from upstream, to cite just one example), the system is not really able to make predictions, but rather fine-tunes/reconciles what knowledge there is, and, at best, points to gaps in knowledge. It is a bit like AI, which magically recombines and regurgitates material from a vast corpus based on piece-wise cues, but is not going to find new data, other than through its notorious hallucinations.
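To make the idea of a node-based model concrete, here is a deliberately tiny Boolean sketch of the START logic described above. It is not the paper's model, which is quantitative and far larger; the function name and the true/false simplification are my own assumptions. It only checks that the wiring described in the text (Cln3 turns off Whi5, Whi5 blocks SBF/MBF, Bck2 activates SBF/MBF independently) reproduces the viability of the single null mutants and the death of the double null.

```python
def start_fires(cln3: bool, bck2: bool) -> bool:
    """Toy Boolean version of the G1/S decision (an illustrative assumption).

    cln3, bck2: whether each gene is functional in this genotype.
    Returns True if SBF/MBF ends up active, i.e. the cell can pass START.
    """
    # Cln3-Cdc28 phosphorylates Whi5, turning it off.
    whi5_active = not cln3
    # SBF/MBF is active if Whi5 no longer inhibits it,
    # or if Bck2 activates it through its independent route.
    return (not whi5_active) or bck2


genotypes = {
    "wild type": (True, True),
    "cln3 null": (False, True),
    "bck2 null": (True, False),
    "cln3 bck2 double null": (False, False),
}

for name, (cln3, bck2) in genotypes.items():
    outcome = "passes START" if start_fires(cln3, bck2) else "arrests in G1"
    print(f"{name}: {outcome}")
```

A Boolean toy like this cannot say anything about cell size or timing, which is where the real models earn their keep; it only shows how redundancy between two inputs falls out of the wiring.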

For example, a new paper came out after this modeling, which finds that Cln3 affects Cln2 abundance by mechanisms quite apart from its SBF/MBF transcriptional control, and that it regulates cell size in large part at M phase, not through its G1/S gating. All this comes from new experimental work, unanticipated by the modeling. So, in the end, experimental work always trumps modeling, which is a bit different than how things are in, say, physics, where sometimes the modeling can be so strong that it predicts new particles, forces, and other phenomena, to be validated later experimentally. Biology may have its master predictive model in the theory of evolution, but genetics and molecular biology remain much more of an empirical slog through the resulting glorious mess.


  • Bitcoin isn't a currency, but rather just another asset class, one without any fundamental or socially positive value. A little like gold, actually, except without gold's resilience against social / technological disruption.
  • The disastrous post-Soviet economic transition, on our advice.
  • The enormous labor drain, and resource drain, from global South to North.

Saturday, July 27, 2024

Putting Body Parts in Their Places

How HOX genes run development, on butterfly wings.

I have written about the HOX complex of genes several times, because they constitute a grail of developmental genetics- genes that specify the identity of body parts. They occupy the middle of a body plan cascade of gene regulation, downstream from broader specifiers for anterior/posterior orientation, regional and segment specification, and in turn upstream of many more genes that specify the details of organ and tissue construction. Each of the HOX genes encodes a transcriptional regulator, and the name of one says it all- antennapedia. In fruit flies, where all this was first discovered, loss of antennapedia converts some legs into antennae, and extra expression of antennapedia converts antennae on the head into legs.

The HOX complex (named for the homeobox DNA binding motif of the proteins they encode) is linear, arranged from head-affecting genes (labial, proboscipedia) to abdomen-affecting genes (abdominal A, abdominal B; evidently the geneticist's flair for naming ran out by this point). This arrangement is almost universally conserved, and turns out to reflect molecular mechanisms operating on the complex. That is, it "opens" in a progressive manner during development, on the chromosome. Repression of chromatin is a very common and sturdy way to turn genes off, and tends to affect nearby genes, in a spreading effect. So it turns out to be easy, in some sense, to set up the HOX complex to have this chromatin repression lifted in a segmental fashion, by upstream regulators, whereby only the head sections are allowed to be expressed in head tissues, but all the genes are allowed to be expressed in the final abdominal segment. That is why the unexpected expression of antennapedia, which is the fifth of eight HOX genes, in the head, leads to a thoracic tissue (legs) forming on the head.

A recent paper delved a little more deeply into this story, using butterflies, which have a normal linearly conserved HOX cluster and are easy to diagnose for certain body part transformations (called homeotic) on their beautiful wings. The main thing these researchers were interested in is the genetic elements that separate one part of the HOX cluster from other parts. These are boundary or "insulator" elements that separate topologically associated domains (called TADs). Each HOX gene is surrounded by various regulatory enhancer and inhibitor sites in the DNA that are bound by regulatory proteins. And it is imperative that these sites be directed only to the intended gene, not neighboring genes. That is why such TADs exist, to isolate the regulation of genes from others nearby. There are now a variety of methods to map such TADs, by looking where chromatin (histones) are open or closed, or where DNA can be cut by enzymes in the native chromatin, or where crosslinks can be formed between DNA molecules, and others.

The question posed here was whether a boundary element, if deleted, would cause a homeotic transformation in the butterflies they were studying. They found, unfortunately, that it was impossible to generate whole animals with the deletions and other mutations they were engineering, so they settled for injecting the CRISPR mutational molecules into larval tissues and watching how they affected the adults in mosaic form, with some mutant tissues, some wild-type. The boundary they focused on was between antennapedia (Antp) and ultrabithorax (Ubx), and the tissues were the forewings, where Ubx is normally off, and the hindwings, where Ubx is normally on. Using methods to look at the open state of chromatin, they found that the Ubx gene is dramatically opened in hindwings, relative to forewings. Nevertheless, the boundary remains in place throughout, showing that there is a pretty strong isolation from Antp to Ubx, though they are next door and a couple hundred thousand basepairs apart. Which in genomic terms is not terribly far, while it leaves plenty of space for enhancers, promoters, introns, boundary elements, and other regulatory paraphernalia.

Analysis of the site-to-site chromosomal closeness and accessibility across the HOX locus of the butterfly Junonia coenia. The genetic loci are noted at the bottom, and the site-to-site hit rates are noted in the top panels, with blue for low rates of contact, and orange/red for high rates of contact. At top is the forewing, and at bottom is the hindwing, where Ubx is expressed, thus the high open-ness and intra-site contact within its topological domain (TAD). Yet the boundary between Ubx and Antp to its left (dotted lines at bottom) remains very strong in both tissues. In green is a measure of transcription from this DNA, in differential terms hindwing minus forewing, showing the strong repression of Ubx in the forewing, top panel.

The researchers naturally wanted to mutate the boundary element, (Antp-Ubx_BE), which they deduced lay at a set of binding sites (featuring CCCTC) for the protein CTCF, a well-known insulating boundary regulator. Note, interestingly, that in the image above, the last exon (blue) of Ubx (transcription goes right to left) lies across the boundary element, and in the topological domain of the Antp gene. This means that while all the regulatory apparatus of Ubx is located in its own domain, on the right side, it is OK for transcription to leak across- that has no regulatory implications. 

Effects of removing the boundary element between Ubx and Antp. Detailed description is in the text below. 

Removal of this boundary element, using CRISPR technology in portions of the larval tissues, had the expected partial effects on the larval, and later adult, wings of this butterfly. First, note that in panel D insets, the wild type larval forewing shows no expression of Ubx, (green), while the wild type hind wing shows wide-spread expression. This is the core role of the HOX locus and the Ubx gene- locate its expression in the correct body parts to then induce the correct tissues to develop. The larval wing tissue of the mosaic mutant, also in D, shows, in the forewing, extensive patchy expression of Ubx. This is then reflected in the adult (different animals) in the upper panels, in the mangled eyespot of the fully formed wing (center panel, compared to wild-type forewing and hindwing to each side). It is a small effect, but then these are small mutations, done in only a fraction of the larval cells, as well.

So here we are, getting into the nuts and bolts of how body parts are positioned and encoded. There are large regions around these genes devoted to regulatory affairs, including the management of chromatin repression, the insulation of one region from another, the enhancer and repressor sites that integrate myriad upstream signals (i.e. other DNA binding proteins) to come up with the detailed pattern of expression of these HOX genes. Which in turn control hundreds of other genes to execute the genetic program. This program can hardly be thought of as a blueprint, nor a "design" in anyone's eye, divine or otherwise. It resembles much more a vast pile of computer code that has accreted over time with occasional additions of subroutines, hacks, duplicated bits, and accidental losses, adding up to a method for making a body that is robust in some respects to the slings and arrows of fortune, but naturally not to mutations in its own code.


Saturday, June 15, 2024

The Quest for the Perfect Message, in E. coli

Translation efficiency has some weird rules, and a tortured history.

One would think we know everything there is to know about the workhorse of bacterial molecular biology, Escherichia coli. And that would be especially true for its technological applications, like the expression of engineered genes, which is at the very heart of molecular biology and much of biotechnology. Getting genes you put into E. coli expressed at high levels is critical for making drugs, and for making enough for structural and biochemical studies. For decades, the wisdom of the field was to design introduced genes using the codon adaptation index (CAI). This is a measurement of the three-letter codes (codons of the genetic code) that are used in highly expressed genes. They tend to correspond to tRNAs that are more abundant in the cell. So, for example, the amino acid leucine is encoded by six different codons, any of which can be chosen at intended leucine positions in the intended protein. In E. coli, CTG is over ten times more frequently used than CTA, however. Thus, even though they code for the same amino acid, one is more common, perhaps because its cognate tRNA is more common and more easily used during translation. This is basically a diffusion-based argument, that translation will be easier if the tRNA that carries the next amino acid is easier to find.
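For the curious, the codon adaptation index is conventionally computed as the geometric mean of a per-codon weight, where each weight is a codon's usage in highly expressed genes divided by the usage of the most-used synonymous codon. The sketch below shows that arithmetic for leucine only. The counts are invented for illustration (chosen so that CTG is more than ten times as common as CTA, as noted above), and the function names are hypothetical; a real calculation would tally codons from an actual reference gene set and cover all amino acids.

```python
import math

# Hypothetical leucine codon counts from a set of highly expressed genes.
# These numbers are made up for illustration, not measured E. coli data.
LEUCINE_COUNTS = {
    "CTG": 500, "CTC": 100, "CTT": 100,
    "TTG": 100, "TTA": 100, "CTA": 40,
}


def relative_adaptiveness(counts):
    """Scale each synonymous codon's count by the most-used codon's count."""
    top = max(counts.values())
    return {codon: n / top for codon, n in counts.items()}


def cai(codons, weights):
    """Geometric mean of the weights of the codons in a sequence."""
    logs = [math.log(weights[c]) for c in codons]
    return math.exp(sum(logs) / len(logs))


weights = relative_adaptiveness(LEUCINE_COUNTS)
common = cai(["CTG", "CTG", "CTG"], weights)  # only the favored codon
rare = cai(["CTA", "CTA", "CTA"], weights)    # only the rare codon
mixed = cai(["CTG", "CTA"], weights)          # one of each
print(common, rare, mixed)
```

A gene built entirely from the favored codon scores 1.0, and the score drops multiplicatively as rare codons are swapped in, which is why the index was such a tempting design target.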

A recent paper provides a remarkable review of this field. For one thing, it turns out that use of the CAI has virtually no effect on translation efficiency. Whether using rare or common codons, translation is equally efficient for introduced genes. Needless to say, this is quite surprising. It seems as though the role of common vs uncommon tRNAs/codons is more to manage the health of the cell by relieving bottlenecks to translation in a global sense and managing the free pool of ribosomes, rather than regulating the efficiency of translation of any particular mRNA message. tRNAs are highly abundant generally, so there are significant savings possible by managing their levels judiciously, and reducing investment in some versus others.

So what does affect the efficiency of translation? Some messages are better translated than others, after all. The authors point to a completely different mechanism, which is the melting stability of the first ten codons of the mRNA message. RNA can form hairpin and other secondary structures / shapes, and this can apparently strongly affect the ability of ribosomes to find initiation sites. While eukaryotic ribosomes scan in from the 5 prime cap of the mRNA, bacterial ribosomes bind directly to a sequence slightly upstream of the initiating AUG codon. And this can be inhibited by mRNAs that are not neatly ironed out, but knotted up in hairpins and loops. 

Ratio of occurrence of nucleosides in the third codon position of the first ten codons of high versus low expressing genes in E. coli. This was not run on native E. coli genes, but on a large panel of transgenes engineered from outside. The strong bias towards A at this position in high expressing genes shows a preference for initiating sequences to have weak secondary structure, allowing better ribosome access.
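As a rough illustration of the measurement in this figure, the fraction of A at the third codon position over the first ten codons can be tallied as below. This is only a crude proxy of my own for weak secondary structure, not the authors' method; the function name, the choice to count the start codon as codon one, and the example sequences are all assumptions.

```python
def third_position_a_fraction(cds: str, n_codons: int = 10) -> float:
    """Fraction of the first n_codons codons that end in A.

    cds: coding sequence starting at the initiation codon (DNA or RNA letters).
    Counting the start codon as codon one is an assumption of this sketch.
    """
    seq = cds.upper().replace("U", "T")
    # Split the start of the sequence into triplets, dropping any partial one.
    codons = [seq[i:i + 3] for i in range(0, 3 * n_codons, 3)]
    codons = [c for c in codons if len(c) == 3]
    if not codons:
        raise ValueError("sequence is shorter than one codon")
    return sum(c[2] == "A" for c in codons) / len(codons)


# Made-up sequences: a start codon followed by nine identical codons.
a_rich = "ATG" + "AAA" * 9   # lysine codons, all ending in A
a_poor = "ATG" + "CTG" * 9   # leucine codons, none ending in A
print(third_position_a_fraction(a_rich))
print(third_position_a_fraction(a_poor))
```

Since the third position is usually the wobble position, nudging it towards A often changes the mRNA's folding without changing the protein, which is what makes this a practical knob for engineered genes.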


Use of A-rich sequences around the ribosomal initiation sites and the first ten codons, then, dramatically increases the translation efficiency (via the initiation efficiency) of introduced genes, and provides a much more robust method to control their expression. But then the authors make another observation, which is that the bacteria themselves do not seem to use this mechanism for their own genes. In a massive analysis of data from other labs, (below), there is actually a negative correlation between the quality of the initiation region (X- axis) and the abundance of the respective protein (Y- axis). Again, quite a surprising result, which the authors can only speculate about. 

There is negative correlation between the initiation codon quality (X- axis), as shown above, and the native E. coli gene expression level (Y- axis). So these cells are not optimizing their translation at all in accordance with the findings above.

The picture that they paint is that highly expressed genes in E. coli benefit from consistent, smooth translation. This depends less on maximal initiation speed than on the holistic picture of translation. The CAI optimal codons (called translationally optimal in this paper, or TO) tend to be poor at initiation, but have good codon-anticodon pairing and thus low A content. So there are conflicting pressures at work, in basic chemical terms, where different codons are intrinsically good for initiation, and complementary ones for elongation. The obvious solution is to use the initiation-optimal codons for the first ten codons, and translationally optimal codons the rest of the way. But that is not what is found either. The authors claim that, for native proteins, lower levels of initiation are actually beneficial for smoother protein production with less noise from time to time and cell to cell. 

Additionally, lower initiation rates preserve free ribosome levels globally, another important goal for the cell, via evolutionary selection. The authors find, for instance, a correlation between low variability of initiation (low noise) and low initiation rate. This is a bit mystifying, since ribosomes should always be present in excess, and it is not immediately apparent why holdups to translation initiation would lend themselves to more even initiation. Perhaps the search process by which ribosomes find free mRNAs is inefficient, so that those with slower initiation sequences have a constant backlog of incoming, bound and poised ribosomes, while after they get past the initiation region, those ribosomes progress rapidly and rejoin the free pool. That would be one way of setting up a smooth production process, suitable for essential protein products, that is relatively insensitive to the free ribosome concentration and other variations in the cell.

Technologists trying to express some drug-associated protein in bacteria don't care about smoothness and noise, but just want to maximize production while not killing the cell (or before killing the cell). So all these subtle considerations that go into the evolution of the native gene complement of E. coli and its high or low expression levels don't apply. But for researchers trying to predict the expression level of a given natural gene, it is maddening, since it seems currently impossible to predict the expression level (via translation) of a gene from its sequence. It is one more case where modeling of what is going on inside cells is surprisingly difficult, even for a system we had thought we understood, in one of the simplest and most well-studied bacteria. As researchers never tire of saying ... more research is needed.


Saturday, June 8, 2024

A Membrane Transistor

Voltage sensitive domains can make switches out of ion channels, antiporters, and other enzymes.

The heart of modern electronics is the transistor. It is a valve or switch, using a small electrical signal to control the flow of other electrical signals. We have learned that the simple logic this mechanism enables can be elaborated into hugely complex, even putatively intelligent, computers, databases, applications, and other paraphernalia of modernity. The same mechanism has a very long history in biology, quite apart from its use in neurons and brains, since membranes are typically charged, well-poised to be sensitive to changes in charge for all sorts of signaling.

The voltage sensitive domain (VSD) in proteins is an ancient (going back to archaea) bundle of four alpha helices that were first found attached to voltage-sensitive ion channels, including sodium, potassium, and calcium channels. But later it became fascinatingly apparent that it can control other protein activities as well. A recent paper discussed the mechanism and structure of a sodium/hydrogen antiporter with a role in sperm navigation, which uses a VSD to control its signaling. But there are also voltage-sensitive phosphatases, and other kinds of effectors hooked up to VSD domains. 

Schematic of a basic VSD, with helix 4 in pink, moving against the other three helices colored teal. Imagine a membrane going horizontally over these embedded proteins. When voltage across the local membrane changes, (hyperpolarized or de-polarized), helix 4 can plunge by one helical repeat unit in either direction, up or down.

One of the helices (#4) in the VSD bundle has positive charges, while the others have specifically positioned negative charges. This creates a structure where changes in the ambient voltage across the membrane it sits in can cause helix #4 to plunge down by one or two steps (that is, turns of the alpha helix) versus its partners. This movement can then be propagated out along extensions of helix #4 to other domains of the protein in order to switch on or off their activities.

The helices of numerous proteins that have a VSD domain (in red) are drawn out, showing the diversity of how this domain is used.

While the studied protein, SLC9C1, is essential in mammalian sperm for motility, the paper studied its workings in sea urchin sperm, a common model system. The logic (as illustrated below) is that (female) chemoattractants bind to receptors on the sperm surface. These receptors generate cyclic GMP, which turns on potassium channels that increase the voltage across the membrane. This broadcasts the signal locally, and is received by the SLC9C1 transporter, which does two things. It activates a linked soluble adenylate cyclase enzyme, making the further signaling molecule cAMP. And it also activates the transporter itself, pumping protons out (in return 1:1 for sodium ions in) and causing cytoplasmic alkalinization. The cAMP activates sodium ion channels to cancel the high membrane voltage (a fast process), and the alkalinization activates calcium channels that direct the sperm directional swimming responses- the ultimate response. The latter is relatively slow, so the whole cascade has timing characteristics that allow the signal to be dampened, but the response to persist a bit longer as the sperm moves through a variable and stochastic gradient.

A schematic of the logic of this pathway, and of the SLC9C1 anti-porter. At top, the transport mechanism is crudely illustrated as a rocking motion that ensures that only one H+ is exchanged for one Na+ for each cycle of transport. The transport is driven thermodynamically by the higher concentration of Na+ outside.


But these researchers weren't interested in what the sperm were thinking, but rather how this widely used protein domain became hitched to this unusual protein and how it works there, turning on a sodium/hydrogen antiporter rather than the usual ion channel. They estimate that the #4 helix of the VSD moves by 10 angstroms, or 1 nm, upon voltage activation, which is a substantial movement, roughly equivalent to the width of these helices. In their final model, this movement significantly reshapes the intracellular domain of the transporter, which in turn releases its hold on the transporter's throat, allowing it to move cyclically as it needs to exchange hydrogen ions for sodium ions. This protein is known to bind and activate an adenylyl cyclase, which produces cAMP, which is one key next actor in the signaling cascade. This activation may be physically direct, or it may be through the local change in pH- that part is as yet unknown. cAMP also, incidentally, binds to and turns up the activity of this transporter, providing a bit of positive feedback.

Model of the SLC9C1 protein, with the VSD in teal and a predicted activation mechanism illustrated (only the third panel is activated/open). Upon voltage activation, the very long helix 4 dips down and changes orientation, dramatically opening the intracellular portion of the transporter (purple and orange portion). This in turn lets go of the bottom of the actual transporter portion of the protein (gray), allowing alkalinization of the cytoplasm to go forth. At the bottom sides, in brown, is the cAMP binding domain, which lowers the voltage threshold for activation.

There are a variety of interesting lessons from this work. One is that useful protein domains like VSD are often duplicated and propagated to unexpected places to regulate new processes. Another is that the new cryo-electron microscopy methods have made structural biology like this far easier and more common than it used to be, especially for membrane proteins, which are exceedingly difficult to crystallize. A third is that signaling systems in biology are shockingly complex. One would think that getting sperm cells to where they are going would take a bare minimum of complexity, yet we are studying a five or more part cascade involving two cyclic nucleotides, four ions, intricate proteins to manage them all, and who knows what else in the mix. It is difficult to account for all this, other than to say that when you have a few billion years to tinker with things, and have eons of desperate races to the egg for selective pressure, they tend to get more ornate. And a fourth is that it is regulatory switches all the way down.