Showing posts with label medicine. Show all posts

Saturday, April 26, 2025

Covid Builds a Fortress Within

 Viral proteins build peculiar vesicles to hide the viral replication apparatus.

SARS-CoV-2 is still with us, a brutal addition to the already extensive army of respiratory viruses infecting humanity. While most people clear it, we have a hard time doing so, a testament to a tough evolutionary arms race. A fair portion of our extremely complicated immune system is devoted to viruses, including basics like recognizing double-stranded RNA and viral replication structures. A trick that coronaviruses and allied species possess has gradually come to light, which is the formation of vesicular structures that appear to host their replication apparatus. 

Coronavirus-infected cells display a variety of vesicular structures, including "zippered" endoplasmic reticulum, convoluted membranes (CM), double-membrane spherules (DMS) and double-membrane vesicles (DMV). The endoplasmic reticulum (ER) is the cellular organelle where membrane proteins and secreted proteins are first made, before they are sorted out to various other membrane systems and the outside (and where the bulk of membrane lipid production happens, among much else). Coronaviruses appear to commandeer the ER and divert its membranes to the new structures. It is the DMV that turns out to have an important function- hosting viral replication. How do we know this? Researchers recently turned to a classic technique- radioactive labeling of new RNA production in infected cells, followed by electron microscopy combined with auto-radiography. The image below shows in stunning detail various organelles within an infected cell, and the black dots are film grains turned by the radioactive RNA to mark synthesis sites. They are quite closely aligned with the DMV structures.

Exquisite auto-radiograph and electron micrograph of a SARS-CoV-infected cell. The mitochondria (m) are most apparent, followed by the viral replication organelles (RO, aka DMV), followed by the endoplasmic reticulum (ER), lipid droplets (LD), nucleus (N), and virion-containing region (VCR). The black dots from photo-sensitive film exposed by radioactive RNA are clustered around the DMV structures.

This finding leads to several questions. How do these structures form? And, given the need for replication to both get inputs such as nucleotides and to export outputs like the virus's genomic RNAs, why use membranes that are impermeable to such molecules? Why use two membranes, when one suffices for most cellular organelles like the ER, lysosomes, peroxisomes, etc? This had puzzled the field for some time. Now, it turns out (in another recent paper) that a couple of powerful viral proteins answer these questions at once. Coronavirus products nsp3 and nsp4 have long been known as important for viral success, but recent work puts them at the heart of forming the DMV, which is now also called a replication organelle (RO). They are expressed in the ER and seem to play the leading role (along with several host proteins and lipids) in curving its membranes into the DMV shape. They also form dimeric pairs (nsp3 on one membrane, and nsp4 on a facing membrane) that seal two membranes together, as seen in the DMV structure. And thirdly, once fully assembled and mated, they form a pore which keeps out pretty much everything big, but lets through single-stranded RNA and small molecules.

Structure determination of the multimeric nsp3/4 pore structure from purified DMV vesicles, several views. Note the tight pore going through the center, and differential sizes of the inner and outer membrane rings. It is a protein complex that both bends the membrane and keeps only the most essential traffic going through it.

This structure is beautiful in a way. The central pore, at about 1.5 nm, is lined with positively charged amino acids like lysine and arginine, the better to conduct negatively charged RNA. The inner membrane structure is tighter than that of the outer membrane, the better to curve those membranes into the observed spherical size. While it is a little hard to believe that such DMV vesicles, even studded with such a bespoke pore, can conduct the kind of traffic, both in and out, needed to sustain high rates of viral replication, that is quite evidently how it works. These researchers made a few mutations in the newly revealed key positively charged central pore amino acids to show that, if those charges are lost, replication of the virus is "abolished". This creates an obvious drug target as well- some chemical that plugs this pore or otherwise blocks the assembly of this ornate structure.

Additionally, the assembly of all this out of flat ER was also studied. The nsp3/4 proteins are originally connected end-to-end and do a delicate dance of pulling on each other (after cleavage) to dramatically curve the membrane between them, forming a tight loop from the (future) outside DMV membrane to the (future) inside one. On the other hand, another way they can assemble (right side in diagram below) is from separated (ER) membranes, leading to the "zippered" ER conformation that is also seen in infected cells. Whether the latter can be transformed into the former remains a question. 

Models for assembly of the linked nsp3-nsp4 proteins into the curved membranes of the DMV pore, with super-curvature at the pore junction between outside and inside membranes. TM stands for transmembrane domain, NTD for N-terminal domain (front), CTD for C-terminal domain (rear), and Ecto for the ecto-domains of each protein that are not within the membrane.


It is naturally implicit in this work that, if the pores of nsp3/4 allow through the absolute essentials of viral replication, they also block the various cellular sensors of viral presence, such as the RIG and TLR proteins, thus delaying the host response. Perhaps the RNAs allowed out are modified prior to exit to make them look more host-like. All those assumptions have yet to be nailed down explicitly. At any rate, viral assembly takes place elsewhere, so it is not entirely clear yet what exactly is being hidden here.

There were some technical innovations along the way to these results. These researchers tagged the nsp proteins in a way that allowed them to easily purify DMV vesicles out of whole cells, speeding their cryo-electron microscopy work of getting these structures. Did they just use the AlphaFold program and do all this the easy way? Not at all. They did use AlphaFold to refine some of the structures, to extract more atomic detail. But they notably did not trust the AI to cook this kind of finding up from scratch. Some things still need to be done empirically, if you really want the truth.


Sunday, April 13, 2025

The Genome Remains Murky

A brilliant case study identifying the molecular cause of certain neuro-developmental disorders shows how difficult genome-based diagnoses remain.

Molecular medicine is increasingly effective in assessing both hereditary syndromes and cancers. The sequencing approach generally comes in two flavors- whole genome sequencing, or exome sequencing, where only the most important (protein-coding) parts are sampled. In each case, the hunt is for mutations (more blandly called variants) that cause the syndrome being investigated, from among the large number of variants we all carry. This approach is becoming standard-of-care in oncology, due to the tremendous influence and clinical significance of cancer-driving mutations, many of which now match directly to tailored treatments that address them (thus the "precision" in precision medicine).

But another arm of precision medicine is the hunt for causes of congenital problems. There are innumerable genetic disorders whose causal analysis can lead not only to an informative diagnosis, and sometimes to useful treatments, but also to fundamental understanding of human biology. Sufferers of these syndromes may spend a lifetime searching for a diagnosis, being shuffled from one doctor or center to another and subject to various forms of hypothetical medicine, before some deep sequencing pinpoints the cause of their disease and founds a new diagnostic category that provides, if not relief, at least understanding and a medical home. 

A recent paper from Britain provided a classic of this form, investigating the causes of neurodevelopmental disorders (NDD), which encompass a huge range of problems from mild to severe. They comment that even after the most modern analysis and intensive sequencing, 60% of NDD cases still cannot be assigned causes. A large part of the problem is that, despite knowing the full sequence of the human genome, its function is less well-understood. The protein-coding genes (20,000 of those, roughly) are delineated and studied pretty closely. But that only accounts for 1 to 2% of the genome. The rest ranges from genes for a blizzard of non-coding RNAs, some of which are critical, to large regulatory regions with smatterings of important sites, to junk of various kinds- pseudogenes, relic retroviruses, repetitive elements, etc. The importance of any of these elements (and individual DNA base positions within them) varies tremendously. This means specifically that exome sequencing is not going to cut it. Exome sequencing focuses on a very small part of the genome, which is fine if your syndrome (such as a common cancer) is well characterized and known to arise from the usual suspects. But for orphan syndromes, it does not cast a wide enough net. Secondly, even with full genome sequencing, so little is known about the remoter regions of the genome that assigning a function to variations found there is difficult to impossible. It takes statistical analysis of the incidence of the variation versus the incidence of the syndrome.
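To make that last point concrete, here is a minimal sketch of one standard way to ask whether a variant turns up in affected people more often than chance would allow: a one-sided Fisher's exact test on a 2x2 table of carriers versus non-carriers. This is not the authors' method, and the counts below are invented purely for illustration.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test on a 2x2 table.

    a = cases carrying the variant,    b = cases without it
    c = controls carrying the variant, d = controls without it
    Returns the probability of seeing at least `a` carriers among the
    cases if carrying the variant were unrelated to the syndrome.
    """
    cases = a + b
    carriers = a + c
    total = a + b + c + d
    denom = comb(total, cases)
    p = 0.0
    # Sum the hypergeometric probabilities of tables at least as extreme.
    for k in range(a, min(cases, carriers) + 1):
        p += comb(carriers, k) * comb(total - carriers, cases - k) / denom
    return p


# Hypothetical counts: 10 of 1,000 cases carry the variant,
# versus 2 of 4,000 unaffected controls.
p_value = fisher_exact_greater(10, 990, 2, 3998)
print(f"one-sided p = {p_value:.2e}")
```

A tiny p-value says only that the variant is enriched among cases. Showing that it actually causes the syndrome takes the kind of structural and functional argument that follows.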

These authors used a trove of data- the Genomics England 100,000 genomes project, focusing on the ~9,000 genomes in this collection from people with NDD syndromes. (Plus additional genomes collected elsewhere.) (We can note in passing that Britain's nationalized health system remains at the forefront of innovative research and care.) What they found was an unusually high incidence of a particular mutation in a non-protein-coding gene called RNU4-2. The product of this gene is an RNA called U4, which is an important part of the spliceosome, where it pairs RNA-to-RNA with another RNA, U6, in a key step of selecting the first (5-prime) side of an intron that is to be spliced out of mRNA messages. This gene would never have come up in exome analysis, being non-protein-coding. Yet it is critically important, as splicing happens to the vast majority of human genes. Additionally, differential splicing- the selection of alternative exons and splice sites in a regulated way- happens frequently in developmental programs and neurological cell types. There is a class of syndromes called spliceosomopathies that are caused by defects in mRNA splicing, and tend to appear as syndromes in these processes.

As shown in the images (all based on a large corpus of other work on spliceosomes), RNU4-2/U4 pairs intimately with the U6 spliceosomal RNA, and the mutation found by the current group (which is a single nucleotide insertion) causes a bulge in this pairing, as marked. Meanwhile, the U6 RNA pairs at the same time with the exon-intron junction of the target mRNA (bottom image), at a site that is very close to the U4 pairing region (top image). The upshot is that this single base insertion into U4 causes some portion of the target mRNAs to be mis-spliced, using non-natural 5 prime splice sites and thus altering their encoded proteins. This may cause minor problems in the protein, but more often will cause a shift in translation frame, a premature stop codon, and total loss of the functional protein. So this tiny mutation can have severe effects and is indeed genetically dominant- that is, one copy overrides a second wild-type copy to generate the NDD diseases that were studied.
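Why a misplaced splice site is so destructive can be seen in a toy example. The sketch below uses a made-up pre-mRNA (not a real gene) and moves the 5-prime splice site by a single base, which shifts the reading frame and runs translation into an early stop codon. The sequence, positions, and function names are all hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice(pre_mrna, donor, acceptor):
    """Join everything before the 5' site (donor) to everything from the 3' site (acceptor)."""
    return pre_mrna[:donor] + pre_mrna[acceptor:]


def codons_until_stop(mrna):
    """Read codons from position 0, stopping after the first stop codon."""
    codons = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        codons.append(codon)
        if codon in STOP_CODONS:
            break
    return codons


# Toy pre-mRNA: exon 1 + intron (starts GT, ends AG) + exon 2.
EXON1 = "ATGGCTGAA"
INTRON = "GTAAGTCCTTTTCAG"
EXON2 = "CCTGACGAAAGGTAA"
PRE_MRNA = EXON1 + INTRON + EXON2

NORMAL_DONOR = len(EXON1)          # cut exactly at the exon-intron boundary
CRYPTIC_DONOR = NORMAL_DONOR + 1   # assumed mis-selected site, one base too late
ACCEPTOR = len(EXON1) + len(INTRON)

normal = codons_until_stop(splice(PRE_MRNA, NORMAL_DONOR, ACCEPTOR))
shifted = codons_until_stop(splice(PRE_MRNA, CRYPTIC_DONOR, ACCEPTOR))

print("normal :", " ".join(normal))
print("shifted:", " ".join(shifted))
```

In the correctly spliced message the only stop codon is the final one. With the donor site off by one base, every codon downstream of the junction is read out of frame and a stop appears almost immediately, which is the "total loss of the functional protein" case described above.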

The U4 RNA (teal) paired with the U6 RNA (gray), within an early spliceosome complex. The mutation studied here is pointed out in black (n.64_65insT - i.e. insertion of a T). Note how it would cause a bulge in the pairing. Importantly, the location in the U6 RNA that pairs with the mRNA (see below) is right next door, at the ACAGAGA (light gray). The authors use this structural work from others to suggest how the mutation they found can alter selected splicing sites and thus lead to disease. Other single nucleotide insertions that cause similar syndromes are marked with black arrows, while single nucleotide substitutions that cause less severe syndromes are marked with orange RNA segments.

The U6 RNA (pink) paired with its mRNA target to be spliced. It binds right at the intron (gray) exon (black) boundary, where the cut will eventually be made to remove the intron. The bump from the mis-paired mutant U4 RNA (see above) distorts this binding, sending U6 to select wrong locations for splicing.


The researchers went on to survey this and other spliceosomal RNA genes for similar mutations, and found few to none outside the region marked in the diagram above. For example, there is a highly similar gene called RNU4-1. But this gene is expressed about 100-fold less in brain and other tissues, making RNU4-2 the principal source of U4 RNA, and much more significant as a causal factor for NDD. It appears that other locations in RNU4-2 (and other spliceosomal RNA genes) are even more important than the one mutated location found here, thus are never found, being lethal and heavily selected against, in this highly conserved gene. 

They also noted that, while this RNU4-2 mutation is severe, and thus must happen spontaneously (i.e. not inherited from parents), it only occurs on the maternal alleles, not paternal alleles in the affected children. They speculate that this may be due to effects this gene may have in male gametogenesis, killing affected sperm preferentially, but not affected oocytes. Lastly, this set of mutations (in the small region shown in the first figure above) appears to account for, in their estimation, about 0.4% of all NDD seen in Britain. This is a remarkably high rate for such a particular mutation that is not heritable. They speculate that some mutation hotspot kind of process may be causing these events, above the general mutation rate. What this all says about so-called "intelligent design", one may be reluctant to explore too deeply. On the other hand, this still leaves plenty of room to hunt for additional variations that cause these syndromes.

In this research, we see that clinically critical variations can pop up in many places, not just among the "usual suspects", genetically and genomically speaking. While much of the human genome is junk, most of it is also expressed (as RNA) and all of it is fair game for clinically important (if tragic) effects. The NDD syndromes caused by the mutation studied here are very severe- far more so than the ADD or mild autism diagnoses that make up most of the NDD spectrum. Understanding the causal nexus between the genome and human biology and its pathologies remains an ongoing and complicated scientific adventure.


  • Playing the heel. Being the heel
  • It sure is great to be the victim.
  • Oh, right.. now we really know what is going on.
  • More spiritual warfare.
  • Another grift.

Saturday, March 29, 2025

What Causes Cancer? What is Cancer?

There is some frustration in the literature.

Fifty years into the war on cancer, what have we learned and gained? We do not have a general cure, though we have a few cures and a lot of treatments. We have a lot of understanding, but no comprehensive theory or guide to practice. While some treatments are pin-point specific to certain proteins and even certain mutated forms of those proteins, most treatments remain empirical, even crude, and few provide more than a temporary respite. Cancer remains an enormous challenge, clinically and intellectually.

Recently, a prominent journal ran a provocative commentary about the origins of cancer, trashing the reigning model of "Somatic Mutation Theory", or SMT, which is the proposition that cancer is caused by mutations that "drive" cell proliferation, and thus tumor growth. I was surprised at the cavalier insinuations being thrown around by these authors, their level of trash talk, and the lack of either compelling evidence or a coherent alternative model. Some of their critiques have a fair basis, as discussed below, but to say, as the title does, that this is "The End of the Genetic Paradigm of Cancer" is simply wrong.

"It is said that the wise only believe in what they can see, and the fools only see what they can believe in. The latter attitude cements paradigms, and paradigms are amplified by any new-looking glass that puts one’s way of seeing the world on steroids. In cancer research, such a self-fulfilling prophecy has been fueled by next-generation DNA sequencing."

"However, in the quest for predictive biomarkers and molecular targets, the cancer research community has abandoned deep thinking for deep sequencing, interpreting data through the lens of clinical translation detached from fundamental biology."

Whew!

The main critique, once the gratuitous insults and obligatory references to Kuhn and Feynman are cleared away, is that cancer does not resemble other truly clonal disease / population processes, like viral or bacterial infections. In such processes, (which have become widely familiar after the COVID and HIV pandemics), a founder genotype can be identified, and its descendants clearly derive from that founder, while accumulating additional mutations that may respond to the Darwinian pressures, such as the immune system and other host defenses. While many cancers are clearly driven by some founding mutation, when treatments against that particular "driver" protein are given, resistance emerges, indicating that the cancer is a more diverse population with a very active mutation and adaptation process. 

Additionally, tumors are not just clones of the driving cell, but have complex structure and genetic variety. Part of this is due to the mutator phenotypes that arise during carcinogenesis, that blow up the genome and cause large numbers of additional mutations- many deleterious, but some carrying advantages. More significantly, tumors arise from and continue to exist in the context of organs and tissues. They cannot just grow wildly as though they were on a petri plate, but must generate, for example, vascular structures and a "microenvironment" including other cells that facilitate their life. Similarly, metastasis is highly context-dependent and selective- only very few of the cells released by a tumor land in a place they find conducive to new growth. This indicates, again, that the organ setting of cancer cells is critically important, and accounts in large part for this overall difference between cancers and more straightforward clonal processes. 

Schematic of cancer development, from a much more conventional and thorough review of the field.

Cancer cells need to work with the developmental paradigms of the organism. For instance, the notorious "EMT", or epithelial-mesenchymal transition, is a hallmark of de-differentiation of many cancer cells. They frequently regress in developmental terms to recover some of the proliferative and self-repair potential of stem cells. What developmental program is available or allowed in a particular tissue will vary tremendously. Thus cancer is not caused by each and every oncogenic mutation, and each organ has particular and distinct mutations that tend to cause cancers within it. Indeed, some organs hardly foster any cancers at all, while other organs with more active (and perhaps evolutionarily recent) patterns of proliferation (such as breast tissue, or prostate tissue) show high rates of cancer. Given the organ setting, cancer "driver" mutations need not only unleash the cell's own proliferation, but re-engineer its relations with other cells to remove their inhibition of its over-growth, and persuade them to provide the environment it needs- nutritionally, by direct contact, by growth factors, vascular formation, immune interactions, etc., in a sort of para-organ formation process. It is a complicated job, and one mutation is, empirically, rarely enough.

"Instead, cancer can be broadly understood as “development gone awry”. Within this perspective, the tissue organization field theory is based on two principles that unite phylogenesis and ontogenesis."

"The organicist perspective is based on the interdependency of the organism and its organs. It recognizes a circular causal regimen by closure of constraints that makes parts interdependent, wherein these constraints are not only molecules, but also biophysical force."

As an argument or alternative theory, this leaves quite a bit to be desired, and does not obviate the role of initiating mutations in the process.

It remains, however, that oncogenic mutations cause cancer, and treatments that address those root causes have time and again shown themselves to be effective cancer treatments, if tragically incomplete. The rise of shockingly effective immunotherapies for cancer has shown, however, that the immune system takes a more holistic approach to attacking disease than such "precision" single-target therapies, and can make up for the vagaries of the tissue environment and the inflammatory, developmental, and mutational derangements that happen later in cancer development. 

In one egregious citation, the authors hail an observation that certain cancers need both a mutation and a chemical treatment to get started, and that the order of these events is not set in stone. Traditionally, the mutation is induced first, and then the chemical treatment, which causes inflammation, comes second. They state: 

"The qualitative dichotomy between a mutagenic initiator that creates ’cancer cells’ and the non-genetic, tissue-perturbing promoter that expands them may not be as clear-cut. Indeed, the reverse experiment (first treatment with the promoter followed by the initiator) equally produces tumors. This result refutes the classical model that requires that the mutagenic (alleged) initiator must act first."

The citation is to a paper entitled "The reverse experiment in two-stage skin carcinogenesis. It cannot be genuinely performed, but when approximated, it is not innocuous". This paper dates from 1993, long before sequencing was capable of evaluating the mutation profiles of cancer cells. Additionally, the authors of this paper themselves point out (in the quote below) a significant asymmetry in the treatments. Their results are not "equal":

"The two substances showed a reciprocal enhancing effect, which was sometimes weak, sometimes additive, and sometimes even synergistic, and was statistically most significant when the results were assessed from the time of DMBA application. Although the reverse experiment was not in any way innocuous it always resulted in a lower tumor crop than the classical sequence of DMBA followed by a course of TPA treatment. 

However, the lower tumor crop in the reverse experiment cannot be used to prove a qualitative difference between initiators and promoters."

(DMBA is the mutagen, while TPA is the inflammatory accelerant.)

So chemical treatment can prepare the ground for subsequent mutant generation in forming cancers in this system, while being much less efficient than the traditional order of events. This is not a surprise, given that this chemical (TPA) treatment causes relatively long-term inflammation and cell proliferation on its own.

"An epistemic shift towards a biological theory of cancer may still be an uphill battle in the current climate of thought created by the ease of data collection and a culture of research that discourages ’disruptive science’. Here, we have made an argument for dropping the SMT and its epicycles. We presented new and old but sidelined theoretical alternatives to the SMT that embrace theory and organismal biology and can guide experiments and data interpretation. We expect that the diminishing returns from the ceaselessly growing databases of somatic mutations, the equivalent to Darwin’s gravel pit, may soon reach a pivot point."

One rarely reads such grandiloquent summaries (or mixed metaphors) in scientific papers! But here they are truly beating up on straw men. In the end, it is true that cancer is quite unlike clonal infectious diseases, and for this, as for many other reasons, has had scientists scratching their heads for decades, if not centuries. But rest assured that this chest-thumping condescension is quite unnecessary, since those in the field are quite aware of these difficulties. The various nebulous alternatives these authors offer, whether the "epigenetic landscape", the "tissue organization field theory", or the "biological theory of cancer" all have kernels of logic, but the SMT remains the foundation-stone of cancer study and treatment, while being, for all the reasons enumerated above and by these authors, only part of the edifice, not the whole truth.


Saturday, February 8, 2025

Sugar is the Enemy

Diabetes, cardiovascular health, and blood glucose monitoring.

Christmas brought a book titled "Outlive: The Science and Art of Longevity". Great, I thought- something light and quick, in the mode of Gwyneth Paltrow or Deepak Chopra. I have never been into self-help or health fad and diet books. Much to my surprise, however, it turned out to be a rather rigorous program of preventative medicine, with a side of critical commentary on our current medical system. A system that puts various thresholds, such as blood sugar and blood pressure, at levels that represent serious disease, and cares little about what led up to them. Among the many recommendations and areas of focus, blood glucose levels stand out, both for their pervasive impact on health and aging, and also because there are new technologies and science that can bring their dangers out of the shadows.

Reading: 

Where do cardiovascular problems, the biggest source of mortality, come from? Largely from metabolic problems in the control of blood sugar. Diabetics know that uncontrolled blood sugar is lethal, in both the acute and the long term. But the rest of us need to realize that the damage done by swings in blood sugar is more insidious and pervasive than commonly appreciated. Both microvascular damage (what is commonly associated with diabetes, in the form of problems with the small vessels of the kidney, legs, and eyes) and macrovascular damage (atherosclerosis) are due to high and variable blood sugar. The molecular biology of this was impressively unified in 2005 in the paper above, which argues that excess glucose clogs the mitochondrial respiration mechanisms. Their membrane voltage maxes out, reactive forms of oxygen accumulate, and glucose intermediates pile up in the cell. This leads to at least four different and very damaging consequences for the cell, including glucose modification (glycation) of miscellaneous proteins, a reduction of redox damage repair capacity, inflammation, and increased fatty acid export from adipocytes to endothelial (blood vessel) cells. Not good!

Continuous glucose monitored concentrations from three representative subjects, over one day. These exemplify the low, moderate, and severe variability classes, as defined by the Stanford group. Line segments are individually classed as to whether they fall into those same categories. There were 57 subjects in the study, of all ages, none with an existing diagnosis of diabetes. Yet five of them had diabetes by traditional criteria, and fourteen had pre-diabetes by those criteria. By this scheme, 25 had severe variability as their "glucotype", 25 had moderate variability, and only 7 had low variability. As these were otherwise random subjects selected to not have diabetes, this is not great news about our general public health, or the health system.

Additionally, a revolution has occurred in blood glucose monitoring, where anyone can now buy a relatively simple device (called a CGM) that gives continuous blood glucose monitoring to a cell phone, and associated analytical software. This means that the fasting blood glucose level that is the traditional test is obsolete. The recent paper from Stanford (and the literature it cites) suggests, indeed, that it is variability in blood glucose that is damaging to our tissues, more so than sustained high levels.
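To see why a single average or fasting number can hide the problem, consider a rough sketch with two invented glucose traces that share the same mean. The coefficient of variation used here is a simple stand-in for variability, not the classification method the Stanford group used, and the readings (in mg/dL) are made up.

```python
from statistics import mean, pstdev


def glucose_summary(readings_mg_dl):
    """Summarize a series of glucose readings (mg/dL)."""
    avg = mean(readings_mg_dl)
    sd = pstdev(readings_mg_dl)
    return {
        "mean": avg,
        "sd": sd,
        "cv_percent": 100 * sd / avg,  # spread relative to the mean
        "peak": max(readings_mg_dl),
    }


# Hypothetical traces, both averaging 100 mg/dL over the day.
steady = [95, 100, 105, 100, 95, 100, 105, 100]
spiky = [70, 160, 75, 130, 65, 150, 70, 80]

steady_stats = glucose_summary(steady)
spiky_stats = glucose_summary(spiky)

for name, stats in (("steady", steady_stats), ("spiky", spiky_stats)):
    print(f"{name}: mean={stats['mean']:.0f} "
          f"CV={stats['cv_percent']:.1f}% peak={stats['peak']}")
```

Both traces would look identical on any test that reports only an average, while the spread and peak values tell two very different stories.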

One might ask why, if blood glucose is such a damaging and important mechanism of aging, evolution hasn't developed tighter control over it. Other ions and metabolites are kept within much tighter ranges. Sodium ranges from 135 to 145 mM, and calcium from 8.8 to 10.7 mg/dL (roughly 2.2 to 2.7 mM). Well, glucose is our food, and our need for glucose internally is highly variable. Our livers are tiny brains that try very hard to predict what we need, based on our circadian rhythms, our stress levels, our activity both current and expected. It is a difficult job, especially now that stress rarely means physical activity, and nor does travel, in our automobiles. But mainly, this is a problem of old age, so evolution cares little about it. Getting a bigger spurt of energy for a stressful event when we, in our youth, are in crisis may, in the larger scheme of things, outweigh the slow decay of the cardiovascular system in old age. Not to mention that traditional diets were not very generous at all, certainly not in sugar and refined carbohydrates.


Saturday, October 26, 2024

A Hunt for Causes of Atherosclerosis

Using the most advanced tools of molecular biology to sift through the sands of the genome for a little gold.

Blood vessels have a hard life. Every time you put on shoes, the vessels in your feet get smashed and smooshed, for hours on end. And do they complain? Generally, not much. They bounce back and make do with the room you give them. All through the body, vessels are subject to the pumping of the heart, and variations in blood volume brought on by our salt balance. They have to move when we do, and deal with it whenever we sit or lie on them. Curiously, it is the veins in our legs and calves, that are least likely to be crushed in daily life, that accumulate valve problems and go varicose. Atherosclerosis is another, much more serious problem in larger vessels, also brought on by age and injury, where injury and inflammation of the lining endothelial cells can lead to thickening, lipid/cholesterol accumulation, necrosis, calcification, and then flow restriction and fragmentation risk. 

Cross-section of a sclerotic blood vessel. LP stands for lipid pool, while the box shows necrotic and calcified bits of tissue.

The best-known risk factors for atherosclerosis are lipid-related, such as lack of liver re-capture of blood lipids, or lack of uptake around the body, keeping cholesterol and other lipid levels high in the blood. But genetic studies have found hundreds of areas of the genome with risk-conferring (or risk-reducing) variants, most of which are not related to lipid management. These genome-wide association studies (or GWAS) look for correlations between genetic markers and disease in large populations. So they pick up a lot of low-impact genetic variations that are difficult to study, due to their large number and low impact, which can often imply peripheral / indirect function. High-impact variations (mutations) tend to not survive in the population very long, but when found tend to be far more directly involved and informative.

A recent paper harnessed a variety of modern tools and methods to extract more from the poor information provided by GWAS. They come up with a fascinating tradeoff / link between atherosclerosis and cerebral cavernous malformation (CCM), which is a distinct blood vessel syndrome that can also lead to rupture and death. The authors set up a program of analysis that was prodigious, and only possible with the latest tools. 

The first step was to select a cell line that could model the endothelial cells at issue. Then they loaded these cells with custom expression-reducing RNA regulators against each one of the ~1600 genes found in the neighborhood of the mutations uncovered by the GWAS analyses above, plus 600 control genes. Then they sequenced all the RNA messages from these single cells, each of which had received one of these "knock-down" RNA regulators. This involved a couple hundred thousand cells and billions of sequencing reads- no simple task! The point was to gather comprehensive data on what other genes were being affected by the genetic lesion found in the GWAS population, and then to (algorithmically) assemble them into coherent functional groups and pathways which could both identify which genes were actually being affected by the original mutations, and also connect them to the problems resulting in atherosclerosis.
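The core bookkeeping of such a knockdown screen can be sketched in a few lines: group cells by which gene was knocked down, average each gene's RNA counts, and compare to control cells. This is only a toy under stated assumptions (the gene names, counts, `control` label, and pseudocount are all hypothetical), not the authors' pipeline, which groups genes into programs with far more elaborate statistics.

```python
import math
from collections import defaultdict


def knockdown_effects(cells, control_label="control", pseudocount=1.0):
    """Log2 fold change of each gene, per knockdown target, versus controls.

    cells: list of (knockdown_target, {gene: read_count}) pairs, one per cell.
    Assumes at least one cell carries the control label.
    """
    sums = defaultdict(lambda: defaultdict(float))
    n_cells = defaultdict(int)
    genes = set()
    for target, counts in cells:
        n_cells[target] += 1
        for gene, count in counts.items():
            sums[target][gene] += count
            genes.add(gene)

    def mean(target, gene):
        return sums[target][gene] / n_cells[target]

    effects = {}
    for target in n_cells:
        if target == control_label:
            continue
        effects[target] = {
            gene: math.log2(
                (mean(target, gene) + pseudocount)
                / (mean(control_label, gene) + pseudocount)
            )
            for gene in sorted(genes)
        }
    return effects


# Toy data: knocking down hypothetical KD1 lowers GENE_A and raises GENE_B
toy_cells = [
    ("control", {"GENE_A": 7, "GENE_B": 3}),
    ("control", {"GENE_A": 7, "GENE_B": 3}),
    ("KD1", {"GENE_A": 1, "GENE_B": 7}),
    ("KD1", {"GENE_A": 1, "GENE_B": 7}),
]
print(knockdown_effects(toy_cells))
```

Multiply this by a couple of thousand knockdowns and every expressed gene, and one gets the matrix from which co-regulated programs can be assembled.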

Not to be outdone, they went on to harness the AlphaFold program to hunt for interactions among the proteins participating in some of the pathways they resolved through this vast pipeline, to confirm that the connections they found make sense.

They came up with about fifty different regulated molecular programs (or pathways), of which thirteen were endothelial cell specific. Things like angiogenesis, wound healing, flow response, cell migration, and osmoregulation came up, and are naturally of great relevance. Five of these latter programs were particularly strongly connected to coronary artery disease risk, and mostly concerned endothelial-specific programs of cell adhesion. Which makes sense, as the lack of strong adhesion contributes to injury and invasion by macrophages and other detritus from the blood, and adhesion among the endothelial cells plays a central role in their ability / desire to recover from injury, adjust to outside circumstances, reshape the vessel they are in, etc.

Genes near GWAS variations and found as regulators of other endothelial-related genes are mapped into a known pathway (a) of molecular signaling. The color code of changed expression refers to the effect that the marked gene had on other genes within the five most heavily disease-linked programs/pathways. The numbers refer to those programs, (8=angiogenesis and osmoregulation, 48=cell adhesion, 35=focal adhesion, related to cell adhesion, 39=basement membrane, related to cell polarity and adhesion, 47=angiogenesis, or growth of blood vessels). At bottom (c) is a layout of 41 regulated genes within the five disease-related programs, and how they are regulated by knockdown of the indicated genes on the X axis. Lastly, in d, some of these target genes have known effects on atherosclerosis or vascular barrier syndromes when mutated. And this appears to generally correlate with the regulatory effects of the highlighted pathway genes.

"Two regulators of this (CCM) pathway, CCM2 and TLNRD1, are each linked to a CAD (coronary artery disease) risk variant, regulate other CAD risk genes and affect atheroprotective processes in endothelial cells. ... Specifically, we show that knockdown of TLNRD1 or CCM2 mimics the effects of atheroprotective laminar blood flow, and that the poorly characterized gene TLNRD1 is a newly identified regulator in the CCM pathway."

On the other hand, excessive adhesiveness and angiogenesis can be a problem as well, as revealed by the reverse correlation they found with CCM syndrome. The interesting thing was that the gene CCM2 came up as one of the strongest regulators of the five core programs associated with atherosclerosis risk mutations. As can be guessed from its name, it can harbor mutations that lead to CCM. CCM is a relatively rare syndrome (at least compared with coronary artery disease) of localized patches of malformed vessels in the brain, which are prone to rupture, which can be lethal. CCM2 is part of a protein complex, with KRIT1 and PDCD10, and part of a known pathway from fluid flow sensing receptors to transcription regulators (TFs) that turn on genes relevant to the endothelial cells. As shown in the diagram above, this pathway is full of genes that came up in this pathway analysis, from the atherosclerosis GWAS mutations. Note that there is a repression effect in the diagram above (a) between the CCM complex and the MAP kinase cascade that sends signals downstream, accounting for the color reversal at this stage of the diagram.

Not only did they find that this known set of three CCM genes is implicated in the atherosclerosis mutation results, but one of the genes they dug up through their pipeline, TLNRD1, turned out to be a fourth, hitherto unknown, member of the CCM complex, shown via the AlphaFold program to dock very neatly with the others. It is loss-of-function mutations of genes encoding this complex, which inhibits the expression of endothelial cell pro-cell adhesion and pro-angiogenesis sets of genes, that cause CCM, unleashing these angiogenesis genes to do too much.

The logic of this pathway overall is that proper fluid flow at the cell surface, as expected in well-formed blood vessels, activates the pathway to the CCM complex, which then represses programs of new or corrective angiogenesis and cell adhesion- the tissue is OK as it is. Conversely, when turbulent flow is sensed, the CCM complex is turned down, and its target genes are turned up, activating repair, revision, and angiogenesis pathways that can presumably adjust the vessel shape to reduce turbulence, or simply strengthen it.

Under this model, malformations may occur during brain development when/where turbulent flow occurs, reducing CCM activation, which is abetted by mutations that help the CCM complex to fall apart, resulting (rarely) in run-away angiogenesis. The common variants dealt with in this paper, that decrease risk of cardiovascular disease / atherosclerosis, appear to have similar, but much weaker effects, promoting angiogenesis, including recovery from injury and adhesion between endothelial cells. In this way, they keep the endothelium tighter and more resistant to injury, invasion by macrophages, and all the downstream sequelae that result in atherosclerosis. Thus strong reduction of CCM gene function is dangerous in CCM syndrome, but more modest reductions are protective in atherosclerosis, setting up a sensitive evolutionary tradeoff that we are clearly still on the knife's edge of. I won't get into the nature of the causal mutations themselves, but they are likely to be diffuse and regulatory in the latter case.

Image of the CCM complex, which regulates response to blood flow, and whose mutations are relevant both to CCM and to atherosclerosis. The structures of TLNRD1 and the docking complex are provided by AlphaFold. 


This method is particularly powerful by being unbiased in its downstream gene and pattern finding, because it samples every expressed gene in the cell and automatically creates related pathways from this expression data, given the perturbations (knockdown of expression) of single target genes. It does not depend on using existing curated pathways and literature that would make it difficult to find new components of pathways. (Though in this case the "programs" it found align pretty closely with known pathways.) On the other hand, while these authors claim that this method is widely applicable, it is extremely arduous and costly, as evidenced by the contribution of 27 authors at top-flight institutions, an unusually large number in this field. So, for diseases and GWAS data sets that are highly significant, with plenty of funding, this may be a viable method of deeper analysis. Otherwise, it is beyond the means of a regular lab.

  • A backgrounder on sedition, treason, and insurrection.
  • And why it matters.
  • Jan 6 was an attempted putsch.
  • Trumpies for Putin.
  • Solar is a no-brainer.
  • NDAs are blatantly illegal and immoral. One would think we would value truth over lies.

Saturday, September 28, 2024

Dangerous Memories

Some memory formation involves extracellular structures, DNA damage, and immune component activation / inflammation.

The physical nature of memories in the brain is under intensive scrutiny. The leading general theory is that of positive reinforcement, where neurons that are co-activated strengthen their connections, enhancing their ability to co-fire and thus to express the same pattern again in the future. The nature of these connections has been somewhat nebulous, assumed to just be the size and stability of their synaptic touch-points. But it turns out that there is a great deal more going on.

A recent paper started with a fishing expedition, looking at changes in gene expression in neurons at various time points after mice were subjected to a fear learning regimen. They took this out to much longer time points (up to a month) than had been contemplated previously. At short times, a bunch of well-known signals and growth-oriented gene expression happened. At the longest time points, organization of a structure called the perineuronal net (PNN) was read out of the gene expression signals. This is an extracellular matrix sheath that appears to stabilize neuronal connections and play a role in long-term memory and learning.

But the real shocker came at the intermediate time point of about four days. Here, there was overexpression of TLR9, which is an immune system detector of broken / bacterial DNA, and inducer in turn of inflammatory responses. This led the authors down a long rabbit hole of investigating what kind of DNA fragmentation is activating this signal, how common this is, how influential it is for learning, and what the downstream pathways are. Apparently, neuronal excitation, particularly over-excitation that might be experienced under intense fear conditions, isn't just stressful in a semiotic sense, but is highly stressful to the participating neurons. There are signs of mitochondrial over-activity and oxidative stress, which lead to DNA breakage in the nucleus, and even nuclear perforation. It is a shocking situation for cells that need to survive for the lifetime of the animal. Granted, these are not germ cells that prioritize genomic stability above all else, but getting your DNA broken just for the purpose of signaling a stress response that feeds into memory formation? That is weird.

Some neuronal cell bodies after fear learning. The red dye is against a marker of DNA repair proteins, which form tight dots around broken DNA. The blue is a general DNA stain, and the green is against a component of the nuclear envelope, showing here that nuclear envelopes have broken in many of these cells.

The researchers found that there are classic signs of DNA breakage, which are what is turning on the TLR9 protein, such as seeing concentrated double-strand DNA repair complexes. All this stress also turned on proteases called caspases, though not the cell suicide program that these caspases typically initiate. Many of the DNA break and repair complexes were, thanks to nuclear perforation, located diffusely at the centrosome, not in the nucleus. TLR9 turns on an inflammatory response via NFKB / RELA. This is clearly a huge event for these cells, not sending them into suicide, but all the alarms short of that are going off.

The interesting part was when the researchers asked whether, by deleting the TLR9 or related genes in the pathway, they could affect learning. Yes, indeed- the fear memory was dependent on the expression of this gene in neurons, and on this cell stress pathway, which appears to be the precondition of setting up the perineuronal net structures and overall stabilization. Additionally, the DNA damage still happened, but was not properly recognized and repaired in the absence of TLR9, creating an even more dangerous situation for the affected neurons- of genomic instability amidst unrepaired DNA.

When TLR9 is knocked out, DNA repair is cancelled. At bottom are wild-type cells, and at top are mouse neurons after fear learning that have had the gene TLR9 deleted. The red dye is against DNA repair proteins, as is the blue dye in the right-most frames. The top row is devoid of these repair activities.

This paper and its antecedent literature are making the case that memory formation (at least under these somewhat traumatic conditions- whether this is true for all kinds of memory formation remains to be seen) has commandeered ancient, diverse, and quite dangerous forms of cell stress response. It is no picnic in the park with madeleines. It is an all-hands-on-deck disaster scene that puts the cell into a permanently altered trajectory, and carries a variety of long-term risks, such as cancer formation from all the DNA breakage and end-joining repair, which is not very accurate. They mention in passing that some drugs have been recently developed against TLR9, which are being used to dampen inflammatory activities in the brain. But this new work indicates that such drugs are likely double-edged swords, that could impair both learning and the long-term health of treated neurons and brains.

Saturday, August 31, 2024

Wherever Did the Pandemic Go?

Covid has attenuated. But is that from its own evolution, or from our immune reactions to it?

Looking at recent gatherings such as the political conventions and the Olympics, it is evident that the pandemic is over. A graph from the CDC says that mortality from Covid-19 is now similar to influenza- not great, but not catastrophic either, running at roughly a thousand deaths a week, and this with negligible public precautions.

Overall mortality of Covid-19 in the US.

A fundamental scientific and policy question about this is why: did the virus evolve to a less virulent state, or have we evolved (or engineered) enough immunity to fend off the worst? Even after the intense focus on this virus and all the research that has been done, this is a difficult question to answer. There has been a parade of variants, one supposedly more virulent and dangerous than the last, except that we are less affected and increasingly able to ignore them. The scientific community is evidently divided on this causal question, with no good ways to test these basic hypotheses.

I am personally very much in the viral evolution camp, believing that this virus has on its own evolved to be less virulent, even as it gained in transmissibility and ability to evade our immune systems. Surveillance of the virus shows quite high levels this summer, even while its effects are minor, overall. The logic is that this kind of virus does not gain from people shutting themselves up at home and being miserable, let alone dying. Much better for us to be surreptitiously infected and infectious, and able to go about our business, at work and play. We recall that Covid was markedly more lethal at the very outset of the pandemic, before the first set of variants developed. Other cold-type viruses seem to have followed a similar path, and the many zoonotic infections we have picked up (including this one) come from other organisms which carry these pathogens without much difficulty, doubtless after a long evolutionary standoff.

But the graph above makes a different argument, since the vaccines came online around the spring of 2021 and reached about fifty percent of the population in late 2021, followed by the dramatic drop in Covid mortality in spring of 2022. Some researchers point to the lack of attenuation of other pathogens, like HIV, tuberculosis, and smallpox, to say that the evolutionary argument does not hold water. After a pathogen has replicated and spread (in the case of Covid, in the first week of infection, roughly), it doesn't care what happens to the host- literally whether it lives or dies. They would say that it was the immunization campaign that saved us, and continued infection leading to herd immunity that has created a population increasingly resistant to Covid mortality.

Testing these hypotheses would require Covid-naive populations, which would be ideally split into two study sets, one with vaccination followed by infection, and the other infected directly. This kind of thing may happen as a natural experiment somewhere, and perhaps the closest we can come is the release of Covid restrictions in China. In late 2022/early 2023, China switched abruptly from a zero-tolerance policy of social contact and infection, to a zero-tolerance policy towards bad publicity and accurate mortality reporting, while relaxing anti-Covid restrictions. The result was a surge in death rates, to levels estimated to be higher than those elsewhere, including in the US. This argues that during the restrictive period, the virus had not significantly attenuated via its natural evolution, though then the subsequent mass infection and inoculation did eventually lead in China, as it has elsewhere, to the lower mortality rates seen around the world. 

So, despite the rapidity of viral evolution, one has to conclude that over the short term, the immune hypothesis appears superior to the viral evolution hypothesis, as an explanation of general attenuation of Covid mortality. (Robert Kennedy may disagree, of course!) The evolution of virulence is closely related to the whole lifecycle of a pathogen, especially the way it spreads, making comparisons with other pathogens hazardous. Respiratory pathogens have the opportunity to spread without damaging the host too much, and that seems, in principle, like an advantageous evolutionary path. So I would still hypothesize that over the long term, Covid will settle into a less virulent form that triggers less immune activation (the most lethal aspect of Covid infection), in favor of high transmission and co-existence with our immune systems. Other viruses seem to have followed a similar path. How it interacts with further naive populations would be dispositive, though there may not be any left at this point.


Sunday, March 31, 2024

Nominee for Most Amazing Protein: RAD51

On the repair and resurrection of DNA, which gets a lot of help from a family of proteins including RAD51, DMC1, and RecA.

Proteins do all sorts of amazing things, from composing pores that can select a single kind of ion- even just a proton- to allow across a membrane, to massive polymerizing enzymes that synthesize other proteins, DNA, and RNA. There is really no end to it. But one of the most amazing, even incredible, things that happens in a cell is the hunt for DNA homology. Even over a genome of billions of base pairs, it is possible for one DNA segment to find the single other DNA segment that matches it. This hunt is executed for several reasons. One is to line up the homologous chromosomes at meiosis, and carry out the genetic cross-overs between them (when they are lined up precisely) that help scramble our genetic lineages for optimal mix-and-matching during reproduction. Another is for DNA repair, which is best done with a good copy for reference, especially when a full double-strand break has happened. Just this week, a fascinating article showed that memories in our brains depend in some weird way on DNA breaks occurring in neurons, some of which then use the homologous repair process, including homology search, to patch things up.

The protein that facilitates this DNA homology search is deeply conserved in evolution. It is called RecA in bacteria, radA and radB in archaea, and the RAD51 family in eukaryotes. Naturally, the eukaryotic family is most closely related to the archaeal versions (RAD51 and DMC1 evolving from radA, and a series of other, and poorly understood family members, from radB). In this post, I will mostly just call them all RAD51, unless I am referring to DMC1 specifically. The name comes from genetic screens for radiation-sensitive mutants in human and other eukaryotes, since RAD51 plays a crucial role in DNA repair, as noted above. RAD51 is not a huge protein, but it is an ATPase. It binds to itself, forming linear filaments with ATP at the junction points between units. It binds to a single strand of DNA, which is going to be what does the hunting. And it binds, in a complicated way, to another double-stranded DNA, which it helps to open briefly to allow its quality as a target to be evaluated. 

This diagram describes the repair of double strand breaks (DSB) in DNA. First the ends are covered with a bunch of proteins that signal far and wide that something terrible has happened- the cell cycle has to stop.. fire engines need to be called. One of these proteins is RPA, which simply binds all over single-stranded DNA and protects it. Then the RAD51 protein comes in, displaces RPA, and begins the homology search process. The second DNA shown, in dark black, doesn't just happen, but is hunted for high and low throughout the nucleus to find the exact homolog of the broken end. When that exact match is found, the repair process can proceed, with continued DNA synthesis through the lesion, and resolution of the newly repaired double strands, either to copy up the homolog version, or exchange versions (GC, for gene conversion). 

This diagram shows how the notorious (when mutated) oncogene BRCA2 (in green) works. It binds RAD51 (in blue) and brings it, chain-gang style, to the breakpoints of DNA damage to speed up and specify repair.


There have been several structural studies by this point that clarify how RAD51 does its thing. ATP is simply required to form filaments on single-stranded DNA. When a match has been found and RAD51 is no longer needed, ATP is cleaved, and RAD51 falls off, back to reserve status. The magic starts with how RAD51 binds the single stranded DNA. One RAD51 binds for every ~3 bases in the DNA, and it binds the phosphate backbone, so that the bases are nicely exposed in front, and all stretched out, ready to hunt for matching DNA.

A series of RAD51 molecules (in this case, RecA from bacteria) bound sequentially to single-stranded DNA (red). Note the ATP homolog chemicals in yellow, positioned between each protein unit. One can see that the DNA is stretched out a bit and the bases point outwards.

A closeup view of one of the RAD51 units from above, showing how the bases of the DNA (yellow) are splayed out into the medium, ready to find their partners. They are arranged in orientations similar to how they sit in normal (B-form) DNA, further enhancing their ability to find partners.

The second, and more mysterious part of the operation is how RAD51 scans double-stranded DNA throughout the genome. It has binding sites for double-stranded DNA, away from the single-stranded DNA, and then it also has a little finger that splits open the double-stranded DNA, encouraging separation and allowing one strand to face up to the single stranded DNA that is held firmly by the RAD51 polymer. The transient search happens in eight-base increments, with tighter capture of the double-strand DNA happening when nine bases are matched, and commitment to recombination or repair happening when a match of fifteen bases is found.
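The eight / nine / fifteen base thresholds can be pictured as a simple decision rule applied at each spot the filament touches. The sketch below is a cartoon, assuming a hypothetical probe and target sequence, and it compares the probe to the strand of identical sequence (which is equivalent to pairing with its complement). It scans every offset in order, which is nothing like the diffusion-driven search in a real nucleus.

```python
def classify_contact(probe, duplex_strand, offset, sample=8, capture=9, commit=15):
    """Count contiguous matching bases starting at `offset` and classify the contact.

    Thresholds follow the text: eight bases sampled, nine for tighter capture,
    fifteen for commitment. Sequences are hypothetical illustrations.
    """
    run = 0
    for probe_base, target_base in zip(probe, duplex_strand[offset:]):
        if probe_base != target_base:
            break
        run += 1
    if run >= commit:
        return run, "commit"
    if run >= capture:
        return run, "capture"
    if run >= sample:
        return run, "sample"
    return run, "release"


def homology_search(probe, duplex_strand):
    """Return every offset where the probe reaches the commit threshold."""
    hits = []
    for offset in range(len(duplex_strand) - len(probe) + 1):
        _, state = classify_contact(probe, duplex_strand, offset)
        if state == "commit":
            hits.append(offset)
    return hits


probe = "ACGTTGCATGCCGTA"  # fifteen-base single strand held by the filament
# Toy "genome": a nine-base decoy at offset 4, the true homolog at offset 17
genome = "TTTT" + probe[:9] + "AAAA" + probe + "CCCC"
print(classify_contact(probe, genome, 4))
print(homology_search(probe, genome))
```

The decoy gets captured but never committed, which is the point of having graded thresholds: partial matches are held just long enough to be checked, then let go.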

These structures show an intermediate where a double-stranded DNA (ends in teal and lavender, and separated DNA segments in green and red) has been captured, making a twelve base match with the stable single-stranded DNA (brown). Note how the double-stranded DNA ends are held by outside portions of the RAD51 protein. Closeup on the right shows the dangling, non-paired DNA strand in red, and the newly matched duplex DNA with green-brown colored base interactions.

These structures can only give a hint of what is going on, since the whole process relies so clearly on the brownian motion that allows super-rapid diffusion of the stabilized single-strand DNA+RAD51 over the genome, which it scans efficiently in one-dimensional fashion, despite all the chromatin and other proteins parked all over the place. And while the structures provide insight into how the process happens, it remains incredible that this search can happen, on what is clearly a quite reliable basis, day in and day out, as our genomes get hit by whatever the environment throws at us.

"Unfortunately, most RAD51 and RAD51 paralog point mutations that have been clinically identified are classified as variants of unknown significance (VUSs). Future studies to reclassify these RAD51 gene family VUSs as pathogenic or benign are desperately needed, as many of these genes are now included on hereditary breast and ovarian cancer screening panels. Reclassification of HR-deficient VUSs would enable these patients to benefit from therapies that specifically target HR deficiency, as do poly(ADP)-ribose polymerase (PARP) inhibitors in BRCA1/2-deficient cells."

Lastly, one paper made the point that clinicians need better understanding of the various mutations that can affect RAD51 itself. Genetic testing now is able to find all of our mutations, but we don't always know what each mutation is capable of doing. Thus deeper studies of RAD51 will have beneficial effects on clinical diagnosis, when particular mutations can be assigned as disease-causing, thus justifying specific therapies that would otherwise not be attempted.


Saturday, March 9, 2024

Getting Cancer Cells to Shoot Themselves

New chemicals that make novel linkages among cellular components can be powerful drugs.

One theme that has become common in molecular biology over the years is the prevalence of proteins whose only job is to bring other proteins together. Many proteins lack any of the usual jazzy functions, like catalytic enzyme, or ion channel, or signaling kinase, but just serve as "conveners", bringing other proteins together. Typically they are regulated in some way, by phosphorylation, expression, or localization, and some of these proteins serve as key "scaffolds" for the activation of some process, like G-protein activation, or cell cycle control, or cell growth. 

Well, the drug industry has caught on, and is starting to think about chemicals that can do similar things, resulting in occasionally powerful results. Conventional drug design has aimed to bind to whatever protein is responsible for some ill, and inhibit it. Such as an oncogene, or an over-active component of the immune system. This has led to many great drugs, but has significant limitations. The chemical has to bind not just anywhere on the target, but at the particular spot (the active site) that is its business end, where its action happens. And it has to bind really well, since binding and inhibiting only half the target proteins in a cell (or the body) will typically only have a modest effect. These requirements are quite stringent and result in many protein targets being deemed difficult to drug, or "undruggable".

A paradigm for a new kind of chemical drug, which links two functions, is the PROTAC class, which combines binding with a target on one end, with another end that binds to the cell's protein destruction machinery, thereby not just inhibiting the target, but destroying it. A new paper describes an even more nuclear option along this line of drug development, linking an oncogene with a second part that activates the cellular suicide machinery. One can imagine that this approach can have far more dramatic effects.

These researchers synthesize and demonstrate a chemical that binds on one end the oncogene BCL6, mutations of which can cause B cell lymphoma. This gene is a transcription repressor, and orchestrates the development of particular immunologic T cells called T follicular helper cells. One of its roles is to prevent the suicide of these cells when an antigen is present, which is when the cells are most needed. If over-expressed in cancer, these cells think they really need to protect the body and proliferate wildly.

The other end of this chemical, called TCIP1, binds to BRD4, which is another transcription regulator, but this one activates the cell suicide genes, instead of turning them off. Both ends of this molecule were based on previously known structures. The innovation was solely in linking them together. I should say parenthetically that BRD4 is itself recognized as an oncogene, as it can promote cell growth and prevent cell suicide in many settings. So it has ambivalent roles, (inviting a lot of vague writing), and it is somewhat curious that these researchers focused on BRD4 as an apoptosis driver.

"TCIP1 kills diffuse large B cell lymphoma cell lines, including chemotherapy-resistant, TP53-mutant lines, at EC50 of 1–10 nM in 72 h" 
Here EC50 means the effective concentration where the effect is 50% of maximal. A value in the range of 1–10 nanomolar is a very low concentration for a drug, meaning it is highly potent. TP53 mutations are another cancer driver, common in treatment-resistant cancers. The drug has a characteristic and curious dosage behavior, as its effect decreases at higher concentrations. This is because each individual end of the molecule starts to bind and saturate targets independently, reducing the rate of linkage between the two target proteins, and thus the intended effect.
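This bell-shaped dose response falls out of a very simple equilibrium model. Assuming the two ends bind independently (no cooperativity) and the drug is in excess over its targets, the amount of the three-part complex scales as L / ((Ka + L)(Kb + L)), where L is the drug concentration and Ka, Kb are the dissociation constants of the two ends. The sketch below uses made-up constants of 10 nM for both ends; they are not measurements for TCIP1.

```python
import math


def ternary_complex_signal(drug_nM, kd_a_nM=10.0, kd_b_nM=10.0):
    """Relative amount of target A - drug - target B complex (arbitrary units).

    Simplified non-cooperative model with drug in excess over both targets.
    The default dissociation constants are hypothetical.
    """
    return drug_nM / ((kd_a_nM + drug_nM) * (kd_b_nM + drug_nM))


def relative_effect(drug_nM, kd_a_nM=10.0, kd_b_nM=10.0):
    """Signal normalized to its maximum, which sits at sqrt(Ka * Kb) in this model."""
    peak_dose = math.sqrt(kd_a_nM * kd_b_nM)
    peak = ternary_complex_signal(peak_dose, kd_a_nM, kd_b_nM)
    return ternary_complex_signal(drug_nM, kd_a_nM, kd_b_nM) / peak


for dose in (0.1, 1, 10, 100, 1000, 10000):
    print(f"{dose:>8} nM  relative effect = {relative_effect(dose):.3f}")
```

In this toy model the effect rises up to the geometric mean of the two binding constants and then falls off symmetrically on a log scale, as each end increasingly finds its own target occupied by a different drug molecule.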

Chemical structure of TCIP1. The left side binds to BRD4, a regulator of cell suicide, while the right side binds to BCL6, an oncogene.

The authors did numerous controls with related chemicals, and tracked genes that were targeted by the novel chemical, all to show that the dramatic effects they were seeing were specifically caused by the linkage of the two chemical functions. Indeed, BCL6 represses its own transcription in the natural course of affairs, and the new drug reverses this behavior as well, inducing more of its own synthesis, which now potentiates the drug's lethal effect. While the authors did not show effectiveness in animals, they did show that TCIP1 is not toxic in mice. Neither did they show that TCIP1 is orally available, but administered it by injection. But even by this mode, it would, if effective, be a very exciting therapy. Not surprisingly, the authors report a long series of biotech industry ties (rooted at Stanford) and indicate that this technology is under license for drug development.

This approach is highly promising, and a significant advance in the field. It should allow increased flexibility in targeting all kinds of proteins that may or may not cause disease, but are specific to or over-expressed in disease states, in order to address those diseases. It will allow increased flexibility in targeting apoptosis (cell suicide) pathways through numerous entry points, to have the same ultimate (and highly effective) therapeutic endpoint. It allows drugs to work at low concentrations, not needing to fully occupy or inhibit their targets. Many possible areas of therapy can be envisioned, but one is aging. By targeting and killing senescent cells, which are notorious for promoting aging, significant increases in lifespan and health are conceivable.


  • Biden is doing an excellent job.
  • Annals of mental decline.
  • Maybe it is an anti-addiction drug.
  • One gene that really did the trick.
  • A winning issue.
  • It is hard to say yet whether nuclear power is a climate solution, or an expensive distraction.

Saturday, February 17, 2024

A New Form of Life is Discovered

An extremely short RNA is infectious and prevalent in the human microbiome.

While the last century might be called the DNA century, at least for molecular biology, the current century might be called that of RNA. A blizzard of new RNA types and potentials has been discovered in the normal eukaryotic milieu, including miRNA, eRNA, and lincRNA. An RNA virus caused a pandemic, which was remedied by an RNA vaccine. Nobel prizes have been handed out in these fields, and we are also increasingly aware that RNA lies at the origin of life itself, as the first genetic and catalytic mechanism.

One of these Nobel prize winners recently undertook a hunt for small RNAs that might be lurking in the human microbiome- the soup of bacteria, fungi, and all the combined products that cover our surfaces, inside and out. What they found was astonishing- an RNA of merely 1164 nucleotides, which folds up into a rigid, linear rod, a type of entity they call an "obelisk". This is not a product of the host genome, nor of any other known organism, but is rather some kind of extremely minimal pathogen that, like a transposon or self-splicing intron, is entirely nucleic-acid based. And the more they hunted, the more they found, ultimately finding thousands of obelisk-like entities hidden in the many databases of the world drawn from various environmental and microbiome samples. There is some precedent for this kind of structure, in the form of hepatitis D. This "viroid" of only 1682 nucleotides is a parasite of hepatitis B virus, depending on that virus for key replication functions. While normal viruses (like hepatitis B) encode many key functions of their own, like envelope proteins, genome packaging proteins, and replication enzymes, viroids tend to not encode anything, though hepatitis D does encode one antigenic protein, which exacerbates hepatitis B infections.

The obelisk RNA viroid-like species appear to encode one or two proteins, and possibly a ribozyme as well. The functions of all these are as yet unknown, but necessarily the RNAs rely entirely on the functions of some (currently unknown) host cell to do their thing, such as an RNA polymerase to create copies of themselves. Unknown also is whether they are dependent on other viruses, or only on cells for their propagation. Since the obelisks have only just been discovered, the researchers can do a great deal of bioinformatics, such as predicting the structure of the encoded protein, and the structure of the RNA genome. But the key biology, like how they interact with host cells, what functions the host provides, and how they replicate, not to mention possible pathogenic consequences, remains unknown.

The highly self-complementary structure of one obelisk RNA sequence, leading to its identification and naming. In green is one reading frame, which codes for the main protein, of unknown function.

The curious thing about these new obelisk viroid-like RNAs is that, while common in human microbiomes, both oral and gut-derived, they are found in only 5-10% of samples. This suggests, tentatively, that they might account for some of the variability traceable to microbiomes, such as autoimmune issues, chronic ailments, nutritional variations, even effects on mood, etc.

Once a lot of databases were searched, obelisk RNAs turned up everywhere, even in some bacteria.

This work was done entirely in silico. Not a single wet-lab experiment was performed. It is a testament to the power of having a lot of genomes at our disposal, and of modern computational firepower. This lab just had the idea that novel small viroid-like RNAs might exhibit certain types of (circular, self-complementary) structure, which led to this discovery of a novel form of "life". Are these RNAs alive? Certainly not. They are mere molecules and parasites that feed off, and transport themselves between, more fully functional cells. But they are part of the tapestry of life, which itself is wholly molecular, with many amazing emergent properties. Whether or not these obelisks turn out to have any medical or ecological significance, they are one more example of the lengths (and shorts) to which Darwinian selection has gone in the struggle for existence.