Showing posts with label cancer. Show all posts

Saturday, November 22, 2025

Ground Truth for Genetic Mutations

Saturation mutagenesis shows that our estimates of the functional effect of uncharacterized mutations are not so great.

Human genomes can now be sequenced for less than $1,000. This technological revolution has enabled a large expansion of genetic testing, used for cancer tissue diagnosis and tracking, and for genetic syndrome analysis both of embryos before birth and affected people after birth. But just because a base among the 3 billion of the genome is different from the "reference" genome, that does not mean it is bad. Judging whether a variant (the modern, more neutral term for mutation) is bad takes a lot of educated guesswork.

A recent paper described a deep dive into one gene, where the authors created and characterized the functional consequence of every possible coding variant. Then they evaluated how well our current rules of thumb and prediction programs for variant analysis compare with what they found. It was a mediocre performance. The gene is CDKN2A, one of our more curious oddities. This is an important tumor suppressor gene that inhibits cell cycle progression and promotes DNA repair- it is often mutated in cancers. But it encodes not one, but two entirely different proteins, by virtue of a complex mRNA splicing pattern that uses distinct exons in some coding portions, and parts of one sequence in two different frames, to encode these two proteins, called p16 and p14. 

One gene, two proteins. CDKN2A has a splicing pattern (mRNA exons shown as boxes at top, with pink segments leading to the p14 product, and the blue segments leading to the p16 product) that generates two entirely different proteins from one gene. Each product has tumor suppressing effects, though via distinct mechanisms.

Regardless of the complex splicing and protein coding characteristics, the authors generated all possible variants at every coded amino acid position (156 amino acids in all, as both produced proteins are relatively short). Since the primary roles of these proteins are in cell cycle and proliferation control, it was possible to assay function by their effect when expressed in cultured pancreatic cells. A deleterious effect on the protein was revealed as, paradoxically, increased growth of these cells. They found that about 600 of the 3,000 different variants in their catalog had such an effect, or 20%.
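As a quick check on these figures: if each of 156 positions can be swapped to any of the other 19 standard amino acids, that gives 156 × 19 = 2,964 substitutions, which matches the 2,964 variants cited in the predictor comparison below, and 600 deleterious hits out of that set is about 20%. This minimal Python sketch assumes only single-residue (missense) changes are counted, leaving out stop codons and synonymous changes.

```python
# Count of single-amino-acid (missense) substitutions in a protein,
# assuming each position can change to any of the other 19 standard
# amino acids (stop codons and synonymous changes are not counted).
STANDARD_AMINO_ACIDS = 20


def missense_variant_count(protein_length: int) -> int:
    """Number of possible single-residue substitutions for a protein."""
    return protein_length * (STANDARD_AMINO_ACIDS - 1)


total_variants = missense_variant_count(156)   # 156 positions x 19 alternatives
deleterious_fraction = 600 / total_variants    # "about 600" deleterious hits

print(total_variants)                  # 2964
print(round(deleterious_fraction, 2))  # 0.2
```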

This is an expected rate of effect, on the whole. Most positions in proteins are not that important, and can be substituted by several similar amino acids. For a typical enzyme, for instance, the active site may be made up of a few amino acids in a particular orientation, and the rest of the protein is there to fold into the required shape to form that active site. Similar folding can be facilitated by numerous amino acids at most positions, as has been richly documented in evolutionary studies of closely-related proteins. These p16 and p14 proteins interact with a few partners, so they need to maintain those key interfacial surfaces to be fully functional. Additionally, the assay these researchers ran, of a few generations of growth, is far less sensitive than a long-term true evolutionary setting, which can sift out very small effects on a protein, so they were setting a relatively high bar for seeing a deleterious effect. They did a selective replication of their own study, and found a reproducibility rate of about 80%, which is not great, frankly.

"Of variants identified in patients with cancer and previously reported to be functionally deleterious in published literature and/or reported in ClinVar as pathogenic or likely pathogenic (benchmark pathogenic variants), 27 of 32 (84.4%) were functionally deleterious in our assay"

"Of 156 synonymous variants and six missense variants previously reported to be functionally neutral in published literature and/or reported in ClinVar as benign or likely benign (benchmark benign variants), all were characterized as functionally neutral in our assay "

"Of 31 VUSs previously reported to be functionally deleterious, 28 (90.3%) were functionally deleterious and 3 (9.7%) were of indeterminate function in our assay."

"Similarly, of 18 VUSs previously reported to be functionally neutral, 16 (88.9%) were functionally neutral and 2 (11.1%) were of indeterminate function in our assay"

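These quoted numbers amount to the assay's sensitivity and specificity against the benchmark sets, if we read agreement on known-pathogenic variants as sensitivity and agreement on known-benign variants (156 synonymous plus 6 missense, all called neutral) as specificity. That framing is an interpretation, not the authors' wording. A short Python sketch recomputing the quoted percentages:

```python
def percent(count: int, total: int) -> float:
    """Percentage rounded to one decimal place, as in the quoted figures."""
    return round(100 * count / total, 1)


# (agreeing calls, benchmark size), taken from the quotes above
pathogenic_hits = (27, 32)        # known pathogenic, called deleterious
benign_hits = (156 + 6, 156 + 6)  # known benign (synonymous + missense), all called neutral
vus_deleterious = (28, 31)        # VUSs previously reported deleterious
vus_neutral = (16, 18)            # VUSs previously reported neutral

sensitivity = percent(*pathogenic_hits)
specificity = percent(*benign_hits)
print(sensitivity, specificity)                          # 84.4 100.0
print(percent(*vus_deleterious), percent(31 - 28, 31))   # 90.3 9.7
print(percent(*vus_neutral), percent(18 - 16, 18))       # 88.9 11.1
```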
Here we get to the key issues. Variants are generally classified as benign, pathogenic/deleterious, or "variant of unknown/uncertain significance" (VUS). The latter are particularly vexing to clinical geneticists. The whole point of sequencing a patient's tumor or genomic DNA is to find causal variants that can illuminate their condition, and possibly direct treatment. Seeing lots of "VUS" in the report leaves everyone in the dark. The authors pulled in all the common prediction programs that are officially sanctioned by the ACMG (American College of Medical Genetics), the foremost guideline-setting body in clinical genetics, including for the functional prediction of otherwise uncharacterized sequence variants. There are seven such programs, including one driven by AI, AlphaMissense, which is related to the Nobel prize-winning AlphaFold.

These programs strain to classify uncharacterized mutations as "likely pathogenic", "likely benign", or, if unable to make a conclusion, VUS/indeterminate. They rely on many kinds of data, like amino acid similarity, protein structure, evolutionary conservation, and known effects in proteins of related structure. They can be extensively validated against known mutations, and against new experimental work as it comes out, so we have a pretty good idea of how they perform. Thus they are trusted to some extent to provide clinical judgements, in the absence of better data. 

Each of seven programs (on bottom) gives estimations of variant effect over the same pool of mutations generated in this paper. This was a weird way to present simple data, but each bar contains the functional results the authors developed in their own data (numbers at the bottom, in parentheses, vertical). The bars were then colored with the rate of deleterious (black) vs benign (white) prediction from the program. The ideal case would be total black for the first bar in each set of three (deleterious) and total white in the third bar in each set (benign). The overall lineup/accuracy of all program predictions vs the author data was then overlaid by a red bar (right axis). The PrimateAI program was specially derived from comparison of homologous genes from primates only, yielding a high-quality dataset about the importance of each coded amino acid. However, it only gave estimates for 906 out of the whole set of 2964 variants. On the other hand, cruder programs like PolyPhen-2 gave less than 40% accuracy, which is quite disappointing for clinical use.
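To make the coverage/accuracy distinction concrete, here is a minimal Python sketch. It assumes (an assumption, not the paper's stated method) that accuracy is computed only over variants a program actually scored, with unscored variants counting against coverage instead. Under that assumption, a program like PrimateAI that scores only 906 of 2,964 variants (about 31%) may post a high accuracy on a subset that is not representative of the whole. The call lists below are hypothetical toy data.

```python
from typing import Optional, Sequence


def coverage(predicted: Sequence[Optional[str]]) -> float:
    """Fraction of variants for which the program made any call."""
    scored = [p for p in predicted if p is not None]
    return len(scored) / len(predicted)


def accuracy(assay: Sequence[str], predicted: Sequence[Optional[str]]) -> float:
    """Fraction of scored variants whose predicted call matches the assay call."""
    pairs = [(a, p) for a, p in zip(assay, predicted) if p is not None]
    correct = sum(1 for a, p in pairs if a == p)
    return correct / len(pairs)


# Hypothetical toy data: "D" = deleterious, "N" = neutral, None = not scored.
assay_calls = ["D", "N", "N", "D", "N", "N"]
program_calls = ["D", "D", "N", None, "N", "D"]

print(round(coverage(program_calls), 3))     # 0.833
print(accuracy(assay_calls, program_calls))  # 0.6
print(round(906 / 2964, 3))                  # 0.306 (PrimateAI's coverage)
```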

As shown above, the algorithms gave highly variable results, from under 40% accurate to over 80%. It is pretty clear that some of the lesser programs should be phased out. Of programs that fielded all the variants, the best were AlphaMissense and VEST, which each achieved about 70% accuracy. This is still not great. The issue is that, if a whole genome sequence is run for a patient with an obscure disease or syndrome, and variants vs the reference sequence are seen in several hundred genes, then a gene like CDKN2A could easily be pulled into the list of pathogenic (and possibly causal) variants, or be left out, on very shaky evidence. That is why even small increments in accuracy are critically important in this field. Genetic testing is a classic needle-in-a-haystack problem- a quest to find the one mutation (out of millions) that is driving a patient's cancer, or a child's inherited syndrome.
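A back-of-envelope way to see why a few points of accuracy matter: if a report contains some hundreds of candidate variants and each call is independently right with probability equal to the program's accuracy (a simplifying assumption), the expected number of wrong calls is N × (1 - accuracy). The 300-variant figure below is hypothetical, standing in for "several hundred"; the accuracy levels are the rough ones quoted above.

```python
def expected_misclassified(n_variants: int, accuracy: float) -> float:
    """Expected number of wrong calls, assuming each call is independently
    correct with the given probability (a simplifying assumption)."""
    return n_variants * (1.0 - accuracy)


N_CANDIDATES = 300  # hypothetical "several hundred" variants in one report

for acc in (0.4, 0.7, 0.8):
    print(acc, round(expected_misclassified(N_CANDIDATES, acc)))
# 0.4 180
# 0.7 90
# 0.8 60
```

Each percentage point of accuracy gained, at this hypothetical scale, removes about three wrong calls from the report.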

Still outstanding is the issue of non-coding variants. Genes are affected not just by mutations in their protein coding regions (indeed many important genes do not code for proteins at all), but also by mutations in regulatory regions, both nearby and distant. This is a huge area of mutation effects that is not really algorithmically accessible yet. As a prediction problem, it is far more difficult than predicting effects on a coded protein. It will require modeling of the entire gene expression apparatus, much of which remains shrouded in mystery.


Sunday, May 18, 2025

Histones Require a Towtruck With a Winch

Motorized remodelers adjust and open up chromatin for gene expression.

Wouldn't it be nice if, on a stop-and-go congested highway, you could just plow through all the obstructing cars and go where you want? That is what polymerases get to do on our DNA, once they are set in motion. They plow right through chromatin, histones, DNA-binding transcription regulators, and everything else in their way. But getting them to that point is a different matter. Origins of replication need to be carefully cleared and set up. Promoters of genes need to be activated by the convergence of enhancer-binding proteins, promoter-binding proteins, and mediators of various kinds to get that RNA polymerase set on its path. And that is not so easy in a chromatin environment where the DNA is almost all covered by something, principally histones that wind up our DNA in tight little 146 base pair loops.

A basic schematic of nucleosome cores (yellow), composed of histone proteins, and how they wind up DNA and pack with each other.

So a class of "chromatin remodelers" has evolved that move histones around and exchange them in a way that facilitates transcription. It became apparent a couple of decades ago that regions of active transcription have altered histone composition, H2A.z/H3.3 instead of the regular H2A and H3 histones. These histones are looser, allowing regulatory proteins better access to the DNA as well as easing the passage of the polymerases. But how do they get there? It has also gradually become apparent that regulatory proteins come in different types, with some "pioneer" regulators able to bind in the midst of packed chromatin. These in turn recruit additional regulators, including enzymes that loosen up histones by chemically altering them with methyl, acetyl, and other modifications, and remodeling enzymes that push histones around, revealing DNA where other regulators can bind, and popping out conventional histones for more weakly-binding ones.

A few recent papers revealed the structures of several of these remodeling enzymes, and compared them between yeast cells and human cells. There are a variety of these machines, which specialize in things like nudging nucleosomes into regular spacing, evicting or moving nucleosomes from particular regions (such as near transcription regulators), or exchanging histones within nucleosomes in active regions. It is the latter that is being studied here. In yeast, this SWI/SNF family remodeler comes in two parts: NuA4, which is a histone acetylase, and SWR1, which uses ATP to winch out H2A and replace it with H2A.z. These protein complexes cooperate with each other and have related effects in opening local chromatin to be more transcription competent. In humans, these complexes are combined into one super-complex, TIP60-C, which weighs in at 1.8 million daltons, a dalton being roughly the mass of one hydrogen atom. One can appreciate here, as in so many other details of biology, the nature of evolution- the tension between conservation and change.

Chromatin remodelers from yeast (top) and human (bottom). SWI/SNF is on the left, and RSC is on the right. Both of these remodelers have wide capabilities of moving or replacing nucleosomes. DNA is in orange, and the histone is at top, within the DNA coils. See text for further description.

At top are the structures of two yeast remodelers, SWI/SNF and RSC. At bottom are structures of the corresponding human remodelers, BAF and PBAF. One can appreciate how similar they are in overview, while being very different in detail. All of these remodelers function in detailed transcriptional control in collaboration with other regulators. The orange structure is the DNA wrapped around a nucleosome, while the large blobs at the bottom of these structures are relatively loose regions where they interact with other transcription regulators, which recruit them to the proper locations. The ATP-driven motor is shown at the top in green, and grabs tightly, with two RecA domains (that is to say, with two hands) to bits of the DNA circling the nucleosome. Given that the whole apparatus is anchored to the histone and other nearby structures, this enables the motor to pull on the DNA. It is pretty slow, moving only 1 or 2 base pairs per ATP used, but with enough copies (of these motors) and time, great things can be done. The structure in (b) indicates where the DNA enters (top) and where it gets pulled towards (bottom) as the motor works. This action can nudge the histone to some new location, relative to the DNA. Alternately, with other forms of anchoring, it can also pop the histone entirely off the DNA, and, since this large protein complex can bind alternative histone complexes, can bring in a new histone for exchange.  
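The step size above puts a rough floor on the energy bill. Assuming a single motor advancing a fixed 1 or 2 base pairs per ATP, with no slippage or futile cycles (real motors are messier), pulling one nucleosome's worth of DNA (146 bp) past the motor takes on the order of 73 to 146 ATP:

```python
import math


def atp_needed(distance_bp: int, bp_per_atp: float) -> int:
    """ATP molecules hydrolyzed to pull DNA a given distance past the motor,
    assuming a fixed step per ATP and no slippage (a simplification)."""
    return math.ceil(distance_bp / bp_per_atp)


NUCLEOSOME_DNA_BP = 146  # DNA wrapped per nucleosome core, as given above

for step in (1, 2):
    print(step, atp_needed(NUCLEOSOME_DNA_BP, step))
# 1 146
# 2 73
```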

  • Downloadable animation of nucleosome movement (source). 
  • Downloadable animation of overall assembly and nucleosome spacing activity (source).

So in short, what we have here is a DNA winch, which in various configurations can adjust, evict, or exchange nucleosomes, as directed by various signals. One signal is the sequence of the DNA itself. Another is the standard spacing between nucleosomes, which is set by some of these motors that have a suitably sized extension reaching out to touch the next histone, and which establishes the default nucleosome pattern genome-wide. But more significant are the various pioneer regulators and histone modifiers that direct these motors to specific areas to reshape the local chromatin to control gene expression. Here is where the regulatory action is, and these proteins have been found to be determinants of cancer progression, and indeed are targets for some investigational anti-cancer drugs.


    • Kunzru on the community of "independent" researchers, aka wingnuts.
    • The nexus of lying, social media, and politics was even worse in Brazil.
    • New NIH director believes in the lab leak theory, and in US responsibility for it, despite the best evidence showing something different.

    Sunday, April 13, 2025

    The Genome Remains Murky

    A brilliant case study identifying the molecular cause of certain neuro-developmental disorders shows how difficult genome-based diagnoses remain.

    Molecular medicine is increasingly effective in assessing both hereditary syndromes and cancers. The sequencing approach generally comes in two flavors- whole genome sequencing, or exome sequencing, where only the most important (protein-coding) parts are sampled. In each case, the hunt is for mutations (more blandly called variants) that cause the syndrome being investigated, from among the large number of variants we all carry. This approach is becoming standard-of-care in oncology, due to the tremendous influence and clinical significance of cancer-driving mutations, many of which now match directly to tailored treatments that address them (thus the "precision" in precision medicine).

    But another arm of precision medicine is the hunt for causes of congenital problems. There are innumerable genetic disorders whose causal analysis can lead not only to an informative diagnosis, and sometimes to useful treatments, but also to fundamental understanding of human biology. Sufferers of these syndromes may spend a lifetime searching for a diagnosis, being shuffled from one doctor or center to another and subject to various forms of hypothetical medicine, before some deep sequencing pinpoints the cause of their disease and founds a new diagnostic category that provides, if not relief, at least understanding and a medical home. 

    A recent paper from Britain provided a classic of this form, investigating the causes of neurodevelopmental disorders (NDDs), which encompass a huge range of problems from mild to severe. They comment that even after the most modern analysis and intensive sequencing, 60% of NDD cases still cannot be assigned causes. A large part of the problem is that, despite knowing the full sequence of the human genome, we understand its function far less well. The protein-coding genes (20,000 of those, roughly) are delineated and studied pretty closely. But they only account for 1 to 2% of the genome. The rest ranges from genes for a blizzard of non-coding RNAs, some of which are critical, to large regulatory regions with smatterings of important sites, to junk of various kinds- pseudogenes, relic retroviruses, repetitive elements, etc. The importance of any of these elements (and individual DNA base positions within them) varies tremendously. This means, first, that exome sequencing is not going to cut it. Exome sequencing focuses on a very small part of the genome, which is fine if your syndrome (such as a common cancer) is well characterized and known to arise from the usual suspects. But for orphan syndromes, it does not cast a wide enough net. Second, even with full genome sequencing, so little is known about the remoter regions of the genome that assigning a function to variations found there is difficult to impossible. It takes a statistical comparison of how often a variant turns up in people with the syndrome versus people without it.
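The statistical comparison mentioned above can be as simple as a 2×2 table: carriers and non-carriers of a variant, in affected cases versus unaffected controls. A minimal sketch of a one-sided Fisher exact test in Python follows; the counts in the usage lines are invented for illustration and are not from the study, which may well have used different methods.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact test p-value for enrichment in cases.

    2x2 table:          carrier   non-carrier
        cases              a           b
        controls           c           d

    Returns P(at least `a` carriers among cases) under the null hypothesis
    that carriers are distributed between cases and controls at random.
    """
    n_cases = a + b
    n_controls = c + d
    n_carriers = a + c
    denom = comb(n_cases + n_controls, n_carriers)
    p = 0.0
    for x in range(a, min(n_cases, n_carriers) + 1):
        # hypergeometric probability of exactly x carriers among the cases
        p += comb(n_cases, x) * comb(n_controls, n_carriers - x) / denom
    return p


# Hypothetical counts, NOT from the paper: 30 carriers among 9,000 cases,
# 2 carriers among 40,000 controls.
p_enriched = fisher_one_sided(30, 8970, 2, 39998)
print(p_enriched < 1e-10)  # True
```

With SciPy available, `scipy.stats.fisher_exact([[a, b], [c, d]], alternative="greater")` computes the same one-sided test; the hand-rolled version just makes the hypergeometric sum visible.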

    These authors used a trove of data- the Genomics England 100,000 genomes project, focusing on the ~9,000 genomes in this collection from people with NDD syndromes. (Plus additional genomes collected elsewhere.) (We can note in passing that Britain's nationalized health system remains at the forefront of innovative research and care.) What they found was an unusually high incidence of a particular mutation in a non-protein-coding gene called RNU4-2. The product of this gene is an RNA called U4, which is an important part of the spliceosome, where it pairs RNA-to-RNA with another RNA, U6, in a key step of selecting the first (5-prime) side of an intron that is to be spliced out of mRNA messages. This gene would never have come up in exome analysis, being non-protein-coding. Yet it is critically important, as splicing happens to the vast majority of human genes. Additionally, differential splicing- the selection of alternative exons and splice sites in a regulated way- happens frequently in developmental programs and neurological cell types. There is a class of syndromes called spliceosomopathies that are caused by defects in mRNA splicing, and they tend to show up as disorders of exactly these developmental and neurological processes.

    As shown in the images (all based on a large corpus of other work on spliceosomes), RNU4-2/U4 pairs intimately with the U6 spliceosomal RNA, and the mutation found by the current group (which is a single nucleotide insertion) causes a bulge in this pairing, as marked. Meanwhile, the U6 RNA pairs at the same time with the exon-intron junction of the target mRNA (bottom image), at a site that is very close to the U4 pairing region (top image). The upshot is that this single base insertion into U4 causes some portion of the target mRNAs to be mis-spliced, using non-natural 5 prime splice sites and thus altering their encoded proteins. This may cause minor problems in the protein, but more often will cause a shift in translation frame, a premature stop codon, and total loss of the functional protein. So this tiny mutation can have severe effects and is indeed genetically dominant- that is, one copy overrides a second wild-type copy to generate the NDD diseases that were studied.

    The U4 RNA (teal) paired with the U6 RNA (gray), within an early spliceosome complex. The mutation studied here is pointed out in black (n.64_65insT - i.e. insertion of a T). Note how it would cause a bulge in the pairing. Importantly, the location in the U6 RNA that pairs with the mRNA (see below) is right next door, at the ACAGAGA (light gray). The authors use this structural work from others to suggest how the mutation they found can alter selected splicing sites and thus lead to disease. Other single nucleotide insertions that cause similar syndromes are marked with black arrows, while single nucleotide substitutions that cause less severe syndromes are marked with orange RNA segments.

    The U6 RNA (pink) paired with its mRNA target to be spliced. It binds right at the intron (gray)/exon (black) boundary, where the cut will eventually be made to remove the intron. The bump from the mis-paired mutant U4 RNA (see above) distorts this binding, sending U6 to select wrong locations for splicing.
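The frameshift logic is easy to demonstrate with toy sequences. In this Python sketch the two "exons" are hypothetical, made up for illustration, and a mis-chosen 5' splice site is modeled as retaining a few bases of the intron (which normally begins with GT) in the spliced message. A shift of 3 bases merely adds an amino acid; a shift of 2 changes the reading frame and runs into a premature stop codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first base until a stop codon (written DNA-style, T not U)."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


EXON_1 = "ATGGCTGAA"        # hypothetical toy exon: M A E
EXON_2 = "ATGAAACCCTGGTAA"  # hypothetical toy exon: M K P W stop

normal = translate(EXON_1 + EXON_2)
in_frame = translate(EXON_1 + "GTA" + EXON_2)   # boundary shifted by 3 nt
frameshift = translate(EXON_1 + "GT" + EXON_2)  # boundary shifted by 2 nt

print(normal)      # MAEMKPW
print(in_frame)    # MAEVMKPW
print(frameshift)  # MAEV
```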


    The researchers went on to survey this and other spliceosomal RNA genes for similar mutations, and found few to none outside the region marked in the diagram above. For example, there is a highly similar gene called RNU4-1. But this gene is expressed about 100-fold less in brain and other tissues, making RNU4-2 the principal source of U4 RNA, and much more significant as a causal factor for NDD. It appears that other locations in RNU4-2 (and in other spliceosomal RNA genes) are even more important than the one mutated location found here, so mutations there are never found, since they are lethal and heavily selected against in this highly conserved gene.

    They also noted that, while this RNU4-2 mutation is severe, and thus must happen spontaneously (i.e. not inherited from parents), it only occurs on the maternal alleles, not paternal alleles, in the affected children. They speculate that this may be due to effects this gene may have in male gametogenesis, killing affected sperm preferentially, but not affected oocytes. Lastly, this set of mutations (in the small region shown in the first figure above) appears to account for, in their estimation, about 0.4% of all NDD seen in Britain. This is a remarkably high rate for such a particular mutation that is not heritable. They speculate that some mutation hotspot kind of process may be causing these events, above the general mutation rate. What this all says about so-called "intelligent design", one may be reluctant to explore too deeply. On the other hand, this still leaves plenty of room to hunt for additional variations that cause these syndromes.

    In this research, we see that clinically critical variations can pop up in many places, not just among the "usual suspects", genetically and genomically speaking. While much of the human genome is junk, most of it is also expressed (as RNA) and all of it is fair game for clinically important (if tragic) effects. The NDD syndromes caused by the mutation studied here are very severe- far more so than the ADD or mild autism diagnoses that make up most of the NDD spectrum. Understanding the causal nexus between the genome and human biology and its pathologies remains an ongoing and complicated scientific adventure.


    • Playing the heel. Being the heel
    • It sure is great to be the victim.
    • Oh, right.. now we really know what is going on.
    • More spiritual warfare.
    • Another grift.

    Saturday, March 29, 2025

    What Causes Cancer? What is Cancer?

    There is some frustration in the literature.

    Fifty years into the war on cancer, what have we learned and gained? We do not have a general cure, though we have a few cures and a lot of treatments. We have a lot of understanding, but no comprehensive theory or guide to practice. While some treatments are pin-point specific to certain proteins and even certain mutated forms of those proteins, most treatments remain empirical, even crude, and few provide more than a temporary respite. Cancer remains an enormous challenge, clinically and intellectually.

    Recently, a prominent journal ran a provocative commentary about the origins of cancer, trashing the reigning model of "Somatic Mutation Theory", or SMT, which is the proposition that cancer is caused by mutations that "drive" cell proliferation, and thus tumor growth. I was surprised at the cavalier insinuations being thrown around by these authors, their level of trash talk, and the lack of either compelling evidence or a coherent alternative model. Some of their critiques have a fair basis, as discussed below, but to say, as the title does, that this is "The End of the Genetic Paradigm of Cancer" is simply wrong.

    "It is said that the wise only believe in what they can see, and the fools only see what they can believe in. The latter attitude cements paradigms, and paradigms are amplified by any new-looking glass that puts one’s way of seeing the world on steroids. In cancer research, such a self-fulfilling prophecy has been fueled by next-generation DNA sequencing."

    "However, in the quest for predictive biomarkers and molecular targets, the cancer research community has abandoned deep thinking for deep sequencing, interpreting data through the lens of clinical translation detached from fundamental biology."

    Whew!

    The main critique, once the gratuitous insults and obligatory references to Kuhn and Feynman are cleared away, is that cancer does not resemble other truly clonal disease / population processes, like viral or bacterial infections. In such processes (which have become widely familiar after the COVID and HIV pandemics), a founder genotype can be identified, and its descendants clearly derive from that founder, while accumulating additional mutations that may respond to the Darwinian pressures, such as the immune system and other host defenses. While many cancers are clearly driven by some founding mutation, when treatments against that particular "driver" protein are given, resistance emerges, indicating that the cancer is a more diverse population with a very active mutation and adaptation process.

    Additionally, tumors are not just clones of the driving cell, but have complex structure and genetic variety. Part of this is due to the mutator phenotypes that arise during carcinogenesis, that blow up the genome and cause large numbers of additional mutations- many deleterious, but some carrying advantages. More significantly, tumors arise from and continue to exist in the context of organs and tissues. They cannot just grow wildly as though they were on a petri plate, but must generate, for example, vascular structures and a "microenvironment" including other cells that facilitate their life. Similarly, metastasis is highly context-dependent and selective- only very few of the cells released by a tumor land in a place they find conducive to new growth. This indicates, again, that the organ setting of cancer cells is critically important, and accounts in large part for this overall difference between cancers and more straightforward clonal processes.

    Schematic of cancer development, from a much more conventional and thorough review of the field.

    Cancer cells need to work with the developmental paradigms of the organism. For instance, the notorious "EMT", or epithelial-mesenchymal transition, is a hallmark of de-differentiation of many cancer cells. They frequently regress in developmental terms to recover some of the proliferative and self-repair potential of stem cells. What developmental program is available or allowed in a particular tissue will vary tremendously. Thus not every oncogenic mutation causes cancer in every tissue, and each organ has particular and distinct mutations that tend to cause cancers within it. Indeed, some organs hardly foster any cancers at all, while other organs with more active (and perhaps evolutionarily recent) patterns of proliferation (such as breast tissue, or prostate tissue) show high rates of cancer. Given the organ setting, cancer "driver" mutations need not only unleash the cell's own proliferation, but re-engineer its relations with other cells to remove their inhibition of its over-growth, and persuade them to provide the environment it needs- nutritionally, by direct contact, by growth factors, vascular formation, immune interactions, etc., in a sort of para-organ formation process. It is a complicated job, and one mutation is, empirically, rarely enough.

    "Instead, cancer can be broadly understood as “development gone awry”. Within this perspective, the tissue organization field theory is based on two principles that unite phylogenesis and ontogenesis."

    "The organicist perspective is based on the interdependency of the organism and its organs. It recognizes a circular causal regimen by closure of constraints that makes parts interdependent, wherein these constraints are not only molecules, but also biophysical force."

    As an argument or alternative theory, this leaves quite a bit to be desired, and does not obviate the role of initiating mutations in the process.

    It remains, however, that oncogenic mutations cause cancer, and treatments that address those root causes have time and again shown themselves to be effective cancer treatments, if tragically incomplete. The rise of shockingly effective immunotherapies for cancer has shown, however, that the immune system takes a more holistic approach to attacking disease than such "precision" single-target therapies, and can make up for the vagaries of the tissue environment and the inflammatory, developmental, and mutational derangements that happen later in cancer development.

    In one egregious citation, the authors hail an observation that certain cancers need both a mutation and a chemical treatment to get started, and that the order of these events is not set in stone. Traditionally, the mutation is induced first, and then the chemical treatment, which causes inflammation, comes second. They state: 

    "The qualitative dichotomy between a mutagenic initiator that creates ’cancer cells’ and the non-genetic, tissue-perturbing promoter that expands them may not be as clear-cut. Indeed, the reverse experiment (first treatment with the promoter followed by the initiator) equally produces tumors. This result refutes the classical model that requires that the mutagenic (alleged) initiator must act first."

    The citation is to a paper entitled "The reverse experiment in two-stage skin carcinogenesis. It cannot be genuinely performed, but when approximated, it is not innocuous". This paper dates from 1993, long before sequencing was capable of evaluating the mutation profiles of cancer cells. Additionally, the authors of this paper themselves point out (in the quote below) a significant asymmetry in the treatments. Their results are not "equal":

    "The two substances showed a reciprocal enhancing effect, which was sometimes weak, sometimes additive, and sometimes even synergistic, and was statistically most significant when the results were assessed from the time of DMBA application. Although the reverse experiment was not in any way innocuous it always resulted in a lower tumor crop than the classical sequence of DMBA followed by a course of TPA treatment. 

    However, the lower tumor crop in the reverse experiment cannot be used to prove a qualitative difference between initiators and promoters."

    (DMBA is the mutagen, while TPA is the inflammatory accelerant.)

    So chemical treatment can prepare the ground for subsequent mutant generation in forming cancers in this system, while being much less efficient than the traditional order of events. This is not a surprise, given that this chemical (TPA) treatment causes relatively long-term inflammation and cell proliferation on its own.

    "An epistemic shift towards a biological theory of cancer may still be an uphill battle in the current climate of thought created by the ease of data collection and a culture of research that discourages ’disruptive science’. Here, we have made an argument for dropping the SMT and its epicycles. We presented new and old but sidelined theoretical alternatives to the SMT that embrace theory and organismal biology and can guide experiments and data interpretation. We expect that the diminishing returns from the ceaselessly growing databases of somatic mutations, the equivalent to Darwin’s gravel pit, may soon reach a pivot point."

    One rarely reads such grandiloquent summaries (or mixed metaphors) in scientific papers! But here they are truly beating up on straw men. In the end, it is true that cancer is quite unlike clonal infectious diseases, and for this, as for many other reasons, has had scientists scratching their heads for decades, if not centuries. But rest assured that this chest-thumping condescension is quite unnecessary, since those in the field are quite aware of these difficulties. The various nebulous alternatives these authors offer, whether the "epigenetic landscape", the "tissue organization field theory", or the "biological theory of cancer" all have kernels of logic, but the SMT remains the foundation-stone of cancer study and treatment, while being, for all the reasons enumerated above and by these authors, only part of the edifice, not the whole truth.


    Saturday, September 28, 2024

    Dangerous Memories

    Some memory formation involves extracellular structures, DNA damage, and immune component activation / inflammation.

    The physical nature of memories in the brain is under intensive scrutiny. The leading general theory is that of positive reinforcement, where neurons that are co-activated strengthen their connections, enhancing their ability to co-fire and thus to express the same pattern again in the future. The nature of these connections has been somewhat nebulous, assumed to just be the size and stability of their synaptic touch-points. But it turns out that there is a great deal more going on.

    A recent paper started with a fishing expedition, looking at changes in gene expression in neurons at various time points after the mice were subjected to a fear learning regimen. They took this out to much longer time points (up to a month) than had been contemplated previously. At short times, a bunch of well-known signals and growth-oriented gene expression happened. At the longest time points, organization of a structure called the perineuronal net (PNN) was read out of the gene expression signals. This is an extracellular matrix sheath that appears to stabilize neuronal connections and play a role in long-term memory and learning. 

    But the real shocker came at the intermediate time point of about four days. Here, there was overexpression of TLR9, which is an immune system detector of broken / bacterial DNA, and inducer in turn of inflammatory responses. This led the authors down a long rabbit hole of investigating what kind of DNA fragmentation is activating this signal, how common this is, how influential it is for learning, and what the downstream pathways are. Apparently, neuronal excitation, particularly over-excitation that might be experienced under intense fear conditions, isn't just stressful in a semiotic sense, but is highly stressful to the participating neurons. There are signs of mitochondrial over-activity and oxidative stress, which lead to DNA breakage in the nucleus, and even nuclear perforation. It is a shocking situation for cells that need to survive for the lifetime of the animal. Granted, these are not germ cells that prioritize genomic stability above all else, but getting your DNA broken just for the purpose of signaling a stress response that feeds into memory formation? That is weird.

    Some neuronal cell bodies after fear learning. The red dye is against a marker of DNA repair proteins, which form tight dots around broken DNA. The blue is a general DNA stain, and the green is against a component of the nuclear envelope, showing here that nuclear envelopes have broken in many of these cells.

    The researchers found that there are classic signs of DNA breakage, which are what is turning on the TLR9 protein, such as seeing concentrated double-strand DNA repair complexes. All this stress also turned on proteases called caspases, though not the cell suicide program that these caspases typically initiate. Many of the DNA break and repair complexes were, thanks to nuclear perforation, located diffusely at the centrosome, not in the nucleus. TLR9 turns on an inflammatory response via NFKB / RELA. This is clearly a huge event for these cells, not sending them into suicide, but all the alarms short of that are going off.

    The interesting part was when the researchers asked whether, by deleting the TLR9 or related genes in the pathway, they could affect learning. Yes, indeed- the fear memory was dependent on the expression of this gene in neurons, and on this cell stress pathway, which appears to be the precondition of setting up the perineural net structures and overall stabilization. Additionally, the DNA damage still happened, but was not properly recognized and repaired in the absence of TLR9, creating an even more dangerous situation for the affected neurons- of genomic instability amidst unrepaired DNA.

    When TLR9 is knocked out, DNA repair is cancelled. At bottom are wild-type cells, and at top are mouse neurons after fear learning that have had the gene TLR9 deleted. The red dye is against DNA repair proteins, as is the blue dye in the right-most frames. The top row is devoid of these repair activities.

    This paper and its antecedent literature are making the case that memory formation (at least under these somewhat traumatic conditions- whether this is true for all kinds of memory formation remains to be seen) has commandeered ancient, diverse, and quite dangerous forms of cell stress response. It is no picnic in the park with madeleines. It is an all-hands-on-deck disaster scene that puts the cell into a permanently altered trajectory, and carries a variety of long-term risks, such as cancer formation from all the DNA breakage and end-joining repair, which is not very accurate. They mention in passing that some drugs have been recently developed against TLR9, which are being used to dampen inflammatory activities in the brain. But this new work indicates that such drugs are likely double-edged swords, that could impair both learning and the long-term health of treated neurons and brains.

    Saturday, August 24, 2024

    Aging and Death

    Our fate was sealed a very long time ago.

    Why do we die? It seems like a cruel and wasteful way to run a biosphere, not to mention a human life. After we have accumulated a lifetime of experience and knowledge, we age, decline, and sign off, whether to go to our just reward, or into oblivion. What is the biological rationale and defense for all this, which the biblical writers assigned to the fairy tale of the snake and the apple?

    A recent paper ("A unified framework for evolutionary genetic and physiological theories of aging") discusses evolutionary theories of aging, but in typical French fashion, is both turgid and uninteresting. Aging is widely recognized as the consequence of natural selection, or more precisely, the lack thereof after organisms have finished reproducing. Thus we are at our prime in early adulthood, when we seek mates and raise young. Evolutionarily, it is all downhill from there. In professional sports, athletes are generally over the hill at 30, retiring around 35. Natural selection is increasingly irrelevant after we have done the essential tasks of life- surviving to mate and reproduce. We may participate in our communities, and do useful things, but from an evolutionary perspective, genetic problems at this phase of life have much less impact on reproductive success than those that hit earlier. 
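    A toy calculation can make this argument concrete. The sketch below assumes a hypothetical life table (a constant 2% annual mortality, with equal fertility from age 18 to 39 and none afterwards; these numbers are illustrative, not data) and computes what fraction of expected lifetime reproduction still lies ahead at a given age. A harmful variant whose effects only begin at that age can at most affect that remaining fraction, which is why selection against it weakens with age and vanishes once reproduction is over.

```python
def fraction_of_reproduction_after(age, survival, fecundity):
    """Share of expected lifetime reproduction occurring at or after `age`.

    survival[x]  : probability of being alive at age x (hypothetical values)
    fecundity[x] : offspring per year at age x, if alive (hypothetical values)
    """
    weights = [l * m for l, m in zip(survival, fecundity)]
    total = sum(weights)
    return sum(weights[age:]) / total


# Hypothetical life table: 2% annual mortality, reproduction from age 18 to 39.
AGES = range(80)
SURVIVAL = [0.98 ** x for x in AGES]
FECUNDITY = [1.0 if 18 <= x < 40 else 0.0 for x in AGES]

if __name__ == "__main__":
    for age in (0, 18, 25, 30, 35, 40, 60):
        share = fraction_of_reproduction_after(age, SURVIVAL, FECUNDITY)
        print(f"age {age:2d}: {share:.2f} of lifetime reproduction still ahead")
```

    A mutation that strikes at 60 in this toy life table touches none of the remaining reproduction, so natural selection is blind to it.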

    All this is embodied in the "disposable soma" theory of aging, which is that our germ cells are the protected jewels of reproduction, while the rest of our bodies are, well, disposable, and thus experience all the indignities of age once their job of passing on the germ cells is done. The current authors try to push another "developmental" theory of aging, which posits that the tradeoffs between youth and age are not so much the resources or selective constraints focused on germ cell propagation vs the soma, but that developmental pathways are, by selection, optimized for the reproductive phase of life, and thus may be out of tune for later phases. Some pathways are over-functional, some under-functional for the aged body, and that imbalance is sadly uncorrected by evolution. Maybe I am not doing justice to these ideas, which may feed into therapeutic options against aging, but I find this distinction uncompelling, and won't discuss it further.

    A series of unimpressive distinctions in the academic field studying aging from an evolutionary perspective.

    Where did the soma arise? Single cell organisms are naturally unitary- the same cell that survives also mates and is the germ cell for the next generation. There are signs of aging in single cell organisms as well, however. In yeast, "mother" cells have a limited lifespan and ability to put out daughter buds. Even bacteria have "new" and "old" poles, the latter of which accumulate inclusion bodies of proteinaceous junk, which apparently doom the older cell to senescence and death. So all cells are faced with processes that fail over time, and the only sure bet is to start as a "fresh" cell, in some sense. Plants have taken a distinct path from animals, by having bodies and death, yes, but being able to generate germ cells from mature tissues instead of segregating them very early in development into stable and distinct gonads.

    Multicellularity began innocently enough. Take slime molds, for example. They live as independent amoebae most of the time, but come together to put out spores, when they have used up the local food. They form a small slug-like body, which then grows a spore-bearing head. Some cells form the spores and get to reproduce, but most don't, being part of the body. The same thing happens with mushrooms, which leave a decaying mushroom body behind after releasing their spores. 

    We don't shed a lot of tears for the mushrooms of the world, which represent the death-throes of their once-youthful mycelia. But that was the pattern set at the beginning- that bodies are cells differentiated from the germ cells, that provide some useful, competitive function, at the cost of being terminal, and not reproducing. Bodies are forms of both lost energy and material, and lost reproductive potential from all those extra cells. Who could have imagined that they would become so ornate as to totally overwhelm, in mass and complexity, the germ cells that are the point of the whole exercise? Who could have imagined that they would gain feelings, purposes, and memories, and rage against the fate that evolution had in store for them?

    On a more mechanistic level, aging appears to arise from many defects. One is the accumulation of mutations, which in soma cells lead to defective proteins being made and defective regulation of cell processes. An extreme form is cancer, as is progeria. Bad proteins and other junk like odd chemicals and chemically modified cell components can accumulate, which is another cause of aging. Cataracts are one example, where the proteins in our lenses wear out from UV exposure. We have quite intricate trash disposal processes, but they can't keep up with everything, as we have learned from the advent of modern chemistry and its many toxins. Another cause is more programmatic: senescent cells, which are aged-out and have the virtue that they are blocked from dividing, but have the defect that they put out harmful signals to the immune system that promote inflammation, another general cause of aging.

    Aging research has not found a single magic bullet, which makes sense from the evolutionary theory behind it. A few things may be fixable, but mostly the breakdowns were never meant to be remedied or fixed, nor can they be. In fact, our germ cells are not completely immune from aging either, as we learn from older fathers whose children have higher rates of autism. We as somatic bodies are as disposable as any form of packaging, getting those germ cells through a complicated, competitive world, and on to their destination.


    Sunday, March 31, 2024

    Nominee for Most Amazing Protein: RAD51

    On the repair and resurrection of DNA, which gets a lot of help from a family of proteins including RAD51, DMC1, and RecA.

    Proteins do all sorts of amazing things, from composing pores that can select a single kind of ion- even just a proton- to allow across a membrane, to massive polymerizing enzymes that synthesize other proteins, DNA, and RNA. There is really no end to it. But one of the most amazing, even incredible, things that happens in a cell is the hunt for DNA homology. Even over a genome of billions of base pairs, it is possible for one DNA segment to find the single other DNA segment that matches it. This hunt is executed for several reasons. One is to line up the homologous chromosomes at meiosis, and carry out the genetic cross-overs between them (when they are lined up precisely) that help scramble our genetic lineages for optimal mix-and-matching during reproduction. Another is for DNA repair, which is best done with a good copy for reference, especially when a full double-strand break has happened. Just this week, a fascinating article showed that memories in our brains depend in some weird way on DNA breaks occurring in neurons, some of which then use the homologous repair process, including homology search, to patch things up.
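    How hard is this search? A back-of-the-envelope sketch, assuming a random genome sequence of about 3.1 billion bases (real genomes are full of repeats, so this understates the problem) and counting both strands, gives the number of places a specific k-base sequence would be expected to match purely by chance. Short matches are everywhere; the expected number of chance matches only drops below one at about seventeen bases.

```python
def expected_random_hits(k, genome_length=3.1e9, both_strands=True):
    """Expected chance occurrences of one specific k-base sequence.

    Assumes every base is independent and equally likely (A, C, G, T each 1/4),
    which real, repeat-rich genomes are not. genome_length ~3.1e9 approximates
    a haploid human genome; edge effects (k - 1 positions) are ignored.
    """
    strands = 2 if both_strands else 1
    return strands * genome_length / 4 ** k


if __name__ == "__main__":
    for k in (8, 9, 12, 15, 17, 20):
        print(f"{k:2d}-base match: ~{expected_random_hits(k):,.2f} chance hits")
```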

    The protein that facilitates this DNA homology search is deeply conserved in evolution. It is called RecA in bacteria, radA and radB in archaea, and the RAD51 family in eukaryotes. Naturally, the eukaryotic family is most closely related to the archaeal versions (RAD51 and DMC1 evolving from radA, and a series of other, poorly understood family members from radB). In this post, I will mostly just call them all RAD51, unless I am referring to DMC1 specifically. The name comes from genetic screens for radiation-sensitive mutants in yeast and other eukaryotes, since RAD51 plays a crucial role in DNA repair, as noted above. RAD51 is not a huge protein, but it is an ATPase. It binds to itself, forming linear filaments with ATP at the junction points between units. It binds to a single strand of DNA, which is going to be what does the hunting. And it binds, in a complicated way, to another double-stranded DNA, which it helps to open briefly to allow its quality as a target to be evaluated. 

    This diagram describes the repair of double strand breaks (DSB) in DNA. First the ends are covered with a bunch of proteins that signal far and wide that something terrible has happened- the cell cycle has to stop.. fire engines need to be called. One of these proteins is RPA, which simply binds all over single-stranded DNA and protects it. Then the RAD51 protein comes in, displaces RPA, and begins the homology search process. The second DNA shown, in dark black, doesn't just happen, but is hunted for high and low throughout the nucleus to find the exact homolog of the broken end. When that exact match is found, the repair process can proceed, with continued DNA synthesis through the lesion, and resolution of the newly repaired double strands, either to copy up the homolog version, or exchange versions (GC, for gene conversion). 

    This diagram shows how the notorious (when mutated) tumor suppressor gene BRCA2 (in green) works. It binds RAD51 (in blue) and brings it, chain-gang style, to the breakpoints of DNA damage to speed up and specify repair.


    There have been several structural studies by this point that clarify how RAD51 does its thing. ATP is simply required to form filaments on single-stranded DNA. When a match has been found and RAD51 is no longer needed, ATP is cleaved, and RAD51 falls off, back to reserve status. The magic starts with how RAD51 binds the single-stranded DNA. One RAD51 binds for every ~3 bases in the DNA, and it binds the phosphate backbone, so that the bases are nicely exposed in front, and all stretched out, ready to hunt for matching DNA.

    A series of RAD51 molecules (in this case, RecA from bacteria) bound sequentially to single-stranded DNA (red). Note the ATP analog chemicals in yellow, positioned between each protein unit. One can see that the DNA is stretched out a bit and the bases point outwards.

    A closeup view of one of the RAD51 units from above, showing how the bases of the DNA (yellow) are splayed out into the medium, ready to find their partners. They are arranged in orientations similar to how they sit in normal (B-form) DNA, further enhancing their ability to find partners.

    The second, and more mysterious, part of the operation is how RAD51 scans double-stranded DNA throughout the genome. It has binding sites for double-stranded DNA, away from the single-stranded DNA, and then it also has a little finger that splits open the double-stranded DNA, encouraging separation and allowing one strand to face up to the single-stranded DNA that is held firmly by the RAD51 polymer. The transient search happens in eight-base increments, with tighter capture of the double-strand DNA happening when nine bases are matched, and commitment to recombination or repair happening when a match of fifteen bases is found.  
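    The staged thresholds above can be caricatured in a few lines. This is a toy, not a model of RAD51: it slides a single-stranded "probe" along one strand of a hypothetical genome, counts how many bases match contiguously from the start of the probe, and labels each site with the stage that match length would reach, using the numbers quoted above (treating 8 bases as a transient sample, 9 as tighter capture, 15 as commitment; that mapping is my reading, not a published rule). Real RAD51 pairs the probe with the complementary strand, samples in eight-base steps, and relies on diffusion through chromatin, none of which is modeled here.

```python
def prefix_match_length(probe: str, genome: str, start: int) -> int:
    """Number of contiguous bases matching from the start of the probe."""
    n = 0
    while (n < len(probe) and start + n < len(genome)
           and probe[n] == genome[start + n]):
        n += 1
    return n


def classify(match_len: int, transient: int = 8, capture: int = 9,
             commit: int = 15) -> str:
    """Map a match length to a stage, using the thresholds quoted in the text."""
    if match_len >= commit:
        return "commit"
    if match_len >= capture:
        return "capture"
    if match_len >= transient:
        return "transient"
    return "reject"


def scan(probe: str, genome: str) -> dict:
    """Return {position: stage} for every site that gets past 'reject'."""
    hits = {}
    for i in range(len(genome)):
        stage = classify(prefix_match_length(probe, genome, i))
        if stage != "reject":
            hits[i] = stage
    return hits


PROBE = "ACGTTGCAAGCTTACG"             # hypothetical 16-base resected end
DECOY = PROBE[:9] + "T"                # shares only the first 9 bases
GENOME = "TTTT" + DECOY + "CCCC" + PROBE + "TTTT"

if __name__ == "__main__":
    print(scan(PROBE, GENOME))  # {4: 'capture', 18: 'commit'}
```

    The decoy at position 4 gets captured but never reaches commitment, while the true match at position 18 does.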

    These structures show an intermediate where a double-stranded DNA (ends in teal and lavender, and separated DNA segments in green and red) has been captured, making a twelve base match with the stable single-stranded DNA (brown). Note how the double-stranded DNA ends are held by outside portions of the RAD51 protein. Closeup on the right shows the dangling, non-paired DNA strand in red, and the newly matched duplex DNA with green-brown colored base interactions.

    These structures can only give a hint of what is going on, since the whole process relies so clearly on the Brownian motion that allows super-rapid diffusion of the stabilized single-strand DNA+RAD51 over the genome, which it scans efficiently in one-dimensional fashion, despite all the chromatin and other proteins parked all over the place. And while the structures provide insight into how the process happens, it remains incredible that this search can happen, on what is clearly a quite reliable basis, day in and day out, as our genomes get hit by whatever the environment throws at us.

    "Unfortunately, most RAD51 and RAD51 paralog point mutations that have been clinically identified are classified as variants of unknown significance (VUSs). Future studies to reclassify these RAD51 gene family VUSs as pathogenic or benign are desperately needed, as many of these genes are now included on hereditary breast and ovarian cancer screening panels. Reclassification of HR-deficient VUSs would enable these patients to benefit from therapies that specifically target HR deficiency, as do poly(ADP)-ribose polymerase (PARP) inhibitors in BRCA1/2-deficient cells."

    Lastly, one paper made the point that clinicians need better understanding of the various mutations that can affect RAD51 itself. Genetic testing now is able to find all of our mutations, but we don't always know what each mutation is capable of doing. Thus deeper studies of RAD51 will have beneficial effects on clinical diagnosis, when particular mutations can be assigned as disease-causing, thus justifying specific therapies that would otherwise not be attempted.


    Saturday, March 9, 2024

    Getting Cancer Cells to Shoot Themselves

    New chemicals that make novel linkages among cellular components can be powerful drugs.

    One theme that has become common in molecular biology over the years is the prevalence of proteins whose only job is to bring other proteins together. Many proteins lack any of the usual jazzy functions, like catalytic enzyme, or ion channel, or signaling kinase, but just serve as "conveners", bringing other proteins together. Typically they are regulated in some way, by phosphorylation, expression, or localization, and some of these proteins serve as key "scaffolds" for the activation of some process, like G-protein activation, or cell cycle control, or cell growth. 

    Well, the drug industry has caught on, and is starting to think about chemicals that can do similar things, resulting in occasionally powerful results. Conventional drug design has aimed to bind to whatever protein is responsible for some ill, such as an oncogene or an over-active component of the immune system, and inhibit it. This has led to many great drugs, but has significant limitations. The chemical has to bind not just anywhere on the target, but at the particular spot (the active site) that is its business end, where its action happens. And it has to bind really well, since binding and inhibiting only half the target proteins in a cell (or the body) will typically only have a modest effect. These requirements are quite stringent and result in many protein targets being deemed difficult to drug, or "undruggable".
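    The point about binding strength can be made with the standard one-site binding relation, occupancy = [drug] / ([drug] + Kd). The sketch below assumes simple reversible binding at equilibrium, with free drug roughly equal to total drug; the Kd value is hypothetical. Half occupancy needs a concentration equal to Kd, but 90% needs nine times Kd and 99% needs ninety-nine times Kd, which is why weak binders rarely make good inhibitors.

```python
def fractional_occupancy(drug_nM: float, kd_nM: float) -> float:
    """Fraction of target bound at equilibrium for simple one-site binding."""
    return drug_nM / (drug_nM + kd_nM)


def concentration_for_occupancy(theta: float, kd_nM: float) -> float:
    """Free drug concentration needed to reach occupancy theta (0 < theta < 1)."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must be strictly between 0 and 1")
    return kd_nM * theta / (1.0 - theta)


if __name__ == "__main__":
    kd = 10.0  # hypothetical Kd in nM
    for theta in (0.5, 0.9, 0.99):
        conc = concentration_for_occupancy(theta, kd)
        print(f"{theta:.0%} occupancy needs ~{conc:.0f} nM ({conc / kd:.0f} x Kd)")
```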

    A paradigm for a new kind of chemical drug, which links two functions, is the PROTAC class, which combines binding with a target on one end, with another end that binds to the cell's protein destruction machinery, thereby not just inhibiting the target, but destroying it. A new paper describes an even more nuclear option along this line of drug development, linking a binder of an oncogene to a second part that activates the cellular suicide machinery. One can imagine that this approach can have far more dramatic effects.

    These researchers synthesize and demonstrate a chemical that binds, on one end, the oncogene BCL6, mutations of which can cause B cell lymphoma. This gene is a transcription repressor that orchestrates the development of germinal center B cells, as well as of particular immunologic T cells called T follicular helper cells. One of its roles is to prevent the suicide of these cells when an antigen is present, which is when the cells are most needed. If over-expressed in cancer, these cells think they really need to protect the body and proliferate wildly.

    The other end of this chemical, called TCIP1, binds to BRD4, which is another transcription regulator, but this one activates the cell suicide genes, instead of turning them off. Both ends of this molecule were based on previously known structures. The innovation was solely in linking them together. I should say parenthetically that BRD4 is itself recognized as an oncogene, as it can promote cell growth and prevent cell suicide in many settings. So it has ambivalent roles (inviting a lot of vague writing), and it is somewhat curious that these researchers focused on BRD4 as an apoptosis driver.

    "TCIP1 kills diffuse large B cell lymphoma cell lines, including chemotherapy-resistant, TP53-mutant lines, at EC50 of 1–10 nM in 72 h" 
    Here EC50 means the effective concentration where the effect is 50% of maximal. A value in this low-nanomolar range is a very low concentration for a drug, meaning it is highly potent. Mutations in TP53 are another cancer driver, common in treatment-resistant cancers. The drug has a characteristic and curious dosage behavior, as its effect decreases at higher concentrations. This is because each individual end of the molecule starts to bind and saturate its target independently, reducing the fraction of target proteins that get linked together, and thus the intended effect.
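    This high-dose falloff (often called the "hook effect") falls out of a simple equilibrium model. In the sketch below, assuming the two target proteins are dilute relative to the drug, bind it independently with hypothetical dissociation constants KA and KB, and show no cooperativity, the amount of bridged complex (protein A, drug, protein B) is proportional to L / ((KA + L)(KB + L)) for drug concentration L. It rises, peaks at L = sqrt(KA * KB), and then falls as each protein ends up holding its own separate drug molecule. Real bifunctional drugs often bind cooperatively, which this sketch ignores.

```python
import math


def ternary_fraction(drug, ka, kb):
    """Relative amount of the bridged complex A-drug-B.

    Assumes proteins A and B are dilute compared with the drug (free drug ~ total
    drug), bind it independently with dissociation constants ka and kb, and show
    no cooperativity. Then [A-drug-B] = A0 * B0 * L / ((ka + L)(kb + L)); the
    constant A0 * B0 is dropped, so values are in relative units.
    """
    return drug / ((ka + drug) * (kb + drug))


def peak_drug_concentration(ka, kb):
    """Concentration that maximizes ternary complex formation: sqrt(ka * kb)."""
    return math.sqrt(ka * kb)


if __name__ == "__main__":
    ka, kb = 10.0, 100.0  # hypothetical Kd values, nM
    peak = peak_drug_concentration(ka, kb)
    for drug in (0.1, 1, 10, peak, 100, 1_000, 10_000):
        print(f"{drug:>9.1f} nM -> {ternary_fraction(drug, ka, kb):.5f}")
```

    With these hypothetical constants, the peak sits near 32 nM, and going tenfold above or below it gives the same, lower, amount of bridged complex.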

    Chemical structure of TCIP1. The left side binds to BRD4, a regulator of cell suicide, while the right side binds to BCL6, an oncogene.

    The authors did numerous controls with related chemicals, and tracked genes that were targeted by the novel chemical, all to show that the dramatic effects they were seeing were specifically caused by the linkage of the two chemical functions. Indeed, BCL6 represses its own transcription in the natural course of affairs, and the new drug reverses this behavior as well, inducing more of its own synthesis, which now potentiates the drug's lethal effect. While the authors did not show effectiveness in animals, they did show that TCIP1 is not toxic in mice. Nor did they show that TCIP1 is orally available; they administered it by injection. But even by this mode, it would, if effective, be a very exciting therapy. Not surprisingly, the authors report a long series of biotech industry ties (rooted at Stanford) and indicate that this technology is under license for drug development.

    This approach is highly promising, and a significant advance in the field. It should allow increased flexibility in targeting all kinds of proteins that may or may not cause disease, but are specific to or over-expressed in disease states, in order to address those diseases. It will allow increased flexibility in targeting apoptosis (cell suicide) pathways through numerous entry points, to have the same ultimate (and highly effective) therapeutic endpoint. It allows drugs to work at low concentrations, not needing to fully occupy or inhibit their targets. Many possible areas of therapy can be envisioned, but one is aging. By targeting and killing senescent cells, which are notorious for promoting aging, significant increases in lifespan and health are conceivable. 


    • Biden is doing an excellent job.
    • Annals of mental decline.
    • Maybe it is an anti-addiction drug.
    • One gene that really did the trick.
    • A winning issue.
    • It is hard to say yet whether nuclear power is a climate solution, or an expensive distraction.

    Saturday, June 10, 2023

    A Hard Road to a Cancer Drug

    The long and winding story of the oncogene KRAS and its new drug, sotorasib.

    After half a century of the "War on Cancer", new treatments are finally straggling into the clinic. It has been an extremely hard and frustrating road to study cancer, let alone treat it. We have learned amazing things, but mostly we have learned how convoluted a few billion years of evolution can make things. The regulatory landscape within our cells is undoubtedly the equal of any recalcitrant bureaucracy, full of redundant offices, multiple veto points, and stakeholders with obscure agendas. I recently watched a seminar in the field, which discussed one of the major genes mutated in cancer and what it has taken to develop a treatment against it. 

    Cancer is caused by DNA mutations, and several different types need to occur in succession. There are driver mutations, which are the first step in the loss of normal cellular control. But additional mutations have to happen for such cells to progress through regulatory blocks, like escape from local environmental controls on cell type and cell division, past surveillance by the immune system, and past the reluctance of differentiated cells to migrate away from their resident organ. By the end, cancer cells typically have huge numbers of mutations, having incurred mutations in their DNA repair machinery in an adaptive effort to evade all these different controls.

    While this means that many different targets exist that can treat some cancers, it also means that any single cancer requires a precisely tailored treatment, specific to its mutated genes. And that resistance is virtually inevitable given the highly mutable nature of these cells. 

    One of the most common genes to be mutated to drive cancer (in roughly 20% of all cases) is KRAS, part of the RAS family of NRAS, KRAS, and HRAS. These were originally discovered through viruses that cause cancer in rats. These viruses (such as Kirsten rat sarcoma virus) carried a copy of a rat gene in their genomes, which they overproduce and use to overcome normal proliferation controls during infection. The viral gene was called an oncogene, and the original rat (or human) version was called a proto-oncogene, named KRAS. The RAS proteins occupy a central part of the signaling path that external events and stresses turn on to activate cell growth and proliferation, called the MAP kinase cascade. For instance, epidermal growth factor comes along in the blood, binds to a receptor on the outside of a cell, and turns on RAS, then RAF, MEK, MAPK, and finally transcription regulators that turn on genes in the nucleus, resulting in new proteins being expressed. "Turning on" means different things at each step in this cascade. The transcription regulators typically get phosphorylated by their upstream kinases like MAPK, which tag them for physical transport into the nucleus, where they can then activate genes. MAPK is turned on by being itself phosphorylated by MEK, and MEK is phosphorylated by RAF. RAF is turned on by binding to RAS, whose binding activity in turn is regulated by the state of a nucleotide (GTP) bound by RAS. When binding GTP, RAS is on, but if binding GDP, it is off.

    A schematic of the RAS pathway, whereby extracellular growth signals are interpreted and amplified inside our cells, resulting in new gene expression as well as other more immediate effects. The cell surface receptor, activated by its ligand, activates associated SOS which activates RAS to the active (GTP) state. This leads to a kinase cascade through RAF, MEK, and MAPK and finally to gene regulators like MYC.

    This whole system seems rather ornate, but it accomplishes one important thing, which is amplification. One turned-on RAF molecule or MEK molecule can turn on / phosphorylate many targets, so this cascade, though it appears linear in a diagram, is actually a chain reaction of sorts, amplifying as it goes along. And what governs the state of RAS and its bound GTP? The state of the EGFR receptor, of course. When KRAS is activated, the resident GDP leaves, and GTP comes to take its place. RAS is a weak GTPase enzyme itself, slowly converting itself from the active back to the inactive state with GDP. 
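    The amplification can be illustrated with an idealized multiplier model. Each active molecule at one stage is assumed to switch on a fixed number of molecules at the next; the gains used here (1 for RAS to RAF, since RAF is activated by binding rather than by catalysis, and a hypothetical 10 for each kinase step) are illustrative, not measured. Real cascades are limited by saturation, phosphatases, and feedback, none of which this sketch includes.

```python
from itertools import accumulate
from operator import mul

STAGES = ["RAS", "RAF", "MEK", "MAPK", "transcription regulator"]


def active_counts(initial_active, gains):
    """Active molecules at each stage of an idealized multiplier cascade.

    gains[i] is the (hypothetical) number of next-stage molecules switched on by
    each active molecule at stage i. No saturation, phosphatases, or feedback.
    """
    return list(accumulate(gains, mul, initial=initial_active))


if __name__ == "__main__":
    gains = [1, 10, 10, 10]  # RAS->RAF by binding (1:1); kinase steps assumed 10x
    for stage, count in zip(STAGES, active_counts(1, gains)):
        print(f"{stage:>24}: {count}")
```

    Three modest tenfold steps already turn one active RAS into a thousand active transcription regulators in this toy.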

    Given all this, one would think that RAS, and KRAS in particular, might be "druggable", by sticking some well-designed molecule into the GTP/GDP binding pocket and freezing it in an inactive state. But the sad fact of the matter is that the affinity KRAS has for GTP is incredibly high- so high it is hard to measure, with a dissociation constant of about 20 pM. That is, KRAS would still be half-occupied by GTP even if the ambient GTP concentration were an infinitesimal 0.02 nanomolar, while actual cellular nucleotide levels are many orders of magnitude higher. This means that nothing else is likely to be designed that can displace GTP or GDP from the KRAS protein, which means that in traditional terms, it is "undruggable". What is the biological logic of this? Well, it turns out that the RAS enzymes are managed by yet other proteins, which have the specific roles of prying GDP off (guanine nucleotide exchange factor, or GEF) and of activating the GTPase activity of RAS to convert GTP to GDP (GTPase activating protein, or GAP). It is the GEF protein that is stimulated by the receptors like EGFR that induce RAS activity. 
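    Why a 20 pM affinity is so discouraging can be put in numbers with the standard equilibrium expression for two ligands competing for one site: inhibitor occupancy = ([I]/Ki) / (1 + [I]/Ki + [GTP]/Kd). The sketch assumes a cellular GTP concentration of about 300 µM (a typical textbook figure, used here as an assumption) and a hypothetical, very good inhibitor (Ki = 1 nM, present at 1 µM). Even so, the inhibitor would hold well under 0.01% of KRAS. The sketch also ignores GDP and kinetics: in real cells, nucleotide release from KRAS is slow unless a GEF pries it off.

```python
def inhibitor_occupancy(inhibitor_M, ki_M, gtp_M, kd_gtp_M):
    """Equilibrium fraction of KRAS holding the inhibitor instead of nucleotide.

    Simple competition for one site: ([I]/Ki) / (1 + [I]/Ki + [GTP]/Kd).
    Ignores GDP, GEFs/GAPs, and the slow kinetics of nucleotide release.
    """
    i_term = inhibitor_M / ki_M
    gtp_term = gtp_M / kd_gtp_M
    return i_term / (1.0 + i_term + gtp_term)


if __name__ == "__main__":
    KD_GTP = 20e-12   # ~20 pM, the figure quoted in the text
    GTP = 300e-6      # assumed cellular GTP, ~300 micromolar
    occ = inhibitor_occupancy(1e-6, 1e-9, GTP, KD_GTP)  # hypothetical inhibitor
    print(f"inhibitor occupancy: {occ:.2e}")  # 6.67e-05
```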

    So we have to be cleverer in finding ways to attack this protein. Incidentally, most of the oncogenic mutations of KRAS are at the twelfth residue, glycine, which occupies a key part of the GAP binding site. As glycine is the smallest amino acid, any other amino acid here is bulkier, and blocks GAP binding, which means that KRAS with any of these mutations can not be turned off. It just keeps on signaling and signaling, driving the cell to think it needs to grow all the time. This property of gain of function and the ability of any mutation to fit the bill is why this particular defect in KRAS is such a common cancer-driving mutation. It accounts for ~90% of pancreatic cancers, for instance. 

    The seminar went on a long tangent, which occupied the field (of those looking for ways to inhibit KRAS with drugs) for roughly a decade. RAS proteins are not intrinsically membrane proteins, but they are covalently modified with a farnesyl fatty tail, which keeps them stuck in the cell's plasma membrane. Indeed, if this modification is prevented, RAS proteins don't work. So great- how to prevent that? Several groups developed inhibitors of the farnesyl transferase enzyme that carries out this modification. The inhibitors worked great, since the farnesyl transferase has a nice big pocket for its large substrate to bind, and doesn't bind it too tightly. But they didn't inhibit the RAS proteins, because there was a backup system- geranylgeranyl transferase, which steps into the breach and can attach an even bigger fatty tail to RAS proteins. Arghhh!

    While some are working on inhibiting both enzymes, the presenter, Kevan Shokat of UCSF, went in another direction. As a chemist, he figured that for the fraction of KRAS mutants in which glycine 12 is changed to cysteine, some very specific chemistry (that is, easy methods of cross-linking to the reactive cysteine) could be brought to bear. Given the nature of the genetic code, glycine-to-cysteine mutations are only a small fraction of the total. Glycine codons as a group are within a one-base change of eight other amino acids, and the GGT codon used at KRAS position 12 can reach six of them: aspartate, valine, cysteine, arginine, alanine, and serine. So at best, this approach is going to have a modest impact. Nevertheless, there was little choice, so they forged ahead with a complicated chemical scheme to make a small molecule that could chemically crosslink to that cysteine, with selectivity determined by a modest shape fit to the surface of the KRAS protein near the GEF binding site.
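    The codon arithmetic is easy to check. The sketch below uses the standard genetic code to list the amino acids reachable by one base change from GGT and, for comparison, from any glycine codon. The helper names are hypothetical, and the final line treats all nine possible substitutions as equally likely, which is a simplification: real mutation rates differ by substitution type.

```python
from itertools import product

# Standard genetic code with bases ordered T, C, A, G; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def single_base_substitutions(codon):
    """Yield every codon that differs from `codon` at exactly one position."""
    for pos, original in enumerate(codon):
        for base in BASES:
            if base != original:
                yield codon[:pos] + base + codon[pos + 1:]


def missense_targets(codon):
    """Amino acids reachable by one base change, excluding silent and stop changes."""
    start = CODON_TABLE[codon]
    return {CODON_TABLE[m] for m in single_base_substitutions(codon)} - {start, "*"}


glycine_codons = [c for c, aa in CODON_TABLE.items() if aa == "G"]
from_ggt = missense_targets("GGT")
from_any_glycine = set().union(*(missense_targets(c) for c in glycine_codons))

print(sorted(from_ggt))          # ['A', 'C', 'D', 'R', 'S', 'V']
print(sorted(from_any_glycine))  # ['A', 'C', 'D', 'E', 'R', 'S', 'V', 'W']

# Naive share of GGT point substitutions that give cysteine (GGT -> TGT only),
# assuming, unrealistically, that all nine substitutions are equally likely.
cys = sum(CODON_TABLE[m] == "C" for m in single_base_substitutions("GGT"))
print(f"{cys}/9")                # 1/9
```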

    A structural model of KRAS, with its extremely tightly bound nucleotide GDP in orange. The drug sotorasib is below in teal, bound in another pocket, with a tail extending upward to the (mutant) cysteine 12, which is not distinguished by color but sits over a magnesium ion (green) coordinated by GDP. The main job of sotorasib is to interfere with the binding of the guanine nucleotide exchange factor (GEF), which happens on the surface to its left and would otherwise reset KRAS to an active state.

    This approach worked surprisingly well, as the KRAS protein obligingly offered a cryptic nook that the chemists took advantage of to make this hybrid compound, now the drug sotorasib. It is an FDA-approved treatment for cancers that are specifically driven by this particular KRAS mutation, glycine to cysteine at position 12. That research group is currently trying to extend its method to other mutant forms, with modest success.

    So let's take a step back. This new treatment obviously requires the patient's tumor to be sequenced to figure out its molecular nature. That is pretty standard these days. And then, only a small fraction of patients will get the good news that this drug may help them. Lung cancers are currently the principal candidates (about 15% of them have this mutation), while only about 1-2% of other cancers do. The drug also has some toxicity: while it is a magic bullet, its magic is far from perfect (which is odd, given its exquisite selectivity for the mutated form of KRAS, which should exist only in cancer tissues). And lastly, it gives, on average, under six months of reprieve from cancer progression, compared with four and a half months with a more generic chemotherapy drug. As mentioned above, tumors at this stage are riven with other mutations, and they evolve resistance to this treatment with appalling relentlessness.

    While it is great to have developed a new class of drugs like this one against a very recalcitrant target, and done so on a highly rational basis driven by our growing molecular knowledge of cancer biology, this result seems like a bit of a let-down. Note also that this achievement required decades of publicly funded research, and doubtless a billion dollars or more of corporate investment, to get to this point. Costs are about twenty-five thousand dollars per patient, and overall sales are maybe two hundred million dollars per year, expected to increase steadily.

    Does this all make sense? I am not sure, but perhaps the important part is that things cannot get worse. The patent on this drug will eventually expire and its costs will come down. And the research community will keep looking for other, better ways to attack hard targets like KRAS, and will someday succeed.