
Sunday, July 23, 2023

Many Ways There Are to Read a Genome

New methods to unravel the code of transcriptional regulators.

When we deciphered the human genome, we came up with three billion letters of its linear code- nice and tidy. But that is not how it is read inside our cells. Sure, it is replicated linearly, but the DNA polymerases don't care about the sequence- they are not "reading" the book, they are merely copying machines trying to get it to the next generation with as few errors as possible. The book is read in an entirely different way, by a herd of proteins that recognize specific sequences of the DNA- the transcription regulators (also commonly called transcription factors [TF], in the classic scientific sense of some "factor" that one is looking for). These regulators- and there are, by one recent estimate, 1,639 of them encoded in the human genome- constitute an enormously complex network of proteins and RNAs that regulate each other, and regulate "downstream" genes that encode everything else in the cell. They are made in various proportions to specify each cell type, to conduct every step in development, and to respond to every eventuality that evolution has met and mastered over the eons.

Loops occur in the DNA between sites of regulator binding, in order to turn genes on (enhancer, E, and transcription regulator/factor, TF).

Once sufficient transcription regulators bind to a given gene, a transcription complex assembles at its start site, including the RNA polymerase that then generates an RNA copy that can float off to be made into a protein, (such as a transcription regulator), or perhaps function in its RNA form as part of a zoo of rRNA, tRNA, miRNA, piRNA, and many more that also help run the cell. Some regulators can repress transcription, and many cooperate with each other. There are also diverse regions of control for any given target gene in its nearby non-coding DNA- cassettes (called enhancers) that can be bound by different regulators and thus activated at different stages for different reasons.

These binding sites in the DNA that transcription regulators bind to are typically quite small. A classic regulator SP1 (itself 785 amino acids long and bearing three consecutive DNA binding motifs, each coordinated by a zinc ion) binds to a sequence resembling (G/T)GGGCGG(G/A)(G/A)(C/T). So only ten bases are specified at all, and four of those positions are degenerate. By chance, a genome of three billion bases will have such a sequence about 45,769 times. So this kind of binding is not very strictly specified, and such sites tend to appear and disappear frequently in evolution. That is one of the big secrets of evolution- while some changes are hard, others are easy, and there is constant variation and selection going on in the regulatory regions of genes, refining and defining where / when they are expressed.
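That expected count is easy to recompute. Here is a minimal sketch in Python, assuming equal base frequencies across the genome:

    # The SP1-like consensus (G/T)GGGCGG(G/A)(G/A)(C/T) fixes 6 positions
    # (probability 1/4 each) and allows 2 bases at 4 positions (probability 1/2 each).
    p_match = (0.25 ** 6) * (0.5 ** 4)        # = 1/65536 per genomic position
    genome_length = 3_000_000_000             # ~3 billion bases
    print(f"{genome_length * p_match:,.0f}")  # ~45,776 matches expected

The tiny difference from the figure quoted above just reflects the exact genome length assumed.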

Anyhow, researchers naturally have the question- what is the regulatory landscape of a given gene under some conditions of interest, or of an entire genome? What regulators bind, and which ones are most important? Can we understand, given our technical means, what is going on in a cell from our knowledge of transcription regulators? Can we read the genome like the cell itself does? Well, the answer to that is obviously no, and not yet. But there are some remarkable technical capabilities. For example, for any given regulator, scientists can determine where it binds all over the genome in any given cell, by chemical crosslinking methods. The prediction of binding sites for all known regulators has been a long-standing hobby as well, though given the sparseness of this code and the lability of the proteins/sites, one that gives only statistical, which is to say approximate, results. Also, scientists can determine across whole genomes where genes are "open" and active, vs where they are closed. Chromatin (DNA bound with histones in the nucleus) tends to be closed up on repressed and inactive genes, while transcription regulators start their work by opening chromatin to make it accessible to other regulators, on active genes.

This last method offers the prospect of truly global analysis, and was the focus of a recent paper. The idea was to merge a detailed library of predicted binding sites for all known regulators all over the genome with experimental mapping of open chromatin regions in a particular cell or tissue of interest. And then combine all that with existing knowledge about what each of the target genes near the predicted binding sites does. The researchers clustered the putative regulators binding across all open regions by this functional gene annotation to come up with statistically over-represented transcription regulators and functions. This is part of a movement across bioinformatics to fold in more sources of data to improve predictions when individual methods each produce sketchy, unsatisfying results.
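The paper's actual pipeline is more involved, but the core statistical move- asking whether a regulator's predicted sites are over-represented among open regions near genes of one functional class- can be sketched with a hypergeometric test. The numbers below are invented for illustration:

    from scipy.stats import hypergeom

    # Suppose 20,000 open-chromatin regions genome-wide, 1,200 of which carry
    # a predicted site for regulator X. Of the 500 regions near "immune
    # response" genes, 90 carry such a site. Is that more than chance?
    M, n = 20_000, 1_200   # all open regions; those with a site for X
    N, k = 500, 90         # regions near immune genes; those also with a site

    p = hypergeom.sf(k - 1, M, n, N)   # P(at least 90 such regions by chance)
    print(f"p = {p:.2e}")              # a tiny p suggests X is enriched there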

In this case, mapping open chromatin by itself is not very helpful, but becomes much more helpful when combined with assessments of which genes these open regions are close to, and what those genes do. This kind of analysis can quickly determine whether you are looking at an immune cell or a neuron, as the open chromatin is a snapshot of all the active genes at a particular moment. In this recent work, the analysis was extended to say that if some regulator is consistently bound near genes participating in some key cellular function, then we can surmise that that regulator may be causal for that cell type, or at least part of the program specific to that cell. The point for these researchers is that this multi-source analysis performs better in finding cell-type specific, and function-specific, regulators than the more common approach of just adding up the prevalence of regulators occupying open chromatin all over a given genome, regardless of the local gene functions. That kind of approach tends to yield common regulators, rather than cell-type specific ones.

To validate, they do rather half-hearted comparisons with other pre-existing techniques, without blinding, and with validation of only their own results. So it is hardly a fair comparison. They look at the condition systemic lupus erythematosus (SLE), and find different predictions coming from their current technique (called WhichTF) vs one prior method (MEME-ChIP). MEME-ChIP just finds predicted regulator binding sites for genomic regions (i.e. open chromatin regions) given by the experimenter, and will do a statistical analysis for prevalence, regardless of the functions of either the regulator or the genes it binds to. So you get absolute prevalence of each regulator in open (active) regions vs the genome as a whole.

Different regulators are identified from the same data by different statistical methods. But both sets are relevant.


What to make of these results? The MEME-ChIP method finds regulators like SP1, SP2, SP4, and ZFX/Y. SP1 et al. are very common regulators, but that doesn't mean they are unimportant, or not involved in disease processes. SP1 has been implicated in experimental autoimmune encephalomyelitis in mice, a model of multiple sclerosis, and naturally not so far from lupus in pathology. ZFX is also a prominent regulator in the progenitor cells of the immune system. So while these authors think little of the competing methods, those methods seem to do a very good job of identifying significant regulators, as do their own methods.

There is another problem with the authors' WhichTF method, which is that gene annotation is in its infancy. Users are unlikely to find new functions using existing annotations. Many genes have no known function yet, and new functions are being found all the time for those already assigned functions. So if one's goal is classification of a cell or of transcription regulators according to existing schemes, this method is fine. But if one's research goal is to find new cell types, or new processes, this method will channel you into existing ones instead.

This kind of statistical refinement is unlikely to give us what we seek in any case- a strong predictive model of how the human genome is read and activated by the herd of gene regulators. For that, we will need new methods for specific interaction detection, with a better appreciation for complexes between different regulators, (which will be afforded by the new AI-driven structural techniques), and more appreciation for the many other operators on chromatin, like the various histone modifying enzymes that generate another whole code of locks and keys that do the detailed regulation of chromatin accessibility. Reading the genome is likely to be a somewhat stochastic process, but we have not yet arrived at the right level of detail, or the right statistics, to do it justice.


  • Unconscious messaging and control. How the dark side operates.
  • Solzhenitsyn on evil.
  • Come watch a little Russian TV.
  • "Ruthless beekeeping practices"
  • The medical literature is a disaster.

Saturday, June 10, 2023

A Hard Road to a Cancer Drug

The long and winding story of the oncogene KRAS and its new drug, sotorasib.

After half a century of the "War on Cancer", new treatments are finally straggling into the clinic. It has been an extremely hard and frustrating road to study cancer, let alone treat it. We have learned amazing things, but mostly we have learned how convoluted a few billion years of evolution can make things. The regulatory landscape within our cells is undoubtedly the equal of any recalcitrant bureaucracy, full of redundant offices, multiple veto points, and stakeholders with obscure agendas. I recently watched a seminar in the field, which discussed one of the major genes mutated in cancer and what it has taken to develop a treatment against it. 

Cancer is caused by DNA mutations, and several different types need to occur in succession. There are driver mutations, which are the first step in the loss of normal cellular control. But additional mutations have to happen for such cells to progress through regulatory blocks, like escape from local environmental controls on cell type and cell division, past surveillance by the immune system, and past the reluctance of differentiated cells to migrate away from their resident organ. By the end, cancer cells typically have huge numbers of mutations, having incurred mutations in their DNA repair machinery in an adaptive effort to evade all these different controls.

While this means that many different targets exist that can treat some cancers, it also means that any single cancer requires a precisely tailored treatment, specific to its mutated genes. And that resistance is virtually inevitable given the highly mutable nature of these cells. 

One of the most common genes to be mutated to drive cancer (in roughly 20% of all cases) is KRAS, part of the RAS family of NRAS, KRAS, and HRAS. These were originally discovered through viruses that cause cancer in rats. These viruses (such as Kirsten rat sarcoma virus) carried a copy of a rat gene, which they overproduce and use to overcome normal proliferation controls during infection. The viral gene was called an oncogene, and the original rat (or human) version was called a proto-oncogene, named KRAS. The RAS proteins occupy a central part of the signaling path that external events and stresses turn on to activate cell growth and proliferation, called the MAP kinase cascade. For instance, epidermal growth factor comes along in the blood, binds to a receptor on the outside of a cell, and turns on RAS, then RAF, MEK, MAPK, and finally transcription regulators that turn on genes in the nucleus, resulting in new proteins being expressed. "Turning on" means different things at each step in this cascade. The transcription regulators typically get phosphorylated by their upstream kinases like MAPK, which tags them for physical transport into the nucleus, where they can then activate genes. MAPK is turned on by being itself phosphorylated by MEK, and MEK is phosphorylated by RAF. RAF is turned on by binding to RAS, whose activity in turn is set by the nucleotide it binds: when binding GTP, RAS is on, but when binding GDP, it is off.

A schematic of the RAS pathway, whereby extracellular growth signals are interpreted and amplified inside our cells, resulting in new gene expression as well as other more immediate effects. The cell surface receptor, activated by its ligand, activates associated SOS which activates RAS to the active (GTP) state. This leads to a kinase cascade through RAF, MEK, and MAPK and finally to gene regulators like MYC.

This whole system seems rather ornate, but it accomplishes one important thing, which is amplification. One turned-on RAF molecule or MEK molecule can turn on / phosphorylate many targets, so this cascade, though it appears linear in a diagram, is actually a chain reaction of sorts, amplifying as it goes along. And what governs the state of RAS and its bound GTP? The state of the EGFR receptor, of course. When KRAS is activated, the resident GDP leaves, and GTP comes to take its place. RAS is a weak GTPase enzyme itself, slowly converting itself from the active back to the inactive state with GDP.

Given all this, one would think that RAS, and KRAS in particular, might be "druggable", by sticking some well-designed molecule into the GTP/GDP binding pocket and freezing it in an inactive state. But the sad fact of the matter is that the affinity KRAS has for GTP is incredibly high- so high it is hard to measure, with a binding constant of about 20 pM. That is, half the KRAS-bound GTP comes off only when the ambient concentration of GTP is infinitesimal, 0.02 nanomolar. This means that nothing else is likely to be designed that can displace GTP or GDP from the KRAS protein, which means that in traditional terms, it is "undruggable". What is the biological logic of this? Well, it turns out that the RAS enzymes are managed by yet other proteins, which have the specific roles of prying GDP off (guanine nucleotide exchange factor, or GEF) and of activating the GTPase activity of RAS to convert GTP to GDP (GTPase activating protein, or GAP). It is the GEF protein that is stimulated by receptors like EGFR to induce RAS activity.
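In standard equilibrium terms (a textbook relation, not anything specific to this seminar), occupancy of the nucleotide site follows:

$$ f_{bound} = \frac{[\mathrm{GTP}]}{[\mathrm{GTP}] + K_d} $$

With $K_d \approx 20$ pM and cellular GTP concentrations millions of times higher than that, occupancy is indistinguishable from 100%, which is why competing at this site is hopeless.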

So we have to be cleverer in finding ways to attack this protein. Incidentally, most of the oncogenic mutations of KRAS are at the twelfth residue, glycine, which occupies a key part of the GAP binding site. As glycine is the smallest amino acid, any other amino acid here is bulkier, and blocks GAP binding, which means that KRAS with any of these mutations can not be turned off. It just keeps on signaling and signaling, driving the cell to think it needs to grow all the time. This property of gain of function and the ability of any mutation to fit the bill is why this particular defect in KRAS is such a common cancer-driving mutation. It accounts for ~90% of pancreatic cancers, for instance. 
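To make the switch logic concrete, here is a toy sketch in Python- my own simplification, not anything from the seminar- of how GEF, GAP, and the position 12 mutation interact:

    # Toy model of the RAS molecular switch.
    class RAS:
        def __init__(self, mutant_g12=False):
            self.bound = "GDP"            # GDP-bound = off, GTP-bound = on
            self.mutant_g12 = mutant_g12  # bulky residue 12 blocks GAP binding

        def gef(self):                    # receptor-stimulated exchange factor acts
            self.bound = "GTP"            # GDP leaves, abundant GTP replaces it

        def gap(self):                    # GTPase activating protein acts
            if self.mutant_g12:
                return                    # GAP cannot bind- the signal stays on
            self.bound = "GDP"            # GTP hydrolyzed, switch resets to off

    for ras in (RAS(), RAS(mutant_g12=True)):
        ras.gef(); ras.gap()
        print(ras.mutant_g12, ras.bound)  # wild-type ends at GDP; mutant stuck on GTP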

The seminar went on a long tangent, which occupied the field (of those looking for ways to inhibit KRAS with drugs) for roughly a decade. RAS proteins are not intrinsically membrane proteins, but they are covalently modified with a farnesyl fatty tail, which keeps them stuck in the cell's plasma membrane. Indeed, if this modification is prevented, RAS proteins don't work. So great- how to prevent that? Several groups developed inhibitors of the farnesyl transferase enzyme that carries out this modification. The inhibitors worked great, since the farnesyl transferase has a nice big pocket for its large substrate to bind, and doesn't bind it too tightly. But they didn't inhibit the RAS proteins, because there was a backup system- geranylgeranyl transferase steps into the breach, and can attach an even bigger fatty tail to RAS proteins. Arghhh!

While some are working on inhibiting both enzymes, the presenter, Kevan Shokat of UCSF, went in another direction. As a chemist, he figured that for the fraction of KRAS mutants at position 12 that transform from glycine to cysteine, some very specific chemistry (that is, easy methods of cross-linking) can be brought to bear. Given the nature of the genetic code, the fraction of mutations that go from glycine to cysteine is small, there being six amino acids within a one-base change of glycine, coded by GGT. So at best, this approach is going to have a modest impact. Nevertheless, there was little choice, so they forged ahead with a complicated chemical scheme to make a small molecule that could chemically crosslink to that cysteine, with selectivity determined by a modest shape fit to the surface of the KRAS protein near this GEF binding site.

A structural model of KRAS, with its extremely tightly-bound substrate GDP in orange. The drug sotorasib is below in teal, bound in another pocket, with a tail extending upwards to the (mutant) cysteine 12, which is not differentiated by color, but sits over a magnesium ion (green) being coordinated by GDP. The main job of sotorasib is to interfere with the binding of the guanine nucleotide exchange factor (GEF), which happens on the surface to its left and would reset KRAS to an active state.

This approach worked surprisingly well, as the KRAS protein obligingly offered a cryptic nook that the chemists took advantage of to make this hybrid compound, now called the drug sotorasib. This is an FDA-approved treatment for cancers which are specifically driven by this particular KRAS mutation of position 12 from glycine to cysteine. That research group is currently trying to extend their method to other mutant forms, with modest success.

So let's take a step back. This new treatment requires, obviously, the patient's tumor to be sequenced to figure out its molecular nature. That is pretty standard these days. And then, only a small fraction of patients will get the good news that this drug may help them. Lung cancers are the principal candidates currently, (of which about 15% have this mutation), while only about 1-2% of other cancers have this mutation. This drug has some toxicity- while it is a magic bullet, its magic is far from perfect, (which is odd given the exquisite selectivity it has for the mutated form of KRAS, which should only exist in cancer tissues). And lastly, it gives, on average, under six months of reprieve from cancer progression, compared to four and a half months with a more generic drug. As mentioned above, tumors at this stage are riven with other mutations and evolve resistance to this treatment with appalling relentlessness.

While it is great to have developed a new class of drugs like this one against a very recalcitrant target, and done so on a highly rational basis driven by our growing molecular knowledge of cancer biology, this result seems like a bit of a let-down. And note also that this achievement required decades of publicly funded research, and doubtless a billion dollars or more of corporate investment to get to this point. Costs are about twenty-five thousand dollars per patient, and overall sales are maybe two hundred million dollars per year, expected to increase steadily.

Does this all make sense? I am not sure, but perhaps the important part is that things can not get worse. The patent on this drug will eventually expire and its costs will come down. And the research community will keep looking for other, better ways to attack hard targets like KRAS, and will someday succeed.


Saturday, May 27, 2023

Where Does Oxygen Come From?

Boring into the photosynthetic reaction center of plants, where O2 is synthesized.

Oxygen might be important to us, but it is really just a waste product. Photosynthetic bacteria found that the crucial organic molecules of life that they were making out of CO2 and storing in the form of reduced compounds (like fats and sugars) had to get those reducing units (i.e. electrons) from somewhere. And water stepped up as a likely candidate, with its abundance and simplicity. After you take four electrons away from two water molecules, you are left with four protons and one molecule of oxygen, i.e. O2. The protons are useful to fuel the proton-motive force system across the photosynthetic membrane, making ATP. But what to do with the oxygen? It just bubbles away, but can also be used later in metabolism to burn up those high-energy molecules again, if you have evolved aerobic metabolism.
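Written out, the bookkeeping is simply:

$$ 2\,\mathrm{H_2O} \longrightarrow \mathrm{O_2} + 4\,\mathrm{H^+} + 4\,e^- $$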

On the early earth, reductants like reduced forms of iron and sulfur were pretty common, so they were the original sources of electrons for all metabolism. Indeed, most theories of the origin of life place it in dynamic rocky redox environments like hydrothermal vents that had such conducive chemistry. But these compounds are not quite common enough for universal photosynthesis. For example, a photosynthetic bacterium floating at the top of the ocean would like to continue basking in the sun and metabolizing, even if the water around it is relatively clear of reduced iron, perhaps because of competition from its colleagues. What to do? The cyanobacteria came up with an amazing solution- split water!

A general overview of plant and cyanobacterial photosystems, comprising the first (PSII), where the first light quantum hits and oxygen is split, an intervening electron transport chain where energy is harvested, and the second (PSI), where a second light quantum hits, more energy is harvested, and the electron ends up added to NADP. From the original water molecules, protons are used to power the membrane proton-motive force and ATP synthesis, while the electrons are used to reduce CO2 and create organic chemicals.

A schematic of the P680 center of photosystem II. Green chlorophylls are at the center, with magnesium atoms (yellow). Light induces electron movement as denoted by the red arrows, out of the chlorophyll center and onwards to other cytochrome molecules. Note that the electrons originate at the bottom out of the oxygen evolving complex, or OEC, (purple), and are transferred via an aromatic tyrosine (TyrZ) side chain, coordinating with a nearby histidine (H189) protein side chain.

This is not very easy, however, since oxygen is highly, even notoriously "electronegative". That is, it likes and keeps its electrons. It takes a super-oxidant to strip those electrons off. Cyanobacteria came up with what is now called photosystem II (that is, it was discovered after photosystem I), which collects light through a large quantum antenna of chlorophyll molecules, ending up at a special pairing of chlorophyll molecules called P680. These collect the photon, and in response bump an electron up in energy and out to an electron chain that courses through the rest of the photosynthetic system, including photosystem I. At this point, P680 is hungry for an electron, indeed has the extreme oxidation potential needed to take electrons from oxygen. And one is conducted in from the oxygen evolving center (OEC), sitting nearby.

A schematic illustrating both the evolutionary convergence that put both photosystems (types I and II) into one organism (cyanobacteria, which later become plant chloroplasts), and the energy levels acquired by the main actors in the photosynthetic process, quoted in electron volts. At the very bottom (center) is a brief downward slide as oxygen is split by the pulling force of the super-oxidation state of light-activated P680. After the electrons are light-excited, they drop down in orderly fashion through a series of electron chain transits to various cytochromes, quinones, ferredoxins, and other carriers that generate either protons or chemical reducing power as they go along. Note how the depth of the oxygen-splitting oxidation state is unique among photosynthetic systems.

A recent paper resolves the long-standing problem of how exactly water is oxidized to O2 by cyanobacteria and plants at the OEC, at the very last step before oxygen release. This center is a very strained cubic metal complex of one calcium and four manganese atoms, coordinated by oxygen atoms. The overall process is that two water molecules come in, four protons and four electrons are stripped off, and the remaining oxygens combine to form O2. This is, again, part of the grand process of metabolism, whose point is to add those electrons and protons to CO2, making the organic molecules of life, generally characterized as (-CH2-), such as fats, sugars, etc. Which can be burned later back into CO2. Metals are common throughout organic chemistry as catalysts, because they have a wonderful property of de-localizing electrons and allowing multiple oxidation states, (number of extra or missing electrons), unlike the more sparse and tightly-held states of the smaller elements. So they are used in many redox cofactors and enzymes to facilitate electron movement, such as in chlorophyll itself.


The authors provide a schematic of the manganese-calcium OEC reaction center. The transferring tyrosine is at top, calcium is in fuchsia/violet, the manganese atoms are in purple, and the oxygens are in red. Arrows point to the oxygens destined to bond to each other and "evolve" away as O2. Note how one of these (O6) is only singly-coordinated and is sort of awkwardly wedged into the cube. Note also how the bond lengths to calcium are all longer than those to manganese, further straining the cube. These strains help to encourage activation and expulsion of the target oxygens.

Here, in the oxygen evolving center, the manganese atoms are coordinated all around with oxygens, which presents the question- which ones are the ones? Which are destined to become O2, and how does the process happen? These researchers didn't use complicated femtosecond X-ray systems or synchrotrons, (though they draw on the structural work of those who did), but room-temperature FTIR, which is infrared spectroscopy highly sensitive to organic chemical dynamics. Spinach leaf chloroplasts were put through an hour of dark adaptation, (which sets the OEC cycle to state S1), then hit with flashes of laser light to advance the position of the oxygen evolving cycle, since each flash (5 nanoseconds) induces one electron ejection by P680, and one electron transfer out of the OEC. Thus the experimenters could control the progression of the whole cycle, one step at a time, and then take extremely close FTIR measurements of the complexes as they do their thing in response to each single electron ejection. Some of the processes they observed were very fast (20 nanoseconds), but others were pretty slow, up to 1.5 milliseconds for the S4 state to eject the final O2 and reset to the S0 state with new water molecules. They then supplement their spectroscopy with the structural work from others and with computer dynamics simulations of the core process to come up with a full mechanism.
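A toy sketch of that flash protocol, keeping only the one-electron-per-flash logic from the text (real flash experiments have misses and other inefficiencies that are ignored here):

    # Dark adaptation parks the OEC in S1; each flash extracts one electron;
    # S4 is transient, ejecting O2 and resetting to S0 with fresh waters.
    state = 1                              # S1 after dark adaptation
    for flash in range(1, 9):
        state += 1                         # one electron out per flash
        evolved_o2 = (state == 4)          # S4 releases O2 spontaneously...
        if evolved_o2:
            state = 0                      # ...and resets to S0
        print(f"flash {flash}: S{state}" + ("  O2!" if evolved_o2 else ""))

Run it, and O2 appears on the third flash and every fourth flash thereafter- the classic period-four pattern of flash-induced oxygen evolution.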


A schematic of the steps of oxygen evolution out of the manganese core complex, from states S0 to S4. Note the highly diverse times that elapse at the various steps, noted in nanoseconds, microseconds, or milliseconds. This is discussed further in the text.


Other workers have provided structural perspectives on this question, showing that the cubic metal structure is a bit weirder than expected. An extra oxygen (numbered as #6) wedges its way into the cube, making the already strained structure (which accommodates a calcium and a dangling extra manganese atom) highly stressed. This is a complicated story, so several figures are provided here to give various perspectives. The sequence of events is that first, (S0), two waters enter the reaction center after the prior O2 molecule has left. Water has a mix of acid (H+) and base (OH-) ionic forms, so it is easy to bring in the hydroxyl form instead of complete water, with the matching protons quickly entering the proton pool for ATP production. Then another proton quickly leaves as well, so the waters have now become two oxygens, one hydrogen, and four electrons (S0). Two of the coordinated manganese atoms go from their prior +4, +4 oxidation state to +3 and +2, acting as electron buffers.

The first two electrons are pulled out rapidly, via the nearby tyrosine ring, and off to the P680 center (ending at S2, with Mn 3+ and Mn 4+). But the next steps are much slower, extricating the last two electrons from the oxygens and inducing them to bond to each other. With state S3 and one more electron removed, both manganese atoms are back to the 4+ state. In the last step, one last proton leaves and one last electron is extracted over to the tyrosine oxygen, leaving oxygen 6 so bereft as to be in a radical state, which allows it to bow over to oxygen 5 and bond with it, making O2. The metal complex has nicely buffered the oxidation states to allow these extractions to go much more easily and in a more coordinated fashion than could happen in free solution.

The authors provide a set of snapshots of their infrared spectroscopy-supported simulations (done with chemical and quantum fidelity) of the final steps, where oxygens, in the bottom panel, bond together at center. Note how the atomic positions and hydrogen attachments also change subtly as the sequence progresses. Here the manganese atoms are salmon, oxygen red, calcium yellow, hydrogen white, and a chloride ion is green.

This closely optimized and efficient reaction system is not just a wonder of biology and of earth history, but an object lesson in chemical technology, since photolysis of water is a very relevant dream for a sustainable energy future- to efficiently provide hydrogen as a fuel. Currently, using solar power to run water electrolyzers is not very efficient (20% for solar, and 70% for electrolysis = 14% overall). Work is ongoing to design direct light-to-hydrogen photolysis, but so far it requires high heat and noxious chemicals. Life has all this worked out at the nano scale already, however, so there must be hope for better methods.


  • The US carried off an amazing economic success during the pandemic, keeping everything afloat as 22 million jobs were lost. This was well worth a bit of inflation on the back end.
  • Death
  • Have we at long last hit peak gasoline?
  • The housing crisis and local control.
  • The South has always been the problem.
  • The next real estate meltdown.

Saturday, May 6, 2023

The Development of Metamorphosis

Adulting as a fly involves a lot of re-organization.

Humans undergo a slight metamorphosis, during adolescence. Imagine undergoing pupation like insects do and coming out with a totally new body, with wings! Well, Kafka did, and it wasn't very pleasant. But insects do it all the time, and have been doing it for hundreds of millions of years, taking to the air and dominating the biosphere. What goes on during metamorphosis, how complete is its refashioning of the body, and how did it evolve? A recent paper (review) considered in detail how the brains of insects change during metamorphosis, finding a curious blend of birth, destruction, and reprogramming among their neurons.

Time is on the Y axis, and the emergence of later, more advanced types of insects is on the X axis. This shows the progressive elaboration of non-metamorphosis (ametabolous), partially metamorphosing (hemimetabolous), and fully metamorphosing (holometabolous) forms. Dragonflies are only partially metamorphosing in this scheme, though their adult forms are often highly different from their larval (nymph) form.


Insects evolved from crustaceans, and took to land as small silverfish-like creatures with exoskeletons, roughly 450 million years ago. Over the next 100 million years, they developed the process of metamorphosis as a way to preserve the benefits of their original lifestyle for early development, in moist locations, while conquering the air and distance as adults. Early insect types are termed ametabolous, meaning that they have no metamorphosis at all, developing straight from eggs to an adult-style form. These go through several molts to accommodate growth, but don't redesign their bodies. Next came hemimetabolous development, which is exemplified by grasshoppers and cockroaches, and also by dragonflies, which significantly refashion themselves during the last molt, gaining wings. In the nymph stage, those wings were carried around as small patches of flat embryonic tissue, and then suddenly grow out at the last molt. Dragonflies are extreme, and most hemimetabolous insects don't undergo such dramatic change. Last came holometabolous development, which involves pupation and a total redesign of the body that can go from a caterpillar to a butterfly.

The benefit of having wings is pretty clear- it allows huge increases in range for feeding and mating. Dragonflies are premier flying predators. But as a larva, wallowing in fruit juice or leaf sap or underwater, as dragonflies are, wings and long legs would be a hindrance. This conundrum led to the innovation of metamorphosis, based on the already somewhat dramatic practice of molting off the exoskeleton periodically. If one can grow a whole new skeleton, why not put wings on it, or legs? And metamorphosis has been tremendously successful, used by over 98% of insect species.

The adult insect tissues do not come from nowhere- they are set up as arrested embryonic tissues called imaginal discs. These are small patches that exist in the larva at specific positions. During pupation, while much of the rest of the body refashions itself, imaginal discs rapidly develop into future tissues like wings, legs, genitalia, antennas, and new mouth parts. These discs have a fascinating internal structure that prefigures the future organ. The leg disc is concentrically arranged with the more distant future parts (toes) at its center. Transplanting a disc from one insect to another or one place to another doesn't change its trajectory- it will still become a leg wherever it is put. So it is apparent that the larval stage is an intermediate stage of organismal development, where a bunch of adult features are primed but put on hold, while a simpler and much more primitive larval body plan is executed to accommodate its role in early growth and its niche in tight, moist, hidden places.

The new paper focuses on the brain, which larvae need as well as adults. So the question is- how does the one brain develop from the other? Is the larval brain thrown away? The answer is no- the brain is not thrown away at all, but undergoes its own quite dramatic metamorphosis. The adult brain is substantially bigger, so many neurons are added. A few neurons are also killed off. But most of the larval neurons are reprogrammed, trimmed back and regrown out to new regions to do new functions.

In this figure, the neurons are named as mushroom body output neuron (MBON) or dopaminergic neuron (DAN, also MBIN for incoming mushroom body neuron), mushroom body extrinsic neuron to calyx (MBE-CA), and mushroom body protocerebral posterior lateral 1 (PPL1). MBON-c1 is totally reprogrammed to serve new locations in the adult, MBON-d1 changes its projections substantially, as do the (teal) incoming neurons, and MBON-12 was not operational in the larval stage at all.

The mushroom body, which is the brain area these authors focus on, is situated below the antennas and mediates smell reception, learning, and memory. Fly biologists regard it as analogous to our cortex- the most flexible area of the brain. Larvae don't have antennas, so their smell/taste reception is a lot more primitive. The mushroom body in Drosophila has about a hundred neurons at first, and continuously adds neurons over larval life, with a big push during pupation, ending up with ~2200 neurons in adults. Obviously this has to wire into the antennas as they develop, for instance.

The authors find that, for instance, no direct connections between input and output neurons of the mushroom body (MBIN and MBON, respectively) survive from larval to adult stages. Thus there can be no simple memories of this kind preserved between these life stages. While there are some signs of memory retention for a few things in flies, for the most part the slate is wiped clean. 

"These MBONs [making feedback connections] are more highly interconnected in their adult configuration compared to their larval one: their adult configuration shows 13 connections (31% of possible connections), while their larval configuration has only 7 (17%). Importantly, only three of these connections (7%) are present in both larva and adult. This percentage is similar to the 5% predicted if the two stages were wired up independently at their respective frequencies."


Interestingly, no neuron changed its type- that is, which neurotransmitter it uses to communicate. So, while pruning and rewiring were pervasive, the cells did not fundamentally change their stripes. All this is driven by the hormonal system (juvenile hormone, which blocks adult development, and ecdysone, which drives molting- and, in the absence of juvenile hormone, pupation), which in turn drives a program of transcription factors that direct the genes needed for development. While a great deal is known about neuronal pathfinding and development, this paper doesn't comment on those downstream events- how it is that selected neurons are pruned, turned around, and induced to branch out in totally new directions, for instance. That will be the topic of future work.


  • Corrupt business practices. Why is this lawful?
  • Why such easy bankruptcy for corporations, but not for poor countries?
  • Watch the world's mesmerizing shipping.
  • Oh, you want that? Let me jack up the price for you.
  • What transgender is like.
  • "China has arguably been the biggest beneficiary of the U.S. security system in Asia, which ensured the regional stability that made possible the income-boosting flows of trade and investment that propelled the country’s economic miracle. Today, however, General Secretary of the Chinese Communist Party Xi Jinping claims that China’s model of modernization is an alternative to “Westernization,” not a prime example of its benefits."

Saturday, April 8, 2023

Molecules That See

Being trans is OK: retinal and the first event of vision.

Our vision is incredible. If I were not looking right now and experiencing it myself, it would be unbelievable that a biological system made up of motley molecules could accomplish the speed, acuity and color that our visual system provides. It was certainly a sticking point for creationists, who found (and perhaps still find) it incredible that nature alone can explain it, not to mention its genesis out of the mists of evolutionary time. But science has been plugging away, filling in the details of the pathway, which so far appear to arise by natural means. Where consciousness fits in has yet to be figured out, but everything else is increasingly well accounted for.

It all starts in the eye, which has a curiously backward sheet of tissue at the back- the retina. Its nerves and blood vessels are on the surface, and after light gets through those, it hits the photoreceptor cells at the rear. These photoreceptor cells come in two types, rods (non-color sensitive) and cones (sensitive to either red, green, or blue). The photoreceptor cells have a highly polarized and complicated structure, where photosensitive pigments are bottom-most in a dense stack of membranes. Above these is a segment where the mitochondria reside, which provide power, as vision needs a lot of energy. Above these is the nucleus of the cell (the brains of the operation) and top-most is the synaptic output to the rest of the nervous system- to those nerves that network on the outside of the retina. 

A single photoreceptor cell, with the outer segment at the very back of the retina, and other elements in front.

Facing the photoreceptor membranes at the bottom of the retina is the retinal pigment epithelium, which is black with melanin. This is where light finally stops, and it also has very important functions in supporting the photoreceptor cells by buffering their ionic, metabolic, and immune environment, and phagocytosing and digesting photoreceptor membranes as they get photo-oxidized, damaged, and sloughed off. Finally, inside the photoreceptor cells are the pigment membranes, which harbor the photo-sensitive protein rhodopsin, which in turn hosts the sensing pigment, retinal. Retinal is a vitamin A-derived long-chain molecule that is bound inside rhodopsin or within the other opsins, which confer slightly shifted color sensitivities.

These opsins transform the tickle that retinal receives from a photon into a conformational change that they, as GPCRs (G-protein coupled receptors), transmit to their G-protein, called transducin. For each photon coming in, about 50 transducin molecules are activated. Each activated transducin alpha subunit then induces its target, cGMP phosphodiesterase, to consume about 1000 cGMP molecules. The local drop in cGMP concentration then closes the cGMP-gated cation channels in the photoreceptor cell membrane, which starts the electrical impulse that travels out to the synapse and nervous system. This amplification series, along with the high density of the retinal/opsin molecules packed into the photoreceptor membranes, provides the exquisite sensitivity that allows single photons to be detected by the system.
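Multiplying out the rough numbers above shows the gain from this two-stage cascade:

    photons = 1
    transducins = photons * 50             # ~50 transducins activated per photon
    cgmp_consumed = transducins * 1000     # ~1000 cGMP consumed per phosphodiesterase
    print(cgmp_consumed)                   # ~50,000 cGMP molecules per photon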

Retinal, used in all photoreceptor cell types. Light causes the cis-form to kick over to the trans form, which is more stable.

The central position of retinal has long been understood, as has the key transition that a photon induces, from cis-retinal to all-trans retinal. Cis-retinal has a kink in the middle, where its double bond in the center of the fatty chain forms a "C" instead of a "W", swinging around the 3-carbon end of the chain. All-trans retinal is a sort of default state, while the cis-structure is the "cocked" state- stable but susceptible to triggering by light. Interestingly, retinal can not be reset to the cis-state while still in the opsin protein. It has to be extracted, sent off to a series of at least three different enzymes to be re-cocked. It is alarming, really, to consider the complexity of all this.

A recent paper (review) provided the first look at what actually happens to retinal at the moment of activation. This is, understandably, a very fast process, and femtosecond x-ray analysis needed to be brought in to look at it. Not only that, but as described above, once retinal flips from the dark to the light-activated state, it never reverses by itself. So every molecule or crystal used in the analysis can only be used once- no second looks are possible. The authors used a spray-crystallography system where protein crystals suspended in liquid were shot into a super-fine and fast X-ray beam, just after passing by an optical laser that activated the retinal. Computers are now helpful enough that the diffractions from these passing crystals, thrown off in all directions, can be usefully collected. In the past, crystals were painstakingly positioned on goniometers at the center of large detectors, and other issues predominated, such as how to keep such crystals cold for chemical stability. The question here was what happens in the femto- and pico-seconds after optical light absorption by retinal, ensconced in its (temporary) rhodopsin protein home.

Soon after activation, at one picosecond, retinal has squirmed around, altering many contacts with its protein. The cis (dark) conformation is shown in red, while the just-activated form is in yellow. The PSB site on the far end of the fatty chain (right) is secured against the rhodopsin host, as is the retinal ring (left side), leaving the middle of the molecule to convey most of the shape change, a bit like a bicycle pedal.

And what happens? As expected, the retinal molecule twists from cis to trans, causing the protein contacts to shift. The retinal shift happens by 200 femtoseconds, and the knock-on effects through the protein are finished by 100 picoseconds. It all makes a nanosecond seem impossibly long! As imaged above, the shape shift of retinal changes a series of contacts it has with the rhodopsin protein, inducing it to change shape as well. The two ends of the retinal molecule seem to be relatively tacked down, leaving the middle, where the shape change happens, to do most of the work. 

"One picosecond after light activation, rhodopsin has reached the red-shifted Batho-Rh intermediate. Already by this early stage of activation, the twisted retinal is freed from many of its interactions with the binding pocket while structural perturbations radiate away as a transient anisotropic breathing motion that is almost entirely decayed by 100 ps. Other subtle and transient structural rearrangements within the protein arise in important regions for GPCR activation and bear similarities to those observed by TR-SFX during photoactivation of seven-TM helix retinal-binding proteins from bacteria and archaea."

All this speed is naturally lost in the later phases, which take many milliseconds to send signals to the brain, discern movement and shape, identify objects in the scene, and do all the other processing needed before consciousness can make any sense of it. But it is nice to know how elegant and uniform the opening scene in this drama is.


  • Down with lead.
  • Medicare advantage, cont.
  • Ukraine, cont.
  • What the heck is going on in Wisconsin?
  • Graph of the week- world power needs from solar, modeled to 2050. We are only scratching the surface so far.



Saturday, March 11, 2023

An Origin Story for Spider Venom

Phylogenetic analysis shows that the major component of spider venom derives from one ancient ancestor.

One reason why biologists are so fully committed to the Darwinian account of natural selection and evolution is that it keeps explaining and organizing what we see. Despite the almost incredible diversity and complexity of life, every close look keeps confirming what Darwin sensed and outlined so long ago. In the modern era, biology has gone through the "Modern Synthesis", bringing genetics, molecular biology, and evolutionary theory into alignment with mutually supporting data and theories. For example, it was Linus Pauling and colleagues (after they lost the race to determine the structure of DNA) who proposed that the composition of proteins (hemoglobin, in their case) could be used to estimate evolutionary relationships, both among those molecules, and among their host species.

Naturally, these methods have become vastly more powerful, to the point that most phylogenetic analyses of the relationship between species (including the definition of what species are, vs subspecies, hybrids, etc.) are led these days by DNA analysis, which provides the richest possible trove of differentiating characters- a vast spectrum from universally conserved to highly (and forensically) varying. And, naturally, it also constitutes a record of the mutational steps that make up the evolutionary process. The correlation of such analyses with other traditionally used diagnostic characters, and with the paleontological record, is a huge area of productive science, which leads, again and again, to new revelations about life's history.


One sample structure of a DRP- the disulfide-rich protein that makes up most of spider venoms. The disulfide bond (between two cysteines) is shown in red. There is usually another disulfide helping to hold the two halves of the molecule together as well. The rest of the molecule is (evolutionarily, and structurally) free to change shape and character, in order to carry out its neuron-channel blocking or other toxic function.

One small example was published recently, in a study of spider venoms. Spiders arose, from current estimates, about 375 million years ago, and comprise the second most prevalent form of animal life, second only to their cousins, the insects. They generally have a hunting lifestyle, using venom to immobilize their prey, after capture and before digestion. These venoms are highly complex brews that can have over a hundred distinct molecules, including potassium, acids, tissue- and membrane-digesting enzymes, nucleosides, pore-forming peptides, and neurotoxins. At over three-fourths of the venom, the protein-based neurotoxins are the most interesting and best studied of the venom components, and a spider typically deploys dozens of types in its venom. They are also called cysteine-rich peptides or disulfide-rich peptides (DRPs) due to their composition. The fact that spiders tend to each have a large variety of these DRPs in their collection argues that a lot of gene duplication and diversification has occurred.

A general phylogenetic tree of spiders (left). On the right are the signal peptides of a variety of venoms from some of these species. The identity of many of these signal sequences, which are not present in the final active protein, is a sign that these venom genes were recently duplicated.

So where do they come from? Sequences of the peptides themselves are of limited assistance, being small, (averaging ~60 amino acids), and under extensive selection to diversify. But they are processed from larger proteins (pro-proteins) and genes that show better conservation, providing the present authors more material for their evolutionary studies. The figure above, for example, shows, on the far right, the signal peptides from families of these DRP genes from single species. Signal peptides are the small leading section of a translated protein that directs it to be secreted rather than being kept inside the cell. Right after the protein is processed to the right place, this signal is clipped off and thus is not part of the mature venom protein. These signal peptides tend to be far more conserved than the mature venom protein, despite the fact that they have little to do- just send the protein to the right place, which can be accomplished by all sorts of sequences. But this is a sign that the venoms are under positive evolutionary pressure- to be more effective, to extend the range of possible victims, and to overcome whatever resistance the victims might evolve against them.

Indeed, these authors show specifically that strong positive selection is at work, which is one more insight that molecular data can provide: first, by comparing the rates of change at protein-coding positions that are neutral via the genetic code (synonymous) vs those that make the protein sequence change (non-synonymous), and second, by the pattern and tempo of evolution of venom sequences compared with the mass of neutral sequences of the species.

"Given their significant sequence divergence since their deep-rooted evolutionary origin, the entire protein-coding gene, including the signal and propeptide regions, has accumulated significant differences. Consistent with this hypothesis, the majority of positively selected sites (~96%) identified in spider venom DRP toxins (all sites in Araneomorphae, and all but two sites in Mygalomorphae) were restricted to the mature peptide region, whereas the signal and propeptide regions harboured a minor proportion of these sites (1% and 3%, respectively)."

 

Phylogenetic tree (left), connecting up venom genes from across the spider phylogeny. On right, some of the venom sequences are shown just by their cysteine (C) locations, which form the basic structural scaffold of these proteins (top figure).


The more general phylogenetic analysis from all their sequences tells these authors that all the venom DRP genes, from all spider species, came from one origin. One easy way to see this is in the image above on the right, where just the cysteine scaffolds of these proteins from around the phylogeny are lined up, showing that this scaffold is very highly conserved, regardless of the rest of the sequence. This finding (which confirms prior work) is surprising, since venoms of other animals, like snakes, tend to incorporate a motley bunch of active enzymes and components, sourced from a variety of ancestral sources. So to see spiders sticking so tenaciously to this fundamental structure and template for the major component of their venom is impressive- clearly it is a very effective molecule. The authors point out that cone snails, another notorious venom-maker, originated much more recently, (about 45 million years ago), and show the same pattern of using one ancestral form to evolve a diversified blizzard of venom components, which have been of significant interest to medical science.


  • Example: a spider swings a bolas to snare a moth.

Saturday, February 11, 2023

A Gene is Born

Yes, genes do develop out of nothing.

The "intelligent" design movement has long made a fetish of information. As science has found, life relies on encoded information for its genetic inheritance and the reliable expression of its physical manifestations. The ID proposition is, quite simply, that all this information could not have developed out of a mindless process, but only through "design" by a conscious being. Evidently, Darwinian natural selection still sticks on some people's craw. Michael Behe even developed a pseudo-mathematical theory about how, yes, genes could be copied mindlessly, but new genes could never be conjured out of nothing, due to ... information.

My understanding of information science equates information to loss of entropy, and expresses a minimal cost of the energy needed to create, compute or transmit information- that is, the Shannon limits. A quite different concept comes from physics, in the form of information conservation in places like black holes. This form of information is really the implicit information of the wave functions and states of physical matter, not anything encoded or transmitted in the sense of biology or communication. Physical state information may be indestructible (and un-create-able) on this principle, but coded information is an entirely different matter.
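For reference, Shannon's measure of the information produced by a source with symbol probabilities $p_i$ is:

$$ H = -\sum_i p_i \log_2 p_i \;\; \mathrm{bits} $$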

In a parody of scientific discussion, intelligent design proponents are hosted by the once-respectable Hoover Institution for a discussion about, well, god.

So the fecundity that life shows in creating new genes out of existing genes, (duplications), and even making whole-chromosome or whole-genome duplications, has long been a problem for creationists. Energetically, it is easy to explain as a mere side-effect of having plenty of energy to work with, combined with error-prone methods of replication. But creationistically, god must come into play somewhere, right? Perhaps it comes into play in the creation of really new genes, like those that arise from nothing, such as at the origin of life?

A recent paper discussed genes in humans that have over our recent evolutionary history arisen from essentially nothing. It drew on prior work in yeast that elegantly laid out a spectrum or life cycle of genes, from birth to death. It turns out that there is an active literature on the birth of genes, which shows that, just like duplication processes, it is entirely natural for genes to develop out of humble, junky precursors. And no information theory needs to be wheeled in to show that this is possible.

Yeast provides the tools to study novel genes in some detail, with rich genetics and lots of sequenced relatives, near and far. Here is portrayed a general life cycle of a gene, from birth out of non-gene DNA sequences (left) into the key step of translation, and on to a subject of normal natural selection ("Exposed") for some function. But if that function decays or is replaced, the gene may also die, by mutation, becoming a pseudogene, and eventually just some more genomic junk.

The death of genes is quite well understood. The databases are full of "pseudogenes" that are very similar to active genes, but are disabled for some reason, such as a truncation somewhere or loss of reading frame due to a point mutation or splicing mutation. Their annotation status is dynamic, as they are sometimes later found to be active after all, under obscure conditions or to some low level. Our genomes are also full of transposons and retroviruses that have died in this fashion, by mutation.

Duplications are also well-understood, some of which have over evolutionary time given rise to huge families of related proteins, such as kinases, odorant receptors, or zinc-finger transcription factors. But the hunt for genes that have developed out of non-gene materials is a relatively new area, due to its technical difficulty. Genome annotators were originally content to pay attention to genes that coded for a hundred amino acids or more, and ignore everything else. That became untenable when a huge variety of non-coding RNAs came on the scene. Also, occasional cases of very small genes that encoded proteins came up from work that found them by their functional effects.

As genome annotation progressed, it became apparent that, while a huge proportion of genes are conserved between species (or are members of families of related proteins), other genes have no relatives at all, and will never yield information by this highly convenient route of computer analysis. They are orphans: they must either have been so heavily mutated since divergence that their relationships have become unrecognizable, or have arisen recently (that is, since the evolutionary divergence from the related species used for sequence comparison) from novel sources that provide no clue about their function. Finer analysis of ever more closely related species is often informative in these cases.

The recent paper on human novel genes makes the finer point that splicing and export from the nucleus constitute the major threshold between junk and "real" genes. Once an RNA gets out of the nucleus, any reading frame it may have will be translated and exposed to selection. So the acquisition of splicing signals is, in their argument, the key step that gets a randomly expressed bit of RNA over the threshold.

The same paper provided a remarkable set of examples of novel gene origination. It uncovered a series of 74 human genes that are not shared with macaque (taken as the reference species), that have a clear path of origin from non-coding precursors, and some of which have significant biological effects on human development. The authors point to a gradual process whereby promiscuous transcription from the genome gave rise, by chance, to RNAs that acquired splice sites, which piped them into the nuclear export machinery and out to the cytoplasm. Once there, they could be translated over whatever small coding region they might possess, after which selection could operate on their small protein products. A few appear to have gained enough function to encourage expansion of the coding region, resulting in growth of the gene and entrenchment as part of the developmental program.

Brain "organoids" grown from genetically manipulated human stem cells. On left is the control, in middle is where ENSG00000205704 was deleted, and on the right is where ENSG00000205704 is over-expressed. The result is very striking, as an evolutionarily momentous effect of a tiny and novel gene.

One gene, "ENSG00000205704" is shown as an example. Where in macaque, the genomic region corresponding to this gene encodes at best a non-coding RNA that is not exported from the nucleus, in humans it encodes a spliced and exported mRNA that encodes a protein of 107 amino acids. In humans it is also highly expressed in the brain, and when the researchers deleted it in embryonic stem cells and used those cells to grow "organoids", or clumps of brain-like tissue, the growth was significantly reduced by the knockout, and increased by the over-expression of this gene. What this gene does is completely unknown. Its sequence, not being related to anything else in human or other species, gives no clue. But it is a classic example of gene that arose from nothing to have what looks like a significant effect on human evolution. Does that somehow violate physics or math? Nothing could be farther from the truth.

Saturday, February 4, 2023

How Recessive is a Recessive Mutation?

Many relationships exist between mutation, copy number, and phenotype.

The traditional setup of Mendelian genetics is that an allele of a gene is either recessive or dominant. Blue eyes are recessive to brown eyes, for the simple reason that blue arises from the absence of an enzyme, due to a loss-of-function mutation. So having some of that enzyme, from even one "brown" copy of the gene, is dominant over the defective "blue" copy; you need two "blue" alleles to have blue eyes. This generalizes to most genes, especially essential genes, where lacking both copies is lethal, while having one working copy will get you through, covering for the defective copy. Most gene mutations are, by this model, recessive.

But most loci and mutations implicated in disease don't really work like that. Some recent papers delved into the genetics of such mutations and observed that their recessiveness is all over the map- a spectrum, really, of effects from fully recessive to dominant, with most in the middle ground. This is informative for clinical genetics, but also for evolutionary studies, suggesting that evolution is not, after all, blind to the majority of mutations, which are mostly deleterious, exist most of the time in the heterozygous (single-copy) state, and would be wholly recessive by the usual assumption.

The first paper describes a large study of the Finnish population, which benefited from several advantages. First, Finns have a good health system with thorough records, which are housed in a national biobank; the study used 177,000 health records and 83,000 variants in coding regions of genes collected from sequencing studies. Second, the Finnish population is relatively small and has experienced bottlenecks from smaller founding populations, which amplifies the prevalence of variants that those founders carried. That allows those variants to rise to higher rates of appearance, especially in the homozygous state, which generally causes more noticeable disease phenotypes. Both the detectability and the statistics were powered by this higher incidence of some deleterious mutations (while others, naturally, would be rarer than the world-wide average, or absent altogether).

Third, the authors emphasize that they searched for various levels of recessive effect, contrary to the usual practice of just assuming a linear effect. A linear model says that one copy of a mutation has half the effect of two copies- true sometimes, but not most of the time, especially in the more typical cases of recessive effect where one copy has a good deal less effect, if not zero. Returning to eye color: if one looks in detail, there are many shades of eyes, even of blue eyes, so it is evident that the alleles affecting eye color are various and express to different degrees (have various penetrance, in the parlance). While complete recessiveness happens frequently, it is not the most common case, since we do not routinely express excess amounts of protein from our genes, which makes the loss of one copy noticeable, to some degree, most of the time. This is why the lack of a whole chromosome, or an excess of a whole chromosome, generally has devastating consequences. Trisomies of only three autosomes (13, 18, and 21) are viable (that is, not lethal), and confer various severe syndromes.
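As a toy illustration of the difference (a sketch of my own, not the model used in the papers), the standard dominance coefficient h interpolates between these cases; the "linear" assumption is simply h = 0.5:

```python
def phenotype_effect(n_mutant_copies: int, s: float, h: float) -> float:
    """Effect of carrying 0, 1, or 2 mutant copies, scaled so that two
    copies give the full effect s; h is the dominance coefficient."""
    if n_mutant_copies == 0:
        return 0.0
    return h * s if n_mutant_copies == 1 else s

# h = 0.5 is the linear/additive model; most disease alleles sit in between.
for h, label in [(0.0, "fully recessive"), (0.1, "mostly recessive"),
                 (0.5, "linear/additive"), (1.0, "fully dominant")]:
    print(f"{label:16s} one-copy effect = {phenotype_effect(1, s=1.0, h=h):.2f}")
```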

A plot of population proportion vs age of disease diagnosis, for three different diseases and an associated genetic variant. In blue is the normal ("wild-type") case, in yellow the heterozygote, and in red the homozygote with two variant alleles. For "b", the total lack of XPA causes skin cancer with juvenile onset, and the homozygous case is not shown. The Finnish data allowed detection of rather small recessive effects from variants that are common in that population. For instance, "a" shows the barely discernible advancement in the age of diagnosis of hearing loss, caused by mutations in GJB2- a condition that, in the homozygous state, is universal by age 10.

The second paper looked more directly at the fitness cost of variants in the heterozygous state, over large populations. It examined loss-of-function (LOF) mutations in over 17,000 genes, studying their rates of appearance in, and loss from, human populations, as well as their behavior in pedigrees. These rates were turned, by a modeling system, into fitness costs, stated in percentage terms vs wild type. A fitness cost of 1% is pretty mild (though highly significant over longer evolutionary time), while a fitness cost of 10% is quite severe, and one of 100% is immediately lethal and would never be observed in the population. For example, a mutation that is seen rarely, and that in pedigrees persists for only a couple of generations, implies a fitness cost of over 10%.

They come up with a parameter "hs", which is the fitness cost "s" of losing both copies of a gene, multiplied by "h", a measure of the dominance of the mutation in a single copy.
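How does persistence in pedigrees translate into a fitness estimate? A standard back-of-the-envelope way to see it- a sketch under simplified assumptions, not the paper's actual inference machinery, and with function names of my own- is to treat a rare allele as a branching process in which each copy leaves, on average, 1 - hs copies in the next generation:

```python
import math
import random

def poisson(lam: float, rng: random.Random) -> int:
    # Knuth's simple Poisson sampler; adequate for small lambda.
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= L:
            return k - 1

def generations_until_lost(hs: float, rng: random.Random) -> int:
    """Follow one new deleterious allele: while rare, every carrier is a
    heterozygote, so each copy leaves on average 1 - hs copies in the
    next generation. Returns how long the lineage persists."""
    copies, gen = 1, 0
    while copies > 0 and gen < 10_000:  # safety cap; extinction is certain
        copies = sum(poisson(1.0 - hs, rng) for _ in range(copies))
        gen += 1
    return gen

rng = random.Random(42)
for hs in (0.01, 0.10):
    runs = [generations_until_lost(hs, rng) for _ in range(2000)]
    print(f"hs = {hs:.2f}: mean persistence ~ {sum(runs) / len(runs):.1f} generations")
```

The costlier the mutation, the faster its lineage fizzles out, which is why a variant that lingers for only a couple of generations in family records points to a substantial heterozygous cost.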


In these graphs, human genes are stacked along the Y axis, sorted by their computed "hs" fitness cost in the heterozygous state. Error bars are in blue, showing that this is naturally a rather error-prone exercise in estimation. But what is significant is that most genes are somewhere on the spectrum, with very few having negligible effects (bottom), and many having highly significant effects (top). Genes on the X chromosome are naturally skewed to much higher significance when mutated, since in males there is no other copy, and even in females one X chromosome is (randomly) inactivated to provide dosage compensation- that is, to match the male dosage of production of X genes- which results in much higher penetrance in females as well.


So the bottom line is that while diploidy helps to hide a lot of variation in sexual organisms, and in humans in particular, it does not hide it completely. We are each estimated to receive, at birth, about 70 new mutations, of which roughly 1/1000 are the kind of total loss of gene function studied here. This work then estimates that 20% of those loss-of-function mutations have a severe fitness effect of over 10%. Multiplying through, about one in seventy zygotes carries such a new mutation- not counting anything inherited from its parents- and will suffer ill effects immediately, even though it carries a wild-type copy of that gene as well.
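A quick sanity check of that arithmetic, using only the figures quoted above:

```python
new_mutations_per_zygote = 70
loss_of_function_fraction = 1 / 1000   # fraction of new mutations that are LOF
severe_fraction = 0.20                 # fraction of LOF with >10% fitness cost

severe_lof_per_zygote = (new_mutations_per_zygote
                         * loss_of_function_fraction
                         * severe_fraction)
print(severe_lof_per_zygote)              # 0.014
print(round(1 / severe_lof_per_zygote))   # 71 -> about one in seventy zygotes
```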

Humans, like other organisms, carry a large mutational load that is constantly under surveillance by natural selection. The fact that severe mutations routinely still have significant effects in the heterozygous state is both good and bad news. Good in the sense that natural selection has more to work with, and can gradually whittle down their frequency without necessarily waiting for the chance of two meeting in an unfortunate homozygous state. But bad in the sense that it adds to our overall phenotypic variation and health difficulties a whole new set of deficiencies that, while typically minor individually, are also legion.