Showing posts with label deep time. Show all posts

Saturday, February 4, 2023

How Recessive is a Recessive Mutation?

Many relationships exist between mutation, copy number, and phenotype.

The traditional setup of Mendelian genetics is that an allele of a gene is either recessive or dominant. Blue eyes are recessive to brown eyes, for the simple reason that blue arises from the absence of an enzyme, due to a loss of function mutation. So having some of that enzyme, from even one "brown" copy of that gene, is dominant over the defective "blue" copy. You need two "blue" alleles to have blue eyes. This could be generalized to most genes, especially essential genes, where lacking both copies is lethal, while having one working copy will get you through, and cover for a defective copy. Most gene mutations are, by this model, recessive. 

But most loci and mutations implicated in disease don't really work like that. Some recent papers delved into the genetics of such mutations, and observed that their recessiveness was all over the map, a spectrum, really, of effects from fully recessive to dominant, with most in the middle ground. This is informative for clinical genetics, but also for evolutionary studies, suggesting that evolution is not, after all, blind to the majority of mutations, which are mostly deleterious, exist most of the time in the heterozygous (one-copy) state, and would be wholly recessive by the usual assumption.

The first paper describes a large study over the Finnish population, which benefited from several advantages. Finns have a good health system with thorough records which are housed in a national biobank. The study used 177,000 health records and 83,000 variants in coding regions of genes collected from sequencing studies. Second, the Finnish population is relatively small and has experienced bottlenecks from smaller founding populations, which amplifies the prevalence of variants that those founders had. That allows those variants to rise to higher rates of appearance, especially in the homozygous state, which generally causes more noticeable disease phenotypes. Both the detectability and the statistics were powered by this higher incidence of some deleterious mutations (while others, naturally, would have been more rare than the world-wide average, or absent altogether).

Thirdly, the authors emphasize that they searched for various levels of recessive effect, which is contrary to the usual practice of just assuming a linear effect. A linear model says that one copy of a mutation has half the effect of two copies- which is true sometimes, but not most of the time, especially in more typical cases of recessive effect where one copy has a good deal less effect, if not zero. Returning to eye color, if one looks in detail, there are many shades of eyes, even of blue eyes, so it is evident that the alleles that affect eye color are various, and express to different degrees (have various penetrance, in the parlance). While complete recessiveness happens frequently, it is not the most common case, since we generally do not routinely express excess amounts of proteins from our genes, making loss of one copy noticeable most of the time, to some degree. This is why the lack of a whole chromosome, or an excess of a whole chromosome, has generally devastating consequences. Trisomies in only three chromosomes are viable (that is, not lethal), and confer various severe syndromes.
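To make the contrast between these models concrete, here is a minimal Python sketch of how the effect of a variant might scale with the number of copies carried. The function name and the dominance parameter `h` are illustrative assumptions, not anything taken from the paper: `h = 0.5` corresponds to the linear model, `h = 0` to complete recessiveness, and `h = 1` to complete dominance.

```python
def genotype_effect(copies, full_effect=1.0, h=0.5):
    """Effect of carrying 0, 1, or 2 copies of a variant allele.

    h is a hypothetical dominance parameter: the fraction of the
    two-copy effect that a single copy produces.
    """
    if copies not in (0, 1, 2):
        raise ValueError("copies must be 0, 1, or 2")
    if copies == 0:
        return 0.0
    if copies == 1:
        return h * full_effect
    return full_effect


# Compare a few illustrative models across the three genotypes.
for label, h in [("recessive", 0.0), ("partly recessive", 0.1),
                 ("linear", 0.5), ("dominant", 1.0)]:
    effects = [genotype_effect(c, h=h) for c in (0, 1, 2)]
    print(f"{label:>16}: {effects}")
```

Searching across intermediate values of `h`, rather than fixing it at 0.5, is the spirit of what the authors describe, though their actual statistical machinery is certainly more elaborate than this.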

A population proportion plot vs age of disease diagnosis for three different diseases and an associated genetic variant. In blue is the normal ("wild-type") case, in yellow is the heterozygote, and in red the homozygote with two variant alleles. For "b", the total lack of XPA causes skin cancer with juvenile onset, and the homozygotic case is not shown. The Finnish data allowed detection of rather small recessive effects from variations that are common in that population. For instance, "a" shows the barely discernible advancement of age of diagnosis for a disease (hearing loss) that in the homozygotic state is universal by age 10, caused by mutations in GJB2.

The second paper looked more directly at the fitness cost of variations over large populations, in the heterozygous state. They looked at loss-of-function (LOF) mutations of over 17,000 genes, studying their rate of appearance and loss from human populations, as well as in pedigrees. These rates were turned, by a modeling system, into fitness costs, which are stated in percentage terms, vs wild type. A fitness cost of 1% is pretty mild, (though highly significant over longer evolutionary time), while a fitness cost of 10% is quite severe, and one of 100% is immediately lethal and would never be observed in the population. For example, a mutation that is seen rarely, and in pedigrees only persists for a couple of generations, implies a fitness cost of over 10%.

They come up with a parameter "hs", which is the fitness cost "s" of losing both copies of a gene, multiplied by "h", a measure of the dominance of the mutation in a single copy.
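In the usual textbook convention (an assumption here, since the paper's own modeling is more involved), these two numbers set the relative fitness of each genotype: 1 for wild type, 1 - hs for the heterozygote, and 1 - s for the homozygote. A small Python sketch, with made-up values for `s` and `h`:

```python
def relative_fitness(s, h):
    """Relative fitness of each genotype in the standard textbook scheme.

    s: fitness cost of losing both copies of a gene (0 to 1).
    h: dominance of the loss in a single copy (0 = fully recessive).
    """
    return {
        "wild-type": 1.0,
        "heterozygote": 1.0 - h * s,
        "homozygote": 1.0 - s,
    }


# Hypothetical gene: losing both copies is lethal (s = 1), and losing
# one copy carries a tenth of that cost (h = 0.1).
w = relative_fitness(s=1.0, h=0.1)
hs = 1.0 - w["heterozygote"]
print(f"heterozygous fitness cost hs = {hs:.2f}")
```

The point of working with the product is that population data on heterozygotes constrain hs directly, even when h and s cannot be pulled apart.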


In these graphs, human genes are stacked up in the Y axis sorted by their computed "hs" fitness cost in the heterozygous state. Error bars are in blue, showing that this is naturally a rather error-prone exercise of estimation. But what is significant is that most genes are somewhere on the spectrum, with very few having negligible effects, (bottom), and many having highly significant effects (top). Genes on the X chromosome are naturally skewed to much higher significance when mutated, since in males there is no other copy, and even in females, one X chromosome is (randomly) inactivated to provide dosage compensation- that is, to match the male dosage of production of X genes- which results in much higher penetrance for females as well.


So the bottom line is that while diploidy helps to hide a lot of variation in sexual organisms, and in humans in particular, it does not hide it completely. We are each estimated to receive, at birth, about 70 new mutations, of which 1/1000 are the kind of total loss of gene function studied here. This work then estimates that 20% of those mutations have a severe fitness effect of >10%, meaning that about one in seventy zygotes carries such a new mutation, not counting what it has inherited from its parents, and will suffer ill effects immediately, even though it has a wild-type copy of that gene as well.
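The "one in seventy" figure follows from multiplying the three numbers quoted above. A quick Python check, treating the expected count per zygote as a probability (a reasonable shortcut only because the number is small; the variable names are mine):

```python
new_mutations_per_birth = 70   # estimated new mutations per person
lof_fraction = 1 / 1000        # share that are total loss-of-function
severe_fraction = 0.20         # share of those with a fitness cost >10%

# Expected number of new, severe loss-of-function mutations per zygote.
severe_lof_per_zygote = (new_mutations_per_birth
                         * lof_fraction
                         * severe_fraction)
one_in = 1 / severe_lof_per_zygote

print(f"expected per zygote: {severe_lof_per_zygote:.3f}")
print(f"roughly one zygote in {one_in:.0f}")
```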

Humans, as other organisms, have a large mutational load that is constantly under surveillance by natural selection. The fact that severe mutations routinely still have significant effects in the heterozygous state is both good and bad news. Good in the sense that natural selection has more to work with and can gradually whittle down their frequency without necessarily waiting for the chance of two meeting in an unfortunate homozygous state. But bad in the sense that it adds to our overall phenotypic variation and health difficulties a whole new set of deficiencies that, while individually and typically minor, are also legion.


Saturday, December 31, 2022

Hand-Waving to God

A decade on, the Discovery Institute is still cranking out skepticism, diversion, and obfuscation.

A post a couple of weeks ago mentioned that the Discovery Institute offered a knowledgeable critique of the lineages of the Ediacaran fauna. They have raised their scientific game significantly, and so I wanted to review what they are doing these days, focusing on two of their most recent papers. The Discovery Institute has a lineage of its own, from creationism. It has adapted to the derision that entailed, by retreating to "intelligent design", which is creationism without naming the creators, nailing down the schedule of creation, or providing any detail of how and from where creation operates. Their review of the Ediacaran fauna raised some highly skeptical points about whether these organisms were animals or not. Particularly, they suggested that cholesterol is not really restricted to animals, so the chemical traces of cholesterol that were so clearly found in the Dickinsonia fossil layers might not really mean that these were animals- they might also be unusual protists of gigantic size, or odd plant forms, etc. While the critique is not unreasonable, it does not alter the balance of the evidence which does indeed point to an animal affinity. These fauna are so primitive and distant that it is fair to say that we can not be sure, and particularly we can not be sure that they had any direct ancestral relationship to any later organisms of the ensuing Cambrian period, when recognizable animals emerged.

Fair enough. But what of their larger point? The Discovery Institute is trying to make the point, I believe, about the suddenness of early Cambrian evolution of animals, and thus its implausibility under conventional evolutionary theory. But we are traversing tens of millions of years through these intervals, which is a long time, even in evolutionary terms. Secondly, the Ediacaran period, though now represented by several exquisite fossil beds, spanned a hundred million years and is still far from completely characterized paleontologically, even supposing that early true animals would have fossilized, rather than being infinitesimal and very soft-bodied. So the Cambrian biota could easily have predecessors in the Ediacaran that have or have not yet been observed- it is as yet not easy to say. But what we can not claim is the negative, that no predecessors existed before some time X- say the 540 MYA point at the base of the Cambrian. So the implication that the Discovery Institute is attempting to draw has very little merit, particularly since everything that they themselves cite about the molecular and paleontological sequence is so clearly progressive and in proper time sequence, in complete accord with the overall theory of evolution.

For we should always keep in mind that an intelligent designer has a free hand, and can make all of life in a day (or in six, if absolutely needed). The fact that this designer works in the shadows of slightly altered mutation rates, or in a few million years rather than twenty million, and never puts fossils out of sequence in the sedimentary record, is an acknowledgement that this designer is a bit dull, and bears a strong resemblance to evolution by natural selection. To put it in psychological terms, the institute is in the "bargaining" stage of grief- over the death of god.

Saturday, December 17, 2022

The Pillow Creatures That Time Forgot

Did the Ediacaran fauna lead to anything else, or was it a dead end?

While to a molecular biologist, the evolution of the eukaryotic cell is probably the greatest watershed event after the advent of life itself, most others would probably go with the rise of animals and plants, after about three billion years of exclusively microbial life. This event is commonly located at the base of the Cambrian, (i.e. the Cambrian explosion), which is where the fossils that Darwin and his contemporaries were familiar with began, about 540 million years ago. Darwin was puzzled by this sudden start of the fossil record, from apparently nothing, and presciently held (as he did in the case of the apparent age of the sun) that the data were faulty, and that the ancient character of life on earth would leave other traces much farther back in time.

That has indeed proved to be the case. There are signs of microbial life going back over three billion years, and whole geologies in the subsequent time dependent on its activity, such as the banded iron formations prevalent around two billion years ago that testify to the slow oxygenation of the oceans by photosynthesizing microbes. And there are also signs of animal life prior to the Cambrian, going back roughly to 600 million years ago that have turned up, after much deeper investigations of the fossil record. This immediately pre-Cambrian period is labeled the Ediacaran, for one of its fossil-bearing sites in Australia. A recent paper looked over this whole period to ask whether the evolution of proto-animals during this time was a steady process, or punctuated by mass extinction event(s). They conclude that, despite the patchy record, there is enough to say that there was a steady (if extremely slow) march of ecological diversification and specialization through the time, until the evolution of true animals in the Cambrian literally ate up all the Ediacaran fauna. 

Fossil impression of Dickinsonia, with trailing impressions that some think might be a trail from movement. Or perhaps just friends in the neighborhood.
 
For the difference between the Ediacaran fauna and that of the Cambrian is stark. The Ediacaran fauna is beautiful, but simple. There are no backbones, no sensory organs. No mouth, no limbs, no head. In developmental terms, they seem to have had only two embryological cell layers, rather than our three, which makes all the difference in terms of complexity. How they ate remains a mystery, but they are assumed to have simply osmosed nutrients from their environment, thanks to their apparently flat forms. A bit like sponges today. As they were the most complex animals at the time, (and some were large, up to 2 meters long), they may have had an easy time of it, simply plopping themselves on top of rich microbial mats, oozing with biofilms and other nutrients.

The paper provides a schematic view of the ecology at single locations, and also of longer-term evolution, from a sequence of views (i.e. fossils) obtained from different locations around the world of roughly ten million year intervals through the Ediacaran. One noticeable trend is the increasing development or prevalence of taller fern-like forms that stick up into the water over time, versus the flatter bottom-dwelling forms. This may reflect some degree of competition, perhaps after the bottom microbial mats have been over-"grazed". A second trend is towards slightly more complexity at the end of the period, with one very small form (form C (a) in the image below) even marked by shell remains, though what its animal inhabitant looked like is unknown. 

Schematic representation of putative animals observed during the Ediacaran period, from early, (A, ~570 MYA, Avalon assemblage), middle, (B, ~554 MYA, White Sea and other assemblages), and late (C, ~545 MYA, Nama assemblage). The A panel is also differentiated by successional forms from early to mature ecosystems, while the C panel is differentiated by ocean depth, from shallow to deep. The persistence of these forms is quite impressive overall, as is their common simplicity. But lurking somewhere among them are the makings of far more complicated animals.

Very few of these organisms have been linked to actual animals of later epochs, so virtually all of them seem to have been superseded by the wholly different Cambrian fauna- much of which itself remains perplexing. One remarkable study used mass-spec chemical analysis on some Dickinsonia fossils from the late Ediacaran to determine that they bore specific traces of cholesterol, marking them as probable animals, rather than overgrown protists or seaweed. But beyond that, there is little that can be said. (Note a very critical and informed review of all this from the Discovery Institute, of all places.) Their preservation is often remarkable, considering the age involved, and they clearly form the sole fauna known from pre-Cambrian times. 

But the core question of how the Cambrian (and later) animals came to be remains uncertain, at least as far as the fossil record is concerned. One relevant observation is that there is no sign of burrowing through the sediments of the Ediacaran epoch. So the appearance of more complex animals, while it surely had some kind of precedent deep in the Ediacaran, or even before, did not make itself felt in any macroscopic way then. It is evident that once the triploblastic developmental paradigm arose, out of the various geologic upheavals that occurred at the bases of both the Ediacaran and the Cambrian, its new design including mouths, eyes, spines, bones, plates, limbs, guts, and all the rest that we are now so very familiar with, utterly over-ran everything that had gone before.

Some more fine fossils from Canada, ~ 580 MYA.


  • A video tour of some of the Avalon fauna.
  • An excellent BBC podcast on the Ediacaran.
  • We need to measure the economy differently.
  • Deep dive on the costs of foreign debt.
  • Now I know why chemotherapy is so horrible.
  • Waste on an epic scale.
  • The problem was not the raids, but the terrible intelligence... by our intelligence agency.

Saturday, October 15, 2022

From Geo-Logic to Bio-Logic

Why did ATP become the central energy currency and all-around utility molecule, at the origin of life?

The exploration of the solar system and astronomical objects beyond has been one of the greatest achievements of humanity, and of the US in particular. We should be proud of expanding humanity's knowledge using robotic spacecraft and space-based telescopes that have visited every planet and seen incredibly far out in space, and back in time. But one thing we have not found is life. The Earth is unique, and it is unlikely that we will ever find life elsewhere within traveling distance. While life may conceivably have landed on Earth from elsewhere, it is more probable that it originated here. Early Earth had as conducive conditions as anywhere we know of, to create the life that we see all around us: carbon-based, water-based, precious, organic life.

Figuring out how that happened has been a side-show in the course of molecular biology, whose funding is mostly premised on medical rationales, and of chemistry, whose funding is mostly industrial. But our research enterprise thankfully has a little room for basic research and fundamental questions, of which this is one of the most frustrating and esoteric, if philosophically meaningful. The field has coalesced in recent decades around the idea that oceanic hydrothermal vents provided some of the likeliest conditions for the origin of life, due to the various freebies they offer.

Early earth, as today, had very active geology that generated a stream of reduced hydrogen and other compounds coming out of hydrothermal vents, among other places. There was no free oxygen, and conditions were generally reducing. Oxygen was bound up in rocks, water, and CO2. The geology is so reducing that water itself was and still is routinely reduced on its trip through the mantle by processes such as serpentinization.

The essential problem is how to jump the enormous gap from the logic of geology and chemistry, over to the logic of biology. It is not a question of raw energy- the earth has plenty of energetic processes, from volcanoes and tectonics to incoming solar energy. The question is how a natural process that has resolutely chemical logic, running down the usual chemical and physical gradients from lower to higher entropy, could have generated the kind of replicating and coding molecular system where biological logic starts. A paper from 2007 gives a broad and scrupulous overview of the field, featuring detailed arguments supporting the RNA world as the probable destination (from chemical origins) where biological logic really began. 

To rehearse very briefly, RNA has, and still retains in life today, both coding capacity and catalytic capacity, unifying in one molecule the most essential elements of life. So RNA is thought to have been the first molecule with truly biological ... logic, being replaced later with DNA for some of its more sedentary roles. But there is no way to get to even very short RNA molecules without some kind of metabolic support. There has to be an organic soup of energy and small organic molecules- some kind of pre-biological metabolism- to give this RNA something to do and chemical substituents to replicate itself out of. And that is the role of the hydrothermal vent system, which seems like a supportive environment. For the trick in biology is that not everything is coded explicitly. Brains are not planned out in the DNA down to their crenelations, and membranes are not given size and weight blueprints. Biology relies heavily on natural chemistry and other unbiological physical processes to channel its development and ongoing activity. The coding for all this, which seems so vast with our 3 Gb genome, is actually rather sparse, specifying some processes in exquisite detail, (large proteins, after billions of years of jury-rigging, agglomeration, and optimization), while leaving a tremendous amount still implicit in the natural physical course of events.

A rough sketch of the chemical forces and gradients at a vent. CO2 is reduced into various simple organic compounds at the rock interfaces, through the power of the incoming hydrogen rich (electron-rich) chemicals. Vents like this can persist for thousands of years.

So the origin of life does not have to build the plane from raw aluminum, as it were. It just has to explain how a piece of paper got crumpled in a peculiar way that allowed it to fly, after which evolution could take care of the rest of the optimization and elaboration. Less metaphorically, if a supportive chemical environment could spontaneously (in geo-chemical terms) produce an ongoing stream of reduced organic molecules like ATP and acyl groups and TCA cycle intermediates out of the ambient CO2, water, and other key elements common in rocks, then the leap to life is a lot less daunting. And hydrothermal vents do just that- they conduct a warm and consistent stream of chemically reduced (i.e. carrying extra electrons) and chemical-rich fluid out of the sea floor, while gathering up the ambient CO2 (which was highly concentrated on the early Earth) and making it into a zoo of organic chemicals. They also host the iron and other minerals useful in catalytic conversions, which remain at the heart of key metabolic enzymes to this day. And they also contain bubble-like structures that could have confined and segregated all this activity in pre-cellular forms. In this way, they are thought to be the most probable locations where many of the ingredients of life were being generated for free, making the step over to biological logic much less daunting than was once thought.

The rTCA cycle, portrayed in the reverse from our oxidative version, as a cycle of compounds that spontaneously generate out of simple ingredients, due to their step-wise reduction and energy content values. The fact that the output (top) can be easily cleaved into the inputs provides a "metabolic" cycle that could exist in a reducing geological setting, without life or complicated enzymes.

The TCA cycle, for instance, is absolutely at the core of metabolism, a flow of small molecules that disassemble (or assemble, if run in reverse) small carbon compounds in stepwise fashion, eventually arriving back at the starting constituents, with only outputs (inputs) of hydrogen reduction power, CO2, and ATP. In our cells, we use it to oxidize (metabolize) organic compounds to extract energy. Its various stations also supply the inputs to innumerable other biosynthetic processes. But other organisms, admittedly rare in today's world, run it in the reverse direction to create organic compounds from CO2, where it is called reductive or reverse (rTCA). An article from 2004 discusses how this latter cycle and set of compounds very likely predates any biological coding capacity, and represents an intrinsically natural flow of carbon reduction that would have been seen in a pre-biotic hydrothermal vent setting. 

What sparked my immediate interest in all this was a recent paper that described experiments focused on showing why ATP, of all the other bases and related chemicals, became such a central part of life's metabolism, including as a modern accessory to the TCA cycle. ATP is the major energy currency in cells, giving the extra push to thousands of enzymes, and forming the cores of additional central metabolic cofactors like NAD (nicotinamide adenine dinucleotide), and acetyl-CoA (the A is for adenine), and participating as one of the bases of DNA and RNA in our genetic core processes. 

Of all nucleoside diphosphates, ADP is most easily converted to ATP in the very simple conditions of added acyl phosphate and Fe3+ in water, at ambient temperatures or warmer. Note that the trace for ITP shows the same absorbance before and after the reaction. The others show no reaction either. Panel F shows a time course of the ADP reaction, in hours. The X axis refers to time of chromatography of the sample, not of the reaction.

Why ATP, and not the other bases, or other chemicals? Well, bases appear as early products out of pre-biotic reaction mixtures, so while somewhat complicated, they are a natural part of the milieu. The current work compares how phosphorylation of all the possible di-phosphate bases works, (that is, adenosine, cytidine, guanosine, inosine, and uridine diphosphates), using the plausible prebiotic ingredients ferric ion (Fe3+) and acetyl phosphate. They found surprisingly that only ADP can be productively converted to ATP in this setting, and it was pretty insensitive to pH, other ions, etc. This was apparently due to the special Fe3+ coordinating capability that ADP has due to its purine-ring nitrogen and neighboring amino group, which allows easy electron transfer to the incoming phosphate group. Iron remains common as an enzymatic cofactor today, and it is obviously highly plausible in this free form as a critical catalyst in a pre-biotic setting. Likewise, acetyl phosphate could hardly be simpler, occurs naturally under prebiotic conditions, and remains an important element of bacterial metabolism (and transiently one of eukaryotic metabolism) today. 

Ferric iron and ATP make a unique mechanistic pairing that enables easy phosphorylation at the third position, making ATP out of ADP and acyl phosphate. At step b, the incoming acyl phosphate is coordinated by the amino group while the iron is coordinated by the purine-ring nitrogen and two existing phosphates.

The point of this paper was simply to reveal why ATP, of all the possible bases and related chemicals, gained its dominant position of core chemical and currency. It is rare in origin-of-life research to gain a definitive insight like this, amid the masses of speculation and modeling, however plausible. So this is a significant step ahead for the field, while it continues to refine its ideas of how this amazing transition took place. Whether it can demonstrate the spontaneous rTCA cycle in a reasonable experimental setting is perhaps the next significant question.


  • How China extorts celebrities, even Taiwanese celebrities, to toe the line.
  • Stay away from Medicare Advantage, unless you are very healthy, and will stay that way.
  • What to expect in retirement.

Saturday, September 17, 2022

Death at the Starting Line- Aneuploidy and Selfish Centromeres

Mammalian reproduction is unusually wasteful, due to some interesting processes and tradeoffs.

Now that we have settled the facts that life begins at conception and abortion is murder, a minor question arises. There is a lot of murder going on in early embryogenesis, and who is responsible? Probably god. Roughly two-thirds of embryos that form are aneuploid (have an extra chromosome or lack a chromosome) and die, usually very soon. Those that continue to later stages of pregnancy cause a high rate of miscarriages- about 15% of pregnancies. A recent paper points out that these rates are unusual compared with most eukaryotes. Mammals are virtually alone in exhibiting such high wastefulness, and the author proposes an interesting explanation for it.

First, some perspective on aneuploidy. Germ cells go through a two-stage process of meiosis where their DNA is divided two ways, first by homolog pairs, (that is, the sets inherited from each parent, with some amount of crossing-over that provides random recombination), and second by individual chromosomes. In more primitive organisms (like yeast) this is an efficient, symmetrical, and not-at-all wasteful process. Any loss of genetic material would be abhorrent, as the cells are putting every molecule of their being into the four resulting spores, each of which is viable.

A standard diagram of meiosis. Note that the microtubules (yellow) engage in a gradual and competitive process of capturing centromeres of each chromosome to arrive at the final state of regular alignment, which can then be followed by even division of the genetic material and the cell.


In animals, on the other hand, meiosis of egg cells is asymmetric, yielding one ovum / egg and three polar bodies, which have various roles in some species to assist development, but are ultimately discarded. This asymmetric division sets up a competition between chromosomes to get into the egg, rather than into a polar body. One would think that chromosomes don't have much say in the matter, but actually, cell division is a very delicate process that can be gamed by "strong" centromeres.

Centromeres are the central structures on chromosomes that form attachments to the microtubules forming the mitotic spindle. This attachment process is highly dynamic and even competitive, with microtubules testing out centromere attachment sites, and using tension ultimately as the mark of having a properly oriented chromosome with microtubules from each side of the dividing cell (i.e. each microtubule organizing center) attached to each of the centromeres, holding them steady and in tension at the midline of the cell. Well, in oocytes, this does not happen at the midline, but lopsidedly towards one pole, given that one of the product cells is going to be much larger than the others. 

In oocytes, cell division is highly asymmetric with a winner-take-all result. This opens the door to a mortal competition among chromosomes to detect which side is which and to get on the winning side. 

One of the mysteries of biology is why the centromere is a highly degenerate, and also a speedily evolving, structure. Centromeres are made up of huge regions of monotonously repeated DNA, which have been especially difficult to sequence accurately. Well, this competition to get into the next generation can go some way to explain this structure, and also why it changes rapidly, (on evolutionary time scales), as centromeric repeats expand to capture more microtubules and get into the egg, and other portions of the machinery evolve to dampen this unsociable behavior and keep everyone in line. It is a veritable arms race. 
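To get a feel for how quickly even a modest cheat could spread, here is a toy Python sketch of a "strong" centromere rising in frequency. Everything in it is an illustrative assumption rather than anything from the paper: the transmission probability `k`, fair transmission through males, no fitness cost to the driver, random mating, and genotype frequencies approximated by Hardy-Weinberg proportions.

```python
def next_frequency(p, k):
    """Frequency of a driving centromere after one generation.

    p: current frequency of the driver in the population.
    k: hypothetical probability that a heterozygous mother passes the
       driver into the egg (0.5 would be fair Mendelian segregation).
    Fathers are assumed to transmit fairly, so only half of the
    transmissions are biased.
    """
    return p + p * (1 - p) * (k - 0.5)


p = 0.01   # start rare
k = 0.7    # made-up strength of the cheat
for generation in range(0, 101, 20):
    print(f"generation {generation:3d}: frequency {p:.3f}")
    for _ in range(20):
        p = next_frequency(p, k)
```

With `k = 0.5` the frequency never moves, which is the fair case. Any value above that pushes the driver toward fixation, and that is the pressure the rest of the machinery has to evolve against.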

But the funny thing is that it is only mammals that show a particularly wasteful form of this behavior, in the form of frequent aneuploidy. The competition is so brazen that some centromeres force their way into the egg when there is already another copy there, generating at best a syndrome like Down, but for all other chromosomes than #21, certain death. This seems rather self-defeating. Or does it?

The latest paper observes that mammals devote a great deal of care to their offspring, making them different from fish, amphibians, and even birds, which put most of their effort into producing the very large egg, and relatively less (though still significant amounts) into care of infants. This huge investment of resources means that causing a miscarriage or earlier termination is not a total loss at all, for the rudely trisomic extra chromosome. No, it allows resource recovery in the form of another attempt at pregnancy, typically quite soon thereafter, at which point the pushy chromosome gets another chance to form a proper egg. It is a classic case of extortion at the molecular scale. 


  • Do we have rules, or not?
  • How low will IBM go, vs its retirees?

Sunday, July 10, 2022

Tooth Development and Redevelopment

Wouldn't it be nice to regrow teeth? Sharks do.

Imagine for a minute if instead of fillings, crowns, veneers, posts, bridges, and all the other advanced technologies of dental restoration, a tooth could be removed, and an injection prompt the growth of a complete replacement tooth. That would be amazing, right? Other animals, such as sharks and fish, regrow teeth all the time. But we only get two sets- our milk teeth and mature teeth. While mature mammalian teeth are incredibly tough and generally last a lifetime, modern agriculture and other conditions have thrown a wrench into human dental health, which modern dentistry has only partially restored. As evolution proceeded into the mammalian line, tooth development became increasingly restricted and specialized, so that the generic teeth that sharks spit out throughout their lives have become tailored for various needs across the mouth, firmly anchored into the jaw bone, and precisely shaped to fit against each other. But the price for this high-level feature set seems to be that we have lost the ability to replace them.

So researchers are studying tooth development in other animals- wondering how similar they are to human development, and whether some of their tricks can be atavistically re-stimulated in our own tissues. While the second goal remains a long way off, the first has been productively pursued, with teeth forming a model system of complex tissue development. A recent paper (with review) looked at similarities between molecular details of shark and mammalian tooth development.

Teeth are the result of an interaction between epithelial tissues and mesenchymal tissues- two of the three fundamental tissues of early embryogenesis. Patches of epithelium form dental arches around the two halves of the future mouth. Spots around these arches expand into dental placodes, which grow into buds, and as they interact continuously with the inner mesenchyme, form enamel knots. The epithelial cells of the knot then eventually start producing enamel as they pull away from the interface, while the mesenchymal cells produce dentin and then the pulp and other bone-anchoring tissues of the inner tooth and root as they pull away in the opposite direction. 

Embryonic tooth development, which depends heavily on the communication between epithelial tissue (white) and mesenchymal tissue (pink). An epithelial "enamel knot" (PEK/ SEK) develops at the future cusp(s), where enamel will be laid down by the epithelial cells, and dentin by the mesenchymal cells. Below are some of the molecules known to orchestrate the activities of all these cells. Some of these molecules are extracellular signals (BMP, FGF, WNT), while others are cell-internal components of the signaling systems (LEF, PAX, MSX).

Naturally, all this doesn't happen by magic, but by a symphony of gene expression and molecular signals going back and forth. These signals are used in various combinations in many developmental processes, but given the cell types located here, due to the prior location-based patterning of the embryo in larger coordinate schemes, and the particular combination of signals, they orchestrate tooth development. Over evolution, these signals have diversified to a high degree across mammals, creating teeth of all sorts of conformations and functions, from narwhal tusks to elephant tusks. The question these researchers posed was whether sharks use the same mechanisms to make their teeth, which across that group are also highly diverse in form, including complicated cusp patterns. Indeed, sharks even develop teeth on their skin- miniature teeth called denticles.

Shark skin is festooned with tiny teeth, or denticles.

These authors show detailed patterns of expression of a variety of the known gene-encoded components of tooth development, in a shark. For example, WNT11 (C) is expressed right at the future cusp, also known as the enamel knot, an organizing center for tooth development. Dental epithelium (de) and dental mesenchyme (dm) are indicated. Cell nuclei are stained with DAPI, in gray. Dotted lines indicate the dental lamina composed of the dental epithelium, and large arrows indicate the presumptive enamel knot, which prefigures the cusp of the tooth and future enamel deposition.

The answer- yes indeed. For instance, sharks use the WNT pathway (panel C) and associated proteins (panels A, B, D) in the same places as mammals do, to determine the enamel knot, cusp formation, and the rest. The researchers use some chemical enhancers and inhibitors of WNT signaling to demonstrate relatively mild effects, with the inhibitor reducing tooth size and development, and the enhancer causing bigger teeth, occasionally with additional cusps. While a few differences were seen, overall, tooth development in sharks and mammals is quite similar in molecular detail. 

The researchers even went on to deploy a computer model of tooth development that incorporates twenty-six gene and cellular parameters, which had been developed for mammals. They could use it to model the development of shark teeth quite well, and also model their manipulations of the WNT pathway to come out with realistic results. But they did not indicate that the overall differences in detail between mouse and shark tooth development were recapitulated faithfully by these model alterations. So it is unlikely that strict correspondence of all the network functions could be achieved, even though the overall system works similarly.

The authors offer a general comparison of mouse and shark tooth development, centered around the dental epithelium, with mesenchyme in gray. Most genes are the same (that is, orthologous) and expressed in the same places, especially including an enamel knot organizing center. For mouse, a WNT analog is not indicated, but does exist and is an important class of signal.

These authors did not, additionally, touch on the question of why tooth production stops in mammals, and is continuous in sharks. That is probably determined at an earlier point in the tissue identity program. Another paper indicated that a few of the epithelial stem cells that drive tooth development remain about in our mouths through adulthood. Indeed, these cells cause rare cancers (ameloblastoma). It is these cells that might be harnessed, if they could be prodded to multiply and re-enter their developmental program, to create new teeth.


  • Boring, condescending, disposable, and modern architecture is hurting us.
  • Maybe attacking Russia is what is needed here.

Saturday, June 18, 2022

Balancing Selection

Human signatures of balancing selection, one form and source of genomic variation.

We generally think of selection as an inexorable force towards greater fitness, eliminating mutations and less fit forms in favor of those more successful. But there is a lot else going on. For one thing, much mutation is meaningless, or "neutral". For another, our lives and traits are so complicated that interactions can lead to hilly adaptive landscapes where many successful solutions exist, rather than just one best solution. One form of adaptive and genetic complexity is balancing selection, which happens when two alleles (i.e. mutants or variants) of one gene have distinct roles in the whole organism or ecological setting, each significant, and thus each is maintained over time. 

A quick example is color in moths. Dark colors work well as camouflage in dirty urban environments, while lighter colors work better in the countryside. Since both conditions exist, and moths move around between them, both color schemes are selected for, resulting in a population that is persistently mixed for this trait. Indeed, the capacity of predators to learn these colors may also lead to an automatic advantage for the less frequent color, another form of balancing selection. Heterozygotes may also have an intrinsic advantage, as is so clearly the case for the sickle cell mutation in hemoglobin, against malaria. These are all classic examples. But to bring it home, a society has only so much capacity for people like Donald Trump. Insofar as sociopathy is genetic, there will necessarily be a frequency-dependent limit, where this trait (and other antisocial traits) may be highly successful at (extremely) low frequency, but terminally destructive at high frequencies.


Schematic selective landscapes. Sometimes selection just optimizes an existing trait by intensifying it (1), or moving it along trait space to a new optimum (2). But other times, multiple forms (i.e. variants, or mutations) of a given locus each have some useful / beneficial characteristic, and may be selected either discretely for particular effects (3), or generally for their diversity (4).
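
The heterozygote-advantage case mentioned above can be made concrete with a minimal sketch of the textbook one-locus, two-allele selection model. The fitness costs `s` and `t` below are illustrative assumptions, not values from any study discussed here. Under this model neither allele is eliminated: the frequency of allele A settles at t / (s + t), whichever side it starts from.

```python
def next_frequency(p, s, t):
    """Frequency of allele A after one generation of selection.

    Assumed relative fitnesses (illustrative only):
    AA = 1 - s, Aa = 1, aa = 1 - t, so the heterozygote is fittest.
    """
    q = 1.0 - p
    mean_fitness = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)
    # Surviving A alleles come from AA homozygotes and from heterozygotes.
    return (p * p * (1 - s) + p * q) / mean_fitness


def equilibrium(p_start, s, t, generations=2000):
    """Iterate the selection step and return the final frequency of A."""
    p = p_start
    for _ in range(generations):
        p = next_frequency(p, s, t)
    return p


if __name__ == "__main__":
    s, t = 0.1, 0.3  # hypothetical costs to the AA and aa homozygotes
    for start in (0.01, 0.5, 0.99):
        print(start, round(equilibrium(start, s, t), 4))
    print("predicted:", t / (s + t))
```

Frequency-dependent selection, as in the predator-learning example, would need a different model in which fitness itself changes with frequency; this sketch covers only the fixed-fitness heterozygote case.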

One laborious method to find such sites of balancing selection in a genome is to compare it to genomes of other species. If the same variants exist in each species over long periods of divergence, that argues that such conserved sites of diversity are maintained by balancing selection. Studies of humans and chimpanzees have found some such sites, but not many. But these methods are known to be very conservative, missing out on what is likely to be most cases.

A recent paper offered a slightly more sensitive way to find signs of balancing selection in the human genome, and found quite a lot of them. (Some background here.) It is based, as many investigations of selection are, on a special property of protein-coding genes, due to the degeneracy of the genetic code, that some mutations are "synonymous" and lead to no change in the coded protein, and others are "non-synonymous" and do change the protein. The latter would be assumed to be visible to selection, and sometimes give significant signals of conservation (i.e. low rates of change between species and populations, and few variations maintained in a population). This embedded signal/control pairing of information helps to insulate against many problems in analysis, and can tell us pretty directly how severe selection is on such sites. 
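
To make the synonymous / non-synonymous distinction concrete, here is a minimal sketch that classifies a single codon change using the standard genetic code. It only illustrates the bookkeeping and is not the method of the paper. The function names are made up for this example, and changes to or from a stop codon are simply counted as non-synonymous.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), with codons ordered
# TTT, TTC, TTA, TTG, TCT, ... and "*" marking stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify(codon_before, codon_after):
    """Label a codon change by whether the encoded amino acid changes."""
    same = CODON_TABLE[codon_before] == CODON_TABLE[codon_after]
    return "synonymous" if same else "non-synonymous"


def count_single_base_changes(codon):
    """Tally the nine possible single-base changes to one codon."""
    counts = {"synonymous": 0, "non-synonymous": 0}
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                counts[classify(codon, mutant)] += 1
    return counts


if __name__ == "__main__":
    print(classify("GAA", "GAG"))  # glutamate to glutamate
    print(classify("GAG", "GTG"))  # glutamate to valine, the sickle cell change
    print(count_single_base_changes("GGA"))
```

Most of the synonymous changes fall at the third codon position, which is why those sites can serve as the built-in control described above.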

It is worth adding that each basepair in the human genome has its own selective constraints. One position may code for the active site of some enzyme and be extremely well conserved, while the next may be a "synonymous" site that has very few or no selective constraints, and another lies in junk DNA that doesn't code for anything or regulate anything, is effectively neutral, and can be changed with no effect. The system is in this sense massively parallel, and able to experience evolution individually at each site concurrently. On the other hand, selection on one site affects the frequencies at nearby sites, since selective "sweeps" through that area of the genome drag the nearby regions of DNA (and whatever variants they may harbor) along, whether positively if the site is increasing in frequency, or negatively if it is deleterious and causing death of its bearers. The reach of this "linkage" effect depends on the recombination frequency, which is relatively low, leading to the moderate stability (and linkage) of relatively large "haplotypes" in our genomes.

At any rate, as the methods for detecting selection improve, more selection is detected, which is the lesson of this paper. These authors claim that while their method still significantly under-estimates balancing selection, they find evidence for the existence of hundreds of sites in humans, when comparing genomes between different geographic regions of the world. A couple hundred of these sites are in the MHC regions- the immunological areas of the genome that code for antigen-presenting and related immune proteins. These are well-known to be hotspots both for diversity and for the ongoing selective arms race vs pathogens (as we have recently experienced vs Covid). Seeing a lot of balancing selection there makes complete sense, naturally. 

The authors note that their focus on coding regions of the genome, and other technical limitations such as the need to find these sites through population comparisons, argue strongly that their estimate is a severe undercount. Thus one can assume that there will be at least several thousand sites of balancing selection in humans. This is quite apart from the many more sites of ongoing unidirectional selection, mostly purifying against problem mutations, but also towards positive characteristics. That accounting is only starting to get going, over the vast amounts of variation we harbor. So we live in a dynamic world, inside and out.


  • Green fuel for airplanes... really?
  • Barr is not the good guy here.
  • Free speech- not entirely free.
  • Court to workers: drop dead.
  • Islam and the megadrought.
  • Is crypto this cycle's subprime black hole?

Sunday, May 29, 2022

Evolution Under (Even in) Our Noses

The Covid pandemic is a classic and blazingly fast demonstration of evolution.

Evolution has been "controversial" in some precincts. While tradition told the fable of genesis, evolution told a very different story of slow yet endless change and adaptation- a mechanistic story of how humans ultimately arose. The stark contrast between these stories, touching both on the family tree we are heir to, and also on the overall point and motivation behind the process, caused a lot of cognitive dissonance, and is a template of how a fact can be drawn into the left/right, blue/red, traditional/progressive cultural vortex.

This all came to a head a couple of decades ago, when in the process of strategic retreat, anti-evolution forces latched onto some rather potent formulations, like "just a theory", and "intelligent design". These were given a lot of think tank support and right wing money, as ways to keep doubt alive in a field that scientifically had been settled and endlessly ramified for decades. To scientists, it was the height of absurdity, but necessitated wading into the cultural sphere in various ways that didn't always connect effectively with their intended audience. But eventually, the tide turned, courts recognized that religion was behind it all, and kept it out of schools. Evolution has more or less successfully receded from hot-button status.

One of the many rearguard arguments of anti-evolutionists was that sure, there is short-term evolution, like that of microbes or viruses, but that doesn't imply that larger organisms are the way they are due to evolution and selection. That would be simply beyond the bounds of plausibility, so we should search for explanations elsewhere. At this point they were a little gun-shy and didn't go so far in public as to say that elsewhere might be in a book like the Bible. This line of argument was a little ironic, since Darwin himself hardly knew about microbes, let alone viruses, when he wrote his book. The evidence that he adduced (in some profusion) described the easily visible signs of geology, of animals and plants around the world (including familiar domestic animals), which all led to the subtle, yet vast, implications he drew about evolution by selection. 

So it has been notable that the vistas of biology that opened up since that time, in microbiology, paleontology, genetics, molecular biology, et al., have all been guided by these original insights and have in turn supported them without fail. No fossils are found out of order in the strata, no genes or organisms parachute in without antecedents, and no chicken happens without an egg. Evolution makes sense of all of biology, including our current pandemic.

But you wouldn't know it from the news coverage. New variants arise into the headlines, and we are told to "brace" for the next surge, or the next season. Well, what has happened is that the SARS-COV2 virus has adapted to us, as we have to it, and we are getting along pretty well at this point. Our adaptation to it began as a social (or antisocial!) response that was very effective in frustrating transmission. But of late, it has been more a matter of training our immune systems, which have an internal selective principle. Between rampant infections and the amazing vaccines, we have put up significant protective barriers to severe illness, though not, notably, to transmission.

But what about the virus? It has adapted in the most classic of ways, by experiencing a wide variety of mutations that address its own problems of survival. It is important to remember that this virus originated in some other species (like a bat) and was not very well adapted to humans. Bats apparently have countless viruses of this kind that don't do them much harm. Similarly, HIV originated in chimpanzee viruses that didn't do them much harm either. Viruses are not inherently interested in killing us. No, they survive and transmit best if they keep us walking around, happily breathing on other people, with maybe an occasional sneeze. The ultimate goal of every virus is to stay under the radar, not causing its host to either isolate or die. (I can note parenthetically that viruses that do not hew to this paradigm, like smallpox, are typically less able to mutate, thus less adaptable, or have some other rationale for transmission than upper respiratory spread.)

And that is clearly what has happened with SARS-COV2. Local case rates in my area are quite high, and wastewater surveillance indicates even higher prevalence. Isolation and mask mandates are history. Yet hospitalizations remain very low, with no one in the ICU right now. Something wonderful has happened. Part of it is our very high local vaccination rate (96% of the population), but another part is that the virus has become less virulent as it has adapted to our physiology, immune systems, media environment and social practices, on its way to becoming endemic, and increasingly innocuous. All this in a couple of years of world-wide spread, after billions of infections and transmissions.

The succession (i.e. evolution) of variants detected in my county

The trend of local wastewater virus detection, which currently shows quite high levels, despite mild health outcomes.

So what has the virus been doing? While it has many genes and interactions with our physiology, the major focus has been on the spike protein, which is most prominent on the viral surface, is the first protein to dock to specific human proteins (the ACE2 cell surface receptor), and is the target of all the mRNA and other specific subunit vaccines. (As distinct from the killed virus vaccines that are made from whole viruses.) It is the target of 40% of the antibodies we naturally make against the whole virus, if we are infected. It is also, not surprisingly, the most heavily mutated portion of the virus, over the last couple of years of evolution. One paper counts 45 mutations in the spike protein that have risen to the level of "variants of concern" at WHO. 

"We found that most of the SARS-COV-2 genes are undergoing negative purifying selection, while the spike protein gene (S-gene) is undergoing rapid positive selection."


Structure of the spike protein, in its normal virus surface conformation, (B, C), and in its post-triggering extended conformation that reaches down into the target cell's membrane, and later pulls the two together. Top (in B, C) is where it binds to the ACE2 target on respiratory cells, and bottom is its anchor in the viral membrane coat (D shows it upside-down). At top (A) is the overall domain structure of the protein, in its linear form as synthesized, especially the RBD (receptor binding domain) and the two protease cleavage sites that prepare it for eventual triggering.


The spike protein is a machine, not just a blob. As shown in this video, it starts as a pyramidal blob flexibly tethered to the viral surface. Binding the ACE2 proteins in our respiratory tracts triggers a dramatic re-organization whereby this blob turns into a thin rope, which drops into the target cell. Meanwhile, the portion stuck to the virus unfolds as well and turns into threads that wind back around the newly formed rope, thereby pulling the virus and the target cell membrane together and ultimately fusing them. This is, mechanistically, how the virus gets inside our cells.

The triggering of the spike protein is a sensitive and adjustable process. In related viruses, the triggering is more difficult, and waits till the virus is engulfed in a vesicle that is taken into the cell, and acidified in the normal process of lysosomal destruction / ingestion of outside materials. The acidification triggers these viral spike proteins to fire and release the virus into the cell. Triggering also requires cleavage of the spike protein with proteases that cut it at two locations. Other related viruses sometimes wait for a target host protease to do the honors, but SARS-COV2 spike protein apparently is mostly cleaved during production by its originating host. This raises the stakes, since it can then more readily trigger, by accident, or once it finds proper ACE2 receptors on a target host. One theme of recent SARS-COV2 evolution is that triggering has become slightly easier, allowing the virus to infect higher up in the respiratory system. The original strains set up infections deep in the lung, but recent variants infect higher up, which lessens the systemic risks of infection to the host, promotes transmissibility, and speeds the infection and transmission process. 

The mutations G339D, N440K, L452R, S477N, T478K, and E484K in the spike region that binds to ACE2 (RBD, or receptor binding domain) promote this interaction, raising transmissibility. (The nomenclature is that the number gives the position of the amino acid in the linear protein sequence, and the letters give, in one-letter code, the original amino acid (start) and the mutated version (end).) Overall, mutations of the spike protein have increased the net charge on the spike protein significantly in the positive direction, which encourages binding to the negatively charged ACE2 protein. D614G is not in this region, but is nearby and seems to have similar effects, stabilizing the protein. The P681 mutation in one of the cleaved regions promotes proteolysis by the enzyme furin, thus making the virus more trigger-able. 
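
The naming scheme and the charge argument can both be checked with a minimal sketch that parses the six RBD mutation names listed above and tallies their rough effect on net charge. The charge table is a deliberate simplification and an assumption of this example: aspartate and glutamate count as -1, lysine and arginine as +1, and everything else, including histidine, as neutral. The function names are hypothetical.

```python
import re

# Approximate side-chain charges near neutral pH. Treating histidine and all
# other residues as neutral is a simplifying assumption of this sketch.
CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}

# <original amino acid><position><mutated amino acid>, e.g. N440K
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(name):
    """Split a name like 'N440K' into ('N', 440, 'K')."""
    match = MUTATION_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not in <original><position><mutant> form: {name!r}")
    original, position, mutant = match.groups()
    return original, int(position), mutant


def charge_change(name):
    """Approximate change in net charge caused by one substitution."""
    original, _, mutant = parse_mutation(name)
    return CHARGE.get(mutant, 0) - CHARGE.get(original, 0)


if __name__ == "__main__":
    rbd_mutations = ["G339D", "N440K", "L452R", "S477N", "T478K", "E484K"]
    for name in rbd_mutations:
        print(name, parse_mutation(name), charge_change(name))
    print("net change:", sum(charge_change(m) for m in rbd_mutations))
```

By this crude count, five of the six substitutions are neutral or add positive charge, which fits the trend described. A name without a final letter, like "P681", is rejected by the parser, since the replacement amino acid is not given.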

What are some other constraints on the spike protein? It needs to evade our vaccines and natural immunity, but has seemingly adapted to a here-and-gone infection style, though with periodic re-infection, like other colds. So any change is good for the purpose of camouflage, as long as its essential functions remain intact. The N-terminal, or front, domain of the spike protein, which is not involved directly in ACE2 binding, has experienced a series of mutations of this kind. An additional function it seems to have is to mimic a receptor for the cytokine interleukin 8, which attracts neutrophils and encourages activation of macrophages. Such mimicry may reduce this immune reaction, locally. 

In comparison to all these transmissibility-enhancing mutations, it is not clear yet where the mutations that decrease virulence are located. It is likely that they are more widely distributed, not in the gene encoding the spike protein. SARS-COV2 has a remarkable number of genes with various interactions with our immune systems, so the scope for tuning is prodigious. If all this can be accomplished in a couple of years, imagine what a million, or a billion, years can do for other organisms that, while they have slower reproduction cycles and more complicated networks of internal and external relations, still obey that great directive to adapt to their circumstances.


  • Late link, on receptor binding vs immune evasion tradeoffs.
  • Yes, chimpanzees can talk.
  • The rich are getting serious about destroying democracy.
  • Forced arbitration is, generally, unconscionable and should be illegal.
  • We could get by with fewer nuclear weapons.
  • Originalism would never allow automatic or semiautomatic weapons.

Saturday, March 12, 2022

DNA Damage Domain Declines to Bind DNA

How one protein domain changed through time.

The BRCA1 and BRCA2 genes are notorious for harboring mutations that increase susceptibility to breast cancer (thus their name, breast cancer type 1 (or 2) susceptibility protein). They have therefore been intensively studied for what they do in the normal course of our cellular lives. Their common naming does not mean they are similar- their structures are completely different. They play related, but distinct, roles in DNA repair, which is naturally influential in our susceptibility to cancer caused by DNA mutations.

An article some time back delved into the history of one domain of the BRCA1 protein, tracing how its functions have changed significantly over evolutionary time. BRCA1 is a large gene encoding a large protein (1863 amino acids long), composed of several domains. Proteins frequently possess several domains in order to integrate several functions in an orderly way, such as binding a few different partners that together form a complex and carry out some function. Modular protein domains facilitate evolution by being easily duplicated, transferred, and generally being able to be passed around, thanks to rearrangement mutations. BRCA1 has domains that bind to at least 11 other proteins, most of which play some role in DNA damage responses. So it is a key protein, and damage to it has correspondingly bad effects. 

The domains of BRCA1. Each one has some role in the protein's function, which integrates responses to DNA damage. The BRCT domains are on the very end, right side. NLS is nuclear localization (import) sequence, and NES is the nuclear export signal. These would be typically regulated by other interacting proteins or phosphorylation, to control the access of BRCA1 to the nucleus.

The domain of interest here is the BRCT, or BRCA1 C-terminal domain. It is ~90 amino acids long and BRCA1 has two of them, side by side. Other work has shown that it binds to other proteins, but only after they have been modified by phosphate addition. The DNA damage sensor ATM is one kinase that adds phosphates to BRCA1 targets such as Abraxas. Thus the BRCT domain plays the key role of bringing this DNA damage repair integrating protein to the right sites, where there is DNA damage to repair. 

Structure of the BRCT double domains in BRCA1 (E). The pocket that binds a phosphorylated serine residue on a partner protein such as Abraxas is shown in teal, and in (C), close up. (B) shows a single BRCT domain.


This paper did a sensitive computer search for all possible versions of this domain in all available species and proteins, finding it in 23 human proteins, and in species all the way back to bacteria, so it is quite ancient. And the phylogeny they reconstruct indicates that the original versions of these domains had a different function, which was to bind DNA directly, at sites of DNA damage! Such frayed ends also have phosphate groups, so it isn't a huge leap from one function to another. Additionally, other examples of BRCT domains have dispensed with phosphate-dependent binding altogether, and simply bind other proteins regardless. This transition may have happened after phosphorylation became the central way to alert the cell, and key proteins, to the existence of DNA damage, instead of dealing with it solely through enzymes that find & fix such damage directly. This transition allowed a much more robust response by cells, which now includes halting the cell division cycle and activating other stress responses to help the cell recover.

Some of the BRCT domains (along with many others) found in various species and their proteins.

The BRCT domain is mostly used among proteins involved in DNA repair, and even in humans some versions bind DNA directly (PARP1, RFC1). So through the long path of evolution, this single domain has stuck generally to its original role, while it also- along with the organisms and proteins it acts within- diversified and ramified in its functions. From an initial role in direct DNA damage and end recognition, it has become a card-carrying member of the bureaucracy of the cell, playing regulatory and organizing roles within numerous actors important to DNA handling and repair. It is a classic story of how eukaryotes used their surfeit of energy and material resources to develop whole orders of novel molecular, and concomitant outward, complexity.


  • There are a lot of places we shouldn't get our energy from.
  • But we are hopelessly dependent and immature.
  • Partisan hack on the Supreme Court.
  • What the Russians think of negotiation.
  • Is it more than a job? Should it be?
  • Ruminations on war.

Saturday, December 4, 2021

Supergroups in Search of Their Roots

The early stages of eukaryotic evolution are proving hard to reconstruct.

There is normal evolution, and then there are great evolutionary transitions. Not to say that the latter don't obey the principles of normal evolution, but they go by so fast, and render so many transitional forms obsolete along the way, that there is little record left of what happened. Among those great transitions are the origin of life itself, the origin of humans, and the origin of eukaryotes. We are slowly piecing together human evolution, from the exceedingly rare fossils of intermediate forms and branch off-shoots. But looking at the current world, we are the lone hominin, having displaced or killed off all competitors and predecessors to stand alone atop the lineage of primates, and over the biosphere generally. Human evolution didn't violate any natural laws, but it seems to have operated under uniquely directional selection, especially for intelligence and social sophistication, which led to a sort of arms race of rapid evolution that laid the groundwork for an exponential rate in the invention of technologies and collective social forms over the last million years.

Similarly, it is clear that however the origin of life started out, it was a very humble affair, with each innovation quickly displacing its progenitors, just as the early cell phones came out in quick succession, until a technological plateau was reached from which further development was / is less obvious. While the origin and success of eukaryotes did not erase the prokaryotic kingdoms from which they sprang, it does seem to have erased the early stages of its own development, to the point that those stages are very hard to reconstruct, especially given the revolutionary and multifarious nature of their innovations.

Eukaryotes differ from prokaryotes in possessing: nuclei and a nuclear membrane with specialized pores; mitochondria descended from a separate bacterial ancestor (and photosynthetic plastids descended from yet other bacterial ancestors in some cases); sex and meiosis; greater size by several orders of magnitude; phagocytosis by amoeboid cells; internal membrane organelles like golgi, peroxisomes, lysosomes, endocytic and exocytic vesicles; cyclins that run the cell cycle; microtubules that participate in the cell cycle, cytoskeleton, and cilia; cilia, as distinct from flagella; an active actin-based cytoskeleton, with novel motor proteins; a greatly elaborated transcriptional apparatus with modular enhancers and novel classes of transcription regulators; histones; mRNA splicing and introns; nucleolus and small nucleolar RNAs; telomeres on linear chromosomes; a significant increment in the size of both ribosomal subunits. Indeed, the closer one looks at the molecular landscape, the more differences accumulate. This was quite simply a quantum leap in cellular organization, which happened sometime between 1.8 and 3 billion years ago. Indeed, eukaryotes are not just the McMansions of the microbial world, but the Downton Abbeys- with dutiful servants and complex and luxurious internal economies that prokaryotic cells couldn't conceive of.

Major lineages of eukaryotes are traced back to their origins in a molecular-based phylogeny. Animals (and fungi!) are in the Opisthokonta, plants in the Chloroplastida. So many groups connect right to the "root" of this tree that there is little way to figure out which came first. Also, the dashed lines indicate uncertainty about those orderings/rootings as well, which leaves a great deal of early eukaryotic evolution obscure. Some abbreviations / links are- CRuMs: collodictyonids (syn. diphylleids) + rigifilida + mantamonas; excavates, hemimastigophora, haptista, TSAR:  telonemids, stramenopiles, alveolates, and rhizaria.


A recent paper recounts the current phylogenetic state of affairs, and a variety of other papers over the last decade delve into the many questions surrounding eukaryotic origins. While molecular phylogenies have improved tremendously with the advent of faster, whole-genome sequencing and the continued collection of obscure single-celled eukaryotes (aka protists), the latest phylogeny, as shown above, remains inconclusive. The deepest root is uncertain both with regard to its bacterial progenitor and with regard to which current eukaryotes bear the closest relation to it. There are occasional fossil kelps, algae, and other biochemical traces going back 2.0 to 2.7 billion years (though some do not put the origin earlier than 1.8 billion years), but these have not been able to shed any light on the order of events either.

Nevertheless, the field can agree on a few ideas. One is that the assimilation of mitochondria (whether willing or unwilling) is perhaps the dominant event in the sequence. That doesn't mean it was necessarily the first event, but it does mean that it created a variety of conditions that led to a cascade of other consequences and features. The energy mitochondria provided enabled large cell sizes and the accumulation of a whole new household full of junk, like lipids in several new membrane compartments. The genome that they contributed brought in thousands of new genes, along with introns.

Secondly, the loss of cell walls and the adoption of amoeboid carnivory is likely one of the first events in the evolutionary sequence. Shedding the obligatory cell wall that all bacteria have necessitates a cytoskeleton of some kind, and it is also conducive to the engulfment of the proto-mitochondrion. For while complicated co-symbiotic metabolic arguments have been devised to explain why these two cells may have engaged in a long-term mutual relationship long before their ultimate consummation, the most convenient hypothesis for assimilation remains the simplest- that one engulfed the other, in a meal that lasted well over a billion years.

Thirdly, the question of what the progenitor cell was has been refined somewhat. One of the most intriguing findings of the last half-century of biology was the discovery of archaebacteria (also called archaea)- a whole new kingdom of bacteria characterized by their tendency to occupy extreme habitats, their clear separation from bacteria by chemical and genetic criteria, and also their close relationship to eukaryotes, especially what is presumed to be the original host genome. Many proposals have been made (including that archaea are the original cell, preceding other bacteria), but the best one currently is that archaea split from the rest of bacteria rather late, after which eukaryotes split off from archaea, thus making the latter two sister groups. This explains the many common traits they share, while allowing significant divergence, plus the incorporation of many bacterial features into eukaryotes, either through the original lineage, or by later transfer from the proto-mitochondrion. So here at last is one lineage that survived out of the gradual development of eukaryotes- the archaea, though one wouldn't guess it from looking at them. It took analysis at the molecular level to even know that archaea existed, let alone that they are the last extant eukaryotic sister group.

A comically overstuffed figure from an argument for the late development of archaebacteria out of pre-existing bacteria (prokaryotes), with subsequent split and diversification of eukaryotes out of a proto-archaeal lineage. Many key molecular and physiological characters are mentioned.

Lastly, surveying the various outlying protist lineages for clues about which might hearken back to primitive eukaryotic forms, one research group suggests that the collodictyonids might fit the bill. Being an ancient lineage means that it is lonesome, without a large family of evolutionary development to show diversification and change. It also means that in molecular terms, it is highly distinct, branching deeply from all other groups. Whether that all means that it resembles an ancient / early form of the eukaryotic cell, or went its own way on a unique evolutionary trajectory, is difficult to say. For each trait (including sequence traits), a phylogenetic analysis is done to figure out whether it is differential- shared with some other lineages but not all- and if so, whether those without the trait lost it at some later point, or whether it was gained by a sub-group. After analyzing enough such traits, one can make a statement about the overall picture, and thus the "ancient-ness", of an organism.
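To make the gained-versus-lost question concrete, here is a minimal sketch of one standard way to pose it, Fitch parsimony, which asks for the fewest changes of a trait that could explain its distribution among living lineages. This is not necessarily the method the research group used, and the tree shape, the lineage names A to E, and the trait values are all invented for illustration.

```python
# Toy illustration of Fitch parsimony for a single binary trait.
# The tree and the trait values below are hypothetical, not real data.

def fitch(tree, states):
    """Return (candidate states at this node, minimum number of changes).

    tree   -- a leaf name (str) or a 2-tuple (left_subtree, right_subtree)
    states -- dict mapping each leaf name to its observed state (0 or 1)
    """
    if isinstance(tree, str):              # leaf: the state is observed directly
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:                             # children agree: no change needed here
        return shared, left_cost + right_cost
    # children disagree: keep both options open and count one change
    return left_set | right_set, left_cost + right_cost + 1

# Hypothetical rooted tree: ((A, B), (C, (D, E)))
tree = (("A", "B"), ("C", ("D", "E")))

# Trait present (1) everywhere except lineage C
trait = {"A": 1, "B": 1, "C": 0, "D": 1, "E": 1}

root_states, changes = fitch(tree, trait)
print(sorted(root_states), changes)        # prints: [1] 1
```

In this made-up case the most economical reading is that the trait was present at the root and was lost once, in lineage C. Flip the values so that only D and E have the trait, and the same function reports an ancestral absence with a single later gain. Real analyses repeat this kind of reasoning over many traits, and usually with statistical models rather than bare change counts.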

Is anything special about Collodictyon? Not really. It is predatory, and has four flagella and a feeding groove, which functions as a sort of mouth. It can make pseudopods, has normal microtubule organizing centers for its flagella, and generally has all the accoutrements of a eukaryotic cell. It lacks nothing, and thus may be an early branching eukaryote, but is not in any way a transitional form.

An unassuming protist (Collodictyon) as possible representative of early eukaryotes. Its cilia are numbered.


At this point, we are left still peering darkly into the past, through obscure living protists and their molecular fossils, trying to figure out what happened when they split from the bacteria and archaea. A tremendous amount happened, but little record survives of the path along the way. That tends to be characteristic of the most momentous evolutionary events, which cause internal and external cataclysms (including the opening of whole new lifestyles to exploit) that necessitate a rapid dynamic of further adaptation before their descendants achieve a stable and successful state sufficient to ride out the ensuing billion or more years ... before we come on the scene with the ability and interest to contemplate what went before.


  • Red regions have three times the death rates from Covid as blue regions. Will that change electoral math?
  • Annals of secession, cont.
  • Sad spectacle at the court.
  • Analysis of how the energy transition might go. Again, a carbon tax would help.