Showing posts with label cell biology. Show all posts

Sunday, August 21, 2022

What Holds up the Nucleus?

Cell nuclei are consistently sized with respect to cell volume, and pleasingly round. How does that happen?

An interesting question in biology is why things are the size they are. Why are cells so small, and what controls their size? Why are the various organelles within them a particular size and shape, and is that controlled in some biologically significant way, or just left to some automatic homeostatic process? An interesting paper came out recently about the size of the nucleus, home of our DNA and all DNA-related transactions like transcription and replication. (Note to reader/pronouncer: "new clee us", not "new cue lus".)

The nucleus, with parts labeled. Pores are large structures that control traffic in and out. 

The nucleus is surrounded by a double membrane (the nuclear membrane) studded with structurally complex and interesting pores. These pores are totally permeable to small molecules like ions, water, and very small proteins, but restrict the movement of larger proteins and RNAs, and naturally, DNA. To get out, (or in), these molecules need to have special tags, and cooperate with nuclear transport proteins. But very large complexes can be transported in this way, such as just-transcribed RNAs and half-ribosomes that get assembled in the nucleolus, a small sub-compartment within the nucleus (which has no membrane, just a higher concentration of certain molecules, especially the portion of the genomic DNA that encodes ribosomal RNA). So the nuclear pore is restrictive in some ways, but highly permissive in other ways, accommodating transmitted materials of vastly different sizes.

Nuclear pores are basket-shaped structures that are festooned, particularly inside the channel, with disordered phenylalanine/glycine rich protein strands that act as size, tag, and composition-based filters over what gets through.

The channels of nuclear pores have a peculiar composition, containing waving strands of protein with repetitive glycine/phenylalanine composition, plus interspersed charged segments (FG domains). This unstructured material forms a unique phase, neither oily nor watery, that restricts the passage of immiscible molecules (i.e., most larger molecules), unless they are accompanied by partners that bind specifically to these FG strands, and thus melt right through the barrier. This mechanism explains how one channel can, at the same time, block all sorts of small to medium sized RNAs and proteins, but let through huge ribosomal components and specifically tagged and spliced mRNAs intended for translation.

But getting back to the overall shape and size of the nucleus, a recent paper made the case in some detail that colloid pressure is all that is required. As noted above, all small molecules equilibrate easily across the nuclear membrane, while larger molecules do not. It is these larger molecules that are proposed to provide a special form of osmotic pressure, called colloid osmotic pressure, which gently inflates the nucleus, against the opposing force of the nuclear membrane's surface tension. No special mechanical receptors are needed, or signaling pathways, or stress responses.

The paper, and an important antecedent paper, make some interesting points. First is that DNA takes up very little of the nuclear volume. Despite being a huge molecule (lengthwise), DNA makes up less than 1% of nuclear volume in typical mammalian cells. Ribosomal RNA, partially constructed ribosomal components, tRNAs, and other materials are far more abundant and make up the bulk of large molecules. This means that nuclear size is not very sensitive to genome copy number, or ploidy in polyploid species. Secondly, they mention that a vanishingly small number of mutants have been found that affect nuclear size specifically. This is what one would expect for a simple- even chemical- homeostatic process, not dependent on the usual signaling pathways of cellular stress, growth regulation, etc., of which there are many.

Where does colloid osmotic pressure come from? That is a bit obscure, but this Wiki site gives a decent explanation. When large molecules exist in solution, they exclude smaller molecules from their immediate vicinity, just by taking up space, including a surface zone of exclusion, a bit like national territorial waters. That means that the effective volume available to the small solutes (which generally control osmotic pressure) is slightly reduced. But when two large molecules collide by random diffusion, the points where they touch represent overlapping exclusion zones, which means that globally, the net exclusion zone from large molecules has decreased, giving small solutes slightly more room to move around. And this increased entropy of the smaller solutes drives the colloid osmotic pressure, which rises quite rapidly as the concentration of large molecules increases. The prior paper argues that overall, cells have quite low colloid osmotic pressure, despite their high concentrations of complex large molecules. They are, in chemical terms, dilute. This helps our biochemistry do its thing with unexpectedly rapid diffusion, and is explained by the fact that much of our molecular machinery is bound up in large complexes that reduce the number of independent colloidal particles, even while increasing their individual size.
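To make the last point concrete, here is a textbook-style sketch (not taken from either paper) of how osmotic pressure depends on the number of independent particles. It assumes the standard virial expansion and treats the colloids as hard spheres, which is a strong simplification for real proteins and RNAs. The symbols n (number density of independent particles), v (volume of one particle), and m (subunits per complex) are introduced here for illustration only.

```latex
% Virial expansion of osmotic pressure for colloidal particles.
% Leading term: the ideal (van 't Hoff) law. Second term: excluded volume.
\Pi \;=\; k_B T \left( n + B_2\, n^2 + \dots \right),
\qquad B_2^{\text{hard sphere}} = 4\,v

% If monomers assemble into compact complexes of m subunits each,
% then n \to n/m and v \to m\,v, so both terms fall by a factor of m:
\Pi_{\text{complexed}} \;\approx\; k_B T \left( \frac{n}{m} + \frac{4\,v\,n^2}{m} + \dots \right)
```

Under these assumptions, packing the same mass of material into fewer, larger particles lowers the pressure roughly in proportion to the size of the complexes, which is the sense in which a crowded cell can still behave as "dilute".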

So much for theory- what about the experiment? The authors used yeast cells (Schizosaccharomyces pombe), which are a common model system. But they have cell walls, which the researchers digested off before treating them with a variety of osmolytes, mostly sorbitol, to alter their osmotic environment (not to mention adding fluorescent markers for the nuclear and plasma membranes, so they could see what was going on). Isotonic concentration was about 0.4 Molar (M) sorbitol, with treatments going up to 4M sorbitol (hypertonic). The question was.. is the nucleus (and the cell as a whole) a simple osmometer, reacting as physical chemistry would expect to variations in osmotic pressure from outside? Recall that high concentrations of any chemical outside a cell will draw water out of it, to equalize the overall water / osmotic pressure on both sides of the membrane.
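What a "simple osmometer" would do can be sketched with the Boyle-van 't Hoff relation, in which the osmotically active volume scales inversely with external concentration. This is a minimal illustration, not the authors' analysis: the function name, the starting volumes, and the optional non-osmotic volume `b` are all assumptions made here, and only the 0.4 M isotonic figure comes from the text.

```python
def osmometer_volume(v_iso, c_iso, c_ext, b=0.0):
    """Predicted volume of an ideal osmometer (Boyle-van 't Hoff relation).

    v_iso : volume at the isotonic concentration (any volume unit)
    c_iso : isotonic external concentration (about 0.4 M sorbitol in the text)
    c_ext : new external concentration, same unit as c_iso
    b     : osmotically inactive volume (hypothetical parameter, default 0)
    """
    if c_iso <= 0 or c_ext <= 0:
        raise ValueError("concentrations must be positive")
    if not 0 <= b < v_iso:
        raise ValueError("b must be non-negative and smaller than v_iso")
    # Only the osmotically active part (v_iso - b) responds to the medium.
    return b + (v_iso - b) * (c_iso / c_ext)


if __name__ == "__main__":
    # Hypothetical starting volumes, in arbitrary units.
    cell_iso, nucleus_iso = 100.0, 8.0
    for c_ext in (0.4, 0.8, 1.6, 4.0):
        cell = osmometer_volume(cell_iso, 0.4, c_ext)
        nucleus = osmometer_volume(nucleus_iso, 0.4, c_ext)
        print(f"{c_ext:.1f} M: cell={cell:.1f} nucleus={nucleus:.2f} "
              f"N/C ratio={nucleus / cell:.3f}")
```

If both compartments behave as ideal osmometers with the same (here zero) non-osmotic fraction, the nucleus-to-cell ratio stays constant as the medium gets more concentrated, which is the proportional shrinkage the experiment tests for.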

Schizosaccharomyces pombe are oblong cells (left) with plasma membrane marked with a green fluorescent marker, and the nuclear membrane marked with a purple fluorescent marker. If one removes the cell wall, the cells turn round, and one can experiment on their size response to osmotic pressure/treatment. Hypertonic (high-sorbitol, top) treatment causes the cell to shrink, and causes the nucleus to shrink in strictly proportional fashion, indicating that both have simple composition-based responses to osmotic variation.


They found that not only does the outer cell membrane shrink as the cell comes under hypertonic shock, but the nucleus shrinks proportionately. A number of other experiments followed, all consistent with the same model. One of the more interesting was treatment with leptomycin B (LMB), which is a nuclear export inhibitor. Some materials build up inside the nucleus, and one would expect that, under this simple model of nuclear volume homeostasis, the nuclei would gradually gain size relative to the surrounding cell, breaking the general observation of strict proportionality of nuclear to cell volumes.

Treating Schizosaccharomyces pombe cells with a drug that inhibits nuclear export of certain proteins causes the nuclear volume to blow up a little bit, relative to the rest of the cell.

That is indeed what is seen, not really immediately discernible, but after measuring the volumes from micrographs, evident on the accompanying graph (panel C). So this looks like a solid model of nuclear size control, elegantly explaining a small problem in basic cell biology. While there is plenty of regulation occurring over traffic into and out of the nucleus, which has critical effects on gene expression, translation, replication, division, and other processes, the nucleus can leave its size and shape to simple biophysics and not worry about piling on yet more ornate mechanisms.


  • About implementing the climate bill and related policies.
  • We should have given Ukraine to Russia, apparently. Or something.
  • Big surprise- bees suffer from insecticides.

Sunday, July 10, 2022

Tooth Development and Redevelopment

Wouldn't it be nice to regrow teeth? Sharks do.

Imagine for a minute if instead of fillings, crowns, veneers, posts, bridges, and all the other advanced technologies of dental restoration, a tooth could be removed, and an injection prompt the growth of a complete replacement tooth. That would be amazing, right? Other animals, such as sharks and fish, regrow teeth all the time. But we only get two sets- our milk teeth and mature teeth. While mature mammalian teeth are incredibly tough and generally last a lifetime, modern agriculture and other conditions have thrown a wrench into human dental health, which modern dentistry has only partially restored. As evolution proceeded into the mammalian line, tooth development became increasingly restricted and specialized, so that the generic teeth that sharks spit out throughout their lives have become tailored for various needs across the mouth, firmly anchored into the jaw bone, and precisely shaped to fit against each other. But the price for this high-level feature set seems to be that we have lost the ability to replace them.

So researchers are studying tooth development in other animals- wondering how similar they are to human development, and whether some of their tricks can be atavistically re-stimulated in our own tissues. While the second goal remains a long way off, the first has been productively pursued, with teeth forming a model system of complex tissue development. A recent paper (with review) looked at similarities between molecular details of shark and mammalian tooth development.

Teeth are the result of an interaction between epithelial tissues and mesenchymal tissues- two of the three fundamental tissues of early embryogenesis. Patches of epithelium form dental arches around the two halves of the future mouth. Spots around these arches expand into dental placodes, which grow into buds, and as they interact continuously with the inner mesenchyme, form enamel knots. The epithelial cells of the knot then eventually start producing enamel as they pull away from the interface, while the mesenchymal cells produce dentin and then the pulp and other bone-anchoring tissues of the inner tooth and root as they pull away in the opposite direction.

Embryonic tooth development, which depends heavily on the communication between epithelial tissue (white) and mesenchymal tissue (pink). An epithelial "enamel knot" (PEK/ SEK) develops at the future cusp(s), where enamel will be laid down by the epithelial cells, and dentin by the mesenchymal cells. Below are some of the molecules known to orchestrate the activities of all these cells. Some of these molecules are extracellular signals (BMP, FGF, WNT), while others are cell-internal components of the signaling systems (LEF, PAX, MSX).

Naturally, all this doesn't happen by magic, but by a symphony of gene expression and molecular signals going back and forth. These signals are used in various combinations in many developmental processes, but given the cell types located here, due to the prior location-based patterning of the embryo in larger coordinate schemes, and the particular combination of signals, they orchestrate tooth development. Over evolution, these signals have diversified to the highest degree across mammals, creating teeth of all sorts of conformations and functions, from whale baleen to elephant tusks. The question these researchers posed was whether sharks use the same mechanisms to make their teeth, which across that group are also highly diverse in form, including complicated cusp patterns. Indeed, sharks even develop teeth on their skin- miniature teeth called denticles.

Shark skin is festooned with tiny teeth, or denticles.

These authors show detailed patterns of expression of a variety of the known gene-encoded components of tooth development, in a shark. For example, WNT11 (C) is expressed right at the future cusp, also known as the enamel knot, an organizing center for tooth development. Dental epithelium (de) and dental mesenchyme (dm) are indicated. Cell nuclei are stained with DAPI, in gray. Dotted lines indicate the dental lamina composed of the dental epithelium, and large arrows indicate the presumptive enamel knot, which prefigures the cusp of the tooth and future enamel deposition.

The answer- yes indeed. For instance, sharks use the WNT pathway (panel C) and associated proteins (panels A, B, D) in the same places as mammals do, to determine the enamel knot, cusp formation, and the rest. The researchers use some chemical enhancers and inhibitors of WNT signaling to demonstrate relatively mild effects, with the inhibitor reducing tooth size and development, and the enhancer causing bigger teeth, occasionally with additional cusps. While a few differences were seen, overall, tooth development in sharks and mammals is quite similar in molecular detail. 

The researchers even went on to deploy a computer model of tooth development that incorporates twenty six gene and cellular parameters, which had been developed for mammals. They could use it to model the development of shark teeth quite well, and also model their manipulations of the WNT pathway to come out with realistic results. But they did not indicate that the overall differences in detail between mouse and shark tooth development were recapitulated faithfully by these model alterations. So it is unlikely that strict correspondence of all the network functions could be achieved, even though the overall system works similarly.

The authors offer a general comparison of mouse and shark tooth development, centered around the dental epithelium, with mesenchyme in gray. Most genes are the same (that is, orthologous) and expressed in the same places, especially including an enamel knot organizing center. For mouse, a WNT analog is not indicated, but does exist and is an important class of signal.

These authors did not, additionally, touch on the question of why tooth production stops in mammals, and is continuous in sharks. That is probably determined at an earlier point in the tissue identity program. Another paper indicated that a few of the epithelial stem cells that drive tooth development remain about in our mouths through adulthood. Indeed, these cells cause rare cancers (ameloblastoma). It is these cells that might be harnessed, if they could be prodded to multiply and re-enter their developmental program, to create new teeth.


  • Boring, condescending, disposable, and modern architecture is hurting us.
  • Maybe attacking Russia is what is needed here.

Saturday, June 25, 2022

Visualizing Profilin

Profilin is part of the musculo-skeletal system that motors our cells around. But how can we tell?

Our cells have structural elements called the cytoskeleton. The term is a misnomer, since the cytoskeleton comprises the muscles of the cell as well as its rigid supports. There are three types of rigid element- actin filaments, intermediate filaments, and microtubules. Intermediate filaments are the stable, relatively inert part of the equation, making up structures like keratins that shape our skin, hair, and nails. Actin and microtubules, however, are highly dynamic and contribute to amoeboid motion, developmental cell motions, neural extensions, and all kinds of other shape changes cells perform. Microtubules are bigger and stiffer (25 nm diameter, hundreds of times stiffer than actin filaments), and participate in big, discrete processes like separating the chromosomes at division, and forming the core of cilia that wave from the outside of the cell.

Actin (6 nm diameter) is more pervasive all over the cell, and is what provides the main motive force of amoeboid motions and cell shape change. Indeed, our muscles are mostly composed of great quantities of actin along with interdigitated filaments of its corresponding motor protein (myosin) in orderly, almost crystalline, arrays. Both microtubules and actin create motion in two ways- by their own polymerization / depolymerization, and also by way of motors that can move along their lengths.

Images of cells showing fluorescence labeling of skeletal components. Microtubules are shown in green, and DNA in blue. Panel C shows a neuronal growth cone with actin labeled in red. Note how microtubules and actin cooperate, with actin in the lead, pushing out the cell edges by force of its own polymerization. Panel A shows resting cells, with the microtubule organizing center in red. E shows a yeast cell with microtubules spanning its length. G shows a dividing cell at M phase, where microtubules organize the separation of chromosomes, after the microtubule organizing center has itself first divided into two.


A recent paper discussed new tools in the quest to visualize profilin, one of the many accessory proteins involved in managing the cytoskeleton. The most basic role of profilin is to bind to monomers of actin, helping them recharge (that is, exchange their ADP for a new ATP). There is a lot of profilin in the cell, and it mostly sits around complexed with actin, preventing it from spontaneously polymerizing. But then if a signal comes in, profilin has binding sites for formin proteins, which tend to be the main instigators of cell shape change and actin polymerization, and can orchestrate the handoff of actin from profilin to growing actin filaments.

The overall actin cycle. Actin monomers are constantly coming on and off of filaments. ATP-charged actin is held in reserve in complex with profilin (dark shapes). Then formins or other accessory proteins can encourage addition to a filament, at one end, called the barbed end. While in filaments, actin gradually hydrolyzes its ATP, forming ADP. Actin with ADP is prone to dissociation, which may be encouraged or discouraged by various other accessory proteins. The resulting actin monomers are then re-bound by profilin and the cycle begins again.


But how can we see all this? Making proteins fluorescent has been, for decades now, an amazingly effective way to visualize them. And one can do that either live, or dead. For the latter, the cell is chemically embalmed and permeabilized, then treated with antibodies that bind to the protein(s) of interest. Then a second set of antibodies is applied that binds to the first set and is labeled with some fluorescent tag, and voila- images of where your protein of interest is, or was. But much more compelling is to see all this in living, working, and moving cells. To do that, the protein of interest is mutated to add an intrinsically fluorescent tag, such as green fluorescent protein. But profilin is so small, and so packed with critical binding sites, that there is little room for a fluorescent tag protein that is, in fact, almost twice as large as profilin itself.

What to do? These researchers attached a little tail to one end of the protein, off which they then added their tag, in this case a protein called mApple, chosen for its nice red fluorescence spectrum that doesn't interfere with the other greens and blues typically used in these experiments. The paper is mostly then a laborious verification that this new form of profilin fully functions in cells as the wild type does, engages in all the same interactions (as far as known), and thus constitutes a wonderful new tool for the field.

An atomic structure of profilin bound to actin. Profilin is a very small protein with many important interactions. That makes altering it very tricky. How to create a fluorescent form, or squeeze in some other tag? Profilin binds to actin, to microtubules, to formins and other proteins with PLP (poly-proline) domains, and to phosphatidylinositol 4,5-bisphosphate (PIP2), which is not even shown here.


It turns out that profilin binds to microtubules as well as to actin. And so do formins. As shown above in the image of a neural growth cone, though the composition of actin and microtubules and their size and other characteristics are very different, they cooperate extensively, thus must have mechanisms of crosstalk. Not much is known, unfortunately, about how this works- while a good bit is known individually how each of the actin and microtubule systems work, how they work together is poorly understood. But one thing these researchers show is that profilin, along with its abundance all over the cell, is also concentrated at the microtubule organizing center. Indeed, some mutations that cause the disease ALS occur right in these regions of profilin that bind microtubules. So something important is going on, and hopefully this new tool will speed work towards greater understanding of how the cytoskeletons operate.

Profilin imaged in a live cell, with other tagged molecules. At left, profilin occurs all over the cell in its role as actin buffer and storage partner. But note a couple of dots on each side. Next is shown the same cell labeled on alpha tubulin, the major component of microtubules. Next is shown DNA, which is condensed, as this cell is undergoing division. Last is shown the merged images, with DNA in blue, tubulin in green, and profilin in red/orange. The dots turn out to be the microtubule organizing centers that run the spindle which is orchestrating chromosome segregation.

  • Keep 'em high.. a way to smooth gas price volatility, and fight climate change.
  • And we need a carbon tax for comprehensive decarbonization.
  • Liberals tied in knots by homelessness.
  • All public school systems are at risk.
  • Someone has been watching a little too much Grit TV.
  • Cry me a river- about a shortage of post-docs.

Sunday, May 29, 2022

Evolution Under (Even in) Our Noses

The Covid pandemic is a classic and blazingly fast demonstration of evolution.

Evolution has been "controversial" in some precincts. While tradition told the fable of genesis, evolution told a very different story of slow yet endless change and adaptation- a mechanistic story of how humans ultimately arose. The stark contrast between these stories, touching both on the family tree we are heir to, and also on the overall point and motivation behind the process, caused a lot of cognitive dissonance, and is a template of how a fact can be drawn into the left/right, blue/red, traditional/progressive cultural vortex.

This all came to a head a couple of decades ago, when in the process of strategic retreat, anti-evolution forces latched onto some rather potent formulations, like "just a theory", and "intelligent design". These were given a lot of think tank support and right wing money, as ways to keep doubt alive in a field that scientifically had been settled and endlessly ramified for decades. To scientists, it was the height of absurdity, but necessitated wading into the cultural sphere in various ways that didn't always connect effectively with their intended audience. But eventually, the tide turned, courts recognized that religion was behind it all, and kept it out of schools. Evolution has more or less successfully receded from hot-button status.

One of the many rearguard arguments of anti-evolutionists was that sure, there is short-term evolution, like that of microbes or viruses, but that doesn't imply that larger organisms are the way they are due to evolution and selection. That would be simply beyond the bounds of plausibility, so we should search for explanations elsewhere. At this point they were a little gun-shy and didn't go so far in public as to say that elsewhere might be in a book like the Bible. This line of argument was a little ironic, since Darwin himself hardly knew about microbes, let alone viruses, when he wrote his book. The evidence that he adduced (in some profusion) described the easily visible signs of geology, of animals and plants around the world (including familiar domestic animals), which all led to the subtle, yet vast, implications he drew about evolution by selection.

So it has been notable that the vistas of biology that opened up since that time, in microbiology, paleontology, genetics, molecular biology, et al., have all been guided by these original insights and have in turn supported them without fail. No fossils are found out of order in the strata, no genes or organisms parachute in without antecedents, and no chicken happens without an egg. Evolution makes sense of all of biology, including our current pandemic.

But you wouldn't know it from the news coverage. New variants arise into the headlines, and we are told to "brace" for the next surge, or the next season. Well, what has happened is that the SARS-COV2 virus has adapted to us, as we have to it, and we are getting along pretty well at this point. Our adaptation to it began as a social (or antisocial!) response that was very effective in frustrating transmission. But of late, it has been more a matter of training our immune systems, which have an internal selective principle. Between rampant infections and the amazing vaccines, we have put up significant protective barriers to severe illness, though not, notably, to transmission.

But what about the virus? It has adapted in the most classic of ways, by experiencing a wide variety of mutations that address its own problems of survival. It is important to remember that this virus originated in some other species (like a bat) and was not very well adapted to humans. Bats apparently have countless viruses of this kind that don't do them much harm. Similarly, HIV originated in chimpanzee viruses that didn't do them much harm either. Viruses are not inherently interested in killing us. No, they survive and transmit best if they keep us walking around, happily breathing on other people, with maybe an occasional sneeze. The ultimate goal of every virus is to stay under the radar, not causing its host to either isolate or die. (I can note parenthetically that viruses that do not hew to this paradigm, like smallpox, are typically less able to mutate, thus less adaptable, or have some other rationale for transmission than upper respiratory spread.)

And that is clearly what has happened with SARS-COV2. Local case rates in my area are quite high, and wastewater surveillance indicates even higher prevalence. Isolation and mask mandates are history. Yet hospitalizations remain very low, with no one in the ICU right now. Something wonderful has happened. Part of it is our very high local vaccination rate (96% of the population), but another part is that the virus has become less virulent as it has adapted to our physiology, immune systems, media environment and social practices, on its way to becoming endemic, and increasingly innocuous. All this in a couple of years of world-wide spread, after billions of infections and transmissions.

The succession (i.e. evolution) of variants detected in my county

The trend of local wastewater virus detection, which currently shows quite high levels, despite mild health outcomes.

So what has the virus been doing? While it has many genes and interactions with our physiology, the major focus has been on the spike protein, which is most prominent on the viral surface, is the first protein to dock to specific human proteins (the ACE2 cell surface receptor), and is the target of all the mRNA and other specific subunit vaccines. (As distinct from the killed virus vaccines that are made from whole viruses.) It is the target of 40% of the antibodies we naturally make against the whole virus, if we are infected. It is also, not surprisingly, the most heavily mutated portion of the virus, over the last couple of years of evolution. One paper counts 45 mutations in the spike protein that have risen to the level of "variants of concern" at WHO. 

"We found that most of the SARS-COV-2 genes are undergoing negative purifying selection, while the spike protein gene (S-gene) is undergoing rapid positive selection."


Structure of the spike protein, in its normal virus surface conformation, (B, C), and in its post-triggering extended conformation that reaches down into the target cell's membrane, and later pulls the two together. Top (in B, C) is where it binds to the ACE2 target on respiratory cells, and bottom is its anchor in the viral membrane coat (D shows it upside-down). At top (A) is the overall domain structure of the protein, in its linear form as synthesized, especially the RBD (receptor binding domain) and the two protease cleavage sites that prepare it for eventual triggering.


The spike protein is a machine, not just a blob. As shown in this video, it starts as a pyramidal blob flexibly tethered to the viral surface. Binding the ACE2 proteins in our respiratory tracts triggers a dramatic re-organization whereby this blob turns into a thin rope, which drops into the target cell. Meanwhile, the portion stuck to the virus unfolds as well and turns into threads that wind back around the newly formed rope, thereby pulling the virus and the target cell membrane together and ultimately fusing them. This is, mechanistically, how the virus gets inside our cells.

The triggering of the spike protein is a sensitive and adjustable process. In related viruses, the triggering is more difficult, and waits till the virus is engulfed in a vesicle that is taken into the cell, and acidified in the normal process of lysosomal destruction / ingestion of outside materials. The acidification triggers these viral spike proteins to fire and release the virus into the cell. Triggering also requires cleavage of the spike protein with proteases that cut it at two locations. Other related viruses sometimes wait for a target host protease to do the honors, but SARS-COV2 spike protein apparently is mostly cleaved during production by its originating host. This raises the stakes, since it can then more readily trigger, by accident, or once it finds proper ACE2 receptors on a target host. One theme of recent SARS-COV2 evolution is that triggering has become slightly easier, allowing the virus to infect higher up in the respiratory system. The original strains set up infections deep in the lung, but recent variants infect higher up, which lessens the systemic risks of infection to the host, promotes transmissibility, and speeds the infection and transmission process.

The mutations G339D, N440K, L452R, S477N, T478K, and E484K in the spike region that binds to ACE2 (RBD, or receptor binding domain) promote this interaction, raising transmissibility. (The nomenclature is that the number gives the position of the amino acid in the linear protein sequence, and the letters, in one-letter code, give the original amino acid (first letter) and the mutated amino acid (last letter).) Overall, mutations of the spike protein have increased the net charge on the spike protein significantly in the positive direction, which encourages binding to the negatively charged ACE2 protein. D614G is not in this region, but is nearby and seems to have similar effects, stabilizing the protein. The P681 mutation in one of the cleaved regions promotes proteolysis by the enzyme furin, thus making the virus more trigger-able.
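The naming scheme is simple enough to parse mechanically, and doing so gives a rough check on the charge argument. The sketch below is an illustration only: the helper names are made up here, and the charge table is the usual textbook simplification (lysine and arginine +1, aspartate and glutamate -1, everything else neutral at physiological pH), ignoring histidine and the local protein environment.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    original: str   # one-letter code of the original amino acid
    position: int   # position in the linear protein sequence
    mutated: str    # one-letter code of the replacement amino acid


# Letter, digits, letter: e.g. "E484K". Labels lacking the final
# letter (such as "P681") are deliberately rejected, not guessed at.
_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")

# Simplified side-chain charges near neutral pH (an assumption).
SIDE_CHAIN_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def parse_substitution(label: str) -> Substitution:
    """Split a label like 'D614G' into (original, position, mutated)."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a complete substitution label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def charge_change(sub: Substitution) -> int:
    """Net change in formal charge caused by one substitution."""
    before = SIDE_CHAIN_CHARGE.get(sub.original, 0)
    after = SIDE_CHAIN_CHARGE.get(sub.mutated, 0)
    return after - before


if __name__ == "__main__":
    rbd_labels = ["G339D", "N440K", "L452R", "S477N", "T478K", "E484K"]
    subs = [parse_substitution(label) for label in rbd_labels]
    for sub in subs:
        print(sub, charge_change(sub))
    print("net change:", sum(charge_change(s) for s in subs))
```

On this crude accounting, the six receptor binding domain mutations listed above sum to a net gain of positive charge, even though one of them (G339D) goes the other way.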

What are some other constraints on the spike protein? It needs to evade our vaccines and natural immunity, but has seemingly adapted to a here-and-gone infection style, though with periodic re-infection, like other colds. So any change is good for the purpose of camouflage, as long as its essential functions remain intact. The N-terminal, or front, domain of the spike protein, which is not involved directly in ACE2 binding, has experienced a series of mutations of this kind. An additional function it seems to have is to mimic a receptor for the cytokine interleukin 8, which attracts neutrophils and encourages activation of macrophages. Such mimicry may reduce this immune reaction, locally. 

In comparison to all these transmissibility-enhancing mutations, it is not clear yet where the mutations that decrease virulence are located. It is likely that they are more widely distributed, not in the gene encoding the spike protein. SARS-COV2 has a remarkable number of genes with various interactions with our immune systems, so the scope for tuning is prodigious. If all this can be accomplished in a couple of years, imagine what a million, or a billion, years can do for other organisms that, while they have slower reproduction cycles and more complicated networks of internal and external relations, still obey that great directive to adapt to their circumstances.


  • Late link, on receptor binding vs immune evasion tradeoffs.
  • Yes, chimpanzees can talk.
  • The rich are getting serious about destroying democracy.
  • Forced arbitration is, generally, unconscionable and should be illegal.
  • We could get by with fewer nuclear weapons.
  • Originalism would never allow automatic or semiautomatic weapons.

Saturday, May 14, 2022

Tangling With the Network

Molecular biology needs better modeling.

Molecular biologists think in cartoons. It takes a great deal of work to establish the simplest points, like that two identifiable proteins interact with each other, or that one phosphorylates the other, which has some sort of activating effect. So biologists have been satisfied to achieve such critical identifications, and move on to other parts of the network. With 20,000 genes in humans, expressed in hundreds of cell types, regulated states and disease settings, work at this level has plenty of scope to fill years of research.

But the last few decades have brought larger scale experimentation, such as chips that can determine the levels of all proteins or mRNAs in a tissue, or the sequences of all the mRNAs expressed in a cell. And more importantly, the recognition has grown that any scientific field that claims to understand its topic needs to be able to model it, in comprehensive detail. We are not at that point in molecular biology, at all. Our experiments, even those done at large scale and with the latest technology, are in essence qualitative, not quantitative. They are also crudely interventionistic, maybe knocking out a gene entirely to see what happens in response. For a system as densely networked as the eukaryotic cell, it will take a lot more to understand and model it.

One might imagine that this is a highly detailed model of cellular responses to outside stimuli. But it is not. Some of the connections are much less important than others. Some may take hours to have the indicated effect, while others happen within seconds or less. Some labels hide vast sub-systems with their own dynamics. Important items may still be missing, or assumed into the background. Some connections may be contingent on (or even reversed by) other conditions that are not shown. This kind of cartoon is merely a suggestive gloss and far from a usable computational (or true) model of how a biological regulatory system works.


The field of biological modeling has grown communities interested in detailed modeling of metabolic networks, up to whole cells. But these remain niche activities, mostly because of a lack of data. Experiments remain steadfastly qualitative, given the difficulty of performing them at all, and the vagaries of the subjects being interrogated. So we end up with cartoons, which lack not only quantitative detail on the relative levels of each molecule, but also critical dynamics of how each relationship develops in time, whether in a time scale of seconds or milliseconds, as might be possible for phosphorylation cascades (which enable our vision, for example), or a time scale of minutes, hours, or days- the scale of changes in gene expression and longer-term developmental changes in cell fate.

These time and abundance variables are naturally critical to developing dynamic and accurate models of cellular activities. But how to get them? One approach is to work with simple systems- perhaps a bacterial cell rather than a human cell, or a stripped down minimal bacterial cell rather than the E. coli standard, or a modular metabolic sub-network. Many groups have labored for years to nail down all the parameters of such systems, work which remains only partially successful at the organismal scale.

Another approach is to assume that co-expressed genes are yoked together in expression modules, or regulated by the same upstream circuitry. This is one of the earliest forms of analysis for large scale experiments, but it ignores all the complexity of the network being observed, indeed hardly counts as modeling at all. All the activated genes are lumped together into one side, and all the down-regulated genes on the other side, perhaps filtered by biggest effect. The resulting collections are clustered by some annotation of those genes' functions, thereby helping the user infer what general cell function was being regulated in her experiment / perturbation. This could be regarded perhaps as the first step on a long road from correlation analysis of gene activities to a true modeling analysis that operates with awareness of how individual genes and their products interact throughout a network.
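
One common way to do that annotation clustering is a hypergeometric over-representation test: given a list of activated genes, ask whether a particular annotation turns up in it more often than chance would predict. The sketch below is a generic illustration of that test, not the method of any particular tool, and the function name and example numbers are made up.

```python
from math import comb


def enrichment_p_value(genome_size, annotated, selected, overlap):
    """Probability of seeing at least `overlap` annotated genes when
    `selected` genes are drawn at random from `genome_size` genes,
    of which `annotated` carry the annotation (hypergeometric tail)."""
    upper = min(annotated, selected)
    total = comb(genome_size, selected)
    hits = sum(
        comb(annotated, k) * comb(genome_size - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total


# Hypothetical numbers: 20,000 genes, 200 annotated with some function,
# 100 genes activated in the experiment, 10 of which carry the annotation.
print(enrichment_p_value(20000, 200, 100, 10))
```

A small p-value only says that the annotation is over-represented among the activated genes. It says nothing about which gene regulates which, which is why this kind of analysis hardly counts as modeling.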

Another approach is to resort to a lot of fudge factors, while attempting to make a detailed model of the cell / components. Assume a stable network, and fill in all the values that could get you there, given the initial cartoon version of molecule interactions. Simple models thus become heuristic tools to hunt for missing factors that affect the system, which are then progressively filled in, hopefully by doing new experiments. Such factors could be new components, or could be unsuspected dynamics or unknown parameters of those already known. This is, incidentally, of intense interest to drug makers, whose drugs are intended to tweak just the right part of the system in order to send it to a new state- say, from cancerous back to normal, well-behaved quiescence.

A recent paper offered a version of this approach, modular response analysis (MRA). The authors use perturbation data from other labs, such as the inhibition of 1000 different genes in separately assayed cells, combined with a tentative model of the components of the network, and then deploy mathematical techniques to infer / model the dynamics of how that cellular system works in the normal case. What is observed in either case- the perturbed version, or the wild-type version- is typically a system (cell) at steady state, especially if the perturbation is something like knocking out a gene or stably expressing an inhibitor of its mRNA message. Thus, figuring out the (hidden) dynamic in between- how one stable state gets to another one after a discrete change in one or more components- is the object of this quest. Molecular biologists and geneticists have been doing this kind of thing off-the-cuff forever (with mutations, for instance, or drugs). But now we have technologies (like siRNA silencing) to do this at large scale, altering many components at will and reading off the results.

This paper extends one of the relevant mathematical methods (modular response analysis, MRA) to this large scale, and finds that, with a bit of extra data and some simplifications, it is competitive with other methods (mutual information) in creating dynamic models of cellular activities, at the scale of a thousand components, which is apparently unprecedented. At the heart of MRA are, as its name implies, modules, which break down the problem into manageable portions and allow variable amounts of detail / resolution. For their interaction model, they use a database of protein interactions, which is a reasonably comprehensive, though simplistic, place to start.
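
To give a flavor of the arithmetic, here is a toy sketch of the classical MRA relation, in which the direct (local) influences r between modules are recovered from the measured steady-state (global) responses R to perturbations as r = -[diag(R^-1)]^-1 R^-1. This is the textbook small-network formulation, assuming each perturbation hits exactly one module directly. It is not the large-scale variant this paper develops, and the two-node network and function names are invented for illustration.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination
    with partial pivoting (pure Python, fine for toy sizes)."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def local_responses(global_responses):
    """Classical MRA: r = -[diag(R^-1)]^-1 R^-1, so every r[i][i] is -1."""
    r_inv = invert(global_responses)
    n = len(r_inv)
    return [[-r_inv[i][j] / r_inv[i][i] for j in range(n)] for i in range(n)]


# Made-up two-module network: module 2 activates module 1 (0.5),
# module 1 inhibits module 2 (-0.3).
r_true = [[-1.0, 0.5], [-0.3, -1.0]]
# Simulated global responses consistent with that network.
R = [[-v for v in row] for row in invert(r_true)]
print(local_responses(R))  # recovers r_true up to rounding
```

The matrix inversion is what makes scale a problem: noise in a thousand-by-thousand response matrix gets amplified, which is one reason modularization and prior knowledge of who interacts with whom become necessary.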

What they find is that they can assemble an effective system that handles both real and simulated data, creating quantitative networks from their inputs of gene expression changes upon inhibition of large numbers of individual components, plus a basic database of protein relationships. And they can do so at reasonable scale, though that is dependent on the ability to modularize the interaction network, which is dangerous, as it may ignore important interactions. As a state of the art molecular biology inference system, it is hardly at the point of whole cell modeling, but is definitely a few steps ahead of the cartoons we typically work with.

The authors offer this as one result of their labors. Grey nodes are proteins, colored lines (edges) are activating or inhibiting interactions. Compared to the drawing above, it is decidedly more quantitative, with strengths of interactions shown. But timing remains a mystery, as do many other details, such as the mechanisms of the interactions.


  • Fiscal contraction + interest rate increase + trade deficit = recession.
  • The lies come back to roost.
  • Status of carbon removal.
  • A few notes on stuttering.
  • A pious person, on shades of abortion.
  • Discussion on the rise of China.

Saturday, February 19, 2022

DNA Mambo in the Nucleus

Some organizational principles for nuclear DNA to organize genes for local regulation.

There has been a long and productive line of research on the mechanisms of transcription from DNA to RNA- the process that reads the genome and translates its code into a running stream of instructions going out to the cell through development and all through life. This search has generally gone from the core of the process outwards to its regulatory apparatus. The opening of DNA by simple RNA polymerases was one of the first topics of study, followed by how the polymerase is positioned at the start site by "promoter" DNA sequences, with ever more ornate and distant surrounding machinery coming under scrutiny over time, as researchers climbed the evolutionary trajectory of life, from viruses and bacteria to mammals. 

But how this process fits into the larger structure of the nucleus, and how it is globally organized in eukaryotes, has long been an intriguing question, and tools are finally available to bring this level of organization into focus. For example, genes are known to be activated by direct contact with "enhancer" elements located thousands, even many tens of thousands, of basepairs away on the DNA- so why can't those enhancers activate other genes elsewhere in the nucleus, rather than the genes they are nearest to on the one-dimensional DNA? The nucleus is a small place with a lot of DNA. Roughly 1/100 of its physical space is taken up by DNA, and it is highly likely that such enhancers could be closer in 3-D space to other genes than the ones they are supposed to regulate, if everything were arranged randomly. Similarly, how do such enhancer elements find their proper targets, amid the welter of other DNA and proteins? A hundred thousand base pairs is long enough to traverse the entire nucleus.

So there has to be some organization, and new techniques have come along to illuminate it. These are crosslinking methods where the cells are treated with a chemical to crosslink / freeze a fraction of protein and DNA interactions in place, then enzymes are introduced to chop everything up, to various degrees of completeness. What is left are little clumps of DNA and protein that hopefully include distant cross-links, between enhancers and promoters, between key organizational sites and the genes they interact with, etc. Then comes the sequencing magic. These clumped stray DNAs are diluted and ligated together (only to local ends), amplified and sequenced, generating a slew of DNA sequences. Those hybrid sequences can be interpreted, (given the known sequence of the reference genome), to say whether some genomic location X got tangled up with some other location Y, reflecting their 3-D interaction in the cell when it was originally treated.
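
The interpretation step can be sketched in a few lines of Python. Assume each hybrid sequence has already been mapped to the reference genome, yielding a pair of coordinates (X, Y) on one chromosome. Binning those pairs gives the contact matrix that the pyramid plots below are drawn from. The function name, bin size, and coordinates here are hypothetical, and real pipelines add mapping, filtering, and normalization steps that are omitted.

```python
def contact_matrix(pairs, bin_size, n_bins):
    """Count ligation pairs per (bin, bin) cell of a symmetric matrix."""
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_x, pos_y in pairs:
        i, j = pos_x // bin_size, pos_y // bin_size
        if i >= n_bins or j >= n_bins:
            continue  # pair falls outside the region of interest
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # contacts have no direction
    return matrix


# Hypothetical mapped pairs (base-pair coordinates on one chromosome).
pairs = [(100, 2500), (150, 180), (2600, 90), (1200, 1900)]
for row in contact_matrix(pairs, bin_size=1000, n_bins=3):
    print(row)
```

Counts on the diagonal are local contacts, which generally dominate. The interesting biology is in the off-diagonal cells that are fuller than distance alone would predict.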

A recent paper pushed this method forward a bit, with finer-grained enzymatic digestion and deeper sequencing, to come up with the most detailed look ever at a drosophila genome, and at some particular genes that have long held interest as key regulators of development. This refined detail, plus some experiments mutating some of the key DNA sites involved, allowed them to come up with a new class of organizing elements and a theory of how the nuclear tangle works.

Long range contacts in the Antennapedia locus of flies. Micro-C refers to the crosslinking and sequencing method that maps long-range DNA contacts mediated by proteins. Pyramids in the top diagram map binary location-to-location contacts. Local contacts generally predominate over distant ones, but a few distant connections are visible, such as between the ends of the ftz gene. TAD stands for topologically associating domain, mapping out the connections seen above between pink sites. This line also lists the genes residing in each zone (Deformed, micro RNA 10, Sex combs reduced, fushi tarazu, and Antennapedia promoters P1 and P2). The contacts track shows where the authors map specific sites where organizing factors (including Trl (trithorax-like) and CP190 (centrosomal protein of 190 kDa)) bind. The overall idea is that there are two kinds of contacts, boundaries and tethers. Boundaries insulate one region from the next, preventing regulatory spill-over to the wrong gene. Tethers serve as pro-regulatory staging points, helping enhancers contact their proper promoter targets, even though the tether complex does not itself promote RNA transcription.

Insulator elements have been recognized for some time. These are locations that seem to block regulatory interactions across them, thus defining, between two such sites, a topologically associating domain (TAD). How they work is not entirely clear, but they may stitch themselves to the nuclear membrane. They are thought to interact with a DNA pump called cohesin to extrude a loop of DNA between two insulator sites, thereby keeping that DNA clear of other interactions, at least temporarily, and locally clumped. The authors claim to find a new element called a distal tethering element (DTE), which works like an enhancer in promoting interaction between distant activating regulatory sites and genes, but doesn't actually activate. DTEs just structure the region so that when a signal comes, the gene is ready to be activated efficiently. 

One theory of how insulator elements work. The insulator sites "CTCF motif" are marked on the DNA with dark blue arrow heads. They control the boundaries of action by the protein complex cohesin, which forms dimeric doughnuts around DNA and can pump DNA. Cohesins are central to the mechanisms of meiosis and mitosis. The net effect is to produce a segregated region of DNA as portrayed at the bottom, which should have a much higher rate of local interactions (as seen in the Micro-C method) than distant interactions.

At the largest scale, these authors claim that there are, in the whole fly genome and at this particular (early) point in development, 2034 insulator locations (TADs) and 620 tethering elements (TEs or DTEs). They show that DTEs in the locus they study closely play an active role in turning the nearby genes on at early times in development, and in directing activation from enhancers near the DTE, rather than ones farther away. What binds to the DTEs? So-called "pioneer" regulatory factors (such as Zelda) that have the power to make their way through nucleosomes and other chromatin proteins to bind their target DNA. The authors say that these tether sites, once set up, are then stable on a permanent basis, through all developmental stages, even though the genes they assist may only be active transiently. 

The "poised" nature of some genes had been observed long ago, so it is not entirely surprising to see this mechanism get fleshed out a little, as a structural connection that is made between genes and their regulatory sites in advance of the actual activator proteins arriving at the associated enhancers and turning them on.

 

Final model: the normal case around the Antennapedia locus is shown at top, with insulator sites shown in pink, and tethering sites shown in teal. If one of the tethering elements is removed (middle), then the enhancer EE has less effect on the gene Scr, whose expression is reduced. If an insulator is removed (bottom), the re-organized domain allows the ftz gene's regulators, including the enhancer AE1, to affect Scr expression, altering its timing and location of expression.


  • Don't hold your breath for capitalism to address climate change.
  • How the Russian skating machine works.
  • Russia, solved.
  • Solar tax for all! Or at least a separation of grid costs and electricity generation costs.

Saturday, January 8, 2022

Desperately Seeking Calcium

How cells regulate internal calcium levels.

Now that we are getting a crash course in molecular biology and evolution courtesy of the pandemic, many will be familiar with the intricate and dynamic activities of some proteins. The SARS spike protein doesn't just dock at a particular receptor on our pulmonary epithelial surfaces, but goes through a gymnastic routine to facilitate membrane fusion as well. Many other proteins have dynamic behaviors as well- something that was not fully appreciated back when structural biology was in its infancy and knowing anything about the structure of a protein or DNA or RNA required it to be locked into crystalline form for X-ray diffraction studies.

Another example came up recently, involving calcium regulation within cells. Calcium is a hugely important ion and regulator, central to core signaling cascades in all eukaryotic cells- to neuronal function, and to muscle activation, among many other roles. Our blood levels of calcium are tightly regulated, (to within a 20% range), mostly by way of an axis of parathyroid hormone between the parathyroid gland and the kidney, with additional effects from factors such as vitamin D, calcitonin, and estrogen. So our cells can rely on having a constant level of calcium on the outside. How do they maintain their levels internally?

One way is to have a large store socked away, as we have in bones for the body generally. Within cells, the endoplasmic reticulum (ER) turns out to have far higher concentrations of calcium than the rest of the cytosol, up to 10,000 fold. In muscle cells, the ER gets a special name- as the sarcoplasmic reticulum. Many calcium regulatory events rely on calcium being released briefly from the ER, having some effect, and then gradually getting pumped back in. But what if the ER is short of calcium? That would be a crisis!  

It turns out that we have a sensor system for that, linking an ER protein called STIM1, which senses levels of Ca++ in the ER, with a plasma membrane channel called ORAI1, which can open to let in Ca++ from the outside. A recent paper, (review), in combination with much other past work, demonstrates how STIM1 works. The two proteins turn out to interact directly, thanks to the fact that the ER, which is a huge organelle that extends all over the cell, always has some spots that interact with the plasma membrane, called membrane contact sites. These are structured by other proteins, so there is a set distance between the two membranes, which must never fuse together. This means that while STIM1 can get very close to ORAI1 in the plasma membrane, there will still be a gap between them. How to bridge it?

Overall model for how STIM1 works. The luminal side sticks into the ER and binds calcium (red dots). If levels are low, the protein dimerizes at the transmembrane and internal domains, causing extensive refolding of the external domains residing in the cytosol. This causes them to straighten out and span the space of the contact structure between the ER and the plasma membrane, where it activates the ORAI1 calcium channel protein by direct contact.


The STIM1 protein turns out to provide the bridge, in the form of a transformer-style mechanism that shifts it from a compact blob on the ER when calcium levels are high, to an extended rod that pokes into ORAI1, activating it, when calcium levels are low. Since it is the ER-internal level of calcium that needs to be sensed, it is the ER-internal (or luminal) portion of the STIM1 that does this sensing. It has about five calcium binding sites that, if filled, prevent its dimerization, but which if empty, promote it. Internal dimerization induces a dramatic refolding of the cytoplasmic portion of STIM1 into the active, extended rod. 

These authors were faced with a situation where the full STIM1 protein was apparently impossible to crystallize, so no full structure was available. Worse, some of the prior structural studies of fragments of STIM1 conflicted with each other. So they turned to a very clever method to probe structural dimensions point by point, called fluorescence (or Förster) resonance energy transfer, (FRET). If by mutation or chemical modification one installs fluorescent molecules on a protein of interest, indeed installs two different ones, one of whose absorption spectrum overlaps with the emission spectrum of the other, one can measure quantitatively the distance between them.
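
The distance measurement rests on the standard Förster relation, in which transfer efficiency falls off with the sixth power of distance: E = 1 / (1 + (r/R0)^6), where R0 is the distance at which transfer is 50% efficient. A minimal sketch follows. The 5 nm value of R0 is a placeholder assumption (it depends on the particular dye pair), and the function names are illustrative.

```python
def fret_efficiency(distance_nm, forster_radius_nm):
    """Transfer efficiency E = 1 / (1 + (r / R0)**6)."""
    if distance_nm < 0 or forster_radius_nm <= 0:
        raise ValueError("distance must be >= 0 and R0 must be > 0")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


def distance_from_efficiency(efficiency, forster_radius_nm):
    """Invert the relation: r = R0 * (1/E - 1)**(1/6)."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return forster_radius_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


R0 = 5.0  # nm, placeholder value for a hypothetical dye pair
for r in (2.5, 5.0, 7.5, 10.0):
    print(r, round(fret_efficiency(r, R0), 3))
```

The steep sixth-power dependence is what makes FRET a useful molecular ruler: efficiency swings from nearly 1 to nearly 0 over a few nanometers around R0, which is about the scale of protein domains.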

How the FRET fluorescence method works. Different fluorophores are placed on the protein of interest, here the EFSAM luminal domain of STIM1. The absorption spectrum of one (acceptor) overlaps the emission spectrum of the other fluorophore (donor). In the first graph, the green graph shows that when the two are combined on the same molecule, emission from the acceptor goes up dramatically, due to its proximity-dependent absorbance of emissions from the donor fluorophore. The second graph shows how this protein responds to calcium, by increasing interaction (absorbance-emission intensity at 620 nm, reflecting the physical distance between the fluorescence probes) as Ca++ concentration goes down.
 

By placing fluorescence probe pairs all over the external regions of STIM1, these authors were able to definitively refute one of the prior structural models, and then outline the probable sequence of events by which STIM1 opens up into its active form. The image above ably summarizes their model, by which the ORAI1-interacting domain (CC2/CC3) is stored upside-down and inside out in the inactive conformation. It is quite a proposal, all carried out by domains which are alpha helices hinged at strategic locations and obviously highly sensitive to slight changes in the structure, induced by the dimerization outlined above, in low calcium conditions.

Finally, they investigated a mutation which in humans causes Stormorken syndrome, a wide-ranging set of deficiencies including bleeding, dyslexia, muscle weakness, and hypocalcemia. In molecular terms it is a "gain of function" mutation. It weakens the interactions that keep STIM1 closed during high calcium conditions, so promotes its stimulation of ORAI1 and excess calcium uptake by cells all over the body. The mutation changes arginine at position 304 in STIM1 to tryptophan, which has much different characteristics. It is genetically dominant, meaning that a single allele, combined with a wild-type allele on the other chromosome, gives the syndrome. Thus it is a powerful mutation, tweaking the sensitivity of this system just enough to screw up a lot of physiology. Deletions of this gene are not lethal, however, in part because there is also a STIM2 gene that encodes a similar function.

Analysis of the effect of the Stormorken mutation (R304W) on the physical proximities and overall shape of the STIM1 protein. The FRET graphs track different probe pairs that were placed all over the cytosolic (folding) portion of STIM1. In these graphs, degree of FRET relative frequency shift/communication is on the X axis, while photon counts are on the Y axis. They show noticeable shifts in distances, reflected in the structural model. The mutation significantly loosens up the high-calcium folded state, inducing more Ca++  influx when it is not needed.

So, we are just full of little machines, developed and refined over the billions of years in the ongoing race to live a little better, keep things humming, and to defend ourselves against all the other machines, such as parasitic viruses.


Saturday, December 18, 2021

The RNAs Shall Protect Us

The humble skin mole has at least one oncogenic mutation. But it is not cancer- why not?

We know that mutations cause cancer. But we also know that it takes multiple mutations, not just one, in virtually all cases. This is one reason why age is such a strong risk factor, providing the time to accumulate multiple "hits". One place where this is particularly apparent is the skin. Most people have moles (nevi) and other imperfections, which are no cause for alarm. We are also on the lookout for the unusual signs and forms that indicate melanoma- which truly is a cause for alarm. Moles typically have one of the key oncogenic mutations for melanoma, however: BRAF V600E (which means the 600th amino acid in its protein chain has been changed from valine to glutamic acid). So what is behind the difference? What systems do cells and organs have to keep this train on the tracks, despite a wheel or two coming off?

A recent paper (review) explored this issue, and tells a complicated technical and scientific story. But the bottom line is that certain miRNAs- a novel form of gene regulator discovered just in the last couple of decades- form a firewall against further proliferation. The BRAF mutation is an activating change, which disrupts the normal "off" state of this protein kinase. BRAF is a protein kinase that attaches phosphate groups to serines and threonines on other proteins. And some of those other proteins are specifically other (MAP) protein kinases that form cascades promoting cell proliferation and differentiation. In the case of melanocytes in the skin, the BRAF mutation promotes just that: proliferation, mole formation, and, in some cases, progression to full blown melanoma. 

What is a skin mole? Well, it clearly is composed of lots of cells, so whatever is arresting the mutant BRAF-activated proliferation is taking its sweet time. Proliferation goes for a while, but then stops for an unknown reason. It had been thought in the field (and by these researchers as well) that mole cells had gone into senescence- an irreversible division arrest that is frequently activated in cancer cells and is similar to age-dependent cell cycle arrest. But they show now that senescence is not the explanation. If the BRAF mutation state is reversed, the cells resume dividing. And they also have other hallmarks of a different form of (G2/M) cell division arrest. So something more dynamic is going on.

They do a few technical tours de force of modern DNA sequencing and large-scale molecular biology to find what unusual genes are being expressed in these cells, and find two:  MIR211-5p and MIR328-3p. These are miRNAs, which are short RNA pieces that repress the expression of other genes. We have thousands of them, and each can repress hundreds of other genes, forming a somewhat crazy interdigitated regulatory network. They evolved from an immune function of repressing the expression of viruses and other foreign DNA, but have been repurposed to have broad regulatory effects, often in development and disease.

In BRAF-activated skin mole cells, these miRNAs have one effective target, which is AURKB (Aurora B kinase), another protein kinase that is needed for cell division. No AURKB, no cell division. Indeed, skin mole cells have a high rate of cells stuck in the last phase of cell division, with 4 genome equivalents. They found that AURKB has low expression in skin mole cells, but high expression, as expected, in melanoma cells, while the miRNAs had the reverse pattern. And tellingly, artificial inhibition of these miRNAs released mole cells from their proliferation arrest and allowed the BRAF mutation to have its way with them.

Model of this paper's findings about melanocytes. Starting with stem-like melanocytes, mutated BRAF can cause oncogenic or pre-oncogenic proliferation. Separately, TPA, or some local tissue factor like TPA, can encourage stem melanocytes to grow and differentiate properly into mature melanocytes. But those same activators (TPA and its natural analog) increase miRNA expression, particularly of MIR211-5p, which (by inhibiting AURKB) arrests growth as part of the differentiation program, and also shuts down proliferation caused by mutated BRAF, (at late mitosis / G2 arrest), at least most of the time.

But there was still a problem- what activates the miRNA gene expression in the natural setting? It isn't the mutated BRAF protein, since it routinely drives cells through several replication cycles to form moles, and didn't have any regulatory effect on the miRNAs. The researchers focused on the kinds of local secreted hormones, like endothelin, that might locally inhibit overgrowth of cells, and logically lead to a mole-like pattern. What they hit on was TPA, an artificial analog of diacylglycerol, which is an activator of yet another protein kinase, PKC. TPA is paradoxically a tumor promoter, and is routinely used in cell culture systems to goose the proliferation of melanocytes. But for the mutated BRAF- driven cells from moles, TPA arrests their growth, and it does so because PKC activates the expression of MIR211-5p. They showed that taking TPA out of their cell culture mixes dramatically restarted the growth of mole-derived and other BRAF mutation-driven cells. So this closes the circle in some degree, explaining how it is that skin moles form as sort of arrested mini-cancers.

Unfortunately, TPA is not a natural chemical, and diacylglycerol is not a hormone, though many hormones, such as thyroid hormone and oxytocin, do affect PKC activity. So the natural PKC and miRNA activator, and inhibitor of excess proliferation in these BRAF mutation-driven melanocytes, remains unknown. I am sure that this research group will be hunting diligently for it, since it is an extremely interesting issue not just in oncology, but in skin and tissue development generally.


Saturday, December 4, 2021

Supergroups in Search of Their Roots

The early stages of eukaryotic evolution are proving hard to reconstruct.

There is normal evolution, and then there are great evolutionary transitions. Not to say that the latter don't obey the principles of normal evolution, but they go by so fast, and render so many transitional forms obsolete along the way, that there is little record left of what happened. Among those great transitions are the origin of life itself, the origin of humans, and the origin of eukaryotes. We are slowly piecing together human evolution, from the exceedingly rare fossils of intermediate forms and branch off-shoots. But looking at the current world, we are the lone hominin, having displaced or killed off all competitors and predecessors to stand alone atop the lineage of primates, and over the biosphere generally. Human evolution didn't violate any natural laws, but it seems to have operated under uniquely directional selection, especially for intelligence and social sophistication, which led to a sort of arms race of rapid evolution that laid the groundwork for an exponential rate in the invention of technologies and collective social forms over the last million years.

Similarly, it is clear that however the origin of life started out, it was a very humble affair, with each innovation quickly displacing its progenitors, just as the early cell phones came out in quick succession, until a technological plateau was reached from which further development was / is less obvious. While the origin and success of eukaryotes did not erase the prokaryotic kingdoms from which they sprang, it does seem to have erased the early stages of its own development, to the point that those stages are very hard to reconstruct, especially given the revolutionary and multifarious nature of their innovations.

Eukaryotes differ from prokaryotes in possessing: nuclei and a nuclear membrane with specialized pores; mitochondria descended from a separate bacterial ancestor (and photosynthetic plastids descended from yet other bacterial ancestors in some cases); sex and meiosis; greater size by several orders of magnitude; phagocytosis by amoeboid cells; internal membrane organelles like Golgi, peroxisomes, lysosomes, endocytic and exocytic vesicles; cyclins that run the cell cycle; microtubules that participate in the cell cycle, cytoskeleton, and cilia; cilia, as distinct from flagella; an active actin-based cytoskeleton, with novel motor proteins; a greatly elaborated transcriptional apparatus with modular enhancers and novel classes of transcription regulators; histones; mRNA splicing and introns; nucleolus and small nucleolar RNAs; telomeres on linear chromosomes; a significant increment in the size of both ribosomal subunits. Indeed, the closer one looks at the molecular landscape, the more differences accumulate. This was quite simply a quantum leap in cellular organization, which happened sometime between 1.8 and 3 billion years ago. Indeed, eukaryotes are not just the McMansions of the microbial world, but the Downton Abbeys- with dutiful servants and complex and luxurious internal economies that prokaryotic cells couldn't conceive of.

Major lineages of eukaryotes are traced back to their origins in a molecular-based phylogeny. Animals (and fungi!) are in the Opisthokonta, plants in the Chloroplastida. So many groups connect right to the "root" of this tree that there is little way to figure out which came first. The dashed lines indicate uncertainty about those orderings/rootings as well, which leaves a great deal of early eukaryotic evolution obscure. Some abbreviations / links are- CRuMs: collodictyonids (syn. diphylleids) + rigifilida + mantamonas; excavates; hemimastigophora; haptista; TSAR: telonemids, stramenopiles, alveolates, and rhizaria.


A recent paper recounts the current phylogenetic state of affairs, and a variety of other papers over the last decade delve into the many questions surrounding eukaryotic origins. While molecular phylogenies have improved tremendously with the advent of faster, whole-genome sequencing and the continued collection of obscure single-celled eukaryotes (aka protists), the latest phylogeny, as shown above, remains inconclusive. The deepest root is uncertain both with regard to its bacterial progenitor, and with regard to which current eukaryotes bear the closest relation to it. There are occasional fossil kelps, algae, and other biochemical traces dating back 2.0 to 2.7 billion years (though some do not put the origin earlier than 1.8 billion years), but these have not been able to shed any light on the order of events either.

Nevertheless, the field can agree on a few ideas. One is that the assimilation of mitochondria (whether willing or unwilling) is perhaps the dominant event in the sequence. That doesn't mean it was necessarily the first event, but means that it created a variety of conditions that led to a cascade of other consequences and features. The energy mitochondria provided enabled large cell sizes and the accumulation of a whole new household full of junk, like lipids in several new membrane compartments. The genome that they contributed brought in thousands of new genes, as well as introns.

Secondly, the loss of cell walls and the adoption of amoeboid carnivory is likely one of the first events in the evolutionary sequence. Shedding the obligatory cell wall that all bacteria have necessitates a cytoskeleton of some kind, and it is also conducive to the engulfment of the proto-mitochondrion. For while complicated co-symbiotic metabolic arguments have been devised to explain why these two cells may have engaged in a long-term mutual relationship long before their ultimate consummation, the most convenient hypothesis for assimilation remains the simplest- that one engulfed the other, in a meal that lasted well over a billion years.

Thirdly, the question of what the progenitor cell was has been refined somewhat. One of the most intriguing findings of the last half-century of biology was the discovery of archaebacteria (also called archaea)- a whole new kingdom of bacteria characterized by their tendency to occupy extreme habitats, their clear separation from bacteria by chemical and genetic criteria, and also their close relationship to eukaryotes, especially what is presumed to be the original host genome. Many proposals have been made, (including that archaea are the original cell, preceding other bacteria), but the best one currently is that archaea split from the rest of bacteria rather late, after which eukaryotes split off from archaea, thus making the latter two sister groups. This explains the many common traits they share, while allowing significant divergence, plus the incorporation of many bacterial features into eukaryotes, either through the original lineage, or by later transfer from the proto-mitochondrion. So here at last is one lineage that survived out of the gradual development of eukaryotes- the archaea, though one wouldn't guess it from looking at them. It took analysis at the molecular level to even know that archaea existed, let alone that they are the last extant eukaryotic sister group.

A comically overstuffed figure from an argument for the late development of archaebacteria out of pre-existing bacteria (prokaryotes), with subsequent split and diversification of eukaryotes out of a proto-archaeal lineage. Many key molecular and physiological characters are mentioned.

Lastly, surveying the various outlying protist lineages for clues about which might hearken back to primitive eukaryotic forms, one research group suggests that the collodictyonids might fit the bill. Being an ancient lineage means that it is lonesome, without a large family of evolutionary development to show diversification and change. It also means that in molecular terms, it is highly distinct, branching deeply from all other groups. Whether that all means that it resembles an ancient / early form of the eukaryotic cell, or went its own way on a unique evolutionary trajectory, is difficult to say. For each trait (including sequence traits), a phylogenetic analysis is done to figure out whether it is differential- shared with some other lineages but not all- and if so, whether those without the trait lost it at some later point, or whether it was gained by a sub-group. After analyzing enough such traits, one can make a statement about the overall picture, and thus the "ancient-ness", of an organism.
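For the curious, the gain-versus-loss reasoning described above can be sketched in a few lines of Python. This is only a toy illustration using Fitch parsimony, a standard textbook method that I am substituting here; it is not necessarily what the research group used. The four lineages "A" through "D", the tree shape, and the trait values are all invented for the example.

```python
# Toy Fitch parsimony for one binary trait (1 = present, 0 = absent).
# The tree and trait values are hypothetical, chosen only to illustrate
# how a single loss can be told apart from a single gain.

def fitch(tree, tip_states):
    """Return (candidate states at this node, minimum changes below it).

    tree is either a tip name (str) or a tuple of two subtrees.
    tip_states maps each tip name to 0 or 1.
    """
    if isinstance(tree, str):
        return {tip_states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, tip_states)
    right_set, right_changes = fitch(right, tip_states)
    shared = left_set & right_set
    if shared:
        # The two branches agree on at least one state: no change needed here.
        return shared, left_changes + right_changes
    # The branches disagree: one gain or loss must have happened.
    return left_set | right_set, left_changes + right_changes + 1

# Hypothetical tree: D branches off first, then C, then A and B split.
toy_tree = ((("A", "B"), "C"), "D")

# Case 1: only C lacks the trait, so the root likely had it and C lost it.
lost_in_c = fitch(toy_tree, {"A": 1, "B": 1, "C": 0, "D": 1})

# Case 2: only A and B have the trait, so it was likely gained by that sub-group.
gained_in_ab = fitch(toy_tree, {"A": 1, "B": 1, "C": 0, "D": 0})

print(lost_in_c, gained_in_ab)
```

In both cases a single change explains the data, but the inferred root state differs: present in the first case (a later loss), absent in the second (a later gain). Real analyses use many traits, far larger trees, and usually probabilistic models rather than simple parsimony.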

Is anything special about Collodictyon? Not really. It is predatory, and has four flagella and a feeding groove, which functions as a sort of mouth. It can make pseudopods, has normal microtubule organizing centers for its flagella, and generally all the accoutrements of a eukaryotic cell. It lacks nothing, and thus may be an early-branching eukaryote, but is not in any way a transitional form.

An unassuming protist (Collodictyon) as possible representative of early eukaryotes. Its cilia are numbered.


At this point, we are left still peering darkly into the past, through obscure living protists and their molecular fossils, trying to figure out what happened when they split from the bacteria and archaea. A tremendous amount happened, but little record survives of the path along the way. That tends to be characteristic of the most momentous evolutionary events, which cause internal and external cataclysms (including the opening of whole new lifestyles to exploit) that necessitate a rapid dynamic of further adaptation before their descendants achieve a stable and successful state sufficient to ride out the ensuing billion or more years ... before we come on the scene with the ability and interest to contemplate what went before.


  • Red regions have three times the death rates from Covid as blue regions. Will that change electoral math?
  • Annals of secession, cont.
  • Sad spectacle at the court.
  • Analysis of how the energy transition might go. Again, a carbon tax would help.