Showing posts with label cell biology. Show all posts

Saturday, April 6, 2024

Mopping up Around the Cell

What happens when proteins can't find their partners?

Cells have a lot of garbage disposal issues. There are lysosomes to digest large things like viruses, proteasomes to dispose of individual proteins, and lots of surveillance mechanisms to check that things are going as they should- that proteins coming off the ribosome are complete, that mRNAs are being spliced, that mitochondria are charged up as they should be, that the endoplasmic reticulum is making, folding, and secreting proteins as it should be, among many others. One basic problem that arises when cells have a lot of proteins that assemble and cooperate in the form of complexes, is that some of those subunits may be present in excess, or not join their intended complexes for other reasons such as misfolding. This can have very bad effects. Most protein binding makes use of hydrophobic surfaces, and having these floating around freely can lead to indiscriminate binding / agglomeration, like amyloid plaque formation, and cell death.

Bacteria have one partial solution, which is to encode proteins that are destined for the same complex on the same mRNA, made from what is called an "operon" of genes, like a train with successive gene-carriages. Each multi-protein-encoding message from such an operon is thus necessarily equally abundant, and, assuming similar ribosomal rates of protein synthesis, the proteins should also be produced in equal quantities, providing at least one method to balance their abundance in the cell. But there are many other issues- proteins may have different life-spans, or different ribosomal production rates, or assembly into the complex may be slow and difficult, so bacteria still are not out of the woods. Eukaryotes do not use operons anyhow, so our more-finely regulated gene control mechanisms are called on to properly equalize (or adjust for) the ultimate subunit concentrations. 

But when all this fails, and there is more of some complex subunit than needed, what happens then? When experimenters over-produce some complex component in cells, it is typically short-lived. And if they impair its production, the rest of the complex tends to be short-lived. This implies mechanisms in the cell to dispose of incomplete complexes and their components. It turns out that there are some specific chaperone proteins that detect such orphan subunits, and tag them to be destroyed. Several prominent complexes, such as ribosomes and proteasomes, even have specifically dedicated mop-up chaperones. A recent paper described a chaperone protein dedicated to mopping up the excess or misfolded subunits of another large and abundant complex - the chaperonin complex. That makes this protein, ZNRD2, a sort of metachaperone.

Some structural (though not dynamic) views of the CCT complex. A shows top and side views, respectively. C shows a layout of how the equator of the complex looks, as coded by each of the subunits. At the ring-ring interfaces are the ATP binding sites (d). And lastly (e), a cut-away view of the inside shows where substrate proteins are enclosed and encouraged to fold correctly.

The chaperonin complex (also called CCT) is a large, hollow sphere that actively helps other proteins to fold correctly. The structural proteins actin and tubulin are the most prominent targets that need this help. When first synthesized, they are bound by adapters that ferry them to the chaperonin complex, which lifts its lid to allow the protein in. Then, ATP is used to induce dramatic cycling of the chaperonin structure, shifting from an internal hydrophobic structure to a more hydrophilic one. This allows the unfolded protein to alternately splay open over the hydrophobic surface, and then fold in piece-wise fashion, for as long as it takes till the barrel detects that it is fully folded and no longer sticking to the hydrophobic internal surfaces.

In the current work, the researchers drove the expression of several individual CCT subunits in cell lysates. Then they sent the products into a mass spectrometer to find out what was sticking to these "orphan" proteins. They found two major associated proteins, HERC2, and ZNRD2. HERC2 is known as a ubiquitin ligase, which is one of a large family of enzymes that tag proteins with ubiquitin, targeting them for disposal. But ZNRD2 was totally uncharacterized, known only as an auto-antigen reacted to by some people with Sjogren's syndrome or scleroderma. The question then was .. does HERC2 directly sense the presence of free-floating CCT subunits, or does it need a helper to do so, such as perhaps ZNRD2?

"... a sizable population of multiple CCT subunits are orphaned even under normal conditions, and the degradation of a subset of these can be stimulated by HERC2."

The researchers showed that deleting HERC2 strongly impaired the cleanup of most orphan CCT subunits. It is evident, however, that there are other chaperones not covered in this work that help clean up some of the other CCT subunits. Then they found that HERC2 interaction with the CCT proteins was dependent on ZNRD2, but that the reverse was not the case- ZNRD2 binds CCT subunits in any case. This, and other experiments, including mapping the location within the HERC2 protein that binds ZNRD2, showed that ZNRD2 is the adapter that does the detailed detection of orphaned CCT subunits. At only 199 amino acids, there is not much to it, and searches for domain signatures do not yield much. Its name reflects a structure that uses zinc ions for stabilization, but much of the protein is also disordered. It is notable for a high proportion of hydrophobic amino acids (alanine, leucine) and lots of prolines (15), which would contribute to a disordered structure. 

Thankfully, with the advent of AI and AlphaFold, these researchers could also investigate and model how ZNRD2 interacts with both the HERC2 ubiquitin ligase and with one of the CCT subunits, CCT4- all without doing any laborious structure determinations.


AI-calculated structures of the complex of the ubiquitin ligase HERC2 with the adaptor ZNRD2 and the target subunit CCT4. At right, the hydrophobic residues of CCT4 are colored yellow, showing that the ZNRD2 orphan subunit detector and adaptor binds to a hydrophobic pocket which would otherwise be completely buried within the full CCT structure. The interacting domain of HERC2 in green is termed a 7-bladed beta propeller.

"In the fully assembled CCT double ring, all potential ZNRD2 interaction sites are completely buried because they form the interface between the two individual rings."

 

They found that ZNRD2 binds to a hydrophobic pocket of CCT4, a pocket that is otherwise buried in the fully assembled CCT. This patch would also be exposed on partially assembled CCT complexes, indicating that this interaction is not only relevant for mopping up the individual subunit, but for several kinds of incomplete assembly of the entire complex, perhaps explaining why other subunits are also mopped up by this system. 

This kind of work is a good example of normal science. A gene about which nothing was previously known (ZNRD2) is now given a function in the cell, and a process circumstantially known to exist is fleshed out with actors and structures that explain it. Of the ~20,000 human protein-coding genes, roughly ten percent still have no annotation, and many more have only tenuous annotation, perhaps only drawn from structural analogy, not direct study. So there is a great deal more work needed to evaluate our parts list, even on the most basic level, even before getting into the complexities of how these proteins act and interact in tissues and pathways. 


  • What are the hippos thinking?
  • Vodka is apparently a thing.
  • Just how low is this grift going?
  • Who gets to reproduce, and who gets killed? Population control at the heart of the Jewish state.
  • Genetics and parenting.
  • No, absolutely not.. this can not be true.

Sunday, March 31, 2024

Nominee for Most Amazing Protein: RAD51

On the repair and resurrection of DNA, which gets a lot of help from a family of proteins including RAD51, DMC1, and RecA.

Proteins do all sorts of amazing things, from composing pores that can select a single kind of ion- even just a proton- to allow across a membrane, to massive polymerizing enzymes that synthesize other proteins, DNA, and RNA. There is really no end to it. But one of the most amazing, even incredible, things that happens in a cell is the hunt for DNA homology. Even over a genome of billions of base pairs, it is possible for one DNA segment to find the single other DNA segment that matches it. This hunt is executed for several reasons. One is to line up the homologous chromosomes at meiosis, and carry out the genetic cross-overs between them (when they are lined up precisely) that help scramble our genetic lineages for optimal mix-and-matching during reproduction. Another is for DNA repair, which is best done with a good copy for reference, especially when a full double-strand break has happened. Just this week, a fascinating article showed that memories in our brains depend in some weird way on DNA breaks occurring in neurons, some of which then use the homologous repair process, including homology search, to patch things up.

The protein that facilitates this DNA homology search is deeply conserved in evolution. It is called RecA in bacteria, radA and radB in archaea, and the RAD51 family in eukaryotes. Naturally, the eukaryotic family is most closely related to the archaeal versions (RAD51 and DMC1 evolving from radA, and a series of other, poorly understood family members from radB). In this post, I will mostly just call them all RAD51, unless I am referring to DMC1 specifically. The name comes from genetic screens for radiation-sensitive mutants in yeast and other eukaryotes, since RAD51 plays a crucial role in DNA repair, as noted above. RAD51 is not a huge protein, but it is an ATPase. It binds to itself, forming linear filaments with ATP at the junction points between units. It binds to a single strand of DNA, which is going to be what does the hunting. And it binds, in a complicated way, to another double-stranded DNA, which it helps to open briefly to allow its quality as a target to be evaluated. 

This diagram describes the repair of double strand breaks (DSB) in DNA. First the ends are covered with a bunch of proteins that signal far and wide that something terrible has happened- the cell cycle has to stop.. fire engines need to be called. One of these proteins is RPA, which simply binds all over single-stranded DNA and protects it. Then the RAD51 protein comes in, displaces RPA, and begins the homology search process. The second DNA shown, in dark black, doesn't just happen, but is hunted for high and low throughout the nucleus to find the exact homolog of the broken end. When that exact match is found, the repair process can proceed, with continued DNA synthesis through the lesion, and resolution of the newly repaired double strands, either to copy up the homolog version, or exchange versions (GC, for gene conversion). 

This diagram shows how the notorious (when mutated) tumor suppressor gene BRCA2 (in green) works. It binds RAD51 (in blue) and brings it, chain-gang style, to the breakpoints of DNA damage to speed up and specify repair.


There have been several structural studies by this point that clarify how RAD51 does its thing. ATP is simply required to form filaments on single-stranded DNA. When a match has been found and RAD51 is no longer needed, ATP is cleaved, and RAD51 falls off, back to reserve status. The magic starts with how RAD51 binds the single-stranded DNA. One RAD51 binds for every ~3 bases in the DNA, and it binds the phosphate backbone, so that the bases are nicely exposed in front, and all stretched out, ready to hunt for matching DNA.

A series of RAD51 molecules (in this case, RecA from bacteria) bound sequentially to single-stranded DNA (red). Note the ATP homolog chemicals in yellow, positioned between each protein unit. One can see that the DNA is stretched out a bit and the bases point outwards.

A closeup view of one of the RAD51 units from above, showing how the bases of the DNA (yellow) are splayed out into the medium, ready to find their partners. They are arranged in orientations similar to how they sit in normal (B-form) DNA, further enhancing their ability to find partners.

The second, and more mysterious, part of the operation is how RAD51 scans double-stranded DNA throughout the genome. It has binding sites for double-stranded DNA, away from the single-stranded DNA, and then it also has a little finger that splits open the double-stranded DNA, encouraging separation and allowing one strand to face up to the single-stranded DNA that is held firmly by the RAD51 polymer. The transient search happens in eight-base increments, with tighter capture of the double-strand DNA happening when nine bases are matched, and commitment to recombination or repair happening when a match of fifteen bases is found.  
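To make the eight / nine / fifteen base thresholds above concrete, here is a toy sketch in Python. It is only an illustration of the counting logic, not a model of the real biophysics: the function names, the state labels ("sampled", "captured", "committed"), and the example sequences are all made up for this post, and plain string identity stands in for base pairing (that is, the "genome" string is written as the strand that has the same sequence as the probe).

```python
def longest_run(probe: str, window: str) -> int:
    """Longest contiguous stretch of identical bases at aligned positions."""
    best = run = 0
    for a, b in zip(probe, window):
        run = run + 1 if a == b else 0
        best = max(best, run)
    return best


def classify(run_length: int) -> str:
    """Map a contiguous match length onto the thresholds quoted in the text.

    The labels are hypothetical names for the three stages:
    8 bases = transient sampling, 9 = tighter capture, 15 = commitment.
    """
    if run_length >= 15:
        return "committed"
    if run_length >= 9:
        return "captured"
    if run_length >= 8:
        return "sampled"
    return "rejected"


def scan(probe: str, genome: str) -> list[tuple[int, str]]:
    """Slide the probe along the genome one base at a time (a brute-force
    stand-in for diffusion) and report every offset that is not rejected."""
    hits = []
    for start in range(len(genome) - len(probe) + 1):
        window = genome[start:start + len(probe)]
        state = classify(longest_run(probe, window))
        if state != "rejected":
            hits.append((start, state))
    return hits


if __name__ == "__main__":
    probe = "ACGTTGCAAGGCTTA"            # made-up 15-base probe
    genome = "TTTT" + probe + "GGGG"     # made-up target with one perfect site
    print(scan(probe, genome))
```

The real filament does not march along base by base, of course. The point is only that short matches are cheap to test and discard, while each extra matched base makes the intermediate more stable.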

These structures show an intermediate where a double-stranded DNA (ends in teal and lavender, and separated DNA segments in green and red) has been captured, making a twelve base match with the stable single-stranded DNA (brown). Note how the double-stranded DNA ends are held by outside portions of the RAD51 protein. Closeup on the right shows the dangling, non-paired DNA strand in red, and the newly matched duplex DNA with green-brown colored base interactions.

These structures can only give a hint of what is going on, since the whole process relies so clearly on the Brownian motion that allows super-rapid diffusion of the stabilized single-strand DNA+RAD51 over the genome, which it scans efficiently in one-dimensional fashion, despite all the chromatin and other proteins parked all over the place. And while the structures provide insight into how the process happens, it remains incredible that this search can happen, on what is clearly a quite reliable basis, day in and day out, as our genomes get hit by whatever the environment throws at us.

"Unfortunately, most RAD51 and RAD51 paralog point mutations that have been clinically identified are classified as variants of unknown significance (VUSs). Future studies to reclassify these RAD51 gene family VUSs as pathogenic or benign are desperately needed, as many of these genes are now included on hereditary breast and ovarian cancer screening panels. Reclassification of HR-deficient VUSs would enable these patients to benefit from therapies that specifically target HR deficiency, as do poly(ADP)-ribose polymerase (PARP) inhibitors in BRCA1/2-deficient cells."

Lastly, one paper made the point that clinicians need better understanding of the various mutations that can affect RAD51 itself. Genetic testing now is able to find all of our mutations, but we don't always know what each mutation is capable of doing. Thus deeper studies of RAD51 will have beneficial effects on clinical diagnosis, when particular mutations can be assigned as disease-causing, thus justifying specific therapies that would otherwise not be attempted.


Saturday, March 9, 2024

Getting Cancer Cells to Shoot Themselves

New chemicals that make novel linkages among cellular components can be powerful drugs.

One theme that has become common in molecular biology over the years is the prevalence of proteins whose only job is to bring other proteins together. Many proteins lack any of the usual jazzy functions, like catalytic enzyme, or ion channel, or signaling kinase, but just serve as "conveners", bringing other proteins together. Typically they are regulated in some way, by phosphorylation, expression, or localization, and some of these proteins serve as key "scaffolds" for the activation of some process, like G-protein activation, or cell cycle control, or cell growth. 

Well, the drug industry has caught on, and is starting to think about chemicals that can do similar things, resulting in occasionally powerful results. Conventional drug design has aimed to bind to whatever protein is responsible for some ill, such as an oncogene or an over-active component of the immune system, and inhibit it. This has led to many great drugs, but has significant limitations. The chemical has to bind not just anywhere on the target, but at the particular spot (the active site) that is its business end, where its action happens. And it has to bind really well, since binding and inhibiting only half the target proteins in a cell (or the body) will typically only have a modest effect. These requirements are quite stringent and result in many protein targets being deemed difficult to drug, or "undruggable".

A paradigm for a new kind of chemical drug, which links two functions, is the PROTAC class, which combines binding with a target on one end, with another end that binds to the cell's protein destruction machinery, thereby not just inhibiting the target, but destroying it. A new paper describes an even more nuclear option along this line of drug development, linking an oncogene with a second part that activates the cellular suicide machinery. One can imagine that this approach can have far more dramatic effects.

These researchers synthesize and demonstrate a chemical that binds on one end the oncogene BCL6, mutations of which can cause B cell lymphoma. This gene is a transcription repressor, and orchestrates the development of particular immunologic T cells called T follicular helper cells. One of its roles is to prevent the suicide of these cells when an antigen is present, which is when the cells are most needed. If BCL6 is over-expressed in cancer, the affected cells think they really need to protect the body and proliferate wildly.

The other end of this chemical, called TCIP1, binds to BRD4, which is another transcription regulator, but this one activates the cell suicide genes, instead of turning them off. Both ends of this molecule were based on previously known structures. The innovation was solely in linking them together. I should say parenthetically that BRD4 is itself recognized as an oncogene, as it can promote cell growth and prevent cell suicide in many settings. So it has ambivalent roles, (inviting a lot of vague writing), and it is somewhat curious that these researchers focused on BRD4 as an apoptosis driver.

"TCIP1 kills diffuse large B cell lymphoma cell lines, including chemotherapy-resistant, TP53-mutant lines, at EC50 of 1–10 nM in 72 h" 
Here EC50 means the effective concentration where the effect is 50% of maximal. This range of 1 to 10 nanomolar is a very low concentration for a drug, meaning it is highly potent. Mutation of TP53 is another cancer driver, common in treatment-resistant cancers. The drug has a characteristic and curious dosage behavior, as its effect decreases at higher concentrations. This is because each individual end of the molecule starts to bind and saturate targets independently, reducing the rate of linkage between the two target proteins, and thus the intended effect.
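This falling-off at high dose can be sketched with a very simple equilibrium calculation. The Python below is my own back-of-envelope illustration, not anything taken from the paper: it assumes the two ends bind their targets independently (no cooperativity), that the drug is in excess over both proteins, and that both proteins are scarce relative to the binding constants. The Kd values are hypothetical placeholders, not measured values for TCIP1. Under those assumptions, the amount of doubly-bound (linked) complex is proportional to D / ((Ka + D)(Kb + D)), which rises, peaks at D = sqrt(Ka * Kb), and then declines.

```python
import math


def ternary_signal(drug_nm: float, kd_a_nm: float, kd_b_nm: float) -> float:
    """Relative amount of target A - drug - target B complex (arbitrary units).

    Simplified noncooperative model: each end of the drug binds its target
    independently, and free drug is taken to equal total drug.
    """
    return drug_nm / ((kd_a_nm + drug_nm) * (kd_b_nm + drug_nm))


def peak_concentration(kd_a_nm: float, kd_b_nm: float) -> float:
    """Dose giving the most linked complex in this model: sqrt(Ka * Kb)."""
    return math.sqrt(kd_a_nm * kd_b_nm)


if __name__ == "__main__":
    KD_A = 10.0  # hypothetical Kd for one end, in nM
    KD_B = 10.0  # hypothetical Kd for the other end, in nM
    peak = ternary_signal(peak_concentration(KD_A, KD_B), KD_A, KD_B)
    for dose in (0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0):
        relative = ternary_signal(dose, KD_A, KD_B) / peak
        print(f"{dose:>8.1f} nM  relative linkage {relative:.3f}")
```

In this sketch the curve is symmetric on a log scale around the peak: a hundred-fold overdose is as ineffective as a hundred-fold underdose, because each target ends up wearing its own separate drug molecule.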

Chemical structure of TCIP1. The left side binds to BRD4, a regulator of cell suicide, while the right side binds to BCL6, an oncogene.

The authors did numerous controls with related chemicals, and tracked genes that were targeted by the novel chemical, all to show that the dramatic effects they were seeing were specifically caused by the linkage of the two chemical functions. Indeed, BCL6 represses its own transcription in the natural course of affairs, and the new drug reverses this behavior as well, inducing more of its own synthesis, which now potentiates the drug's lethal effect. While the authors did not show effectiveness in animals, they did show that TCIP1 is not toxic in mice. Neither did they show that TCIP1 is orally available, but administered it by injection. But even by this mode, it would, if effective, be a very exciting therapy. Not surprisingly, the authors report a long series of biotech industry ties (rooted at Stanford) and indicate that this technology is under license for drug development.

This approach is highly promising, and a significant advance in the field. It should allow increased flexibility in targeting all kinds of proteins that may or may not cause disease, but are specific to or over-expressed in disease states, in order to address those diseases. It will allow increased flexibility in targeting apoptosis (cell suicide) pathways through numerous entry points, to have the same ultimate (and highly effective) therapeutic endpoint. It allows drugs to work at low concentrations, not needing to fully occupy or inhibit their targets. Many possible areas of therapy can be envisioned, but one is aging. By targeting and killing senescent cells, which are notorious for promoting aging, significant increases in lifespan and health are conceivable. 


  • Biden is doing an excellent job.
  • Annals of mental decline.
  • Maybe it is an anti-addiction drug.
  • One gene that really did the trick.
  • A winning issue.
  • It is hard to say yet whether nuclear power is a climate solution, or an expensive distraction.

Saturday, March 2, 2024

Ions: A Family Saga

The human genome encodes hundreds of proteins that ferry ions across membranes. How did they get here? How do they work?

As macroscopic beings, we generally think we are composed of tissues like bones, skin, hair, organs. But this modest apparent complexity sits atop a much greater and deeper molecular diversity- of molecules encoded from our genes, and of the chemistry of life. Management of cellular biochemistry requires strict and dynamic control of all its constituents- the many ions and myriad organic molecules that we rely on for energy, defense, and growth. One avenue is careful control across the cellular membrane, setting up persistent differences between inside and outside that define the living cell- one may say life itself. Typical cells have higher levels of potassium inside, and higher levels of sodium and chloride outside, for example. Calcium, for another example, is used commonly for signaling, and is kept at low concentrations in the cytoplasm, while being concentrated in some organelles (such as the sarcoplasmic reticulum in muscle cells) and outside. 

All this is done by a fleet of ion channels, pumps, and other solute carriers, encoded in the genome. We have genes for about 1,555 molecule transporters. Out of a genome of about 20,000 genes, this represents a huge concentration(!) of resources. One family alone, the solute carrier (SLC) family, has 440 members. Many of these are passive channels, which just let their selected cargo through. But many are also co-transporters, which harness the transport of one ion with that of another which may have an actively pumped gradient across the membrane and thus provide an indirect energy source for transfer of the first ion. The SLC family includes channels for glucose, amino acids, neurotransmitters, chloride, cotransport (or anti-transport) of sodium with glucose, calcium, neurotransmitters, hydrogen, and phosphate. Also, metals like zinc, iron, copper, magnesium, molybdate, nucleotides, steroids, drugs/toxins, cholesterol, bile, folate, fatty acids, peptides, sulfate, carbonate, and many others. 

It is clear that these proteins did not just appear out of nowhere. The "intelligent" design people recognize that much, that complex structures, which these are, must have some antecedent process of origination- some explanation, in short. Biologists call the SLC proteins a family because they share clear sequence similarity, which derives, by evolutionary theory, and by the observed diversification of genes and the organisms encoding them over time, from duplication and diversification. This, sadly, is where the "intelligent" design proponents part ways in logic, maintaining perhaps the most pathetic (and pedantic) bit of hooey ever devised by the dogmatic believer: "specified information", which apparently forbids the replication of information.

However, information replicates all the time, thanks to copious inputs of energy from the sun, and the advent of life, which can transform energy into profusions of reproduced/replicated organisms, including replication of all their constituent parts. For our purposes, one side effect of all this replication is error, which can cause unintended replication/duplication of individual genes, which can then diverge in function to provide the species with new vistas of, in this case, ionic regulation. In yeast cells, there are maybe a hundred SLC genes, and fewer in bacteria. So it is apparent that the road to where we are has been a very long one, taking billions of years. Gene duplication is a rare event, and each new birth a painful, experimental project. But a family with so many members shows the fecundity of life, and the critical (that is, naturally selected) importance of these transporters in their diverse roles throughout the body.

A few of the relatives in the SLC26A family, given in one-letter protein sequence from small sections of the much larger protein, around the core ion binding site. You can see that they are, in this alignment, very similar, clearly being in the same family. You can also see that SLC26A9 has "V" in a position in alpha helix 10, which in all other members is a quite basic amino acid like lysine ("K") or arginine ("R"). The authors argue that this difference is one key to the functional differences between it and SLC26A6.

A recent paper showed structures for two SLC family members, which each transport chloride ion, but differ in that one exchanges chloride for bicarbonate, while the other allows chloride through without a matched exchange (though see here). SLC26A9 is expressed in the gut and lung, and apparently helps manage fluid levels by allowing chloride permeability. It is of interest to those with cystic fibrosis, because the gene responsible for that disorder, CFTR, is another transporter, (of the ABC family), and plays a major role doing a similar thing in the same places- exchanging chloride and bicarbonate, which helps manage the pH and fluidity of our mucus in the lung and other organs. SLC26A9, having a related role and location, might be able to fill some of the gap if drugs could be found to increase its expression or activity.

SLC26A6 is expressed in the kidney, pancreas, and gut, and in addition to exchanging bicarbonate for chloride, can also exchange oxalate, which prevents kidney stones. Very little, really, is known about how all these ion transporters are expressed and regulated, what differentiates them, how they relate to each other, and what prompted their divergence through evolution. We are really just in the identification and gross characterization stage. The new paper focuses on the structural mechanisms that differentiate these two particular SLC family members.

Structure of two SLC transporters, each dimeric, and superimposed. The upper parts are set in the membrane, with the lower parts in the cytoplasm. The upper parts combine two domains for each monomer, the "core" and "gate" domains. The channel for the anion threads within the center of each upper part, between these two domains. Note how structurally similar the two family members are, one in green+gray, the other in red+blue.


Schemes of how SLC26A6 works. The gate domain (purple) is stable, while the core domain (green) rocks to provide access from the ion binding site to either outside or inside the cell.

Like any proper ion channel, SLC26A6 sits in the membrane and provides a place for its ion to transiently bind (for careful selection of the right kind of ion) and then to go through. There is a central binding site that is lined specially with a few semi-positively charged amino acids like asparagine (N), glutamine (Q) and arginine (R), which provide an attractive electronic environment for anions like Cl-. The authors describe a probable mechanism of action (above), whereby the core domain rocks up and down to allow the ion to pass through, after being very sensitively bound and verified. This rocking is not driven by ATP or other outside power, but just by Brownian motion, as gated by the ion binding and unbinding steps.

Drilling a little closer into the target ion binding site of SLC26A6. On right is shown Cl- in green, center, with a few of the amino acids that coordinate its specific, but transient, binding in the core domain pocket. 


They draw contrasts between these very closely related channels, in that the binding pocket is deeper and narrower in SLC26A9, allowing the smaller Cl- to bind while not allowing HCO3- to bind as well. There are also numerous differences in the structure of the core protein around the channel that they argue allow coupling of HCO3- transport (to Cl- transport in the other direction) in SLC26A6, while SLC26A9 is uncoupled. One presumes that the form of the ion site can be subtly altered at each end of the rocking motion, so that the preferred ion is bound at each end of the cycle.

While all this work is splitting fine hairs, these are hairs presented to us by evolution. It is evolution that duplicated the precursors to these genes, then retained them while each, over time, developed its fine-tuned differences, including different activities and distinct tissue expression. Indeed, the fully competent, bicarbonate exchanging, SLC26A6 is far more widely expressed, suggesting that SLC26A9 has a more specialized role in the body. To reiterate a point made many times before- having the whole human genome sequenced, or even having atomic structures of all of its encoded proteins, is merely the beginning to understanding what these molecular machines do, and how our bodies really work.


  • A cult.
  • The deep roots of fascism in the American Right.
  • We are at a horrifying inflection point in foreign policy.
  • Instead of subsidizing oil and gas, the industry should be charged for damages.
  • Are we ready for first contact?

Saturday, December 30, 2023

Some Challenges of Biological Modeling

If modeling one small aspect of one cell is this difficult, how much more difficult is it to model whole cells and organisms?

While the biological literature is full of data / knowledge about how cells and organisms work, we remain far from true understanding- the kind of understanding that would allow computer modeling of their processes. This is both a problem of the kind of data, which is largely qualitative and descriptive, and also of amount- that countless processes and enzymes have never had their detailed characteristics evaluated. In the human genome, I would estimate that roughly half its genes have only been described (if at all) in the most rudimentary way, typically by loose analogy to similar ones. And the rest, when studied more closely, present all sorts of other interesting issues that deflect researchers from core data like their enzymatic rate constants and binding constants to other proteins, as might occur under a plethora of different modification, expression, and other regulatory conditions. 

Then how do we get to usable models of cellular activities? Typically, a lot of guessing is involved, to make anything that approaches a computer model. A recent paper offered a novel way to go down this path, which was to ignore all the rate constants and even interactions, and just focus on the measurements we can make more conveniently- whole metabolome assessments. These are experiments where mass spectrometry is used to evaluate the level of all the smaller chemicals in a cell. If such levels are known, perhaps at a few different conditions, then, these authors argue, we can derive models of their mutual regulation- disregarding all the details and just establishing that some sort of feedback system among these metabolic chemicals must exist to keep them at the observed concentrations.

Their experimental subject is a relatively understandable, but by no means simple, system- the management of iron concentrations in yeast cells. Iron is quite toxic, so keeping it at controlled concentrations and in various carefully-constructed complexes is important for any cell. It is used to make heme, which functions not only in hemoglobin, but in several core respiratory enzymes of mitochondria. It also gets placed into iron-sulfur clusters, which are used even more widely, in respiratory enzymes, in the DNA replication, transcription, protein synthesis, and iron assimilation machineries. It is iron's strong and flexible redox chemistry (and its ancient abundance in the rocks and fluids life evolved with) that make it essential as well as dangerous.

Authors' model for iron use and regulation in yeast cells. Outside is on left, cytoplasm is blue, vacuole is green, and mitochondrion is yellow. See text below for abbreviations and description. O2 stands for the oxygen molecule. The various rate constants R refer to the transition between each state or location.

Iron is imported from outside and forms a pool of free iron in the cytoplasm (FC, in the diagram above). From there, it can be stored in membrane-bound vacuoles (F2, F3), or imported into the mitochondria (FM), where it is incorporated into iron-sulfur clusters and heme (FS). Some of the mitochondrially assembled iron-sulfur clusters are exported back out to the cytoplasm to be integrated into a variety of proteins there (CIA). This is indeed one of the most essential roles of mitochondria- needed even if metabolic respiration is for some reason not needed (in hypoxic or anaerobic conditions). If there is a dramatic overload of iron, it can build up as rust particles in the mitochondria (MP). And finally, the iron-sulfur complexes contribute to respiration of oxygen in mitochondria, and thus influence the respiration rate of the whole cell.

The task these authors set themselves was to derive a regulatory scheme using only the elements shown above, in combination with known levels of all the metabolites, under the conditions of 1) normal levels of iron, 2) low iron, and 3) a mutant condition- a defect in the yeast gene YFH1, which binds iron inside mitochondria and participates in iron-sulfur cluster assembly. A slew of differential equations later, and selection through millions of possible regulatory circuits, and they come up with the one shown above, where the red lines/arrows indicate positive regulation, and the red lines ending with bars indicate repression. The latter is typically feedback repression, such as of the import of iron, repressed by the amount already in the cell, in the FC pool. 

They show that this model provides accurate control of iron levels at all the various points, with stable behavior, no singularities or wobbling, and the expected responses to the various conditions. In low iron, the vacuole is emptied of iron, and in the mutant case, iron nanoparticles (MP) accumulate in the mitochondrion, due in part to excess amounts of oxygen admitted to the mitochondrial matrix, which in turn is due to defects in metabolic respiration caused by a lack of iron-sulfur clusters. What seemed so simple at the outset does have quite a few wrinkles!

The authors' best regulatory scheme, selected from among millions, provides accurate metabolite control in simulation, as shown here by key transitions between conditions, one line per molecular species. See text and image above for abbreviations.


But note that none of this is actually biological. There are no transcription regulators, such as the AFT1/2 proteins known to regulate a large set of iron assimilation genes. There are no enzymes explicitly cited, and no other regulatory mechanisms like protein modifications, protein disposal, etc. Nor does the cytosolic level of iron actually regulate the import machinery- that is done by the level of iron-sulfur clusters in the mitochondria, as sensed by the AFT regulators, among other mechanisms.

Thus it is not at all clear what work like this has to offer. It takes the known concentrations of metabolites (which can be ascertained in bulk) to create a toy system that accurately reproduces a very restricted set of variations, limited to what the researchers could assess elsewhere, in lab experiments. It does not inform the biology of what is going on, since it is not based on the biology, and clearly even contravenes it. It does not inform diseases associated with iron metabolism- in this case Friedreich's ataxia, which is caused in humans by a gene related to YFH1- because again it is not biologically based. Knowing where some regulatory events might occur in theory, as one could have done almost as well (if not quantitatively!) on a cocktail napkin, is of little help when drugs need to be made against actual enzymes and actual regulators. It is a classic case of looking under the streetlight- working with the data one has, rather than the data one needs to do something useful.

"Like most ODE (ordinary differential equation)-based biochemical models, sufficient kinetic information was unavailable to solve the system rigorously and uniquely, whereas substantial concentration data were available. Relying on concentrations of cellular components increasingly makes sense because such quantitative concentration determinations are becoming increasingly available due to mass-spectrometry-based proteomic and metabolomics studies. In contrast, determining kinetic parameters experimentally for individual biochemical reactions remain an arduous task." ...

"The actual biochemical mechanisms by which gene expression levels are controlled were either too complicated to be employed in autoregulation, or they were unknown. Thus, we decided to augment every regulatable reaction using soft Heaviside functions as surrogate regulatory systems." ...

"We caution that applying the same strategy for selecting viable autoregulatory mechanisms will become increasing difficult computationally as the complexity of models increases."
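As a rough illustration of what a "soft Heaviside" surrogate can look like, here is a minimal Python sketch of a single cytosolic iron pool whose import is throttled by a smooth switch. The functional form (a Hill-type curve), the function names, and every parameter value are assumptions made for illustration. They are not taken from the paper, whose model has many more pools and regulated steps.

```python
def soft_heaviside(x, setpoint, sharpness):
    """Smooth switch: near 1 when x is well below setpoint, near 0 well above it.

    Assumed Hill-type form; the paper's exact function is not reproduced here.
    """
    return 1.0 / (1.0 + (x / setpoint) ** sharpness)


def simulate_pool(external_iron, feedback=True, setpoint=1.0, sharpness=4.0,
                  r_import=1.0, r_use=0.5, dt=0.01, steps=20000):
    """Integrate d(FC)/dt = import - use with simple Euler steps.

    FC is a hypothetical cytosolic iron pool. With feedback on, import is
    repressed by the pool's own level through the soft switch.
    """
    fc = 0.0
    for _ in range(steps):
        gate = soft_heaviside(fc, setpoint, sharpness) if feedback else 1.0
        influx = r_import * external_iron * gate
        outflux = r_use * fc
        fc += dt * (influx - outflux)
    return fc


if __name__ == "__main__":
    for ext in (1.0, 10.0):
        regulated = simulate_pool(ext)
        unregulated = simulate_pool(ext, feedback=False)
        print(f"external={ext:5.1f}  regulated={regulated:.3f}  unregulated={unregulated:.3f}")
```

With these made-up numbers, a tenfold rise in external iron moves the regulated pool by less than twofold, while the unregulated pool scales tenfold. That is the whole trick: the switch stands in for whatever real machinery does the sensing and repressing.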


But the larger point that motivated a review of this paper is the challenge of modeling a system so small as to be almost infinitesimal in the larger scheme of biology. If dedicated modelers, as this laboratory is, despair of getting the data they need for even such a modest system (indeed, the mitochondrial iron and sulfur-containing signaling compound that mediates repression of the AFT regulators is still referred to in the literature as "X-S"), then things are bleak indeed for the prospect of modeling higher levels of biology, such as whole cells. Unknowns are unfortunately gaping all over the place. As has been mentioned a few times, molecular biologists tend to think in cartoons, simplifying the relations they deal with to the bare minimum. Getting beyond that is going to take another few quantum leaps in data- the vaunted "omics" revolutions. It will also take better interpolation methods (dare one invoke AI?) that use all the available scraps of biology, not just mathematics, in a Bayesian ratchet that provides iteratively better models. 


Saturday, December 9, 2023

The Way We Were: Origins of Meiosis and Sex

Sex is as foundational for eukaryotes as are mitochondria and internal membranes. Why and how did it happen?

Sexual reproduction is a rather expensive proposition. The anxiety, the dating, the weddings- ugh! But biologically as well, having to find mates is no picnic for any species. Why do we bother, when bacteria get along just fine just dividing in two? This is a deep question in biology, with a lot of issues in play. And it turns out that bacteria do have quite a bit of something-like-sex: they exchange DNA with each other in small pieces, for reasons similar to ours. But the eukaryotic form of sex is uniquely powerful and has supported the rapid evolution of eukaryotes to be by far the dominant domain of life on earth.

A major enemy of DNA-encoded life is mutation. Despite the many DNA replication accuracy and repair mechanisms, some rate of mutation still occurs, and is indeed essential for evolution. But for larger genomes, the mutation rate always exceeds the replication rate, (and the purifying natural selection rate), so that damaging mutations build up and the lineage will inevitably die out without some help. This process is called Muller's ratchet, and is why all organisms appear to exchange DNA with others in their environment, either sporadically like bacteria, or systematically, like eukaryotes.
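The ratchet can be put in rough numbers with a standard population-genetics approximation (Haigh's mutation-selection balance, which is not from the work discussed here). In a non-recombining population of N individuals, the expected size of the mutation-free class is about N·exp(-U/s), where U is the rate of deleterious mutations per genome per generation and s is the fitness cost of each. When that class shrinks toward a single individual or less, it is readily lost by chance and the ratchet clicks. The sketch below assumes illustrative values for the population size, the per-base mutation rate, and the cost.

```python
import math


def fittest_class_size(pop_size, genome_size, per_base_rate, cost):
    """Expected number of mutation-free individuals, n0 = N * exp(-U / s).

    U = genome_size * per_base_rate is the deleterious mutation rate per
    genome per generation. All inputs here are illustrative assumptions.
    """
    u = genome_size * per_base_rate
    return pop_size * math.exp(-u / cost)


if __name__ == "__main__":
    # Same population, same per-base rate and cost; only genome size changes.
    for genome_size in (1e6, 1e8, 1e9):
        n0 = fittest_class_size(pop_size=1e6, genome_size=genome_size,
                                per_base_rate=1e-9, cost=0.01)
        print(f"genome {genome_size:.0e} bases -> mutation-free class ~ {n0:.3g}")
```

Under these assumptions, a small genome keeps most of the population mutation-free, while a thousandfold larger genome leaves effectively no one in the clean class, which is the situation where outside DNA becomes a necessity.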

An even worse enemy of the genome is unrepaired damage like complete (double strand) breaks in the DNA. These stop replication entirely, and are fatal. These also need to be repaired, and again, having extra copies of a genome is the way to allow these to be fixed, by processes like homologous recombination and gene conversion. So having access to other genomes has two crucial roles for organisms- allowing immediate repair, and allowing some way to sweep out deleterious mutations over the longer term.

Our ancestors, the archaea, which are distinct from bacteria, typically have circular, single molecule genomes, in multiple copies per cell, with frequent gene conversions among the copies and frequent exchange with other cells. They routinely have five to twenty copies of their genome, and can easily repair any immediate damage using those other copies. They do not hide mutant copies like we do in a recessive allele, but rather by gene conversion (which means, replicating parts of a chromosome into other ones, piecemeal) make each genome identical over time so that it (and the cell) is visible to selection, despite their polyploid condition. Similarly, taking in DNA from other, similar cells uses the target cells' status as live cells (also visible to selection) to ensure that the recipients are getting high quality DNA that can repair their own defects or correct minor mutations. All this ensures that their progeny are all set up with viable genomes, instead of genomes riddled with defects. But it comes at various costs as well, such as a constant race between getting a lethal mutation and finding the DNA that might repair it. 

Both mitosis and meiosis were eukaryotic innovations. In both, the chromosomes all line up for orderly segregation to descendants. But meiosis engages in two divisions, and features homolog synapsis and recombination before the first division of the parental homologs.

This is evidently a precursor to the process that led, very roughly 2.5 billion years ago, to eukaryotes, but is all done on a piecemeal basis, nothing like what we do now as eukaryotes. To get to that point, the following innovations needed to happen:

  • Linearized genomes, with centromeres and telomeres, and more than one chromosome.
  • Mitosis to organize normal cellular division, where multiple chromosomes are systematically lined up and distributed 1:1 to daughter cells, using extensive cytoskeletal rearrangements and regulation.
  • Mating with cell fusion, where entire genomes are combined, recombined, and then reduced back to a single complement, and packaged into progeny cells.
  • Synapsis, as part of meiosis, where all homologs are lined up and then deliberately damaged to initiate DNA repair and crossing-over.
  • Meiosis division one, where the now-recombined parental homologs are separated.
  • Meiosis division two, which largely follows the same mechanisms as mitosis, separating the reshuffled and recombined sister chromosomes.

This is a lot of novelty on the path to eukaryogenesis, and is just a portion of the many other innovations that happened in this lineage. What drove all this, and what were some plausible steps in the process? The advent of true sex generated several powerful effects:

  1. A definitive solution to Muller's ratchet, by exposing every locus in a systematic way to partial selection and sweeping out deleterious mutations, while protecting most members of the population from those same mutations. Continual recombination of the parental genomes allows beneficial mutations to separate from deleterious ones and be differentially preserved.
  2. Mutated alleles are partially, yet systematically, hidden as recessive alleles, allowing selection when they come into homozygous status, but also allowing them to exist for limited time to buffer the mutation rate and to generate new variation. This vastly increases accessible genetic variation.
  3. Full genome-length alignment and repair by crossing over is part of the process, correcting various kinds of damage and allowing accurate recombination across arbitrarily large genomes.
  4. Crossing over during meiotic synapsis mixes up the parental chromosomes, allowing true recombination among the parental genomes, beyond just the shuffling of the full-length chromosomes. This vastly increases the power of mating to sample genetic variation across the population, and generates what we think of as "species", which represent more or less closed interbreeding pools of genetic variants that are not clones but diverse individuals.

The time point of 2.5 billion years ago is significant because this is the general time of the great oxidation event, when cyanobacteria were finally producing enough oxygen by photosynthesis to alter the geology of earth. (However our current level of atmospheric oxygen did not come about until almost two billion years later, with the rise of land plants.) While this mainly prompted the logic of acquiring mitochondria, either to detoxify oxygen or use it metabolically, some believe that it is relevant to the development of meiosis as well. 

There was a window of time when oxygen was present, but the ozone layer had not yet formed, possibly generating a particularly mutagenic environment of UV irradiation and reactive oxygen species. Such higher mutagenesis may have pressured the archaea mentioned above to get their act together- to not distribute their chromosomes so sporadically to offspring, to mate fully across their chromosomes, not just pieces of them, and to recombine / repair across those entire mated chromosomes. In this proposal, synapsis, as seen in meiosis I, had its origin in a repair process that solved the problem of large genomes under mutational load by aligning them more securely than previously. 

It is notable that one of the special enzymes of meiosis is Spo11, which induces the double-strand breaks that lead to crossing-over, recombination, and the chiasmata that hold the homologs together during the first division. This DNA damage happens at quite high rates all over the genome, and is programmed, via the structures of the synaptonemal complex, to favor crossing-over between (parental) homologs vs duplicate sister chromosomes. Such intensive repair, while now aimed at ensuring recombination, may have originally had other purposes.

Alternately, others suggest that it is larger genome size that motivated this innovation. This origin event involves many gene duplication events that ramified the capabilities of the symbiotic assemblage. Such gene duplications would naturally lead to recombinational errors in traditional gene conversion models of bacterial / archaeal genetic exchange, so there was pressure to generate a more accurate whole-genome alignment system that confined recombination to the precise homologs of genes, rather than to any similar relative that happened to be present. This led to the synapsis that currently is part of meiosis I, but it is also part of "parameiosis" systems in some eukaryotes, which, while clearly derived, might resemble primitive steps to full-blown meiosis.

It has long been apparent that the mechanisms of meiosis division one are largely derived from (or related to) the mechanisms used for mitosis, via gene duplications and regulatory tinkering. So these processes (mitosis and the two divisions of meiosis) are highly related and may have arisen as a package deal (along with linear chromosomes) during the long and murky road from the last archaeal ancestor to the last common eukaryotic ancestor, which possessed a much larger suite of additional innovations, from mitochondria to nuclei, mitosis, meiosis, cytoskeleton, introns / mRNA splicing, peroxisomes, other organelles, etc.  

Modeling of different mitotic/meiotic features. All cells modeled have 18 copies of a polyploid genome, with a newly evolved process of mitosis. Green = addition of crossing over / recombination of parental chromosomes, but no chromosome exchange. Red = chromosome exchange, but no crossing over. Blue = both crossing over and chromosome exchange, as occurs now in eukaryotes. The Y axis is fitness / survival and the X axis is time in generations after start of modeling.

A modeling paper points to the quantitative benefits of mitosis when combined with the meiotic suite of innovations. They suggest that in a polyploid archaean lineage, the establishment of mitosis alone would have had revolutionary effects, ensuring accurate segregation of all the chromosomes, and that this would have enabled differentiation among those polyploid chromosome copies, since they would each be faithfully transmitted individually to offspring (assuming all, instead of one, were replicated and transmitted). Thus they could develop into different chromosomes, rather than remain copies. This would, as above, encourage meiosis-like synapsis over the whole genome to align all the (highly similar) genes properly.

"Modeling suggests that mitosis (accurate segregation of sister chromosomes) immediately removes all long-term disadvantages of polyploidy."

Additional modeling of the meiotic features of chromosome shuffling, and recombination between parental chromosomes, indicates (shown above) that these are highly beneficial to long-term fitness, which can rise instead of decaying with time, per the various benefits of true sex as described above. 

The field has definitely not settled on one story of how meiosis (and mitosis) evolved, and these ideas and hypotheses are tentative at this point. But the accumulating findings that the archaea that most closely resemble the root of the eukaryotic (nuclear) tree have many of the needed ingredients, such as active cytoskeletons, a variety of molecular antecedents of ramified eukaryotic features, and now extensive polyploidy to go with gene conversion and DNA exchange with other cells, makes the momentous gap from archaea to eukaryotes somewhat narrower.


Saturday, November 25, 2023

Are Archaea Archaic?

It remains controversial whether the archaeal domain of life is 1 or 4.5 billion years old. That is a big difference!

Back in the 1970's, the nascent technologies of molecular analysis and DNA sequencing produced a big surprise- that hidden in the bogs and hot springs of the world are micro-organisms so extremely different from known bacteria and protists that they were given their own domain on the tree of life. These are now called the archaea, and in addition to being deeply different from bacteria, they were eventually found to be the progenitors of the eukaryotic cell- the third (and greatest!) domain of life that arose later in the history of the biosphere. The archaeal cell contributed most of the nuclear, informational, membrane management, and cytoskeletal functions, while one or more assimilated bacteria (most prominently the future mitochondrion and chloroplast) contributed most of the metabolic functions, as well as membrane lipid synthesis and peroxisomal functions.

Carl Woese, who discovered and named archaea, put his thumb heavily on the scale with that name, (originally archaebacteria), suggesting that these new cells were not just an independent domain of life, totally distinct from bacteria, but were perhaps the original cell- that is, the LUCA, or last universal common ancestor. All this was based on the sequences of rRNA genes, which form the structural and catalytic core of the ribosome, and are conserved in all known life. But it has since become apparent that sequences of this kind, which were originally touted as "molecular clocks", or even "chronometers" are nothing of the kind. They bear the traces of mutations that happen along the way, and, being highly important and conserved, do not track the raw mutation rate, (which itself is not so uniform either), but rather the rate at which change is tolerated by natural selection. And this rate can be wildly different at different times, as lineages go through crises, bottlenecks, adaptive radiations, and whatever else happened in the far, far distant past.

Carl Woese, looking over filmed spots of 32P labeled ribosomal RNA from different species, after size separation by electrophoresis. This is how RNAs were analyzed, back in 1976, and such rough analysis already suggested that archaea were something very different from bacteria.

There has since been a tremendous amount of speculation, re-analysis, gathering of more data, and vitriol in the overall debate about the deep divergences in evolution, such as where eukaryotes come from, and where the archaea fit into the overall scheme. Compared with the rest of molecular biology, where experiments routinely address questions productively and efficiently due to a rich tool chest and immediate access to the subject at hand, deep phylogeny is far more speculative and prone to subjective interpretation, sketchy data, personal hobbyhorses, and abusive writing. A recent symposium in honor of one of its more argumentative practitioners made that clear, as his ideas were being discarded virtually at the graveside.

Over the last decade, estimates of the branching date of archaea from the rest of the tree of life have varied from 0.8 to 4.5 Gya (billion years ago). That is a tremendous range, and is a sign of the difficulty of this field. The frustrations of doing molecular phylogeny are legion, just as the temptations are alluring. Firstly, there are very few landmarks in the fossil record to pin all this down. There are stromatolites from roughly 3.5 Gya, which pin down the first documented life of any kind. Second are eukaryotic fossils, which start, at the earliest, about 1.5 Gya. Other microbial fossils pin down occasional sub-groups of bacteria, but archaea are not represented in the fossil record at all, being hardly distinguishable from bacteria in their remains. Then we get the Cambrian explosion of multicellular life, roughly 0.5 Gya. That is pretty much it for the fossil record, aside from the age of the moon, which is about 4.5 Gya and gives us the baseline of when the earth became geologically capable of supporting life of any kind.

The molecules of living organisms, however, form a digital record of history. Following evolutionary theory, each organism descends from others, and carries, in mutated and altered form, traces of that history. We have parts of our genomes that vary with each generation, (useful for forensics and personal identification), we have other parts that show how we changed and evolved from other apes, and we have yet other areas that vary hardly at all- that carry recognizable sequences shared with all other forms of life, and presumably with LUCA. This is a real treasure trove, if only we can make sense of it.

But therein lies the rub. As mentioned above, these deeply conserved sequences are hardly chronometers. So for all the data collection and computer wizardry, the data itself tells a mangled story. Rapid evolution in one lineage can make it look much older than it really is, confounding the whole tree. Over the years, practitioners have learned to be as judicious as possible in selecting target sequences, while getting as many as possible into the mix. For example, adding up the sequences of 50-odd ribosomal proteins can give more and better data than assembling the 2 long-ish ribosomal RNAs. They provide more and more diverse data. But they have their problems as well, since some are much less conserved than others, and some were lost or gained along the way. 
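The distortion is simple arithmetic. A strict clock infers age as accumulated sequence divergence divided by an assumed constant rate, so any lineage that changed faster than assumed gets dated too old. The sketch below uses made-up rates and durations purely for illustration, and ignores complications such as saturation from repeated substitutions at the same site.

```python
def accumulated_divergence(segments):
    """Sum divergence over (duration_gy, substitutions_per_site_per_gy) segments."""
    return sum(duration * rate for duration, rate in segments)


def strict_clock_age(divergence, assumed_rate):
    """Age a strict clock would infer, assuming one constant rate throughout."""
    return divergence / assumed_rate


if __name__ == "__main__":
    baseline_rate = 0.1  # hypothetical substitutions per site per Gy

    # Two lineages that are both truly 1 Gy old.
    steady = [(1.0, baseline_rate)]
    burst = [(0.2, 0.9), (0.8, baseline_rate)]  # brief episode of fast change

    for name, history in (("steady", steady), ("burst", burst)):
        d = accumulated_divergence(history)
        print(f"{name}: divergence={d:.2f}, inferred age={strict_clock_age(d, baseline_rate):.1f} Gy")
```

In this toy case the lineage with an early burst of change comes out looking well over twice its true age, which is the kind of effect both camps in the archaeal dating debate accuse the other of mishandling.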

A partisan of the later birth of archaea provides a phylogenetic tree with countless microbial species, and one bold claim: "inflated" distances to the archaeal and eukaryotic stems. This is given as the reason that archaea (lower part of the diagram, including eukaryotes, termed "archaebacteria") look very ancient, but really just sped away from their originating bacterial parent (the red bacteria), estimated at about 1 Gya. This tree is based on an aligned concatenation of 26 universally conserved ribosomal protein sequences, (51 from eukaryotes), with custom adjustments.

So there has been a camp that claims that the huge apparent / molecular distance between the archaea and other cells is just such a chimera of fast evolution. Just as the revolution that led to the eukaryotic cell involved a lot of molecular change including the co-habitation of countless proteins that had never seen each other before, duplications / specializations, and many novel inventions, whatever process led to the archaeal cell (from a pre-existing bacterial cell) might also have caused the key molecules we use to look into this deep time to mutate much more rapidly than is true elsewhere in the vast tree of life. What are the reasons? There is the general disbelief / unwillingness to accept someone else's work, and evidence like possible horizontal transfers of genes from chloroplasts to basal archaea, some large sequence deletion features that can be tracked through these lineages and interpreted to support late origination, some papering over of substantial differences in membrane and metabolic systems, and there are plausible (via some tortured logic) candidates for an originating, and late-evolving, bacterial parent. 

This thread of argument puts the origin of eukaryotes roughly at 0.8 Gya, which is, frankly, uncomfortably close to the origination of multicellular life, and gives precious little time for the bulk of eukaryotic diversity to develop, which exists largely, as shown above, at the microbial level. (Note that "Animalia" in the tree above is a tiny red blip among the eukaryotes.) All this is quite implausible, even to a casual reader, and makes this project hard to take seriously, despite its insistent and voluminous documentation.

Parenthetically, there was a fascinating paper that used the evolution of the genetic code itself to make a related point, though without absolute time attributions. The code bears hallmarks of some amino acids being added relatively late (tryptophan, histidine), while others were foundational from the start (glycine, alanine), when it may have consisted of two RNA bases (or even one) rather than three. All of this took place long before LUCA, naturally. This broad analysis of genetic code usage argued that bacteria tend to use a more ancient subset of the code, which may reflect their significantly more ancient position on the tree of life. While the full code was certainly in place by the time of LUCA, there may still at this time have been, in the inherited genome / pool of proteins, a bias against the relatively novel amino acids. This finding implies that the time of archaeal origination was later than the origination of bacteria, by some unspecified but significant amount.

So, attractive as it would be to demote the archaea from their perch as super-ancient organisms, given their small sizes, small genomes, specialization in extreme environments, and peripheral ecological position relative to bacteria, that turns out to be difficult to do. I will turn, then, to a very recent paper that gives what I think is a much more reasoned and plausible picture of the deeper levels of the tree of life, and the best general picture to date. This paper is based on the protein sequences of the rotary ATPases that are universal, and were present in LUCA, despite their significant complexity. Indeed, the more we learn about LUCA, the more complete and complex this ancestor turns out to be. Our mitochondrion uses a (bacterial) F-type ATPase to synthesize ATP from the food-derived proton gradient. Our lysosomes use an (archaeal) V-type ATPase to drive protons into / acidify the lysosome in exchange for ATP. These are related, derived from one distant ancestor, and apparently each was likely to have been present in LUCA. Additionally, each ATPase is composed of two types of subunits, one catalytic, and one non-catalytic, which originated from an ancient protein duplication, also prior to LUCA. The availability of these molecular cousins / duplications provides helpful points of comparison throughout, particularly for locating the root of the evolutionary tree.

Phylogenetic trees based on ATP synthase enzymes that are present in all forms of life. On left is shown the general tree, with branch points of key events / lineages. On right are shown sub-trees for the major types of the ATP synthase, whether catalytic subunit (c), non-catalytic (n), F-type, common in bacteria, or V type, common in archaea. Note how congruent these trees are. At bottom right in the tiny print is a guide to absolute time, and the various last common ancestors.

This paper also works quite hard to pin the molecular data to the fossil and absolute time record, which is not always provided. The bottom line is that archaea by this tree arise quite early, (see above), co-incident with or within about 0.5 Gy of LUCA, which was bacterial, at roughly 4.4 Gya. The bacterial and archaeal last common ancestors are dated to 4.3 and 3.7 Gya, respectively. The (fused) eukaryotic last common ancestor dates to about 1.9 Gya, with the proto-mitochondrion's individual last common ancestor among the bacteria some time before that, at roughly 2.4 Gya. 

This time line makes sense on many fronts. First, it provides a realistic time frame for the formation and diversification of eukaryotes. It puts their origin right around the great oxidation event, which is when oxygen became dominant in earth's atmosphere, (about 2 to 2.4 Gya), which was a precondition for the usefulness of mitochondria to what are otherwise anaerobic archaeal cells. It places the origin of archaea (LACA) a substantial stretch after the origin of bacteria, which agrees with the critic's points above that bacteria are the truly basal lineage of all life, and archaea, while highly different and pretty archaic, also share a lot of characteristics with bacteria, and perhaps more so with certain early lineages than with others that came later. The distinction between LUCA and the last common bacterial ancestor (LBCA) is a technical one given the trees they were working from, and the two are not, given the ranges of age presented, (see figure above), significantly different.

I believe this field is settling down, and though this paper, working from only a subset of the most ancient sequences plus fossil set-points, is hardly the last word, it appears to represent a consensus view and is the best picture to date of the deepest and most significant waypoints in the deep history of life. This is what comes from looking through microscopes, and finding entire invisible worlds that we had no idea existed. Genetic sequencing is another level over that of microscopy, looking right at life's code, and at its history, if darkly. What we see in the macroscopic world around us is only the latest act in a drama of tremendous scale and antiquity.


Sunday, November 12, 2023

Missing Links in Eukaryotic Evolution

The things you find in Slovenian mud! Like an archaeal cell that is the closest thing to the eukaryotic root organism.

Creationists and "intelligent" design advocates tirelessly point to the fossil record. Not how orderly it is and revealing of the astonishingly sequenced, slow, and relentless elaboration of life. No, they decry its gaps- places where fossils do not account for major evolutionary (er, designed) transitions to more modern forms. It is a sad kind of argument, lacking in imagination and dishonest in its unfairness and hypocrisy. Does the life of Jesus have gaps in the historical record? Sure enough! And are those historical records anywhere near as concrete and informative as fossils? No way. What we have as a record of Christianity's history is riven with fantasy, forgery, and uncertainty.

But enough trash talk. One thing that science has going for it is a relentlessly accumulating process by which new fossils appear, and new data from other sources, like newly found organisms and newly sequenced genomes, arise to clarify what were only imaginative (if reasonable) hypotheses previously. Darwin's theory of evolution, convincing and elegantly argued as it was originally, has gained such evidence without fail over the subsequent century and a half, from discoveries of the age of the earth (and thus the solar system) to the mechanics of genetic inheritance.

A recent paper describes the occurrence of cytoskeletal proteins and structures in an organism that is neither a bacterium nor a eukaryote, but appears to be within the family of Archaea that is the closest thing we have to the eukaryotic progenitor. These are the Asgard Archaea, a family that was discovered only in the last decade, as massive environmental sequencing projects have sampled the vast genetic diversity hidden in the muds, sediments, soils, rocks, and waters of the world.

Sampling stray DNA is one thing, but studying these organisms in depth requires growing them in the lab. After trolling through the same muds in Slovenia where promising DNA sequences were found, this group fished out, and then carefully cultured, a novel archaeal cell. But growing these cells is notoriously difficult. They are anaerobic, never having made the transition to the oxygenated atmosphere of the later earth. They have finicky nutritional requirements. They grow very slowly. And they generally have to live with other organisms with which they have reciprocal metabolic relationships. In the ur-eukaryote, this was a relationship with the proto-mitochondrion, which was later internalized. For the species cultured by this research group, it is a pair of other free-living microbes. One is a bacterium related to sulfate-reducing Desulfovibrio, and the other is a simpler archaeon related to Methanogenium that uses hydrogen and CO2 or related simple carbon compounds to make methane. Anaerobic Asgard archaea generally have relatively simple metabolisms and make hydrogen from small organic compounds, through a kind of fermentation.

A phylogenetic tree showing relations between the newly found organisms (bottom) and eukaryotes (orange), other archaea, and the entirely separate domain of bacteria (red). This is based on a set of sequences of universally used / conserved ribosomal proteins. While the eukaryotes have strayed far from the root, that root is extremely close to some archaeal groups.

Micrographs of cultured lokiarchaeal cells, with a scale bar of 500 nanometers. These are rather amoeboid cells with extensive cytoskeletal and membrane regulation.

Another micrograph of part of a lokiarchaeal cell, showing not just its wacky shape, but a good bit of internal structure as well. The main scale bar is 100 nanometers. There are internal actin filaments (yellow arrowheads), lined-up ribosomes (gray arrowhead) and cell surface proteins of some kind (blue arrowheads).

What they found after all this was pretty astonishing. They found cells that are quite unlike typical bacterial or even archaeal cells, which are compact round or rod shapes. These (termed lokiarchaeal) cells have luxuriant processes extending all over the place, and a profusion of internal structural elements reminiscent of eukaryotic cells, though without membrane-bound internal organelles. But they have membrane-bound protrusions and what look like vesicles budding off. At only six million base pairs (compared to our three billion) and under five thousand genes, these cells have a small and streamlined genome. Yet there are a large number (258) of eukaryotic-related (signature) proteins (outlined below), particularly concerning the cytoskeleton and membrane trafficking. The researchers delved into the subcellular structures, labeling actin and obtaining structural data for both actin and ribosomes, confirming their archaeal affinity with added features.

A schematic of eukaryotic-like proteins in the newly cultured lokiarchaeal Asgard genome. Comparison (blue) is to a closely related organism isolated recently in Japan.


This work is the first time that the cytoskeleton of Asgard cells has been visualized, along with its role in their amoeboid capabilities. What is it used for? That remains unknown. The lush protrusions may collaborate with this organism's metabolic partners, or be used for sensing and locomoting to find new food within its sediment habitat, or for interacting with fellow lokiarchaeal cells, as shown above. Or all of these roles. Evolutionarily, this organism, while modern, appears to be a descendant of the closest thing we have to the missing link at the origin of eukaryotes (that is, the dominant archaeal partner of the founding symbiosis), and in that sense seems both ancient in its characteristics, and possibly little changed from that time. Who would have expected such a thing? Well, molecular biologists and evolutionary biologists have been expecting it for a long time.


  • Fossil fuel consumption is still going up, not down.