Saturday, December 30, 2023

Some Challenges of Biological Modeling

If modeling one small aspect of one cell is this difficult, how much more difficult is it to model whole cells and organisms?

While the biological literature is full of data / knowledge about how cells and organisms work, we remain far from true understanding- the kind of understanding that would allow computer modeling of their processes. This is both a problem of the kind of data, which is largely qualitative and descriptive, and also of amount- that countless processes and enzymes have never had their detailed characteristics evaluated. In the human genome, I would estimate that roughly half its genes have only been described (if at all) in the most rudimentary way, typically by loose analogy to similar ones. And the rest, when studied more closely, present all sorts of other interesting issues that deflect researchers from core data like their enzymatic rate constants and binding constants to other proteins, as might occur under a plethora of different modification, expression, and other regulatory conditions. 

Then how do we get to usable models of cellular activities? Typically, a lot of guessing is involved, to make anything that approaches a computer model. A recent paper offered a novel way to go down this path, which was to ignore all the rate constants and even interactions, and just focus on the measurements we can make more conveniently- whole metabolome assessments. These are experiments where mass spectrometry is used to evaluate the level of all the smaller chemicals in a cell. If such levels are known, perhaps at a few different conditions, then, these authors argue, we can derive models of their mutual regulation- disregarding all the details and just establishing that some sort of feedback system among these metabolic chemicals must exist to keep them at the observed concentrations.

Their experimental subject is a relatively understandable, but by no means simple, system- the management of iron concentrations in yeast cells. Iron is quite toxic, so keeping it at controlled concentrations and in various carefully-constructed complexes is important for any cell. It is used to make heme, which functions not only in hemoglobin, but in several core respiratory enzymes of mitochondria. It also gets placed into iron-sulfur clusters, which are used even more widely, in respiratory enzymes, in the DNA replication, transcription, protein synthesis, and iron assimilation machineries. It is iron's strong and flexible redox chemistry (and its ancient abundance in the rocks and fluids life evolved with) that make it essential as well as dangerous.

Author's model for iron use and regulation in yeast cells. Outside is on left, cytoplasm is blue, vacuole is green, and mitochondrion is yellow. See text below for abbreviations and description. O2 stands for the oxygen  molecule. The various rate constants R refer to the transition between each state or location.

Iron is imported from outside and forms a pool of free iron in the cytoplasm (FC, in the diagram above). From there, it can be stored into membrane-bound vacuoles (F2, F3), or imported to the mitochondria (FM), where it is corporated into iron-sulfur clusters and heme (FS). Some of the mitochondrially assembled iron-sulfur clusters are exported back out to the cytoplasm to be integrated to a variety of proteins there (CIA). This is indeed one of the most essential roles of mitochondria- needed even if metabolic respiration is for some reason not needed (in hypoxic or anaerobic conditions). If there is a dramatic overload of iron, it can build up as rust particles in the mitochondria (MP). And finally, the iron-sulfur complexes contribute to respiration of oxygen in mitochondria, and thus influence the respiration rate of the whole cell.

The task these authors set themselves was to derive a regulatory scheme using only the elements shown above, in combination with known levels of all the metabolites, under the conditions of 1) normal levels of iron, 2) low iron, and 3) a mutant condition- a defect in the yeast gene YFG1, which binds iron inside mitochondria and participates in iron-sulfur cluster assembly. A slew of differential equations later, and selection through millions of possible regulatory circuits, and they come up with the one shown above, where the red lines/arrows indicate positive regulation, and the red lines ending with bars indicate repression. The latter is typically feedback repression, such as of the import of iron, repressed by the amount already in the cell, in the FC pool. 

They show that this model provides accurate control of iron levels at all the various points, with stable behavior, no singularities or wobbling, and the expected responses to the various conditions. In low iron, the vacuole is emptied of iron, and in the mutant case, iron nanoparticles (MP) accumulate in the mitochondrion, due in part to excess amounts of oxygen admitted to the mitochondrial matrix, which in turn is due to defects in metabolic respiration caused by a lack of iron-sulfur clusters. What seemed so simple at the outset does have quite a few wrinkles!

The authors present their best regulatory scheme, selected from among millions, which provides accurate metabolite control in simulation, as shown by key transitions between conditions as shown here, one line per molecular species. See text and image above for abbreviations.


But note that none of this is actually biological. There are no transcription regulators, such as the AFT1/2 proteins known to regulate a large set of iron assimilation genes. There are no enzymes explicitly cited, and no other regulatory mechanisms like protein modifications, protein disposal, etc. Nor does the cytosolic level of iron actually regulate the import machinery- that is done by the level of iron-sulfur clusters in the mitochondria, as sensed by the AFT regulators, among other mechanisms.

Thus it is not all clear what work like this has to offer. It takes the known concentrations of metabolites (which can be ascertained in bulk) to create a toy system that accurately reproduces a very restricted set of variations, limited to what the researchers could assess elsewhere, in lab experiments. It does not inform the biology of what is going on, since it is not based on the biology, and clearly even contravenes it. It does not inform diseases associated with iron metabolism- in this case Friedreich's ataxia which is caused in humans by a gene related to YFH1- because again it is not biologically based. Knowing where some regulatory events might occur in theory, as one could have done almost as well (if not quantitatively!) on a cocktail napkin, is of little help when drugs need to be made against actual enzymes and actual regulators. It is a classic case of looking under the streetlight- working with the data one has, rather than the data one needs to do something useful.

"Like most ODE (ordinary differential equation)-based biochemical models, sufficient kinetic information was unavailable to solve the system rigorously and uniquely, whereas substantial concentration data were available. Relying on concentrations of cellular components increasingly makes sense because such quantitative concentration determinations are becoming increasingly available due to mass-spectrometry-based proteomic and metabolomics studies. In contrast, determining kinetic parameters experimentally for individual biochemical reactions remain an arduous task." ...

"The actual biochemical mechanisms by which gene expression levels are controlled were either too complicated to be employed in autoregulation, or they were unknown. Thus, we decided to augment every regulatable reaction using soft Heaviside functions as surrogate regulatory systems." ...

"We caution that applying the same strategy for selecting viable autoregulatory mechanisms will become increasing difficult computationally as the complexity of models increases."


But the larger point that motivated a review of this paper is the challenge of modeling a system so small as to be almost infinitesimal in the larger scheme of biology. If dedicated modelers, as this laboratory is, dispair of getting the data they need for even such a modest system, (indeed, the mitochondrial iron and sulfur-containing signaling compound that mediates repression of the AFT regulators is still referred to in the literature as "X-S"), then things are bleak indeed for the prospect of modeling higher levels of biology, such as whole cells. Unknowns are unfortunately gaping all over the place. As has been mentioned a few times, molecular biologists tend to think in cartoons, simplifying the relations they deal with to the bare minimum. Getting beyond that is going to take another few quantum leaps in data- the vaunted "omics" revolutions. It will also take better interpolation methods (dare one invoke AI?) that use all the available scraps of biology, not just mathematics, in a Bayesian ratchet that provides iteratively better models. 


Saturday, December 23, 2023

How Does Speciation Happen?

Niles Eldredge and the theory of punctuated equilibrium in evolution.

I have been enjoying "Eternal Ephemera", which is an end-of-career memoir/intellectual history from a leading theorist in paleontology and evolution, Niles Eldredge. In this genre, often of epic proportions and scope, the author takes stock of the historical setting of his or her work and tries to put it into the larger context of general intellectual progress, (yes, as pontifically as possible!), with maybe some gestures towards future developments. I wish more researchers would write such personal and deeply researched accounts, of which this one is a classic. It is a book that deserves to be in print and more widely read.

Eldredge's claim to fame is punctuated equilibrium, the theory (or, perhaps better, observation) that evolution occurs much more haltingly than in the majestic gradual progression that Darwin presented in "Origin of Species". This is an observation that comes straight out of the fossil record. And perhaps the major point of the book is that the earliest biologists, even before Darwin, but also including Darwin, knew about this aspect of the fossil record, and were thus led to concepts like catastrophism and "etagen". Only Lamarck had a steadfastly gradualist view of biological change, which Darwin eventually took up, while replacing Lamarck's mechanism of intentional/habitual change with that of natural selection. Eldridge unearths tantalizing and, to him, supremely frustrating, evidence that Darwin was fully aware of the static nature of most fossil series, and even recognized the probable mechanism behind it (speciation in remote, peripheral areas), only to discard it for what must have seemed a clearer, more sweeping theory. But along the way, the actual mechanism of speciation got somewhat lost on the shuffle.

Punctuated equilibrium observes that most species recognized in the fossil record do not gradually turn into their descendents, but are replaced by them. Eldredge's subject of choice is trilobites, which have a long and storied record for almost 300 million years, featuring replacement after replacement, with species averaging a few million years duration each. It is a simple fact, but one that is a bit hard to square with the traditional / Darwinian and even molecular account of evolution. DNA is supposed to act like a "clock", with constant mutational change through time. And natural selection likewise acts everywhere and always... so why the stasis exhibited by species, and why the apparently rapid evolution in between replacements? That is the conundrum of punctuated equilibrium.

There have been lot of trilobites. This comes from a paper about their origin during the Cambrian explosion, arguing that only about 20 million years was enough for their initial speciation (bottom of image).

The equilibrium part, also termed stasis, is seen in the current / recent world as well as in the fossil record. We see species such as horses, bison, and lions that are identical to those drawn in cave paintings. We see fossils of animals like wildebeest that are identical to those living, going back millions of years. And we see unusual species in recent fossils, like saber-toothed cats, that have gone extinct. We do not typically see animals that have transformed over recent geological history from one (morphological) species into another, or really, into anything very different at all. A million years ago, wildebeest seem to have split off a related species, the black wildebeest, and that is about it.

But this stasis is only apparent. Beneath the surface, mutations are constantly happening and piling up in the genome, and selection is relentlessly working to ... do something. But what? This is where the equilibrium part comes in, positing that wide-spread, successful species are so hemmed in by the diversity of ecologies they participate in that they occupy a very narrow adaptive peak, which selection works to keep the species on, resulting in apparent stasis. It is a very dynamic equilibrium. The constant gene flow among all parts of the population that keeps the species marching forward as one gene pool, despite the ecological variability, makes it impossible to adapt to new conditions that do not affect the whole range. Thus, paradoxically, the more successful the species, and the more prominent it is in the fossil record, the less change will be apparent in those fossils over time.

The punctuated part is that these static species in the fossil record eventually disappear and are replaced by other species that are typically similar, but not the same, and do not segue from the original in a gradual way that is visible in the fossil record. No, most species and locations show sudden replacement. How can this be so if evolution by natural selection is true? As above, wide-spread species are limited in what selection can do. Isolated populations, however, are more free to adapt to local conditions. And if one of those local conditions (such as arctic cold) happens to be what later happens to the whole range (such as an ice age), then it is more likely that a peripherally (pre-)adapted population will take over the whole range, than that the resident species adapts with sufficient speed to the new conditions. Range expansion, for the peripheral species, is easier and faster than adaptation, for the wide-ranging originating species.

The punctuated equilibrium proposition came out in the 1970's, and naturally followed theories of speciation by geographic separation that had previously come out (also resurrected from earlier ideas) in the 1930's to 1950's, but which had not made much impression (!) on paleontologists. Paleontologists are always grappling with the difficulties of the record, which is partial, and does not preserve a lot of what we would like to know, like behavior, ecological relationships, and mutational history. But they did come to agree that species stasis is a real thing, not just, as Darwin claimed, an artifact of the incomplete fossil record. Granted- if we had fossils of all the isolated and peripheral locations, which is where speciation would be taking place by this theory, we would see the gradual change and adaptation taking place. So there are gaps in the fossil record, in a way. But as long as we look at the dominant populations, we will rarely see speciation taking place before our eyes, in the fossils.

So what does a molecular biologist have to say about all this? As Darwin insisted early in "Origin", we can learn quite a bit from domesticated animals. It turns out that wild species have a great amount of mostly hidden genetic variation. This is apparent whenever one is domesticated and bred for desired traits. We have bred dogs, for example, to an astonishingly wide variety of traits. At the same time, we have bred them out to very low genetic diversity. Many breeds are saddled with genetic defects that can not be resolved without outbreeding. So we have in essence exchanged the vast hidden genetic diversity of a wild species for great visible diversity in the domesticated species, combined with low genetic diversity.

What this suggests is that wild species have great reservoirs of possible traits that can be selected for the purposes of adaptation under selective conditions. Which suggests that speciation in range edges and isolated environments can be very fast, as the punctuated part of punctuated equilibrium posits. And again, it reinforces the idea that during equilibrium with large populations and ranges, species have plenty of genetic resources to adapt and change, but spend those resources reinforcing / fine tuning their core ecological "franchise", as it were.

In population genetics, it is well known that mutations arise and fix (that is, spread to 100% of the population on both alleles) at the same rate no matter how large the population, in theory. That is to say- bigger populations generate more mutations, but correspondingly hide them better in recessive form (if deleterious) and for neutral mutations, take much longer to allow any individual mutation to drift to either extinction or fixation. Selection against deleterious mutations is more relentless in larger populations, while relaxed selection and higher drift can allow smaller populations to explore wider ranges of adaptive space, perhaps finding globally higher (fitness) peaks than the parent species could find.

Eldredge cites some molecular work that claims that at least twenty percent of sequence change in animal lineages is due specifically to punctuational events of speciation, and not to the gradual background accumulation of mutations. What could explain this? The actual mutation rate is not at issue, (though see here), but the numbers of mutations retained, perhaps due to relaxed purifying selection in small populations, and founder effects and positive selection during the speciation process. This kind of phenomenon also helps to explain why the DNA "clock" mentioned above is not at all regular, but quite variable, making an uneven guide to dating the past.

Humans are another good example. Our species is notoriously low in genetic diversity, compared to most wild species, including chimpanzees. It is evident that our extremely low population numbers (over prehistoric time) have facilitated speciation, (that is, the fixation of variants which might be swamped in bigger populations), which has resulted in a bewildering branching pattern of different hominid forms over the last few million years. That makes fossils hard to find, and speciation hard to pinpoint. But now that we have taken over the planet with a huge population, our bones will be found everywhere, and they will be largely static for the foreseeable future, as a successful, wide-spread species (barring engineered changes). 

I think this all adds up to a reasonably coherent theory that reconciles the rest of biology with the fossil record. However, it remains frustratingly abstract, given the nature of fossils that rarely yield up the branching events whose rich results they record.


Saturday, December 16, 2023

Easy Does it

The eukaryotic ribosome is significantly slower than, and more accurate than, the bacterial ribosome.

Despite the focus, in molecular biology, on interesting molecules like genes and regulators, the most striking thing facing anyone who breaks open cells is the prevalence of ribosomes. Run the cellular proteins or RNAs out on a gel, and bulk of the material is always ribosomal proteins and ribosomal RNAs, along with tRNAs. That is because ribosomes are critically important, immense in size, and quite slow. They are sort of the beating heart of the cell- not the brains, not the energy source, but the big lumpy, ancient, shape-shifting object that pumps out another essential form of life-blood- all the proteins the cell needs to keep going.

With the revolution in structural biology, we have gotten an increasingly clear view of the ribosome, and a recent paper took it up another notch with a structural analysis of how tRNA handling works and how / why it is that the eukaryotic ribosome is about ten times slower than its bacterial progenitor. One of their figures provides a beautiful (if partial) view of each kind of ribosome, showing how well-conserved this structure is, despite the roughly three billion or more years that have elapsed since their divergence into the bacterial and archaeal lineages, from which the eukaryotic ribosome comes. 

Above, the human ribosome, and below, the ribosome of E. coli, a bacterium, in partial views. The perspective is from the back, relative to conventional views, and only a small amount of the large subunit (LSU) appears at the top of each structure, with more of the small subunit (SSU) shown below. Between them is the cleft where tRNAs bind, in a dynamic sequence of incoming rRNA at the A (acceptor) site, then catalysis of peptide bond addition at the P (peptidyl transfer) site, and ejection of the last tRNA at the E (ejection) site. In concert with the conveyor belt of tRNAs going through, the nascent protein is being synthesized in the large subunit and the mRNA is going by, codon by codon, in the small subunit. Note the overall conservation of structure, despite quite a bit of difference in detail.

The ribosome is an RNA machine at its core, with a lot of accessory proteins that were added later on. And it comes in two parts, the large and small subunits. These subunits do different things, do a lot of rolling about relative to each other, and bind a conveyor belt of tRNAs between them. The tRNAs are pre-loaded with an amino acid on one end (top) and an anticodon on the other end (bottom). They also come with a helper protein (EF-Tu in bacterial, eEF1A in eukaryotes), which plays a role later on. The anticodon is a set of three nucleotides that constitute the genetic code, whereby this tRNA is always going to match one codon to a particular amino acid. 

The ribosome doesn't care what the code is or which tRNA comes in. It only cares that the tRNA matches the mRNA held by the small subunit, as transcribed from the DNA. This process is called decoding, and the researchers show some of the differences that make it slower, but also more accurate, in eukaryotes. In bacteria, ribosomes can work at up to 20 amino acids per second, while human ribosomes top out at about 2 amino acids per second. That is pretty slow, for an enzyme! Its accuracy is about one error per thousand to ten thousand codons.

See text for description of this diagram of the ribosomal process. 50 S is the large ribosomal subunit in bacteria (60S in eukaryotes). 30S is the small subunit in bacteria (40S in eukaryotes). S stands for Svedberg units, a unit of sedimentation in high-speed centrifugation, which was used to study proteins at the dawn of molecular biology.

Above is diagrammed the stepwise logic of protein synthesis. The first step is that a tRNA comes in and lands on the empty A site, and tests whether its anticodon sequence fits the codon on the mRNA being threaded through the bottom. This fitting and testing is the key quality control process, and the slower and more selective it is, the more accurate the resulting translation. The EF-Tu/eEF1A+GTP protein holds on to the tRNA at the acceptor (A) position, and only when the fit is good does that fit communicate back up from the small subunit to the large subunit and cause hydrolysis of GTP to GDP, and release of the top of the tRNA, which allows it to swing into position (accommodation) to the catalytic site of the ribosome. This is where the tRNA contributes its amino acid to the growing protein chain. That chain, previously attached to the tRNA in the P site, now is attached to the tRNA in the A site. Now another GTP-binding protein comes in, EF-G (EEF2 in eukaryotes), which bumps the tRNA from the A site to the P site, and simultaneously the mRNA one codon ahead. This also releases whatever was in the E site of the ribosome and frees up the A site to accept another new tRNA.

See text for description. IC = initiation complex, CR = codon recognition complex, GA = GTPase activation complex, AC = accommodated complex. FRET = fluorescence resonance energy transfer. Head and shoulder refer to structural features of the small ribosomal subunit.

These researchers did both detailed structural studies of ribosomes stuck in various positions, and also mounted fluorescent labels at key sites in the P and A sites. These double labels allowed one to be flashed with light, (at its absorbance peak), and the energy to be transferred between them, resulting in fluorescence of light back out from the second fluorophore. The emitted energy from the second fluorophore provides an exquisitely sensitive measure of the distance between the two fluorophores, since its ability to capture light from the first fluorophore is sensitive to distance (cubed). The graph above (right) provides a trace of the fluorescence seen in one ribosomal cycle, as the distance between the two tRNAs changes slightly as the reaction proceeds and the two tRNAs come closer together. This technical method allows real-time analysis of the reaction as it is going along, especially one as slow as this one.

Structures of the ribosome accentuating the tRNA positions in the A, P, and E sites. Note how the green tRNA in the A site starts bent over towards the eEF1A GTPase (blue), as the decoding and quality control are going on, after which it is released and swings over next to the P site tRNA, ready for peptide bond formation. Note also how the structure of the anticodon-codon pairing (pink, bottom) evolves from loose and disordered to tight after the tRNA straightens up.

Above is shown a gross level view in stop-motion of ribosomal progress, achieved with various inhibitors and altered substrates. The mRNA is in pink (insets), and shows how the codon-anticodon match evolves from loose to tight. Note how at first only two bases of the mRNA are well-paired, while all three are paired later on. This reflects in a dim way the genetic code, which has redundancies in the third position for many amino acids, and is thought to have first had only two letters, before transitioning to three letters.

Higher detail on the structures of the tRNAs in the P site and the A site as they progress through the proof-reading phase of protein synthesis. The fluorescence probes are pictured, (Red and green dots), as is more the mRNA strand (pink).

These researchers have a great deal to say about the details of these structures- what differentiates the human from the E. coli ribosome, why the human one is slower and allows more time and more hindrance during the proof-reading step, thereby helping badly matched tRNAs to escape and increasing overall fidelity. For example, how does the GTPase eEF1A, docked to the large subunit, know when a match down at the codon-anticodon pair has been successful down in the small ribosomal subunit?

"Base pairing between the mRNA codon and the aa-tRNA anticodon stem loop (ASL) is verified through a network of ribosomal RNA (rRNA) and protein interactions within the SSU A site known as the decoding centre. Recognition of cognate aa-tRNA closes the SSU shoulder domain towards the SSU body and head domains. Consequent ternary complex engagement of the LSU GTPase-activating centre (GAC), including the catalytic sarcin-ricin loop12 (SRL), induces rearrangements in the GTPase, including switch-I and switch-II remodeling, that trigger GTP hydrolysis"

They note that there seem to be at least two proofreading steps, both in activating the eEF1A and also afterwards, during the large swing of the tRNA towards the P site. And they note novel rolling motions of the human ribosome compared with the bacterial ribosome, to help explain some of its distinctive proofreading abilities, which may be adjustable in humans by regulatory processes. Thus we are gaining ever more detailed window on the heart of this process, which is foundational to the origin of life, central to all cells, and not without medical implications, since many poisons that bacteria have devised attack the ribosome, and several of our current antibiotics do likewise.


Saturday, December 9, 2023

The Way We Were: Origins of Meiosis and Sex

Sex is as foundational for eukaryotes as are mitochondria and internal membranes. Why and how did it happen?

Sexual reproduction is a rather expensive proposition. The anxiety, the dating, the weddings- ugh! But biologically as well, having to find mates is no picnic for any species. Why do we bother, when bacteria get along just fine just dividing in two? This is a deep question in biology, with a lot of issues in play. And it turns out that bacteria do have quite a bit of something-like-sex: they exchange DNA with each other in small pieces, for similar reasons we do. But the eukaryotic form of sex is uniquely powerful and has supported the rapid evolution of eukaryotes to be by far the dominant domain of life on earth.

A major enemy of DNA-encoded life is mutation. Despite the many DNA replication accuracy and repair mechanisms, some rate of mutation still occurs, and is indeed essential for evolution. But for larger genomes, the mutation rate always exceeds the replication rate, (and the purifying natural selection rate), so that damaging mutations build up and the lineage will inevitably die out without some help. This process is called Muller's ratchet, and is why all organisms appear to exchange DNA with others in their environment, either sporadically like bacteria, or systematically, like eukaryotes.

An even worse enemy of the genome is unrepaired damage like complete (double strand) breaks in the DNA. These stop replication entirely, and are fatal. These also need to be repaired, and again, having extra copies of a genome is the way to allow these to be fixed, by processes like homologous recombination and gene conversion. So having access to other genomes has two crucial roles for organisms- allowing immediate repair, and allowing some way to sweep out deleterious mutations over the longer term.

Our ancestors, the archaea, which are distinct from bacteria, typically have circular, single molecule genomes, in multiple copies per cell, with frequent gene conversions among the copies and frequent exchange with other cells. They routinely have five to twenty copies of their genome, and can easily repair any immediate damage using those other copies. They do not hide mutant copies like we do in a recessive allele, but rather by gene conversion (which means, replicating parts of a chromosome into other ones, piecemeal) make each genome identical over time so that it (and the cell) is visible to selection, despite their polyploid condition. Similarly, taking in DNA from other, similar cells uses the target cells' status as live cells (also visible to selection) to insure that the recipients are getting high quality DNA that can repair their own defects or correct minor mutations. All this ensures that their progeny are all set up with viable genomes, instead of genomes riddled with defects. But it comes at various costs as well, such as a constant race between getting lethal mutation and finding the DNA that might repair it. 

Both mitosis and meiosis were eukaryotic innovations. In both, the chromosomes all line up for orderly segregation to descendants. But meiosis engages in two divisions, and features homolog synapsis and recombination before the first division of the parental homologs.

This is evidently a precursor to the process that led, very roughly 2.5 billion years ago, to eukaryotes, but is all done in a piecemeal basis, nothing like what we do now as eukaryotes. To get to that point, the following innovations needed to happen:

  • Linearized genomes, with centromeres and telomeres, and >1 number of chromosomes.
  • Mitosis to organize normal cellular division, where multiple chromosomes are systematically lined up and distributed 1:1 to daughter cells, using extensive cytoskeletal rearrangements and regulation.
  • Mating with cell fusion, where entire genomes are combined, recombined, and then reduced back to a single complement, and packaged into progeny cells.
  • Synapsis, as part of meiosis, where all sister homologs are lined up, damaged to initiate DNA repair and crossing-over.
  • Meiosis division one, where the now-recombined parental homologs are separated.
  • Meiosis division two, which largely follows the same mechanisms as mitosis, separating the reshuffled and recombined sister chromosomes.

This is a lot of novelty on the path to eukaryogenesis, and is just a portion of the many other innovations that happened in this lineage. What drove all this, and what were some plausible steps in the process? The advent of true sex generated several powerful effects:

  1. A definitive solution to Muller's ratchet, by exposing every locus in a systematic way to partial selection and sweeping out deleterious mutations, while protecting most members of the population from those same mutations. Continual recombination of the parental genomes allows beneficial mutations to separate from deleterious ones and be differentially preserved.
  2. Mutated alleles are partially, yet systematically, hidden as recessive alleles, allowing selection when they come into homozygous status, but also allowing them to exist for limited time to buffer the mutation rate and to generate new variation. This vastly increases accessible genetic variation.
  3. Full genome-length alignment and repair by crossing over is part of the process, correcting various kinds of damage and allowing accurate recombination across arbitrarily large genomes.
  4. Crossing over during meiotic synapsis mixes up the parental chromosomes, allowing true recombination among the parental genomes, beyond just the shuffling of the full-length chromosomes. This vastly increases the power of mating to sample genetic variation across the population, and generates what we think of as "species", which represent more or less closed interbreeding pools of genetic variants that are not clones but diverse individuals.

The time point of 2.5 billion years ago is significant because this is the general time of the great oxidation event, when cyanobacteria were finally producing enough oxygen by photosynthesis to alter the geology of earth. (However our current level of atmospheric oxygen did not come about until almost two billion years later, with rise of land plants.) While this mainly prompted the logic of acquiring mitochondria, either to detoxify oxygen or use it metabolically, some believe that it is relevant to the development of meiosis as well. 

There was a window of time when oxygen was present, but the ozone layer had not yet formed, possibly generating a particularly mutagenic environment of UV irradiation and reactive oxygen species. Such higher mutagenesis may have pressured the archaea mentioned above to get their act together- to not distribute their chromosomes so sporadically to offspring, to mate fully across their chromosomes, not just pieces of them, and to recombine / repair across those entire mated chromosomes. In this proposal, synapsis, as seen in meiosis I, had its origin in a repair process that solved the problem of large genomes under mutational load by aligning them more securely than previously. 

It is notable that one of the special enzymes of meiosis is Spo11, which induces the double-strand breaks that lead to crossing-over, recombination, and the chiasmata that hold the homologs together during the first division. This DNA damage happens at quite high rates all over the genome, and is programmed, via the structures of the synaptonemal complex, to favor crossing-over between (parental) homologs vs duplicate sister chromosomes. Such intensive repair, while now aimed at ensuring recombination, may have originally had other purposes.

Alternately, others suggest that it is larger genome size that motivated this innovation. This origin event involves many gene duplication events that ramified the capabilities of the symbiotic assemblage. Such gene dupilcations would naturally lead to recombinational errors in traditional gene conversion models of bacterial / archaeal genetic exchange, so there was pressure to generate a more accurate whole-genome alignment system that confined recombination to the precise homologs of genes, rather than to any similar relative that happened to be present. This led to the synapsis that currently is part of meiosis I, but it is also part of "parameiosis" systems on some eukaryotes, which, while clearly derived, might resemble primitive steps to full-blown meiosis.

It has long been apparent that the mechanisms of meiosis division one are largely derived from (or related to) the mechanisms used for mitosis, via gene duplications and regulatory tinkering. So these processes (mitosis and the two divisions of meiosis) are highly related and may have arisen as a package deal (along with linear chromosomes) during the long and murky road from the last archaeal ancestor and the last common eukaryotic ancestor, which possessed a much larger suite of additional innovations, from mitochondria to nuclei, mitosis, meiosis, cytoskeleton, introns / mRNA splicing, peroxisomes, other organelles, etc.  

Modeling of different mitotic/meiotic features. All cells modeled have 18 copies of a polypoid genome, with a newly evolved process of mitosis. Green = addition of crossing over / recombination of parental chromosomes, but no chromosome exchange. Red = chromosome exchange, but no crossing over. Blue = both crossing over and chromosome exchange, as occurs now in eukaryotes. The Y axis is fitness / survival and the X axis is time in generations after start of modeling.

A modeling paper points to the quantitative benefits of the mitosis when combined with the meiotic suite of innovations. They suggest that in a polyploid archaean lineage, the establishment of mitosis alone would have had revolutionary effects, ensuring accurate segregation of all the chromosomes, and that this would have enabled differentiation among those polyploid chromosome copies, since they would be each be faithfully transmitted individually to offspring (assuming all, instead of one, were replicated and transmitted). Thus they could develop into different chromosomes, rather than remain copies. This would, as above, encourage meiosis-like synapsis over the whole genome to align all the (highly similar) genes properly.

"Modeling suggests that mitosis (accurate segregation of sister chromosomes) immediately removes all long-term disadvantages of polyploidy."

Additional modeling of the meiotic features of chromosome shuffling, and recombination between parental chromosomes, indicates (shown above) that these are highly beneficial to long-term fitness, which can rise instead of decaying with time, per the various benefits of true sex as described above. 

The field has definitely not settled on one story of how meiosis (and mitosis) evolved, and these ideas and hypotheses are tentative at this point. But the accumulating findings that the archaea that most closely resemble the root of the eukaryotic (nuclear) tree have many of the needed ingredients, such as active cytoskeletons, a variety of molecular antecedents of ramified eukaryotic features, and now extensive polyploidy to go with gene conversion and DNA exchange with other cells, makes the momentous gap from archaea to eukaryotes somewhat narrower.


Saturday, December 2, 2023

Preliminary Pieces of AI

We already live in an AI world, and really, it isn't so bad.

It is odd to hear about all the hyperventilating about artificial intelligence of late. One would think it is a new thing, or some science-fiction-y entity. Then there are fears about the singularity and loss of control by humans. Count me a skeptic on all fronts. Man is, and remains, wolf to man. To take one example, we are contemplating the election of perhaps the dummbest person ever to hold the office of president. For the second time. How an intelligence, artificial or otherwise, is supposed to worm its way into power over us is not easy to understand, looking at nature of humans and of power. 

So let's take a step back and figure out what is going on, and where it is likely to take us. AI has become a catch-all for a diversity of computer methods, mostly characterized by being slightly better at doing things we have long wanted computers to do, like interpreting text, speech, and images. But I would offer that it should include much more- all the things we have computers do to manage information. In that sense, we have been living among shards of artificial intelligence for a very long time. We have become utterly dependent on databases, for instance, for our memory functions. Imagine having to chase down a bank balance or a news story, without access to the searchable memories that modern databases provide. They are breathtakingly superior to our own intelligence when it comes to the amount of things they can remember, the accuracy they can remember them, and the speed with which they can find them. The same goes for calculations of all sorts, and more recently, complex scientific math like solving atomic structures, creating wonderful CGI graphics, or predicting the weather. 

We should view AI as a cabinet filled with many different tools, just as our own bodies and minds are filled with many kinds of intelligence. The integration of our minds into a single consciousness tends to blind us to the diversity of what happens under the hood. While we may want gross measurements like "general intelligence", we also know increasingly that it (whatever "it" is, and whatever it fails to encompass of our many facets and talents) is composed of many functions that several decades of work in AI, computer science, and neuroscience have shown are far more complicated and difficult to replicate than the early AI pioneers imagined, once they got hold of their Turing machine with its infinite potential. 

Originally, we tended to imagine artificial intelligence as a robot- humanoid, slightly odd looking, but just like us in form and abilities. That was a natural consequence of our archetypes and narcissism. But AI is nothing like that, because full-on humanoid consciousness is an impossibly high bar, at least for the foreseeable future, and requires innumerable capabilities and forms of intelligence to be developed first. 

The autonomous car drama is a good example of this. It has taken every ounce of ingenuity and high tech to get to a reasonably passable vehicle, which is able to "understand" key components of the world around it. That a blob in front is a person, instead of a motorcycle, or that a light is a traffic light instead of a reflection of the sun. Just as our brain has a stepwise hierarchy of visual processing, we have recapitulated that evolution here by harnessing cameras in these cars (and lasers, etc.) to not just take in a flat visual scene, which by itself is meaningless, but to parse it into understandable units like ... other cars, crosswalks, buildings, bicylists, etc.. Visual scenes are very rich, and figuring out what is in them is a huge accomplishment. 

But is it intelligence? Yes, it certainly is a fragment of intelligence, but it isn't consciousness. Imagine how effortless this process is for us, and how effortful and constricted it is for an autonomous vehicle. We understand everything in a scene within a much wider model of the world, where everything relates to everything else. We evaluate and understand innumerable levels of our environment, from its chemical makeup to its social and political implications. Traffic cones do not freak us out. The bare obstacle course of getting around, such as in a vehicle, is a minor aspect, really, of this consciousness, and of our intelligence. Autonomous cars are barely up to the level of cockroaches, on balance, in overall intelligence.

The AI of text and language handling is similarly primitive. Despite the vast improvements in text translation and interpretation, the underlying models these mechanisms draw on are limited. Translation can be done without understanding text at all, merely by matching patterns from pre-digested pools of pre-translated text, regurgitated as cued by the input text. Siri-like spoken responses, on the other hand, do require some parsing of meaning out of the input, to decide what the topic and the question are. But the scope of these tools tend to be very limited, and the wider scope they are allowed, the more embarrassing their performance, since they are essentially scraping web sites and text pools for question-response patterns, instead of truly understanding the user's request or any field of knowledge.

Lastly, there are the generative ChatGPT style engines, which also regurgitate text patterns reformatted from public sources in response to topical requests. The ability to re-write a Wikipedia entry through a Shakespeare filter is amazing, but it is really the search / input functions that are most impressive- being able, like the Siri system, to parse through the user's request for all its key points. This betokens some degree of understanding, in the sense that the world of the machine (i.e. its database) is parceled up into topics that can be separately gathered and reshuffled into a response. This requires a pretty broad and structured ontological / classification system, which is one important part of intelligence.

Not only is there a diversity of forms of intelligence to be considered, but there is a vast diversity of expertise and knowledge to be learned. There are millions of jobs and professions, each with their own forms of knowledge. Back the early days of AI, we thought that expert systems could be instructed by experts, formalizing their expertise. But that turned out to be not just impractical, but impossible, since much of that expertise, formed out of years of study and experience, is implicit and unconscious. That is why apprenticeship among humans is so powerful, offering a combination of learning by watching and learning by doing. Can AI do that? Only if it gains several more kinds of intelligence including an ability to learn in very un-computer-ish ways.

This analysis has emphasized the diverse nature of intelligences, and the uneven, evolutionary development they have undergone. How close are we to a social intelligence that could understand people's motivations and empathise with them? Not very close at all. How close are we to a scientific intelligence that could find holes in the scholarly literature and manage a research enterprise to fill them? Not very close at all. So it is very early days in terms of anything that could properly be called artificial intelligence, even while bits and pieces have been with us for a long time. We may be in for fifty to a hundred more years of hearing every advance in computer science being billed as artificial intelligence.


Uneven development is going to continue to be the pattern, as we seize upon topics that seem interesting or economically rewarding, and do whatever the current technological frontier allows. Memory and calculation were the first to fall, being easily formalizable. Communication network management is similarly positioned. Game learning was next, followed by the Siri / Watson systems for question answering. Then came a frontal assault on language understanding, using the neural network systems, which discard the former expert system's obsession with grammar and rules, for much simpler statistical learning from large pools of text. This is where we are, far from fully understanding language, but highly capable in restricted areas. And the need for better AI is acute. There are great frontiers to realize in medical diagnosis and in the modeling of biological systems, to only name two fields close at hand that could benefit from a thoroughly systematic and capable artificial intelligence.

The problem is that world modeling, which is what languages implicitly stand for, is very complicated. We do not even know how to do this properly in principle, let alone having the mechanisms and scale to implement it. What we have in terms of expert systems and databases do not have the kind of richness or accessibility needed for a fluid and wide-ranging consciousness. Will neural nets get us there? Or ontological systems / databases? Or some combination? However it is done, full world modeling with the ability to learn continuously into those models are key capabilities needed for significant artificial intelligence.

After world modeling come other forms of intelligence like social / emotional intelligence and agency / management intelligence with motivation. I have no doubt that we will get to full machine consciousness at some point. The mechanisms of biological brains are just not sufficiently mysterious to think that they can not be replicated or improved upon. But we are nowhere near that yet, despite bandying about the word artificial intelligence. When we get there, we will have to pay special attention to the forms of motivation we implant, to mitigate the dangers of making beings who are even more malevolent than those that already exist... us.

Would that constitute some kind of "singularity"? I doubt it. Among humans there are already plenty of smart people and diversity, which result in niches for everyone having something useful to do. Technology has been replacing human labor forever, and will continue moving up the chain of capability. And when machines exceed the level of human intelligence, in some general sense, they will get all the difficult jobs. But the job of president? That will still go to a dolt, I have no doubt. Selection for some jobs is by criteria that artificial intelligence, no matter how astute, is not going to fulfill.

Risks? In the current environment, there are a plenty of risks, which are typically cases where technology has outrun our will to regulate its social harm. Fake information, thanks to the chatbots and image makers, can now flood the zone. But this is hardly a new phenomenon, and perhaps we need to get back to a position where we do not believe everything we read, in the National Enquirer or on the internet. The quality of our sources may become once again an important consideration, as they always should have been.

Another current risk is that the automation risks chaos. For example in the financial markets, the new technologies seem to calm the markets most of the time, arbitraging with relentless precision. But when things go out of bounds, flash breakdowns can happen, very destructively. The SEC has sifted through some past events of this kind and set up regulatory guard rails. But they will probably be perpetually behind the curve. Militaries are itching to use robots instead of pilots and soldiers, and to automate killing from afar. But ultimately, control of the military comes down to social power, which comes down to people of not necessarily great intelligence. 

The biggest risk from these machines is that of security. If we have our financial markets run by machine, or our medical system run by super-knowledgeable artificial intelligences, or our military by some panopticon neural net, or even just our electrical grid run by super-computers, the problem is not that they will turn against us of their own volition, but that some hacker somewhere will turn them against us. Countless hospitals have already faced ransomware attacks. This is a real problem, growing as machines become more capable and indispensable. If and when we make artificial people, we will need the same kind of surveillance and social control mechanisms over them that we do over everyone else, but with the added option of changing their programming. Again, powerful intelligences made for military purposes to kill our enemies are, by the reflexive property of all technology, prime targets for being turned against us. So just as we have locked up our nuclear weapons and managed to not have them fall into enemy hands (so far), similar safeguards would need to be put on similar powers arising from these newer technologies.

We may have been misled by the many AI and super-beings of science fiction, Nietzsche's Ãœbermensch, and similar archetypes. The point of Nietzsche's construction is moral, not intellectual or physical- a person who has thrown off all the moral boundaries of civilization, expecially Christian civilization. But that is a phantasm. The point of most societies is to allow the weak to band together to control the strong and malevolent. A society where the strong band together to enslave the weak.. well, that is surely a nightmare, and more unrealistic the more concentrated the power. We must simply hope that, given the ample time we have before truly comprehensive and superior artificial intelligent beings exist, we have exercised sufficient care in their construction, and in the health of our political institutions, to control them as we have many other potentially malevolent agents.


  • AI in chemistry.
  • AI to recognize cells in images.
  • Ayaan Hirsi Ali becomes Christian. "I ultimately found life without any spiritual solace unendurable."
  • The racism runs very deep.
  • An appreciation of Stephen J. Gould.
  • Forced arbitration against customers and employees is OK, but fines against frauds... not so much?
  • Oil production still going up.