
Saturday, December 23, 2023

How Does Speciation Happen?

Niles Eldredge and the theory of punctuated equilibrium in evolution.

I have been enjoying "Eternal Ephemera", which is an end-of-career memoir/intellectual history from a leading theorist in paleontology and evolution, Niles Eldredge. In this genre, often of epic proportions and scope, the author takes stock of the historical setting of his or her work and tries to put it into the larger context of general intellectual progress, (yes, as pontifically as possible!), with maybe some gestures towards future developments. I wish more researchers would write such personal and deeply researched accounts, of which this one is a classic. It is a book that deserves to be in print and more widely read.

Eldredge's claim to fame is punctuated equilibrium, the theory (or, perhaps better, observation) that evolution occurs much more haltingly than in the majestic gradual progression that Darwin presented in "Origin of Species". This is an observation that comes straight out of the fossil record. And perhaps the major point of the book is that the earliest biologists, even before Darwin, but also including Darwin, knew about this aspect of the fossil record, and were thus led to concepts like catastrophism and "etagen". Only Lamarck had a steadfastly gradualist view of biological change, which Darwin eventually took up, while replacing Lamarck's mechanism of intentional/habitual change with that of natural selection. Eldredge unearths tantalizing and, to him, supremely frustrating, evidence that Darwin was fully aware of the static nature of most fossil series, and even recognized the probable mechanism behind it (speciation in remote, peripheral areas), only to discard it for what must have seemed a clearer, more sweeping theory. But along the way, the actual mechanism of speciation got somewhat lost in the shuffle.

Punctuated equilibrium observes that most species recognized in the fossil record do not gradually turn into their descendants, but are replaced by them. Eldredge's subject of choice is trilobites, which have a long and storied record spanning almost 300 million years, featuring replacement after replacement, with species averaging a few million years duration each. It is a simple fact, but one that is a bit hard to square with the traditional / Darwinian and even molecular account of evolution. DNA is supposed to act like a "clock", with constant mutational change through time. And natural selection likewise acts everywhere and always... so why the stasis exhibited by species, and why the apparently rapid evolution in between replacements? That is the conundrum of punctuated equilibrium.

There have been a lot of trilobites. This comes from a paper about their origin during the Cambrian explosion, arguing that only about 20 million years was enough for their initial speciation (bottom of image).

The equilibrium part, also termed stasis, is seen in the current / recent world as well as in the fossil record. We see species such as horses, bison, and lions that are identical to those drawn in cave paintings. We see fossils of animals like wildebeest that are identical to those living, going back millions of years. And we see unusual species in recent fossils, like saber-toothed cats, that have gone extinct. We do not typically see animals that have transformed over recent geological history from one (morphological) species into another, or really, into anything very different at all. A million years ago, wildebeest seem to have split off a related species, the black wildebeest, and that is about it.

But this stasis is only apparent. Beneath the surface, mutations are constantly happening and piling up in the genome, and selection is relentlessly working to ... do something. But what? This is where the equilibrium part comes in, positing that wide-spread, successful species are so hemmed in by the diversity of ecologies they participate in that they occupy a very narrow adaptive peak, which selection works to keep the species on, resulting in apparent stasis. It is a very dynamic equilibrium. The constant gene flow among all parts of the population that keeps the species marching forward as one gene pool, despite the ecological variability, makes it impossible to adapt to new conditions that do not affect the whole range. Thus, paradoxically, the more successful the species, and the more prominent it is in the fossil record, the less change will be apparent in those fossils over time.

The punctuated part is that these static species in the fossil record eventually disappear and are replaced by other species that are typically similar, but not the same, and do not segue from the original in a gradual way that is visible in the fossil record. No, most species and locations show sudden replacement. How can this be so if evolution by natural selection is true? As above, wide-spread species are limited in what selection can do. Isolated populations, however, are more free to adapt to local conditions. And if one of those local conditions (such as arctic cold) happens to be what later happens to the whole range (such as an ice age), then it is more likely that a peripherally (pre-)adapted population will take over the whole range, than that the resident species adapts with sufficient speed to the new conditions. Range expansion, for the peripheral species, is easier and faster than adaptation, for the wide-ranging originating species.

The punctuated equilibrium proposition came out in the 1970's, and naturally followed theories of speciation by geographic separation that had previously come out (also resurrected from earlier ideas) in the 1930's to 1950's, but which had not made much impression (!) on paleontologists. Paleontologists are always grappling with the difficulties of the record, which is partial, and does not preserve a lot of what we would like to know, like behavior, ecological relationships, and mutational history. But they did come to agree that species stasis is a real thing, not just, as Darwin claimed, an artifact of the incomplete fossil record. Granted- if we had fossils of all the isolated and peripheral locations, which is where speciation would be taking place by this theory, we would see the gradual change and adaptation taking place. So there are gaps in the fossil record, in a way. But as long as we look at the dominant populations, we will rarely see speciation taking place before our eyes, in the fossils.

So what does a molecular biologist have to say about all this? As Darwin insisted early in "Origin", we can learn quite a bit from domesticated animals. It turns out that wild species have a great amount of mostly hidden genetic variation. This is apparent whenever one is domesticated and bred for desired traits. We have bred dogs, for example, to an astonishingly wide variety of traits. At the same time, we have bred them out to very low genetic diversity. Many breeds are saddled with genetic defects that can not be resolved without outbreeding. So we have in essence exchanged the vast hidden genetic diversity of a wild species for great visible diversity in the domesticated species, combined with low genetic diversity.

What this suggests is that wild species have great reservoirs of possible traits that can be selected for the purposes of adaptation under selective conditions. Which suggests that speciation in range edges and isolated environments can be very fast, as the punctuated part of punctuated equilibrium posits. And again, it reinforces the idea that during equilibrium with large populations and ranges, species have plenty of genetic resources to adapt and change, but spend those resources reinforcing / fine tuning their core ecological "franchise", as it were.

In population genetics, it is well known that mutations arise and fix (that is, spread to 100% of the population on both alleles) at the same rate no matter how large the population, in theory. That is to say- bigger populations generate more mutations, but correspondingly hide them better in recessive form (if deleterious) and for neutral mutations, take much longer to allow any individual mutation to drift to either extinction or fixation. Selection against deleterious mutations is more relentless in larger populations, while relaxed selection and higher drift can allow smaller populations to explore wider ranges of adaptive space, perhaps finding globally higher (fitness) peaks than the parent species could find.
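This population-size cancellation is easy to check numerically. Below is a minimal sketch, assuming a simple Wright-Fisher model with made-up population sizes (`fixation_probability` is an invented helper, not from any cited work): a single new neutral allele fixes with probability about 1/(2N), so the substitution rate, 2Nμ new mutations per generation times a 1/(2N) chance each fixes, collapses to μ regardless of N.

```python
import random

def fixation_probability(N, trials=5000, seed=42):
    """Estimate the chance that a single new neutral allele
    (1 copy among 2N) eventually drifts to fixation, using
    Wright-Fisher binomial resampling each generation."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        count = 1                          # one new mutant copy
        while 0 < count < 2 * N:
            p = count / (2 * N)
            # resample all 2N allele copies for the next generation
            count = sum(rng.random() < p for _ in range(2 * N))
        if count == 2 * N:
            fixed += 1
    return fixed / trials

for N in (10, 40):
    print(f"N={N}: simulated {fixation_probability(N):.4f}, "
          f"theory 1/(2N) = {1 / (2 * N):.4f}")

# Neutral substitution rate per generation:
#   (new mutations) x (fixation prob) = (2N * mu) * 1/(2N) = mu,
# independent of population size -- the basis of the molecular clock.
```

Bigger populations do generate more mutations, but each one is proportionally less likely to drift all the way to fixation, which is why the two effects cancel.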

Eldredge cites some molecular work that claims that at least twenty percent of sequence change in animal lineages is due specifically to punctuational events of speciation, and not to the gradual background accumulation of mutations. What could explain this? The actual mutation rate is not at issue, (though see here), but the numbers of mutations retained, perhaps due to relaxed purifying selection in small populations, and founder effects and positive selection during the speciation process. This kind of phenomenon also helps to explain why the DNA "clock" mentioned above is not at all regular, but quite variable, making an uneven guide to dating the past.

Humans are another good example. Our species is notoriously low in genetic diversity, compared to most wild species, including chimpanzees. It is evident that our extremely low population numbers (over prehistoric time) have facilitated speciation, (that is, the fixation of variants which might be swamped in bigger populations), which has resulted in a bewildering branching pattern of different hominid forms over the last few million years. That makes fossils hard to find, and speciation hard to pinpoint. But now that we have taken over the planet with a huge population, our bones will be found everywhere, and they will be largely static for the foreseeable future, as a successful, wide-spread species (barring engineered changes). 

I think this all adds up to a reasonably coherent theory that reconciles the rest of biology with the fossil record. However, it remains frustratingly abstract, given the nature of fossils that rarely yield up the branching events whose rich results they record.


Saturday, December 9, 2023

The Way We Were: Origins of Meiosis and Sex

Sex is as foundational for eukaryotes as are mitochondria and internal membranes. Why and how did it happen?

Sexual reproduction is a rather expensive proposition. The anxiety, the dating, the weddings- ugh! But biologically as well, having to find mates is no picnic for any species. Why do we bother, when bacteria get along just fine just dividing in two? This is a deep question in biology, with a lot of issues in play. And it turns out that bacteria do have quite a bit of something-like-sex: they exchange DNA with each other in small pieces, for similar reasons we do. But the eukaryotic form of sex is uniquely powerful and has supported the rapid evolution of eukaryotes to be by far the dominant domain of life on earth.

A major enemy of DNA-encoded life is mutation. Despite the many DNA replication accuracy and repair mechanisms, some rate of mutation still occurs, and is indeed essential for evolution. But for larger genomes, deleterious mutations arise faster than replication fidelity and purifying natural selection can clear them, so damaging mutations build up and the lineage will inevitably die out without some help. This process is called Muller's ratchet, and is why all organisms appear to exchange DNA with others in their environment, either sporadically, like bacteria, or systematically, like eukaryotes.
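The ratchet can be illustrated with a toy simulation, a sketch under deliberately exaggerated, made-up parameters (`mullers_ratchet` and its numbers are invented for illustration): in a finite asexual population, the least-mutated class of genomes is eventually lost by chance, and with no recombination to reconstruct it, the minimum mutational load can only climb.

```python
import random

def mullers_ratchet(N=100, U=0.3, s=0.02, generations=2000, rng=None):
    """Toy Muller's ratchet: N asexual genomes, each carrying some
    count of deleterious mutations.  Each generation, parents are
    chosen in proportion to fitness (1-s)^load, and each offspring
    picks up a new mutation with probability U (a crude stand-in
    for a Poisson mutation process).  With no recombination, the
    minimum load in the population can never decrease."""
    rng = rng or random.Random(1)
    loads = [0] * N
    min_load_history = []
    for _ in range(generations):
        weights = [(1 - s) ** k for k in loads]
        parents = rng.choices(loads, weights=weights, k=N)
        loads = [k + (1 if rng.random() < U else 0) for k in parents]
        min_load_history.append(min(loads))
    return min_load_history

hist = mullers_ratchet()
print("best genome's load at start / middle / end:",
      hist[0], hist[len(hist) // 2], hist[-1])
```

Because every offspring inherits at least its parent's load, the best class can be lost but never regained, so the recorded minimum is monotonically non-decreasing; that irreversibility is the "ratchet".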

An even worse enemy of the genome is unrepaired damage like complete (double strand) breaks in the DNA. These stop replication entirely, and are fatal. These also need to be repaired, and again, having extra copies of a genome is the way to allow these to be fixed, by processes like homologous recombination and gene conversion. So having access to other genomes has two crucial roles for organisms- allowing immediate repair, and allowing some way to sweep out deleterious mutations over the longer term.

Our ancestors, the archaea, which are distinct from bacteria, typically have circular, single-molecule genomes, in multiple copies per cell, with frequent gene conversions among the copies and frequent exchange with other cells. They routinely have five to twenty copies of their genome, and can easily repair any immediate damage using those other copies. They do not hide mutant copies like we do in a recessive allele, but rather by gene conversion (which means replicating parts of a chromosome into other ones, piecemeal) make each genome identical over time so that it (and the cell) is visible to selection, despite their polyploid condition. Similarly, taking in DNA from other, similar cells uses the donor cells' status as live cells (also visible to selection) to ensure that the recipients are getting high quality DNA that can repair their own defects or correct minor mutations. All this ensures that their progeny are all set up with viable genomes, instead of genomes riddled with defects. But it comes at various costs as well, such as a constant race between getting a lethal mutation and finding the DNA that might repair it.

Both mitosis and meiosis were eukaryotic innovations. In both, the chromosomes all line up for orderly segregation to descendants. But meiosis engages in two divisions, and features homolog synapsis and recombination before the first division of the parental homologs.

This is evidently a precursor to the process that led, very roughly 2.5 billion years ago, to eukaryotes, but is all done on a piecemeal basis, nothing like what we do now as eukaryotes. To get to that point, the following innovations needed to happen:

  • Linearized genomes, with centromeres and telomeres, and more than one chromosome.
  • Mitosis to organize normal cellular division, where multiple chromosomes are systematically lined up and distributed 1:1 to daughter cells, using extensive cytoskeletal rearrangements and regulation.
  • Mating with cell fusion, where entire genomes are combined, recombined, and then reduced back to a single complement, and packaged into progeny cells.
  • Synapsis, as part of meiosis, where all sister homologs are lined up, damaged to initiate DNA repair and crossing-over.
  • Meiosis division one, where the now-recombined parental homologs are separated.
  • Meiosis division two, which largely follows the same mechanisms as mitosis, separating the reshuffled and recombined sister chromosomes.

This is a lot of novelty on the path to eukaryogenesis, and is just a portion of the many other innovations that happened in this lineage. What drove all this, and what were some plausible steps in the process? The advent of true sex generated several powerful effects:

  1. A definitive solution to Muller's ratchet, by exposing every locus in a systematic way to partial selection and sweeping out deleterious mutations, while protecting most members of the population from those same mutations. Continual recombination of the parental genomes allows beneficial mutations to separate from deleterious ones and be differentially preserved.
  2. Mutated alleles are partially, yet systematically, hidden as recessive alleles, allowing selection when they come into homozygous status, but also allowing them to exist for limited time to buffer the mutation rate and to generate new variation. This vastly increases accessible genetic variation.
  3. Full genome-length alignment and repair by crossing over is part of the process, correcting various kinds of damage and allowing accurate recombination across arbitrarily large genomes.
  4. Crossing over during meiotic synapsis mixes up the parental chromosomes, allowing true recombination among the parental genomes, beyond just the shuffling of the full-length chromosomes. This vastly increases the power of mating to sample genetic variation across the population, and generates what we think of as "species", which represent more or less closed interbreeding pools of genetic variants that are not clones but diverse individuals.

The time point of 2.5 billion years ago is significant because this is the general time of the great oxidation event, when cyanobacteria were finally producing enough oxygen by photosynthesis to alter the geology of earth. (However our current level of atmospheric oxygen did not come about until almost two billion years later, with the rise of land plants.) While this mainly prompted the logic of acquiring mitochondria, either to detoxify oxygen or use it metabolically, some believe that it is relevant to the development of meiosis as well. 

There was a window of time when oxygen was present, but the ozone layer had not yet formed, possibly generating a particularly mutagenic environment of UV irradiation and reactive oxygen species. Such higher mutagenesis may have pressured the archaea mentioned above to get their act together- to not distribute their chromosomes so sporadically to offspring, to mate fully across their chromosomes, not just pieces of them, and to recombine / repair across those entire mated chromosomes. In this proposal, synapsis, as seen in meiosis I, had its origin in a repair process that solved the problem of large genomes under mutational load by aligning them more securely than previously. 

It is notable that one of the special enzymes of meiosis is Spo11, which induces the double-strand breaks that lead to crossing-over, recombination, and the chiasmata that hold the homologs together during the first division. This DNA damage happens at quite high rates all over the genome, and is programmed, via the structures of the synaptonemal complex, to favor crossing-over between (parental) homologs vs duplicate sister chromosomes. Such intensive repair, while now aimed at ensuring recombination, may have originally had other purposes.

Alternatively, others suggest that it is larger genome size that motivated this innovation. This origin event involved many gene duplication events that ramified the capabilities of the symbiotic assemblage. Such gene duplications would naturally lead to recombinational errors in traditional gene conversion models of bacterial / archaeal genetic exchange, so there was pressure to generate a more accurate whole-genome alignment system that confined recombination to the precise homologs of genes, rather than to any similar relative that happened to be present. This led to the synapsis that currently is part of meiosis I, but it is also part of "parameiosis" systems in some eukaryotes, which, while clearly derived, might resemble primitive steps to full-blown meiosis.

It has long been apparent that the mechanisms of meiosis division one are largely derived from (or related to) the mechanisms used for mitosis, via gene duplications and regulatory tinkering. So these processes (mitosis and the two divisions of meiosis) are highly related and may have arisen as a package deal (along with linear chromosomes) during the long and murky road from the last archaeal ancestor and the last common eukaryotic ancestor, which possessed a much larger suite of additional innovations, from mitochondria to nuclei, mitosis, meiosis, cytoskeleton, introns / mRNA splicing, peroxisomes, other organelles, etc.  

Modeling of different mitotic/meiotic features. All cells modeled have 18 copies of a polyploid genome, with a newly evolved process of mitosis. Green = addition of crossing over / recombination of parental chromosomes, but no chromosome exchange. Red = chromosome exchange, but no crossing over. Blue = both crossing over and chromosome exchange, as occurs now in eukaryotes. The Y axis is fitness / survival and the X axis is time in generations after start of modeling.

A modeling paper points to the quantitative benefits of mitosis when combined with the meiotic suite of innovations. They suggest that in a polyploid archaean lineage, the establishment of mitosis alone would have had revolutionary effects, ensuring accurate segregation of all the chromosomes, and that this would have enabled differentiation among those polyploid chromosome copies, since each would be faithfully transmitted individually to offspring (assuming all, instead of one, were replicated and transmitted). Thus they could develop into different chromosomes, rather than remain copies. This would, as above, encourage meiosis-like synapsis over the whole genome to align all the (highly similar) genes properly.

"Modeling suggests that mitosis (accurate segregation of sister chromosomes) immediately removes all long-term disadvantages of polyploidy."

Additional modeling of the meiotic features of chromosome shuffling, and recombination between parental chromosomes, indicates (shown above) that these are highly beneficial to long-term fitness, which can rise instead of decaying with time, per the various benefits of true sex as described above. 

The field has definitely not settled on one story of how meiosis (and mitosis) evolved, and these ideas and hypotheses are tentative at this point. But the accumulating findings that the archaea that most closely resemble the root of the eukaryotic (nuclear) tree have many of the needed ingredients, such as active cytoskeletons, a variety of molecular antecedents of ramified eukaryotic features, and now extensive polyploidy to go with gene conversion and DNA exchange with other cells, make the momentous gap from archaea to eukaryotes somewhat narrower.


Saturday, November 25, 2023

Are Archaea Archaic?

It remains controversial whether the archaeal domain of life is 1 or 4.5 billion years old. That is a big difference!

Back in the 1970's, the nascent technologies of molecular analysis and DNA sequencing produced a big surprise- that hidden in the bogs and hot springs of the world are micro-organisms so extremely different from known bacteria and protists that they were given their own domain on the tree of life. These are now called the archaea, and in addition to being deeply different from bacteria, they were eventually found to be the progenitors of the eukaryotic cell- the third (and greatest!) domain of life that arose later in the history of the biosphere. The archaeal cell contributed most of the nuclear, informational, membrane management, and cytoskeletal functions, while one or more assimilated bacteria (most prominently the future mitochondrion and chloroplast) contributed most of the metabolic functions, as well as membrane lipid synthesis and peroxisomal functions.

Carl Woese, who discovered and named archaea, put his thumb heavily on the scale with that name, (originally archaebacteria), suggesting that these new cells were not just an independent domain of life, totally distinct from bacteria, but were perhaps the original cell- that is, the LUCA, or last universal common ancestor. All this was based on the sequences of rRNA genes, which form the structural and catalytic core of the ribosome, and are conserved in all known life. But it has since become apparent that sequences of this kind, which were originally touted as "molecular clocks", or even "chronometers", are nothing of the kind. They bear the traces of mutations that happen along the way, and, being highly important and conserved, do not track the raw mutation rate, (which itself is not so uniform either), but rather the rate at which change is tolerated by natural selection. And this rate can be wildly different at different times, as lineages go through crises, bottlenecks, adaptive radiations, and whatever else happened in the far, far distant past.

Carl Woese, looking over filmed spots of 32P labeled ribosomal RNA from different species, after size separation by electrophoresis. This is how RNAs were analyzed, back in 1976, and such rough analysis already suggested that archaea were something very different from bacteria.

There since has been a tremendous amount of speculation, re-analysis, gathering of more data, and vitriol in the overall debate about the deep divergences in evolution, such as where eukaryotes come from, and where the archaea fit into the overall scheme. Compared with the rest of molecular biology, where experiments routinely address questions productively and efficiently due to a rich tool chest and immediate access to the subject at hand, deep phylogeny is far more speculative and prone to subjective interpretation, sketchy data, personal hobbyhorses, and abusive writing. A recent symposium in honor of one of its more argumentative practitioners made that clear, as his ideas were being discarded virtually at the graveside.

Over the last decade, estimates of the branching date of archaea from the rest of the tree of life have varied from 0.8 to 4.5 Gya (billion years ago). That is a tremendous range, and is a sign of the difficulty of this field. The frustrations of doing molecular phylogeny are legion, just as the temptations are alluring. Firstly, there are very few landmarks in the fossil record to pin all this down. There are stromatolites from roughly 3.5 Gya, which pin down the first documented life of any kind. Second are eukaryotic fossils, which start, at the earliest, about 1.5 Gya. Other microbial fossils pin down occasional sub-groups of bacteria, but archaea are not represented in the fossil record at all, being hardly distinguishable from bacteria in their remains. Then we get the Cambrian explosion of multicellular life, roughly 0.5 Gya. That is pretty much it for the fossil record, aside from the age of the moon, which is about 4.5 Gya and gives us the baseline of when the earth became geologically capable of supporting life of any kind.

The molecules of living organisms, however, form a digital record of history. Following evolutionary theory, each organism descends from others, and carries, in mutated and altered form, traces of that history. We have parts of our genomes that vary with each generation, (useful for forensics and personal identification), we have other parts that show how we changed and evolved from other apes, and we have yet other areas that vary hardly at all- that carry recognizable sequences shared with all other forms of life, and presumably with LUCA. This is a real treasure trove, if only we can make sense of it.

But therein lies the rub. As mentioned above, these deeply conserved sequences are hardly chronometers. So for all the data collection and computer wizardry, the data itself tells a mangled story. Rapid evolution in one lineage can make it look much older than it really is, confounding the whole tree. Over the years, practitioners have learned to be as judicious as possible in selecting target sequences, while getting as many as possible into the mix. For example, adding up the sequences of 50-odd ribosomal proteins can give more and better data than assembling the 2 long-ish ribosomal RNAs. They provide more and more diverse data. But they have their problems as well, since some are much less conserved than others, and some were lost or gained along the way. 
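How unequal rates mangle the story can be seen in a back-of-envelope clock calculation (the numbers here are hypothetical, not drawn from any of the studies discussed):

```python
def clock_date(d, r):
    """Split time implied by distance d (substitutions/site) under a
    uniform per-lineage rate r (subs/site/year): d = 2*r*t, so
    t = d / (2*r)."""
    return d / (2 * r)

r = 1e-9    # assumed rate: 1 substitution per site per billion years
d = 0.8     # observed distance between two lineages (subs/site)

print(f"uniform clock: {clock_date(d, r) / 1e9:.2f} Gya")

# Suppose one lineage actually evolved 4x faster around its origin.
# The same observed distance then satisfies d = r*t + 4*r*t = 5*r*t,
# implying a much more recent split than the naive estimate:
t_true = d / (5 * r)
print(f"with the rate burst: {t_true / 1e9:.2f} Gya")
```

A lineage that sped away from its parent thus looks far older under a uniform-rate assumption, which is the crux of the dispute over archaeal antiquity.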

A partisan of the later birth of archaea provides a phylogenetic tree with countless microbial species, and one bold claim: "inflated" distances to the archaeal and eukaryotic stems. This is given as the reason that archaea (lower part of the diagram, including eukaryotes, termed "archaebacteria"), looks very ancient, but really just sped away from its originating bacterial parent, (the red bacteria), estimated at about 1 Gya. This tree is based on an aligned concatenation of 26 universally conserved ribosomal protein sequences, (51 from eukaryotes), with custom adjustments.

So there has been a camp that claims that the huge apparent / molecular distance between the archaea and other cells is just such a chimera of fast evolution. Just as the revolution that led to the eukaryotic cell involved a lot of molecular change, including the co-habitation of countless proteins that had never seen each other before, duplications / specializations, and many novel inventions, whatever process led to the archaeal cell (from a pre-existing bacterial cell) might also have caused the key molecules we use to look into this deep time to mutate much more rapidly than is true elsewhere in the vast tree of life. What are the reasons? There is the general disbelief / unwillingness to accept someone else's work, and evidence like possible horizontal transfers of genes from chloroplasts to basal archaea, some large sequence deletion features that can be tracked through these lineages and interpreted to support late origination, some papering over of substantial differences in membrane and metabolic systems, and there are plausible (via some tortured logic) candidates for an originating, and late-evolving, bacterial parent. 

This thread of argument puts the origin of eukaryotes roughly at 0.8 Gya, which is, frankly, uncomfortably close to the origination of multicellular life, and gives precious little time for the bulk of eukaryotic diversity to develop, which exists largely, as shown above, at the microbial level. (Note that "Animalia" in the tree above is a tiny red blip among the eukaryotes.) All this is quite implausible, even to a casual reader, and makes this project hard to take seriously, despite its insistent and voluminous documentation.

Parenthetically, there was a fascinating paper that used the evolution of the genetic code itself to make a related point, though without absolute time attributions. The code bears hallmarks of some amino acids being added relatively late (tryptophan, histidine), while others were foundational from the start (glycine, alanine), when it may have consisted of two RNA bases (or even one) rather than three. All of this took place long before LUCA, naturally. This broad analysis of genetic code usage argued that bacteria tend to use a more ancient subset of the code, which may reflect their significantly more ancient position on the tree of life. While the full code was certainly in place by the time of LUCA, there may still, at that time, have been a bias in the inherited genome / pool of proteins against the relatively novel amino acids. This finding implies that the time of archaeal origination was later than the origination of bacteria, by some unspecified but significant amount.

So, attractive as it would be to demote the archaea from their perch as super-ancient organisms, given their small sizes, small genomes, specialization in extreme environments, and peripheral ecological position relative to bacteria, that turns out to be difficult to do. I will turn, then, to a very recent paper that gives what I think is a much more reasoned and plausible picture of the deeper levels of the tree of life, and the best general picture to date. This paper is based on the protein sequences of the rotary ATPases that are universal, and were present in LUCA, despite their significant complexity. Indeed, the more we learn about LUCA, the more complete and complex this ancestor turns out to be. Our mitochondrion uses a (bacterial) F-type ATPase to synthesize ATP from the food-derived proton gradient. Our lysosomes use an (archaeal) V-type ATPase to drive protons into / acidify the lysosome in exchange for ATP. These are related, derived from one distant ancestor, and each was apparently present in LUCA. Additionally, each ATPase is composed of two types of subunits, one catalytic and one non-catalytic, which originated from an ancient protein duplication, also prior to LUCA. The availability of these molecular cousins / duplications provides helpful points of comparison throughout, particularly for locating the root of the evolutionary tree.

Phylogenetic trees based on ATP synthase enzymes that are present in all forms of life. On left is shown the general tree, with branch points of key events / lineages. On right are shown sub-trees for the major types of the ATP synthase, whether catalytic subunit (c), non-catalytic (n), F-type, common in bacteria, or V type, common in archaea. Note how congruent these trees are. At bottom right in the tiny print is a guide to absolute time, and the various last common ancestors.

This paper also works quite hard to pin the molecular data to the fossil and absolute time record, which is not always done. The bottom line is that by this tree the archaea arise quite early, (see above), coincident with or within about 0.5 Gy of LUCA, which was bacterial, at roughly 4.4 Gya. The bacterial and archaeal last common ancestors are dated to 4.3 and 3.7 Gya, respectively. The (fused) eukaryotic last common ancestor dates to about 1.9 Gya, with the proto-mitochondrion's individual last common ancestor among the bacteria some time before that, at roughly 2.4 Gya.
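The dating logic here rests on molecular clocks calibrated against fossils. As a toy sketch (not the paper's actual method, which uses relaxed-clock models with fossil calibration points), the core arithmetic is just distance = 2 × rate × time; the rate and distance values below are invented purely for illustration:

```python
# Toy molecular-clock calculation: divergence time from sequence distance.
# NOT the paper's method; it only illustrates the core idea that the
# distance between two lineages equals 2 * rate * time, since both
# lineages accumulate change after splitting.

def divergence_time(distance, rate_per_gy):
    """Estimate time since two lineages split.

    distance     -- substitutions per site separating the two sequences
    rate_per_gy  -- substitutions per site per billion years, per lineage
    """
    return distance / (2.0 * rate_per_gy)

# Hypothetical numbers: 1.5 substitutions/site at 0.17 subs/site/Gy
# per lineage gives a split roughly 4.4 Gya, the age cited for LUCA.
t = divergence_time(1.5, 0.17)
print(round(t, 2))  # 4.41
```

In practice, rates are neither constant nor known in advance, which is why fossil set-points are essential to convert relative branch lengths into absolute dates.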

This timeline makes sense on many fronts. First, it provides a realistic time frame for the formation and diversification of eukaryotes. It puts their origin right around the great oxidation event, when oxygen became dominant in earth's atmosphere (about 2 to 2.4 Gya), which was a precondition for the usefulness of mitochondria to what were otherwise anaerobic archaeal cells. It places the origin of archaea (LACA) a substantial stretch after the origin of bacteria, which agrees with the critics' points above that bacteria are the truly basal lineage of all life, and that archaea, while highly distinct and pretty archaic, also share a lot of characteristics with bacteria- perhaps more with certain early lineages than with others that came later. The distinction between LUCA and the last common bacterial ancestor (LBCA) is a technical one, given the trees they were working from, and the two are not, given the ranges of age presented (see figure above), significantly different.

I believe this field is settling down, and though this paper, working from only a subset of the most ancient sequences plus fossil set-points, is hardly the last word, it appears to represent a consensus view and is the best picture to date of the deepest and most significant waypoints in the deep history of life. This is what comes from looking through microscopes, and finding entire invisible worlds that we had no idea existed. Genetic sequencing is another level over that of microscopy, looking right at life's code, and at its history, if darkly. What we see in the macroscopic world around us is only the latest act in a drama of tremendous scale and antiquity.


Sunday, November 12, 2023

Missing Links in Eukaryotic Evolution

The things you find in Slovenian mud! Like an archaeal cell that is the closest thing to the eukaryotic root organism.

Creationists and "intelligent" design advocates tirelessly point to the fossil record. Not to how orderly it is, revealing the astonishingly sequenced, slow, and relentless elaboration of life. No, they decry its gaps- places where fossils do not account for major evolutionary (er, designed) transitions to more modern forms. It is a sad kind of argument, lacking in imagination and dishonest in its unfairness and hypocrisy. Does the life of Jesus have gaps in the historical record? Sure enough! And are those historical records anywhere near as concrete and informative as fossils? No way. What we have as a record of Christianity's history is riven with fantasy, forgery, and uncertainty.

But enough trash talk. One thing that science has going for it is a relentlessly accumulating process by which new fossils appear, and new data from other sources, like newly found organisms and newly sequenced genomes, arise to clarify what were only imaginative (if reasonable) hypotheses previously. Darwin's theory of evolution, convincing and elegantly argued as it was originally, has gained such evidence without fail over the subsequent century and a half, from discoveries of the age of the earth (and thus the solar system) to the mechanics of genetic inheritance.

A recent paper describes the occurrence of cytoskeletal proteins and structures in an organism that is neither a bacterium nor a eukaryote, but appears to be within the family of Archaea that is the closest thing we have to the eukaryotic progenitor. These are the Asgard Archaea, a family that was discovered only in the last decade, as massive environmental sequencing projects have sampled the vast genetic diversity hidden in the muds, sediments, soils, rocks, and waters of the world.

Sampling stray DNA is one thing, but studying these organisms in depth requires growing them in the lab. After trolling through the same muds in Slovenia where promising DNA sequences were found, this group fished out, and then carefully cultured, a novel archaeal cell. But growing these cells is notoriously difficult. They are anaerobic, never having made the transition to the oxygenated atmosphere of the later earth. They have finicky nutritional requirements. They grow very slowly. And they generally have to live with other organisms with which they have reciprocal metabolic relationships. In the ur-eukaryote, this was a relationship with the proto-mitochondrion, which was later internalized. For the species cultured by this research group, it is a pair of other free-living microbes: one related to the sulfur-reducing bacterium Desulfovibrio, and the other to the archaeal methanogen Methanogenium, which uses hydrogen and CO2 or related simple carbon compounds to make methane. Anaerobic Asgard archaea generally have relatively simple metabolisms, making hydrogen from small organic compounds through a kind of fermentation.

A phylogenetic tree showing relations between the newly found organisms (bottom) and eukaryotes (orange), other archaea, and the entirely separate domain of bacteria (red). This is based on a set of sequences of universally used / conserved ribosomal proteins. While the eukaryotes have strayed far from the root, that root is extremely close to some archaeal groups.

Micrographs of cultured lokiarchaeal cells, with a scale bar of 500 nanometers. These are rather amoeboid cells with extensive cytoskeletal and membrane regulation.

Another micrograph of part of a lokiarchaeal cell, showing not just its whacky shape, but a good bit of internal structure as well. The main scale bar is 100 nanometers. There are internal actin filaments (yellow arrowheads), lined up ribosomes (gray arrowhead) and cell surface proteins of some kind (blue arrowheads).

What they found after all this was pretty astonishing. They found cells that are quite unlike typical bacterial or even archaeal cells, which are compact round or rod shapes. These (termed lokiarchaeal) cells have luxurious processes extending all over the place, and a profusion of internal structural elements reminiscent of eukaryotic cells, though without membrane-bound internal organelles. But they have membrane-bound protrusions and what look like vesicles budding off. At only six million base pairs (compared to our three billion) and under five thousand genes, these cells have a small and streamlined genome. Yet there are a large number (258) of eukaryotic-related "signature" proteins (outlined below), particularly ones involved in cytoskeletal functions and membrane trafficking. The researchers delved into the subcellular structures, labeling actin and obtaining structural data for both actin and ribosomes, confirming their archaeal affinity, with added features.

A schematic of eukaryotic-like proteins in the newly cultured lokiarchaeal Asgard genome. Comparison (blue) is to a closely related organism isolated recently in Japan.


This work is the first time that the cytoskeleton of Asgard cells has been visualized, along with its role in their amoeboid capabilities. What is it used for? That remains unknown. The lush protrusions may collaborate with this organism's metabolic partners, or be used for sensing and locomoting to find new food within its sediment habitat, or for interacting with fellow lokiarchaeal cells, as shown above. Or all of these roles. Evolutionarily, this organism, while modern, appears to be a descendent of the closest thing we have to the missing link at the origin of eukaryotes, (that is, the archaeal dominant partner of the founding symbiosis), and in that sense seems both ancient in its characteristics, and possibly little changed from that time. Who would have expected such a thing? Well, molecular biologists and evolutionary biologists have been expecting it for a long time.


  • Fossil fuel consumption is still going up, not down.

Sunday, August 27, 2023

Better Red Than Dead

Some cyanobacteria strain for photosynthetic efficiency at the red end of the light spectrum.

The plant world is green around us- why green, and not some other color, like, say, black? That plants are green means that they are letting green light through (or out, by reflection), giving up some energy. Chlorophyll absorbs both red light and blue light, but not green, though all are near the peak of solar output. Some accessory pigments within the light-gathering antenna complexes can extend the range of wavelengths absorbed, but clearly a fair amount of green light gets through. A recent theory suggests that this use of two separated bands of light is an optimal solution to stabilize power output. At any rate, it is not just the green light- the extra energy of the blue light is also thrown away, as heat: its excitation is allowed to decay to the red level of excitation within the antenna complex of chlorophyll molecules, since the only excited state used in photosynthesis is that at ~690 nm. This forms a uniform common denominator for all incoming light energy, which then induces charge separation at the oxygen reaction center, (stripping water of electrons and protons), and sends newly energized electrons out to quinone molecules and on into the biosynthetic apparatus.
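The energy bookkeeping here is easy to check: a photon's energy in electron volts is hc/λ, about 1239.84 eV·nm divided by the wavelength in nanometers. A quick sketch:

```python
# Photon energy in electron volts from wavelength in nanometers,
# using E = hc / lambda, with hc ~= 1239.84 eV*nm.

def photon_ev(wavelength_nm):
    return 1239.84 / wavelength_nm

for nm in (450, 550, 680, 720):
    print(nm, "nm ->", round(photon_ev(nm), 2), "eV")

# Blue photons (~450 nm, ~2.76 eV) carry far more energy than the
# ~1.82 eV usable at 680 nm; the excess is shed as heat in the antenna.
```

The 720 nm far-red photons discussed below come out at about 1.72 eV, which is the shortfall the red-adapted species have to engineer around.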

The solar output, which plants have to work with.

Fine. But what if you live deeper in the water, or in the veins of a rock, or in a mossy, shady nook? What if all you have access to is deeper red light, like at 720 nm, with lower energy than the standard input? In that case, you might want to re-engineer your version of photosynthesis to get by with slightly lower-energy light, while getting the same end results of oxygen splitting and carbon fixation. A few cyanobacteria (the same bacterial lineage that pioneered chlorophyll and the standard photosynthesis we know so well) have done just that, and a recent paper discusses the tradeoffs involved, which are of two different types.

The chlorophylls with respective absorption spectra and partial structures. Redder light is toward the right. Chlorophyll a is one used most widely in plants and cyanobacteria. Chlorophyll b is also widely used in these organisms as an additional antenna pigment that extends the range of absorbed light. Chlorophylls d and f are red-shifted and used in specialized species discussed here. 

One of the species, Chroococcidiopsis thermalis, is able to switch states, from bright/white light absorption with a normal array of pigments, to a second state where it expresses chlorophylls d and f, which absorb light at the lower-energy 720 nm, in the far red. This "facultative" ability means that it can optimize the low-light state without much regard to efficiency or photo-damage protection, which it can address by switching back to the high-energy wavelength pigment system. The other species is Acaryochloris marina, which has no bright light system, but only chlorophyll d. This bacterium lives inside the cells of bigger red algae, so it has a relatively stable, if shaded, environment to deal with.

What these and prior researchers found was that the ultimate quantum of energy used to split water to O2, and to send energized electrons off to photosystem I and carbon compound synthesis, is the same as in any other chlorophyll a-using system. The energetics of those parts of the system apparently can not be changed. The shortfall needs to be made up at the front end, where there is a sharp drop between the energy absorbed- 1.82 electron volts (eV) from photons at 680 nm, but only 1.72 eV from far-red photons- and that needed at the next points in the electron transport chain (about 1.0 eV). This difference plays a large role in directing those electrons to where the plant wants them to go- down the gradient to the oxygen-evolving center, and to the quinones that ferry energized electrons to other synthetic centers. While it seems like waste, a smaller difference would allow the energized electrons to go astray, forming chemical radicals and other products dangerous to the cell.

Summary diagram, described in text. Energy levels are described for photon excitation of chlorophyll (Chl, left axis), and energy transitions through the reaction center (Phe: pheophytin) and quinones (Q) that conduct energized electrons out to the other photosynthetic center and biosynthesis. On top are shown the respective system types- normal chlorophyll a from white-light adapted C. thermalis, chlorophyll d in A. marina, and chlorophyll f in red-adapted C. thermalis.

What these researchers summarize in the end is that both of the red light-using cyanobacteria squeeze this middle zone of the power gradient, in different ways. There is an intermediate event on the trail from photon-induced electron excitation to the outgoing quinone (+ electron) and O2 that is the target of all the antenna chlorophylls- the photosynthetic reaction center. This typically has a chlorophyll a (called P680) and pheophytin, a chlorophyll-like molecule. It is at this chlorophyll a molecule that the key step takes place- the excitation energy (an electron bumped to a higher energy level) conducted in from the antenna of ~30 other chlorophylls pops out its excited electron, which flits over to the pheophytin, thence to the carrier quinone molecules and photosystem I. Simultaneously, an electron comes in to replace it from the oxygen-evolving center, which also receives its units of photon energy from the chlorophyll/pheophytin reaction center. The figure above describes these steps in energetic terms, from the original excited state, to the pheophytin (Phe-, loss of 0.16 eV), to the exiting quinone state (Qa-, loss of 0.385 eV). In the organisms discussed here, chlorophyll d replaces a at this center, and since its structure and absorbance differ, its energized electron is about 0.1 eV less energetic.
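Using those numbers, the running energy budget for the chlorophyll a reaction center can be tallied directly. This is a back-of-envelope check on the figures quoted above, not data from the paper:

```python
# Running energy budget for the chlorophyll a reaction center, using the
# drops quoted in the text: excitation at 680 nm, -0.16 eV on transfer to
# pheophytin (Phe-), then -0.385 eV on transfer to the quinone (Qa-).

start = 1239.84 / 680          # ~1.82 eV initial excited state
phe = start - 0.16             # after transfer to pheophytin
qa = phe - 0.385               # after transfer to the quinone

print(round(start, 2), round(phe, 2), round(qa, 2))

# Chlorophyll d starts ~0.1 eV lower; the same downstream drops then
# leave correspondingly less margin against back-reactions.
start_d = start - 0.1
print(round(start_d, 2))
```

Each drop is "wasted" energy, but it is what keeps the electron flowing forward rather than recombining into damaging side products.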

In A. marina, (center in the diagram above), the energy gap between the pheophytin and the quinone is squeezed, losing about 0.06 eV. This has the effect of losing some of the downward "slope" on the energy landscape that prevents side reactions. Since A. marina has no choice but to use this lower-energy system, it needs all the efficiency it can get in the transfer from chlorophyll to pheophytin. But it then sacrifices some driving force in the next step, to the quinone. This has the ultimate effect of raising damage levels and side reactions when faced with more intense light. However, given its typically stable and symbiotic lifestyle, that is a reasonable tradeoff.

On the other hand, C. thermalis (right-most in the diagram above) uses its chlorophyll d/f system on an optional basis, when the light is bad. So it can give up some efficiency (in driving pheophytin electron acceptance) for better damage control. It has dramatically squeezed the gap between chlorophyll and pheophytin, from 0.16 eV to 0.08 eV, while keeping the main pheophytin-to-quinone gap unchanged. This keeps the pumping of electrons out to the quinones in good condition, with low side-effect damage, but restricts overall efficiency by slowing the rate of excitation transfer to pheophytin. That slowing affects not only the quinone-mediated path of energy to photosystem I, but also the path to the oxygen-evolving center. The authors mention that this cyanobacterium recovers some efficiency by making extra light-harvesting pigments that provide more inputs under these low / far-red light conditions.

The methods used to study all this were mostly based on fluorescence, which emerges from the photosynthetic system when electrons fall back from their excited states. A variety of inhibitors have been developed to prevent electron transfer, such as to the quinones, which bottles up the system and causes increased fluorescence and thermoluminescence, whose wavelengths reveal the energy gaps causing them. Thus it is natural, though also impressive, that light provides such an incisive and precise tool to study this light-driven system. There has been much talk that these far red-adapted photosynthetic organisms validate the possibility of life around dim stars, including red dwarfs. But obviously these particular systems developed evolutionarily out of the dominant chlorophyll a-based system, so they wouldn't provide a direct path. There are other chlorophyll systems in bacteria, however, and systems that predate the use of oxygen as the electron source, so there are doubtless many ways to skin this cat.


  • Maybe humiliating Russia would not be such a bad thing.
  • Republicans might benefit from reading the Federalist Papers.
  • Fanny Willis schools Meadows on the Hatch act.
  • "The top 1% of households are responsible for more emissions (15-17%) than the lower earning half of American households put together (14% of national emissions)."

Saturday, April 1, 2023

Consciousness and the Secret Life of Plants

Could plants be conscious? What are the limits of consciousness and pain? 

Scientific American recently reviewed a book titled "Planta Sapiens". The title gives it all away, and the review was quite positive, with statements like: 

"Our senses can not grasp the rich communicative world of plants. We therefore lack language to describe the 'intelligence' of a root tip in conversation with the microbial life of the soil or the 'cognition' that emerges when chemical whispers ripple through a lacework of leaf cells."

This is provocative indeed! What if plants really do have a secret life and suffer pain with our every bite and swing of the scythe? What of our vaunted morals and ethics then?

I am afraid that I take a skeptical view of this kind of thing, so let's go through some of the aspects of consciousness, and ask how widespread it really is. One traditional view, from ur-scientific types like Descartes, is that only humans have consciousness, and all other creatures have at best a mechanism, unfeeling and mechanical, that may look like consciousness, but isn't. This, continued in a sense by B. F. Skinner in the 20th century, is a statement from ignorance. We can not fully communicate with animals, so we can not really participate in what looks like their consciousness, so let's just ignore it. This position has the added dividend of supporting our unethical treatment of animals, which was an enormous convenience, and remains the core position of capitalism generally, regarding farm animals (though its view of humans is hardly more generous).

Well, this view is totally untenable, given our experience of animals, our ability to communicate with them to various degrees, and our observation of them dreaming, not to mention the evolutionary standpoint: our consciousness did not arise from nothing, after all. So I think we can agree that mammals can all be included in the community of conscious fellow-beings on the planet. It is clear that the range of conscious pre-occupations can vary tremendously, but whenever we have looked at the workings of memory, attention, vision, and other components assumed to be part of, or contributors to, conscious awareness, we have found them all in mammals, at least.

But what about other animals like insects, jellyfish, or bacteria? Here we will need a deeper look at the principles in play. As far as we understand it, consciousness is an activity that binds various senses and models of the world into an experience. It should be distinguished from responsiveness to stimuli. A thermostat is responsive. A bacterium is responsive. That does not constitute consciousness. Bacteria are highly responsive to chemical gradients in their environment, to food sources, to the pheromones of fellow bacteria. They appear to have some amount of sensibility and will. But we can not say that they have experience in the sense of a conscious experience, even if they integrate a lot of stimuli into a holistic and sensitive approach to their environment. 


The same is true of our own cells, naturally. They also are highly responsive on an individual basis, working hard to figure out what the bloodstream is bringing them in terms of food, immune signals, pathogens, etc. Could each of our cells be conscious? I would doubt it, because their responsiveness is mechanistic, rather than constituting an independent, integrated model of their world. Similarly, if we are under anaesthesia and a surgeon cuts off a leg, is that leg conscious? It has countless nerve cells, and sensory apparatus, but it does not represent anything about its world. Rather, it is built to send all these signals to a modeling system elsewhere, i.e. our brain, which is where consciousness happens, and where (conscious) pain happens as well.

So I think the bottom line is that consciousness is rather widely shared as a property of brains, and thus of organisms with brains, which were devised over evolutionary time to provide the kind of integrated experience that a neural net can not supply. Jellyfish, for instance, have neural nets that respond to pain, react to food and mates, and swim exquisitely. They are highly responsive, but, I would argue, not conscious. On the other hand, insects have brains and would count as conscious, even though their level of consciousness might be very primitive. Honey bees map out their world, navigate about, select the delicacies they want from plants, and go home to a highly organized hive. They also remember experiences and learn from them.

This all makes it highly unlikely that consciousness is present in quantum phenomena, in rocks, in bacteria, or in plants. They just do not have the machinery it takes to feel something as an integrated and meaningful experience. Where exactly the line is between highly responsive and conscious is probably not sharply defined. There are brains that are exceedingly small, and neural nets that are very rich. But it is also clear that it doesn't take consciousness to experience pain or try to avoid it, (which plants, bacteria, and jellyfish all do). Where is the limit of ethical care, if our criterion shifts from consciousness to pain? Wasn't our amputated leg in pain after the operation above, and didn't we callously ignore its feelings? 

I would suggest that the limit remains that of consciousness, not that of responsiveness to pain. Pain is not problematic because of a reflex reaction. The doctor can tap our knee as often as he wants, perhaps causing pain to our tendon, but not to our consciousness. Pain is problematic because of suffering, which is a conscious construct built around memory, expectations, and models of how things "should" be. While one can easily see that a plant might have certain positive (light, air, water) and negative (herbivores, fungi) stimuli that shape its intrinsic responses to the environment, these are all reflexive, not reflective, and so do not appear (to an admittedly biased observer) to constitute suffering that rises to ethical consideration.

Saturday, March 11, 2023

An Origin Story for Spider Venom

Phylogenetic analysis shows that the major component of spider venom derives from one ancient ancestor.

One reason why biologists are so fully committed to the Darwinian account of natural selection and evolution is that it keeps explaining and organizing what we see. Despite the almost incredible diversity and complexity of life, every close look keeps confirming what Darwin sensed and outlined so long ago. In the modern era, biology has gone through the "Modern Synthesis", bringing genetics, molecular biology, and evolutionary theory into alignment with mutually supporting data and theories. For example, it was Linus Pauling and colleagues (after they lost the race to determine the structure of DNA) who proposed that the composition of proteins (hemoglobin, in their case) could be used to estimate evolutionary relationships, both among those molecules, and among their host species.

Naturally, these methods have become vastly more powerful, to the point that most phylogenetic analyses of the relationship between species (including the definition of what species are, vs subspecies, hybrids, etc.) are led these days by DNA analysis, which provides the richest possible trove of differentiating characters- a vast spectrum from universally conserved to highly (and forensically) varying. And, naturally, it also constitutes a record of the mutational steps that make up the evolutionary process. The correlation of such analyses with other traditionally used diagnostic characters, and with the paleontological record, is a huge area of productive science, which leads, again and again, to new revelations about life's history.


One sample structure of a DRP- the disulfide-rich protein that makes up most of spider venoms. The disulfide bond (between two cysteines) is shown in red. There is usually another disulfide helping to hold the two halves of the molecule together as well. The rest of the molecule is (evolutionarily and structurally) free to change shape and character, in order to carry out its neuron-channel blocking or other toxic function.

One small example was published recently, in a study of spider venoms. Spiders arose, by current estimates, about 375 million years ago, and comprise the second most prevalent form of animal life, after their cousins, the insects. They generally have a hunting lifestyle, using venom to immobilize their prey after capture and before digestion. These venoms are highly complex brews that can have over a hundred distinct molecules, including potassium, acids, tissue- and membrane-digesting enzymes, nucleosides, pore-forming peptides, and neurotoxins. At over three-fourths of the venom, the protein-based neurotoxins are the most interesting and best studied of the venom components, and a spider typically deploys dozens of types in its venom. They are also called cysteine-rich peptides or disulfide-rich peptides (DRPs), due to their composition. The fact that spiders tend to each have a large variety of these DRPs in their collection argues that a lot of gene duplication and diversification has occurred.

A general phylogenetic tree of spiders (left). On the right are the signal peptides of a variety of venoms from some of these species. The identity of many of these signal sequences, which are not present in the final active protein, is a sign that these venom genes were recently duplicated.

So where do they come from? Sequences of the peptides themselves are of limited assistance, being small (averaging ~60 amino acids) and under extensive selection to diversify. But they are processed from larger proteins (pro-proteins), and their genes show better conservation, providing the present authors more material for their evolutionary studies. The figure above, for example, shows, on the far right, the signal peptides from families of these DRP genes from single species. Signal peptides are the small leading section of a translated protein that directs it to be secreted rather than being kept inside the cell. Right after the protein is trafficked to the right place, this signal is clipped off, and thus it is not part of the mature venom protein. These signal peptides tend to be far more conserved than the mature venom protein, despite the fact that they have little to do- just send the protein to the right place, which can be accomplished by all sorts of sequences. This contrast is a sign that the venoms are under positive evolutionary pressure- to be more effective, to extend the range of possible victims, and to overcome whatever resistance the victims might evolve against them.

Indeed, these authors show specifically that strong positive selection is at work, which is one more insight that molecular data can provide: first, by comparing the rates of change at protein-coding positions that are neutral under the genetic code (synonymous) vs those that alter the protein sequence (non-synonymous); and second, by the pattern and tempo of evolution of venom sequences compared with the mass of neutral sequences of the species.

"Given their significant sequence divergence since their deep-rooted evolutionary origin, the entire protein-coding gene, including the signal and propeptide regions, has accumulated significant differences. Consistent with this hypothesis, the majority of positively selected sites (~96%) identified in spider venom DRP toxins (all sites in Araneomorphae, and all but two sites in Mygalomorphae) were restricted to the mature peptide region, whereas the signal and propeptide regions harboured a minor proportion of these sites (1% and 3%, respectively)."

 

Phylogenetic tree (left), connecting up venom genes from across the spider phylogeny. On right, some of the venom sequences are shown just by their cysteine (C) locations, which form the basic structural scaffold of these proteins (top figure).


The more general phylogenetic analysis from all their sequences tells these authors that all the venom DRP genes, from all spider species, came from one origin. One easy way to see this is in the image above on the right, where just the cysteine scaffolds of these proteins from around the phylogeny are lined up, showing that this scaffold is very highly conserved, regardless of the rest of the sequence. This finding (which confirms prior work) is surprising, since venoms of other animals, like snakes, tend to incorporate a motley bunch of active enzymes and components, sourced from a variety of ancestral sources. So to see spiders sticking so tenaciously to this fundamental structure and template for the major component of their venom is impressive- clearly it is a very effective molecule. The authors point out that cone snails, another notorious venom-maker, originated much more recently (about 45 million years ago) and show the same pattern of using one ancestral form to evolve a diversified blizzard of venom components, which have been of significant interest to medical science.
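Lining up sequences by their cysteine scaffold, as in the figure, amounts to comparing the spacing between successive cysteines while ignoring everything else. A minimal sketch, with invented toy sequences rather than real venom DRPs:

```python
# Reduce a protein sequence to its cysteine "scaffold": the gaps (in
# residues) between successive cysteines. Two sequences with the same
# scaffold share the conserved disulfide framework even if every other
# residue differs.

def cys_scaffold(seq):
    pos = [i for i, aa in enumerate(seq) if aa == "C"]
    return [pos[j + 1] - pos[j] - 1 for j in range(len(pos) - 1)]

# Invented sequences that differ everywhere except the scaffold:
v1 = "GCKLTWDCRASGCCSKLWPCN"
v2 = "ANCPRFMVCQETNCCTYHGRCD"
print(cys_scaffold(v1))   # [5, 4, 0, 5]
print(cys_scaffold(v2))   # [5, 4, 0, 5]
```

This is exactly why the figure can print the sequences as runs of C's with filler between them: the scaffold, not the filler, is what evolution has conserved.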


  • Example: a spider swings a bolas to snare a moth.

Saturday, February 11, 2023

A Gene is Born

Yes, genes do develop out of nothing.

The "intelligent" design movement has long made a fetish of information. As science has found, life relies on encoded information for its genetic inheritance and the reliable expression of its physical manifestations. The ID proposition is, quite simply, that all this information could not have developed out of a mindless process, but only through "design" by a conscious being. Evidently, Darwinian natural selection still sticks on some people's craw. Michael Behe even developed a pseudo-mathematical theory about how, yes, genes could be copied mindlessly, but new genes could never be conjured out of nothing, due to ... information.

My understanding of information science equates information to loss of entropy, and sets a minimal cost in energy needed to create, compute, or transmit information- that is, the Shannon limits. A quite different concept comes from physics, in the form of information conservation in places like black holes. This form of information is really the implicit information of the wave functions and states of physical matter, not anything encoded or transmitted in the sense of biology or communication. Physical state information may be indestructible (and un-create-able) on this principle, but coded information is an entirely different matter.
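On the energetic side, Landauer's principle puts a concrete floor under the cost of erasing information: k_B·T·ln 2 per bit. A quick order-of-magnitude check, assuming room temperature:

```python
# Landauer's bound: minimum energy dissipated to erase one bit of
# information is k_B * T * ln(2). An order-of-magnitude check only.

from math import log

K_B = 1.380649e-23   # Boltzmann constant, J/K

def landauer_joules_per_bit(temp_kelvin):
    return K_B * temp_kelvin * log(2)

e = landauer_joules_per_bit(300)   # ~room temperature
print(f"{e:.2e} J per bit")        # ~2.87e-21 J
```

The point is that this cost is minuscule- cells awash in ATP have energy to burn, so thermodynamics poses no barrier at all to copying or creating coded information.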

In a parody of scientific discussion, intelligent design proponents are hosted by the once-respectable Hoover Institution for a discussion about, well, god.

So the fecundity that life shows in creating new genes out of existing genes, (duplications), and even making whole-chromosome or whole-genome duplications, has long been a problem for creationists. Energetically, it is easy to explain as a mere side-effect of having plenty of energy to work with, combined with error-prone methods of replication. But creationistically, god must come into play somewhere, right? Perhaps it comes into play in the creation of really new genes, like those that arise from nothing, such as at the origin of life?

A recent paper discussed genes in humans that have over our recent evolutionary history arisen from essentially nothing. It drew on prior work in yeast that elegantly laid out a spectrum or life cycle of genes, from birth to death. It turns out that there is an active literature on the birth of genes, which shows that, just like duplication processes, it is entirely natural for genes to develop out of humble, junky precursors. And no information theory needs to be wheeled in to show that this is possible.

Yeast provides the tools to study novel genes in some detail, with rich genetics and lots of sequenced relatives, near and far. Here is portrayed a general life cycle of a gene: born out of non-gene DNA sequences (left), crossing the key threshold of translation, and then subject to normal natural selection ("Exposed") for some function. But if that function decays or is replaced, the gene may also die by mutation, becoming a pseudogene and eventually just more genomic junk.

The death of genes is quite well understood. The databases are full of "pseudogenes" that are very similar to active genes, but are disabled for some reason, such as a truncation somewhere or loss of reading frame due to a point mutation or splicing mutation. Their annotation status is dynamic, as they are sometimes later found to be active after all, under obscure conditions or to some low level. Our genomes are also full of transposons and retroviruses that have died in this fashion, by mutation.
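How little it takes to kill a gene can be shown with a toy example (a hypothetical sequence, not any real gene): a single base substitution creates a premature in-frame stop codon, truncating the protein and starting the gene down the road to pseudogene status.

```python
# Toy illustration of gene death by point mutation: one G->T substitution
# converts the codon GAA into the stop codon TAA, truncating the reading frame.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_stop(seq: str) -> int:
    """Return the codon index of the first in-frame stop codon, or -1 if none."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return -1

gene = "ATGGCTGAACGTATTCCGGGTCTGAAA"  # hypothetical intact frame: ATG GCT GAA ...
dead = gene[:6] + "T" + gene[7:]      # point mutation: codon GAA becomes stop TAA

print(first_stop(gene))  # -1 (no premature stop)
print(first_stop(dead))  # 2  (stop at the third codon: a nascent pseudogene)
```

Pseudogene annotation pipelines do essentially this comparison at scale, flagging gene copies whose reading frames are interrupted relative to their intact relatives.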

Duplications are also well understood; some have, over evolutionary time, given rise to huge families of related proteins, such as kinases, odorant receptors, or zinc-finger transcription factors. But the hunt for genes that have developed out of non-gene material is a relatively new area, due to its technical difficulty. Genome annotators were originally content to pay attention to genes that coded for a hundred amino acids or more, and to ignore everything else. That became untenable when a huge variety of non-coding RNAs came on the scene, and when occasional very small protein-coding genes turned up in work that found them by their functional effects.

As genome annotation progressed, it became apparent that, while a huge proportion of genes are conserved between species, (or are members of families of related proteins), other genes have no relatives at all, and will never yield information by this highly convenient route of computer analysis. They are orphans, and must either have been so heavily mutated since divergence that their relationships have become unrecognizable, or have arisen recently, (that is, since the evolutionary divergence from the related species used for sequence comparison), from novel sources that provide no clue about their function. Finer analysis of ever more closely related species is often informative in these cases.

The recent paper on human novel genes makes the finer point that splicing and export from the nucleus constitute the major threshold between junk genes and "real" genes. Once an RNA gets out of the nucleus, any reading frame it may have will be translated and exposed to selection. So the acquisition of splicing signals is a key step, in their argument, to get a randomly expressed bit of RNA over the threshold.

This paper provides a remarkable example of novel gene origination. It uncovered a series of 74 human genes that are not shared with macaque, (which the authors took as their reference), that have a clear path of origin from non-coding precursors, and some of which have significant biological effects on human development. The authors point to a gradual process whereby promiscuous transcription of the genome gave rise by chance to RNAs that acquired splice sites, which piped them into the nuclear export machinery and out to the cytoplasm. Once there, they could be translated over whatever small coding region they might possess, after which selection could operate on their small protein products. A few appear to have gained enough function to encourage expansion of the coding region, resulting in growth of the gene and its entrenchment in the developmental program.
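The raw material for this process is surprisingly abundant. As a toy sketch (mine, not the paper's), scanning a stretch of random "junk" DNA shows that short open reading frames arise constantly by chance, so any transcript that reaches the cytoplasm already offers ribosomes something to translate:

```python
import random

# Toy sketch: count open reading frames (in-frame ATG ... stop) in random DNA.
# In random sequence, stop codons occur at ~3/64 per codon, so ORFs averaging
# ~20 codons are expected to be common.
STOPS = {"TAA", "TAG", "TGA"}

def orf_lengths(seq: str) -> list:
    """Lengths (in codons, ATG included) of all ORFs in the three forward frames."""
    lengths = []
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        start = None
        for j, codon in enumerate(codons):
            if codon == "ATG" and start is None:
                start = j                      # open a reading frame
            elif codon in STOPS and start is not None:
                lengths.append(j - start)      # close it at the stop codon
                start = None
    return lengths

random.seed(1)  # deterministic "junk" DNA for the demonstration
junk = "".join(random.choice("ACGT") for _ in range(30000))
orfs = orf_lengths(junk)
print(len(orfs), max(orfs))  # hundreds of short ORFs arise purely by chance
```

None of these chance ORFs encodes anything functional, of course; the point is only that translation-ready raw material is free, and selection can then do the sorting.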

Brain "organoids" grown from genetically manipulated human stem cells. On the left is the control; in the middle, ENSG00000205704 has been deleted; on the right, it is over-expressed. The result is very striking, as an evolutionarily momentous effect of a tiny and novel gene.

One gene, "ENSG00000205704", is shown as an example. Where in macaque the corresponding genomic region encodes at best a non-coding RNA that is not exported from the nucleus, in humans it produces a spliced and exported mRNA encoding a protein of 107 amino acids. This gene is highly expressed in the human brain, and when the researchers deleted it in embryonic stem cells and used those cells to grow "organoids", or clumps of brain-like tissue, growth was significantly reduced by the knockout and increased by over-expression of the gene. What this gene does is completely unknown. Its sequence, not being related to anything else in human or other species, gives no clue. But it is a classic example of a gene that arose from nothing to have what looks like a significant effect on human evolution. Does that somehow violate physics or math? Nothing could be farther from the truth.
