Showing posts with label genetics. Show all posts

Saturday, August 24, 2024

Aging and Death

Our fate was sealed a very long time ago.

Why do we die? It seems like a cruel and wasteful way to run a biosphere, not to mention a human life. After we have accumulated a lifetime of experience and knowledge, we age, decline, and sign off, whether to go to our just reward, or into oblivion. What is the biological rationale and defense for all this, which the biblical writers assigned to the fairy tale of the snake and the apple?

A recent paper ("A unified framework for evolutionary genetic and physiological theories of aging") discusses evolutionary theories of aging, but in typical French fashion, is both turgid and uninteresting. Aging is widely recognized as the consequence of natural selection, or more precisely, the lack thereof after organisms have finished reproducing. Thus we are at our prime in early adulthood, when we seek mates and raise young. Evolutionarily, it is all downhill from there. In professional sports, athletes are generally over the hill at 30, retiring around 35. Natural selection is increasingly irrelevant after we have done the essential tasks of life- surviving to mate and reproduce. We may participate in our communities, and do useful things, but from an evolutionary perspective, genetic problems at this phase of life have much less impact on reproductive success than those that hit earlier. 

All this is embodied in the "disposable soma" theory of aging, which is that our germ cells are the protected jewels of reproduction, while the rest of our bodies are, well, disposable, and thus experience all the indignities of age once their job of passing on the germ cells is done. The current authors try to push another "developmental" theory of aging, which posits that the tradeoffs between youth and age are not so much the resources or selective constraints focused on germ cell propagation vs the soma, but that developmental pathways are, by selection, optimized for the reproductive phase of life, and thus may be out of tune for later phases. Some pathways are over-functional, some under-functional for the aged body, and that imbalance is sadly uncorrected by evolution. Maybe I am not doing justice to these ideas, which maybe feed into therapeutic options against aging, but I find this distinction uncompelling, and won't discuss it further.

A series of unimpressive distinctions in the academic field studying aging from an evolutionary perspective.

Where did the soma arise? Single cell organisms are naturally unitary- the same cell that survives also mates and is the germ cell for the next generation. There are signs of aging in single cell organisms as well, however. In yeast, "mother" cells have a limited lifespan and ability to put out daughter buds. Even bacteria have "new" and "old" poles, the latter of which accumulate inclusion bodies of proteinaceous junk, which apparently doom the older cell to senescence and death. So all cells are faced with processes that fail over time, and the only sure bet is to start as a "fresh" cell, in some sense. Plants have taken a distinct path from animals, by having bodies and death, yes, but being able to generate germ cells from mature tissues instead of segregating them very early in development into stable and distinct gonads.

Multicellularity began innocently enough. Take slime molds, for example. They live as independent amoebae most of the time, but come together to put out spores, when they have used up the local food. They form a small slug-like body, which then grows a spore-bearing head. Some cells form the spores and get to reproduce, but most don't, being part of the body. The same thing happens with mushrooms, which leave a decaying mushroom body behind after releasing their spores. 

We don't shed a lot of tears for the mushrooms of the world, which represent the death throes of their once-youthful mycelia. But that was the pattern set at the beginning- that bodies are cells differentiated from the germ cells, that provide some useful, competitive function, at the cost of being terminal, and not reproducing. Bodies are forms of both lost energy and material, and lost reproductive potential from all those extra cells. Who could have imagined that they would become so ornate as to totally overwhelm, in mass and complexity, the germ cells that are the point of the whole exercise? Who could have imagined that they would gain feelings, purposes, and memories, and rage against the fate that evolution had in store for them?

On a more mechanistic level, aging appears to arise from many defects. One is the accumulation of mutations, which in somatic cells lead to defective proteins being made and defective regulation of cell processes. An extreme form is cancer, as is progeria. Bad proteins and other junk like odd chemicals and chemically modified cell components can accumulate, which is another cause of aging. Cataracts are one example, where the proteins in our lenses wear out from UV exposure. We have quite intricate trash disposal processes, but they can't keep up with everything, as we have learned from the advent of modern chemistry and its many toxins. Another cause is more programmatic: senescent cells, which are aged-out and have the virtue that they are blocked from dividing, but have the defect that they put out harmful signals to the immune system that promote inflammation, another general cause of aging.

Aging research has not found a single magic bullet, which makes sense from the evolutionary theory behind it. A few things may be fixable, but mostly the breakdowns were never meant to be remedied or fixed, nor can they be. In fact, our germ cells are not completely immune from aging either, as we learn from older fathers whose children have higher rates of autism. We as somatic bodies are as disposable as any form of packaging, getting those germ cells through a complicated, competitive world, and on to their destination.


Saturday, August 17, 2024

Oh, to Be Normal

It is a greater accomplishment than commonly appreciated.

The popular media make a fetish of condemning normality. Chase your dream, dare to be different, don't settle for average. Well, that is laudable, and appropriate for the occasional genius, but militates against much larger forces toward uniformity. Just look at styles in clothing, cars, architecture. "Keeping up" with fashions and the times is a marker of, not just normality, but of being alive and part of the larger social community. Achieving normal means not being fossilized in wig and breeches, or bell bottoms. The period of middle school and high school is when these pressures are most acute, as children find places in the wider society, staking their claim with clothing and all the other markers of being "normal". Especially against parents, who have by this time fallen a little back in their ability or desire to keep up with current standards.

But the point I am more interested in is genetic. In genetic terms, normal is typically stated as "wild-type", which is the opposite of mutant. Any particular gene or trait can be construed as normal or defective, with the possibility of being improved in some way over the "wild-type" being exceptionally rare. But summed over an entire genome, one can appreciate that not every gene can be normal. We all have mutations, and thus deviate from normal. In this sense, normality is an impossible, unattainable standard, and as anyone can observe, we all labor under some kind of deficiency. The only question is how severe those deficiencies are, relative to others, and relative to the minimum level of competence we need to survive.


That is where these two threads come together. Young people are continually competing and testing each other for fitness, gauging each other's ability to keep up with the high standard that constitutes "normal" for a culture. It is the beauty queens, and the popular kids, who find themselves at the top of the heap, shining standards of normality in a sea of mediocrity and deficiency. At least until they find out that they might have other, less visible weaknesses, like, perhaps, alcoholism. 

So, not to be all conformist about it, but for all the praise showered on diversity and innovation, there is a lot to be said for standards of normality, which are rather higher than they seem. They actually set significant challenges for everyone to aspire to. They represent, for example, a wide gamut of competencies that undergird society- the ability of people to get along in professional and intimate settings, and the basic knowledge and judgement needed for a democratic political system. Making up for one's deficiencies turns out to be a life-long quest, just as significant as making use of extraordinary gifts or pursuing competitive excellence in some chosen field.


Saturday, July 27, 2024

Putting Body Parts in Their Places

How HOX genes run development, on butterfly wings.

I have written about the HOX complex of genes several times, because they constitute a grail of developmental genetics- genes that specify the identity of body parts. They occupy the middle of a body plan cascade of gene regulation, downstream from broader specifiers for anterior/posterior orientation, regional and segment specification, and in turn upstream of many more genes that specify the details of organ and tissue construction. Each of the HOX genes encodes a transcriptional regulator, and the name of one says it all- antennapedia. In fruit flies, where all this was first discovered, loss of antennapedia converts some legs into antennae, and extra expression of antennapedia converts antennae on the head into legs.

The HOX complex (named for the homeobox DNA binding motif of the proteins they encode) is linear, arranged from head-affecting genes (labial, proboscipedia) to abdomen-affecting genes (abdominal A, abdominal B; evidently the geneticist's flair for naming ran out by this point). This arrangement is almost universally conserved, and turns out to reflect molecular mechanisms operating on the complex. That is, it "opens" in a progressive manner during development, on the chromosome. Repression of chromatin is a very common and sturdy way to turn genes off, and tends to affect nearby genes, in a spreading effect. So it turns out to be easy, in some sense, to set up the HOX complex to have this chromatin repression lifted in a segmental fashion, by upstream regulators, whereby only the head sections are allowed to be expressed in head tissues, but all the genes are allowed to be expressed in the final abdominal segment. That is why the unexpected expression of antennapedia, which is the fifth of eight HOX genes, in the head, leads to a thoracic tissue (legs) forming on the head.
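The progressive-opening logic described above can be sketched as a toy model. The gene order is the fly order given in the text; everything else (the three coarse body regions, how far the cluster is opened in each, the function names, and the rule that the most posterior expressed gene sets identity) is a simplifying assumption for illustration, not something taken from the research literature.

```python
# Toy model of progressive HOX cluster opening (illustrative assumptions throughout).
HOX_ORDER = ["lab", "pb", "Dfd", "Scr", "Antp", "Ubx", "abd-A", "Abd-B"]

# Hypothetical: number of genes, counted from the head end, whose chromatin
# repression has been lifted in each body region.
OPEN_UP_TO = {"head": 4, "thorax": 5, "abdomen": 8}

def expressed_genes(region, ectopic=()):
    """Genes available in a region: the opened stretch of the cluster,
    plus any genes forced on experimentally (ectopic expression)."""
    opened = HOX_ORDER[:OPEN_UP_TO[region]]
    return [g for g in HOX_ORDER if g in opened or g in ectopic]

def identity_gene(region, ectopic=()):
    """Simplifying assumption: the most posterior expressed gene dictates identity."""
    return expressed_genes(region, ectopic)[-1]

print(identity_gene("head"))                    # Scr
print(identity_gene("head", ectopic=["Antp"]))  # Antp
```

In this sketch, forcing Antp on in the head makes the head read out the same identity gene as the thorax, which is the cartoon version of legs growing where antennae should be.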

A recent paper delved a little more deeply into this story, using butterflies, which have a normal linearly conserved HOX cluster and are easy to diagnose for certain body part transformations (called homeotic) on their beautiful wings. The main thing these researchers were interested in is the genetic elements that separate one part of the HOX cluster from other parts. These are boundary or "insulator" elements that separate topologically associated domains (called TADs). Each HOX gene is surrounded by various regulatory enhancer and inhibitor sites in the DNA that are bound by regulatory proteins. And it is imperative that these sites be directed only to the intended gene, not neighboring genes. That is why such TADs exist, to isolate the regulation of genes from others nearby. There are now a variety of methods to map such TADs, by looking at where chromatin (histones) is open or closed, or where DNA can be cut by enzymes in the native chromatin, or where crosslinks can be formed between DNA molecules, and others.

The question posed here was whether a boundary element, if deleted, would cause a homeotic transformation in the butterflies they were studying. They found, unfortunately, that it was impossible to generate whole animals with the deletions and other mutations they were engineering, so they settled for injecting the CRISPR mutational molecules into larval tissues and watching how they affected the adults in mosaic form, with some mutant tissues, some wild-type. The boundary they focused on was between antennapedia (Antp) and ultrabithorax (Ubx), and the tissues were the forewings, where Ubx is normally off, and hindwings, where Ubx is normally on. Using methods to look at the open state of chromatin, they found that the Ubx gene is dramatically opened in hindwings, relative to forewings. Nevertheless, the boundary remains in place throughout, showing that there is a pretty strong isolation from Antp to Ubx, though they are next door and a couple hundred thousand basepairs apart. Which in genomic terms is not terribly far, while it leaves plenty of space for enhancers, promoters, introns, boundary elements, and other regulatory paraphernalia.

Analysis of the site-to-site chromosomal closeness and accessibility across the HOX locus of the butterfly Junonia coenia. The genetic loci are noted at the bottom, and the site-to-site hit rates are noted in the top panels, with blue for low rates of contact, and orange/red for high rates of contact. At top is the forewing, and at bottom is the hindwing, where Ubx is expressed, thus the high open-ness and intra-site contact within its topological domain (TAD). Yet the boundary between Ubx and Antp to its left (dotted lines at bottom) remains very strong in both tissues. In green is a measure of transcription from this DNA, in differential terms hindwing minus forewing, showing the strong repression of Ubx in the forewing, top panel.

The researchers naturally wanted to mutate the boundary element (Antp-Ubx_BE), which they deduced lay at a set of binding sites (featuring CCCTC) for the protein CTCF, a well-known insulating boundary regulator. Note, interestingly, that in the image above, the last exon (blue) of Ubx (transcription goes right to left) lies across the boundary element, and in the topological domain of the Antp gene. This means that while all the regulatory apparatus of Ubx is located in its own domain, on the right side, it is OK for transcription to leak across- that has no regulatory implications.

Effects of removing the boundary element between Ubx and Antp. Detailed description is in the text below. 

Removal of this boundary element, using CRISPR technology in portions of the larval tissues, had the expected partial effects on the larval, and later adult, wings of this butterfly. First, note that in panel D insets, the wild type larval forewing shows no expression of Ubx (green), while the wild type hind wing shows wide-spread expression. This is the core role of the HOX locus and the Ubx gene- locate its expression in the correct body parts to then induce the correct tissues to develop. The larval wing tissue of the mosaic mutant, also in D, shows, in the forewing, extensive patchy expression of Ubx. This is then reflected in the adult (different animals) in the upper panels, in the mangled eyespot of the fully formed wing (center panel, compared to wild-type forewing and hindwing to each side). It is a small effect, but then these are small mutations, done in only a fraction of the larval cells, as well.

So here we are, getting into the nuts and bolts of how body parts are positioned and encoded. There are large regions around these genes devoted to regulatory affairs, including the management of chromatin repression, the insulation of one region from another, the enhancer and repressor sites that integrate myriad upstream signals (i.e. other DNA binding proteins) to come up with the detailed pattern of expression of these HOX genes. Which in turn control hundreds of other genes to execute the genetic program. This program can hardly be thought of as a blueprint, nor a "design" in anyone's eye, divine or otherwise. It resembles much more a vast pile of computer code that has accreted over time with occasional additions of subroutines, hacks, duplicated bits, and accidental losses, adding up to a method for making a body that is robust in some respects to the slings and arrows of fortune, but naturally not to mutations in its own code.


Saturday, July 13, 2024

The Long Tail of Genome Duplication

A new genomic sequence of hagfish tells us a little about our origins.

Hagfish- not a fish, and not very pretty, but it occupies a special place in evolution, as a vertebrate that diverged very early (along with lampreys, forming the cyclostome branch) from the jawed vertebrates (the gnathostome branch). The lamprey has been central to studies of the blood clotting system, which is a classic story of gradual elaboration over time, with more steps added to the cascade, enabling faster clotting and finer regulation.

A highly schematic portrayal (not to scale!) of the evolutionary history of animal life on earth.

A recent paper reported a full genome sequence of hagfish, and came up with some interesting observations about the history of vertebrate genomes. At about three billion nucleotides, this genome is about as large as ours. (Yet again, size doesn't seem to matter much, when it comes to genomes.) They confirm that lampreys and hagfish make up a single lineage, separate from all other animals and especially from the jawed vertebrates. For example, though lampreys have 84 chromosomes to the hagfish's 17, this resulted from repeated splitting of chromosomes, and each lamprey chromosome can be mostly mapped to one hagfish chromosome, accepting that a lot of other gene movement and change has taken place in the roughly 460 million years since these lineages diverged.

Hagfish (bottom) and lamprey (top) chromosomes pretty much line up, indicating that despite the splitting of the lamprey genome, there hasn't been a great deal of shuffling over the intervening 460 million years.

The most important parts of this paper are on the history of genome duplications that happened during this early phase of vertebrate evolution. Whole genome duplications are an extremely powerful engine of change, supplying the organism with huge amounts of new genetic material. Over time, most of the duplicated genes are discarded again (in a process they call re-diploidization). But many are not, if they have gained some foothold in providing more of an important product, or differentiated themselves from each other in some other way. Our genomes are full of families, some extremely large, of related genes that have finely differentiated functions. Many of these copies originated in long-ago genome duplications, while others originated in smaller duplication accidents. It is startling to hear from self-labeled scientists in the so-called intelligent design movement that there is some rule or law against such copying of information, by their ridiculous theories of specified information. Hagfish certainly never heard of such a thing.

At any rate, these researchers confirm that the earliest vertebrate lineage, around 530 million years ago, experienced two genome duplications which led to a large increment of new genes and evolutionary innovation. What they find now is that the cyclostome lineage experienced another, three-fold genome duplication near its origin, about 460 million years ago, leading to another round of copies and innovation. And lastly, the gnathostome lineage separately experienced its own four-fold genome duplication around the same time, after it had diverged from the cyclostome lineage. One might say that the gnathostomes made better use of their genomic manna, generating jaws, teeth, ears, thymus, better immune systems, and the other features that led them to win the race of the animal kingdom. But hagfish are still around, showing that primitive forms can find a place in the scheme of things, as the biosphere gets larger and more diverse over time.

A classic example of gene replication is the Hox cluster, which is a set of genes that have the power of dictating what body part occurs where. They are gene regulators that function in the middle of the developmental sequence, after determination of the overall body axis and segmentation, and themselves regulate downstream genes governing features as they occur in different segments, such as limbs, parts of the head, fingers, etc. Flies have one Hox cluster, split into two parts. The extremely primitive chordate amphioxus, which far predates the cyclostomes, also has one complete Hox cluster, as diagrammed below. Most other vertebrates, including us, have four Hox clusters, amounting to over thirty of these transcription regulators. These four clusters arose from the inferred genome duplications very early in the vertebrate lineage, prior to the advent of the cyclostomes.

Hox clusters and their origins, as inferred by the current authors. The red/blue points at the left mark whole genome duplications (or more) that have been inferred by these or other authors. More description is in the main text below.

The inferred genome duplications during early chordate evolution, noted on the far left of the diagram above, led to duplicated clusters of Hox genes. Amphioxus (top) is the earliest branching chordate, and has only one full Hox cluster of transcription regulators, which, in general terms, control, during development, the expression of body parts along the body axis, with the order of genes in the cluster paralleling expression and action along the body axis. Chicken as a gnathostome has four copies of the cluster, with a few of the component genes lost over time. Hagfish have six copies of this Hox cluster, some rather skeletal, stemming from their genome duplication events. Clearly several whole clusters have also been lost, as in some cases the genome duplications experienced by the cyclostomes resolved back to diploidy without leaving an extra copy of this cluster. The net effect is to allow all these organisms greater options for controlling the identity and form of different parts of the body, particularly, in the case of gnathostomes, the head.

Genome duplications are one of those fast events in evolution that are highly influential, unlike the usual slow and steady selection and optimization that is the rule in the Darwinian theory. Unlike mass extinction, another kind of fast event in evolution, genome duplications are highly constructive, providing fodder on a mass (if microscopic) scale for new functions and specializations that help account for some of the more rapid events in the history of life, such as the rise of chordates and then vertebrates in the wake of the Cambrian explosion.


Sunday, March 31, 2024

Nominee for Most Amazing Protein: RAD51

On the repair and resurrection of DNA, which gets a lot of help from a family of proteins including RAD51, DMC1, and RecA.

Proteins do all sorts of amazing things, from composing pores that can select a single kind of ion- even just a proton- to allow across a membrane, to massive polymerizing enzymes that synthesize other proteins, DNA, and RNA. There is really no end to it. But one of the most amazing, even incredible, things that happens in a cell is the hunt for DNA homology. Even over a genome of billions of base pairs, it is possible for one DNA segment to find the single other DNA segment that matches it. This hunt is executed for several reasons. One is to line up the homologous chromosomes at meiosis, and carry out the genetic cross-overs between them (when they are lined up precisely) that help scramble our genetic lineages for optimal mix-and-matching during reproduction. Another is for DNA repair, which is best done with a good copy for reference, especially when a full double-strand break has happened. Just this week, a fascinating article showed that memories in our brains depend in some weird way on DNA breaks occurring in neurons, some of which then use the homologous repair process, including homology search, to patch things up.

The protein that facilitates this DNA homology search is deeply conserved in evolution. It is called RecA in bacteria, radA and radB in archaea, and the RAD51 family in eukaryotes. Naturally, the eukaryotic family is most closely related to the archaeal versions (RAD51 and DMC1 evolving from radA, and a series of other, and poorly understood family members, from radB). In this post, I will mostly just call them all RAD51, unless I am referring to DMC1 specifically. The name comes from genetic screens for radiation-sensitive mutants in yeast and other eukaryotes, since RAD51 plays a crucial role in DNA repair, as noted above. RAD51 is not a huge protein, but it is an ATPase. It binds to itself, forming linear filaments with ATP at the junction points between units. It binds to a single strand of DNA, which is going to be what does the hunting. And it binds, in a complicated way, to another double-stranded DNA, which it helps to open briefly to allow its quality as a target to be evaluated.

This diagram describes the repair of double strand breaks (DSB) in DNA. First the ends are covered with a bunch of proteins that signal far and wide that something terrible has happened- the cell cycle has to stop.. fire engines need to be called. One of these proteins is RPA, which simply binds all over single-stranded DNA and protects it. Then the RAD51 protein comes in, displaces RPA, and begins the homology search process. The second DNA shown, in dark black, doesn't just happen, but is hunted for high and low throughout the nucleus to find the exact homolog of the broken end. When that exact match is found, the repair process can proceed, with continued DNA synthesis through the lesion, and resolution of the newly repaired double strands, either to copy up the homolog version, or exchange versions (GC, for gene conversion). 

This diagram shows how the notorious (when mutated) tumor suppressor gene BRCA2 (in green) works. It binds RAD51 (in blue) and brings it, chain-gang style, to the breakpoints of DNA damage to speed up and specify repair.


There have been several structural studies by this point that clarify how RAD51 does its thing. ATP is simply required to form filaments on single-stranded DNA. When a match has been found and RAD51 is no longer needed, ATP is cleaved, and RAD51 falls off, back to reserve status. The magic starts with how RAD51 binds the single stranded DNA. One RAD51 binds for every ~3 bases in the DNA, and it binds the phosphate backbone, so that the bases are nicely exposed in front, and all stretched out, ready to hunt for matching DNA.

A series of RAD51 molecules (in this case, RecA from bacteria) bound sequentially to single-stranded DNA (red). Note the ATP homolog chemicals in yellow, positioned between each protein unit. One can see that the DNA is stretched out a bit and the bases point outwards.

A closeup view of one of the RAD51 units from above, showing how the bases of the DNA (yellow) are splayed out into the medium, ready to find their partners. They are arranged in orientations similar to how they sit in normal (B-form) DNA, further enhancing their ability to find partners.

The second, and more mysterious part of the operation is how RAD51 scans double-stranded DNA throughout the genome. It has binding sites for double-stranded DNA, away from the single-stranded DNA, and then it also has a little finger that splits open the double-stranded DNA, encouraging separation and allowing one strand to face up to the single stranded DNA that is held firmly by the RAD51 polymer. The transient search happens in eight-base increments, with tighter capture of the double-strand DNA happening when nine bases are matched, and commitment to recombination or repair happening when a match of fifteen bases is found.
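The staged thresholds quoted above (sampling in eight-base steps, capture at nine matched bases, commitment at fifteen) can be illustrated with a small string-matching sketch. This is only a cartoon under stated assumptions: the function names and example sequences are made up, the scan is a simple left-to-right walk rather than diffusion, and the probe is compared against the duplex strand that carries the same sequence, which stands in for pairing with the complementary strand.

```python
# Cartoon of a staged homology search (all names and sequences are hypothetical).
SAMPLE, CAPTURE, COMMIT = 8, 9, 15  # base counts quoted in the text

def match_length(probe, target, start):
    """Count consecutive identical bases between probe and target from `start`."""
    n = 0
    while n < len(probe) and start + n < len(target) and probe[n] == target[start + n]:
        n += 1
    return n

def classify(n):
    """Map a contiguous match length onto the stages described in the text."""
    if n >= COMMIT:
        return "commit"
    if n >= CAPTURE:
        return "capture"
    if n >= SAMPLE:
        return "sample"
    return "release"

def homology_search(probe, target):
    """Return (position, state) of the first committed site, else the best site seen."""
    best_pos, best_state, best_n = None, "release", 0
    for pos in range(len(target) - SAMPLE + 1):
        n = match_length(probe, target, pos)
        if classify(n) == "commit":
            return pos, "commit"
        if n > best_n:
            best_pos, best_state, best_n = pos, classify(n), n
    return best_pos, best_state

probe = "ATGCGTACGTTAGCCTAGGA"   # 20-base single strand held by the filament
decoy = probe[:10]               # partial homology: enough to capture, not to commit
genome = "CCCC" + decoy + "CCCC" + probe + "CCCC"

print(homology_search(probe, genome))                   # (18, 'commit')
print(homology_search(probe, "CCCC" + decoy + "CCCC"))  # (4, 'capture')
```

The decoy site is held briefly but never reaches commitment, while the full-length homolog does, which is the point of having graded thresholds rather than a single all-or-nothing test.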

These structures show an intermediate where a double-stranded DNA (ends in teal and lavender, and separated DNA segments in green and red) has been captured, making a twelve base match with the stable single-stranded DNA (brown). Note how the double-stranded DNA ends are held by outside portions of the RAD51 protein. Closeup on the right shows the dangling, non-paired DNA strand in red, and the newly matched duplex DNA with green-brown colored base interactions.

These structures can only give a hint of what is going on, since the whole process relies so clearly on the Brownian motion that allows super-rapid diffusion of the stabilized single-strand DNA+RAD51 over the genome, which it scans efficiently in one-dimensional fashion, despite all the chromatin and other proteins parked all over the place. And while the structures provide insight into how the process happens, it remains incredible that this search can happen, on what is clearly a quite reliable basis, day in and day out, as our genomes get hit by whatever the environment throws at us.

"Unfortunately, most RAD51 and RAD51 paralog point mutations that have been clinically identified are classified as variants of unknown significance (VUSs). Future studies to reclassify these RAD51 gene family VUSs as pathogenic or benign are desperately needed, as many of these genes are now included on hereditary breast and ovarian cancer screening panels. Reclassification of HR-deficient VUSs would enable these patients to benefit from therapies that specifically target HR deficiency, as do poly(ADP)-ribose polymerase (PARP) inhibitors in BRCA1/2-deficient cells."

Lastly, one paper made the point that clinicians need better understanding of the various mutations that can affect RAD51 itself. Genetic testing now is able to find all of our mutations, but we don't always know what each mutation is capable of doing. Thus deeper studies of RAD51 will have beneficial effects on clinical diagnosis, when particular mutations can be assigned as disease-causing, thus justifying specific therapies that would otherwise not be attempted.


Saturday, December 23, 2023

How Does Speciation Happen?

Niles Eldredge and the theory of punctuated equilibrium in evolution.

I have been enjoying "Eternal Ephemera", which is an end-of-career memoir/intellectual history from a leading theorist in paleontology and evolution, Niles Eldredge. In this genre, often of epic proportions and scope, the author takes stock of the historical setting of his or her work and tries to put it into the larger context of general intellectual progress, (yes, as pontifically as possible!), with maybe some gestures towards future developments. I wish more researchers would write such personal and deeply researched accounts, of which this one is a classic. It is a book that deserves to be in print and more widely read.

Eldredge's claim to fame is punctuated equilibrium, the theory (or, perhaps better, observation) that evolution occurs much more haltingly than in the majestic gradual progression that Darwin presented in "Origin of Species". This is an observation that comes straight out of the fossil record. And perhaps the major point of the book is that the earliest biologists, even before Darwin, but also including Darwin, knew about this aspect of the fossil record, and were thus led to concepts like catastrophism and "etagen". Only Lamarck had a steadfastly gradualist view of biological change, which Darwin eventually took up, while replacing Lamarck's mechanism of intentional/habitual change with that of natural selection. Eldredge unearths tantalizing and, to him, supremely frustrating, evidence that Darwin was fully aware of the static nature of most fossil series, and even recognized the probable mechanism behind it (speciation in remote, peripheral areas), only to discard it for what must have seemed a clearer, more sweeping theory. But along the way, the actual mechanism of speciation got somewhat lost in the shuffle.

Punctuated equilibrium observes that most species recognized in the fossil record do not gradually turn into their descendants, but are replaced by them. Eldredge's subject of choice is trilobites, which have a long and storied record for almost 300 million years, featuring replacement after replacement, with species averaging a few million years duration each. It is a simple fact, but one that is a bit hard to square with the traditional / Darwinian and even molecular account of evolution. DNA is supposed to act like a "clock", with constant mutational change through time. And natural selection likewise acts everywhere and always... so why the stasis exhibited by species, and why the apparently rapid evolution in between replacements? That is the conundrum of punctuated equilibrium.

There have been a lot of trilobites. This comes from a paper about their origin during the Cambrian explosion, arguing that only about 20 million years was enough for their initial speciation (bottom of image).

The equilibrium part, also termed stasis, is seen in the current / recent world as well as in the fossil record. We see species such as horses, bison, and lions that are identical to those drawn in cave paintings. We see fossils of animals like wildebeest that are identical to those living, going back millions of years. And we see unusual species in recent fossils, like saber-toothed cats, that have gone extinct. We do not typically see animals that have transformed over recent geological history from one (morphological) species into another, or really, into anything very different at all. A million years ago, wildebeest seem to have split off a related species, the black wildebeest, and that is about it.

But this stasis is only apparent. Beneath the surface, mutations are constantly happening and piling up in the genome, and selection is relentlessly working to ... do something. But what? This is where the equilibrium part comes in, positing that wide-spread, successful species are so hemmed in by the diversity of ecologies they participate in that they occupy a very narrow adaptive peak, which selection works to keep the species on, resulting in apparent stasis. It is a very dynamic equilibrium. The constant gene flow among all parts of the population that keeps the species marching forward as one gene pool, despite the ecological variability, makes it impossible to adapt to new conditions that do not affect the whole range. Thus, paradoxically, the more successful the species, and the more prominent it is in the fossil record, the less change will be apparent in those fossils over time.

The punctuated part is that these static species in the fossil record eventually disappear and are replaced by other species that are typically similar, but not the same, and do not segue from the original in a gradual way that is visible in the fossil record. No, most species and locations show sudden replacement. How can this be so if evolution by natural selection is true? As above, wide-spread species are limited in what selection can do. Isolated populations, however, are more free to adapt to local conditions. And if one of those local conditions (such as arctic cold) happens to be what later happens to the whole range (such as an ice age), then it is more likely that a peripherally (pre-)adapted population will take over the whole range, than that the resident species adapts with sufficient speed to the new conditions. Range expansion, for the peripheral species, is easier and faster than adaptation, for the wide-ranging originating species.

The punctuated equilibrium proposition came out in the 1970's, and naturally followed theories of speciation by geographic separation that had previously come out (also resurrected from earlier ideas) in the 1930's to 1950's, but which had not made much impression (!) on paleontologists. Paleontologists are always grappling with the difficulties of the record, which is partial, and does not preserve a lot of what we would like to know, like behavior, ecological relationships, and mutational history. But they did come to agree that species stasis is a real thing, not just, as Darwin claimed, an artifact of the incomplete fossil record. Granted- if we had fossils of all the isolated and peripheral locations, which is where speciation would be taking place by this theory, we would see the gradual change and adaptation taking place. So there are gaps in the fossil record, in a way. But as long as we look at the dominant populations, we will rarely see speciation taking place before our eyes, in the fossils.

So what does a molecular biologist have to say about all this? As Darwin insisted early in "Origin", we can learn quite a bit from domesticated animals. It turns out that wild species have a great amount of mostly hidden genetic variation. This is apparent whenever one is domesticated and bred for desired traits. We have bred dogs, for example, to an astonishingly wide variety of traits. At the same time, we have bred them out to very low genetic diversity. Many breeds are saddled with genetic defects that can not be resolved without outbreeding. So we have in essence exchanged the vast hidden genetic diversity of a wild species for great visible diversity in the domesticated species, combined with low genetic diversity.

What this suggests is that wild species have great reservoirs of possible traits that can be selected for the purposes of adaptation under selective conditions. Which suggests that speciation in range edges and isolated environments can be very fast, as the punctuated part of punctuated equilibrium posits. And again, it reinforces the idea that during equilibrium with large populations and ranges, species have plenty of genetic resources to adapt and change, but spend those resources reinforcing / fine tuning their core ecological "franchise", as it were.

In population genetics, it is well known that mutations arise and fix (that is, spread to 100% of the population on both alleles) at the same rate no matter how large the population, in theory. That is to say- bigger populations generate more mutations, but correspondingly hide them better in recessive form (if deleterious) and for neutral mutations, take much longer to allow any individual mutation to drift to either extinction or fixation. Selection against deleterious mutations is more relentless in larger populations, while relaxed selection and higher drift can allow smaller populations to explore wider ranges of adaptive space, perhaps finding globally higher (fitness) peaks than the parent species could find.
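The arithmetic behind that size-independence can be sketched in a few lines. This is only an illustration, assuming a diploid population of N individuals, a neutral mutation rate `MU` per gene copy per generation (the value here is hypothetical), and the standard textbook results that a new neutral mutation fixes with probability 1/(2N) and, when it does fix, takes roughly 4N generations to do so. The function names are made up for this example.

```python
from fractions import Fraction

def new_mutations_per_generation(n_individuals, mu):
    """Expected new neutral mutations entering a diploid population each generation."""
    return 2 * n_individuals * mu

def fixation_probability(n_individuals):
    """Chance that one new neutral copy drifts to fixation: its starting frequency."""
    return Fraction(1, 2 * n_individuals)

def substitution_rate(n_individuals, mu):
    """Neutral substitutions per generation: mutation supply times fixation chance."""
    return new_mutations_per_generation(n_individuals, mu) * fixation_probability(n_individuals)

def approx_generations_to_fix(n_individuals):
    """Rough mean time for a neutral mutation that does fix (textbook 4N approximation)."""
    return 4 * n_individuals

MU = Fraction(1, 10**8)  # hypothetical per-copy mutation rate, for illustration only

for n in (100, 10_000, 1_000_000):
    # The rate column is the same for every N; only the waiting time grows.
    print(n, substitution_rate(n, MU), approx_generations_to_fix(n))
```

The population size cancels out of the rate, leaving just the mutation rate, while the time any single mutation needs to drift to fixation scales with N.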

Eldredge cites some molecular work that claims that at least twenty percent of sequence change in animal lineages is due specifically to punctuational events of speciation, and not to the gradual background accumulation of mutations. What could explain this? The actual mutation rate is not at issue, (though see here), but the numbers of mutations retained, perhaps due to relaxed purifying selection in small populations, and founder effects and positive selection during the speciation process. This kind of phenomenon also helps to explain why the DNA "clock" mentioned above is not at all regular, but quite variable, making an uneven guide to dating the past.

Humans are another good example. Our species is notoriously low in genetic diversity, compared to most wild species, including chimpanzees. It is evident that our extremely low population numbers (over prehistoric time) have facilitated speciation, (that is, the fixation of variants which might be swamped in bigger populations), which has resulted in a bewildering branching pattern of different hominid forms over the last few million years. That makes fossils hard to find, and speciation hard to pinpoint. But now that we have taken over the planet with a huge population, our bones will be found everywhere, and they will be largely static for the foreseeable future, as a successful, wide-spread species (barring engineered changes). 

I think this all adds up to a reasonably coherent theory that reconciles the rest of biology with the fossil record. However, it remains frustratingly abstract, given the nature of fossils that rarely yield up the branching events whose rich results they record.


Saturday, December 9, 2023

The Way We Were: Origins of Meiosis and Sex

Sex is as foundational for eukaryotes as are mitochondria and internal membranes. Why and how did it happen?

Sexual reproduction is a rather expensive proposition. The anxiety, the dating, the weddings- ugh! But biologically as well, having to find mates is no picnic for any species. Why do we bother, when bacteria get along just fine just dividing in two? This is a deep question in biology, with a lot of issues in play. And it turns out that bacteria do have quite a bit of something-like-sex: they exchange DNA with each other in small pieces, for reasons similar to ours. But the eukaryotic form of sex is uniquely powerful and has supported the rapid evolution of eukaryotes to be by far the dominant domain of life on earth.

A major enemy of DNA-encoded life is mutation. Despite the many DNA replication accuracy and repair mechanisms, some rate of mutation still occurs, and is indeed essential for evolution. But for larger genomes, the mutation rate always exceeds the replication rate, (and the purifying natural selection rate), so that damaging mutations build up and the lineage will inevitably die out without some help. This process is called Muller's ratchet, and is why all organisms appear to exchange DNA with others in their environment, either sporadically like bacteria, or systematically, like eukaryotes.
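A toy simulation can make the ratchet concrete. This is a minimal sketch, not a model from any of the papers discussed: it assumes a purely clonal population with no DNA exchange and no back-mutation, and the population size, mutation probability, and fitness cost per mutation are all hypothetical values chosen for illustration.

```python
import random

def ratchet(pop_size=200, mut_prob=0.3, cost=0.02, generations=300, seed=1):
    """Track the least-mutated class in an asexual population with no DNA exchange."""
    rng = random.Random(seed)
    loads = [0] * pop_size  # count of deleterious mutations carried by each individual
    least_loaded = []
    for _ in range(generations):
        # Selection: parents are drawn in proportion to fitness (1 - cost) ** load.
        weights = [(1 - cost) ** k for k in loads]
        parents = rng.choices(loads, weights=weights, k=pop_size)
        # Clonal reproduction: each offspring inherits its parent's load and may
        # gain one new deleterious mutation; nothing ever removes a mutation.
        loads = [k + (1 if rng.random() < mut_prob else 0) for k in parents]
        least_loaded.append(min(loads))
    return least_loaded

history = ratchet()
print(history[0], history[-1])
```

Because offspring can never carry fewer mutations than their parent, the load of the best class in the population can only stay put or click upward, which is the ratchet. Recombination or DNA uptake is what would let a lineage rebuild a less-loaded genome.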

An even worse enemy of the genome is unrepaired damage like complete (double strand) breaks in the DNA. These stop replication entirely, and are fatal. These also need to be repaired, and again, having extra copies of a genome is the way to allow these to be fixed, by processes like homologous recombination and gene conversion. So having access to other genomes has two crucial roles for organisms- allowing immediate repair, and allowing some way to sweep out deleterious mutations over the longer term.

Our ancestors, the archaea, which are distinct from bacteria, typically have circular, single molecule genomes, in multiple copies per cell, with frequent gene conversions among the copies and frequent exchange with other cells. They routinely have five to twenty copies of their genome, and can easily repair any immediate damage using those other copies. They do not hide mutant copies like we do in a recessive allele, but rather by gene conversion (which means, replicating parts of a chromosome into other ones, piecemeal) make each genome identical over time so that it (and the cell) is visible to selection, despite their polyploid condition. Similarly, taking in DNA from other, similar cells uses the target cells' status as live cells (also visible to selection) to ensure that the recipients are getting high quality DNA that can repair their own defects or correct minor mutations. All this ensures that their progeny are all set up with viable genomes, instead of genomes riddled with defects. But it comes at various costs as well, such as a constant race between getting a lethal mutation and finding the DNA that might repair it.

Both mitosis and meiosis were eukaryotic innovations. In both, the chromosomes all line up for orderly segregation to descendants. But meiosis engages in two divisions, and features homolog synapsis and recombination before the first division of the parental homologs.

This is evidently a precursor to the process that led, very roughly 2.5 billion years ago, to eukaryotes, but is all done on a piecemeal basis, nothing like what we do now as eukaryotes. To get to that point, the following innovations needed to happen:

  • Linearized genomes, with centromeres and telomeres, and >1 number of chromosomes.
  • Mitosis to organize normal cellular division, where multiple chromosomes are systematically lined up and distributed 1:1 to daughter cells, using extensive cytoskeletal rearrangements and regulation.
  • Mating with cell fusion, where entire genomes are combined, recombined, and then reduced back to a single complement, and packaged into progeny cells.
  • Synapsis, as part of meiosis, where all sister homologs are lined up, damaged to initiate DNA repair and crossing-over.
  • Meiosis division one, where the now-recombined parental homologs are separated.
  • Meiosis division two, which largely follows the same mechanisms as mitosis, separating the reshuffled and recombined sister chromosomes.

This is a lot of novelty on the path to eukaryogenesis, and is just a portion of the many other innovations that happened in this lineage. What drove all this, and what were some plausible steps in the process? The advent of true sex generated several powerful effects:

  1. A definitive solution to Muller's ratchet, by exposing every locus in a systematic way to partial selection and sweeping out deleterious mutations, while protecting most members of the population from those same mutations. Continual recombination of the parental genomes allows beneficial mutations to separate from deleterious ones and be differentially preserved.
  2. Mutated alleles are partially, yet systematically, hidden as recessive alleles, allowing selection when they come into homozygous status, but also allowing them to exist for limited time to buffer the mutation rate and to generate new variation. This vastly increases accessible genetic variation.
  3. Full genome-length alignment and repair by crossing over is part of the process, correcting various kinds of damage and allowing accurate recombination across arbitrarily large genomes.
  4. Crossing over during meiotic synapsis mixes up the parental chromosomes, allowing true recombination among the parental genomes, beyond just the shuffling of the full-length chromosomes. This vastly increases the power of mating to sample genetic variation across the population, and generates what we think of as "species", which represent more or less closed interbreeding pools of genetic variants that are not clones but diverse individuals.

The time point of 2.5 billion years ago is significant because this is the general time of the great oxidation event, when cyanobacteria were finally producing enough oxygen by photosynthesis to alter the geology of earth. (However, our current level of atmospheric oxygen did not come about until almost two billion years later, with the rise of land plants.) While this mainly prompted the logic of acquiring mitochondria, either to detoxify oxygen or use it metabolically, some believe that it is relevant to the development of meiosis as well.

There was a window of time when oxygen was present, but the ozone layer had not yet formed, possibly generating a particularly mutagenic environment of UV irradiation and reactive oxygen species. Such higher mutagenesis may have pressured the archaea mentioned above to get their act together- to not distribute their chromosomes so sporadically to offspring, to mate fully across their chromosomes, not just pieces of them, and to recombine / repair across those entire mated chromosomes. In this proposal, synapsis, as seen in meiosis I, had its origin in a repair process that solved the problem of large genomes under mutational load by aligning them more securely than previously. 

It is notable that one of the special enzymes of meiosis is Spo11, which induces the double-strand breaks that lead to crossing-over, recombination, and the chiasmata that hold the homologs together during the first division. This DNA damage happens at quite high rates all over the genome, and is programmed, via the structures of the synaptonemal complex, to favor crossing-over between (parental) homologs vs duplicate sister chromosomes. Such intensive repair, while now aimed at ensuring recombination, may have originally had other purposes.

Alternately, others suggest that it is larger genome size that motivated this innovation. This origin event involves many gene duplication events that ramified the capabilities of the symbiotic assemblage. Such gene duplications would naturally lead to recombinational errors in traditional gene conversion models of bacterial / archaeal genetic exchange, so there was pressure to generate a more accurate whole-genome alignment system that confined recombination to the precise homologs of genes, rather than to any similar relative that happened to be present. This led to the synapsis that currently is part of meiosis I, but it is also part of "parameiosis" systems in some eukaryotes, which, while clearly derived, might resemble primitive steps to full-blown meiosis.

It has long been apparent that the mechanisms of meiosis division one are largely derived from (or related to) the mechanisms used for mitosis, via gene duplications and regulatory tinkering. So these processes (mitosis and the two divisions of meiosis) are highly related and may have arisen as a package deal (along with linear chromosomes) during the long and murky road from the last archaeal ancestor to the last common eukaryotic ancestor, which possessed a much larger suite of additional innovations, from mitochondria to nuclei, mitosis, meiosis, cytoskeleton, introns / mRNA splicing, peroxisomes, other organelles, etc.

Modeling of different mitotic/meiotic features. All cells modeled have 18 copies of a polyploid genome, with a newly evolved process of mitosis. Green = addition of crossing over / recombination of parental chromosomes, but no chromosome exchange. Red = chromosome exchange, but no crossing over. Blue = both crossing over and chromosome exchange, as occurs now in eukaryotes. The Y axis is fitness / survival and the X axis is time in generations after start of modeling.

A modeling paper points to the quantitative benefits of mitosis when combined with the meiotic suite of innovations. They suggest that in a polyploid archaean lineage, the establishment of mitosis alone would have had revolutionary effects, ensuring accurate segregation of all the chromosomes, and that this would have enabled differentiation among those polyploid chromosome copies, since they would each be faithfully transmitted individually to offspring (assuming all, instead of one, were replicated and transmitted). Thus they could develop into different chromosomes, rather than remain copies. This would, as above, encourage meiosis-like synapsis over the whole genome to align all the (highly similar) genes properly.

"Modeling suggests that mitosis (accurate segregation of sister chromosomes) immediately removes all long-term disadvantages of polyploidy."

Additional modeling of the meiotic features of chromosome shuffling, and recombination between parental chromosomes, indicates (shown above) that these are highly beneficial to long-term fitness, which can rise instead of decaying with time, per the various benefits of true sex as described above. 

The field has definitely not settled on one story of how meiosis (and mitosis) evolved, and these ideas and hypotheses are tentative at this point. But the accumulating findings that the archaea that most closely resemble the root of the eukaryotic (nuclear) tree have many of the needed ingredients, such as active cytoskeletons, a variety of molecular antecedents of ramified eukaryotic features, and now extensive polyploidy to go with gene conversion and DNA exchange with other cells, make the momentous gap from archaea to eukaryotes somewhat narrower.


Saturday, May 20, 2023

On the Spectrum

Autism, broader autism phenotype, temperament, and families. It turns out that everyone is on the spectrum.

The advent of genomic sequencing and the hunt for disease-causing mutations has been notably unhelpful for most mental diseases. Possible or proven disease-causing mutations pile up, but they do little to illuminate the biology of what is going on, and even less towards treatment. Autism is a prime example, with hundreds of genes now identified as carrying occasional variants with causal roles. The strongest of these variants affect synapse formation among neurons, and a second class affects long-term regulation of transcription, such as turning genes durably on or off during developmental transitions. Very well- that all makes a great deal of sense, but what have we gained?

Clinically, we have gained very little. What is affected are neural developmental processes that can't be undone, or switched off in later life with a drug. So while some degree of understanding slowly emerges from these studies, translating that to treatment remains a distant dream. One aspect of the genetics of autism, however, is highly informative, which is the sheer number of low-effect and common mutations. Autism can be thought of as coming in two types, genetically- those due to a high effect, typically spontaneous or rare mutation, and those due to a confluence of common variants. The former tends to be severe and singular- an affected child in a family that is otherwise unaffected. The latter might be thought of as familial, where traits that have appeared (mildly) elsewhere in the family have been concentrated in one child, to a degree that it is now diagnosable.

This pattern has given rise to the very interesting concept of the "Broader Autism Phenotype", or BAP. This stems from the observation that families of autistic children have higher rates where ... "the parents, grandparents, and collaterals are persons strongly preoccupied with abstractions of a scientific, literary, or artistic nature, and limited in genuine interest in people." Thus there is not just a wide spectrum of autism proper, based on the particular confluence of genetic and other factors that lead to a diagnosis and its severity, but there is also, outside of the medical spectrum, quite another spectrum of traits or temperaments which tend toward autism and comprise various eccentricities, but have not, at least to date, been medicalized.


The common nature of these variants leads to another question- why are they persistent in the population? It is hard to believe that such a variety and number of variations are exclusively deleterious, especially when the BAP seems to have, well, rather positive aspects. No, I would suggest that an alternative way to describe BAP is "an enhanced ability to focus", and develop interests in salient topics. Ever meet people who are technically useless, but warm-hearted? They are way off on the non-autistic part of the spectrum, while the more technically inclined, the fixers of the world and scholars of obscure topics, are more towards the "ability to focus" part of the spectrum. Only when such variants are unusually concentrated by the genetic lottery do children appear with frank autistic characteristics, totally unable to deal with social interactions, and given to obsessive focus and intense sensitivities.

Thus autism looks like a more general lens on human temperament and evolution, being the tip of a very interesting iceberg. As societies, we need the politicians, backslappers, networkers, and con men, but we also need, indeed increasingly as our societies and technologies developed over the centuries, people with the ability and desire to deal with reality- with technical and obscure issues- without social inflection, but with highly focused attention. Militaries are a prime example, fusing critical needs of managing and motivating people, with a modern technical base of vast scope, reliant on an army of specialists devoted to making all the machinery work. Why does there have to be this tradeoff? Why can't everyone be James Bond, both technically adept and socially debonair? That isn't really clear, at least to me, but one might speculate that in the first place, dealing with people takes a great deal of specialized intelligence, and there may not be room for everything in one brain. Secondly, the enhanced ability to focus on technical or artistic topics may actively require, as is implicit in doing science and as was exemplified by Mr. Spock, an intentional disregard of social niceties and motivations, if one is to fully explore the logic of some other, non-human, world.


Saturday, February 11, 2023

A Gene is Born

Yes, genes do develop out of nothing.

The "intelligent" design movement has long made a fetish of information. As science has found, life relies on encoded information for its genetic inheritance and the reliable expression of its physical manifestations. The ID proposition is, quite simply, that all this information could not have developed out of a mindless process, but only through "design" by a conscious being. Evidently, Darwinian natural selection still sticks in some people's craw. Michael Behe even developed a pseudo-mathematical theory about how, yes, genes could be copied mindlessly, but new genes could never be conjured out of nothing, due to ... information.

My understanding of information science equates information to loss of entropy, and expresses a minimal cost of the energy needed to create, compute or transmit information- that is, the Shannon limits. A quite different concept comes from physics, in the form of information conservation in places like black holes. This form of information is really the implicit information of the wave functions and states of physical matter, not anything encoded or transmitted in the sense of biology or communication. Physical state information may be indestructible (and un-create-able) on this principle, but coded information is an entirely different matter.

In a parody of scientific discussion, intelligent design proponents are hosted by the once-respectable Hoover Institution for a discussion about, well, god.

So the fecundity that life shows in creating new genes out of existing genes, (duplications), and even making whole-chromosome or whole-genome duplications, has long been a problem for creationists. Energetically, it is easy to explain as a mere side-effect of having plenty of energy to work with, combined with error-prone methods of replication. But creationistically, god must come into play somewhere, right? Perhaps it comes into play in the creation of really new genes, like those that arise from nothing, such as at the origin of life?

A recent paper discussed genes in humans that have over our recent evolutionary history arisen from essentially nothing. It drew on prior work in yeast that elegantly laid out a spectrum or life cycle of genes, from birth to death. It turns out that there is an active literature on the birth of genes, which shows that, just like duplication processes, it is entirely natural for genes to develop out of humble, junky precursors. And no information theory needs to be wheeled in to show that this is possible.

Yeast provides the tools to study novel genes in some detail, with rich genetics and lots of sequenced relatives, near and far. Here is portrayed a general life cycle of a gene, from birth out of non-gene DNA sequences (left) into the key step of translation, and on to a subject of normal natural selection ("Exposed") for some function. But if that function decays or is replaced, the gene may also die, by mutation, becoming a pseudogene, and eventually just some more genomic junk.

The death of genes is quite well understood. The databases are full of "pseudogenes" that are very similar to active genes, but are disabled for some reason, such as a truncation somewhere or loss of reading frame due to a point mutation or splicing mutation. Their annotation status is dynamic, as they are sometimes later found to be active after all, under obscure conditions or to some low level. Our genomes are also full of transposons and retroviruses that have died in this fashion, by mutation.

Duplications are also well-understood, some of which have over evolutionary time given rise to huge families of related proteins, such as kinases, odorant receptors, or zinc-finger transcription factors. But the hunt for genes that have developed out of non-gene materials is a relatively new area, due to its technical difficulty. Genome annotators were originally content to pay attention to genes that coded for a hundred amino acids or more, and ignore everything else. That became untenable when a huge variety of non-coding RNAs came on the scene. Also, occasional cases of very small genes that encoded proteins came up from work that found them by their functional effects.

As genome annotation progressed, it became apparent that, while a huge proportion of genes are conserved between species, (or members of families of related proteins), other genes had no relatives at all, and would never provide information by this highly convenient route of computer analysis. They are orphans, and must have either been so heavily mutated since divergence that their relationships have become unrecognizable, or have arisen recently (that is, since their evolutionary divergence from related species that are used for sequence comparison) from novel sources that provide no clue about their function. Finer analysis of ever more closely related species is often informative in these cases.

The recent paper on human novel genes makes the finer point that splicing and export from the nucleus constitute the major threshold between junk genes and "real" genes. Once an RNA gets out of the nucleus, any reading frame it may have will be translated and exposed to selection. So the acquisition of splicing signals is a key step, in their argument, to get a randomly expressed bit of RNA over the threshold.

A recent paper provided a remarkable example of novel gene origination. It uncovered a series of 74 human genes that are not shared with macaque, (which they took as their reference), have a clear path of origin from non-coding precursors, and some of which have significant biological effects on human development. They point to a gradual process whereby promiscuous transcription from the genome gave rise by chance to RNAs that acquired splice sites, which piped them into the nuclear export machinery and out to the cytoplasm. Once there, they could be translated, over whatever small coding region they might possess, after which selection could operate on their small protein products. A few appear to have gained enough function to encourage expansion of the coding region, resulting in growth of the gene and entrenchment as part of the developmental program.

Brain "organoids" grown from genetically manipulated human stem cells. On left is the control, in middle is where ENSG00000205704 was deleted, and on the right is where ENSG00000205704 is over-expressed. The result is very striking, as an evolutionarily momentous effect of a tiny and novel gene.

One gene, "ENSG00000205704", is shown as an example. Where in macaque, the genomic region corresponding to this gene encodes at best a non-coding RNA that is not exported from the nucleus, in humans it encodes a spliced and exported mRNA that encodes a protein of 107 amino acids. In humans it is also highly expressed in the brain, and when the researchers deleted it in embryonic stem cells and used those cells to grow "organoids", or clumps of brain-like tissue, the growth was significantly reduced by the knockout, and increased by the over-expression of this gene. What this gene does is completely unknown. Its sequence, not being related to anything else in human or other species, gives no clue. But it is a classic example of a gene that arose from nothing to have what looks like a significant effect on human evolution. Does that somehow violate physics or math? Nothing could be farther from the truth.

  • Will nuclear power get there?
  • What the heck happened to Amazon shopping?

Saturday, February 4, 2023

How Recessive is a Recessive Mutation?

Many relationships exist between mutation, copy number, and phenotype.

The traditional setup of Mendelian genetics is that an allele of a gene is either recessive or dominant. Blue eyes are recessive to brown eyes, for the simple reason that blue arises from the absence of an enzyme, due to a loss of function mutation. So having some of that enzyme, from even one "brown" copy of that gene, is dominant over the defective "blue" copy. You need two "blue" alleles to have blue eyes. This could be generalized to most genes, especially essential genes, where lacking both copies is lethal, while having one working copy will get you through, and cover for a defective copy. Most gene mutations are, by this model, recessive. 

But most loci and mutations implicated in disease don't really work like that. Some recent papers delved into the genetics of such mutations, and observed that their recessiveness was all over the map, a spectrum, really, of effects from fully recessive to dominant, with most in the middle ground. This is informative for clinical genetics, but also for evolutionary studies, suggesting that evolution is not, after all, blind to the majority of mutations, which are mostly deleterious, exist most of the time in the heterozygous (one-copy) state, and would be wholly recessive by the usual assumption.

The first paper describes a large study over the Finnish population, which benefited from several advantages. First, Finns have a good health system with thorough records which are housed in a national biobank. The study used 177,000 health records and 83,000 variants in coding regions of genes collected from sequencing studies. Second, the Finnish population is relatively small and has experienced bottlenecks from smaller founding populations, which amplifies the prevalence of variants that those founders had. That allows those variants to rise to higher frequencies, especially in the homozygous state, which generally causes more noticeable disease phenotypes. Both the detectability and the statistics were powered by this higher incidence of some deleterious mutations (while others, naturally, would have been more rare than the world-wide average, or absent altogether).

Third, the authors emphasize that they searched for various levels of recessive effect, which is contrary to the usual practice of just assuming a linear effect. A linear model says that one copy of a mutation has half the effect of two copies- which is true sometimes, but not most of the time, especially in more typical cases of recessive effect where one copy has a good deal less effect, if not zero. Returning to eye color, if one looks in detail, there are many shades of eyes, even of blue eyes, so it is evident that the alleles that affect eye color are various, and express to different degrees (have variable expressivity, in the parlance). While complete recessiveness happens frequently, it is not the most common case, since we do not routinely express excess amounts of proteins from our genes, making loss of one copy noticeable most of the time, to some degree. This is why the lack of a whole chromosome, or an excess of a whole chromosome, has generally devastating consequences. Among the autosomes, trisomies of only three chromosomes (13, 18, and 21) are compatible with live birth, and all confer severe syndromes.
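To make the difference between these assumptions concrete, here is a minimal Python sketch of how a genotype might be scored under a linear (additive), a fully recessive, and a fully dominant model. The function name `encode_genotype` and the 0-to-1 scoring are illustrative assumptions of mine, not the coding used in the paper.

```python
def encode_genotype(copies: int, model: str = "additive") -> float:
    """Score a genotype by how much of the full two-copy effect it is assumed to carry.

    copies: number of variant alleles carried (0, 1, or 2).
    model:  "additive" (one copy = half the effect),
            "recessive" (one copy = no effect),
            "dominant" (one copy = full effect).
    Hypothetical helper for illustration only.
    """
    if copies not in (0, 1, 2):
        raise ValueError("copies must be 0, 1, or 2")
    if model == "additive":
        return copies / 2
    if model == "recessive":
        return 1.0 if copies == 2 else 0.0
    if model == "dominant":
        return 1.0 if copies >= 1 else 0.0
    raise ValueError(f"unknown model: {model}")


# The three models agree on 0 and 2 copies and differ only for the heterozygote.
for model in ("additive", "recessive", "dominant"):
    print(model, [encode_genotype(c, model) for c in (0, 1, 2)])
```

The whole question of "how recessive" a mutation is comes down to where the heterozygote's score falls between the recessive 0 and the dominant 1.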

A population proportion plot vs age of disease diagnosis for three different diseases and an associated genetic variant. In blue is the normal ("wild-type") case, in yellow is the heterozygote, and in red the homozygote with two variant alleles. For "b", the total lack of XPA causes skin cancer with juvenile onset, and the homozygous case is not shown. The Finnish data allowed detection of rather small recessive effects from variations that are common in that population. For instance, "a" shows the barely discernible advancement of age of diagnosis for a disease (hearing loss) that in the homozygous state is universal by age 10, caused by mutations in GJB2.

The second paper looked more directly at the fitness cost of variations over large populations, in the heterozygous state. They looked at loss-of-function (LOF) mutations of over 17,000 genes, studying their rate of appearance and loss from human populations, as well as in pedigrees. These rates were turned, by a modeling system, into fitness costs, which are stated in percentage terms, vs wild type. A fitness cost of 1% is pretty mild (though highly significant over longer evolutionary time), while a fitness cost of 10% is quite severe, and one of 100% is immediately lethal and would never be observed in the population. For example, a mutation that is seen rarely, and in pedigrees only persists for a couple of generations, implies a fitness cost of over 10%.

They come up with a parameter "hs", which is the fitness cost "s" of losing both copies of a gene, multiplied by "h", a measure of the dominance of the mutation in a single copy.
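As a rough illustration of what "hs" means, the sketch below uses the textbook one-locus selection model: wild-type fitness of 1, heterozygote fitness of 1 - hs, and homozygote fitness of 1 - s. It assumes random mating, no new mutation, and no genetic drift, and it is not the estimation method of the paper. The function names are my own.

```python
def genotype_fitness(s: float, h: float) -> dict:
    """Relative fitness of each genotype, given homozygous cost s and dominance h."""
    return {
        "wild_type": 1.0,
        "heterozygote": 1.0 - h * s,  # "hs" is the cost of a single mutated copy
        "homozygote": 1.0 - s,
    }


def next_allele_frequency(q: float, s: float, h: float) -> float:
    """Mutant allele frequency after one generation of selection.

    Assumes random mating (Hardy-Weinberg genotype proportions),
    no new mutation, and an effectively infinite population.
    """
    p = 1.0 - q
    w = genotype_fitness(s, h)
    mean_fitness = (
        p * p * w["wild_type"]
        + 2 * p * q * w["heterozygote"]
        + q * q * w["homozygote"]
    )
    return (p * q * w["heterozygote"] + q * q * w["homozygote"]) / mean_fitness


# A rare lethal allele (s = 1) starting at 1% frequency, followed for 20 generations:
# fully recessive (h = 0) versus a 10% heterozygous cost (h = 0.1).
for h in (0.0, 0.1):
    q = 0.01
    for _ in range(20):
        q = next_allele_frequency(q, s=1.0, h=h)
    print(f"h = {h}: frequency after 20 generations = {q:.5f}")
```

When the allele is rare, nearly all copies sit in heterozygotes, so even a modest hs removes it much faster than selection against the rare homozygote alone.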


In these graphs, human genes are stacked up along the Y axis, sorted by their computed "hs" fitness cost in the heterozygous state. Error bars are in blue, showing that this is naturally a rather error-prone exercise of estimation. But what is significant is that most genes are somewhere on the spectrum, with very few having negligible effects (bottom), and many having highly significant effects (top). Genes on the X chromosome are naturally skewed to much higher significance when mutated, since in males there is no other copy, and even in females, one X chromosome is (randomly) inactivated to provide dosage compensation- that is, to match the male dosage of production of X genes- which results in much higher penetrance for females as well.


So the bottom line is that while diploidy helps to hide a lot of variation in sexual organisms, and in humans in particular, it does not hide it completely. We are each estimated to receive, at birth, about 70 new mutations, of which 1/1000 are the kind of total loss of gene function studied here. This work then estimates that 20% of those mutations have a severe fitness effect of >10%, meaning that about one in seventy zygotes carries such a new mutation, not counting what it has inherited from its parents, and will suffer ill effects immediately, even though it has a wild-type copy of that gene as well.
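The "one in seventy" figure follows from multiplying the three estimates quoted above. A quick back-of-the-envelope check in Python, using only those numbers (the variable names are mine):

```python
# Estimates quoted in the text above
new_mutations_per_zygote = 70   # new mutations each of us receives
lof_fraction = 1 / 1000         # share that are total loss-of-function
severe_fraction = 0.20          # share of those with a fitness cost over 10%

# Expected number of new, severe, heterozygous LOF mutations per zygote
expected_severe = new_mutations_per_zygote * lof_fraction * severe_fraction

# For a number this small, it is close to the chance of carrying at least one
one_in = 1 / expected_severe

print(f"expected severe new LOF mutations per zygote: {expected_severe:.3f}")
print(f"roughly one zygote in {one_in:.0f}")
```

Treating the expected count as a probability is an approximation, but at values this small the difference is negligible.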

Humans, like other organisms, have a large mutational load that is constantly under surveillance by natural selection. The fact that severe mutations routinely still have significant effects in the heterozygous state is both good and bad news. Good in the sense that natural selection has more to work with and can gradually whittle down their frequency without necessarily waiting for the chance of two meeting in an unfortunate homozygous state. But bad in the sense that it adds to our overall phenotypic variation and health difficulties a whole new set of deficiencies that, while individually and typically minor, are also legion.