In vitro evolution has interesting things to say about protein structure, evolution, and even AI.
The advent of DNA sequencing has been revolutionary in many ways. It has been technologically transformative, is changing medical practice, has radically validated Darwin's theories of evolution, and has allowed much more accurate phylogenies to be drawn out of the history of life. As Dobzhansky said, nothing in biology makes sense except in the light of evolution. The quest to change those DNA sequences has been another technological frontier, now exemplified by the CRISPR genome editing methods. Geneticists have been inducing mutations forever (well, for over a century), using insults like mustard gas and X-rays. This long-standing tradition is called a "screen", where, after mutagenesis, one looks for particular effects on the resulting organisms, like changes in color, malformations, or defects in development. This is a sort of artificial selection, very highly directed by the experimenter, sometimes resulting in some very weird, if informative, organisms. More recently, biotechnologists have been using directed evolution systems to help develop, through a mix of random and semi-directed mutations, more capable enzymes and other proteins.
But there are many broader questions to ask about the mutational and evolutionary processes. A recent paper demonstrated an interesting mutagenic system hosted in brewer's yeast cells, which can model rapid evolution under a variety of selective constraints. The core of the system is a plasmid, replicated separately from the main genome, by an independent enzyme. This plasmid was found in a distantly related yeast, Kluyveromyces lactis, and encodes its own DNA polymerase that operates independently from the genomic replication system. This opened the way to use the plasmid replication system to host genes of interest and subject them to wildly different (which is to say faster) mutagenic rates than the rest of the organism.
This group has been laboring on this system for several years, and this paper is the culmination: a series of plasmid DNA polymerases that combine extremely high error rates with high replication activity and a balanced spectrum of error types (that is, G>A as well as G>T, etc.). Indeed, they demonstrate that the error rate (about 2 errors for every 10,000 bases replicated) is at the threshold of mutational breakdown. That is the level at which the plasmid's other functions (which are maintained implicitly by purifying selection on activities such as expression of an antibiotic resistance gene/protein and the polymerase itself) are impaired so rapidly that the engineered system cannot survive. The error rate of the host cell, in contrast, is about 1 error for every ten billion bases replicated.
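To put those two error rates side by side, here is a back-of-the-envelope sketch in Python. The rates are the ones quoted above; the gene length is an assumption (398 codons, the TrpB size mentioned later in this post, ignoring the stop codon), and the variable names are purely illustrative.

```python
# Per-base error rates as quoted in the text.
plasmid_error_rate = 2 / 10_000          # about 2 errors per 10,000 bases replicated
genome_error_rate = 1 / 10_000_000_000   # about 1 error per ten billion bases replicated

# How much faster the plasmid mutates than the host genome.
fold_increase = plasmid_error_rate / genome_error_rate

# Assumption: a 398-codon target gene, stop codon ignored.
gene_length_bp = 398 * 3

# Expected number of new errors in one copy of the gene per round of replication.
expected_errors_per_copy = plasmid_error_rate * gene_length_bp

print(f"fold increase over the genome: {fold_increase:,.0f}")
print(f"expected errors per gene copy per replication: {expected_errors_per_copy:.2f}")
```

On these numbers the plasmid mutates on the order of a million times faster than the genome, and a gene of this size picks up a new mutation roughly once every few replications.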
What is the point of all this? While, as pointed out above, directed evolution systems and mutation/selection systems have been around for a long time, this is something quite different. This plasmid system creates high rates of mutation all the time, over a very confined target (the plasmid). The experimenters can then decide what kinds of selection pressure to put on their target gene, if any. They can place a positive selection regime on it, to drive the development of, say, a new substrate specificity for an enzyme. They can put it under negative (purifying) selection to maintain its current activity. Or they can let it spin with no selection at all, letting it degrade into a pseudogene unable to code for anything. All of these scenarios are common in nature and of interest to evolutionary biologists.
In this paper, the authors focus on one enzyme, tryptophan synthase (specifically its TrpB subunit), from a thermophilic bacterium. The aim was to see how this enzyme responded to both positive and negative selective forces in the face of high mutation rates. As it converts one nutrient, indole, into another, tryptophan, this is an enzyme whose activity is easy to assay and to select for. In the main experiment, using many replicate cultures, they started with no selection for fifty generations, then ramped gradually to positive selection over the next hundred generations, and finished with 300 generations of purifying selection.
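The three-phase schedule can be written down as a small lookup, which makes the bookkeeping explicit. This is only a sketch of the schedule as described here: the phase labels, the function names, and especially the linear shape of the ramp on a 0-to-1 scale are assumptions for illustration, not details taken from the paper.

```python
# Phase lengths in generations, as described in the text.
NO_SELECTION = 50
RAMP = 100
PURIFYING = 300
TOTAL_GENERATIONS = NO_SELECTION + RAMP + PURIFYING


def selection_phase(generation: int) -> str:
    """Return the (illustrative) name of the phase a generation falls in."""
    if not 0 <= generation < TOTAL_GENERATIONS:
        raise ValueError("generation is outside the experiment")
    if generation < NO_SELECTION:
        return "none"
    if generation < NO_SELECTION + RAMP:
        return "ramping positive"
    return "purifying"


def selection_stringency(generation: int) -> float:
    """Hypothetical stringency on a 0-1 scale; a linear ramp is assumed."""
    phase = selection_phase(generation)
    if phase == "none":
        return 0.0
    if phase == "ramping positive":
        return (generation - NO_SELECTION) / RAMP
    return 1.0


print(TOTAL_GENERATIONS)
print(selection_phase(100), selection_stringency(100))
```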
Diversity, for one thing, had increased tremendously by the end of this process. At the end, an average of 21 amino acid changes had accumulated, with the most divergent proteins differing by over 60 amino acids, in a protein that started with 398 amino acids total. Secondly, there was a marked migration to net negative charge, which they speculate was due to accommodation of this thermophilic bacterial enzyme to a more temperate environment where it is a bit more difficult to evade agglomeration with other proteins. Third, changes happened more on the outside of the enzyme structure than the interior (image below). This is a very well-known and understood phenomenon, where selective constraints are much higher on interior packing of a protein and on active/catalytic site portions. Several key amino acids that contact the substrate chemicals are colored gray, meaning that they hardly varied at all in this experiment.
| Structure of the TrpB enzyme, color coded for change during the evolution experiment. Note how particularly high rates of change happen in one external region (bottom) that interacts with a partner, TrpA, which was not present here. Also, gray areas with very low change tend to be in the interior and near the catalytic active site (substrate and cofactor [pyridoxal phosphate] shown in black). |
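The divergence figures above are easier to compare with natural protein families when expressed as percent identity to the starting sequence. A quick sketch, using only the numbers quoted in the text (21 changes on average, over 60 at the extreme, out of 398 residues) and assuming every change is a substitution rather than an insertion or deletion:

```python
protein_length = 398   # residues in the starting enzyme
mean_changes = 21      # average amino acid changes at the end
max_changes = 60       # "over 60", so this is a lower bound on the extreme

# Fraction of positions still matching the starting sequence,
# assuming all changes are substitutions.
mean_identity = 1 - mean_changes / protein_length
extreme_identity = 1 - max_changes / protein_length

print(f"average identity to the ancestor: {mean_identity:.1%}")
print(f"most divergent variants: at most {extreme_identity:.1%}")
```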
Overall, the divergence generated here over a few months in one protein approximates the kind of divergence seen between proteins of humans and mice, which have diverged for about sixty million years. The same studies one can do on such naturally diverged proteins (locating selectively important amino acid residues, comparing activities of highly divergent enzymes, or studying structural constraints) can be done here on artificially evolved enzymes. And this is a general system that could be (with appropriate assays and technology) extended to many other proteins and RNAs of interest.
One thing it can't do, however, is validate machine learning models. The researchers tried to get machine learning models that had been trained on this TrpB enzyme to classify their derived mutants. But this was almost completely unsuccessful, since machine learning (AI) systems only regurgitate what they are trained on, and cannot creatively judge novel conditions.
"Although sequences that were predicted to have low fitness did exhibit little or no function in our enrichment assay, we found essentially no correlation between the predicted scores and the real enrichment scores of high-function TrpBs. For example, the highest predicted score was assigned to the nearly nonfunctional TmTriple variant."
It is important to appreciate the significance of this new mutation system, which is far more comprehensive, and a closer model of actual evolution, than the genetic screens of yore. There, one was hunting for the "hopeful monster" resulting from one shot of X-rays that might generate an informative phenotype, maybe by killing a gene needed for red eye color, or amplifying expression of a gene for drug resistance. Here, the levels of negative and positive selection can be subtly adjusted against a background of continuous high mutation pressure, simulating millions of years of evolution and resulting in extensively transformed target molecules.