Can genes arise out of nothing? The intelligent design folks spent a lot of sweat and pseudo-math trying to show that this was absolutely impossible. But here we are anyhow; they got their physics and math wrong. New genes arise all the time, mostly from pre-existing genes, through duplication events. These are rampant, given the capacity of biological systems to replicate their constituent molecules. The human genome carries vast fleets of genes that originated by duplication over evolutionary time: hundreds of zinc finger transcription factors and hundreds of odorant receptors, not to mention tens of thousands of duplicated transposons and viral remnants. And yet, can genes arise from nothing at all?
A recent paper says that yes, many functional genes have come from completely non-functional DNA rather than from pre-existing genes. This is not the same as assembling a gene from the primordial soup, an event that remains difficult to reconstruct even though it was singular in its global impact. Still, the claim does suggest that the long-term plasticity of our genomes and of biological functions is even greater than many biologists appreciate. The researchers use synteny as their touchstone. Synteny is the tendency of genes to stay in the same place on chromosomes over time. On that basis they conclude that most genes lacking homologs in other species arose not by duplication, but by the conversion of some junk DNA to a functional state.
Syntenic relations of some of the human chromosomes, with those of chimpanzee. Lines indicate concordant / homologous positions. Note several massive inversions, and a few smaller segments that have jumped from one location to another. But on the whole, our genomes are highly similar in gross structure.
Humans and chimpanzees have strongly syntenic chromosomes, since we are so closely related. Most chromosomes line up precisely. There are a few dramatic inversions, which are places where a portion of a chromosome in one lineage flipped orientation through recombination. There are also a few gaps and migrations of segments to new locations. This makes it easy to trace which gene in one species is ancestrally related to which gene in the other. And not just genes: all the nearby DNA is lineally related in the same way, even where it is not well conserved, as the cores of genes typically are. The researchers traced lineages in humans, flies, and yeast, benefiting from the large number of genomes now sequenced from closely related species. This allowed them to trace the origin of novel genes that lack homologs in other species. These novel genes nonetheless sit between normal genes that do have homologs. Either such a novel gene arose in place from the materials available, or it came from elsewhere through a duplication or gene conversion event, with recognizable antecedents.
At a gene with no recognizable homolog (green), synteny helps to tell us that its origin was from a pre-existing gene, not from junk DNA.
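The synteny logic can be made concrete with a small sketch. It assumes each genome is reduced to an ordered list of gene names along one chromosome, plus a table of known orthologs. The function `syntenic_counterpart` and its `max_gap` cut-off are hypothetical illustrations, not the paper's actual pipeline. For an orphan gene, the function finds where the orthologs of its two flanking neighbours sit in the other species. It then returns whatever lies between them, which is the stretch of DNA one would compare against the orphan.

```python
from typing import Optional


def syntenic_counterpart(
    order_a: list[str],
    order_b: list[str],
    orthologs: dict[str, str],
    orphan: str,
    max_gap: int = 3,
) -> Optional[list[str]]:
    """Return the genes of species B lying between the orthologs of the
    orphan's two flanking neighbours in species A, or None if the flanks
    do not give a conserved (syntenic) anchor. Hypothetical helper."""
    i = order_a.index(orphan)
    if i == 0 or i == len(order_a) - 1:
        return None  # need an anchoring neighbour on each side
    left, right = order_a[i - 1], order_a[i + 1]
    if left not in orthologs or right not in orthologs:
        return None  # a flank has no known ortholog, so no anchor
    pos_b = {gene: k for k, gene in enumerate(order_b)}
    left_b, right_b = orthologs[left], orthologs[right]
    if left_b not in pos_b or right_b not in pos_b:
        return None
    # Sorting the positions tolerates an inversion of the whole block.
    lo, hi = sorted((pos_b[left_b], pos_b[right_b]))
    if hi - lo - 1 > max_gap:
        return None  # flanks too far apart to call the block syntenic
    return order_b[lo + 1:hi]


human = ["A", "B", "X", "C", "D"]
chimp = ["a", "b", "y", "c", "d"]
table = {"A": "a", "B": "b", "C": "c", "D": "d"}
print(syntenic_counterpart(human, chimp, table, "X"))
```

A non-empty result points at the DNA in the same position of the other genome. An empty list means the flanking genes are adjacent there, so the corresponding segment has vanished (or never existed). `None` means synteny around the orphan is not conserved well enough to say anything.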
Given all that information, one can then ask: did this gene decay from some known gene that has homologs in many species, and if so, how long did that decay take? At this point we need to define gene similarity. Software programs typically give quantitative answers to how similar two protein sequences, or two nucleotide sequences, are. But there is a twilight zone where similarity is so low that it cannot be recognized computationally, like a game of telephone after too many transfers. That does not mean the two sequences are not lineally related, or even that they lack the same function. There are many examples of protein pairs with no discernible sequence similarity but very similar structures and functions. So evolution can go places our computers cannot quite follow, though that may change once we solve the protein folding problem.
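This sketch shows, roughly, what a quantitative similarity score looks like. It computes percent identity over two sequences that are assumed to be already aligned, with gaps written as `-`. Real search tools such as BLAST score alignments with substitution matrices and statistical significance rather than raw identity. The 25% cut-off used for the twilight zone here is an illustrative assumption; for proteins it is usually placed somewhere around 20 to 35% identity.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns (ignoring gap columns) with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Illustrative threshold only (assumption); not a value from the paper.
TWILIGHT_ZONE = 25.0


def in_twilight_zone(aligned_a: str, aligned_b: str, threshold: float = TWILIGHT_ZONE) -> bool:
    """True if identity falls below the (assumed) threshold for recognizing homology."""
    return percent_identity(aligned_a, aligned_b) < threshold


print(percent_identity("MKV-LA", "MRVQLA"))
print(in_twilight_zone("MKV-LA", "MRVQLA"))
```

A score below the threshold does not show that two sequences are unrelated. It only means this kind of comparison can no longer tell.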
The researchers show that gene decay is much faster in flies and yeast than in humans. They measure the time it takes for 10% of lineage-ancestral, syntenic genes to decay to unrecognizable similarity. That time is 200 million years in yeast and 400 million years in flies, but an extrapolated 2 billion years in humans. This may be due to the vastly different generation times of these species, considering that meiosis may be the most likely time for genome rearrangements.
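To see what these times imply as rates, here is a sketch that assumes decay follows a simple constant-rate (exponential) model: 1 - exp(-k t) = 0.10. This model is an assumption for illustration, not necessarily the one the authors fit. The times are the figures quoted above, and the human one is itself an extrapolation.

```python
import math

# Times (million years) for 10% of syntenic genes to decay beyond
# recognition, as quoted in the text; the human figure is extrapolated.
TIME_TO_DECAY_MY = {"yeast": 200.0, "fly": 400.0, "human": 2000.0}
DECAYED_FRACTION = 0.10


def implied_rate(time_my: float, fraction: float = DECAYED_FRACTION) -> float:
    """Per-million-year decay rate k, assuming 1 - exp(-k * t) = fraction."""
    return -math.log(1.0 - fraction) / time_my


def fraction_decayed(rate: float, time_my: float) -> float:
    """Expected fraction of genes decayed after time_my under the same model."""
    return 1.0 - math.exp(-rate * time_my)


rates = {species: implied_rate(t) for species, t in TIME_TO_DECAY_MY.items()}
for species, k in rates.items():
    share = fraction_decayed(k, 400.0)
    print(f"{species}: k = {k:.2e} per My, decayed after 400 My = {share:.1%}")
```

Under this assumed model, the ratio of rates is just the inverse ratio of the times, so yeast genes decay about ten times faster than human genes. A 400-million-year window would leave about 19% of yeast genes decayed, but only about 2% of human ones.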
The next question was this: how many of the novel genes across the genome came from this decay of pre-existing genes? And how many did not, instead (by default) originating de novo from the local DNA segment? It is a complicated question, depending on how one calculates similarity and how one models synteny across related species. Consider lineages in which the matching syntenic DNA disappeared rather than decayed. Do they count toward the de novo origin hypothesis, or as similar DNA that supports the decayed gene hypothesis? Since one partner in the homology pair is absent, the analysis depends on having enough other lineages fully sequenced to work out what happened in detail. The authors' conclusion is that, on the whole, only one-third of novel genes arose from decay processes, and the rest arose de novo. That is a stunning conclusion. It is also sort of buried in the paper, which focuses on the decay processes that are easier to analyze and that make up all the figures.
Unfortunately, their logic breaks down when it comes to this conclusion. Yes, genes degrade to various degrees over time when they are not under strong selection for function. That much is given. But their key assumption is that the rate of gene decay they derived at syntenic positions can be extrapolated over the entire genome. That extrapolation yields an expected number of novel genes that arose by decay (let us say X). From a separate analysis they have Y, the number of genes in the entire genome that are novel (or orphan, lacking recognizable relatives). They then claim that Y - X is the number that did not degrade from pre-existing genes, but rather arose de novo from other, non-gene genetic elements. Their estimate is roughly Y = 3X, leaving 2/3 of Y coming from somewhere else, presumably de novo formation. The problem is that degradation of a gene at a syntenic position is a special case. Genes and other sequences are also quite frequently duplicated to distant locations. This is another source of pseudogenes, and ultimately of gene degradation and novel or orphan sequences. The mutation rates that apply to these two cases are likely to differ, because the syntenic case, by definition, never involves gene duplication, at least not in the recent past. Duplication is far more likely than degradation in a syntenic location to lead to an immediate loss of function and of selection.
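The subtraction at the heart of the argument is simple enough to write down. This sketch uses made-up counts (not the paper's data) chosen so that Y = 3X. It then shows how sensitive the inferred de novo share is to X. Suppose the syntenic decay rate undercounts decay among dispersed duplicates, so that the true decay-derived count is, hypothetically, twice as large. The "de novo" remainder then shrinks from two-thirds to one-third.

```python
from fractions import Fraction


def de_novo_share(decayed_x: int, novel_y: int) -> Fraction:
    """Fraction of novel genes left unexplained after subtracting the
    decay-derived count X from the total novel count Y, i.e. (Y - X) / Y."""
    if novel_y <= 0 or not 0 <= decayed_x <= novel_y:
        raise ValueError("need 0 <= X <= Y and Y > 0")
    return Fraction(novel_y - decayed_x, novel_y)


# Hypothetical counts chosen so that Y = 3X, matching the paper's ratio.
novel_y = 300
decayed_x = 100
print(de_novo_share(decayed_x, novel_y))  # 2/3

# Hypothetical: decay among dispersed duplicates is missed, and the true
# decay-derived count is twice the syntenic extrapolation.
print(de_novo_share(2 * decayed_x, novel_y))  # 1/3
```

The de novo fraction is not measured directly at all. It is whatever is left over, so any error in X passes straight into it.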
So I do not think we can conclude what this paper (and an accompanying review) claims. They have not demonstrated the de novo origin of novel genes at all. They have only suggested such origins from highly questionable negative evidence. Nevertheless, the topic is an interesting one, and someone is likely to study it with more care than was taken here. Many tiny open reading frames and other stray genetic proto-elements litter our genomes. Other studies have shown that practically all of them are expressed at some level, at least as RNA, if not as proteins. So the question remains: whether, and at what rate, any of them gain an actual selected function, rising to the level of a gene of significance to the organism.