Saturday, April 4, 2026

Not Every Transcript is Golden

Reflections on junk DNA and junk transcripts.

Some time ago, a large project in molecular biology determined that most regions of the genome are transcribed. The authors and most observers took this to mean that most regions are functional, quite in contrast to the reigning theory up to that point, which held that our genomes host a smattering of genes floating in a sea of "junk" DNA. That theory was based on the now-ancient observations of reannealing curves for bulk DNA from humans and other species, which found that most of our DNA reanneals very quickly because it is repetitive. Most of our genomes (60%) are taken up with LINE repeats, SINE repeats, old retro-transposons, stray duplications, and other repetitive material that, at first glance, seems like junk. There has been a battle ever since between proponents of junk DNA and those who see function around every corner. As we learn more about the genome, many more functions have indeed come to light, like distant enhancers and regulatory RNAs of many flavors. But overall, there still seems to be a lot of junk.

A recent paper took an oblique shot at this field, looking at the profusion of alternative gene transcripts, which can number into the hundreds for a single gene. (This was also reviewed.) These are generally called isoforms, and arise from the variable ways one gene's RNA products can be initiated, terminated, and spliced. So not only are most regions of the genome transcribed in some form, but actively transcribed regions can be transcribed and processed in myriad ways to yield different RNA products. Here again, there has been an analogous argument about whether every such isoform has a function, or whether isoforms might arise from more or less sporadic processes, often as unintended and non-functional sparks coming out of the machinery. The importance of isoforms is very well documented in many cases, so the possibility of function, sometimes highly conserved, is not in question. What is in question is only the importance of every last variation in combinatorial collections of isoforms that can number into the hundreds.

Here is an image from the first page (of about six pages) of RNA transcripts coming off the notorious BRCA1 gene, which is intensely studied for its role in breast cancer. Each line is a distinct mRNA transcript. Each bar is an exon, and exons are separated by introns. The darker colored exons are in the protein coding region, while the lighter exons mark the untranslated upstream and downstream ends. I count about 315 transcripts described for this genetic locus. The idea that each of these has some evolutionarily constrained and important function is, on the face of it, absurd.

The authors took an interesting evolutionary approach, reasoning that species with larger population sizes experience more stringent purifying selection. If most transcript isoforms are neutral (or even deleterious) accidents, rather than intentional and functional forms, then such species should (in theory) show tighter control over stray genomic products such as isoforms. Thankfully, animals come in a wide range of population sizes, from insects to crocodiles and primates; very large to very small. While population size is hard to calculate, several convenient proxies are known, like lifespan, body size, etc. When they totted everything up, they saw clear correlations between these proxies and the number of alternative RNA products per gene, also termed transcript diversity. They sliced up the data by the organ where the RNA was expressed, and by the source of the RNA variation: different initiation, different termination, or different splicing. In all cases the trend was the same. In species with larger population sizes, the diversity of transcripts was lower, agreeing with their hypothesis that when greater selective force is available, the slop from the transcription and transcript processing machinery declines.

The authors draw correlations between alternative splicing (AS) diversity in an organism's cells and its population size.
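
To make the shape of this comparison concrete, here is a minimal sketch of a rank correlation between one proxy and transcript diversity. It is not the authors' method or data: the numbers are invented toy values, the names `lifespan_years` and `isoforms_per_gene` are hypothetical, and the sketch assumes that a longer lifespan stands in for a smaller population size.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented toy values, one entry per hypothetical species (NOT real data).
# Assumption: longer lifespan is a proxy for smaller population size.
lifespan_years = [0.1, 2, 12, 40, 70]
isoforms_per_gene = [1.5, 2.8, 2.2, 4.0, 5.1]

rho = spearman(lifespan_years, isoforms_per_gene)
print(f"Spearman rho = {rho:.2f}")
```

A positive value here would correspond to the trend described in the post: the smaller the presumed population, the more transcript variety per gene. A rank correlation is used in this sketch only because proxies like lifespan span several orders of magnitude; whether the paper used this statistic is not something the sketch assumes.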

The authors additionally note that there is a similar relationship between alternative splice site usage and expression level of a gene. That is, the higher the gene expression, the less likely that minor splice sites are used, indicating that here again, higher selective pressure helps to clear out non-functional off-products of the transcription apparatus.

The correlations found here are only that: correlations. While significant, they are not terribly strong, let alone stark. So it is evident that our gene expression machinery has a lot of play in it, and its products fall on a spectrum from deleterious to critically functional. It is, after all, machinery, not divine. It is also grist for evolution itself: it is useful to have some slop so that there is always some diversity in the gearing to accommodate new selective pressures. But the idea that a distinct transcript is biologically functional just because it exists, or similarly, that a genomic region is a "gene" rather than junk DNA just because it is transcribed, does not hold water. Every nucleotide in the genome has its own unique selective constraints, and for many of them, that constraint is zero.


  • The world order, and our position in it, is crumbling.
  • Whence Hungary?
  • Another AI tax, as if gobbling up power wasn't bad enough.
  • Mindless.