Saturday, October 31, 2020

LncRNA: Goblins From the Genomic Junkyard

One more addition to the zoo of functional RNAs.

A major theme over the last couple of decades of molecular biology is the previously unanticipated occurrence of many sorts of small and not so small RNAs that do not code for proteins. In addition to buttressing the general proposition that RNA came early in the history of life and retains many roles beyond being merely the conveying medium of code from DNA to protein, these novel RNAs illuminate some of the complexity that went missing when the human genome came in at under 20,000 protein coding genes.

The current rough accounting of human genetic elements. While we have only 19,954 protein-coding genes, we have a lot of other material, including almost as many pseudogenes (dead copies of protein-coding genes) and even more RNA genes that do not code for proteins. The count of mRNA transcripts is high because splicing of a protein-coding gene is frequently variable and can generate numerous distinct mRNA messages from one gene. Similarly, the start and stop points of transcription can be variable for both mRNA and lncRNA genes.

RNAs perform many functions: they form the catalytic core of the ribosome, serve as tRNAs that match each amino acid to its complementary codon, form the catalytic core of the mRNA splicing apparatus, guide the editing and modification of ribosomal RNAs, and, as miRNAs, repress expression of target genes. LncRNA stands for long non-coding RNA, a class that was discovered by global analyses of RNA expression. Long transcripts were found that did not have protein-coding frames, did not clearly derive from degraded pseudogenes, and occasionally still showed significant conservation, and thus evident selective constraint and function.

Another piece of background is that these expression analyses have found, as the technology advanced in sensitivity and comprehensiveness, that most of our genome is transcribed to RNA. Not only do we have a large amount of junk DNA, like transposons, repetitive elements, pseudogenes, intron and regulatory filler, etc., representing at least 90% of the genome, but most of this DNA is also transcribed at a low level. We have quality control mechanisms that dispose of most of this RNA. But another perspective has its partisans, particularly among those who first found all this transcription: that these transcription units are "functional", and thus should not be dismissed as "junk".

Eugene Koonin is not of that persuasion. His recent review of this field, and of lncRNAs in particular, with Alexander Palazzo, generates an extremely interesting model of why most of this is junk, and how such junk RNA can occasionally gain function. Some lncRNAs are important, typically helping nearby genes stay on. One of the most significant lncRNAs, however, represses its nearby genes, and is central to the process of X inactivation. XIST is 17,000 nucleotides long- very long for a non-coding RNA- and binds to dozens of proteins including chromatin remodeling enzymes and X-chromosome scaffold proteins, all in a byzantine process that shuts down the extra X chromosome that females have. This prevents the genes of that chromosome, which encompass many functions, not just sex-specific ones, from being expressed two-fold higher in females than in males.

How to make sense of all this? How can there be many thousands of lncRNAs, but only a few with function, and those functions rather miscellaneous, typically local, and centered on transcriptional regulation? The tale begins with one of the many quality control features of the transcription apparatus. When a gene is transcribed, the polymerase, as it goes by, deposits chromatin marks (on the local histones) that prevent other transcription complexes from initiating within the gene. This prevents extra initiation events that would produce truncated proteins, which can sometimes be very harmful when they lack key regulatory domains. So the theory posits that much of the stray transcription of junk DNA through the rest of the genome, especially in the form of lncRNAs, has a similarly repressive effect, reducing local initiation within those "gene" bounds. This might be particularly helpful in preventing interference with regulatory events happening in those regions, by controlling transcription through the region rather than allowing it to happen sporadically all over.

As a first step, this stray transcription is innocent enough: it is typically low-level, not likely to be under strong selective constraint, and perhaps eventually responsive to some regulatory events, depending on the needs of the nearby coding gene. Nor would the lncRNA that is made have any function at all. It would be junk very literally: it would not get spliced or exported out of the nucleus, and would probably get degraded promptly. Its sequence would be under no particular selection, and would drift in neutral fashion. The second step happens if this RNA gains some kind of function, such as binding some regulatory protein. There are many RNA and DNA binding proteins in the cell, so this is not difficult. XIST binds to over 80 different proteins. These proteins might then have local effects, as long as the lncRNA remains attached to its own transcription complex during its own synthesis. Such effects might include activating the nearby gene, or loosening or tightening nearby chromatin. Given the (arbitrarily large) size of the lncRNA, and the typically small size of the nucleic acid binding determinants that proteins recognize, there is little limit to how many such interactions could be accumulated over time, always subject to selection that likely centers around fine-tuning of effects on nearby genes. Indeed, this regulation could allow the relaxation or loss of more proximal regulators, making the lncRNA increasingly essential. After enough interactions accumulate, the lncRNA may remain tethered to local landmarks, and its activity may persist after its synthesis ends, prompting selection against its degradation.
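
To get a feel for why a long RNA can so easily pick up binding partners by chance, here is a rough back-of-the-envelope sketch in Python. It is not taken from the review; it assumes, purely for illustration, that the RNA is a uniformly random sequence of the four nucleotides and that a protein recognizes one exact short motif. Real binding determinants are degenerate and real sequences are not random, so the numbers are only indicative. The function name `expected_motif_hits` is hypothetical, and the 17,000-nucleotide length is the XIST figure quoted above.

```python
def expected_motif_hits(seq_len: int, motif_len: int, alphabet_size: int = 4) -> float:
    """Expected number of exact matches to one fixed motif in a random sequence.

    Assumption (illustrative only): every position is drawn independently and
    uniformly from an alphabet of `alphabet_size` letters.
    """
    if motif_len <= 0 or seq_len < motif_len:
        return 0.0
    # Number of windows where a motif of this length could start.
    positions = seq_len - motif_len + 1
    # Each window matches the motif with probability (1/alphabet_size)**motif_len.
    return positions / alphabet_size ** motif_len


# A transcript the length of XIST, scanned for motifs of a few plausible sizes.
for k in (4, 6, 8):
    print(k, round(expected_motif_hits(17_000, k), 2))
```

Under these assumptions, a transcript of that length is expected to contain any given 6-nucleotide motif about four times, and any given 4-nucleotide motif dozens of times. With many different binding proteins in the cell, each with its own short target, chance matches somewhere along a long transcript are close to inevitable.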

In this way, increasingly elaborate mechanisms can be built up out of very modest selective effects, combined with a lot of drift and exploration of neutral mutational space. This theory provides a rationale for what we are seeing in the lncRNA landscape- a huge number with little to no ascertainable function, but a few that have grown into significant regulators of their local or extended chromatin landscape. It also informs how we should think about their mechanism: not as some new and exciting mode of action by a discrete RNA species, as was found with miRNAs, for instance, but rather as an agglomeration of adventitious interactions that will be different in each case, and highly variable in effect.

"Indeed, although complexity in biology is generally regarded as evidence of “fine tuning” or “sophistication,” large biological conglomerates might be better interpreted as the consequences of runaway bureaucracy—as biological parallels of nonsensically complex Rube Goldberg machines that are over-engineered to perform a single task."