How activation of RNA transcription works.
One of the great themes of molecular biology that was established quite early was the notion of flexible regulation over gene expression. Only eight years after the discovery of the structure of DNA, and contemporaneous with the discovery of the genetic code that it harbors, Jacob and Monod proposed the operon hypothesis, whereby proteins responsive to outside conditions or other important circumstances (in their case, a protein that bound the nutrient lactose) control the mechanism of transcription of the RNA message from the DNA genome. Jacob and Monod dealt with a repressor (called lac), which sits on a gene that encodes various lactose import and metabolic proteins. When lactose binds, this protein detaches from the DNA and releases transcriptional repression, allowing the regular apparatus to bind and start mRNA production.
This focus on repressors misled the field for a little while, since it turns out that there are activators as well, in bacteria as well as very abundantly in eukaryotes. New mechanisms of transcriptional repression and activation keep being found, but the main themes have become reasonably settled. Humans encode an estimated 1500 proteins that bind DNA at specific sites and regulate transcription. Add to that all the other apparatus of generic transcription, chromatin management, and indirect regulators, and easily an eighth of our genome functions in transcriptional regulation, which is the dominant mode (though far from the only one) of distinguishing cell types, tissues, and developmental states from each other, responding to stresses and hormones, and generally managing the dynamic internal diversity that comes with being a multicellular organism.
Historically, similar realizations were being made on the genetic side, for instance when Edward Lewis studied a large developmental locus of Drosophila which he called the bithorax complex. Rearrangements and mutations in this region caused numerous transformations of body parts, with such fine gradations, complex patterns, and wide-ranging effects that he concluded that these loci encode regulators of other genes, and that the locus itself was revealing further complexities of the regulation of these regulators. This work began to be published in the fifties, and continued through the seventies.
One particular focus of studies of transcription over the last few decades has been how activation works. The 1970s saw a dawning realization that many regulators, especially in eukaryotes, activate transcription rather than repressing it, and do so in a synergistic way from modular cassettes of DNA binding sites (called enhancers) that could be tens of thousands of basepairs from the gene being regulated. This led to a looping model, where enhancers bind a mix of proteins that may be specific to some developmental event or stage, and which may comprise both activators and repressors. This DNA-bound complex of proteins then loops around the intervening DNA to touch, and help assemble, the generic transcription machinery at the start site of the gene. If the activators outweigh the repressors (in a fashion that is not at all systematic or rule-based, but is graded in its activity, and still not fully understood), then that machinery fires off, polymerizing its way down the gene, extruding the mRNA message as it goes.
But what is the nature of this touching interaction? Many "activation domains" have been isolated, but no very informative theme has emerged from these studies, and certainly no universal protein sequence pattern. Indeed, such domains tend to be unstructured and poorly conserved, with hints of negative charge and hydrophobic character, which is not much to go on. On the receiving side, among a plethora of studies of the core transcriptional apparatus, a dedicated complex called "mediator" was found to be a central receiver of activating interactions. Since it has over two dozen components, it explains to some degree the lack of uniformity among the activating protein domains reaching from enhancers.
A recent paper wraps up some of this story by characterizing activating domains in rather thorough fashion. Most interactions in the cell, such as those between DNA-binding proteins and their sites on DNA, are very specific and detailed lock-and-key interactions that make use of steric and electrostatic complementarity. But the activation domain interaction is more of a velcro-like affair, where a broad surface of hydrophobic amino acid side chains on the target, supplemented by a fringe of basic amino acids, binds activation domains that are, as mentioned above, largely hydrophobic, unstructured, and peppered with acidic residues. The principal target (75% of the time) of all these activation domains is one protein, mediator MED15 (also called GAL11), which has four receiving domains, though other targets exist. This all means that activation can add up synergistically: the more activators are available, the more can bind, and the more strongly bound the whole complex is. It is the perfect system to accomplish graded, sensitive adjustments of activation from modular sets of activators that vary over developmental as well as evolutionary time.
These researchers used yeast cells to comprehensively find all available activation domains from all plausible DNA-binding proteins (164 in this smaller genome). They tested a tedious series of 53 amino acid-long pieces from these proteins to find specifically active protein segments, and then put them through empirical tests and computer comparisons, ultimately developing a neural network model of what makes a successful activation domain. While they did not find anything unprecedented, they characterized what they found with much greater thoroughness, and then docked it in structural terms with the other half of the interaction, the mediator protein MED15.
The figure shows two of the four domains on GAL11 that receive the binding of activator domains. The colored dots indicate where specific amino acids of transcriptional activator domains dock, computationally. Blue is the charged aspartic acid, docking with positively charged target sites, while yellow and red are the hydrophobic phenylalanine and leucine, respectively. The point is that the locations form a diverse cloud, not fitting specifically into any particular matching structures, and also that the interface is relatively flat, ready to accommodate a similarly flat and variable (possibly even unstructured) activating domain.
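To make the tiling screen described above a little more concrete, here is a minimal Python sketch of cutting a protein into 53-residue pieces and tallying the two crude features the text keeps returning to: acidic charge and hydrophobic content. This is not the paper's method (they built a neural network from experimental data). The step size, the choice of which residues count as hydrophobic, and all the function and class names are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List

ACIDIC = set("DE")            # aspartate, glutamate
BASIC = set("KR")             # lysine, arginine (histidine ignored for simplicity)
HYDROPHOBIC = set("FWLIMVY")  # illustrative choice, not taken from the paper


@dataclass
class Tile:
    start: int                   # 0-based offset of the tile in the protein
    sequence: str
    net_charge: int              # basic count minus acidic count
    hydrophobic_fraction: float


def tile_protein(sequence: str, tile_len: int = 53, step: int = 10) -> List[Tile]:
    """Cut a protein into overlapping tiles and compute simple composition features.

    tile_len=53 follows the fragment length mentioned in the text;
    step=10 is an arbitrary assumption.
    """
    seq = sequence.upper()
    if not seq:
        return []
    if len(seq) <= tile_len:
        starts = [0]
    else:
        last = len(seq) - tile_len
        starts = list(range(0, last + 1, step))
        if starts[-1] != last:
            starts.append(last)  # make sure the C-terminal end is covered
    tiles = []
    for s in starts:
        frag = seq[s:s + tile_len]
        acidic = sum(aa in ACIDIC for aa in frag)
        basic = sum(aa in BASIC for aa in frag)
        hydrophobic = sum(aa in HYDROPHOBIC for aa in frag)
        tiles.append(Tile(s, frag, basic - acidic, hydrophobic / len(frag)))
    return tiles


# Toy, made-up sequence: acidic and hydrophobic residues interleaved.
toy_tiles = tile_protein("DDFFLL" * 10)
for t in toy_tiles:
    print(t.start, t.net_charge, round(t.hydrophobic_fraction, 2))
```

Tiles with a strongly negative net charge and a high hydrophobic fraction would be the ones resembling the "acidic, hydrophobic, unstructured" description, though composition alone is far cruder than a model trained on measured activity.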
What was particularly powerful was this group's computer algorithm, which was able to find other activation domains throughout the human genome with an accuracy of >80%. This is valuable both as a way to better characterize our genomes and biology, and also to demonstrate a more thorough understanding of how these domains work. Once the GAL11 protein is anchored to an enhancer or set of enhancers via the activating interactions, it (as part of a very large complex of its own, comprising over 20 proteins) loops over to the transcription start site, where it cooperates with other proteins that are able to open up the local nucleosomes and other debris, and attract or activate an RNA polymerase, including extensive phosphorylation of the polymerase's tail. This prompts release of the initiating polymerase from the start site and disassembly of the initiating complex of mediator and other proteins, which can then go off and activate some other gene elsewhere.