When looking at evolution in the genetic code, we tend to focus on the most conserved elements, which can take us the farthest back in time: essential proteins like metabolic enzymes that we recognizably share with bacteria, or the translational apparatus of ribosomal RNA and related proteins, shared by all life forms. But to look at more rapid and recent evolutionary processes, one has to look at other parts of the genome, which churn a lot faster and can be, incidentally, far more difficult to discern.
The DNA sites that direct transcription tend to be such fast-changing genetic elements. They are short, often redundant, degenerate (able to accommodate various errors) and modular, meaning that they can occur in various positions and orientations with respect to their target gene and still work. Their shortness means that they can be born by mutational accident, and also that they are hard to recognize by text-comparison methods.
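To make the point about shortness and degeneracy concrete, here is a minimal Python sketch that scans a sequence for a degenerate site in both orientations and estimates how often such a site would turn up by chance in random DNA. The function names are made up for illustration, the example motifs (the GATA-family consensus WGATAR and the E-box CACGTG recognized by MYC/MAX) are textbook consensus strings rather than anything taken from the paper discussed below, and the chance estimate assumes all four bases are equally frequent, which real genomes are not.

```python
import re

# IUPAC degenerate base codes, written as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}
# Complement of each code (R <-> Y, K <-> M; W, S and N are self-complementary).
COMPLEMENT = str.maketrans("ACGTRYWSKMN", "TGCAYRWSMKN")


def reverse_complement(motif):
    """Return the motif as it reads on the opposite DNA strand."""
    return motif.translate(COMPLEMENT)[::-1]


def find_sites(sequence, motif):
    """Return sorted (position, strand) hits of a degenerate motif.

    Both orientations are searched, and a lookahead is used so that
    overlapping matches are all reported.
    """
    hits = set()
    for strand, oriented in (("+", motif), ("-", reverse_complement(motif))):
        body = "".join(IUPAC[base] for base in oriented)
        for match in re.finditer("(?=(" + body + "))", sequence):
            hits.add((match.start(), strand))
    return sorted(hits)


def expected_chance_hits(motif, sequence_length):
    """Rough expected number of matches in random DNA of the given length.

    Assumes equal base frequencies and ignores edge effects (illustrative only).
    """
    per_position = 1.0
    for base in motif:
        allowed = len(IUPAC[base].strip("[]"))  # bases accepted at this position
        per_position *= allowed / 4
    # A palindromic motif reads the same on both strands, so count it once.
    strands = 1 if reverse_complement(motif) == motif else 2
    return strands * sequence_length * per_position


if __name__ == "__main__":
    print(find_sites("TTAGATAAGG", "WGATAR"))
    print(expected_chance_hits("WGATAR", 3_000_000_000))
```

Under these assumptions, a six-base degenerate site is expected to occur by chance millions of times in a genome of roughly three billion base pairs. That is why a single point mutation can easily create one, and also why sequence matching alone cannot tell which occurrences are actually used.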
Human-mouse comparison of three protein regulators (GATA, MAX, cMyc) and a brief stretch of DNA they bind to (red and blue). The blue MAX site in the middle was born in the primate lineage, while the others are recognizably conserved. From the top: the general coordinates of the genomic region over this gene, EPB41, which encodes a protein that provides special flexibility to erythrocytes (red blood cells); then various annotated features (colored bars) marking predicted regulator binding sites; then graphs of the physical binding data for each regulator (colored graphs).
They are also enormously important for evolution and biology in general. There are many hundreds of gene-regulating proteins encoded by the genome, each of which binds to some DNA site, typically 6 to 16 base pairs long. These protein+site complexes constitute the first line of gene regulation. Though virtually every possible aspect of biology can be bent to regulatory uses, they are typically the most sensitive and influential mechanism of gene regulation.
"... there are an estimated ~1700–1900 TFs [transcription factors, or regulators] in the human genome."
A recent paper discussed a new-ish method to study the phylogeny of such sites. The first step is to use a modern technology to find such sites, physically purifying individual regulatory proteins while they are still bound to DNA in a cell, and later sequencing the underlying DNA snippets. This allows all the target sites of a single regulatory protein to be mapped across a genome (at least the sites in use under the conditions used to grow the cells for the experiment, in this case erythroblast cell lines). The researchers did this (or actually got data from others) for several different transcriptional regulators in human and mouse cells.
Then they used sequence comparison methods to deduce the history of these sites over the lengthy time of divergence between the two lineages, drawing on the sequenced genomes of baboon, chimpanzee, rat, squirrel, tarsier, and other species. They used not only sequence conservation of the sites, but also larger-scale studies of how the genomes relate to each other, called synteny analysis. This depends on large regions of diverged genomes being, at least on a patchwork basis, descended from the same ancestral regions, even if some of the sequences they contain are no longer so recognizable. Over time, various accidents in recombination and replication cause genomes to slowly rearrange relative to their ancestors.
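The dating logic can be illustrated with a toy Python sketch. This is not the paper's actual method, just a simple parsimony-style rule under loud assumptions: each species has already been checked for the site at the aligned (syntenic) position, the site arose only once, and it may have been lost in some intermediate lineages. The species list, function name, and labels are hypothetical.

```python
# Comparison species ordered from the closest relative of human to the most distant.
# Illustrative subset only; a real analysis would use many more genomes and a full tree.
OUTGROUPS = ["chimpanzee", "baboon", "tarsier", "mouse"]


def infer_origin(present_in):
    """Date a human binding site from its presence in other species.

    present_in maps a species name to True/False: is the site found at the
    aligned position in that genome? The site is dated to the most distant
    species that still carries it, which assumes a single origin followed,
    possibly, by losses.
    """
    oldest = "human"
    for species in OUTGROUPS:
        if present_in.get(species, False):
            oldest = species  # keep walking outward; later entries are more distant
    if oldest == "human":
        return "human-specific"
    if oldest == "mouse":
        return "predates the human-mouse split"
    return "arose after the human-mouse split, shared with " + oldest


if __name__ == "__main__":
    site = {"chimpanzee": True, "baboon": True, "tarsier": False, "mouse": False}
    print(infer_origin(site))
```

Treating an absent alignment as an absent site is the weakest part of this sketch, which is presumably why synteny matters: it helps distinguish a site that truly does not exist from one that sits in a region that has simply moved.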
The finding of this paper that is of general interest is: "Notably, between ~58–79% of all human TFBSs [transcription factor binding sites] had inferred origins after the human-mouse split." This is far more rapid change than one would see in encoded proteins, of which about 80% are recognizably shared between mice and humans. It follows that the regulatory system, which controls where, when, and how much genes are expressed, is far more variable through evolution than are the products of those genes. This makes sense when we see that the slight variations across human populations and among closely related species tend to concern the relative sizes of bodies and parts, slightly more or less of some feature, coloration, etc. It fits very well with the typically gradualist nature of evolution, operating on thousands of genes and hundreds of thousands, if not millions, of their regulatory sites, all in parallel over populations and time.
"For all six factors analyzed, the majority of human TFBSs [transcription factor bound sites] bound in vivo were originally absent in human-mouse common ancestor, which is consistent with previous cross-species comparisons noting substantial divergence in ChIP-seq protein-binding events across the two species and similar comparisons presented here, and is also comparable to detailed analyses conducted in Drosophila using alternative approaches"
"Genes located nearest to hominid-specific binding sites were more frequently enriched for neural and sensory-related functions, and were in many cases involved in neurological pathways (Table S2). CTCF, MYC, and SOX2 target gene sets were all enriched for GO categories involved in sensory perception, while GATA1, MYC, ETS1, and MAX were enriched for neural development and differentiation categories."
- Terrorists win ... in the US.
- And in Afghanistan.
- Annals of feudalism: the no-compete "agreement".
- A jobs shortage, not a skills shortage.
- The next financial crisis may come sooner than we think, at least to poor people.
- In praise of helicopter money.
- The shame of "rocket scientists" who work in finance / fraud.
- This week in WSJ: "... the Obama economy ..." The definition of chutzpa and hypocrisy is to cause an economic catastrophe, then do everything in one's ability to stifle effective action against it, then blame the other party for the result.