Saturday, November 22, 2014

From Weed to Maize

A large-scale investigation on the evolution of corn finds lots of regulatory change.

When Darwin wrote his book on the origin of species, his strongest examples came from pigeons, which at the time were very popular domesticated animals. Just like dogs and cats, pigeons displayed a profusion of breeds and characteristics, all quite clearly descended from a single progenitor species, by way of artificial selection. The speed of artificial selection is amazing, but its relentless focus on desired, superficial traits can lead to problems in temperament, disease susceptibility, and subtle congenital defects.

As mentioned in a recent post, most evolutionary change takes place in regulatory relationships within the genome, rather than as structural changes in encoded proteins. Fine-tuning the binding site of some transcriptional regulator, or moving its site nearer to or farther from a gene, tends to have smaller, graded effects on the organism than a change, for example, to that same transcription regulator's own protein sequence, which may affect its interaction with up to thousands of sites all over the genome.

A recent paper took a deep dive into the changes that happened in the maize genome on its way to our tables as the king of American agriculture. They reiterate the power of small-scale change in a gene's regulatory elements, which they term cis elements: mutations in the DNA local to the gene, typically in upstream sites that bind various regulatory proteins that promote or repress transcription.
"Changes in the cis regulatory elements (CREs) of genes with functionally conserved proteins have been considered a key mechanism, if not the primary mechanism, by which the diverse forms of multicellular eukaryotic organisms evolved. Variation in CREs allows for the deployment of tissue specific patterning of gene expression, differences in developmental timing of expression, and variation in the quantitative levels of gene expression. Furthermore, modification of CREs, as opposed to coding sequence changes, are assumed to have less pleiotropy and consequently have a lower risk of unintended deleterious effects in secondary tissues. The importance of CREs for the development of novel morphologies is supported by the growing catalog of examples for which differences in gene specific CREs between closely related species contributed to the evolution of diversity in form."

The authors sequenced a large crop of RNAs from the tissues of maize and from its ancestor teosinte, to see how their genes are expressed, and, in combination with knowing the genomic DNA that had been sequenced previously, whether changes in gene expression could be tied to specific genome mutations that happened during domestication. The maize genome has more genes than that of humans, 39,423, and 17,579 of them had sufficient expression in these tissues (the RNAs came from the immature ear, the seedling leaf, and the seedling stem) to be analyzed. To give an idea of the scale of current technology, they gathered roughly four billion sequence reads from their RNA libraries.

The majority of the genes they analyzed (82%) were expressed in each of the three tissues, while about three percent each were specific to only one or two tissues. The main point of the paper was to figure out which genes had changed in expression between teosinte and maize, and further, what had caused this: either mutations local to the altered gene (acting in "cis"), or mutations to DNA far away (acting in "trans") that encodes regulatory proteins whose alteration would affect many other genes as well.

To do that, they used hybrids between teosinte and maize, sampling their RNA as well. In these hybrids, versions of the same gene (alleles) from each parent co-habit in the same cell. So if their expression remained different, it could be chalked up to local effects on each allele's DNA. Conversely, if their expression became similar (while being different in the parental strains), then the parental difference is likely to be due to regulators that are encoded elsewhere and affect the sampled gene similarly, whatever its origin and local sequence. A very clever scheme, one has to say.
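
The logic of the hybrid test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the fixed cutoff `tol`, and the toy expression values are mine, not the paper's, which relies on statistical tests over replicated read counts rather than a simple threshold.

```python
import math


def log2_ratio(maize, teosinte):
    """Log2 of the maize/teosinte expression ratio (both values must be > 0)."""
    return math.log2(maize / teosinte)


def classify(parent_maize, parent_teosinte,
             hybrid_maize_allele, hybrid_teosinte_allele, tol=0.5):
    """Assign a gene to a regulatory category from parental and hybrid ratios.

    `tol` is a hypothetical cutoff (in log2 units) standing in for a proper
    significance test.
    """
    parental = log2_ratio(parent_maize, parent_teosinte)
    hybrid = log2_ratio(hybrid_maize_allele, hybrid_teosinte_allele)

    # In the hybrid both alleles see the same regulators, so any remaining
    # difference between them is attributed to local (cis) sequence.
    cis_effect = hybrid
    # Whatever part of the parental difference vanishes in the hybrid is
    # attributed to regulators encoded elsewhere (trans).
    trans_effect = parental - hybrid

    has_cis = abs(cis_effect) > tol
    has_trans = abs(trans_effect) > tol
    if has_cis and has_trans:
        # Note: opposing cis and trans effects that cancel out in the
        # parents also land here in this simplified scheme.
        return "cis+trans"
    if has_cis:
        return "cis"
    if has_trans:
        return "trans"
    return "conserved"


# Toy values, invented for illustration only.
print(classify(8, 2, 8, 2))  # difference persists in the hybrid
print(classify(8, 2, 5, 5))  # difference disappears in the hybrid
print(classify(8, 2, 6, 3))  # difference shrinks but does not vanish
```

The first toy gene comes out as "cis", the second as "trans", and the third as "cis+trans", matching the three situations described above.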

Master graph of genes (dots) assigned to categories of regulatory change, either local to the gene sequence (cis, in black), or due to changes in a non-local regulator (trans, in red). The conclusion is based on the gene's respective behavior when co-housed in the same plant, i.e. the hybrid progeny of a maize X teosinte cross. The logs on each axis refer to logs of the ratio between maize and teosinte, in either the parents (X axis) or in the hybrids (Y axis).

The identity / parentage of the alleles in the hybrids could be kept straight by way of minor DNA variations sprinkled throughout the sequences of their expressed RNA. Teosinte and maize have been separated by about eight thousand years, enough time for quite a few (mostly silent) mutations to accumulate in each genome. But the interesting differences between them would be those that were specifically selected in maize to make it into the dramatically different plant it is today: stalk branching, ear size, ear morphology, growing speed, hardness of the seed, etc. What were those mutations and how can they be found? This paper unfortunately does not get to that detail. They note that 70% of all the genes showed significant changes in expression, and that the sets of differently expressed genes were ~70% different in each of the three tissues. All of which is quite remarkable.

What they are more interested in is defining large sets of genes that might be interesting as ingredients of the special properties of maize. To start, they assume that genes under selection pressure would have had local changes to their regulatory DNA. This is not entirely correct, though. Some far-away change might have been selected for if it had strong effects regulating some target gene / trait, without having too many side effects. While this is difficult to imagine and likely rare, it is by no means impossible or without precedent. Nevertheless, they bundle up all the genes with local or local + distant changes, and call them their "CCT" set (for cis and cis+trans changes in regulation profile). These are the black and purple dots on the graph above, and amounted to about 5500 genes.

They further filtered that set by asking for high consistency and high expression over all their samples (or different parental and hybrid cross strains), and came up with sets of varying stringency, from a very small set (69 genes) to a much less stringent one (~2,326 genes). This had the defect of discounting genes whose expression was very low, either before (in teosinte) or after (in maize). Anyhow, it was a rough-and-ready method to whittle down their data to some interesting candidate genes, depending on how stringently they set the dials. One problem was that gene expression is naturally more variable in teosinte, a genetically diverse and wild plant (despite their using inbred strains, which must not have been quite as inbred as they thought), than it is in maize, which is heavily inbred and virtually clonal.

The larger the expression difference of a gene between teosinte and maize (X axis), the more likely that difference is due to local "cis" regulatory effects (Y axis). This is reflected also in the previous graph of genes with higher expression differences on the higher slope lines.

The rest of the paper, unfortunately, is a litany of woe, as they find that their sets of specially selected genes do not agree very well with those that other researchers have isolated using other methods. For instance, one group used a micro-chip based method with fixed DNA samples to detect RNAs that are expressed differentially between modern maize and teosinte, and found their own list of such genes:

"However, the absolute level of correspondence between the two studies is rather low. For example, of the 350 leaf genes identified as DE [differentially expressed] by RNAseq [the current paper's method], only 24 (7%) were also identified by the microarray study [the other paper's method]. Thus, while the overlap between our two studies is statistically significant, the two methodologies resulted in largely different lists of DE genes."

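How can an overlap be both small and statistically significant? Here is a quick sketch, assuming a simple hypergeometric model in which gene lists are drawn at random from a shared pool (my assumption for illustration, not necessarily the test the authors used). The pool size and the length of the microarray list are not given in the quote above, so they stay as parameters; only the 24 and the 350 come from the paper.

```python
from math import comb


def overlap_fraction(shared, list_size):
    """Fraction of one gene list that also appears in the other list."""
    return shared / list_size


def hypergeom_tail(universe, marked, drawn, overlap):
    """Chance of seeing at least `overlap` shared genes by luck alone.

    universe: number of genes both methods could have detected
    marked:   size of the other study's list
    drawn:    size of this study's list
    All three are placeholders to be filled in from the real data.
    """
    total = comb(universe, drawn)
    hits = sum(
        comb(marked, k) * comb(universe - marked, drawn - k)
        for k in range(overlap, min(marked, drawn) + 1)
    )
    return hits / total


# The numbers from the quote: 24 of 350 leaf genes.
print(f"{overlap_fraction(24, 350):.1%}")  # 6.9%, which the authors round to 7%

# Tiny made-up case: 4 genes in the pool, each method picks 2, both agree.
print(hypergeom_tail(4, 2, 2, 2))  # 1 chance in 6
```

With a pool of many thousands of genes and two short lists, even a couple of dozen shared genes would be far more than chance predicts, which is how a 7% overlap can still pass a significance test while leaving the two lists largely different.
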
It is somewhat depressing that this many years into the genomic age, the large-scale technologies being touted and used to gather presumably quantitative gene expression data of this sort can generate such divergent results. Technically, I believe this is due to their need to have high expression under all conditions, which is contrary to most of the other methods used, which prize very high contrast, i.e. very low expression in one sample vs higher expression in another, to identify candidate genes. Nevertheless, each collection of genes must have some gold amongst the placer and thus this paper is surely the career-building effort of some post-doc who will give job interviews on the ambition of panning through these genes to find ones that have individually significant effects on the unique properties of maize.
"This study shows cis and trans regulatory differences account for ~45% and ~55% of regulatory divergence between maize and teosinte, respectively (Table S1). These values suggest relatively equal contributions of these two mechanisms to regulatory divergence. However, this ignores the contribution of cis effects to large expression differences where cis accounts for nearly 80% of the expression divergence."

A final interesting point is that roughly half the expression differences were traceable to the "trans", or non-local, mechanism. This might seem to go against the assumption outlined above that local mutations in gene regulatory sequences should predominate, but it may take only a few individual changes in regulators or their networks to cause changes in the expression of many of the genes assayed here, while each expression change classified as "cis" or local requires a separate change to that gene's sequence. So the overall number of local regulatory changes in this data set will vastly outnumber individual changes elsewhere, and the authors note additionally that the expression changes that were quantitatively highest were virtually all due to local mutations.
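
The counting argument can be made concrete with a toy tally. Everything below is invented for illustration: the gene and regulator names are hypothetical, and the model assumes one mutation per cis-classified gene and one per altered regulator, which is the simplest case rather than anything the paper measured.

```python
def count_underlying_changes(gene_causes):
    """Count mutations needed to explain a set of expression changes.

    gene_causes maps a gene name to ("cis", None) or ("trans", regulator).
    Each cis gene needs its own local change; trans genes that share a
    regulator are explained by a single change to that regulator.
    """
    cis_changes = sum(1 for kind, _ in gene_causes.values() if kind == "cis")
    trans_changes = len(
        {regulator for kind, regulator in gene_causes.values() if kind == "trans"}
    )
    return cis_changes, trans_changes


# Hypothetical example: a roughly even split of genes between cis and trans.
toy = {
    "geneA": ("cis", None),
    "geneB": ("cis", None),
    "geneC": ("cis", None),
    "geneD": ("cis", None),
    "geneE": ("trans", "regulator1"),
    "geneF": ("trans", "regulator1"),
    "geneG": ("trans", "regulator1"),
    "geneH": ("trans", "regulator2"),
    "geneI": ("trans", "regulator2"),
}

cis_count, trans_count = count_underlying_changes(toy)
print(cis_count, trans_count)  # 4 2
```

Even in this tiny example, where more genes are affected in trans than in cis, the tally of underlying mutations tilts toward cis, and the gap would widen as each regulator controls more genes.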

  • Similar story for the deeper divergence between mouse and human.
  • Has religion outlived its usefulness?
  • Reza Aslan: No, and let me present a diatribe about that.
  • A notable podcast on the role of philosophy, relations to science, and ... is there progress?
  • Inheritance ... another feudal, antisocial practice.
  • Perjury: the new frontier in mortgage fraud.
  • Banking is an immoral industry. Perhaps a proper target of vice squads?
  • CO2 visualized, world-wide.
  • Target zero for carbon emissions.
  • Some power companies are on board.
  • Just what was China promising?
  • Britain has internet service competition, we do not.
  • Just what is wrong with the Muslim world? Why the torpor, humiliation, and tragedy?
  • Why is the Fed backing off?
  • Democracy may require some kind of revolt.
  • This week in the WSJ: the 1% "earners" are OK.
  • But Bill Black thinks otherwise:
"Cochrane admits in the final paragraph that one of the “secrets of prosperity” is a well-functioning “rule of law.” He doesn’t tell you that his institution, the University of Chicago’s law, finance/business, and law faculty, have led the systematic attack for the last 40 years that successfully eviscerated that rule of law and allowed the banksters to lead the fraud epidemics that Cochrane admits drive our recurrent, intensifying financial crises."