Someday, the human genome will be an open book to us, telling us how we develop from an egg, and what is likely to go wrong along the way. But for now, we know only glimmers about it. We know all the letters of the DNA code, but frustratingly little about what they mean. An example is the fascinating story of DUF1220.
DUF stands for "domain of unknown function". DUF1220 is a family of brief protein sequences, one example of which is "EKVQELYAPREVQKAEEKEVPEDSLEECAITCSNSHHPCESNQPYGNTRITFEEDQVDSTLID". This uses a code where each amino acid constituent of a protein is one letter. A slightly more sensible way to look at it is to put several family members in a lineup, as it were:
Alignment of 10 family members of DUF1220, with most conserved amino acids in red, and hydrophobic amino acids marked with green bars.
Each member of the domain family is in its own row, lined up with the others as best a computer can do. The well-aligned positions, with mostly the same amino acid, are in red, and the less well-aligned ones are in blue. I have added a few green markers above to show which positions are hydrophobic, carrying amino acids like F, V, I, L, A, W, Y, and the like, which tend to lie inside folded proteins much as oil drops form in water. So, by my own speculation, it looks a bit like an alpha helix, with regular hydrophobic residues lying at roughly seven or so amino acid intervals, appropriate to one face of the helix lying against the interior of a protein while the rest is exposed to the outside, to water and other molecules.
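For anyone who wants to try the green-marker exercise on the single example sequence quoted earlier, here is a minimal Python sketch. It is only an illustration, not the method used to make the alignment figure: the hydrophobic set is just the seven letters named above (the "and the like" is left out), and the function names are my own invention.

```python
# Hydrophobic letters exactly as named in the text; other residues that
# could arguably count as hydrophobic are deliberately left out.
HYDROPHOBIC = set("FVILAWY")

# The example DUF1220 family member quoted in the text.
DUF1220_EXAMPLE = (
    "EKVQELYAPREVQKAEEKEVPEDSLEECAITCSNSHHPCESNQPYGNTRITFEEDQVDSTLID"
)


def hydrophobic_positions(seq):
    """Return 1-based positions of hydrophobic residues in seq."""
    return [i for i, aa in enumerate(seq, start=1) if aa in HYDROPHOBIC]


def marker_line(seq):
    """Return a line with '^' under each hydrophobic residue, spaces elsewhere."""
    return "".join("^" if aa in HYDROPHOBIC else " " for aa in seq)


def spacings(positions):
    """Return the gaps between consecutive hydrophobic positions."""
    return [b - a for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    pos = hydrophobic_positions(DUF1220_EXAMPLE)
    print(DUF1220_EXAMPLE)
    print(marker_line(DUF1220_EXAMPLE))
    print("positions:", pos)
    print("spacings: ", spacings(pos))
```

Printing the marker line under the sequence gives a crude text version of the green bars, and the spacings list is a quick way to eyeball whether there is any regular interval between hydrophobic residues. A single sequence is much noisier than the ten-member alignment, so this is suggestive at best.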
One human gene is 3768 amino acids long and contains 43 iterations of the DUF1220 domain, marked in pink.
But this is very conventional. Many, many proteins take on this kind of structure. Going up a level to the genes, we see that in one gene carrying this domain, it occurs tandemly 43 times. Wow! What could be going on? Such a long protein, singing the same song, over and over again. This is common in structural types of proteins, less so in enzymatic or regulatory proteins, but who knows? An ancestor of this family seems to be involved in regulation of protein phosphorylation and activity, but very little else is known about what it might be doing in any physical way.
Up one more level, to the genome, we see that there is a family of ~23 such genes in humans, mostly on chromosome 1, which carry various numbers of this small domain, adding up to about 277 copies of the domain in the genome. Why so many genes, why so many approximate copies of this domain? This kind of amplification tends to be a quick and dirty solution on the part of evolutionary processes, to get more of some beneficial gene product. Later on, once the regulation of some of these genes is optimally tuned up, extra copies can be left to die as pseudogenes and deletions.
Going up to the evolutionary level, we find that there has been a dramatic expansion of these genes and this domain over the mammalian and especially primate lineage, from none in birds, to a few in rodents, to a hundred in monkeys, to 290 copies of DUF1220 in humans:
Evolutionary history of DUF1220-containing genomes. Years before present are listed up the middle line. Numbers of DUF1220 domains are listed at right. The miscellaneous notes in the tree refer to named sub-families of DUF1220-containing genes, and a few other related issues.
There seems to be a strong correlation of duplications of this gene or segments of it with closeness to humans in the primate lineage, which would make sense if this gene, say, had something to do with generating bigger and better brains.
And that is something we do know something about, since mutations in these genes come up in a variety of disease conditions, which is to say, human phenotypes. A recent paper found that deletions in this chromosome 1 family lead to microcephaly (small head), while duplications lead to macrocephaly (big head) birth defects. Other mutations among these genes lead to autism and other mental disorders.
"... we have shown that of all the 1q21 genes examined (n = 53 [subjects]), only DUF1220 sequences exhibit a significant direct correlation with brain-size phenotypes in both pathological and normal human populations. Although we provide data implicating the loss of DUF1220 copy number in 1q21-associated microcephaly, the data are also fully consistent with the view that increases in DUF1220 copy number underlie 1q21-associated macrocephaly."
"Twelve genomic diseases have been linked to CNVs [copy number variations] in the 1q21.1- 1q21.1 region. They ... include autism, congenital heart disease, congenital anomaly of the kidney and urinary tract, epilepsy, intellectual disability, intermittent explosive disorder, macrocephaly, Mayer-Rokitansky-Küster-Hauser syndrome, microcephaly, neuroblastoma, schizophrenia, and thrombocytopenia-absent-radius syndrome."
"However, multivariate linear regression detected a linear increase in CON1 [a sub-family of the genes carrying DUF1220] dosage that was progressively associated with increasing severity of each of the three primary symptoms associated with ASD [autism spectrum disorder] as measured by the ADI-R. With each additional copy of CON1, Social Diagnostic Score increased on average 0.25 points (SE 0.11 p = 0.021), Communicative Diagnostic Score increased 0.18 points (SE 0.08 p = 0.030) and Repetitive Behavior Diagnostic Score increased 0.10 points (SE = 0.05 p = 0.047)."
"Given our recent data linking DUF1220 with neural stem cell proliferation (J. Keeney, submitted), this effect could be related to the timing and rate of neurogenesis, such that too many neurons produced too quickly may result in an overabundance of poorly connected neurons. This initial overabundance would in turn inhibit the formation of long distance projection neurons. This process, resulting from (or exacerbated by) CON1 dosage increase, could in turn lead to the excess of localized versus long-distance connectivity seen in individuals with ASD [autism spectrum disorders]."
These clues drive a great deal of interest in finding out what these genes and their encoded proteins do. They have apparently been under intense positive selection (for accumulating duplications and variants) over recent evolutionary time. And this is despite setting up a fraught situation in the genome, since repetitive sequences are more prone to rearrangements and other errors, as seen in the various genetic defects located at the 1q21 chromosomal position. They are clearly part of what makes us human, and diverse as humans.
- Doubt is still faith, if you are unwilling to change your mind.
- Australia's environmental policy... going downhill.
- But the climate can be saved, for a low, low price.
- Hits to Wikipedia can track influenza in real time.
- We are still on FIRE.
- Artificial intelligence is back, baby!
- Low taxes are not enough: Romney and friends pay no taxes at all, off-shore.
- Adam Smith was pro-Occupy.
- Wolf on Piketty: Do we really want to go back to Victorian / Dickensian capitalism?
- Polanyi: "free" markets brought and will bring disaster.
- Krugman psychoanalyzes the right. Is it too easy?
- Economics quote of the week, Brad DeLong on conservative arguments, based on maximizing overall GDP, against the minimum wage and other social controls on income, if one even grants that premise:
"The problem with this, of course, is that maximizing real income per capita does take a stand, and a very fictional stand, on interpersonal value comparisons. To maximize real income per capita is to assert that each dollar at the margin--no matter how rich is the person that goes to--has the same effect on marginal utility, has the same effect on the greatest good of the greatest number."