Saturday, January 16, 2021

Hunting for Lost Height

Progress in sequencing technologies and genetic analysis nails down the genetic sources of variability in the trait of human height.

PBS has an excellent program about eugenics, the push by some scientists and social reformers in the early 1900s to fix social problems by fixing problematic people. Both the science and the social ethics fell into disrepute, however, and were finished off entirely by the Nazis' version. While the stigma and ethical futility of eugenics remain, human genetics has advanced immeasurably, putting the science on much firmer footing. One example is a recent announcement that a research group has found essentially all the sources of genetic variation that relate to human height.

Height is obviously genetic, and twin studies show that it is about 80% heritable. There is also an interesting literature on environmental effects on height, to the point that whole populations of malnourished immigrants find that their children, raised in the US, grow substantially taller. So the genetic influence (as indicated by the 80% figure) is only fully apparent in the absence of overriding environmental constraints.
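The 80% figure comes from comparing identical (MZ) and fraternal (DZ) twins. Below is a minimal sketch of the classic Falconer / ACE decomposition. The twin correlations are hypothetical numbers chosen for illustration, not data from any study, and the method assumes additive gene effects and environments shared equally by both kinds of twins.

```python
def ace_from_twins(r_mz: float, r_dz: float) -> tuple[float, float, float]:
    """Split trait variance into A, C, E shares from twin correlations.

    Classic Falconer/ACE assumptions: MZ twins share all their genes, DZ twins
    share half of their additive genetic variation on average, both kinds of
    twins share their environment equally, and gene effects are additive.
    """
    for r in (r_mz, r_dz):
        if not -1.0 <= r <= 1.0:
            raise ValueError("twin correlations must lie in [-1, 1]")
    a = 2.0 * (r_mz - r_dz)  # additive genetic share: the heritability h^2
    c = 2.0 * r_dz - r_mz    # shared (family) environment share
    e = 1.0 - r_mz           # unique environment plus measurement error
    return a, c, e


# Hypothetical height correlations for adult twin pairs (illustrative only)
a, c, e = ace_from_twins(r_mz=0.90, r_dz=0.50)
print(f"h^2 = {a:.2f}, shared env = {c:.2f}, unique env = {e:.2f}")
# h^2 = 0.80, shared env = 0.10, unique env = 0.10
```

Note that h^2 is a ratio of variances within one population, so a shift in average height across generations, like the immigrant example above, can happen even when h^2 is high.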

The first attempts to find the genetic loci associated with height took off after the human genome was sequenced, in the form of genome-wide association studies (GWAS). In this era it was far easier to probe sampled genomic DNA with short oligonucleotide sequences than to sequence the whole genomes of many people. So a GWAS typically selected about 500,000 locations across the human genome that were known to vary, and tested which variant each person in a study population carried at each location. A massive correlation analysis was then run against the traits of those people, say their height, weight, or health, to see which markers (i.e. variants) correlated with the trait of interest.
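The correlation step can be sketched in a few lines. This is a stripped-down toy on simulated data: the helper names simulate_cohort and gwas_scan are hypothetical, individuals are assumed unrelated, genotypes are coded as 0/1/2 copies of the minor allele, and there are no covariates or population-structure corrections, which real GWAS pipelines add on top.

```python
import numpy as np


def simulate_cohort(n_people, n_markers, causal, effect_cm, seed=0):
    """Toy cohort: common-variant genotypes plus a height trait (hypothetical)."""
    rng = np.random.default_rng(seed)
    # Array-style markers: common variants only (minor allele frequency 5-50%)
    maf = rng.uniform(0.05, 0.5, size=n_markers)
    # Genotype = copies of the minor allele (0, 1 or 2), Hardy-Weinberg proportions
    geno = rng.binomial(2, maf, size=(n_people, n_markers)).astype(float)
    # Non-genetic variation around an assumed mean of 170 cm
    height = rng.normal(170.0, 7.0, size=n_people)
    for j in causal:
        height += effect_cm * geno[:, j]  # additive effect per allele
    return geno, height


def gwas_scan(geno, trait):
    """Correlate every marker with the trait; return r and a z-score per marker."""
    g = geno - geno.mean(axis=0)
    t = trait - trait.mean()
    r = (g.T @ t) / (np.sqrt((g ** 2).sum(axis=0)) * np.sqrt((t ** 2).sum()))
    z = r * np.sqrt(len(trait))  # approximately N(0, 1) when there is no association
    return r, z


geno, height = simulate_cohort(4000, 500, causal=[10, 200, 420], effect_cm=3.0, seed=1)
r, z = gwas_scan(geno, height)
top_hits = np.argsort(-np.abs(z))[:5]
for j in top_hits:
    print(f"marker {j}: r = {r[j]:+.3f}, z = {z[j]:+.1f}")
```

In this toy the three planted markers stand out clearly. In a real scan of about 500,000 markers the significance threshold must be far stricter (the conventional genome-wide cutoff is p < 5 × 10^-8) to avoid drowning in false positives.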

Such studies found only about 5% to 25% of the heritability of height, perplexing researchers. They were sampling the entire genome, if sparsely. The 500,000 markers corresponded to about one every 6,000 base pairs, so they should have been near enough to most genes with significant effects on the trait of interest. And since most regions of the human genome are inherited as relatively large blocks (haplotypes), due to our near-clonal genetic history, the idea was that sampling a sparse set of markers was sufficient to detect any significant effect from any gene. Later work could then focus on particular regions to find the actual genes and variants responsible for the trait in question.
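The spacing figure is simple division. A quick check, assuming a haploid human genome of roughly 3.1 billion base pairs and markers spread evenly; the 50 kb block length is a hypothetical value for illustration, since real haplotype blocks vary widely in size.

```python
GENOME_BP = 3_100_000_000  # approximate haploid human genome length (assumption)
N_MARKERS = 500_000        # typical genotyping-array size mentioned above


def mean_marker_spacing(genome_bp: int, n_markers: int) -> float:
    """Average distance between evenly spread markers, in base pairs."""
    return genome_bp / n_markers


spacing = mean_marker_spacing(GENOME_BP, N_MARKERS)
block_bp = 50_000  # hypothetical haplotype-block length
print(f"one marker every ~{spacing:,.0f} bp")                    # one marker every ~6,200 bp
print(f"~{block_bp / spacing:.0f} markers per {block_bp:,} bp block")  # ~8 markers per 50,000 bp block
```

Real arrays place markers unevenly, so this is only an average.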

But there was a big problem: the variants selected for the marker pool came from a very small population of a few hundred people. Recall that sequencing whole genomes was very expensive at the time, so researchers were trying to wring as much analysis as possible out of as little data as possible. By 2018, GWAS-type studies were still finding genetic causes for only about 25% of the variability in height, clearly short of the heritability known from twin and family studies. Not only that, but the number of genes implicated was rising into the thousands, each with an infinitesimal effect. The first 40 genes found in these studies accounted for only about 5% of the variation in height.

The large effect of rare alleles. Minor allele frequency (MAF) in the human population, plotted against the trait variance it accounts for. The color code (LD, or linkage disequilibrium) indicates selection against the locus (if high), along with other predicted characteristics of the variant. Very rare protein-altering variants (blue) have the strongest individual effects.
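Why can a rare variant matter so much? Under a simple additive model with Hardy-Weinberg proportions (assumptions here), a variant with minor allele frequency p and per-allele effect β contributes 2p(1-p)β² to trait variance, so a rare variant needs a large effect to count. The sketch uses hypothetical frequencies and effect sizes, not values from the study.

```python
import math


def variance_explained(maf: float, beta: float) -> float:
    """Additive variance contributed by one biallelic variant, 2p(1-p)*beta^2.

    Assumes Hardy-Weinberg proportions and a purely additive per-allele effect.
    """
    if not 0.0 < maf <= 0.5:
        raise ValueError("maf must be in (0, 0.5]")
    return 2.0 * maf * (1.0 - maf) * beta ** 2


def matching_beta(maf_rare: float, maf_common: float, beta_common: float) -> float:
    """Per-allele effect a rare variant needs to match a common variant's variance."""
    return beta_common * math.sqrt(
        maf_common * (1.0 - maf_common) / (maf_rare * (1.0 - maf_rare))
    )


# Hypothetical variants (effects in cm per allele)
common = variance_explained(maf=0.40, beta=0.1)  # small effect, GWAS-style
rare = variance_explained(maf=0.001, beta=2.0)   # rare, large-effect variant
print(f"common variant: {common:.4f} cm^2, rare variant: {rare:.4f} cm^2")
print(f"rare variant needs beta ~ {matching_beta(0.001, 0.40, 0.1):.2f} cm to match")
```

Dividing these numbers by the total variance of height turns them into fractions of variance explained, the quantity on the figure's vertical axis.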

The current work (review, review) takes a new approach, made possible by new technologies. The researchers sequenced the full genomes of over 20,000 people, finding a plethora of rare alleles, not included in the original marker studies, that have significant effects on height. They find variants that account for 79% of the variation in height, which, set against the 80% heritability from twin studies, is to say all of it. It turns out that the whole premise of GWAS, that common markers are sufficient to analyze diverse populations, is incorrect. The common markers are not as widely distributed, or as well linked to rare variants, as was originally assumed. The new technologies allow far greater depth of analysis (whole-genome sequencing) and broader sampling (20,000 people vs. a few hundred) to find rare and influential variants. We had previously learned that using common variants confines GWAS analysis to uninteresting variants, those that are not being selected against. This may not be an enormous issue for height (though these researchers find that many of their new, rare loci are being selected against), but it has been a big issue in the analysis of disease-linked genetic loci, such as those for diabetes or alcoholism. While these traits may be common, the most influential genetic variants that cause them are not, for good reason.
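The point that common markers are poorly linked to rare variants has a simple quantitative basis. For two biallelic loci whose minor alleles have frequencies p_rare ≤ p_common ≤ 0.5, the largest possible linkage-disequilibrium r² is p_rare(1 - p_common) / (p_common(1 - p_rare)), reached when the rare allele always sits on a haplotype carrying the common marker's minor allele. A minimal sketch with hypothetical frequencies:

```python
def max_ld_r2(p1: float, p2: float) -> float:
    """Upper bound on LD r^2 between two biallelic loci.

    p1 and p2 are minor-allele frequencies (each in (0, 0.5]). The bound is hit
    when every copy of the rarer allele sits on a haplotype carrying the other
    locus's minor allele.
    """
    for p in (p1, p2):
        if not 0.0 < p <= 0.5:
            raise ValueError("minor-allele frequencies must be in (0, 0.5]")
    lo, hi = sorted((p1, p2))
    return lo * (1.0 - hi) / (hi * (1.0 - lo))


common_marker = 0.30  # hypothetical array marker frequency
for rare in (0.30, 0.05, 0.01, 0.001):
    bound = max_ld_r2(rare, common_marker)
    print(f"variant MAF {rare:<6}: max r^2 with the marker = {bound:.4f}")
```

A common rule of thumb is that testing a tag marker instead of the causal variant behaves roughly like shrinking the sample size by a factor of r², so a rare causal variant can be effectively invisible to a common-variant array even when its effect is large.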

One can imagine that over time, everyone will have their genome sequenced, and that this data will lead to a fuller, if not complete, understanding of trait genetics. But which genes are responsible for the traits? All this is still an abstract mapping of locations of variability (what used to be called mutation) correlated with variations in a trait. This newest data identifies thousands of influential variants spread across one third of the genome. This means that, as with most interesting traits, the genetics of human height are dispersed: a genetic fog. All sorts of defects or changes can influence this trait to infinitesimal degrees, making it a fool's errand to look for a gene for height.

