Sunday, March 25, 2018

GWAS: Complexity in Genetic Variation and Selection

Genetic studies show that most traits have many influences, most genes affect many traits, and most variants have small effects.

Once the human genome was fully sequenced, and once lots of alleles (aka variants, aka mutations) had been collected from human populations, scientists started doing large-scale genetic studies, called genome-wide association studies (GWAS). The dream was that now, at last, we could find the "genes for" schizophrenia, and alcoholism, and depression, and autism, and height, and cardiovascular disease, and countless other syndromes and traits that are known to be highly heritable.

But this project pretty much came to naught, for reasons that have gradually become clear, and for which a recent paper (review) provides some more explicit modeling. The variants found through GWAS have generally had very small effects on the studied trait, and even adding all of them up does not account for the heritability that is known from other genetic methods. This became known as "the missing heritability". Height is clearly heritable, as the path leading to Yao Ming shows. Yet add up all the known variants contributing to height, and they fall short of that known heritability.

First, these studies focused on common variants, necessarily, because data was so hard to come by. If 1,000 genomes are sequenced out of the human population, and the study requires that a variant occur more than once so that its association can be validated, then that variant must be a common one. That implies in turn that it cannot have a very strong selective effect, otherwise it would not be common. And that implies in turn that any effect it has on any trait is likely to be weak.
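To put rough numbers on the "must be common" argument, here is a minimal sketch in Python. It assumes that each of the 2 x 1,000 sampled chromosome copies carries the allele independently with probability equal to its population frequency (a simple binomial model, which is my assumption, not something from the paper), and asks how likely the allele is to show up at least twice. The function name is made up for illustration.

```python
def prob_seen_at_least_twice(allele_freq: float, n_genomes: int = 1000) -> float:
    """Chance that an allele appears on at least two sampled chromosome copies."""
    n_copies = 2 * n_genomes  # diploid: two chromosome copies per person
    # Binomial probabilities of seeing the allele zero times or exactly once.
    p_zero = (1 - allele_freq) ** n_copies
    p_one = n_copies * allele_freq * (1 - allele_freq) ** (n_copies - 1)
    return 1 - p_zero - p_one


for freq in (0.0001, 0.001, 0.01, 0.05):
    chance = prob_seen_at_least_twice(freq)
    print(f"allele frequency {freq:.4f}: P(seen at least twice) = {chance:.3f}")
```

Under this model, an allele carried by one chromosome in ten thousand is almost never seen twice in such a sample, while one at 1% frequency almost always is. That is the sense in which a study of this size can only "see" common variation.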

Second, we have been somewhat blinded by the archetypal Mendelian model of traits. Wrinkled peas and human eye color are simple traits, with one or a few alleles. Blue eye color is due to complete lack of the enzyme that makes brown pigment: it is either on or off. But most of our genes are more important than that, and cannot be turned off without dire consequences. Most of our genes make products that participate in large pathways and networks, where they intrinsically affect many traits and have strong effects if significantly defective. Indeed, it is estimated that about 1/3 of amino acid positions in the coding genome have strongly deleterious effects if changed.

Network of genes with variants found to be genetically associated with autism. Each one, naturally, has very small effects.

This implies that most of the variation that exists around these genes will not have dramatic on/off effects, but will rather be slight modifications of the sequence, or of expression (up or down, or in modestly altered locations or times), consistent with the high variability and degeneracy of the regulatory code. In addition, if a variant has an effect on the trait one is studying, it will likely also have effects elsewhere, given the complexity of most circuits. This is called pleiotropy. Thus its overall selective effect may be substantially larger than its effect on the trait of interest alone, so selection holds it to a lower frequency than that one trait would suggest, dampening yet again one's ability to find such variation from studies of particular traits.

We are now in the world of "quantitative traits", as opposed to Mendelian traits. Not that they fail to obey Mendel's laws, but their complexity is such that a whole new form of statistics and analysis is needed to deal with them. Quantitative traits, like height, vary in a continuous way, and are influenced by many genes, whose many variants (at least those which occur commonly) each have small effects.
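The difference between a Mendelian and a quantitative trait can be illustrated with a toy simulation. This is only a sketch: the allele frequencies, effect sizes, and the purely additive scoring are invented for illustration, and nothing here comes from real data. Each simulated person gets 0, 1, or 2 copies of each variant, and the trait is the sum of the variant effects.

```python
import random
import statistics


def simulate_trait(n_people: int, n_variants: int, seed: int = 0) -> list[float]:
    """Additive genetic score per person, summed over hypothetical variants."""
    rng = random.Random(seed)
    # Invented parameters: common variants (5-50% frequency), random effects
    # on an arbitrary scale.
    freqs = [rng.uniform(0.05, 0.5) for _ in range(n_variants)]
    effects = [rng.gauss(0, 1) for _ in range(n_variants)]
    scores = []
    for _ in range(n_people):
        score = 0.0
        for p, beta in zip(freqs, effects):
            # Genotype: number of copies (0, 1, or 2) of the variant allele.
            copies = (rng.random() < p) + (rng.random() < p)
            score += beta * copies
        scores.append(score)
    return scores


def distinct_values(scores: list[float]) -> int:
    """Number of different trait values seen (rounded to absorb float noise)."""
    return len({round(s, 9) for s in scores})


one_variant = simulate_trait(n_people=1000, n_variants=1)
many_variants = simulate_trait(n_people=1000, n_variants=300)
print("distinct trait values, 1 variant:   ", distinct_values(one_variant))
print("distinct trait values, 300 variants:", distinct_values(many_variants))
print("spread (stdev), 300 variants:       ", round(statistics.stdev(many_variants), 2))
```

With a single variant, the trait falls into at most three classes, like Mendel's peas. With a few hundred variants, nearly every person lands on a different value, and the trait looks continuous even though every variant is still inherited in a perfectly Mendelian way.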

Modeling is now giving more accurate predictions of how much heritability can be explained, based on the effect sizes of individual variants, and of a study's ability to find those variants, based on its size. The left panel shows how more heritability is explained (lower levels unexplained) as the study threshold captures more variance (more alleles, with smaller effect sizes) towards the right. Overall heritability of height is thought to be around 70%. The curve on the right, which models how big studies would have to be (in thousands of individual subjects) to reach that level, is unlikely ever to get there, so the modeling remains incomplete. This is even more true for BMI, whose total heritability is roughly 60%. Even with the statistics deployed here, and even with extrapolation to infinite study size, the authors are not modeling the full heritability.
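The link between study size and the ability to find small-effect variants can be sketched with a textbook power calculation. This is not the paper's model. It assumes a simple two-sided z-test for a single variant, an expected test statistic of roughly sqrt(N x fraction of trait variance explained), and the conventional genome-wide significance threshold of 5e-8, none of which is stated in the post.

```python
from statistics import NormalDist

# Conventional genome-wide significance threshold (an assumption, not from the post).
GENOME_WIDE_ALPHA = 5e-8


def detection_power(variance_explained: float, n_subjects: int,
                    alpha: float = GENOME_WIDE_ALPHA) -> float:
    """Approximate chance that one variant passes the significance threshold."""
    std_normal = NormalDist()
    # Critical z-score for a two-sided test at the given alpha.
    z_crit = -std_normal.inv_cdf(alpha / 2)
    # Expected z-score grows with the square root of study size times the
    # fraction of trait variance the variant explains (small-effect approximation).
    expected_z = (n_subjects * variance_explained) ** 0.5
    return std_normal.cdf(expected_z - z_crit) + std_normal.cdf(-expected_z - z_crit)


for variance_explained in (0.01, 0.001, 0.0001):
    for n_subjects in (10_000, 100_000, 1_000_000):
        power = detection_power(variance_explained, n_subjects)
        print(f"variance explained {variance_explained:.4f}, "
              f"N = {n_subjects:>9,}: power = {power:.3f}")
```

Under these assumptions, a variant explaining a hundredth of a percent of the variance is essentially invisible in a study of ten thousand people and only becomes reliably detectable at around a million. Each further step down in effect size pushes the required study size up by the same factor.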

The conclusion from all this is that the missing heritability is not missing, just hidden. If we could sequence everyone, and analyze all their variants, we would find all the heritability that lineage and twin studies know is there. The paper makes the significant point that the problem is not epistasis, the non-linear interaction of different genes and variants. No, the large numbers of small effects tend to add up linearly. But because they are so small and so numerous, and because studies up till now have not been powerful enough to find them, they remain out of reach.
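The "hidden, not missing" idea can be made concrete with a small bookkeeping exercise. In a purely additive model with independent variants, each variant contributes a variance of 2p(1-p) x beta^2, where p is its frequency and beta its effect, and these contributions simply sum. The catalogue of variants below is entirely invented, and the detection cutoffs are arbitrary stand-ins for study power, so this is a sketch of the logic rather than an estimate of anything real.

```python
import random


def variance_contribution(freq: float, effect: float) -> float:
    """Additive variance from one variant: 2p(1-p) * beta^2 (random mating assumed)."""
    return 2 * freq * (1 - freq) * effect ** 2


rng = random.Random(1)
# Hypothetical catalogue: 10,000 variants, each with a small random effect.
variants = [(rng.uniform(0.01, 0.5), rng.gauss(0, 0.01)) for _ in range(10_000)]
contributions = [variance_contribution(p, beta) for p, beta in variants]
total_variance = sum(contributions)


def found_fraction(cutoff: float) -> float:
    """Share of additive variance carried by variants at or above a detection cutoff."""
    return sum(c for c in contributions if c >= cutoff) / total_variance


for cutoff in (1e-4, 1e-5, 1e-6, 0.0):
    print(f"detect contributions >= {cutoff:g}: "
          f"{found_fraction(cutoff):.1%} of additive variance found")
```

As the cutoff drops (that is, as studies get more powerful), the found share climbs toward 100%. Nothing non-linear is needed to explain the gap; the remainder is simply spread over variants too weak to detect individually.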

This is disappointing from a medical standpoint, but also from a biological one. One goal of finding key genes for common diseases was to understand them mechanistically, as well as to treat them. But if no one gene, or even a few, is the key to complex diseases and traits, then the climb to understand their biology, and gain practical insight to alter their course, gets that much steeper.


  • Florida's bridge/political/environmental/traffic/population disaster.
  • Evangelicalism as simple patrician politics.
  • Millions have been killed in Iraq... was that OK?
  • My data is your data... as usual, the crime isn't what is illegal.
  • A philosophical memoir of science and physical therapy.
  • Corruption seems to know no bounds.
