Saturday, January 14, 2023

Evolution of Dogs, and Dog Brains

Deeper genetic studies of the history of dogs reveal causal genes and pathways.

Do traits run in families? Are mental and behavioral attributes heritable? Of course they are, though well-intentioned liberals tend to argue otherwise, that everyone is the same by nature, and education, social services, and perhaps psychotherapy are the only things holding anyone back from limitless potential. Well, there is a place for both nurture and nature, but plain observation and mountains of science, such as twin studies, show that nature plays a dominant role, especially in relatively stable societies where nurture is not grossly deficient. While plenty of evidence exists for this in humans, it is particularly evident in model animals, such as those we have bred to have certain dispositions, like dogs. 

A recent landmark study on the genetics of dogs delves into some of the genetic and molecular detail of these traits. The authors find clear lineage differences between groups of dogs bred for different purposes, and dredge up telling details about where those differences lie in the dog genome. First off, they have a wealth of data to draw from- full genomes sequenced for hundreds of dogs, and mutation variation panels for many more. They claim data from 4,261 individual dogs and 226 breeds, running the gamut from purebred to village mutts. Wild dogs, wolves and coyotes were also added as outgroup references. 

The second big advance was to use a highly refined method of data reduction. The scale of this data is huge, so how does one pull the needles of meaningful, breed- or trait-correlated variation from the haystack of background variation? Most of the variation they find was already present in wolves, meaning that while some new mutations occurred during domestication, humans mostly spent their time selecting desirable combinations out of a very rich trove of natural variation already present from the start. The traditional way to do this is by principal component analysis (PCA), which treats the data as points in a high dimensional space, finds the orthogonal axes along which the data vary the most, and projects the data onto the top two of those axes for visualization.
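To make the PCA step concrete, here is a minimal sketch in Python with NumPy. It is not the authors' pipeline. The data are synthetic stand-ins (300 made-up "individuals" scored at 50 made-up "variants", with two hidden groups), and the function name `pca_2d` is my own. It simply centers the data, finds the two orthogonal directions of greatest variance by singular value decomposition, and projects onto them.

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X onto the two orthogonal axes of greatest variance."""
    Xc = X - X.mean(axis=0)                        # center each column (variant)
    # SVD of the centered data: the rows of Vt are the principal axes,
    # ordered from largest to smallest variance.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:2]                            # top two axes
    coords = Xc @ components.T                     # 2-D coordinates for plotting
    explained = (S ** 2) / np.sum(S ** 2)          # fraction of variance per axis
    return coords, components, explained[:2]

# Synthetic toy data (an assumption for illustration, not real genotypes):
# two hidden groups that differ at the first 10 of 50 variants.
rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=300)
X = rng.normal(size=(300, 50))
X[:, :10] += 3.0 * group[:, None]

coords, components, explained = pca_2d(X)
```

Plotting `coords[:, 0]` against `coords[:, 1]` and coloring by `group` would show the two groups pulled apart along the first axis. With real genomes, the worry is that two straight axes capture only a sliver of the structure.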

That is pretty simple, and crude, and a recent paper showed that a more sensitive way (named PHATE) to explore high dimensional data is able to uncover far more structure from it. It is just the kind of thing that these genomic scientists needed to wring more meaning from their huge data set.

Comparison of different dimensional reduction methods, from the same data set, in this case gene expression from embryonic cell types. One can easily see that PCA is far less effective in revealing structure than is the newer PHATE technique.

This method, used over the dog data, yielded extremely clear differentiation between the major lineages, such as herding dogs vs retrievers vs scent hounds vs pointing dogs. As expected, the mutts, village dogs, and wolves clustered near the middle, not having traveled very far from the ancestral condition (except for one ramification along with "sight hounds", like greyhounds and other hunters, shared with Middle Eastern village dogs). Conversely, lineages like terriers formed a clearly separated path from the ancestral condition to more exquisitely bred extremes, at the ends of the distribution. Incidentally, their geographic view of this data showed that the ends of their distributions were consistently occupied by dogs bred in Britain, stemming from the virtual mania for animal husbandry and breeding (not to say eugenics) prevalent in Victorian times. Darwin was fascinated by this as well, devoting much of the opening chapter of his "Origin" to the variation and breeding of pigeons.

Structured differences found in the genomic and other variation data gathered from thousands of dogs, of hundreds of breeds and geographic origins. The genomic data naturally fall into the breeds and types of dogs we are familiar with, while wild and feral dogs tend more to the central, ancestral areas.

This data treatment was not just done for visual clarity, but provided the clean classification that these authors could then use to search for the differentiating mutations in genomes separated by these breeding histories. They also do a bit of psychoanalysis, correlating the various lineages with major trait dimensions, such as trainability, aggressiveness, predatory drive, fear, and energy. This helped to give some rationale to aspects that various lineages might share, despite their separation in the main axes. For example, terriers had high levels of predatory chasing, while herders showed high levels of fear. This buttresses the idea that the dimensional reduction analysis (done on genomes) uncovered real dimensions of dog mentality, marked not just by conventional breed labels, but also by correlation with imputed general traits. What was the headline of this lineage analysis? 

"Lineage-associated variants are largely non-coding regions implicated in neurodevelopment"

There are two very interesting aspects to unpack here. First is that the vast majority of the mutations (aka variants) were non-coding. They state that of 16,250 variants that passed some threshold of statistical significance with regard to lineage divergences, only 76 (under half a percent) were protein coding changes with any significant impact. So instead of changing the proteins being made in the body, the story is one of control- the regulation over where, when, and how much of these proteins gets made. This is significant, as many genetic tests for humans are still focused on what is called the "exome", which is to say, the protein-coding parts of our genomes, where certainly many devastating mutations exist. But it isn't where the vast majority of interesting variations occur, either for disease or particularly for normal trait variation. Those happen in the far larger and murkier regions around each gene that are strung with regulatory control sites. Mutations there can have very subtle effects.

Secondly, of course, is that they found brain and neural development genes to dominate the analysis. This only makes sense for our breeding efforts, which have had to firstly tame what was once a wolf, and then develop its talents in very particular, and sometimes peculiar directions. For instance, they note that scent / blood hounds have relatively low trainability, since they were bred to lead the way and follow their noses, not so much their humans. While the official dog shows focus on looks, coats, and colors, the much harder, and more significant job has clearly been to remake the mind of the dog to serve us. Nothing shows this more clearly than the border collie and related herders, whose ability to work with experienced handlers on difficult tasks is legendary.

The figure below gives an overview of what they found. At the top is the dog genome, with scoring of differential herding dog variants on the Y axis. Highlighted in green are genes that are mentioned below (panel C) as being quite densely involved in neural development and maintenance. Many of these are indeed very highly scoring in the genome graph, but others are less so. The authors are evidently being quite selective in calling out genes of interest, and there are many genes at least equally significant that are not being discussed. For instance, while there are by my count about 50 genes that rise to the "10" level in the graph, only seven or eight of them were called out for presentation in this neural pathways collection. And there are easily hundreds if not a thousand that satisfy the "5" level in the graph, making the selection of genes like SRGAP3, which has a score in this range, somewhat willful.

Distinctive variations of sheepdogs are heavily involved in brain development, with a selection illustrated at bottom. At top is a graph of dispersion scores vs genomic location, with some genes involved in neural function called out (green). In the middle, a few of these genes are blown up to show that the variants do not generally occur in the coding regions of these genes, but in surrounding regulatory areas. At bottom is shown an overlay of the genes found and called out above, laid over an independently curated/assembled diagram depicting molecular details of neuronal guidance, from KEGG.

At any rate, the middle panel of this diagram provides a few magnified examples of where the variations are relative to the coding regions of their respective genes. The coding regions are depicted at top with an arrow showing the start of transcription, and tiny vertical lines showing each "protein-coding" exon fragment, interspersed with large non-coding introns. Clearly the variations are clustered in the regulatory regions near, but not in, these genes.

And at bottom is a curated pathway, assembled from huge amounts of work from many labs, of some molecular aspects of axon guidance- the process by which neurons send axons out from where they start in embryogenesis to the targets, sometimes very far away in the brain, where they synapse with other neurons to make up our (or here the dog's) brain anatomy. The concentration of relevant variations in such genes speaks volumes about what has been going on in this process of rather rapid, directed evolution. The domestication of dogs is thought to have begun, very roughly, about 30 thousand years ago. The speed of this process and its resulting variety suggest (as it did to Darwin, and countless others) that evolution by natural selection has had plenty of time to work the biological wonders we see around us.


  • Somewhat boring lecture on axon guidance mechanisms that allow organized brain development and maintenance.
  • Social capital and social climbing.
  • Eugenics, Israeli-style.
  • Brothers at arms.
  • Yes, genes can arise from junk DNA. And they are important genes.