Saturday, November 16, 2019

Gene Duplication and Ramification

Using yeast to study the implications of gene duplication.

Genes duplicate all the time. Much of our forensic DNA technology relies (or at least used to rely) on repetitive, duplicated DNA features that are not under much selection, and so can vary rapidly in the human population due to segment duplication and recombination/elimination. Indeed, whole genomes duplicate with some (rare) frequency. Many plants are polyploid, having duplicated their genomes one, two, three or more times, attaining prodigious genome sizes. What are the consequences when a gene suddenly finds itself making products in competition with, or collaboration with, another copy?

A recent paper explored this issue to some small degree in the case of proteins that form dimeric protein complexes in yeast cells. Saccharomyces cerevisiae is known to have undergone a whole genome duplication in the distant past, which led to a large set of related proteins called paralogs, which is to say homologs (similar genes) that originated by gene duplication and have subsequently diverged. Even more specifically, they are termed ohnologs, since they arise from a known genome duplication event (this special class is interesting since for such organisms, it makes up a huge class of duplicates that all arose at the same time, making some aspects of evolutionary analysis easier). A question is whether that divergence is driven by neutral evolution, in which case their resemblance quickly degrades, or whether selection continues for one homodimer, for both homodimers, or even for the complex between the two partners, which is termed a heterodimer.

The authors go through simulations of several different selection regimes, done at atomic scale on known protein paralogs, to ask what effect selection on one feature has on the retention or degradation of other features of the system. Another term for this kind of genetic relationship is pleiotropy, which means the effects that one gene can have on multiple functions, in this case heterodimeric complexes in addition to homodimeric complexes, which often have different, even opposite, roles.

[Figure] One example of a homodimeric protein (GPD1, glycerol-3-phosphate dehydrogenase) that has an ohnolog (GPD2) with which it can heterodimerize. At top is the structure: yellow marks the binding interface, while blue and pink mark the rest of each individual protein monomer. The X axis of each graph is time as the simulation proceeds, adding mutations to the respective genes and enforcing selection as the experimenters wish, based on the binding energy of the protein-protein interface, as calculated from the chemistry. That binding energy is the Y axis. Dark blue is one homodimer (GPD1-GPD1), pink is the other homodimer (GPD2-GPD2), and purple is the binding energy of the heterodimer (GPD1-GPD2).

In a neutral evolution regime lacking all selection (top graph), obviously there is no maintenance of any function, and the ability of the molecules to form complexes of any kind steadily degrades with time: the binding energy of the dimers goes to zero. But if selection is maintained for the ability of each gene product to form its own homodimer, then heterodimer formation is maintained as well, apparently for free (second graph). Similarly, if only selection for a heterodimer is maintained, the ability of each to form homodimers is also maintained for free. At bottom, if only one homodimer is under positive selection, then the formation of the other homodimer degrades most rapidly, and the heterodimer degrades a bit less rapidly.

All this is rather obvious given that the binding interface (see the structure at the top of the figure for the example of GPD1) is the same for both proteins. Positive selection that maintains this interface for one interaction necessarily keeps it relatively unchanged, and therefore functional for any other interaction that uses exactly the same face: the heterodimeric interaction when the homodimeric interaction is selected for, and vice versa.
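
For readers who like to poke at such things, the logic of these selection regimes can be sketched in a few lines of Python. To be clear, this is a toy of my own devising, not the authors' method: they computed binding energies from actual protein structures, while this sketch assumes each paralog's interface is just a list of per-site "quality" scores (1.0 = ancestral), that every mutation halves one site, and that binding strength is the average product of matching sites. The function names, the regime labels, and the 0.8 threshold are all hypothetical. Under those assumptions, a damaged site counts twice in a homodimer and once in a heterodimer, which is at least one plausible way to get a heterodimer that decays more slowly than the unselected homodimer.

```python
import random


def strength(x, y):
    """Toy binding strength between two interfaces: mean per-site contact
    quality, where 1.0 is the ancestral (undamaged) interface."""
    return sum(xi * yi for xi, yi in zip(x, y)) / len(x)


def simulate(regime, n_sites=20, steps=2000, threshold=0.8, seed=0):
    """Mutate two paralog interfaces (a, b) under a chosen selection regime.

    Hypothetical regimes: 'neutral', 'both_homodimers', 'heterodimer',
    'homodimer_A'. A mutation is rejected if it pushes any selected
    complex below `threshold` (a crude stand-in for purifying selection).
    """
    rng = random.Random(seed)
    a = [1.0] * n_sites  # interface of paralog A (think GPD1)
    b = [1.0] * n_sites  # interface of paralog B (think GPD2)
    selected = {
        "neutral": [],
        "both_homodimers": ["AA", "BB"],
        "heterodimer": ["AB"],
        "homodimer_A": ["AA"],
    }[regime]

    def all_strengths():
        return {"AA": strength(a, a), "BB": strength(b, b), "AB": strength(a, b)}

    for _ in range(steps):
        target = rng.choice((a, b))      # which gene takes the hit
        site = rng.randrange(n_sites)    # which interface site
        old = target[site]
        target[site] = old * 0.5         # assumed: every mutation is damaging
        current = all_strengths()
        if any(current[name] < threshold for name in selected):
            target[site] = old           # selection removes this mutant
    return all_strengths()


if __name__ == "__main__":
    for regime in ("neutral", "both_homodimers", "heterodimer", "homodimer_A"):
        print(regime, simulate(regime))
```

Running it prints the final strengths of the two homodimers and the heterodimer for each regime. Since the two copies share one interface in this toy, selecting for any complex that uses both of them props up the others, which is the same point the real simulations make with far more chemistry.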

So why keep these kinds of duplicates around? One reason is that, while preserving their binding interface with each other, they may diverge elsewhere in their sequence, adopting new functions over time. This kind of thing can lead to the formation of ever more elaborate complexes, which are quite common. Having two genes coding for related functions can also insulate the organism from mutational defects in either one, which would otherwise impair the homodimeric complex more fully. By the same token, this insulation can allow variational space for the development of novel functions, as in the first point.

So, nothing earthshaking in this paper (which incidentally included a good bit of experimental work, which I did not mention, to validate the computational findings), but it is nice to see yeast still serving as a key model system for basic questions in molecular biology. Its genomic history, which includes a whole genome duplication, and its exquisite genetic and molecular tool chest make it ideal for this kind of study.