Saturday, June 7, 2014

Magical multiple sequence alignments

How to use evolutionary sequence alignments to map protein-protein interfaces.

Wonk alert: this is a bioinformatics post of limited interest to lay readers. While researchers have diligently been crystallizing proteins and generating large numbers of structures, the problem of protein structure determination remains. After getting a single structure, we want to know how a complex of multiple proteins looks, then how they look in various active states, and on and on. The questions tend to be endless, and the capacity of structure determination methods is quite limited despite their amazing advances over the years. RCSB / PDB, the protein structure database, now has about 100,000 structures, comprising 230,000 protein chains. But many are from obscure species, or from experiments where the same structure was solved many times, with small variations. Most are also static structures, derived under unnatural conditions.

At any rate, researchers are constantly on the lookout for new ways to gain insight into protein structure, be it using fluorescence energy transfer metrology, NMR, photocrosslinking, computational brute force prediction, ... the list is quite long. A recent paper adds another interesting method to this list, one that focuses on sites of interaction between proteins, and whose raw material is the growing pile of sequences streaming out of those ever-more productive DNA sequencers.

The principle behind this method is that evolution is naturally conservative, so one can hunt for sites of protein-protein interaction in the proteins' own linear sequences, by noting where individual amino acids change in coordinated ways over evolutionary time. That is to say, if at a protein interface one partner has an arginine (+ charge) and the other has an aspartic acid (- charge) facing it, and one of them mutates in some lineage of species to the opposite charge, the partner will typically switch as well, under pressure to preserve the strength of the interaction. These are called "covarying residue pairs". Likewise, a spatially bulky amino acid like phenylalanine might covary with a smaller partner like alanine, switching sides in an interaction to keep the interface properly shaped.
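To make the idea of covarying residue pairs concrete, here is a minimal sketch that scores every column of one protein's alignment against every column of its partner's alignment using mutual information. This is a deliberately simpler statistic than whatever the paper's authors use, and the toy sequences, function names, and three-residue proteins are all made up for illustration. Row k of each alignment is assumed to come from the same species.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    Each column is a string with one residue per species, in the same order.
    """
    n = len(col_a)
    if n == 0 or n != len(col_b):
        raise ValueError("columns must be non-empty and of equal length")
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * log2(p_ab * n * n / (count_a[a] * count_b[b]))
    return mi


def covarying_pairs(seqs_a, seqs_b):
    """Score all inter-protein column pairs, highest score first."""
    cols_a = ["".join(c) for c in zip(*seqs_a)]
    cols_b = ["".join(c) for c in zip(*seqs_b)]
    scores = [
        (mutual_information(ca, cb), i, j)
        for i, ca in enumerate(cols_a)
        for j, cb in enumerate(cols_b)
    ]
    return sorted(scores, reverse=True)


# Hypothetical paired alignments: one row per species.
# Position 0 swaps charge in step (R = arginine, +; D = aspartic acid, -).
# Position 2 swaps size in step (F = phenylalanine, A = alanine).
protein_a = ["RLF", "RLF", "DLF", "DLA", "RIF", "DIA"]
protein_b = ["DGA", "DGA", "RGA", "RGF", "DGA", "RGF"]

ranked = covarying_pairs(protein_a, protein_b)
for score, i, j in ranked[:3]:
    print(f"A position {i} vs B position {j}: {score:.3f} bits")
```

In the toy data, the charge-swapping and size-swapping positions rise to the top, while the invariant glycine column in protein B scores zero against everything. Real alignments need corrections that this sketch ignores, such as for phylogenetic relatedness among the species and for indirect (chained) correlations.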

This method has been understood and applied previously to interactions within single proteins, which is a more tractable problem. This paper claims to make it a practical method to apply to multiple proteins, as long as you have an enormous amount of sequence information (i.e. as many different related sequences from various species as the segments being compared are long, in amino acids).
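The sequence requirement amounts to simple arithmetic, sketched below. I am assuming that "the segments being compared" means the combined length of both proteins, and the function names and the default threshold of 1.0 are my own shorthand for the rule of thumb stated above, not parameters taken from the paper.

```python
def sequences_per_position(n_sequences, len_a, len_b):
    """Ratio of aligned sequences to the combined length of both proteins."""
    return n_sequences / (len_a + len_b)


def has_enough_sequences(n_sequences, len_a, len_b, min_ratio=1.0):
    """Rule of thumb: at least as many sequences as aligned positions."""
    return sequences_per_position(n_sequences, len_a, len_b) >= min_ratio


# Hypothetical pair: proteins of 120 and 180 amino acids (300 positions total).
print(has_enough_sequences(450, 120, 180))  # 450 / 300 = 1.5
print(has_enough_sequences(200, 120, 180))  # 200 / 300 is below 1.0
```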

Contacts derived from the computational analysis of miscellaneous proteins with solved structures (using what the authors term "Gremlin" scores for amino acid proximity) are shown as yellow, orange, and red lines, extended for clarity. The red connections are ones that are over 12Å apart in the known structures, so the authors suggest that their method shows contacts that depend on flexibility or regulatory conditions that are not apparent in the static crystal structures. Part B shows proteins of the Complex I electron transport chain, which are very tightly complexed in the mitochondrial membrane.

The image above shows how well their method detects amino acids in structurally solved complexes that are at interfaces between proteins. It really is quite impressive, though one has to know beforehand that two (or a few) proteins interact for this method to work. It is not yet something one can deploy over a whole genome in blind fashion to find proteins that interact with each other, which would be extremely useful.
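The comparison against known structures can be sketched as a simple distance check: for each predicted residue pair, look up the two positions in the solved complex and ask whether they are more than 12Å apart. The coordinates, residue numbers, and function name below are hypothetical, and representing each residue by a single point is my simplification; the caption does not say which atoms the authors measure between.

```python
from math import dist

CONTACT_CUTOFF = 12.0  # angstroms; pairs beyond this are the "red" ones


def split_by_distance(predicted_pairs, coords_a, coords_b, cutoff=CONTACT_CUTOFF):
    """Separate predicted pairs into those within and beyond the cutoff.

    coords_a / coords_b map residue numbers to (x, y, z) positions in a
    solved structure of the complex.
    """
    consistent, distant = [], []
    for i, j in predicted_pairs:
        d = dist(coords_a[i], coords_b[j])
        target = distant if d > cutoff else consistent
        target.append((i, j, round(d, 1)))
    return consistent, distant


# Made-up coordinates for two residues in each protein.
coords_a = {10: (0.0, 0.0, 0.0), 42: (5.0, 5.0, 0.0)}
coords_b = {7: (3.0, 4.0, 0.0), 88: (25.0, 5.0, 0.0)}
predicted = [(10, 7), (42, 88)]

consistent, distant = split_by_distance(predicted, coords_a, coords_b)
print("within cutoff:", consistent)
print("beyond cutoff:", distant)
```

Pairs that land in the second list are either false predictions or, as the authors suggest, contacts that only form in conformations the crystal did not capture. A distance check alone cannot tell these apart.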

To test their method a bit more critically, they predict interacting amino acids for a set of protein pairs with unknown complex structure, though some of the proteins have known individual structures. Mostly this is grist for other researchers and future validation. But they take a few of the pairs whose individual structures are known (or could be estimated de novo using computational methods), and put them through a static molecular docking protocol, where the virtual structures are fitted together according to their predicted interface. The results shown below make a good deal of sense (both prima facie, and based on other work on those proteins), and they feel it validates the method.

More complex structures, this time with interfaces predicted entirely by the sequence alignment method, not from prior structural information.

"Taken together, these results suggest that in cases with small conformational change, the docking protocol can recover the entire interface to high accuracy and in cases where binding is accompanied by a large conformational change, the protocol recovers the largest intact and/or unobstructed interface."

The need for large amounts of sequence data makes this method a bit restricted for the moment (the researchers used only bacterial proteins), but it is a very clever way to use evolutionary data to gain structural knowledge about complex interactions. It can be applied to structures that are otherwise very difficult to study, like membrane proteins, and may eventually provide data on more dynamic interactions that cannot be validated by reference to static crystal structures.


  • Molecular machinery, and other PDB posters.
  • Annals of our dying environment ... the Monarch butterfly.
  • Why do we still work? Why is so much work useless? "Suddenly it became possible to see that if there’s a rule, it’s that the more obviously your work benefits others, the less you’re paid for it."
  • This week in the WSJ, Review of another Bonhoeffer bio ... still hunting for an absent god. "It is evident in the conflicted way in which he approached divinity: the awful longing for an absent God, the hunger for the hot touch of an absolute Christ."
  • "Every single state taxes those in the bottom income quintile at a higher rate than those at the top."
  • War on coal? Bring it on.
  • Is reality (and anti-humanism) worth all the trouble? Zizek vs Chomsky.
  • " ... almost everyone who deconverts from religion and declares themselves a nonbeliever does so because of a compelling need to talk about reality."
  • John Oliver on Comcast and our crappy internet.
  • What goes on in hedge funds ... opacity is the business model.
  • US policy on Syria is a disaster from start to finish.
  • Little on Piketty.
  • When libertarians have problems with people with too much money, they...
  • Why do we have economists?
  • Why do we have GDP?
  • They built it ... gilded age loggers raping the land, with fraud and despotism into the bargain.

END OF DAYS: I am critically endangered: the la hotte whistling frog of Haiti.