Showing posts with label chemistry. Show all posts

Sunday, September 15, 2024

Road Rage Among the Polymerases

DNA polymerase is faster than RNA polymerase. RNA polymerase also leaves detritus in its wake. What happens when they collide?

DNA is a country road- one lane, two directions. Yet in our cells it can be extremely busy, with transcription (RNA synthesis) happening all the time, and innumerable proteins hanging on as signposts, chemical modifications, and even RNA hybridized into sections, creating separated DNA structures called R-loops. When it is time for DNA replication, what happens when all these things collide? One might think that biology had worked all this out by now, but these collisions can be quite dangerous, sending the RNA polymerase careering into the other (new) DNA strand, causing the DNA polymerase to stall or miss sections, and causing DNA breaks, which activate loud cellular alarm bells and mutations.

Despite decades of work, this area of biology is still not well understood, since the conditions are difficult to reproduce and study. So I can only give a few hints of what is going on, drawn from current work in the field. A couple of decades ago, a classic experiment showed that in bacteria, DNA polymerases can be stopped cold by a collision with an RNA polymerase going in the opposite direction. However, this stall is alleviated by a DNA helicase enzyme, which can pry apart the DNA strands and anything attached, and the DNA replication complex sails through, after a pause of a couple of seconds. The RNA polymerase, meanwhile, is not thrown off completely, but switches its template from the complementary strand it was using previously to the newly synthesized DNA strand just made by the passing DNA polymerase. This was an amazing result, since the elongating RNA polymerase is a rather tightly attached complex. But here, it jumps ship to the new DNA strand, even though the old DNA strand remains present, and will shortly be replicated by the lagging strand DNA polymerase complex.

General schematic of encounters between replication forks and RNA polymerases (pink, RNAP). Only co-directional, not head-on, collisions are shown here. Ribosomes (yellow) in bacteria operate directly on the nascent mRNA, and can helpfully nudge the RNA polymerase along. In this scheme, DNA damage happens after the nascent RNA is used as a primer by a new DNA polymerase (bottom), which will require special repair. 

The ability of the RNA polymerase to switch template strands, along with the nascent RNA it was making, suggests very intriguing flexibility in the system. Indeed, DNA polymerases that come up from behind the RNA polymerase (using the same strand as their template) have a much easier time of it, passing with hardly a pause, and only temporarily displacing the RNA polymerase. But things are different when the RNA polymerase has just found an error and has back-tracked to fix it. Then, the DNA polymerase complex is seriously impeded. It may even use the nascent RNA hanging off the polymerase and hybridized to the local DNA as a primer to continue synthesis, after it has bumped off the RNA polymerase that made it. This leads in turn to difficulties in repair and double strand breaks in that DNA, which is the worst kind of mutation. 

The presence of RNA in the mix, in the form of single strands of RNA hybridized to one of the DNA strands (that is, R-loops), turns out to be a serious problem. These can arise either from nascent transcription, as above, or from hybridization of non-coding RNAs that are increasingly recognized as significant gene regulators. RNA forms a slightly stronger hybrid with DNA than DNA itself does, in fact. Such R-loops (displacing one DNA strand) are quite common over active genomes, and apparently present a block to replication complexes. One would think that such fork complexes would be supplied with the kinds of helicases that could easily plow through such structures, but that is not quite the case. R-loops cause replication complex stalling, and can invoke DNA damage responses, for reasons that are not entirely clear yet. 

A recent paper that piqued my interest in all this studied an ATPase motor protein that occurs at stalled replication forks and helps them restart, presumably by acting as a DNA or RNA pump of some kind, and forcing the replication complex through obstructions. It is named WRNIP1, for WRN interacting protein, since it also interacts with the Werner syndrome protein (WRN), another interesting protein at the replication fork. WRN is another ATPase that is a helicase and also a backwards 3' -> 5' exonuclease that cleans up DNA ends around DNA repair sites, helping to remove mismatched and damaged DNA so the repair can be as accurate as possible. As one can guess, mutations in the WRN gene cause Werner Syndrome, a striking progeria syndrome of early aging and susceptibility to cancer. 

While the details of R-loop toxicity and repair are still being worked out, it is fascinating that such conflicts still exist after several billion years to figure them out. It is apparent that the design of DNA, while exceedingly elegant, results in intrinsic conflicts between expression and replication that are resolved amicably most of the time. But when either process gets overly congested, or encounters unexpected roadblocks, then tempers can flare, and an enormous apparatus of DNA damage signaling and repair is called in, sirens blaring, to do what it can to cut through the mess.


  • Who really believes in climate change?
  • The very strong people of the GOP. 
  • The ancient Easter Islanders mixed with South Americans.

Saturday, June 8, 2024

A Membrane Transistor

Voltage sensitive domains can make switches out of ion channels, antiporters, and other enzymes.

The heart of modern electronics is the transistor. It is a valve or switch, using a small electrical signal to control the flow of other electrical signals. We have learned that the simple logic this mechanism enables can be elaborated into hugely complex, even putatively intelligent, computers, databases, applications, and other paraphernalia of modernity. The same mechanism has a very long history in biology, quite apart from its use in neurons and brains, since membranes are typically charged, well-poised to be sensitive to changes in charge for all sorts of signaling.

The voltage sensitive domain (VSD) in proteins is an ancient (going back to archaea) bundle of four alpha helices that were first found attached to voltage-sensitive ion channels, including sodium, potassium, and calcium channels. But later it became fascinatingly apparent that it can control other protein activities as well. A recent paper discussed the mechanism and structure of a sodium/hydrogen antiporter with a role in sperm navigation, which uses a VSD to control its signaling. But there are also voltage-sensitive phosphatases, and other kinds of effectors hooked up to VSD domains. 

Schematic of a basic VSD, with helix 4 in pink, moving against the other three helices colored teal. Imagine a membrane going horizontally over these embedded proteins. When voltage across the local membrane changes (hyperpolarized or de-polarized), helix 4 can plunge by one helical repeat unit in either direction, up or down.

One of the helices (#4) in the VSD bundle has positive charges, while the others have specifically positioned negative charges. This creates a structure where changes in the ambient voltage across the membrane it sits in can cause helix #4 to plunge down by one or two steps (that is, turns of the alpha helix) versus its partners. This movement can then be propagated out along extensions of helix #4 to other domains of the protein in order to switch on or off their activities.

The helices of numerous proteins that have a VSD domain (in red) are drawn out, showing the diversity of how this domain is used.

While the studied protein, SLC9C1, is essential in mammalian sperm for motility, the paper studied its workings in sea urchin sperm, a common model system. The logic (as illustrated below) is that (female) chemoattractants bind to receptors on the sperm surface. These receptors generate cyclic GMP, which turns on potassium channels that increase the voltage across the membrane. This broadcasts the signal locally, and is received by the SLC9C1 transporter, which does two things. It activates a linked soluble adenylate cyclase enzyme, making the further signaling molecule cAMP. And it also activates the transporter itself, pumping protons out (in return 1:1 for sodium ions in) and causing cytoplasmic alkalinization. The cAMP activates sodium ion channels to cancel the high membrane voltage (a fast process), and the alkalinization activates calcium channels that direct the sperm's directional swimming responses- the ultimate response. The latter is relatively slow, so the whole cascade has timing characteristics that allow the signal to be dampened, but the response to persist a bit longer as the sperm moves through a variable and stochastic gradient.

A schematic of the logic of this pathway, and of the SLC9C1 anti-porter. At top, the transport mechanism is crudely illustrated as a rocking motion that ensures that only one H+ is exchanged for one Na+ for each cycle of transport. The transport is driven thermodynamically by the higher concentration of Na+ outside.
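To put rough numbers on that thermodynamic drive, here is a minimal sketch. Because one H+ leaves for each Na+ that enters, no net charge crosses the membrane, so the membrane voltage drops out and only the two concentration ratios matter. The concentrations and pH values used are illustrative assumptions (loosely sea-water-like), not measurements from the paper, and the function names are hypothetical.

```python
import math

R = 8.314  # gas constant, J/(mol*K)

def ph_to_conc(ph):
    """Convert pH to an H+ concentration in mol/L."""
    return 10.0 ** (-ph)

def exchange_free_energy(na_in, na_out, h_in, h_out, temp_k=298.15):
    """Free energy change (J/mol) for one Na+ moving in while one H+ moves out.

    Electroneutral 1:1 exchange, so there is no membrane-voltage term.
    A negative value means the exchange runs spontaneously in this direction.
    """
    return R * temp_k * math.log((na_in * h_out) / (na_out * h_in))

# Assumed, illustrative values: high Na+ outside, mildly acidic cytoplasm.
dg = exchange_free_energy(
    na_in=0.020,             # mol/L inside (assumption)
    na_out=0.450,            # mol/L outside (assumption)
    h_in=ph_to_conc(7.2),    # cytoplasmic pH (assumption)
    h_out=ph_to_conc(8.0),   # external pH (assumption)
)
print(f"dG = {dg / 1000:.1f} kJ/mol")
```

With these assumed values the result is negative, meaning the sodium gradient is steep enough to push protons out even against a modest pH gradient. The exchange stops being favorable once the proton ratio across the membrane matches the sodium ratio.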


But these researchers weren't interested in what the sperm were thinking, but rather how this widely used protein domain became hitched to this unusual protein and how it works there, turning on a sodium/hydrogen antiporter rather than the usual ion channel. They estimate that the #4 helix of the VSD moves by 10 angstroms, or 1 nm, upon voltage activation, which is a substantial movement, roughly equivalent to the width of these helices. In their final model, this movement significantly reshapes the intracellular domain of the transporter, which in turn releases its hold on the transporter's throat, allowing it to move cyclically as it needs to exchange hydrogen ions for sodium ions. This protein is known to bind and activate an adenylyl cyclase, which produces cAMP, which is one key next actor in the signaling cascade. This activation may be physically direct, or it may be through the local change in pH- that part is as yet unknown. cAMP also, incidentally, binds to and turns up the activity of this transporter, providing a bit of positive feedback.

Model of the SLC9C1 protein, with the VSD in teal and a predicted activation mechanism illustrated (only the third panel is activated/open). Upon voltage activation, the very long helix 4 dips down and changes orientation, dramatically opening the intracellular portion of the transporter (purple and orange portion). This in turn lets go of the bottom of the actual transporter portion of the protein (gray), allowing alkalinization of the cytoplasm to go forth. At the bottom sides, in brown, is the cAMP binding domain, which lowers the voltage threshold for activation.

There are a variety of interesting lessons from this work. One is that useful protein domains like VSD are often duplicated and propagated to unexpected places to regulate new processes. Another is that the new cryo-electron microscopy methods have made structural biology like this far easier and more common than it used to be, especially for membrane proteins, which are exceedingly difficult to crystallize. A third is that signaling systems in biology are shockingly complex. One would think that getting sperm cells to where they are going would take a bare minimum of complexity, yet we are studying a five or more part cascade involving two cyclic nucleotides, four ions, intricate proteins to manage them all, and who knows what else in the mix. It is difficult to account for all this, other than to say that when you have a few billion years to tinker with things, and have eons of desperate races to the egg for selective pressure, they tend to get more ornate. And a fourth is that it is regulatory switches all the way down.


Saturday, May 25, 2024

Nascent Neurons in Early Animals

Some of the most primitive animals have no nerves or neurons... how do they know what is going on?

We often think of our brains as computers, but while human-made computers are (so far) strictly electrical, our brains have a significantly different basis. The electrical component is comparatively slow, and confined to conduction along the membranes of single cells. Each of these neurons communicates with others using chemicals, mostly at specialized synapses, but also via other small compounds, neuropeptides, and hormones. That is why drugs have so many interesting effects, from anesthesia to anti-depression and hallucination. These properties suggest that the brain and its neurons began, evolutionarily speaking, as chemically excitable cells, before they became somewhat reluctant electrical conductors.

Thankfully, a few examples of early stages of animal evolution still exist. The main branches of the early divergence of animals are sponges (porifera), jellies and corals (ctenophora, cnidaria), bilaterians (us), and an extremely small family of placozoa. Neural-type functions appear to have evolved independently in each of these lineages, from origins that are clearest in what appears to be the most primitive of them, the placozoa. These are pancake-like organisms of three cell layers, hardly more complex than a single-celled paramecium. They have about six cell types in all, and glide around using cilia, engulfing edible detritus. They have no neurons, let alone synaptic connections between them, yet they have excitable cells that secrete what we would call neuropeptides, that tell nearby cells what to do. Substances like enkephalins, vasopressin, neurotensin, and the famous glucagon-like peptide are part of the menagerie of neuropeptides at work in our own brains and bodies.

A placozoan, about a millimeter wide. They are sort of a super-amoeba, attaching to and gliding over surfaces underwater and eating detritus. They are heavily ciliated, with only a few cell types divided in top, middle, and bottom cell layers. The proto-neural peptidergic cells make up ~13% of cells in this body.


The fact is that excitable cells long predate neurons. Even bacteria can sense things from outside, orient, and respond to them. As eukaryotes, placozoans inherited a complex repertoire of sense and response systems, such as G-protein coupled receptors (GPCRs) that link sensation of external chemicals with cascades of internal signaling. GPCRs are the dominant signaling platforms, along with activatable ion channels, in our nervous systems. So a natural hypothesis for the origin of nervous systems is that they began with chemical sensing and inter-cell chemical signaling systems that later gained electrical characteristics to speed things up, especially as more cells were added, body size increased, and local signaling could not keep up. Jellies, for instance, have neural nets that are quite unlike, and evolutionarily distinct from, the centralized systems of animals, yet use a similar molecular palette of signaling molecules, receptors, and excitation pathways. 

Placozoans, which date to maybe 800 million years ago, don't even have neurons, let alone neural nets or nervous systems. A recent paper labored to catalog what they do have, however, finding a number of pre-neural characteristics. For example, the peptidergic cell type, which secretes peptides that signal to neighboring cells, expresses 25 or more GPCRs, receptors for those same peptides and other environmental chemicals. They state that these GPCRs are not detectably related to those of other animals, so placozoans underwent their own radiation, evolving/diversifying a primordial receptor into the hundreds that exist in their genomes today. The researchers even go so far as to employ the AI program AlphaFold to model which GPCRs bind to which endogenously produced peptides, in an attempt to figure out the circuitry that these organisms employ.

This peptidergic cell type also expresses other neuron-like proteins, like neuropeptide processing enzymes, transcription regulators Sox, Pax, Jun, and Fos, a neural-specific RNA polyadenylation enzyme, a suite of calcium sensitive channels and signaling components, and many components of the presynaptic scaffold, which organizes the secretion of neuropeptides and other transmitters in neurons, and in placozoa presumably organizes its secretion of its quasi-neuropeptides. So of the six cell types, the peptidergic cell appears to be specialized for signaling, is present in low abundance, and expresses a bunch of proteins that in other lineages became far more elaborated into the neural system. Peptidergic cells do not make synapses or extended cell processes, for example. What they do is to offer this millimeter-sized organism a primitive signaling and response capacity that, in response to environmental cues, prompts it to alter its shape and movement by distributing neuropeptides to nearby effector cells that do the gliding and eating that the peptidergic cells can't do.

A schematic of neural-like proteins expressed in placozoa, characteristic of more advanced presynaptic secretory neural systems. These involve both secretion of neuropeptides (bottom left and middle), the expression of key ion channels used for cell activation (Ca++ channels), and the expression of cell-cell adhesion and signaling molecules (top right).

Why peptides? The workhorses of our brain synapses are simpler chemicals like serotonin, glutamate, and norepinephrine. Yet the chemical palette of such simple compounds is limited, and each one requires its own enzymatic machinery for synthesis. Neuropeptides, in contrast, are typically generated by cleavage of larger proteins encoded from the genome. Thus the same mechanism (translation and cleavage) can generate a virtually infinite variety of short and medium sized peptide sequences, each of which can have its own meaning, and have a GPCR or other receptor tailored to detecting it. The scope of experimentation is much greater, given normal mutation and duplication events through evolutionary time, and the synthetic pipeline much easier to manage. Our nervous systems use a wide variety of neuropeptides, as noted above, and our immune system uses an even larger palette of cytokines and chemokines, upwards of a hundred, each of which has particular regulatory meanings.
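The "translate, then cleave" logic can be sketched in a few lines. This is a deliberate simplification: it assumes cleavage happens at pairs of basic residues (K or R), a common pattern in prohormone processing, and simply discards those residues. The precursor sequence is made up for illustration, and the function names are hypothetical.

```python
import re

# Assumed simplification: cleavage at dibasic sites (pairs of K and/or R).
DIBASIC_SITE = re.compile(r"KR|RR|KK|RK")

def cleave_precursor(precursor):
    """Split a precursor protein at dibasic sites, dropping the basic residues."""
    return [fragment for fragment in DIBASIC_SITE.split(precursor) if fragment]

def possible_peptides(length, alphabet_size=20):
    """Number of distinct peptide sequences of a given length (20 amino acids)."""
    return alphabet_size ** length

# Hypothetical precursor: three peptides separated by two dibasic sites.
precursor = "GFGGFM" + "KR" + "YGGFLE" + "RR" + "AAPLWQ"
peptides = cleave_precursor(precursor)
print(peptides)

# Sequence space grows very fast with length.
for n in (5, 10, 20):
    print(n, possible_peptides(n))
```

One gene and one cleavage machinery yield several distinct signals here, and changing the message takes only a mutation in the precursor, not a new biosynthetic enzyme.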


An evolutionary scheme describing the neural and proto-neural systems observed among primitive animals.


The placozoan relic lineages show that nervous systems arose in gradual fashion from already-complex systems of cell-cell signaling that focused on chemical rather than electrical signaling. But very quickly, with the advent of only slightly larger and more complex body plans, like those of hydra or jellies, the need for speed forced an additional mode of signaling- the propagation of electrical activity within cells (the proto-neurons), and their physical extension to capitalize on that new mode of rapid conduction. But never did nervous systems leave behind their chemical roots, as the neurons in our brains still laboriously conduct signals from one neuron to the next via the chemical synapse, secreting a packet of chemicals from one side, and receiving that signal across the gap on the other side.


  • The mechanics of bombing a population back into the stone age.
  • The Saudis and 9/11.
  • Love above all.
  • The lower courts are starting to revolt.
  • Brain worms, Fox news, and delusion.
  • Notes on the origins of MMT, as a (somewhat tedious) film about it comes out.

Saturday, May 18, 2024

Emergency- Call UCP!

Uncoupling proteins in mitochondria provide a paradoxical safety valve.

One of the great insights of biochemistry in the last century was the chemiosmotic theory, which finally described the nature of power flows in the mitochondrion. Everyone knew that energetic electrons were spun off the metabolism (burning) of food via the electron transport chain, ending up re-united with oxygen (creating water, while the CO2 we breathe out comes from the upstream breakdown of the food's carbon skeletons). But how was that power transmitted to ATP? The key turned out to be a battery-like state across the mitochondrial membrane, where protons are pumped out by the electron transport chain, and then come back in while turning the motor of the ATP synthase to phosphorylate ADP into ATP. It is the (proton) concentration and charge difference (that is, the chemiosmotic gradient) across the inner mitochondrial membrane that stores and transmits this power- a clever and flexible system for energizing the mitochondrion and, indirectly, the rest of the cell.

Schematic view of the electron transport chain proteins, as well as the consumer of its energy, the ATP synthase. The inside of the mitochondrial matrix is at top, where core metabolism takes place to generate electrons, resulting in protons pumped out towards the bottom. Protons return through the ATP synthase (right) to power the phosphorylation (so-called oxidative phosphorylation) of ADP to ATP.
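The "concentration and charge difference" can be combined into a single number, usually called the proton-motive force. Below is a minimal sketch of that bookkeeping. The membrane potential and pH difference passed in are assumed, textbook-style ballpark values rather than figures from any paper discussed here, and the function names are hypothetical.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol

def proton_motive_force_mv(delta_psi_mv, delta_ph, temp_k=310.15):
    """Proton-motive force magnitude in millivolts.

    delta_psi_mv: magnitude of the membrane potential (matrix negative), in mV.
    delta_ph:     pH(matrix) - pH(intermembrane space), positive when the
                  matrix is more alkaline.
    """
    mv_per_ph_unit = 1000.0 * math.log(10) * R * temp_k / F
    return delta_psi_mv + mv_per_ph_unit * delta_ph

def energy_per_mol_protons_kj(pmf_mv):
    """Free energy (kJ) released when one mole of protons flows back in."""
    return F * (pmf_mv / 1000.0) / 1000.0

# Assumed ballpark values, for illustration only.
pmf = proton_motive_force_mv(delta_psi_mv=150.0, delta_ph=0.75)
print(f"proton-motive force: {pmf:.0f} mV")
print(f"energy per mole of protons: {energy_per_mol_protons_kj(pmf):.1f} kJ")
```

In this picture, most of the stored power sits in the charge term, with the pH term adding a smaller share. Anything that lets protons back in without passing through the ATP synthase lowers both terms together.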

Chemiosmotic theory taught us that mitochondria are always charged up, keeping a balance of metabolism and ATP production going, all dependent on the tightness of the inner mitochondrial membrane, which is the "plate" that keeps the protons and other ions sealed apart. But over the years, leaks kept cropping up. In the human genome, there are at least six uncoupling proteins, or UCPs, which let protons through this membrane, on purpose. What is the deal with that?

One use of these proteins is easy enough to understand- the generation of heat in brown fat. Brown fat is brown because it has a lot of mitochondria, which are brown because of the many metal- and iron-hosting enzymes that operate at the core of metabolism. UCP1 is present in brown fat to generate heat by letting the engine run free, as it were. It is as simple as that. But most of the time, inefficiency is not really the point. The other UCP proteins have very different roles. On the whole, however, it is estimated that proton leaks from all sources eat up about a fourth of our metabolic energy, and thus evidently play a role in making us warm blooded, even apart from specialized brown fat.

A more general schematic that adds UCP proteins to the view above. Leaks also happen through other channels, such as the membrane itself, and also the ANT protein, at low and non-regulated rates.

One big problem of mitochondria is that they are doing some quite dangerous chemistry. The electrons liberated from metabolism of food have a lot of energy, and the electron transport chain is really more like a high voltage power station. The proteins in this chain are all structured to squeeze all the power they can out of the electrons and into the proton gradient. But that runs the risk of squeezing too hard. If there is a holdup anywhere, things can back up and electrons leak out. If that happens, they are likely to combine with oxygen in an uncontrolled way that generates compounds like peroxide, superoxide, and hydroxyl radicals. These are highly reactive (customarily termed ROS, for reactive oxygen species) and can do a great deal of damage in the cell. ROS is used in some signaling systems, such as the pathway by which glucose stimulates insulin secretion in the pancreas, but generally, ROS is very bad for the cell and rises exponentially with the severity of blockages in the electron transport chain. Many theories relating to aging and how to address it revolve around the ongoing damage from ROS.

Thus the more important role for the other UCP proteins is to function as a safety valve for overall power flow through mitochondrial metabolism- a metaphorical steam valve. UCP proteins are known to be inducible by ROS, and when activated, allow protons to run back into the matrix, which relieves the pressure upstream on all the electron transport chain proteins, which are furiously pumping out protons in response to the overall metabolic rate of fat/sugar usage. While metabolism is regulated at innumerable points, it is evident that, on a moment-to-moment basis, an extra level of regulation, i.e. relief, is needed at this UCP level to keep the system humming with minimal chemical damage to the rest of the cell.


Sunday, March 31, 2024

Nominee for Most Amazing Protein: RAD51

On the repair and resurrection of DNA, which gets a lot of help from a family of proteins including RAD51, DMC1, and RecA.

Proteins do all sorts of amazing things, from composing pores that can select a single kind of ion- even just a proton- to allow across a membrane, to massive polymerizing enzymes that synthesize other proteins, DNA, and RNA. There is really no end to it. But one of the most amazing, even incredible, things that happens in a cell is the hunt for DNA homology. Even over a genome of billions of base pairs, it is possible for one DNA segment to find the single other DNA segment that matches it. This hunt is executed for several reasons. One is to line up the homologous chromosomes at meiosis, and carry out the genetic cross-overs between them (when they are lined up precisely) that help scramble our genetic lineages for optimal mix-and-matching during reproduction. Another is for DNA repair, which is best done with a good copy for reference, especially when a full double-strand break has happened. Just this week, a fascinating article showed that memories in our brains depend in some weird way on DNA breaks occurring in neurons, some of which then use the homologous repair process, including homology search, to patch things up.

The protein that facilitates this DNA homology search is deeply conserved in evolution. It is called RecA in bacteria, radA and radB in archaea, and the RAD51 family in eukaryotes. Naturally, the eukaryotic family is most closely related to the archaeal versions (RAD51 and DMC1 evolving from radA, and a series of other, and poorly understood family members, from radB). In this post, I will mostly just call them all RAD51, unless I am referring to DMC1 specifically. The name comes from genetic screens for radiation-sensitive mutants in human and other eukaryotes, since RAD51 plays a crucial role in DNA repair, as noted above. RAD51 is not a huge protein, but it is an ATPase. It binds to itself, forming linear filaments with ATP at the junction points between units. It binds to a single strand of DNA, which is going to be what does the hunting. And it binds, in a complicated way, to another double-stranded DNA, which it helps to open briefly to allow its quality as a target to be evaluated. 

This diagram describes the repair of double strand breaks (DSB) in DNA. First the ends are covered with a bunch of proteins that signal far and wide that something terrible has happened- the cell cycle has to stop... fire engines need to be called. One of these proteins is RPA, which simply binds all over single-stranded DNA and protects it. Then the RAD51 protein comes in, displaces RPA, and begins the homology search process. The second DNA shown, in dark black, doesn't just happen, but is hunted for high and low throughout the nucleus to find the exact homolog of the broken end. When that exact match is found, the repair process can proceed, with continued DNA synthesis through the lesion, and resolution of the newly repaired double strands, either to copy up the homolog version, or exchange versions (GC, for gene conversion). 

This diagram shows how the notorious (when mutated) tumor suppressor gene BRCA2 (in green) works. It binds RAD51 (in blue) and brings it, chain-gang style, to the breakpoints of DNA damage to speed up and specify repair.


There have been several structural studies by this point that clarify how RAD51 does its thing. ATP is simply required to form filaments on single-stranded DNA. When a match has been found and RAD51 is no longer needed, ATP is cleaved, and RAD51 falls off, back to reserve status. The magic starts with how RAD51 binds the single stranded DNA. One RAD51 binds for every ~3 bases in the DNA, and it binds the phosphate backbone, so that the bases are nicely exposed in front, and all stretched out, ready to hunt for matching DNA.

A series of RAD51 molecules (in this case, RecA from bacteria) bound sequentially to single-stranded DNA (red). Note the ATP homolog chemicals in yellow, positioned between each protein unit. One can see that the DNA is stretched out a bit and the bases point outwards.

A closeup view of one of the RAD51 units from above, showing how the bases of the DNA (yellow) are splayed out into the medium, ready to find their partners. They are arranged in orientations similar to how they sit in normal (B-form) DNA, further enhancing their ability to find partners.

The second, and more mysterious, part of the operation is how RAD51 scans double-stranded DNA throughout the genome. It has binding sites for double-stranded DNA, away from the single-stranded DNA, and then it also has a little finger that splits open the double-stranded DNA, encouraging separation and allowing one strand to face up to the single stranded DNA that is held firmly by the RAD51 polymer. The transient search happens in eight-base increments, with tighter capture of the double-strand DNA happening when nine bases are matched, and commitment to recombination or repair happening when a match of fifteen bases is found.

These structures show an intermediate where a double-stranded DNA (ends in teal and lavender, and separated DNA segments in green and red) has been captured, making a twelve base match with the stable single-stranded DNA (brown). Note how the double-stranded DNA ends are held by outside portions of the RAD51 protein. Closeup on the right shows the dangling, non-paired DNA strand in red, and the newly matched duplex DNA with green-brown colored base interactions.
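The eight, nine, and fifteen base thresholds can be turned into a toy model to make the logic concrete. This is only a string-matching cartoon under loud assumptions: the sequences are invented, the probe is compared against the duplex strand that has the same sequence (standing in for pairing with the complementary strand), and the real process is a stochastic, three-dimensional protein-DNA search, not a left-to-right scan. All names are hypothetical.

```python
def match_length(probe, target, start):
    """Count consecutive identical bases between probe and target[start:]."""
    n = 0
    while (n < len(probe) and start + n < len(target)
           and probe[n] == target[start + n]):
        n += 1
    return n

def classify(n, sample=8, capture=9, commit=15):
    """Map a match length onto the stages described in the text."""
    if n >= commit:
        return "committed"   # fifteen or more bases: recombination or repair
    if n >= capture:
        return "captured"    # nine or more bases: tighter hold on the duplex
    if n >= sample:
        return "sampled"     # eight bases: transient test only
    return "rejected"

def homology_search(probe, target, sample=8):
    """Scan the target and report every site that passes the first test."""
    hits = []
    for start in range(len(target) - sample + 1):
        n = match_length(probe, target, start)
        state = classify(n, sample=sample)
        if state != "rejected":
            hits.append((start, n, state))
    return hits

# Invented sequences: a decoy sharing only nine bases, then the true homolog.
probe = "ACGTTGCAAGGCTTACGGAT"
target = "TTTT" + "ACGTTGCAAT" + "CCCC" + probe + "TTTT"
print(homology_search(probe, target))
```

The decoy site gets captured but never reaches commitment, while the true homolog does. Short accidental matches are common in a big genome, so it makes sense that they cost only a brief pause before the filament lets go and moves on.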

These structures can only give a hint of what is going on, since the whole process relies so clearly on the Brownian motion that allows super-rapid diffusion of the stabilized single-strand DNA+RAD51 over the genome, which it scans efficiently in one-dimensional fashion, despite all the chromatin and other proteins parked all over the place. And while the structures provide insight into how the process happens, it remains incredible that this search can happen, on what is clearly a quite reliable basis, day in and day out, as our genomes get hit by whatever the environment throws at us.

"Unfortunately, most RAD51 and RAD51 paralog point mutations that have been clinically identified are classified as variants of unknown significance (VUSs). Future studies to reclassify these RAD51 gene family VUSs as pathogenic or benign are desperately needed, as many of these genes are now included on hereditary breast and ovarian cancer screening panels. Reclassification of HR-deficient VUSs would enable these patients to benefit from therapies that specifically target HR deficiency, as do poly(ADP)-ribose polymerase (PARP) inhibitors in BRCA1/2-deficient cells."

Lastly, one paper made the point that clinicians need better understanding of the various mutations that can affect RAD51 itself. Genetic testing now is able to find all of our mutations, but we don't always know what each mutation is capable of doing. Thus deeper studies of RAD51 will have beneficial effects on clinical diagnosis, when particular mutations can be assigned as disease-causing, thus justifying specific therapies that would otherwise not be attempted.


Saturday, March 2, 2024

Ions: A Family Saga

The human genome encodes hundreds of proteins that ferry ions across membranes. How did they get here? How do they work?

As macroscopic beings, we generally think we are composed of tissues like bones, skin, hair, organs. But this modest apparent complexity sits atop a much greater and deeper molecular diversity- of molecules encoded from our genes, and of the chemistry of life. Management of cellular biochemistry requires strict and dynamic control of all its constituents- the many ions and myriad organic molecules that we rely on for energy, defense, and growth. One avenue is careful control across the cellular membrane, setting up persistent differences between inside and outside that define the living cell- one may say life itself. Typical cells have higher levels of potassium inside, and higher levels of sodium and chloride outside, for example. Calcium, for another example, is used commonly for signaling, and is kept at low concentrations in the cytoplasm, while being concentrated in some organelles (such as the sarcoplasmic reticulum in muscle cells) and outside. 

All this is done by a fleet of ion channels, pumps, and other solute carriers, encoded in the genome. We have genes for about 1,555 molecule transporters. Out of a genome of about 20,000 genes, this represents a huge concentration(!) of resources. One family alone, the solute carrier (SLC) family, has 440 members. Many of these are passive channels, which just let their selected cargo through. But many are also co-transporters, which harness the transport of one ion with that of another which may have an actively pumped gradient across the membrane and thus provide an indirect energy source for transfer of the first ion. The SLC family includes channels for glucose, amino acids, neurotransmitters, chloride, cotransport (or anti-transport) of sodium with glucose, calcium, neurotransmitters, hydrogen, and phosphate. Also, metals like zinc, iron, copper, magnesium, molybdate, nucleotides, steroids, drugs/toxins, cholesterol, bile, folate, fatty acids, peptides, sulfate, carbonate, and many others. 

It is clear that these proteins did not just appear out of nowhere. The "intelligent" design people recognize that much, that complex structures, which these are, must have some antecedent process of origination- some explanation, in short. Biologists call the SLC proteins a family because they share clear sequence similarity, which derives, by evolutionary theory, and by the observed diversification of genes and the organisms encoding them over time, from duplication and diversification. This, sadly, is where the "intelligent" design proponents part ways in logic, maintaining perhaps the most pathetic (and pedantic) bit of hooey ever devised by the dogmatic believer: "specified information", which apparently forbids the replication of information.

However, information replicates all the time, thanks to copious inputs of energy from the sun, and the advent of life, which can transform energy into profusions of reproduced/replicated organisms, including replication of all their constituent parts. For our purposes, one side effect of all this replication is error, which can cause unintended replication/duplication of individual genes, which can then diverge in function to provide the species with new vistas of, in this case, ionic regulation. In yeast cells, there are maybe a hundred SLC genes, and fewer in bacteria. So it is apparent that the road to where we are has been a very long one, taking billions of years. Gene duplication is a rare event, and each new birth a painful, experimental project. But a family with so many members shows the fecundity of life, and the critical (that is, naturally selected) importance of these transporters in their diverse roles throughout the body.

A few of the relatives in the SLC26A family, given in one-letter protein sequence from small sections of the much larger protein, around the core ion binding site. You can see that they are, in this alignment, very similar, clearly being in the same family. You can also see that SLC26A9 has "V" in a position in alpha helix 10, which in all other members is a quite basic amino acid like lysine ("K") or arginine ("R"). The authors argue that this difference is one key to the functional differences between it and SLC26A6.

A recent paper showed structures for two SLC family members, which each transport chloride ion, but differ in that one exchanges chloride for bicarbonate, while the other allows chloride through without a matched exchange (though see here). SLC26A9 is expressed in the gut and lung, and apparently helps manage fluid levels by allowing chloride permeability. It is of interest to those with cystic fibrosis, because the gene responsible for that disorder, CFTR, is another transporter, (of the ABC family), and plays a major role doing a similar thing in the same places- exchanging chloride and bicarbonate, which helps manage the pH and fluidity of our mucus in the lung and other organs. SLC26A9, having a related role and location, might be able to fill some of the gap if drugs could be found to increase its expression or activity.

SLC26A6 is expressed in the kidney, pancreas, and gut, and in addition to exchanging bicarbonate for chloride, can also exchange oxalate, which prevents kidney stones. Very little, really, is known about how all these ion transporters are expressed and regulated, what differentiates them, how they relate to each other, and what prompted their divergence through evolution. We are really just in the identification and gross characterization stage. The new paper focuses on the structural mechanisms that differentiate these two particular SLC family members.

Structure of two SLC transporters, each dimeric, and superimposed. The upper parts are set in the membrane, with the lower parts in the cytoplasm. The upper parts combine two domains for each monomer, the "core" and "gate" domains. The channel for the anion threads within the center of each upper part, between these two domains. Note how structurally similar the two family members are, one in green+gray, the other in red+blue.


Schemes of how SLC26A6 works. The gate domain (purple) is stable, while the core domain (green) rocks to provide access from the ion binding site to either outside or inside the cell.

Like any proper ion channel, SLC26A6 sits in the membrane and provides a place for its ion to transiently bind (for careful selection of the right kind of ion) and then to go through. There is a central binding site that is lined specially with a few semi-positively charged amino acids like asparagine (N), glutamine (Q) and arginine (R), which provide an attractive electronic environment for anions like Cl-. The authors describe a probable mechanism of action (above), whereby the core domain rocks up and down to allow the ion to pass through, after being very sensitively bound and verified. This rocking is not driven by ATP or other outside power, but just by Brownian motion, as gated by the ion binding and unbinding steps.

Drilling a little closer into the target ion binding site of SLC26A6. On right is shown Cl- in green, center, with a few of the amino acids that coordinate its specific, but transient, binding in the core domain pocket. 


They draw contrasts between these very closely related channels, in that the binding pocket is deeper and narrower in SLC26A9, allowing the smaller Cl- to bind while not allowing HCO3- to bind as well. There are also numerous differences in the structure of the core protein around the channel that they argue allow coupling of HCO3- transport (to Cl- transport in the other direction) in SLC26A6, while SLC26A9 is uncoupled. One presumes that the form of the ion site can be subtly altered at each end of the rocking motion, so that the preferred ion is bound at each end of the cycle.

While all this work is splitting fine hairs, these are hairs presented to us by evolution. It is evolution that duplicated the precursors to these genes, then retained them while each, over time, developed its fine-tuned differences, including different activities and distinct tissue expression. Indeed, the fully competent, bicarbonate exchanging, SLC26A6 is far more widely expressed, suggesting that SLC26A9 has a more specialized role in the body. To reiterate a point made many times before- having the whole human genome sequenced, or even having atomic structures of all of its encoded proteins, is merely the beginning to understanding what these molecular machines do, and how our bodies really work.


  • A cult.
  • The deep roots of fascism in the American Right.
  • We are at a horrifying inflection point in foreign policy.
  • Instead of subsidizing oil and gas, the industry should be charged for damages.
  • Are we ready for first contact?

Saturday, December 30, 2023

Some Challenges of Biological Modeling

If modeling one small aspect of one cell is this difficult, how much more difficult is it to model whole cells and organisms?

While the biological literature is full of data / knowledge about how cells and organisms work, we remain far from true understanding- the kind of understanding that would allow computer modeling of their processes. This is both a problem of the kind of data, which is largely qualitative and descriptive, and also of amount- that countless processes and enzymes have never had their detailed characteristics evaluated. In the human genome, I would estimate that roughly half its genes have only been described (if at all) in the most rudimentary way, typically by loose analogy to similar ones. And the rest, when studied more closely, present all sorts of other interesting issues that deflect researchers from core data like their enzymatic rate constants and binding constants to other proteins, as might occur under a plethora of different modification, expression, and other regulatory conditions. 

Then how do we get to usable models of cellular activities? Typically, a lot of guessing is involved, to make anything that approaches a computer model. A recent paper offered a novel way to go down this path, which was to ignore all the rate constants and even interactions, and just focus on the measurements we can make more conveniently- whole metabolome assessments. These are experiments where mass spectrometry is used to evaluate the level of all the smaller chemicals in a cell. If such levels are known, perhaps at a few different conditions, then, these authors argue, we can derive models of their mutual regulation- disregarding all the details and just establishing that some sort of feedback system among these metabolic chemicals must exist to keep them at the observed concentrations.

Their experimental subject is a relatively understandable, but by no means simple, system- the management of iron concentrations in yeast cells. Iron is quite toxic, so keeping it at controlled concentrations and in various carefully-constructed complexes is important for any cell. It is used to make heme, which functions not only in hemoglobin, but in several core respiratory enzymes of mitochondria. It also gets placed into iron-sulfur clusters, which are used even more widely, in respiratory enzymes, in the DNA replication, transcription, protein synthesis, and iron assimilation machineries. It is iron's strong and flexible redox chemistry (and its ancient abundance in the rocks and fluids life evolved with) that make it essential as well as dangerous.

Authors' model for iron use and regulation in yeast cells. Outside is on left, cytoplasm is blue, vacuole is green, and mitochondrion is yellow. See text below for abbreviations and description. O2 stands for the oxygen molecule. The various rate constants R refer to the transition between each state or location.

Iron is imported from outside and forms a pool of free iron in the cytoplasm (FC, in the diagram above). From there, it can be stored into membrane-bound vacuoles (F2, F3), or imported to the mitochondria (FM), where it is incorporated into iron-sulfur clusters and heme (FS). Some of the mitochondrially assembled iron-sulfur clusters are exported back out to the cytoplasm to be integrated into a variety of proteins there (CIA). This is indeed one of the most essential roles of mitochondria- needed even if metabolic respiration is for some reason not needed (in hypoxic or anaerobic conditions). If there is a dramatic overload of iron, it can build up as rust particles in the mitochondria (MP). And finally, the iron-sulfur complexes contribute to respiration of oxygen in mitochondria, and thus influence the respiration rate of the whole cell.

The task these authors set themselves was to derive a regulatory scheme using only the elements shown above, in combination with known levels of all the metabolites, under the conditions of 1) normal levels of iron, 2) low iron, and 3) a mutant condition- a defect in the yeast gene YFH1, which binds iron inside mitochondria and participates in iron-sulfur cluster assembly. A slew of differential equations later, and selection through millions of possible regulatory circuits, and they come up with the one shown above, where the red lines/arrows indicate positive regulation, and the red lines ending with bars indicate repression. The latter is typically feedback repression, such as of the import of iron, repressed by the amount already in the cell, in the FC pool.
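To make the idea of feedback repression concrete, here is a minimal sketch in Python of a single pool (standing in for FC) whose import is shut down by a smooth, switch-like function of its own level. Everything in it is an assumption for illustration: the function form, the rate values, and the names are hypothetical, and it is not the authors' model, which has many more pools and was selected from millions of candidate circuits.

```python
def soft_switch_repression(level, threshold, steepness=4.0):
    """Smooth 'off switch': close to 1 when level << threshold,
    close to 0 when level >> threshold (a hypothetical soft step)."""
    return 1.0 / (1.0 + (level / threshold) ** steepness)


def simulate_cytosolic_iron(external_iron, steps=20000, dt=0.01):
    """Euler integration of d(FC)/dt = import * repression(FC) - use * FC."""
    max_import = 1.0   # hypothetical import rate per unit of external iron
    use_rate = 0.5     # hypothetical first-order consumption/storage rate
    threshold = 1.0    # hypothetical FC level that half-represses import
    fc = 0.0
    for _ in range(steps):
        influx = max_import * external_iron * soft_switch_repression(fc, threshold)
        fc += dt * (influx - use_rate * fc)
    return fc


low = simulate_cytosolic_iron(external_iron=1.0)
high = simulate_cytosolic_iron(external_iron=10.0)
print(f"FC at 1x external iron:  {low:.2f}")
print(f"FC at 10x external iron: {high:.2f}")
```

With these made-up numbers, a tenfold rise in external iron raises the internal pool by less than twofold, which is the kind of buffering that feedback repression of import is meant to provide.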

They show that this model provides accurate control of iron levels at all the various points, with stable behavior, no singularities or wobbling, and the expected responses to the various conditions. In low iron, the vacuole is emptied of iron, and in the mutant case, iron nanoparticles (MP) accumulate in the mitochondrion, due in part to excess amounts of oxygen admitted to the mitochondrial matrix, which in turn is due to defects in metabolic respiration caused by a lack of iron-sulfur clusters. What seemed so simple at the outset does have quite a few wrinkles!

The authors present their best regulatory scheme, selected from among millions, which provides accurate metabolite control in simulation, as illustrated here by key transitions between conditions, one line per molecular species. See text and image above for abbreviations.


But note that none of this is actually biological. There are no transcription regulators, such as the AFT1/2 proteins known to regulate a large set of iron assimilation genes. There are no enzymes explicitly cited, and no other regulatory mechanisms like protein modifications, protein disposal, etc. Nor does the cytosolic level of iron actually regulate the import machinery- that is done by the level of iron-sulfur clusters in the mitochondria, as sensed by the AFT regulators, among other mechanisms.

Thus it is not at all clear what work like this has to offer. It takes the known concentrations of metabolites (which can be ascertained in bulk) to create a toy system that accurately reproduces a very restricted set of variations, limited to what the researchers could assess elsewhere, in lab experiments. It does not inform the biology of what is going on, since it is not based on the biology, and clearly even contravenes it. It does not inform diseases associated with iron metabolism- in this case Friedreich's ataxia, which is caused in humans by a gene related to YFH1- because again it is not biologically based. Knowing where some regulatory events might occur in theory, as one could have done almost as well (if not quantitatively!) on a cocktail napkin, is of little help when drugs need to be made against actual enzymes and actual regulators. It is a classic case of looking under the streetlight- working with the data one has, rather than the data one needs to do something useful.

"Like most ODE (ordinary differential equation)-based biochemical models, sufficient kinetic information was unavailable to solve the system rigorously and uniquely, whereas substantial concentration data were available. Relying on concentrations of cellular components increasingly makes sense because such quantitative concentration determinations are becoming increasingly available due to mass-spectrometry-based proteomic and metabolomics studies. In contrast, determining kinetic parameters experimentally for individual biochemical reactions remain an arduous task." ...

"The actual biochemical mechanisms by which gene expression levels are controlled were either too complicated to be employed in autoregulation, or they were unknown. Thus, we decided to augment every regulatable reaction using soft Heaviside functions as surrogate regulatory systems." ...

"We caution that applying the same strategy for selecting viable autoregulatory mechanisms will become increasing difficult computationally as the complexity of models increases."


But the larger point that motivated a review of this paper is the challenge of modeling a system so small as to be almost infinitesimal in the larger scheme of biology. If dedicated modelers, as this laboratory is, despair of getting the data they need for even such a modest system, (indeed, the mitochondrial iron and sulfur-containing signaling compound that mediates repression of the AFT regulators is still referred to in the literature as "X-S"), then things are bleak indeed for the prospect of modeling higher levels of biology, such as whole cells. Unknowns are unfortunately gaping all over the place. As has been mentioned a few times, molecular biologists tend to think in cartoons, simplifying the relations they deal with to the bare minimum. Getting beyond that is going to take another few quantum leaps in data- the vaunted "omics" revolutions. It will also take better interpolation methods (dare one invoke AI?) that use all the available scraps of biology, not just mathematics, in a Bayesian ratchet that provides iteratively better models.


Saturday, December 16, 2023

Easy Does it

The eukaryotic ribosome is significantly slower than, and more accurate than, the bacterial ribosome.

Despite the focus, in molecular biology, on interesting molecules like genes and regulators, the most striking thing facing anyone who breaks open cells is the prevalence of ribosomes. Run the cellular proteins or RNAs out on a gel, and the bulk of the material is always ribosomal proteins and ribosomal RNAs, along with tRNAs. That is because ribosomes are critically important, immense in size, and quite slow. They are sort of the beating heart of the cell- not the brains, not the energy source, but the big lumpy, ancient, shape-shifting object that pumps out another essential form of life-blood- all the proteins the cell needs to keep going.

With the revolution in structural biology, we have gotten an increasingly clear view of the ribosome, and a recent paper took it up another notch with a structural analysis of how tRNA handling works and how / why it is that the eukaryotic ribosome is about ten times slower than its bacterial progenitor. One of their figures provides a beautiful (if partial) view of each kind of ribosome, showing how well-conserved this structure is, despite the roughly three billion or more years that have elapsed since their divergence into the bacterial and archaeal lineages, from which the eukaryotic ribosome comes. 

Above, the human ribosome, and below, the ribosome of E. coli, a bacterium, in partial views. The perspective is from the back, relative to conventional views, and only a small amount of the large subunit (LSU) appears at the top of each structure, with more of the small subunit (SSU) shown below. Between them is the cleft where tRNAs bind, in a dynamic sequence of incoming tRNA at the A (acceptor) site, then catalysis of peptide bond addition at the P (peptidyl transfer) site, and ejection of the last tRNA at the E (ejection) site. In concert with the conveyor belt of tRNAs going through, the nascent protein is being synthesized in the large subunit and the mRNA is going by, codon by codon, in the small subunit. Note the overall conservation of structure, despite quite a bit of difference in detail.

The ribosome is an RNA machine at its core, with a lot of accessory proteins that were added later on. And it comes in two parts, the large and small subunits. These subunits do different things, do a lot of rolling about relative to each other, and bind a conveyor belt of tRNAs between them. The tRNAs are pre-loaded with an amino acid on one end (top) and an anticodon on the other end (bottom). They also come with a helper protein (EF-Tu in bacteria, eEF1A in eukaryotes), which plays a role later on. The anticodon is a set of three nucleotides that embodies the genetic code, whereby this tRNA is always going to match one codon to a particular amino acid.

The ribosome doesn't care what the code is or which tRNA comes in. It only cares that the tRNA matches the mRNA held by the small subunit, as transcribed from the DNA. This process is called decoding, and the researchers show some of the differences that make it slower, but also more accurate, in eukaryotes. In bacteria, ribosomes can work at up to 20 amino acids per second, while human ribosomes top out at about 2 amino acids per second. That is pretty slow, for an enzyme! Its accuracy is about one error per thousand to ten thousand codons.
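A quick back-of-envelope calculation shows what these rates mean for a single protein. The sketch below uses only the figures quoted above (20 versus 2 amino acids per second, and one error per thousand to ten thousand codons); the 400-residue protein length is a hypothetical example, and the function names are made up for illustration.

```python
BACTERIAL_RATE = 20.0  # amino acids per second (upper figure quoted above)
HUMAN_RATE = 2.0       # amino acids per second (figure quoted above)


def synthesis_time_seconds(length_aa, rate_aa_per_s):
    """Time for one ribosome to elongate a protein of the given length."""
    return length_aa / rate_aa_per_s


def expected_errors(length_aa, errors_per_codon):
    """Average number of misincorporated residues per protein copy."""
    return length_aa * errors_per_codon


protein_length = 400  # hypothetical protein, in amino acids

print("bacterial ribosome:", synthesis_time_seconds(protein_length, BACTERIAL_RATE), "s")
print("human ribosome:    ", synthesis_time_seconds(protein_length, HUMAN_RATE), "s")
for rate in (1e-3, 1e-4):
    print(f"errors per copy at {rate:g} per codon:", expected_errors(protein_length, rate))
```

At these rates, the hypothetical protein takes 20 seconds in the bacterium and over three minutes in the human cell, which helps explain why cells need so many ribosomes.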

See text for description of this diagram of the ribosomal process. 50 S is the large ribosomal subunit in bacteria (60S in eukaryotes). 30S is the small subunit in bacteria (40S in eukaryotes). S stands for Svedberg units, a unit of sedimentation in high-speed centrifugation, which was used to study proteins at the dawn of molecular biology.

Above is diagrammed the stepwise logic of protein synthesis. The first step is that a tRNA comes in and lands on the empty A site, and tests whether its anticodon sequence fits the codon on the mRNA being threaded through the bottom. This fitting and testing is the key quality control process, and the slower and more selective it is, the more accurate the resulting translation. The EF-Tu/eEF1A+GTP protein holds on to the tRNA at the acceptor (A) position, and only when the fit is good does that fit communicate back up from the small subunit to the large subunit and cause hydrolysis of GTP to GDP, and release of the top of the tRNA, which allows it to swing into position (accommodation) to the catalytic site of the ribosome. This is where the tRNA contributes its amino acid to the growing protein chain. That chain, previously attached to the tRNA in the P site, now is attached to the tRNA in the A site. Now another GTP-binding protein comes in, EF-G (eEF2 in eukaryotes), which bumps the tRNA from the A site to the P site, and simultaneously moves the mRNA one codon ahead. This also releases whatever was in the E site of the ribosome and frees up the A site to accept another new tRNA.

See text for description. IC = initiation complex, CR = codon recognition complex, GA = GTPase activation complex, AC = accommodated complex. FRET = fluorescence resonance energy transfer. Head and shoulder refer to structural features of the small ribosomal subunit.

These researchers did both detailed structural studies of ribosomes stuck in various positions, and also mounted fluorescent labels at key sites in the P and A sites. These double labels allowed one to be flashed with light, (at its absorbance peak), and the energy to be transferred between them, resulting in fluorescence of light back out from the second fluorophore. The emitted energy from the second fluorophore provides an exquisitely sensitive measure of the distance between the two fluorophores, since the efficiency of energy transfer from the first fluorophore falls off with the sixth power of the distance. The graph above (right) provides a trace of the fluorescence seen in one ribosomal cycle, as the distance between the two tRNAs changes slightly as the reaction proceeds and the two tRNAs come closer together. This technical method allows real-time analysis of the reaction as it is going along, especially one as slow as this one.
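The distance sensitivity of FRET can be put in numbers. In the standard Förster model, transfer efficiency is E = 1 / (1 + (r/R0)^6), where r is the distance between the fluorophores and R0 is the distance at which half the energy is transferred. The sketch below assumes a hypothetical R0 of 5 nm, which is in the range typical of common dye pairs but is not taken from the paper.

```python
def fret_efficiency(distance_nm, forster_radius_nm=5.0):
    """Förster transfer efficiency; forster_radius_nm is an assumed value."""
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


# A change of a nanometer or two around R0 swings the signal dramatically.
for r in (4.0, 5.0, 6.0):
    print(f"r = {r:.1f} nm -> efficiency = {fret_efficiency(r):.2f}")
```

The steep sixth-power dependence is why a slight approach of the two tRNAs shows up as a clear step in the fluorescence trace.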

Structures of the ribosome accentuating the tRNA positions in the A, P, and E sites. Note how the green tRNA in the A site starts bent over towards the eEF1A GTPase (blue), as the decoding and quality control are going on, after which it is released and swings over next to the P site tRNA, ready for peptide bond formation. Note also how the structure of the anticodon-codon pairing (pink, bottom) evolves from loose and disordered to tight after the tRNA straightens up.

Above is shown a gross level view in stop-motion of ribosomal progress, achieved with various inhibitors and altered substrates. The mRNA is in pink (insets), and shows how the codon-anticodon match evolves from loose to tight. Note how at first only two bases of the mRNA are well-paired, while all three are paired later on. This reflects in a dim way the genetic code, which has redundancies in the third position for many amino acids, and is thought to have first had only two letters, before transitioning to three letters.

Higher detail on the structures of the tRNAs in the P site and the A site as they progress through the proof-reading phase of protein synthesis. The fluorescence probes are pictured (red and green dots), as is more of the mRNA strand (pink).

These researchers have a great deal to say about the details of these structures- what differentiates the human from the E. coli ribosome, why the human one is slower and allows more time and more hindrance during the proof-reading step, thereby helping badly matched tRNAs to escape and increasing overall fidelity. For example, how does the GTPase eEF1A, docked to the large subunit, know when a match down at the codon-anticodon pair has been successful down in the small ribosomal subunit?

"Base pairing between the mRNA codon and the aa-tRNA anticodon stem loop (ASL) is verified through a network of ribosomal RNA (rRNA) and protein interactions within the SSU A site known as the decoding centre. Recognition of cognate aa-tRNA closes the SSU shoulder domain towards the SSU body and head domains. Consequent ternary complex engagement of the LSU GTPase-activating centre (GAC), including the catalytic sarcin-ricin loop12 (SRL), induces rearrangements in the GTPase, including switch-I and switch-II remodeling, that trigger GTP hydrolysis"

They note that there seem to be at least two proofreading steps, both in activating the eEF1A and also afterwards, during the large swing of the tRNA towards the P site. And they note novel rolling motions of the human ribosome compared with the bacterial ribosome, to help explain some of its distinctive proofreading abilities, which may be adjustable in humans by regulatory processes. Thus we are gaining an ever more detailed window on the heart of this process, which is foundational to the origin of life, central to all cells, and not without medical implications, since many poisons that bacteria have devised attack the ribosome, and several of our current antibiotics do likewise.


Saturday, October 28, 2023

Melting Proteins Through a Wall

Peroxisomes use a trendy way to import their proteins.

As has been discussed many times in this space, membranes are formidable barriers ... at the molecular level. Having a plasma membrane, and organelles enclosed within membranes, means needing to get all sorts of things across them, from the tiniest proton to truly enormous mega-complexes like ribosomes. Almost eight percent of the proteins encoded by the human genome are transporters, that concern themselves with getting molecules from one place to another, typically across membranes. A critical type of molecule to get into organelles is the proteins that belong there, to do their day-in, day-out jobs. 

But proteins are large molecules. There are two ways to go about transporting them across membranes. One is to thread them across linearly, unfolding them in the process, and letting them refold once they are across. This is how proteins get into the endoplasmic reticulum, where the long road to secretion generally starts. Ribosomes dock right up to the endoplasmic reticulum membrane and pump their nascent proteins across as they are being synthesized. Easy peasy.

However, other organelles don't get this direct (i.e. cotranslational) method of protein import. They have to get already-made full-length proteins lugged across their membranes somehow. Mitochondria, for instance, are replete with hard-working proteins, virtually all of which are encoded in the nucleus and have to be brought in whole, usually through two separate membranes to get into the mitochondrial matrix. There are dedicated transporters, nicknamed the TOM/TIM complexes, that thread incoming proteins (which are detected by short "signal" sequences these proteins carry) through each membrane in turn, and sometimes use additional helpers to get the proteins plugged into the matrix membrane or other final destination. Still, this remains a protein threading process (of the first transport type), and due to its need to unfold and then later refold every incoming protein, it involves chaperones which specialize in helping those proteins fold correctly afterwards.

Schematic of the nuclear pore. The wavy bits are protein tails that are F-G rich (phenylalanine-glycine) that are unstructured and form a gel throughout the pore, allowing like-minded F-G proteins through, which are the nuclear transport receptors. These receptors carry various cargo proteins in and out of the nucleus, without having to unfold them. "Nup" is short for nuclear pore protein; GLFG is short for glycine, leucine (another hydrophobic amino acid), phenylalanine, glycine.

But there is another way to do it, which was discovered much more recently and is used principally by the nucleus. The nuclear pore had fascinated biologists for decades, but it was only in the early 2000's that this mechanism was revealed. And a recent paper found that peroxisomes also use this second method, which side-steps the need to thread incoming proteins through a pore, and risk all the problems of refolding. This method is to use a curiously constructed gel phase of (protein) matter that shares some properties with membranes, but has the additional property that specifically compatible proteins can melt right through it. 

The secret lies in repetitive regions of protein sequence that carry, in the case of the nuclear pore, lots of F-G sequences. That is, phenylalanine-glycine repeated regions of proteins that form these transit gel structures, or pores. The phenylalanine is hydrophobic, the glycine is flexible, and the protein backbone is polar, though not charged. This adds up to a region that is a totally disordered mess and forms a gel that can keep out most larger molecules, like a membrane. But if encountered by another F-G-rich protein, this gel lets it right through, like a pat of butter through oil. It also tends to let small molecules through quite easily. The nuclear pore is quite permeable to the many chemicals needed for DNA replication, RNA production, etc.

Summary from current paper, making the case that peroxisomes use PEX13 to make something similar to the nuclear pore, where targeted proteins can traverse easily piggybacked on carrier proteins, in this case PEX5. The yellow spaghetti is the F-G or Y-G protein tails that congregate in the pore to make up a novel (gel) phase of matter. This gel is uniquely permeable to proteins carrying the same F-G or Y-G on their outsides, as does PEX5. "NTR" is short for nuclear targeting receptor, to which nuclear-bound cargoes bind.

Peroxisomes are sites for specialty chemistry, handling some relatively dangerous oxidation reactions including production of some lipids. They combine this with protective enzymes like catalase that quickly degrade the resulting reactive oxidative products. This suggests that the peroxisomal membrane would need to be pretty tight, but the authors state that the gel-style mechanism used here allows anything under 2,000 Daltons through, which certainly includes most chemicals. Probably the solution is that enough protective enzymes, at a high local concentration, are present that the leakage rate of bad chemicals is relatively low.

Experimenters purify large amounts of the Y-G protein segments from PEX13 and form macroscopic gels out of them. In the center is a control, where the Y residues have been mutated to serine (S). N+YG refers to the N-terminus of the PEX13 protein plus the Y-G portion of the protein, while Y-G alone has only the Y-G segment of the PEX13 protein.

For its gel-containing pore, the peroxisome uses (on a protein called PEX13) tyrosine (Y) in place of phenylalanine, resulting in a disordered gel of Y-G repeats for its structure. Tyrosine is aromatic, (thus hydrophobic) like phenylalanine and tryptophan, and apparently provides enough distinctiveness that nucleus-bound proteins are not mistaken in their destination. The authors state that it provides a slightly denser packing, and by its composition should help prevent nuclear carriers from binding effectively. But it isn't just the Y-G composition that directs proteins, but a suite of other proteins around the peroxisomal and nuclear pores that, I would speculate, help attract their respective carrier proteins (called PEX5 in the case of peroxisomes) so that they know where to go. 

Evolutionary conservation of the Y-G regions of PEX13, over a wide range of species. The semi-regular periodicity of the Y placements suggests that this protein forms alpha helices with the Y chains exposed on one side, more or less, despite general lack of structure.

The authors show some very nice experiments, such as making visible gels from purified / large amounts of these proteins, and then showing that these gels indeed block generic proteins, and allow the same protein if fused to PEX5 to come right through. The result shown below is strikingly absolute- without its peroxisome-specific helper, the protein GFP makes no headway into this gel material at all. But with that helper, it can diffuse 100 microns in half an hour. It is like making jello that you can magically pass your hand through, without breaking it up ... but only if you are wearing the magic glove.

Experimental demonstration of transport. Using macroscopic gel plugs like those shown above, the diffusion of green fluorescent protein (GFP) was assayed from a liquid (buffer) into the gel. By itself (center, bottom), GFP makes no headway at all. But when fused to the PEX5 protein, either in part or in whole, it diffuses quite rapidly into the Y-G gel.
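The figure of 100 microns in half an hour can be turned into a rough effective diffusion coefficient. The sketch below assumes simple one-dimensional diffusion, where mean squared displacement is 2Dt, and treats the 100 microns as a typical penetration distance. Both are simplifying assumptions for a rough estimate, not values or methods from the paper.

```python
def effective_diffusion_coefficient(distance_um, time_s):
    """One-dimensional estimate from <x^2> = 2 * D * t, in square microns per second."""
    return distance_um ** 2 / (2.0 * time_s)


d_gel = effective_diffusion_coefficient(distance_um=100.0, time_s=30 * 60)
print(f"Effective D in the gel: about {d_gel:.1f} square microns per second")
```

This comes out to a few square microns per second, so the PEX5 fusion is far from frozen in place, even inside a gel that excludes GFP alone entirely.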