Showing posts with label chemistry. Show all posts

Saturday, March 2, 2024

Ions: A Family Saga

The human genome encodes hundreds of proteins that ferry ions across membranes. How did they get here? How do they work?

As macroscopic beings, we generally think we are composed of tissues like bones, skin, hair, organs. But this modest apparent complexity sits atop a much greater and deeper molecular diversity- of molecules encoded from our genes, and of the chemistry of life. Management of cellular biochemistry requires strict and dynamic control of all its constituents- the many ions and myriad organic molecules that we rely on for energy, defense, and growth. One avenue is careful control across the cellular membrane, setting up persistent differences between inside and outside that define the living cell- one may say life itself. Typical cells have higher levels of potassium inside, and higher levels of sodium and chloride outside, for example. Calcium, for another example, is used commonly for signaling, and is kept at low concentrations in the cytoplasm, while being concentrated in some organelles (such as the sarcoplasmic reticulum in muscle cells) and outside. 
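The sizes of these gradients can be put in familiar electrical terms. As a rough sketch (the concentrations below are textbook-typical mammalian values, assumed here only for illustration), the Nernst equation gives the membrane voltage at which each ion's gradient would sit at equilibrium:

```python
import math

def nernst_mV(z, c_out_mM, c_in_mM, temp_K=310.0):
    """Equilibrium (Nernst) potential in millivolts for an ion of charge z."""
    R, F = 8.314, 96485.0  # gas constant (J/mol/K), Faraday constant (C/mol)
    return 1000.0 * (R * temp_K) / (z * F) * math.log(c_out_mM / c_in_mM)

# Textbook-typical mammalian concentrations (mM); illustrative only
print(round(nernst_mV(+1, 5, 140)))    # K+  (5 out / 140 in):  → -89 mV
print(round(nernst_mV(+1, 145, 12)))   # Na+ (145 out / 12 in): → 67 mV
print(round(nernst_mV(-1, 110, 10)))   # Cl- (110 out / 10 in): → -64 mV
```

The tens of millivolts separating these equilibrium points are what channels and co-transporters trade on.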

All this is done by a fleet of ion channels, pumps, and other solute carriers, encoded in the genome. We have genes for about 1,555 molecular transporters. Out of a genome of about 20,000 genes, this represents a huge concentration(!) of resources. One family alone, the solute carrier (SLC) family, has 440 members. Many of these are passive channels, which just let their selected cargo through. But many are co-transporters, which couple the transport of one ion to that of another whose gradient across the membrane is actively maintained by pumping, and which thus serves as an indirect energy source for moving the first ion. The SLC family includes channels for glucose, amino acids, neurotransmitters, chloride, and phosphate, as well as cotransporters (or antiporters) that couple sodium to glucose, calcium, neurotransmitters, hydrogen, or phosphate. Other members handle metals like zinc, iron, copper, and magnesium, plus molybdate, nucleotides, steroids, drugs/toxins, cholesterol, bile, folate, fatty acids, peptides, sulfate, carbonate, and many others. 
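For scale, those counts work out to a sizable slice of the genome, roughly 8 percent of all genes, with the SLC family alone making up over a quarter of the transporter complement:

```python
# Simple arithmetic on the gene counts quoted above
transporters, genome, slc = 1555, 20000, 440
print(transporters / genome * 100)   # percent of all genes devoted to transport, ~7.8
print(slc / transporters * 100)      # SLC share of the transporter genes, ~28
```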

It is clear that these proteins did not just appear out of nowhere. The "intelligent" design people recognize that much, that complex structures, which these are, must have some antecedent process of origination- some explanation, in short. Biologists call the SLC proteins a family because they share clear sequence similarity, which derives, by evolutionary theory, and by the observed diversification of genes and the organisms encoding them over time, from duplication and diversification. This, sadly, is where the "intelligent" design proponents part ways in logic, maintaining perhaps the most pathetic (and pedantic) bit of hooey ever devised by the dogmatic believer: "specified information", which apparently forbids the replication of information.

However, information replicates all the time, thanks to copious inputs of energy from the sun, and the advent of life, which can transform energy into profusions of reproduced/replicated organisms, including replication of all their constituent parts. For our purposes, one side effect of all this replication is error, which can cause unintended replication/duplication of individual genes, which can then diverge in function to provide the species with new vistas of, in this case, ionic regulation. In yeast cells, there are maybe a hundred SLC genes, and fewer in bacteria. So it is apparent that the road to where we are has been a very long one, taking billions of years. Gene duplication is a rare event, and each new birth a painful, experimental project. But a family with so many members shows the fecundity of life, and the critical (that is, naturally selected) importance of these transporters in their diverse roles throughout the body.

A few of the relatives in the SLC26 family, shown in one-letter protein sequence from small sections of the much larger protein, around the core ion binding site. You can see that they are, in this alignment, very similar, clearly being in the same family. You can also see that SLC26A9 has "V" in a position in alpha helix 10, which in all other members is a quite basic amino acid like lysine ("K") or arginine ("R"). The authors argue that this difference is one key to the functional differences between it and SLC26A6.

A recent paper showed structures for two SLC family members, which each transport chloride ion, but differ in that one exchanges chloride for bicarbonate, while the other allows chloride through without a matched exchange (though see here). SLC26A9 is expressed in the gut and lung, and apparently helps manage fluid levels by allowing chloride permeability. It is of interest to those with cystic fibrosis, because the gene responsible for that disorder, CFTR, is another transporter, (of the ABC family), and plays a major role doing a similar thing in the same places- exchanging chloride and bicarbonate, which helps manage the pH and fluidity of our mucus in the lung and other organs. SLC26A9, having a related role and location, might be able to fill some of the gap if drugs could be found to increase its expression or activity.

SLC26A6 is expressed in the kidney, pancreas, and gut, and in addition to exchanging bicarbonate for chloride, can also exchange oxalate, which prevents kidney stones. Very little, really, is known about how all these ion transporters are expressed and regulated, what differentiates them, how they relate to each other, and what prompted their divergence through evolution. We are really just in the identification and gross characterization stage. The new paper focuses on the structural mechanisms that differentiate these two particular SLC family members.

Structure of two SLC transporters, each dimeric, and superimposed. The upper parts are set in the membrane, with the lower parts in the cytoplasm. The upper parts combine two domains for each monomer, the "core" and "gate" domains. The channel for the anion threads within the center of each upper part, between these two domains. Note how structurally similar the two family members are, one in green+gray, the other in red+blue.


Schemes of how SLC26A6 works. The gate domain (purple) is stable, while the core domain (green) rocks to provide access from the ion binding site to either outside or inside the cell.

Like any proper ion channel, SLC26A6 sits in the membrane and provides a place for its ion to transiently bind (for careful selection of the right kind of ion) and then to go through. There is a central binding site that is lined specially with a few polar and positively charged amino acids like asparagine (N), glutamine (Q), and arginine (R), which provide an attractive electronic environment for anions like Cl-. The authors describe a probable mechanism of action, (above), whereby the core domain rocks up and down to allow the ion to pass through, after being very sensitively bound and verified. This rocking is not driven by ATP or other outside power, but just by Brownian motion, as gated by the ion binding and unbinding steps.
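This alternating-access idea, with thermal rocking gated only by binding and unbinding, can be caricatured in a few lines of simulation. Everything below is invented for illustration (the rates are arbitrary, not measured ones); the point is that net flux emerges from the concentration difference alone, with no energy term anywhere:

```python
import random

# Minimal alternating-access sketch, with made-up rates. The carrier rocks
# between outward- and inward-facing states by thermal motion alone; net
# flux arises purely from the concentration difference across the membrane.
def simulate(c_out, c_in, steps=200_000, seed=1):
    random.seed(seed)
    side, bound = 'out', False
    net_in = 0  # ions delivered to the cytoplasm, minus those taken back up
    for _ in range(steps):
        r = random.random()
        if not bound:
            conc = c_out if side == 'out' else c_in
            if r < 0.1 * conc:                 # binding ~ local concentration
                bound = True
                if side == 'in':
                    net_in -= 1                # picked up from the cytoplasm
            elif r < 0.5:                      # unbiased rocking while empty
                side = 'in' if side == 'out' else 'out'
        else:
            if r < 0.2:                        # release on whichever side it faces
                bound = False
                if side == 'in':
                    net_in += 1                # released into the cytoplasm
            elif r < 0.5:                      # unbiased rocking while loaded
                side = 'in' if side == 'out' else 'out'
    return net_in

# More chloride outside than inside yields net inward movement
print(simulate(c_out=1.0, c_in=0.1) > 0)
```

With equal concentrations on both sides, the same carrier shuttles ions back and forth with no net transport, which is the signature of a passive, thermally driven mechanism.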

Drilling a little closer into the target ion binding site of SLC26A6. On the right is shown Cl- in green, center, with a few of the amino acids that coordinate its specific, but transient, binding in the core domain pocket. 


They draw contrasts between these very closely related channels, in that the binding pocket is deeper and narrower in SLC26A9, accommodating the smaller Cl- while excluding the larger HCO3-. There are also numerous differences in the structure of the core protein around the channel that they argue allow coupling of HCO3- transport (to Cl- transport in the other direction) in SLC26A6, while SLC26A9 is uncoupled. One presumes that the form of the ion site can be subtly altered at each end of the rocking motion, so that the preferred ion is bound at each end of the cycle.

While all this work is splitting fine hairs, these are hairs presented to us by evolution. It is evolution that duplicated the precursors to these genes, then retained them while each, over time, developed its fine-tuned differences, including different activities and distinct tissue expression. Indeed, the fully competent, bicarbonate exchanging, SLC26A6 is far more widely expressed, suggesting that SLC26A9 has a more specialized role in the body. To reiterate a point made many times before- having the whole human genome sequenced, or even having atomic structures of all of its encoded proteins, is merely the beginning to understanding what these molecular machines do, and how our bodies really work.


  • A cult.
  • The deep roots of fascism in the American Right.
  • We are at a horrifying inflection point in foreign policy.
  • Instead of subsidizing oil and gas, the industry should be charged for damages.
  • Are we ready for first contact?

Saturday, December 30, 2023

Some Challenges of Biological Modeling

If modeling one small aspect of one cell is this difficult, how much more difficult is it to model whole cells and organisms?

While the biological literature is full of data and knowledge about how cells and organisms work, we remain far from true understanding- the kind of understanding that would allow computer modeling of their processes. This is both a problem of the kind of data, which is largely qualitative and descriptive, and also of amount- countless processes and enzymes have never had their detailed characteristics evaluated. In the human genome, I would estimate that roughly half its genes have only been described (if at all) in the most rudimentary way, typically by loose analogy to similar ones. And the rest, when studied more closely, present all sorts of other interesting issues that deflect researchers from gathering core data like enzymatic rate constants and binding constants for partner proteins, as these might vary under a plethora of different modification, expression, and other regulatory conditions. 

Then how do we get to usable models of cellular activities? Typically, a lot of guessing is involved, to make anything that approaches a computer model. A recent paper offered a novel way to go down this path, which was to ignore all the rate constants and even interactions, and just focus on the measurements we can make more conveniently- whole metabolome assessments. These are experiments where mass spectrometry is used to evaluate the level of all the smaller chemicals in a cell. If such levels are known, perhaps at a few different conditions, then, these authors argue, we can derive models of their mutual regulation- disregarding all the details and just establishing that some sort of feedback system among these metabolic chemicals must exist to keep them at the observed concentrations.

Their experimental subject is a relatively understandable, but by no means simple, system- the management of iron concentrations in yeast cells. Iron is quite toxic, so keeping it at controlled concentrations and in various carefully-constructed complexes is important for any cell. It is used to make heme, which functions not only in hemoglobin, but in several core respiratory enzymes of mitochondria. It also gets placed into iron-sulfur clusters, which are used even more widely, in respiratory enzymes, in the DNA replication, transcription, protein synthesis, and iron assimilation machineries. It is iron's strong and flexible redox chemistry (and its ancient abundance in the rocks and fluids life evolved with) that make it essential as well as dangerous.

Author's model for iron use and regulation in yeast cells. Outside is on the left, cytoplasm is blue, vacuole is green, and mitochondrion is yellow. See text below for abbreviations and description. O2 stands for the oxygen molecule. The various rate constants R refer to the transitions between each state or location.

Iron is imported from outside and forms a pool of free iron in the cytoplasm (FC, in the diagram above). From there, it can be stored in membrane-bound vacuoles (F2, F3), or imported into the mitochondria (FM), where it is incorporated into iron-sulfur clusters and heme (FS). Some of the mitochondrially assembled iron-sulfur clusters are exported back out to the cytoplasm to be integrated into a variety of proteins there (CIA). This is indeed one of the most essential roles of mitochondria- needed even if metabolic respiration is for some reason not needed (in hypoxic or anaerobic conditions). If there is a dramatic overload of iron, it can build up as rust particles in the mitochondria (MP). And finally, the iron-sulfur complexes contribute to respiration of oxygen in mitochondria, and thus influence the respiration rate of the whole cell.

The task these authors set themselves was to derive a regulatory scheme using only the elements shown above, in combination with known levels of all the metabolites, under the conditions of 1) normal levels of iron, 2) low iron, and 3) a mutant condition- a defect in the yeast gene YFH1, which binds iron inside mitochondria and participates in iron-sulfur cluster assembly. A slew of differential equations later, and selection through millions of possible regulatory circuits, and they come up with the one shown above, where the red lines/arrows indicate positive regulation, and the red lines ending with bars indicate repression. The latter is typically feedback repression, such as of the import of iron, repressed by the amount already in the cell, in the FC pool. 
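The flavor of such a pool-and-feedback model can be sketched in a few lines of code. Everything below is invented for illustration- the rate constants, the threshold, and the lumping of the vacuolar pools into a single FV- with only the pool names following the figure; the "soft Heaviside" switch is the same surrogate-regulation trick the authors describe using in place of unknown biochemical mechanisms:

```python
import math

def soft_heaviside(x, threshold, steepness=10.0):
    """Smooth on/off switch used as a surrogate regulatory term."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - threshold)))

def step(state, dt=0.01, iron_outside=1.0):
    """One Euler step of a toy iron-pool model. All rate constants are
    invented; only the pool names follow the figure: FC = cytosolic free
    iron, FV = vacuolar store (lumping F2/F3), FM = mitochondrial iron,
    FS = iron-sulfur clusters and heme."""
    FC, FV, FM, FS = state['FC'], state['FV'], state['FM'], state['FS']
    # import is repressed as cytosolic iron rises (feedback repression)
    imp = 2.0 * iron_outside * (1.0 - soft_heaviside(FC, 0.5))
    to_vac = 0.5 * FC          # storage into the vacuole
    to_mito = 1.0 * FC         # import into mitochondria
    to_fs = 1.5 * FM           # Fe-S cluster / heme assembly
    use = 0.8 * FS             # consumption by the rest of the cell
    return {
        'FC': FC + dt * (imp - to_vac - to_mito),
        'FV': FV + dt * to_vac,
        'FM': FM + dt * (to_mito - to_fs),
        'FS': FS + dt * (to_fs - use),
    }

state = {'FC': 0.0, 'FV': 0.0, 'FM': 0.0, 'FS': 0.0}
for _ in range(5000):
    state = step(state)
print(state['FC'])  # settles near the repression threshold, not at zero or infinity
```

Even this toy version shows the essential behavior: the feedback loop holds the cytosolic pool near its set point while the downstream pools fill, which is all the selection procedure in the paper is really asking of its candidate circuits.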

They show that this model provides accurate control of iron levels at all the various points, with stable behavior, no singularities or wobbling, and the expected responses to the various conditions. In low iron, the vacuole is emptied of iron, and in the mutant case, iron nanoparticles (MP) accumulate in the mitochondrion, due in part to excess amounts of oxygen admitted to the mitochondrial matrix, which in turn is due to defects in metabolic respiration caused by a lack of iron-sulfur clusters. What seemed so simple at the outset does have quite a few wrinkles!

The authors present their best regulatory scheme, selected from among millions, which provides accurate metabolite control in simulation, as shown by key transitions between conditions as shown here, one line per molecular species. See text and image above for abbreviations.


But note that none of this is actually biological. There are no transcription regulators, such as the AFT1/2 proteins known to regulate a large set of iron assimilation genes. There are no enzymes explicitly cited, and no other regulatory mechanisms like protein modifications, protein disposal, etc. Nor does the cytosolic level of iron actually regulate the import machinery- that is done by the level of iron-sulfur clusters in the mitochondria, as sensed by the AFT regulators, among other mechanisms.

Thus it is not at all clear what work like this has to offer. It takes the known concentrations of metabolites (which can be ascertained in bulk) to create a toy system that accurately reproduces a very restricted set of variations, limited to what the researchers could assess elsewhere, in lab experiments. It does not inform the biology of what is going on, since it is not based on the biology, and clearly even contravenes it. It does not inform diseases associated with iron metabolism- in this case Friedreich's ataxia, which is caused in humans by a defect in a gene related to YFH1- because again it is not biologically based. Knowing where some regulatory events might occur in theory, as one could have done almost as well (if not quantitatively!) on a cocktail napkin, is of little help when drugs need to be made against actual enzymes and actual regulators. It is a classic case of looking under the streetlight- working with the data one has, rather than the data one needs to do something useful.

"Like most ODE (ordinary differential equation)-based biochemical models, sufficient kinetic information was unavailable to solve the system rigorously and uniquely, whereas substantial concentration data were available. Relying on concentrations of cellular components increasingly makes sense because such quantitative concentration determinations are becoming increasingly available due to mass-spectrometry-based proteomic and metabolomics studies. In contrast, determining kinetic parameters experimentally for individual biochemical reactions remain an arduous task." ...

"The actual biochemical mechanisms by which gene expression levels are controlled were either too complicated to be employed in autoregulation, or they were unknown. Thus, we decided to augment every regulatable reaction using soft Heaviside functions as surrogate regulatory systems." ...

"We caution that applying the same strategy for selecting viable autoregulatory mechanisms will become increasing difficult computationally as the complexity of models increases."


But the larger point that motivated a review of this paper is the challenge of modeling a system so small as to be almost infinitesimal in the larger scheme of biology. If dedicated modelers, as this laboratory is, despair of getting the data they need for even such a modest system, (indeed, the mitochondrial iron- and sulfur-containing signaling compound that mediates repression of the AFT regulators is still referred to in the literature as "X-S"), then things are bleak indeed for the prospect of modeling higher levels of biology, such as whole cells. Unknowns are unfortunately gaping all over the place. As has been mentioned a few times, molecular biologists tend to think in cartoons, simplifying the relations they deal with to the bare minimum. Getting beyond that is going to take another few quantum leaps in data- the vaunted "omics" revolutions. It will also take better interpolation methods (dare one invoke AI?) that use all the available scraps of biology, not just mathematics, in a Bayesian ratchet that provides iteratively better models. 


Saturday, December 16, 2023

Easy Does it

The eukaryotic ribosome is significantly slower, but more accurate, than the bacterial ribosome.

Despite the focus, in molecular biology, on interesting molecules like genes and regulators, the most striking thing facing anyone who breaks open cells is the prevalence of ribosomes. Run the cellular proteins or RNAs out on a gel, and the bulk of the material is always ribosomal proteins and ribosomal RNAs, along with tRNAs. That is because ribosomes are critically important, immense in size, and quite slow. They are sort of the beating heart of the cell- not the brains, not the energy source, but the big lumpy, ancient, shape-shifting object that pumps out another essential form of life-blood- all the proteins the cell needs to keep going.

With the revolution in structural biology, we have gotten an increasingly clear view of the ribosome, and a recent paper took it up another notch with a structural analysis of how tRNA handling works, and how and why the eukaryotic ribosome is about ten times slower than its bacterial counterpart. One of their figures provides a beautiful (if partial) view of each kind of ribosome, showing how well-conserved this structure is, despite the roughly three billion or more years that have elapsed since their divergence into the bacterial and archaeal lineages, from which the eukaryotic ribosome comes. 

Above, the human ribosome, and below, the ribosome of E. coli, a bacterium, in partial views. The perspective is from the back, relative to conventional views, and only a small amount of the large subunit (LSU) appears at the top of each structure, with more of the small subunit (SSU) shown below. Between them is the cleft where tRNAs bind, in a dynamic sequence of incoming tRNA at the A (acceptor) site, then catalysis of peptide bond addition at the P (peptidyl transfer) site, and ejection of the last tRNA at the E (exit) site. In concert with the conveyor belt of tRNAs going through, the nascent protein is being synthesized in the large subunit and the mRNA is going by, codon by codon, in the small subunit. Note the overall conservation of structure, despite quite a bit of difference in detail.

The ribosome is an RNA machine at its core, with a lot of accessory proteins that were added later on. And it comes in two parts, the large and small subunits. These subunits do different things, do a lot of rolling about relative to each other, and bind a conveyor belt of tRNAs between them. The tRNAs are pre-loaded with an amino acid on one end (top) and an anticodon on the other end (bottom). They also come with a helper protein (EF-Tu in bacteria, eEF1A in eukaryotes), which plays a role later on. The anticodon is a set of three nucleotides that constitute the genetic code, whereby this tRNA is always going to match one codon to a particular amino acid. 

The ribosome doesn't care what the code is or which tRNA comes in. It only cares that the tRNA matches the mRNA held by the small subunit, as transcribed from the DNA. This process is called decoding, and the researchers show some of the differences that make it slower, but also more accurate, in eukaryotes. In bacteria, ribosomes can work at up to 20 amino acids per second, while human ribosomes top out at about 2 amino acids per second. That is pretty slow, for an enzyme! Translation accuracy is about one error per thousand to ten thousand codons.
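The stakes of that trade-off become vivid with some back-of-envelope arithmetic, using the rates and error figures just quoted (a protein length of roughly 400 amino acids is a rough typical value, assumed here for illustration):

```python
# Back-of-envelope from the rates quoted above; a ~400-residue protein
# is an assumed typical length, for illustration only.
length = 400
print(length / 20)   # bacterial ribosome: → 20.0 seconds per protein
print(length / 2)    # human ribosome:     → 200.0 seconds per protein

# Chance a 400-residue protein comes out with no errors at all
for error_rate in (1e-3, 1e-4):
    print(round((1 - error_rate) ** length, 2))  # → 0.67, then 0.96
```

At the low end of accuracy, a third of such proteins would carry at least one wrong residue; at the high end, nearly all come out clean, which is presumably what the extra proofreading time buys.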

See text for description of this diagram of the ribosomal process. 50 S is the large ribosomal subunit in bacteria (60S in eukaryotes). 30S is the small subunit in bacteria (40S in eukaryotes). S stands for Svedberg units, a unit of sedimentation in high-speed centrifugation, which was used to study proteins at the dawn of molecular biology.

Above is diagrammed the stepwise logic of protein synthesis. The first step is that a tRNA comes in and lands on the empty A site, and tests whether its anticodon sequence fits the codon on the mRNA being threaded through the bottom. This fitting and testing is the key quality control process, and the slower and more selective it is, the more accurate the resulting translation. The EF-Tu/eEF1A+GTP protein holds on to the tRNA at the acceptor (A) position, and only when the fit is good does that fit communicate back up from the small subunit to the large subunit and cause hydrolysis of GTP to GDP, and release of the top of the tRNA, which allows it to swing into position (accommodation) at the catalytic site of the ribosome. This is where the tRNA contributes its amino acid to the growing protein chain. That chain, previously attached to the tRNA in the P site, is now attached to the tRNA in the A site. Now another GTP-binding protein comes in, EF-G (eEF2 in eukaryotes), which bumps the tRNA from the A site to the P site, and simultaneously advances the mRNA by one codon. This also releases whatever was in the E site of the ribosome and frees up the A site to accept another new tRNA.
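This two-checkpoint logic- initial selection before GTP hydrolysis, then a second test during accommodation- is the classic kinetic proofreading scheme: each energetically independent checkpoint multiplies the discrimination. A sketch, with a hypothetical per-checkpoint error rate:

```python
# Kinetic proofreading sketch. If a wrong tRNA survives each checkpoint
# with probability f relative to a right one (the value of f here is
# hypothetical, not measured), n independent checkpoints compound to f**n.
def error_fraction(f, checkpoints):
    return f ** checkpoints

f = 0.01
print(error_fraction(f, 1))  # one checkpoint:  1 error in 100
print(error_fraction(f, 2))  # two checkpoints: ~1 in 10,000, near observed fidelity
```

The crucial point, going back to Hopfield, is that the second test only multiplies the discrimination if it is driven by an irreversible step- here, the GTP hydrolysis- so the ribosome pays in energy and time for its accuracy.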

See text for description. IC = initiation complex, CR = codon recognition complex, GA = GTPase activation complex, AC = accommodated complex. FRET = fluorescence resonance energy transfer. Head and shoulder refer to structural features of the small ribosomal subunit.

These researchers did both detailed structural studies of ribosomes stuck in various positions, and also mounted fluorescent labels at key sites in the P and A sites. These double labels allowed one fluorophore to be flashed with light at its absorbance peak, and the energy to be transferred to the second fluorophore, which then emits fluorescence at its own, longer wavelength. That emission provides an exquisitely sensitive measure of the distance between the two fluorophores, since the efficiency of the energy transfer falls off with the sixth power of the distance. The graph above (right) provides a trace of the fluorescence seen in one ribosomal cycle, as the distance between the two tRNAs changes slightly while the reaction proceeds and the two tRNAs come closer together. This technical method allows real-time analysis of the reaction as it is going along, especially one as slow as this one.
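That sixth-power distance dependence is what makes FRET such a sensitive molecular ruler. A quick sketch- the Förster radius R0 (the distance giving 50% transfer) of 5 nm is a typical value for common dye pairs, assumed here for illustration:

```python
# FRET efficiency falls with the sixth power of the inter-dye distance.
# R0 = 5 nm (the 50%-transfer distance) is a typical value for common
# dye pairs, assumed here for illustration.
def fret_efficiency(r_nm, r0_nm=5.0):
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)

for r in (3, 5, 7):
    print(r, round(fret_efficiency(r), 2))  # 3 → 0.96, 5 → 0.5, 7 → 0.12
```

A change of a few nanometers, about the scale of a tRNA swinging into the P site, moves the signal across nearly its whole dynamic range.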

Structures of the ribosome accentuating the tRNA positions in the A, P, and E sites. Note how the green tRNA in the A site starts bent over towards the eEF1A GTPase (blue), as the decoding and quality control are going on, after which it is released and swings over next to the P site tRNA, ready for peptide bond formation. Note also how the structure of the anticodon-codon pairing (pink, bottom) evolves from loose and disordered to tight after the tRNA straightens up.

Above is shown a gross level view in stop-motion of ribosomal progress, achieved with various inhibitors and altered substrates. The mRNA is in pink (insets), and shows how the codon-anticodon match evolves from loose to tight. Note how at first only two bases of the mRNA are well-paired, while all three are paired later on. This reflects in a dim way the genetic code, which has redundancies in the third position for many amino acids, and is thought to have first had only two letters, before transitioning to three letters.

Higher detail on the structures of the tRNAs in the P site and the A site as they progress through the proof-reading phase of protein synthesis. The fluorescence probes are pictured (red and green dots), as is more of the mRNA strand (pink).

These researchers have a great deal to say about the details of these structures- what differentiates the human from the E. coli ribosome, and why the human one is slower and allows more time and more hindrance during the proof-reading step, thereby helping badly matched tRNAs to escape and increasing overall fidelity. For example, how does the GTPase eEF1A, docked to the large subunit, know when a match at the codon-anticodon pair down in the small ribosomal subunit has been successful?

"Base pairing between the mRNA codon and the aa-tRNA anticodon stem loop (ASL) is verified through a network of ribosomal RNA (rRNA) and protein interactions within the SSU A site known as the decoding centre. Recognition of cognate aa-tRNA closes the SSU shoulder domain towards the SSU body and head domains. Consequent ternary complex engagement of the LSU GTPase-activating centre (GAC), including the catalytic sarcin-ricin loop12 (SRL), induces rearrangements in the GTPase, including switch-I and switch-II remodeling, that trigger GTP hydrolysis"

They note that there seem to be at least two proofreading steps, both in activating eEF1A and also afterwards, during the large swing of the tRNA towards the P site. And they note novel rolling motions of the human ribosome compared with the bacterial ribosome, which help explain some of its distinctive proofreading abilities, which may be adjustable in humans by regulatory processes. Thus we are gaining an ever more detailed window on the heart of this process, which is foundational to the origin of life, central to all cells, and not without medical implications, since many poisons that bacteria have devised attack the ribosome, and several of our current antibiotics do likewise.


Saturday, October 28, 2023

Melting Proteins Through a Wall

Peroxisomes use a trendy way to import their proteins.

As has been discussed many times in this space, membranes are formidable barriers ... at the molecular level. Having a plasma membrane, and organelles enclosed within membranes, means needing to get all sorts of things across them, from the tiniest proton to truly enormous mega-complexes like ribosomes. Almost eight percent of the proteins encoded by the human genome are transporters that concern themselves with getting molecules from one place to another, typically across membranes. A critical type of molecule to get into organelles is the proteins that belong there, to do their day-in, day-out jobs. 

But proteins are large molecules. There are two ways to go about transporting them across membranes. One is to thread them across linearly, unfolding them in the process, and letting them refold once they are across. This is how proteins get into the endoplasmic reticulum, where the long road to secretion generally starts. Ribosomes dock right up to the endoplasmic reticulum membrane and pump their nascent proteins across as they are being synthesized. Easy peasy.

However other organelles don't get this direct (i.e. cotranslational) method of protein import. They have to get already-made full-length proteins lugged across their membranes somehow. Mitochondria, for instance, are replete with hard-working proteins, virtually all of which are encoded in the nucleus and have to be brought in whole, usually through two separate membranes to get into the mitochondrial matrix. There are dedicated transporters, nicknamed the TOM/TIM complexes, that thread incoming proteins (which are detected by short "signal" sequences these proteins carry) through each membrane in turn, and sometimes use additional helpers to get the proteins plugged into the matrix membrane or other final destination. Still, this remains a protein threading process, (of the first transport type), and because every incoming protein must be unfolded and then refolded, it involves chaperones which specialize in helping those proteins fold correctly afterwards.

Schematic of the nuclear pore. The wavy bits are protein tails, rich in F-G (phenylalanine-glycine) repeats, that are unstructured and form a gel throughout the pore, allowing like-minded F-G proteins through- the nuclear transport receptors. These receptors carry various cargo proteins in and out of the nucleus, without having to unfold them. "Nup" is short for nuclear pore protein; GLFG is short for glycine, leucine (another hydrophobic amino acid), phenylalanine, glycine.

But there is another way to do it, which was discovered much more recently and is used principally by the nucleus. The nuclear pore had fascinated biologists for decades, but it was only in the early 2000's that this mechanism was revealed. And a recent paper found that peroxisomes also use this second method, which side-steps the need to thread incoming proteins through a pore, and risk all the problems of refolding. This method is to use a curiously constructed gel phase of (protein) matter that shares some properties with membranes, but has the additional property that specifically compatible proteins can melt right through it. 

The secret lies in repetitive regions of protein sequence that carry, in the case of the nuclear pore, lots of F-G sequences. That is, phenylalanine-glycine repeated regions of proteins that form these transit gel structures, or pores. The phenylalanine is hydrophobic, the glycine is flexible, and the protein backbone is polar, though not charged. This adds up to a region that is a totally disordered mess and forms a gel that can keep out most larger molecules, like a membrane. But if encountered by another F-G-rich protein, this gel lets it right through, like a pat of butter through oil. It also tends to let small molecules through quite easily. The nuclear pore is quite permeable to the many chemicals needed for DNA replication, RNA production, etc.
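The compositional signature described here is easy to quantify. The sequence below is an invented stand-in, not a real nucleoporin fragment (real F-G domains run for hundreds of residues), just to show the kind of repeat density involved:

```python
# Count F-G dipeptides in a disordered-region sequence. The sequence is
# a made-up example, not a real nucleoporin fragment.
def motif_density(seq, motif="FG"):
    count = seq.count(motif)
    return count, 100.0 * count / len(seq)  # count, and motifs per 100 residues

fake_fg_domain = "SSFGQPTSFGAPQGLFGNKTSFGSPAAFGQQNSFG"
count, per100 = motif_density(fake_fg_domain)
print(count)          # → 6
print(round(per100))  # → 17 FG motifs per 100 residues
```

The same count with motif="YG" would apply to the tyrosine-based repeats of PEX13-style domains, which is the peroxisomal variation on this theme.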

Summary from the current paper, making the case that peroxisomes use PEX13 to make something similar to the nuclear pore, where targeted proteins can traverse easily, piggybacked on carrier proteins, in this case PEX5. The yellow spaghetti is the F-G or Y-G protein tails that congregate in the pore to make up a novel (gel) phase of matter. This gel is uniquely permeable to proteins carrying the same F-G or Y-G on their outsides, as does PEX5. "NTR" is short for nuclear transport receptor, to which nucleus-bound cargoes bind.

Peroxisomes are sites for specialty chemistry, handling some relatively dangerous oxidation reactions, including production of some lipids. They combine this with protective enzymes like catalase that quickly degrade the resulting reactive oxidative products. This suggests that the peroxisomal membrane would need to be pretty tight, but the authors state that the gel-style mechanism used here allows anything under 2,000 Daltons through, which certainly includes most chemicals. Probably the solution is that enough protective enzymes, at a high local concentration, are present that the leakage rate of bad chemicals is relatively low. 

Experimenters purify large amounts of the Y-G protein segments from PEX13 and form macroscopic gels out of them. In the center is a control, where the Y residues have been mutated to serine (S). N+YG refers to the N-terminus of the PEX13 protein plus the Y-G portion, while Y-G alone has only the Y-G segment of the PEX13 protein.

For its gel-containing pore, the peroxisome uses (on a protein called PEX13) tyrosine (Y) in place of phenylalanine, resulting in a disordered gel of Y-G repeats for its structure. Tyrosine is aromatic (and thus hydrophobic), like phenylalanine and tryptophan, and apparently provides enough distinctiveness that nucleus-bound proteins are not mistaken in their destination. The authors state that it provides a slightly denser packing, and by its composition should help prevent nuclear carriers from binding effectively. But it isn't just the Y-G composition that directs proteins- a suite of other proteins around the peroxisomal and nuclear pores, I would speculate, helps attract their respective carrier proteins (called PEX5 in the case of peroxisomes) so that they know where to go.

Evolutionary conservation of the Y-G regions of PEX13, over a wide range of species. The semi-regular periodicity of the Y placements suggests that this protein forms alpha helices with the Y side chains exposed on one side, more or less, despite a general lack of structure.

The authors show some very nice experiments, such as making visible gels from large amounts of purified protein, and then showing that these gels indeed block generic proteins, while allowing the same protein, if fused to PEX5, to come right through. The result shown below is strikingly absolute- without its peroxisome-specific helper, the protein GFP makes no headway into this gel material at all. But with that helper, it can diffuse 100 microns in half an hour. It is like making jello that you can magically pass your hand through, without breaking it up ... but only if you are wearing the magic glove.

Experimental demonstration of transport. Using macroscopic gel plugs like those shown above, the diffusion of green fluorescent protein (GFP) was assayed from a liquid (buffer) into the gel. By itself (center, bottom), GFP makes no headway at all. But when fused to the PEX5 protein, either in part or in whole, it diffuses quite rapidly into the Y-G gel.
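The reported penetration- roughly 100 microns in half an hour- can be converted into an effective diffusion coefficient with the one-dimensional mean-squared-displacement relation, D ≈ x²/(2t). This is a back-of-the-envelope sketch: the 100 µm and 30 minute figures come from the text, while the ~25 µm²/s value for free GFP in cytoplasm is a typical literature ballpark, not a number from this paper.

```python
# Back-of-the-envelope: effective diffusion coefficient of PEX5-GFP in the Y-G gel,
# using the 1-D mean-squared-displacement relation x^2 = 2*D*t.

x = 100e-6          # penetration depth: 100 microns, in meters (from the text)
t = 30 * 60         # half an hour, in seconds (from the text)

D_gel = x**2 / (2 * t)          # m^2/s
D_gel_um2 = D_gel * 1e12        # convert to um^2/s

D_cytoplasm_um2 = 25.0          # assumed typical GFP diffusion in cytoplasm

print(f"D in gel  ~ {D_gel_um2:.1f} um^2/s")
print(f"slowdown  ~ {D_cytoplasm_um2 / D_gel_um2:.0f}x vs. free cytoplasm")
```

So even for the compatible carrier, the gel is roughly an order of magnitude more viscous an environment than cytoplasm- permeable, but hardly frictionless.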

Sunday, August 27, 2023

Better Red Than Dead

Some cyanobacteria strain for photosynthetic efficiency at the red end of the light spectrum.

The plant world is green around us- why green, and not some other color, like, say, black? That plants are green means that they are letting green light through (or out by reflection), giving up some energy. Chlorophyll absorbs both red light and blue light, but not green, though all are near the peak of solar output. Some accessory pigments within the light-gathering antenna complexes can extend the range of wavelengths absorbed, but clearly a fair amount of green light gets through. A recent theory suggests that this use of two separated bands of light is an optimal solution to stabilize power output. At any rate, it is not just the green light- the extra energy of the blue light is also thrown away as heat- its excitation is allowed to decay to the red level of excitation, within the antenna complex of chlorophyll molecules, since the only excited state used in photosynthesis is that at ~690 nm. This forms a uniform common denominator for all incoming light energy that then induces charge separation at the oxygen reaction center, (stripping water of electrons and protons), and sends newly energized electrons out to quinone molecules and on into the biosynthetic apparatus.

The solar output, which plants have to work with.

Fine. But what if you live deeper in the water, or in the veins of a rock, or in a mossy, shady nook? What if all you have access to is deeper red light, like at 720 nm, with lower energy than the standard input? In that case, you might want to re-engineer your version of photosynthesis to get by with slightly lower-energy light, while getting the same end results of oxygen splitting and carbon fixation. A few cyanobacteria (the same bacterial lineage that pioneered chlorophyll and the standard photosynthesis we know so well) have done just that, and a recent paper discusses the tradeoffs involved, which are of two different types.

The chlorophylls with respective absorption spectra and partial structures. Redder light is toward the right. Chlorophyll a is the one used most widely in plants and cyanobacteria. Chlorophyll b is also widely used in these organisms as an additional antenna pigment that extends the range of absorbed light. Chlorophylls d and f are red-shifted and used in the specialized species discussed here.

One of the species, Chroococcidiopsis thermalis, is able to switch states, from bright/white light absorption with a normal array of pigments, to a second state where it expresses chlorophylls d and f, which absorb light at the lower energy 720 nm, in the far red. This "facultative" ability means that it can optimize the low-light state without much regard to efficiency or photo-damage protection, which it can address by switching back to the high energy wavelength pigment system. The other species is Acaryochloris marina, which has no bright light system, but only chlorophyll d. This bacterium lives inside the cells of bigger red algae, so has a relatively stable, if shaded, environment to deal with.

What these and prior researchers found was that the ultimate quantum energy used to split water to O2, and to send energized electrons off to photosystem I and carbon compound synthesis, is the same as in any other chlorophyll a-using system. The energetics of those parts of the system apparently can not be changed. The shortfall needs to be made up in the front end, where there is a sharp drop between the energy absorbed- 1.82 electron volts (eV) from photons at 680 nm, but only 1.72 eV from far-red photons- and that needed at the next points in the electron transport chains (about 1.0 eV). This difference plays a large role in directing those electrons to where the plant wants them to go- down the gradient to the oxygen-evolving center, and to the quinones that ferry energized electrons to other synthetic centers. While it seems like waste, a smaller difference allows the energized electrons to go astray, forming chemical radicals and other products dangerous to the cell.

Summary diagram, described in the text. Energy levels are shown for photon excitation of chlorophyll (Chl, left axis), and for energy transitions through the reaction center (Phe- pheophytin) and the quinones (Q) that conduct energized electrons out to the other photosynthetic center and biosynthesis. On top are shown the respective system types- normal chlorophyll a from white-light adapted C. thermalis, chlorophyll d in A. marina, and chlorophyll f in red-adapted C. thermalis.
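The photon energies quoted in the text follow directly from E = hc/λ, which in convenient units is E(eV) ≈ 1239.8 / λ(nm). A quick check of the numbers:

```python
# Photon energy from wavelength: E = h*c/lambda, i.e. E(eV) ~= 1239.8 / lambda(nm)
def photon_ev(wavelength_nm):
    return 1239.8 / wavelength_nm

for nm in (680, 690, 720):
    print(f"{nm} nm -> {photon_ev(nm):.2f} eV")
# 680 nm comes out near 1.82 eV and 720 nm near 1.72 eV, matching the text
```

The 0.1 eV difference between 680 and 720 nm photons is exactly the shortfall the far-red organisms must absorb somewhere in their reaction-center energetics.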

What these researchers summarize in the end is that both of the red light-using cyanobacteria squeeze this middle zone of the power gradient in different ways. There is an intermediate event in the trail from photon-induced electron excitation to the outgoing quinone (+ electron) and O2 that is the target of all the antenna chlorophylls- the photosynthetic reaction center. This typically has chlorophyll a (called P680) and pheophytin, a chlorophyll-like molecule. It is at this chlorophyll a molecule that the key step takes place- the excitation energy (an electron bumped to a higher energy level) conducted in from the antenna of ~30 other chlorophylls pops out its excited electron, which flits over to the pheophytin, thence to the carrier quinone molecules and photosystem I. Simultaneously, an electron comes in to replace it from the oxygen-evolving center, which receives alternate units of photon energy, also from the chlorophyll/pheophytin reaction center. The figure above describes these steps in energetic terms, from the original excited state, to the pheophytin (Phe-, loss of 0.16 eV) to the exiting quinone state (Qa-, loss of 0.385 eV). In the organisms discussed here, chlorophyll d replaces a at this center, and since its structure is different and absorbance is different, its energized electron is about 0.1 eV less energetic.

In A. marina, (center in the diagram above), the energy gap between the pheophytin and the quinone is squeezed, losing about 0.06 eV. This has the effect of losing some of the downward "slope" on the energy landscape that prevents side reactions. Since A. marina has no choice but to use this lower energy system, it needs all the efficiency it can get, in terms of the transfer from chlorophyll to pheophytin. But it then sacrifices some driving force from the next step to the quinone. This has the ultimate effect of raising damage levels and side reactions when faced with more intense light. However, given its typically stable and symbiotic lifestyle, that is a reasonable tradeoff.

On the other hand, C. thermalis (right-most in the diagram above) uses its chlorophyll d/f system on an optional basis when the light is bad. So it can give up some efficiency (in driving pheophytin electron acceptance) for better damage control. It has dramatically squeezed the gap between chlorophyll and pheophytin, from 0.16 eV to 0.08 eV, while keeping the main pheophytin-to-quinone gap unchanged. This has the effect of keeping the pumping of electrons out to the quinones in good condition, with low side-effect damage, but restricts overall efficiency, slowing the rate of excitation transfer to pheophytin, which affects not only the quinone-mediated path of energy to photosystem I, but also the path to the oxygen evolving center. The authors mention that this cyanobacterium recovers some efficiency by making extra light-harvesting pigments that provide more inputs, under these low / far-red light conditions.
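The two strategies can be laid out side by side. The gap values below are the ones quoted in the text; where the text gives only the change (A. marina's quinone gap is "squeezed" by 0.06 eV, its chlorophyll-to-pheophytin gap left at the standard value), that reading is an assumption of this sketch.

```python
# Energy drops (eV) along the reaction center, per system, as discussed in the text.
# chl_phe: excited chlorophyll -> pheophytin; phe_q: pheophytin -> quinone (Qa).
systems = {
    "C. thermalis (white light, Chl a)": {"chl_phe": 0.16, "phe_q": 0.385},
    "A. marina (Chl d)":                 {"chl_phe": 0.16, "phe_q": 0.385 - 0.06},
    "C. thermalis (far red, Chl f)":     {"chl_phe": 0.08, "phe_q": 0.385},
}
for name, g in systems.items():
    print(f"{name:36s} chl->phe {g['chl_phe']:.3f}   phe->Q {g['phe_q']:.3f}"
          f"   total drop {g['chl_phe'] + g['phe_q']:.3f}")
```

The totals make the tradeoff visible: A. marina economizes on the damage-protecting quinone gap, while far-red C. thermalis economizes on the efficiency-setting first transfer, keeping the protective gap intact.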

The methods used to study all this were mostly based on fluorescence, which emerges from the photosynthetic system when electrons fall back from their excited states. A variety of inhibitors have been developed to prevent electron transfer, such as to the quinones, which bottles up the system and causes increased fluorescence and thermoluminescence, whose wavelengths reveal the energy gaps causing them. Thus it is natural, though also impressive, that light provides such an incisive and precise tool to study this light-driven system. There has been much talk that these far red-adapted photosynthetic organisms validate the possibility of life around dim stars, including red dwarves. But obviously these particular systems developed evolutionarily out of the dominant chlorophyll a-based system, so wouldn't provide a direct path. There are other chlorophyll systems in bacteria, however, and systems that predate the use of oxygen as the electron source, so there are doubtless many ways to skin this cat.


  • Maybe humiliating Russia would not be such a bad thing.
  • Republicans might benefit from reading the Federalist Papers.
  • Fani Willis schools Meadows on the Hatch Act.
  • "The top 1% of households are responsible for more emissions (15-17%) than the lower earning half of American households put together (14% of national emissions)."

Sunday, July 30, 2023

To Sleep- Perchance to Inactivate OX2R

The perils of developing sleeping, or anti-sleeping, drugs.

Sleep- the elixir of rest and repose. While we know of many good things that happen during sleep- the consolidation of memories, the cardiovascular rest, the hormonal and immune resetting, the slow waves and glymphatic cleansing of the brain- we don't know yet why it is absolutely essential, and lethal if repeatedly denied. Civilized life tends to damage our sleep habits, given artificial light and the endless distractions we have devised, leading to chronic sleeplessness and a spiral of narcotic drug consumption. Some conditions and mutations, like narcolepsy, have offered clues about how sleep is regulated, which has led to new treatments, though to be honest, good sleep hygiene is by far the best remedy.

Genetic narcolepsy was found to be due to mutations in the second receptor of the hormone orexin (OX2R), or to auto-immune conditions that kill off a specialized set of neurons in the hypothalamus- a basal part of the brain that sits just over the brain stem. This region normally has ~50,000 neurons that secrete orexin (which comes in two kinds as well, 1 and 2), and project to areas all over the brain, especially basal areas like the basal forebrain and amygdala, to regulate not just sleep but feeding, mood, reward, memory, and learning. Like any hormone receptor, the orexin receptors can be approached in two ways- by turning them on (agonist) or by turning them off (antagonist). Antagonist drugs were developed which turn off both orexin receptors, and thus promote sleep. The first was named suvorexant, using the "orex" and "ant" lexical elements to mark its functions, which is now standard for generic drug names.

This drug is moderately effective, and is a true sleep enhancer, promoting falling asleep, restful sleep, and length of sleep, unlike some other sleep aids. Suvorexant antagonizes both receptors, but the researchers knew that only the deletion of OX2R, not OX1R, (in dogs, mice, and other animals), generates narcolepsy, so they developed a drug more specific to OX2R only. But the result was that it was less effective. It turned out that binding and turning off OX1R was helpful to sleep promotion, and there were no particularly bad side effects from binding both receptors, despite the wide-ranging activities they appear to have. So while the trial of Merck's MK-1064 was successful, it was not better than their existing two-receptor drug, so its development was shelved. And we learned something intriguing about this system. While all animals have some kind of orexin, only mammals have the second orexin family member and receptor, suggesting that some interesting, but not complete, bifurcation happened in the functions of this system in evolution.

What got me interested in this topic was a brief article from yet another drug company, Takeda, which was testing an agonist of the orexin receptors in an effort to treat narcolepsy. They created TAK-994, which binds OX2R specifically and showed a lot of promise in animal trials. It is taken orally as a pill, in contrast to the existing treatment, danavorexton, which must be injected. In the human trial, it was remarkably effective, virtually eliminating cataleptic / narcoleptic episodes. But there was a problem- it caused enough liver toxicity that the trial was stopped and the drug shelved. Presumably, this company will try again, making variants of this compound that retain affinity and activity but not the toxicity.

This brings up an underappreciated peril in drug design- where drugs end up. Drugs don't just go into our systems, hopefully slipping through the incredibly difficult gauntlet of our digestive system; they all need to go somewhere after they have done their jobs as well. Some drugs are hydrophilic enough, and generally inert enough, that they partition into the urine by dilution and undergo no further metabolic events. Most, however, are recognized by our internal detoxification systems as foreign, (that is, hydrophobic, but not recognizable as the fats/lipids that are usual nutrients), and are derivatized by liver enzymes and sent out in the bile.

Structure of TAK-994, which treats narcolepsy, but at the cost of liver dysfunction.

As you can see from the chemical structure above, TAK-994 is not a normal compound that might be encountered in the body, or as food. The amino sulfate is quite unusual, and the fluorines sprinkled about are totally unnatural. This would be a red flag substance, like the various PFAS materials we hear about in the news. The rings and fluorines create a relatively hydrophobic substance, which would need to be modified so that it can be routed out of the body. That is what a key enzyme of the liver, CYP3A4, does. It (and many family members that have arisen over evolutionary time) oxidizes all manner of foreign hydrophobic compounds, using a heme cofactor to handle the oxygen. It can add OH- groups (hydroxylation), break open double bonds (epoxidation), and break open phenol ring structures (aromatic oxidation).

But then what? Evolution has matched most of the toxic substances we encounter in nature with appropriate enzymes and routes out of the body. But these novel compounds we are making with modern chemistry are something else altogether. Some drugs are turned on by this process, waiting till they get to the liver to attain their active form. Others, apparently such as this one, are made into toxic compounds (as yet unknown) by this process, such that the liver is damaged. That is why animal studies and safety trials are so important. This drug binds to its target receptor, and does what it is supposed to do, but that isn't enough to be a good drug.


Saturday, June 10, 2023

A Hard Road to a Cancer Drug

The long and winding story of the oncogene KRAS and its new drug, sotorasib.

After half a century of the "War on Cancer", new treatments are finally straggling into the clinic. It has been an extremely hard and frustrating road to study cancer, let alone treat it. We have learned amazing things, but mostly we have learned how convoluted a few billion years of evolution can make things. The regulatory landscape within our cells is undoubtedly the equal of any recalcitrant bureaucracy, full of redundant offices, multiple veto points, and stakeholders with obscure agendas. I recently watched a seminar in the field, which discussed one of the major genes mutated in cancer and what it has taken to develop a treatment against it. 

Cancer is caused by DNA mutations, and several different types need to occur in succession. There are driver mutations, which are the first step in the loss of normal cellular control. But additional mutations have to happen for such cells to progress through regulatory blocks, like escape from local environmental controls on cell type and cell division, past surveillance by the immune system, and past the reluctance of differentiated cells to migrate away from their resident organ. By the end, cancer cells typically have huge numbers of mutations, having incurred mutations in their DNA repair machinery in an adaptive effort to evade all these different controls.

While this means that many different targets exist that can treat some cancers, it also means that any single cancer requires a precisely tailored treatment, specific to its mutated genes. And that resistance is virtually inevitable given the highly mutable nature of these cells. 

One of the most common genes to be mutated to drive cancer (in roughly 20% of all cases) is KRAS, part of the RAS family of NRAS, KRAS, and HRAS. These were originally discovered through viruses that cause cancer in rats. These viruses (such as Kirsten rat sarcoma virus) carried a copy of a rat gene, which they overproduce and use to overcome normal proliferation controls during infection. The viral gene was called an oncogene, and the original rat (or human) version was called a proto-oncogene, named KRAS. The RAS proteins occupy a central part of the signaling path that external events and stresses turn on to activate cell growth and proliferation, called the MAP kinase cascade. For instance, epidermal growth factor comes along in the blood, binds to a receptor on the outside of a cell, and turns on RAS, then MEK, MAPK, and finally transcription regulators that turn on genes in the nucleus, resulting in new proteins being expressed. "Turning on" means different things at each step in this cascade. The transcription regulators typically get phosphorylated by their upstream kinases like MAPK, which tags them for physical transport into the nucleus, where they can then activate genes. MAPK is turned on by being itself phosphorylated by MEK, and MEK is phosphorylated by RAF. RAF is turned on by binding to RAS, whose binding activity in turn is regulated by the state of a nucleotide (GTP) bound by RAS. When binding GTP, RAS is on, but if binding GDP, it is off.

A schematic of the RAS pathway, whereby extracellular growth signals are interpreted and amplified inside our cells, resulting in new gene expression as well as other more immediate effects. The cell surface receptor, activated by its ligand, activates associated SOS which activates RAS to the active (GTP) state. This leads to a kinase cascade through RAF, MEK, and MAPK and finally to gene regulators like MYC.

This whole system seems rather ornate, but it accomplishes one important thing, which is amplification. One turned-on RAF molecule or MEK molecule can turn on / phosphorylate many targets, so this cascade, though it appears linear in a diagram, is actually a chain reaction of sorts, amplifying as it goes along. And what governs the state of RAS and its bound GTP? The state of the EGFR receptor, of course. When KRAS is activated, the resident GDP leaves, and GTP comes to take its place. RAS is a weak GTPase enzyme itself, slowly converting itself from the active back to the inactive state with GDP.
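The amplification can be illustrated with a toy calculation. If each activated kinase phosphorylates, say, 100 downstream molecules before being switched off (the per-stage gain of 100 is a made-up round number, purely illustrative), three tiers of RAF → MEK → MAPK turn one event at the membrane into a very large burst:

```python
# Toy model of signal amplification through the three-tier kinase cascade.
# The gain per stage (100) is an illustrative assumption, not a measured value.
GAIN_PER_STAGE = 100
stages = ["RAF", "MEK", "MAPK"]

active = 1  # one RAS-triggered activation event at the membrane
for stage in stages:
    active *= GAIN_PER_STAGE
    print(f"after {stage}: {active:,} activated molecules")
# one upstream event -> on the order of a million downstream phosphorylations
```

This geometric gain is also why a single stuck-on KRAS molecule is so dangerous: the error is multiplied all the way down the cascade.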

Given all this, one would think that RAS, and KRAS in particular, might be "druggable", by sticking some well-designed molecule into the GTP/GDP binding pocket and freezing it in an inactive state. But the sad fact of the matter is that the affinity KRAS has for GTP is incredibly high- so high it is hard to measure, with a binding constant of about 20 pM. That is, half the KRAS-bound GTP comes off only when the ambient concentration of GTP is infinitesimal, 0.02 nanomolar. This means that nothing else is likely to be designed that can displace GTP or GDP from the KRAS protein, which means that in traditional terms, it is "undruggable". What is the biological logic of this? Well, it turns out that the RAS enzymes are managed by yet other proteins, which have the specific roles of prying GDP off (GTP exchange factor, or GEF) and of activating the GTPase activity of RAS to convert GTP to GDP (GTPase activating protein, or GAP). It is the GEF protein that is stimulated by receptors like EGFR that induce RAS activity.
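To see why a 20 pM affinity makes the nucleotide pocket undruggable, compare it to the GTP concentration actually present in a cell- roughly 0.5 mM, a typical literature ballpark assumed here, not a figure from the talk. The simple equilibrium occupancy f = [GTP]/([GTP] + Kd) shows the pocket is essentially never empty:

```python
# Fractional occupancy of the KRAS nucleotide pocket at equilibrium:
# f = [L] / ([L] + Kd).
kd_gtp = 20e-12        # Kd ~ 20 picomolar, from the text
gtp_cell = 0.5e-3      # ~0.5 millimolar cellular GTP (assumed typical value)

occupancy = gtp_cell / (gtp_cell + kd_gtp)
print(f"fraction of KRAS with GTP bound: {occupancy:.10f}")
print(f"GTP exceeds Kd by a factor of {gtp_cell / kd_gtp:.1e}")
```

A competitive inhibitor would have to fight a concentration excess of more than ten million-fold over the Kd, which is why the field turned to covalent chemistry and allosteric pockets instead.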

So we have to be cleverer in finding ways to attack this protein. Incidentally, most of the oncogenic mutations of KRAS are at the twelfth residue, glycine, which occupies a key part of the GAP binding site. As glycine is the smallest amino acid, any other amino acid here is bulkier, and blocks GAP binding, which means that KRAS with any of these mutations can not be turned off. It just keeps on signaling and signaling, driving the cell to think it needs to grow all the time. This property of gain of function and the ability of any mutation to fit the bill is why this particular defect in KRAS is such a common cancer-driving mutation. It accounts for ~90% of pancreatic cancers, for instance. 

The seminar went on a long tangent, which occupied the field (of those looking for ways to inhibit KRAS with drugs) for roughly a decade. RAS proteins are not intrinsically membrane proteins, but they are covalently modified with a farnesyl fatty tail, which keeps them stuck in the cell's plasma membrane. Indeed, if this modification is prevented, RAS proteins don't work. So great- how to prevent that? Several groups developed inhibitors of the farnesyl transferase enzyme that carries out this modification. The inhibitors worked great, since the farnesyl transferase has a nice big pocket for its large substrate to bind, and doesn't bind it too tightly. But they didn't inhibit the RAS proteins, because there was a backup system- geranylgeranyl transferase- which steps into the breach and can attach an even bigger fatty tail to RAS proteins. Arghhh!

While some are working on inhibiting both enzymes, the presenter, Kevan Shokat of UCSF, went in another direction. As a chemist, he figured that for the fraction of KRAS mutants where position 12 transforms from glycine to cysteine, some very specific chemistry (that is, easy methods of cross-linking) can be brought to bear. Given the nature of the genetic code, the fraction of mutations that go from glycine to cysteine is small- there are eight amino acids within a one-base change of the glycine codons, and cysteine is only one of them. So at best, this approach is going to have a modest impact. Nevertheless, there was little choice, so they forged ahead with a complicated chemical scheme to make a small molecule that could chemically crosslink to that cysteine, with selectivity determined by a modest shape fit to the surface of the KRAS protein near this GEF binding site.
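The genetic-code arithmetic can be checked directly: enumerate every single-nucleotide substitution of the four glycine codons (GGN) and see which amino acids result. Eight are reachable, and cysteine arises only from the GGT and GGC codons. A short sketch using the standard codon table:

```python
# Which amino acids are one nucleotide substitution away from glycine (GGN)?
bases = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
aas = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
code = {a + b + c: aa for (a, b, c), aa in
        zip(((x, y, z) for x in bases for y in bases for z in bases), aas)}

def neighbors(codon):
    """All codons differing from `codon` at exactly one position."""
    return [codon[:i] + b + codon[i+1:]
            for i in range(3) for b in bases if b != codon[i]]

gly = [c for c, aa in code.items() if aa == "G"]          # GGT, GGC, GGA, GGG
reachable = {code[n] for g in gly for n in neighbors(g)} - {"G", "*"}
cys_sources = [g for g in gly if "C" in {code[n] for n in neighbors(g)}]

print(sorted(reachable))   # ['A', 'C', 'D', 'E', 'R', 'S', 'V', 'W']
print(cys_sources)         # ['GGT', 'GGC']
```

So among all possible glycine-12 missense mutations, only the subset hitting a G→T (or G→C at the wobble-adjacent position) in a GGT/GGC codon yields the cysteine that sotorasib's chemistry requires.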

A structural model of KRAS, with its extremely tightly-bound substrate GDP in orange. The drug sotorasib is below in teal, bound in another pocket, with a tail extending upwards to the (mutant) cysteine 12, which is not differentiated by color, but sits over a magnesium ion (green) being coordinated by GDP. The main job of sotorasib is to interfere with the binding of the guanine exchange factor (GEF) which happens on the surface to its left, and would reset KRAS to an active state.

This approach worked surprisingly well, as the KRAS protein obligingly offered a cryptic nook that the chemists took advantage of to make this hybrid compound, now called the drug sotorasib. This is an FDA-approved treatment for cancers which are specifically driven by this particular KRAS mutation of position 12 from glycine to cysteine. That research group is currently trying to extend their method to other mutant forms, with modest success.

So let's take a step back. This new treatment requires, obviously, the patient's tumor to be sequenced to figure out its molecular nature. That is pretty standard these days. And then, only a small fraction of patients will get the good news that this drug may help them. Lung cancers are the principal candidates currently, (of which about 15% have this mutation), while only about 1-2% of other cancers have this mutation. This drug has some toxicity- while it is a magic bullet, its magic is far from perfect, (which is odd given the exquisite selectivity it has for the mutated form of KRAS, which should only exist in cancer tissues). And lastly, it gives, on average, under six months of reprieve from cancer progression, compared to four and a half months with a more generic drug. As mentioned above, tumors at this stage are riven with other mutations and evolve resistance to this treatment with appalling relentlessness.

While it is great to have developed a new class of drugs like this one against a very recalcitrant target, and done so on a highly rational basis driven by our growing molecular knowledge of cancer biology, this result seems like a bit of a let-down. And note also that this achievement required decades of publicly funded research, and doubtless a billion dollars or more of corporate investment to get to this point. Costs are about twenty five thousand dollars per patient, and overall sales are maybe two hundred million dollars per year, expected to increase steadily.

Does this all make sense? I am not sure, but perhaps the important part is that things can not get worse. The patent on this drug will eventually expire and its costs will come down. And the research community will keep looking for other, better ways to attack hard targets like KRAS, and will someday succeed.


Saturday, May 27, 2023

Where Does Oxygen Come From?

Boring into the photosynthetic reaction center of plants, where O2 is synthesized.

Oxygen might be important to us, but it is really just a waste product. Photosynthetic bacteria found that the crucial organic molecules of life that they were making out of CO2 and storing in the form of reduced compounds (like fats and sugars) had to get those reducing units (i.e. electrons) from somewhere. And water stepped up as a likely candidate, with its abundance and simplicity. After you take four electrons away from two water molecules, you are left with four protons and one molecule of oxygen, i.e. O2. The protons are useful to fuel the proton-motive force system across the photosynthetic membrane, making ATP. But what to do with the oxygen? It just bubbles away, but can also be used later in metabolism to burn up those high-energy molecules again, if you have evolved aerobic metabolism.
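The bookkeeping of that half-reaction (2 H2O → O2 + 4 H+ + 4 e-) is worth writing out, since the four-at-a-time stoichiometry is exactly what the oxygen-evolving cycle discussed below has to accumulate. A trivial balance check:

```python
# Atom and charge bookkeeping for the water-splitting half-reaction:
#   2 H2O  ->  O2 + 4 H+ + 4 e-
reactants = {"H": 4, "O": 2, "charge": 0}
products  = {"H": 4, "O": 2, "charge": 4 * (+1) + 4 * (-1)}  # 4 protons + 4 electrons

print(reactants == products)  # True: atoms and charge both balance
```

Four electrons must be stripped off one at a time by single-photon events, which is why the machinery needs a charge-accumulating catalyst rather than a one-shot oxidant.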

On the early earth, reductants like reduced forms of iron and sulfur were pretty common, so they were the original sources of electrons for all metabolism. Indeed, most theories of the origin of life place it in dynamic rocky redox environments like hydrothermal vents that had such conducive chemistry. But these compounds are not quite common enough for universal photosynthesis. For example, a photosynthetic bacterium floating at the top of the ocean would like to continue basking in the sun and metabolizing, even if the water around it is relatively clear of reduced iron, perhaps because of competition from its colleagues. What to do? The cyanobacteria came up with an amazing solution- split water!

A general overview of plant and cyanobacterial photosystems, comprising the first (PSII), where the first light quantum hits and oxygen is split, an intervening electron transport chain where energy is harvested, and the second (PSI), where a second light quantum hits, more energy is harvested, and the electron ends up added to NADP. From the original water molecules, protons are used to power the membrane proton-motive force and ATP synthesis, while the electrons are used to reduce CO2 and create organic chemicals.

A schematic of the P680 center of photosystem II. Green chlorophylls are at the center, with magnesium atoms (yellow). Light induces electron movement as denoted by the red arrows, out of the chlorophyll center and onwards to other cytochrome molecules. Note that the electrons originate at the bottom out of the oxygen evolving complex, or OEC, (purple), and are transferred via an aromatic tyrosine (TyrZ) side chain, coordinating with a nearby histidine (H189) protein side chain.

This is not very easy, however, since oxygen is highly, even notoriously "electronegative". That is, it likes and keeps its electrons. It takes a super-oxidant to strip those electrons off. Cyanobacteria came up with what is now called photosystem II (that is, it was discovered after photosystem I), which collects light through a large quantum antenna of chlorophyll molecules, ending up at a special pairing of chlorophyll molecules called P680. These collect the photon, and in response bump an electron up in energy and out to an electron chain that courses through the rest of the photosynthetic system, including photosystem I. At this point, P680 is hungry for an electron, indeed has the extreme oxidation potential needed to take electrons from oxygen. And one is conducted in from the oxygen evolving center (OEC), sitting nearby.

A schematic illustrating both the evolutionary convergence that put both photosystems (types I and II) into one organism (cyanobacteria, which later become plant chloroplasts), and the energy levels acquired by the main actors in the photosynthetic process, quoted in electron volts. At the very bottom (center) is a brief downward slide as oxygen is split by the pulling force of the super-oxidation state of light-activated P680. After the electrons are light-excited, they drop down in orderly fashion through a series of electron chain transits to various cytochromes, quinones, ferredoxins, and other carriers that generate either protons or chemical reducing power as they go along. Note how the depth of the oxygen-splitting oxidation state is unique among photosynthetic systems.

A recent paper resolves the long-standing problem of how exactly oxygen is oxidized by cyanobacteria and plants at the OEC, at the very last step before oxygen release. This center is a very strained cubic metal complex of one calcium and four manganese atoms, coordinated by oxygen atoms. The overall process is that two water molecules come in, four protons and four electrons are stripped off, and the remaining oxygens combine to form O2. This is, again, part of the grand process of metabolism, whose point is to add those electrons and protons to CO2, making the organic molecules of life, generally characterized as (-CH2-), such as fats, sugars, etc. Which can be burned later back into CO2. Metals are common throughout organic chemistry as catalysts, because they have a wonderful property of de-localizing electrons and allowing multiple oxidation states, (number of extra or missing electrons), unlike the more sparse and tightly-held states of the smaller elements. So they are used in many redox cofactors and enzymes to facilitate electron movement, such as in chlorophyll itself.


The authors provide a schematic of the manganese-calcium OEC reaction center. The transferring tyrosine is at top, calcium is in fuchsia/violet, the manganese atoms are in purple, and the oxygens are in red. Arrows point to the oxygens destined to bond to each other and "evolve" away as O2. Note how one of these (O6) is only singly-coordinated and is sort of awkwardly wedged into the cube. Note also how the bonds to calcium are all longer than those to manganese, further straining the cube. These strains help to encourage activation and expulsion of the target oxygens.

Here, in the oxygen evolving center, the manganese atoms are coordinated all around with oxygens, which presents the question- which ones are the ones? Which are destined to become O2, and how does the process happen? These researchers didn't use complicated femtosecond X-ray lasers or synchrotrons, (though they draw on the structural work of those who did), but room-temperature FTIR- infrared spectroscopy that is highly sensitive to organic chemical dynamics. Spinach leaf chloroplasts were put through an hour of dark adaptation, (which sets the OEC cycle to state S1), then hit with flashes of laser light to advance the position of the oxygen evolving cycle, since each flash (5 nanoseconds) induces one electron ejection by P680, and one electron transfer out of the OEC. Thus the experimenters could control the progression of the whole cycle, one step at a time, and then take extremely close FTIR measurements of the complex as it responded to each single electron ejection. Some of the processes they observed were very fast (20 nanoseconds), but others were pretty slow, up to 1.5 milliseconds for the S4 state to eject the final O2 and reset to the S0 state with new water molecules. They then supplemented their spectroscopy with the structural work of others and with computer dynamics simulations of the core process to come up with a full mechanism.
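This flash-advance protocol is, in effect, a five-position state machine (the classic Kok S-state cycle). A minimal sketch of the bookkeeping- class and method names are my own illustration, not the paper's- shows why dark-adapted chloroplasts famously release their first O2 on the third flash, and every fourth flash thereafter:

```python
# A minimal state-machine sketch of the flash-advance experiment described
# above (the Kok S-state cycle). Each brief laser flash ejects one electron
# from P680 and advances the OEC one S-state; after reaching S4 the complex
# releases O2 and resets to S0 with fresh water. This is illustrative
# bookkeeping, not a kinetic model.

class OxygenEvolvingCenter:
    def __init__(self):
        self.s_state = 1          # dark adaptation parks the cycle at S1
        self.o2_released = 0

    def flash(self):
        """One laser flash: one electron out, one step around the cycle."""
        self.s_state += 1
        if self.s_state == 4:     # S4 is transient: O2 leaves, cycle resets
            self.o2_released += 1
            self.s_state = 0

oec = OxygenEvolvingCenter()
for _ in range(7):
    oec.flash()

# Starting from S1, O2 appears on flash 3, then every 4th flash after that.
print(oec.o2_released, oec.s_state)  # -> 2 0  (two O2 after seven flashes)
```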


A schematic of the steps of oxygen evolution out of the manganese core complex, from states S0 to S4. Note the highly diverse times that elapse at the various steps, noted in nanoseconds, microseconds, or milliseconds. This is discussed further in the text.


Other workers have provided structural perspectives on this question, showing that the cubic metal structure is a bit weirder than expected. An extra oxygen (numbered #6) wedges its way into the cube, making the already strained structure (which accommodates a calcium and a dangling extra manganese atom) highly stressed. This is a complicated story, so several figures are provided here to give various perspectives. The sequence of events is that first, (S0), two waters enter the reaction center after the prior O2 molecule has left. Water exists in a mix of acid (H+) and base (OH-) ionic forms, so it is easy to bring in the hydroxyl form instead of the complete water molecule, with the matching protons quickly entering the proton pool used for ATP production. Then another proton quickly leaves as well, so the two waters have now become two oxygens, one hydrogen, and four electrons (S0). Two of the coordinated manganese atoms go from their prior +4, +4 oxidation states to +3 and +2, acting as electron buffers.

The first two electrons are pulled out rapidly, via the nearby tyrosine ring, and off to the P680 center (ending at S2, with one manganese at 3+ and one at 4+). But the next steps are much slower, extricating the last two electrons from the oxygens and inducing them to bond to each other. At state S3, with one more electron removed, both manganese atoms are back to the 4+ state. In the last step, a final proton leaves and a final electron is extracted over to the tyrosine oxygen, leaving oxygen 6 so bereft as to be in a radical state, which allows it to bow over to oxygen 5 and bond with it, making O2. The metal complex has nicely buffered the oxidation states, allowing these extractions to go much more easily and in a more coordinated fashion than could happen in free solution.

The authors provide a set of snapshots of their infrared spectroscopy-supported simulations (done with chemical and quantum fidelity) of the final steps, where the oxygens, in the bottom panel, bond together at center. Note how the atomic positions and hydrogen attachments also change subtly as the sequence progresses. Here the manganese atoms are salmon, oxygens red, calcium yellow, hydrogens white, and a chloride ion green.

This closely optimized and efficient reaction system is not just a wonder of biology and of earth history, but an object lesson in chemical technology, since photolysis of water is a very relevant dream for a sustainable energy future- to efficiently provide hydrogen as a fuel. Currently, using solar power to run water electrolyzers is not very efficient (roughly 20% for solar panels times 70% for electrolysis, or about 14% overall). Work is ongoing to design direct light-to-hydrogen photolysis, but so far it requires high heat and noxious chemicals. Life has all this worked out at the nano scale already, however, so there must be hope for better methods.
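The efficiency arithmetic above is just multiplication of stages in series, which is worth making explicit since chained conversions compound their losses quickly (round illustrative numbers from the text, not measured figures):

```python
# The efficiency arithmetic from the text: chaining solar PV (~20%) with
# electrolysis (~70%) multiplies the losses. Round numbers, for illustration.

def chained_efficiency(*stages: float) -> float:
    """Overall efficiency of conversion processes run in series."""
    overall = 1.0
    for eff in stages:
        overall *= eff
    return overall

solar_to_h2 = chained_efficiency(0.20, 0.70)
print(f"{solar_to_h2:.0%}")  # -> 14% sunlight-to-hydrogen via PV + electrolysis
```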


  • The US carried off an amazing economic success during the pandemic, keeping everything afloat as 22 million jobs were lost. This was well worth a bit of inflation on the back end.
  • Death
  • Have we at long last hit peak gasoline?
  • The housing crisis and local control.
  • The South has always been the problem.
  • The next real estate meltdown.