Showing posts with label chemistry. Show all posts

Saturday, April 8, 2023

Molecules That See

Being trans is OK: retinal and the first event of vision.

Our vision is incredible. If I were not looking right now and experiencing it myself, it would be unbelievable that a biological system made up of motley molecules could accomplish the speed, acuity and color that our visual system provides. It was certainly a sticking point for creationists, who found (and perhaps still find) it incredible that nature alone can explain it, not to mention its genesis out of the mists of evolutionary time. But science has been plugging away, filling in the details of the pathway, which so far appear to arise by natural means. Where consciousness fits in has yet to be figured out, but everything else is increasingly well accounted for.

It all starts in the eye, which has a curiously backward sheet of tissue at the back- the retina. Its nerves and blood vessels are on the surface, and after light gets through those, it hits the photoreceptor cells at the rear. These photoreceptor cells come in two types, rods (non-color sensitive) and cones (sensitive to either red, green, or blue). The photoreceptor cells have a highly polarized and complicated structure, where photosensitive pigments are bottom-most in a dense stack of membranes. Above these is a segment where the mitochondria reside, which provide power, as vision needs a lot of energy. Above these is the nucleus of the cell (the brains of the operation) and top-most is the synaptic output to the rest of the nervous system- to those nerves that network on the outside of the retina. 

A single photoreceptor cell, with the outer segment at the very back of the retina, and other elements in front.

Facing the photoreceptor membranes at the bottom of the retina is the retinal pigment epithelium, which is black with melanin. This is finally where light stops, and it also has very important functions in supporting the photoreceptor cells by buffering their ionic, metabolic, and immune environment, and phagocytosing and digesting photoreceptor membranes as they get photo-oxidized, damaged, and sloughed off. Finally, inside the photoreceptor cells are the pigment membranes, which harbor the photo-sensitive protein rhodopsin, which in turn hosts the sensing pigment, retinal. Retinal is a vitamin A-derived long-chain molecule that is bound inside rhodopsin or within other opsins which respectively confer slightly shifted color sensitivity. 

These opsins transform the tickle that retinal receives from a photon into a conformational change that they, as GPCRs (G-protein coupled receptors), transmit to G-proteins, called transducin. For each photon coming in, about 50 transducin molecules are activated. Each activated transducin G-protein alpha subunit induces its target, cGMP phosphodiesterase, to consume about 1000 cGMP molecules. The local drop in cGMP concentration then closes the cGMP-gated cation channels in the photoreceptor cell membrane, which starts the electrical impulse that travels out to the synapse and nervous system. This amplification series provides the exquisite sensitivity that allows single photons to be detected by the system, along with the high density of the retinal/opsin molecules packed into the photoreceptor membranes.
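The amplification arithmetic in this cascade can be written out in a few lines. This is only a back-of-the-envelope sketch using the round figures quoted above (about 50 transducins per photon, about 1000 cGMP per activated transducin). The function name, and the assumption that the two stages simply multiply, are illustrative and not taken from the paper.

```python
def cgmp_consumed_per_photon(transducins_per_photon=50, cgmp_per_transducin=1000):
    """Rough cascade gain, assuming each stage simply multiplies the previous one."""
    return transducins_per_photon * cgmp_per_transducin


print(cgmp_consumed_per_photon())  # 50000
```

So one absorbed photon would, on these round numbers, remove on the order of fifty thousand cGMP molecules from the vicinity of the channels.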

Retinal, used in all photoreceptor cell types. Light causes the cis-form to kick over to the trans form, which is more stable.

The central position of retinal has long been understood, as has the key transition that a photon induces, from cis-retinal to all-trans retinal. Cis-retinal has a kink in the middle, where its double bond in the center of the fatty chain forms a "C" instead of a "W", swinging around the 3-carbon end of the chain. All-trans retinal is a sort of default state, while the cis-structure is the "cocked" state- stable but susceptible to triggering by light. Interestingly, retinal cannot be reset to the cis-state while still in the opsin protein. It has to be extracted and sent off to a series of at least three different enzymes to be re-cocked. It is alarming, really, to consider the complexity of all this.

A recent paper (review) provided the first look at what actually happens to retinal at the moment of activation. This is, understandably, a very fast process, and femtosecond x-ray analysis needed to be brought in to look at it. Not only that, but as described above, once retinal flips from the dark to the light-activated state, it never reverses by itself. So every molecule or crystal used in the analysis can only be used once- no second looks are possible. The authors used a spray-crystallography system where protein crystals suspended in liquid were shot into a super-fine and fast X-ray beam, just after passing by an optical laser that activated the retinal. Computers are now helpful enough that the diffractions from these passing crystals, thrown off in all directions, can be usefully collected. In the past, crystals were painstakingly positioned on goniometers at the center of large detectors, and other issues predominated, such as how to keep such crystals cold for chemical stability. The question here was what happens in the femto- and pico-seconds after optical light absorption by retinal, ensconced in its (temporary) rhodopsin protein home.

Soon after activation, at one picosecond, retinal has squirmed around, altering many contacts with its protein. The cis (dark) conformation is shown in red, while the just-activated form is in yellow. The PSB site on the far end of the fatty chain (right) is secured against the rhodopsin host, as is the retinal ring (left side), leaving the middle of the molecule to convey most of the shape change, a bit like a bicycle pedal.

And what happens? As expected, the retinal molecule twists from cis to trans, causing the protein contacts to shift. The retinal shift happens by 200 femtoseconds, and the knock-on effects through the protein are finished by 100 picoseconds. It all makes a nanosecond seem impossibly long! As imaged above, the shape shift of retinal changes a series of contacts it has with the rhodopsin protein, inducing it to change shape as well. The two ends of the retinal molecule seem to be relatively tacked down, leaving the middle, where the shape change happens, to do most of the work. 

"One picosecond after light activation, rhodopsin has reached the red-shifted Batho-Rh intermediate. Already by this early stage of activation, the twisted retinal is freed from many of its interactions with the binding pocket while structural perturbations radiate away as a transient anisotropic breathing motion that is almost entirely decayed by 100 ps. Other subtle and transient structural rearrangements within the protein arise in important regions for GPCR activation and bear similarities to those observed by TR-SFX during photoactivation of seven-TM helix retinal-binding proteins from bacteria and archaea."

All this speed is naturally lost in the later phases, which take many milliseconds to send signals to the brain, discern movement and shape, to identify objects in the scene, and do all the other processing needed before consciousness can make any sense of it. But it is nice to know how elegant and uniform the opening scene in this drama is.


  • Down with lead.
  • Medicare advantage, cont.
  • Ukraine, cont.
  • What the heck is going on in Wisconsin?
  • Graph of the week- world power needs from solar, modeled to 2050. We are only scratching the surface so far.



Saturday, March 11, 2023

An Origin Story for Spider Venom

Phylogenetic analysis shows that the major component of spider venom derives from one ancient ancestor.

One reason why biologists are so fully committed to the Darwinian account of natural selection and evolution is that it keeps explaining and organizing what we see. Despite the almost incredible diversity and complexity of life, every close look keeps confirming what Darwin sensed and outlined so long ago. In the modern era, biology has gone through the "Modern Synthesis", bringing genetics, molecular biology, and evolutionary theory into alignment with mutually supporting data and theories. For example, it was Linus Pauling and colleagues (after they lost the race to determine the structure of DNA) who proposed that the composition of proteins (hemoglobin, in their case) could be used to estimate evolutionary relationships, both among those molecules, and among their host species.

Naturally, these methods have become vastly more powerful, to the point that most phylogenetic analyses of the relationship between species (including the definition of what species are, vs subspecies, hybrids, etc.) are led these days by DNA analysis, which provides the richest possible trove of differentiating characters- a vast spectrum from universally conserved to highly (and forensically) varying. And, naturally, it also constitutes a record of the mutational steps that make up the evolutionary process. The correlation of such analyses with other traditionally used diagnostic characters, and with the paleontological record, is a huge area of productive science, which leads, again and again, to new revelations about life's history.


One sample structure of a DRP- the disulfide rich protein that makes up most of spider venoms. The disulfide bond (between two cysteines) is shown in red. There is usually another disulfide helping to hold the two halves of the molecule together as well. The rest of the molecule is (evolutionarily, and structurally) free to change shape and character, in order to carry out its neuron-channel blocking or other toxic function.

One small example was published recently, in a study of spider venoms. Spiders arose, from current estimates, about 375 million years ago, and comprise the second most prevalent form of animal life, second only to their cousins, the insects. They generally have a hunting lifestyle, using venom to immobilize their prey, after capture and before digestion. These venoms are highly complex brews that can have over a hundred distinct molecules, including potassium, acids, tissue- and membrane-digesting enzymes, nucleosides, pore-forming peptides, and neurotoxins. At over three-fourths of the venom, the protein-based neurotoxins are the most interesting and best studied of the venom components, and a spider typically deploys dozens of types in its venom. They are also called cysteine-rich peptides or disulfide-rich peptides (DRPs) due to their composition. The fact that spiders tend to each have a large variety of these DRPs in their collection argues that a lot of gene duplication and diversification has occurred.

A general phylogenetic tree of spiders (left). On the right are the signal peptides of a variety of venoms from some of these species. The identity of many of these signal sequences, which are not present in the final active protein, is a sign that these venom genes were recently duplicated.

So where do they come from? Sequences of the peptides themselves are of limited assistance, being small (averaging ~60 amino acids) and under extensive selection to diversify. But they are processed from larger proteins (pro-proteins) and genes that show better conservation, providing the present authors more material for their evolutionary studies. The figure above, for example, shows, on the far right, the signal peptides from families of these DRP genes from single species. Signal peptides are the small leading section of a translated protein that directs it to be secreted rather than being kept inside the cell. Right after the protein is delivered to the right place, this signal is clipped off and thus is not part of the mature venom protein. These signal peptides tend to be far more conserved than the mature venom protein, despite the fact that they have little to do- just send the protein to the right place, which can be accomplished by all sorts of sequences. This contrast is a sign that the mature venoms are under positive evolutionary pressure- to be more effective, to extend the range of possible victims, and to overcome whatever resistance the victims might evolve against them.

Indeed, these authors show specifically that strong positive selection is at work, which is one more insight that molecular data can provide. (First, by comparing the rates of change at protein-coding positions that are neutral via the genetic code (synonymous) vs those that make the protein sequence change (non-synonymous), and second by the pattern and tempo of evolution of venom sequences compared with the mass of neutral sequences of the species.)
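The synonymous vs non-synonymous distinction can be made concrete with a small sketch. It builds the standard genetic code and counts, codon by codon, how many differences between two aligned coding sequences leave the amino acid unchanged and how many alter it. This is not the authors' method: real selection analyses also normalize by the number of synonymous and non-synonymous sites available and rely on statistical models, and the function name and toy sequences here are made up for illustration.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_substitutions(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) codon differences between two
    aligned, gap-free DNA coding sequences of equal length."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, equal length, and a multiple of 3")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue  # identical codons carry no information here
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1      # DNA changed, amino acid did not
        else:
            nonsynonymous += 1   # DNA change altered the protein
    return synonymous, nonsynonymous


# Hypothetical toy sequences, not real venom genes.
print(count_substitutions("TTTGCTTGT", "TTCGATTGT"))  # (1, 1)
```

A region with many more protein-altering changes than silent ones, relative to what neutral drift would produce, is the kind of signal that points to positive selection.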

"Given their significant sequence divergence since their deep-rooted evolutionary origin, the entire protein-coding gene, including the signal and propeptide regions, has accumulated significant differences. Consistent with this hypothesis, the majority of positively selected sites (~96%) identified in spider venom DRP toxins (all sites in Araneomorphae, and all but two sites in Mygalomorphae) were restricted to the mature peptide region, whereas the signal and propeptide regions harboured a minor proportion of these sites (1% and 3%, respectively)."

 

Phylogenetic tree (left), connecting up venom genes from across the spider phylogeny. On right, some of the venom sequences are shown just by their cysteine (C) locations, which form the basic structural scaffold of these proteins (top figure).


The more general phylogenetic analysis from all their sequences tells these authors that all the venom DRP genes, from all spider species, came from one origin. One easy way to see this is in the image above on the right, where just the cysteine scaffold of these proteins from around the phylogeny are lined up, showing that this scaffold is very highly conserved, regardless of the rest of the sequence. This finding (which confirms prior work) is surprising, since venoms of other animals, like snakes, tend to incorporate a motley bunch of active enzymes and components, sourced from a variety of ancestral sources. So to see spiders sticking so tenaciously to this fundamental structure and template for the major component of their venom is impressive- clearly it is a very effective molecule. The authors point out that the cone snails, another notorious venom-maker, originated much more recently (about 45 million years ago), and show the same pattern of using one ancestral form to evolve a diversified blizzard of venom components, which have been of significant interest to medical science.
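Showing sequences "just by their cysteine locations" is a simple masking operation, sketched here. The two peptides are invented for illustration (they are not sequences from the paper), and the function name is hypothetical. They were written to differ at most positions while keeping their cysteines in the same places.

```python
def cysteine_scaffold(peptide):
    """Keep cysteines (C) and replace every other residue with '-'."""
    return "".join("C" if residue == "C" else "-" for residue in peptide.upper())


# Hypothetical toy peptides, invented for this example.
toy_peptides = ["ACKGVFDACTPGKNECCPNR", "GCLEFWWKCNPNDDKCCRPK"]
for peptide in toy_peptides:
    print(cysteine_scaffold(peptide))
# Both lines print: -C------C------CC---
```

Lining up such masked strings makes a conserved scaffold visible at a glance even when the residues in between have diverged.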


  • Example: a spider swings a bolas to snare a moth.

Saturday, January 7, 2023

A New Way of Doing Biology

Structure prediction of proteins is now so good that computers can do a lot of the work of molecular biology.

There are several royal roads to knowledge in molecular biology. First, and most traditional, is purification and reconstitution of biological molecules and the processes they carry out, in the test tube. Another is genetics, where mutational defects, observed in whole-body phenotypes or individually reconstituted molecules, can tell us about what those gene products do. Over the years, genetic mapping and genomic sequencing allowed genetic mutations to be mapped to precise locations, making them increasingly informative. Likewise, reverse genetics became possible, where mutational effects are not generated randomly by chemical or radiation treatment of organisms, but are precisely engineered to find out what a chosen mutation in a chosen molecule could reveal. Lastly, structural biology contributed the essential ground truth of biology, showing how detailed atomic interactions and conformations lead to the observations made at higher levels- such as metabolic pathways, cellular events, and diseases. The paradigmatic example is DNA, whose structure immediately illuminated its role in genetic coding and inheritance.

Now the protein structure problem has been largely solved by the newest generations of artificial intelligence, allowing protein sequences to be confidently modeled into the three dimensional structures they adopt when mature. A recent paper makes it clear that this represents not just a convenience for those interested in particular molecular structures, but a revolutionary new way to do biology, using computers to dig up the partners that participate in biological processes. The model system these authors chose to show this method is the bacterial protein export process, which was briefly discussed in a recent post. They are able to find and portray this multi-step process in astonishing detail by relying on a lot of past research including existing structures and the new AI searching and structure generation methods, all without dipping their toes into an actual lab.

The structure revolution has had two ingredients. First is a large corpus of already-solved structures of proteins of all kinds, together with oceans of sequence data of related proteins from all sorts of organisms, which provide a library of variations on each structural theme. Second are the modern neural networks from Google and other institutions that have solved so many other data-intensive problems, like language translation and image matching / searching. They are perfectly suited to this problem of "this thing is like something else, but not identical". This resulted in the AlphaFold program, which has pretty much solved the problem of determining the 3D structure of novel protein sequences.

"We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14), demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods."

The current authors realized that the determination of protein structures is not very different from the determination of complex structures- the structure of interfaces and combinations between different proteins. Many already-solved structures are complexes of several proteins, and more fundamentally, the way two proteins interact is pretty much the same as the way that a protein folds on itself- the same kinds of detailed secondary motif and atomic complementarity take place. So they used the exact AlphaFold core to create AF2Complex, which searches specifically through a corpus of protein sequences for those that interact in real life.

This turned out to be a very successful project, (though a supercomputer was required), and they now demonstrate it for the relatively simple case of bacterial protein export. The corpus they are working with is about 1500 E. coli periplasmic and membrane proteins. They proceed step by step, asking what interacts with the first protein in the sequence, then what interacts with the next one, etc., till they hit the exporter on the outer membrane. While this sequence has been heavily studied and several structures were already known, they reveal several new structures and interactions as they go along. 

Getting proteins from inside the cell to outside is quite complicated, since they have to traverse two membranes and the intermembrane space, (periplasm), all without getting fouled up or misdirected. This is done by an organized sequence of chaperone and transport proteins that hand the new proteins off to each other. Proteins are recognized by this machinery by virtue of sequence-encoded signals, typically at their front/leading ends. This "export signal" is recognized, in some instances, right as it comes out of the ribosome and captured by the SecA/B/E/Y/G machinery at the inner bacterial membrane. But most exported proteins are not recognized right away, but after they are fully synthesized.

The inner membrane (IM) is below, and the outer membrane (OM) is above, showing the steps of bacterial protein export to the outer membrane. The target protein being transported is the yellow thread, (OmpA), and the various exporting machines are shown in other colors, either in cartoon form or in ribbon structures from the authors' computer predictions. Notably, SurA is the main chaperone that carries OmpA in partially unfolded form across the periplasm to the outer membrane.

SecA is the ATP-using pump that forces the new protein through the SecY channel, which has several other accessory partners. SecB, for example, is thought to be mostly responsible for recognizing the export signal on the target protein. The authors start with a couple of accessory chaperones, PpiD and YfgM, which were strongly suspected to be part of the SecA/B/E/Y/G complex, and which their program easily identifies as interacting with each other, and gives new structures for. PpiD is an important chaperone that helps proline amino acids twist around, (a proline isomerase), which they do not naturally do, helping the exporting proteins fold correctly as they emerge. It also interacts with SecY, providing chaperone assistance (that is, helping proteins fold correctly) right as proteins pass out of SecY and into the periplasm. The second step the authors take is to ask what interacts with PpiD, and they find DsbA, with its structure. This is a disulfide isomerase, which performs another vital function of shuffling the cysteine bonds of proteins coming into the periplasmic space, (which is less reducing than the cytoplasm), and allows stable cysteine bonds to form. This is one more essential chaperone-kind of function needed for relatively complicated secreted proteins. Helping these bonds form at the right places is the role of DsbA, which transiently docks right at the exit port from SecY.

The authors' (computers) generate structures for the interactions of the Sec complex with PpiD, YfgM, and the disulfide isomerase DsbA, illuminating their interactions and respective roles. DsbA helps refold proteins right when they come out of the transporter pore, from the cytoplasm.

Once the target protein has all been pumped through the SecY complex pore, it sticks to PpiD, which does its thing and then dissociates, allowing two other proteins to approach, the signal peptidase LepB, which cleaves off the export signal, and then SurA, which is the transporting chaperone that wraps the new protein around itself for the trip across the periplasm. Specific complex structures and contacts are revealed by the authors for all these interactions. Proteins destined for the outer membrane are characterized by a high proportion of hydrophobic amino acids, some of which seem to be specifically recognized by SurA, to distinguish them from other proteins whose destination is simply to swim around in the periplasm, such as the DsbA protein mentioned above. 

The authors' (computers) spit out a ranking of predicted interactions using SurA as a query, and find SurA itself as one protein that interacts (it forms a dimer), and also BamA, which is the central part of the outer membrane transporting pore. Nothing was said about the other high-scoring interacting proteins identified, which may not have had immediate interest.

"In the presence of SurA, the periplasmic domain [of transported target protein OmpA] maintains the same fold, but remarkably, the non-native β-barrel region completely unravels and wraps around SurA ... the SurA/OmpA models appear physical and provide a hypothetical basis for how the chaperone SurA could prevent a polypeptide chain from aggregating and present an unfolded polypeptide to BAM for its final assembly."

At the other end of the journey, at the outer membrane, there is another channel protein called BamA, where SurA docks, as was also found by the authors' interaction hunting program. BamA is part of a large channel complex that evidently receives many other proteins via its other periplasmic-facing subunits, BamB, C, and D. The authors went on to do a search for proteins that interact with BamA, finding BepA, a previously unsuspected partner, which, by their model, wedges itself in between BamC and BamB. BepA turns out to have a crucial function in quality control. Conduction of target proteins through the Bam complex seems to be powered only by diffusion, not by ATP or ion gradients. So things can get fouled up and stuck pretty easily. BepA is a protease, and appears, from its structure, to have a finger that gets flipped and turns the protease on when a protein transiting through the pore goes awry / sideways.


The authors' (computers) provide structures of the outer membrane Bam complex, where SurA binds with its cargo. The cargo, unstructured, is not shown here, but some of the detailed interface between SurA and BamA is shown at bottom left. The beta-barrel of BamA provides the obvious route out of the cell, or in some cases sideways into the membrane.

While filling in some new details of the outer membrane protein export system is interesting, what was really exciting about this paper was the ease with which this new way of doing biology went forth. Intimate physical interactions among proteins and other molecules are absolutely central to molecular biology, as this example illustrates. To have a new method that not only reveals such interactions in a reliable way, from sequences of novel proteins, but also presents structurally detailed views of them, is astonishing. Extending this to bigger genomes and collections of targets, vs the relatively small 1500 periplasmic-related proteins tested here, remains a challenge, but doubtless one that more effort and more computers will be able to solve.


Saturday, December 10, 2022

Mechanics of the ATP Synthesizing Machine

ATP synthase is a generator with two rotors, just like any other force-transducing generator.

Protein structural determination has progressed tremendously, with the advent of cryo-electron microscopy which allows much faster determinations of more complex structures than previously. One beneficiary is the enzyme at the heart of the mitochondrion that harnesses the proton motive force (pmf; difference of pH and charge across the inner mitochondrial membrane) to make ATP. The pmf is created by the electron transport chains of respiration, powered by the breakdown of our food, and ATP is the most general currency of energy in our cells. And in bacteria as well. The work discussed today was all done using E. coli, which in this ancient and highly conserved respect is a very close stand-in for our own biology.

The ATP synthase is a rotary device. Just like a water wheel has one wheel that harnesses a running stream, linked by gears or other mechanism to a second wheel that grinds corn, or generates electricity, the ATP synthase has one wheel that is powered by protons flowing inwards, linked to another wheel that synthesizes ATP. The second wheel doesn't turn. Rather, the linking rotor from the proton wheel (called Fo) has an asymmetric cam at the end that pokes into the center of the ATP synthase wheel, (called F1), and deforms that second wheel as it rotates around inside. The deformations are what induce the ATP synthase to successively (1) bind ADP and phosphate, (2) close access and join them together into ATP, and lastly (3) release the ATP back out. This wheel has three sections, thus one turn yields three ATPs, and it takes 120 degrees of turn to create one ATP. This mechanism is nicely illustrated in a few videos.

The ATP synthase has several parts. The top rotor (yellow, orange; proton rotor, or "c" rotor) is embedded in the inner mitochondrial membrane, and rotates as it conducts protons from outside (top) inwards. The center rotor (white, red) is attached to it and also rotates as it sticks into the bottom ATP synthesizing subunits (green, khaki). That three-fold symmetric protein complex is static (held in place by the non-moving stator subunits (blue, teal)), and synthesizes ATP as its conformation is progressively banged around by the rotor. At the bottom are diagrams of the ATP generating strokes (three per rotation), with pauses (green) reflecting the strain of synthesizing ATP. All this was detected from the single molecules tracked by polarized light coming from the polarizing gold rods attached to the proton rotor (AuNR- gold nano rod).


Some recent papers focus on the other end of the machine- the proton rotor. It has ten subunits, (termed "c", so this is also called the c rotor), each of which binds a proton. Thus the ultimate stoichiometry is that 10 protons yield 3 ATP, for a 3.33 protons per ATP efficiency. (The pH difference needs to be about 3 units, or 1000 to 5000 fold in proton concentration, to create sufficient pmf.) But there are certain asymmetries involved. For one, there is a "stator" that holds the ATP synthase stable vs the proton rotor and spans across them, attaching stably to the former and gliding along the rotations of the latter. This stator creates some variation in how the rotors at both ends operate. Also, the 10:3 ratio means that some power strokes that force the ATP synthase along will behave differently, either with more power at the beginning or at the end of the 120 degree arc.
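The 10:3 mismatch is easy to see with a little arithmetic. The sketch below uses only the numbers given above (ten "c" subunits, three ATP per turn, a pH difference of about 3 units). It assumes, purely for illustration, that the proton steps are evenly spaced at 36 degrees and that the first step starts exactly at the beginning of an ATP stroke. Neither assumption comes from the papers, and the function names are hypothetical.

```python
C_SUBUNITS = 10    # proton-binding "c" subunits in the rotor (from the text)
ATP_PER_TURN = 3   # catalytic sections of F1, one ATP per 120 degrees (from the text)


def protons_per_atp(c_subunits=C_SUBUNITS, atp_per_turn=ATP_PER_TURN):
    """Protons consumed per ATP made over one full turn."""
    return c_subunits / atp_per_turn


def proton_steps_per_stroke(c_subunits=C_SUBUNITS, atp_per_turn=ATP_PER_TURN):
    """Count the c-subunit steps that complete within each ATP power stroke.

    Step k (1..c_subunits) completes at k/c_subunits of a turn, and stroke j
    covers the interval (j/atp_per_turn, (j+1)/atp_per_turn] of a turn.
    Integer arithmetic is used to avoid floating-point edge cases.
    """
    counts = [0] * atp_per_turn
    for k in range(1, c_subunits + 1):
        # Ceiling division: index of the stroke in which step k finishes.
        stroke = -((-k * atp_per_turn) // c_subunits) - 1
        counts[stroke] += 1
    return counts


def proton_concentration_ratio(delta_ph):
    """Fold difference in proton concentration for a given pH difference."""
    return 10 ** delta_ph


print(round(protons_per_atp(), 2))    # 3.33
print(proton_steps_per_stroke())      # [3, 3, 4]
print(proton_concentration_ratio(3))  # 1000
```

Under these assumptions the three 120 degree strokes of one turn receive 3, 3, and 4 proton steps, which is one simple way to picture why successive power strokes cannot all behave identically.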

These papers posit that there is enough flexibility in the linkage to smooth out these ebbs and flows. Within the stator is a critical subunit ("a") which conducts the protons in both directions, both from outside onto the "c" rotor, and then off the "c" rotor and into the inner mitochondrial matrix. Interestingly, the proton rotor of "c" subunits ferries those protons all the way around, so that they come in and go back off at nearly the same point, at the "a" subunit surface. This means that they are otherwise stably bound to the proton rotor as it flies around in the membrane, a hydrophobic environment that presumably offers no encouragement for those protons to leave. So in summary, the protons from outside (the intermembrane space of the mitochondrion) enter by the outer "a" channel, then land on one of the proton rotor's "c" subunits, take one trip around the rotor, and then exit off via the inner "a" channel.

One question is the nature of these channels. There are, elsewhere in biology, channels that find ways to conduct protons in specific fashion, despite their extremely small size and similarity to other cations like sodium and potassium. But a more elegant way has been devised, called the Grotthuss mechanism. The current authors conduct extensive analysis of key mutations in these channels to show that this mechanism is used by the "a" subunit of the Fo protein. By this mechanism, a chain of water molecules is very carefully lined up through the protein. The natural hydrogen exchange property of water, by which the pH character and so many other properties of water occur, then allows an incoming proton to create a chain reaction of protonations and de-protonations along the water chain (nicely illustrated on the Wikipedia page) that, without really moving any of the water molecules, (or requiring much movement of the protons either), effectively conducts a net proton inwards with astonishing efficiency.

It is evident that the interface of the "a" and "c" subunits is such that a force-fed sequence of protons creates power that induces the rotation and eventually through the rotor linkage, the energy to synthesize ATP against its concentration gradient. It should be said parenthetically that this enzyme complex can be driven in reverse, and E. coli do occasionally use up ATP in reverse to re-establish their pmf gradient, which is used for many other processes.

One technical note is of interest. The authors of the main paper used single molecules of the whole ATP synthase, embedded in nano-membranes that they could observe optically and treat with different pH levels on each side to drive their activity. They also attached tiny gold bars (35 × 75 nm) to the top of each proton rotor to track its rotation by polarized light. This allowed very fine observations, which they used to look at the various pauses induced by the jump of each ATP synthesis event, and of each proton as it hopped on/off. Then they mutated selected amino acids in the supposed water channels that conduct protons through the "a" subunit, which created greater delays, diagnostic of the Grotthuss mechanism. The channel is not lined with ions or ionizable groups, but is simply polar to accommodate a string of waters threading through the membrane and the "a" protein. Additionally, they estimate an "antenna" of considerable size composed of a "b" subunit and some of the "a" subunit of Fo that is exposed to the outside and by its negatively charged nature attracts and lines up plenty of protons, ready to transit through the rotor.

Another presentation of the proton rotor behavior. The stator "a" subunit is orange, and the "c" subunits are circles arranged in a rotor, seen from the top. The graph at right shows some of the matches or mismatches between the three-fold ATP synthesizing rotor (F1) and the ten-fold symmetric proton rotor (Fo, or "c"), leading to quite variable coupling of their power strokes. Yet there is enough elastic give in their coupling to allow continuous and reasonably rapid rotation (100 / sec).
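The symmetry mismatch in that caption is easy to put into numbers. The sketch below uses only the figures given above (ten c subunits, a three-fold F1, about 100 rotations per second) and assumes an idealized machine with one proton per c-subunit step and one ATP per 120-degree F1 step; the function name is made up for illustration.

```python
from fractions import Fraction

C_SUBUNITS = 10     # ten-fold symmetric c-ring (from the text)
F1_STEPS = 3        # three-fold F1 (from the text)
REV_PER_SEC = 100   # approximate rotation rate (from the text)

proton_step = Fraction(360, C_SUBUNITS)          # degrees turned per proton
atp_step = Fraction(360, F1_STEPS)               # degrees turned per ATP
protons_per_atp = Fraction(C_SUBUNITS, F1_STEPS)


def protons_in_each_atp_step():
    """Count whole proton steps completed inside each successive F1 step."""
    counts = []
    done = 0
    for k in range(1, F1_STEPS + 1):
        total = (k * atp_step) // proton_step    # whole proton steps so far
        counts.append(total - done)
        done = total
    return counts


if __name__ == "__main__":
    print("degrees per proton step:", proton_step)
    print("degrees per ATP step:", atp_step)
    print("protons per ATP (average):", protons_per_atp)
    print("proton steps within each ATP step:", protons_in_each_atp_step())
    print("ATP per second:", F1_STEPS * REV_PER_SEC)
    print("protons per second:", C_SUBUNITS * REV_PER_SEC)
```

Because 120 is not a multiple of 36, the proton steps cannot line up evenly with the ATP steps, which is where the elastic give in the coupling comes in.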

In the end, incredible technical feats of optics, chemistry, and molecular biology are needed to decipher increasing levels of detail about the incredible feat of evolution that is embodied in this tiny powerhouse.


Saturday, November 5, 2022

LPS: Bacterial Shield and Weapon

Some special properties of the super-antigen lipopolysaccharide.

Bacteria don't have it easy in their tiny Brownian world. While we have evolved large size and cleverness, they have evolved miracles of chemistry and miniaturization. One of the key classifications of bacteria is between Gram positive and negative, which refers to the Gram stain. This chemical stains the peptidoglycan layer of all bacteria, which is their "cell wall", wrapped around the cell membrane and providing structural support against osmotic pressure. For many bacteria, a heavy layer of peptidoglycan is all they have on the outside, and they stain strongly with the Gram stain. But other bacteria, like the paradigmatic E. coli, stain weakly, because they have a thin layer of peptidoglycan, outside of which is another membrane, the outer membrane (OM, whereas the inner membrane is abbreviated IM).

Structure of the core of LPS, not showing the further "poly" saccharide tails that would go upwards, hitched to the red sugars. At bottom are the lipid tails that form a strong membrane barrier. These, plus the blue sugar core, form the lipid-A structure that is highly antigenic.

This outer membrane doesn't do much osmotic regulation or active nutrient trafficking, but it does face the outside world, and for that, Gram-negative bacteria have developed a peculiar membrane component called lipopolysaccharide, or LPS for short. The outer membrane is asymmetric, with normal phospholipids used for the inner leaflet, and LPS used for the outside leaflet. Maintaining such asymmetry is not easy, requiring special "flippases" that know which side is which and continually send the right lipid type to its correct side. LPS is totally different from other membrane lipids, using a two-sugar core to hang six lipid tails (a structure called lipid-A), which is then decorated with chains of additional sugars (the polysaccharide part) going off in the other direction, towards the outside world.

The long, strange trip that LPS takes to its destination. Synthesis starts on the inner leaflet of the inner membrane, at the cytoplasm of the bacterial cell. The lipid-A core is then flipped over to the outer leaflet, where extra sugar units are added, sometimes in great profusion. Then a train of proteins (Lpt-A,B,C,D,E,F,G) extracts the enormous LPS molecule out of the inner membrane, ferries it through the periplasm, through the peptidoglycan layer, and through to the outer leaflet of the outer membrane.

A recent paper provided the structural explanation behind one transporter, LptDE, from Neisseria gonorrhoeae. This is the protein that receives LPS from its synthesis inside the cell, after prior transport through the inner membrane and inter-membrane space (including the peptidoglycan layer), and places LPS on the outer leaflet of the outer membrane. It is an enormous barrel, with a dynamic crack in its side where LPS can squeeze out, to the right location. It is a structure that explains neatly how directionality can be imposed on this transport, which is driven by ATP hydrolysis (by LptB) at the inner membrane, that loads a sequence of transporters sending LPS outward.

Some structures of LptD (teal or red), and LPS (teal, lower) with LptE (yellow), an accessory protein that loads LPS into LptD. This structure is on the outer leaflet of the outer membrane, and releases LPS (bottom center) through its "lateral gate" into the right position to join other LPS molecules on the outer leaflet.

LPS shields Gram-negative bacteria from outside attack, particularly from antibiotics and antimicrobial peptides. These are molecules made by all sorts of organisms, from other bacteria to ourselves. The peptides typically insert themselves into bacterial membranes, assemble into pores, and kill the cell. LPS is resistant to this kind of attack, due to its different structure from normal phospholipids that have only two lipid tails each. Additionally, the large, charged sugar decorations outside fend off large hydrophobic compounds. LPS can be (and is) altered in many additional ways by chemical modifications, changes to the sugar decorations, extra lipid attachments, etc. to fend off newly evolved attacks. Thus LPS is the result of a slow motion arms race, and differs in its detailed composition between different species of bacteria. One way that LPS can be further modified is with extra hydrophobic groups such as lipids, to allow the bacteria to clump together into biofilms. These are increasingly understood as a key mode of pathogenesis that allow bacteria to both physically stick around in very dangerous places (such as catheters), and also form a further protective shield against attack, such as by antibiotics or whatever else their host throws at them.

In any case, the lipid-A core has been staunchly retained through evolution and forms a super-antigen that organisms such as ourselves have evolved to sense at incredibly low levels. We encode a small protein, called LY96 (or MD-2), that binds the lipid-A portion of LPS very specifically at infinitesimal concentrations, complexes with cell surface receptor TLR4, and sets off alarm bells through the immune system. Indeed, this chemical was originally called "endotoxin", because cholera bacteria, even after being killed, caused an enormous and toxic immune response- a response that was later, through painstaking purification and testing, isolated to the lipid-A molecule.

LPS (in red) as it is bound and recognized by human protein MD-2 (LY96) and its complex partner TLR4. TLR4 is one of our key immune system alarm bells, detecting LPS at picomolar levels. 

LPS is the kind of antigen that is usually great to detect with high sensitivity- we don't even notice that our immune system has found, moved in, and resolved all sorts of minor infections. But if bacteria gain a foothold in the body and pump out a lot of this antigen, the result can be overwhelmingly different- cytokine storm, septic shock, and death. Rats and mice, for instance, have a fraction of our sensitivity to LPS, sparing them from systemic breakdown from common exposures brought on by their rather more gritty lifestyles.


  • Econometrics gets some critique.
  • Clickbait is always bad information, but that is the business model.
  • Monster bug wars.
  • Customer-blaming troll due to lose a great deal of money.

Saturday, October 15, 2022

From Geo-Logic to Bio-Logic

Why did ATP become the central energy currency and all-around utility molecule, at the origin of life?

The exploration of the solar system and astronomical objects beyond has been one of the greatest achievements of humanity, and of the US in particular. We should be proud of expanding humanity's knowledge using robotic spacecraft and space-based telescopes that have visited every planet and seen incredibly far out in space, and back in time. But one thing we have not found is life. The Earth is unique, and it is unlikely that we will ever find life elsewhere within traveling distance. While life may conceivably have landed on Earth from elsewhere, it is more probable that it originated here. Early Earth had as conducive conditions as anywhere we know of, to create the life that we see all around us: carbon-based, water-based, precious, organic life.

Figuring out how that happened has been a side-show in the course of molecular biology, whose funding is mostly premised on medical rationales, and of chemistry, whose funding is mostly industrial. But our research enterprise thankfully has a little room for basic research and fundamental questions, of which this is one of the most frustrating and esoteric, if philosophically meaningful. The field has coalesced in recent decades around the idea that oceanic hydrothermal vents provided some of the likeliest conditions for the origin of life, due to the various freebies they offer.

Early earth, as today, had very active geology that generated a stream of reduced hydrogen and other compounds coming out of hydrothermal vents, among other places. There was no free oxygen, and conditions were generally reducing. Oxygen was bound up in rocks, water, and CO2. The geology is so reducing that water itself was and still is routinely reduced on its trip through the mantle by processes such as serpentinization.

The essential problem is how to jump the enormous gap from the logic of geology and chemistry, over to the logic of biology. It is not a question of raw energy- the earth has plenty of energetic processes, from volcanoes and tectonics to incoming solar energy. The question is how a natural process that has resolutely chemical logic, running down the usual chemical and physical gradients from lower to higher entropy, could have generated the kind of replicating and coding molecular system where biological logic starts. A paper from 2007 gives a broad and scrupulous overview of the field, featuring detailed arguments supporting the RNA world as the probable destination (from chemical origins) where biological logic really began. 

To rehearse very briefly, RNA has, and still retains in life today, both coding capacity and catalytic capacity, unifying in one molecule the most essential elements of life. So RNA is thought to have been the first molecule with truly biological ... logic, being replaced later with DNA for some of its more sedentary roles. But there is no way to get to even very short RNA molecules without some kind of metabolic support. There has to be an organic soup of energy and small organic molecules- some kind of pre-biological metabolism- to give this RNA something to do and chemical substituents to replicate itself out of. And that is the role of the hydrothermal vent system, which seems like a supportive environment. For the trick in biology is that not everything is coded explicitly. Brains are not planned out in the DNA down to their crenelations, and membranes are not given size and weight blueprints. Biology relies heavily on natural chemistry and other unbiological physical processes to channel its development and ongoing activity. The coding for all this, which seems so vast with our 3 Gb genome, is actually rather sparse, specifying some processes in exquisite detail, (large proteins, after billions of years of jury-rigging, agglomeration, and optimization), while leaving a tremendous amount still implicit in the natural physical course of events.

A rough sketch of the chemical forces and gradients at a vent. CO2 is reduced into various simple organic compounds at the rock interfaces, through the power of the incoming hydrogen rich (electron-rich) chemicals. Vents like this can persist for thousands of years.

So the origin of life does not have to build the plane from raw aluminum, as it were. It just has to explain how a piece of paper got crumpled in a peculiar way that allowed it to fly, after which evolution could take care of the rest of the optimization and elaboration. Less metaphorically, if a supportive chemical environment could spontaneously (in geo-chemical terms) produce an ongoing stream of reduced organic molecules like ATP and acyl groups and TCA cycle intermediates out of the ambient CO2, water, and other key elements common in rocks, then the leap to life is a lot less daunting. And hydrothermal vents do just that- they conduct a warm and consistent stream of chemically reduced (i.e. extra electrons) and chemical-rich fluid out of the sea floor, while gathering up the ambient CO2 (which was highly concentrated on the early Earth) and making it into a zoo of organic chemicals. They also host the iron and other minerals useful in catalytic conversions, which remain at the heart of key metabolic enzymes to this day. And they also contain bubble-like structures that could have confined and segregated all this activity in pre-cellular forms. In this way, they are thought to be the most probable locations where many of the ingredients of life were being generated for free, making the step over to biological logic much less daunting than was once thought.

The rTCA cycle, portrayed in the reverse from our oxidative version, as a cycle of compounds that spontaneously generate out of simple ingredients, due to their step-wise reduction and energy content values. The fact that the output (top) can be easily cleaved into the inputs provides a "metabolic" cycle that could exist in a reducing geological setting, without life or complicated enzymes.

The TCA cycle, for instance, is absolutely at the core of metabolism, a flow of small molecules that disassemble (or assemble, if run in reverse) small carbon compounds in stepwise fashion, eventually arriving back at the starting constituents, with only outputs (inputs) of hydrogen reduction power, CO2, and ATP. In our cells, we use it to oxidize (metabolize) organic compounds to extract energy. Its various stations also supply the inputs to innumerable other biosynthetic processes. But other organisms, admittedly rare in today's world, use it in the reverse direction to create organic compounds from CO2, where it is called reductive or reverse (rTCA). An article from 2004 discusses how this latter cycle and set of compounds very likely predates any biological coding capacity, and represents an intrinsically natural flow of carbon reduction that would have been seen in a pre-biotic hydrothermal vent setting. 

What sparked my immediate interest in all this was a recent paper that described experiments focused on showing why ATP, of all the other bases and related chemicals, became such a central part of life's metabolism, including as a modern accessory to the TCA cycle. ATP is the major energy currency in cells, giving the extra push to thousands of enzymes, and forming the cores of additional central metabolic cofactors like NAD (nicotinamide adenine dinucleotide), and acetyl-CoA (the A is for adenine), and participating as one of the bases of DNA and RNA in our genetic core processes. 

Of all nucleoside diphosphates, ADP is most easily converted to ATP in the very simple conditions of added acyl phosphate and Fe3+ in water, at ambient temperatures or warmer. Note that the trace for ITP shows the same absorbance before and after the reaction. The others show no reaction either. Panel F shows a time course of the ADP reaction, in hours. The X axis refers to time of chromatography of the sample, not of the reaction.

Why ATP, and not the other bases, or other chemicals? Well, bases appear as early products out of pre-biotic reaction mixtures, so while somewhat complicated, they are a natural part of the milieu. The current work compares how phosphorylation of all the possible di-phosphate bases works (that is, adenosine, cytidine, guanosine, inosine, and uridine diphosphates), using the plausible prebiotic ingredients ferric ion (Fe3+) and acetyl phosphate. They found, surprisingly, that only ADP can be productively converted to ATP in this setting, and it was pretty insensitive to pH, other ions, etc. This was apparently due to the special Fe3+ coordinating capability that ADP has due to its pentose N and neighboring amino group, which allows easy electron transfer to the incoming phosphate group. Iron remains common as an enzymatic cofactor today, and it is obviously highly plausible in this free form as a critical catalyst in a pre-biotic setting. Likewise, acetyl phosphate could hardly be simpler, occurs naturally under prebiotic conditions, and remains an important element of bacterial metabolism (and transiently one of eukaryotic metabolism) today. 

Ferric iron and ATP make a unique mechanistic pairing that enables easy phosphorylation at the third position, making ATP out of ADP and acyl phosphate. At step b, the incoming acyl phosphate is coordinated by the amino group while the iron is coordinated by the pentose nitrogen and two existing phosphates.

The point of this paper was simply to reveal why ATP, of all the possible bases and related chemicals, gained its dominant position of core chemical and currency. It is rare in origin-of-life research to gain a definitive insight like this, amid the masses of speculation and modeling, however plausible. So this is a significant step ahead for the field, while it continues to refine its ideas of how this amazing transition took place. Whether it can demonstrate the spontaneous rTCA cycle in a reasonable experimental setting is perhaps the next significant question.


  • How China extorts celebrities, even Taiwanese celebrities, to toe the line.
  • Stay away from medicare advantage, unless you are very healthy, and will stay that way.
  • What to expect in retirement.

Sunday, August 21, 2022

What Holds up the Nucleus?

Cell nuclei are consistently sized with respect to cell volume, and pleasingly round. How does that happen?

An interesting question in biology is why things are the size they are. Why are cells so small, and what controls their size? Why are the various organelles within them a particular size and shape, and is that controlled in some biologically significant way, or just left to some automatic homeostatic process? An interesting paper came out recently about the size of the nucleus, home of our DNA and all DNA-related transactions like transcription and replication. (Note to reader/pronouncer: "new clee us", not "new cue lus".) 

The nucleus, with parts labeled. Pores are large structures that control traffic in and out. 

The nucleus is surrounded by a double membrane (the nuclear membrane) studded with structurally complex and interesting pores. These pores are totally permeable to small molecules like ions, water, and very small proteins, but restrict the movement of larger proteins and RNAs, and naturally, DNA. To get out, (or in), these molecules need to have special tags, and cooperate with nuclear transport proteins. But very large complexes can be transported in this way, such as just-transcribed RNAs and half-ribosomes that get assembled in the nucleolus, a small sub-compartment within the nucleus (which has no membrane, just a higher concentration of certain molecules, especially the portion of the genomic DNA that encodes ribosomal RNA). So the nuclear pore is restrictive in some ways, but highly permissive in other ways, accommodating transmitted materials of vastly different sizes.

Nuclear pores are basket-shaped structures that are festooned, particularly inside the channel, with disordered phenylalanine/glycine rich protein strands that act as size, tag, and composition-based filters over what gets through.

The channels of nuclear pores have a peculiar composition, containing waving strands of protein with repetitive glycine/phenylalanine composition, plus interspersed charged segments (FG domains). This unstructured material forms a unique phase, neither oily nor watery, that restricts the passage of immiscible molecules (i.e., most larger molecules), unless accompanied by partners that bind specifically to these FG strands, and thus melt right through the barrier. This mechanism explains how one channel can, at the same time, block all sorts of small to medium sized RNAs and proteins, but let through huge ribosomal components and specifically tagged and spliced mRNAs intended for translation.

But getting back to the overall shape and size of the nucleus, a recent paper made the case in some detail that colloid pressure is all that is required. As noted above, all small molecules equilibrate easily across the nuclear membrane, while larger molecules do not. It is these larger molecules that are proposed to provide a special form of osmotic pressure, called colloid osmotic pressure, which gently inflates the nucleus, against the opposing force of the nuclear membrane's surface tension. No special mechanical receptors are needed, or signaling pathways, or stress responses.

The paper, and an important antecedent paper, make some interesting points. First is that DNA takes up very little of the nuclear volume. Despite being a huge molecule (lengthwise), DNA makes up less than 1% of nuclear volume in typical mammalian cells. Ribosomal RNA, partially constructed ribosomal components, tRNAs, and other materials are far more abundant and make up the bulk of large molecules. This means that nuclear size is not very sensitive to genome copy number, or ploidy in polyploid species. Secondly, they mention that a vanishingly small number of mutants have been found that affect nuclear size specifically. This is what one would expect for a simple- even chemical- homeostatic process, not dependent on the usual signaling pathways of cellular stress, growth regulation, etc., of which there are many.

Where does colloid osmotic pressure come from? That is a bit obscure, but this Wiki site gives a decent explanation. When large molecules exist in solution, they exclude smaller molecules from their immediate vicinity, just by taking up space, including a surface zone of exclusion, a bit like national territorial waters. That means that the effective volume available to the small solutes (which generally control osmotic pressure) is slightly reduced. But when two large molecules collide by random diffusion, the points where they touch represent overlapping exclusion zones, which means that globally, the net exclusion zone from large molecules has decreased, giving small solutes slightly more room to move around. And this increased entropy of the smaller solutes drives the colloid osmotic pressure, which rises quite rapidly as the concentration of large molecules increases. The prior paper argues that overall, cells have quite low colloid osmotic pressure, despite their high concentrations of complex large molecules. They are, in chemical terms, dilute. This helps our biochemistry do its thing with unexpectedly rapid diffusion, and is explained by the fact that much of our molecular machinery is bound up in large complexes that reduce the number of independent colloidal particles, even while increasing their individual size.

So much for theory- what about the experiment? The authors used yeast cells (Schizosaccharomyces pombe), which are a common model system. But they have cell walls, which the researchers digested off before treating them with a variety of osmolytes, mostly sorbitol, to alter their osmotic environment (not to mention adding fluorescent markers for the nuclear and plasma membranes, so they could see what was going on). Isotonic concentration was about 0.4 Molar (M) sorbitol, with treatments going up to 4M sorbitol (hypertonic). The question was: is the nucleus (and the cell as a whole) a simple osmometer, reacting as physical chemistry would expect to variations in osmotic pressure from outside? Recall that high concentrations of any chemical outside a cell will draw water out of it, to equalize the overall water / osmotic pressure on both sides of the membrane.
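What a "simple osmometer" predicts can be sketched with the textbook Boyle-van't Hoff relation, in which the water-filled part of a compartment scales inversely with outside concentration while a non-osmotic part (solids) stays fixed. This is a rough illustration, not the authors' analysis: the 25% non-osmotic fraction, the starting volumes, and the function name are assumptions chosen for the example, and only the 0.4 M and 4 M sorbitol figures come from the text.

```python
ISOTONIC_M = 0.4   # isotonic sorbitol concentration, from the text


def osmometer_volume(v_iso, conc_m, nonosmotic_fraction=0.25):
    """Boyle-van't Hoff volume of an ideal osmometer.

    v_iso: volume under isotonic conditions (arbitrary units, hypothetical)
    conc_m: outside sorbitol concentration in M
    nonosmotic_fraction: share of v_iso that cannot shrink (assumed value)
    """
    fixed = nonosmotic_fraction * v_iso
    return fixed + (v_iso - fixed) * ISOTONIC_M / conc_m


if __name__ == "__main__":
    cell_iso, nucleus_iso = 100.0, 8.0   # hypothetical isotonic volumes
    for conc in (0.4, 1.0, 2.0, 4.0):
        cell = osmometer_volume(cell_iso, conc)
        nucleus = osmometer_volume(nucleus_iso, conc)
        print(f"{conc:.1f} M: cell {cell:6.2f}, nucleus {nucleus:5.2f}, "
              f"ratio {nucleus / cell:.3f}")
```

If the cell and the nucleus are both ideal osmometers with the same non-osmotic fraction, the ratio in the last column does not change with concentration, which is the strictly proportional shrinkage described here.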

Schizosaccharomyces pombe are oblong cells (left) with plasma membrane marked with a green fluorescent marker, and the nuclear membrane marked with a purple fluorescent marker. If one removes the chitin-rich cell wall, the cells turn round, and one can experiment on their size response to osmotic pressure/treatment. Hypertonic (high-sorbitol, top) treatment causes the cell to shrink, and causes the nucleus to shrink in strictly proportional fashion, indicating that both have simple composition-based responses to osmotic variation.


They found that not only does the outer cell membrane shrink as the cell comes under hypertonic shock, but the nucleus shrinks proportionately. A number of other experiments followed, all consistent with the same model. One of the more interesting was treatment with leptomycin B (LMB), which is a nuclear export inhibitor. Some materials build up inside the nucleus, and one would expect that, under this simple model of nuclear volume homeostasis, the nuclei would gradually gain size relative to the surrounding cell, breaking the general observation of strict proportionality of nuclear to cell volumes.

Treating Schizosaccharomyces pombe cells with a drug that inhibits nuclear export of certain proteins causes the nuclear volume to blow up a little bit, relative to the rest of the cell.

That is indeed what is seen, not really immediately discernible, but after measuring the volumes from micrographs, evident on the accompanying graph (panel C). So this looks like a solid model of nuclear size control, elegantly explaining a small problem in basic cell biology. While there is plenty of regulation occurring over traffic into and out of the nucleus, which has critical effects on gene expression, translation, replication, division, and other processes, the nucleus can leave its size and shape to simple biophysics and not worry about piling on yet more ornate mechanisms.


  • About implementing the climate bill and related policies.
  • We should have given Ukraine to Russia, apparently. Or something.
  • Big surprise- bees suffer from insecticides.

Saturday, March 26, 2022

A Brief History of DNA Sequencing

Technical revolutions that got us to modern DNA sequencing.

DNA is an incredibly elegant molecule- that much was apparent as soon as its structure came out. It is structurally tough, and its principles of information storage and replication are easy to understand. It is one instance where evolution came up with, not a messy hack, but brilliant simplicity, which remains universal over all the life that we know. While its modeled structure was immediately informative, it didn't help to figure out its most important property- its sequence. Methods to sequence DNA have gone through an interesting evolution of their own. First were rather brutal chemical methods which preferentially cut DNA at certain nucleotides. Combined with the hot new methods of labeling the DNA with radioactive P32, and of separating DNA fragments by size by electrically pushing them (electrophoresing) through a jello-like gel, this could give a few base pairs of information.

A set of Maxam-Gilbert reactions, with the DNA labeled with 32P and exposed to X-ray film after being separated by size by electrophoresis through a gel. Smallest are on the bottom, biggest fragments on the top. Each of the four reactions cleaves at certain bases, as noted at the top. The interpretation of the sequence is on the right. PvuII is a bacterial enzyme that cleaves DNA, and this (palindromic) sequence noted at the bottom is the site where it does so.

Next came the revolution led by Fred Sanger, who harnessed a natural enzyme that polymerizes DNA in order to sequence it. By providing it with a mixture of natural nucleotides and defective ones that terminate the extension process, he could easily develop far bigger assortments of DNAs of various lengths (that is, reads) as well as much higher accuracy of base calling. The chemistry of the Maxam-Gilbert chemical process was quite poor in base discrimination. This polymerase method also eventually used a different isotope to trace the synthesized DNAs, S35, which is less powerful than P32 and gave sharper signals on film, which was how the DNA fragments were visualized after being laid out and ordered by size, by electrophoresis.

The Sanger sequencing method. Note the much longer read length, and cleaner reactions, with fully distinct base specificity. dITP was used in place of dGTP to help clarify G/C-rich regions of sequence, which are hard to read due to polymerase pausing and odd behavior in gel electrophoresis. 

There have been many technological improvements and other revolutions since then, though none have won Nobel prizes. One was the use of fluorescent terminating nucleotides in place of radioactive ones. In addition to improving safety in the lab, this obviated the need to generate four different reactions and run them in separate lanes on the electrophoretic gel. Now, everything could be mixed into one reaction, with four different terminating fluorescent nucleotides in different colors. Plus, the mix of synthesized DNA products could now be run through a short bit of gel held in a machine, and a light meter could see them come off the end, in marching order, all in an automated process. This was a very significant advance in capacity, automatability, and cost savings.

Fluorescent terminating nucleotides facilitate combined reactions and automation.

After that came the silicon chip revolution- the marriage between Silicon Valley and Biotech. Someone discovered that silicon chips made a good substrate to attach DNA, making possible large-scale matrix experiments. For instance, DNA corresponding to each gene from an organism could be placed at individual positions across such a chip, and then experiments run to hybridize those to bulk mRNA expressed from some organ or cell type. The readout would then be fluorescent signals indicating the level of expression of each gene- a huge technical advance in the field. For sequencing, something similar was attempted, laying down all possible 8 or 9-mers across such a chip, hybridizing the sample, thereby trying to figure out all the component sequences of the sample. The sequences were so short, however, that this never worked well. Assembling a complete sequence from such short snippets is nearly impossible.
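The trouble with short snippets can be shown in a few lines. The sketch below counts how many spots a chip of all possible 8-mers or 9-mers would need, and then gives a made-up pair of sequences (invented for this illustration, with k set to 3 to keep it readable) that differ from each other yet contain exactly the same set of k-mers, so no hybridization readout of this kind could tell them apart.

```python
def kmers(seq, k):
    """Return the sorted list of all overlapping k-mers in seq."""
    return sorted(seq[i:i + k] for i in range(len(seq) - k + 1))


# Number of distinct probes needed to cover every possible k-mer.
probes_8 = 4 ** 8
probes_9 = 4 ** 9

# Two hypothetical sequences sharing the short repeats "AC" and "GT".
# The stretches between the repeats are swapped in the second one.
seq_a = "ACTGTCACGGTA"
seq_b = "ACGGTCACTGTA"

if __name__ == "__main__":
    print("all 8-mers:", probes_8)
    print("all 9-mers:", probes_9)
    print("sequences differ:", seq_a != seq_b)
    print("same 3-mer content:", kmers(seq_a, 3) == kmers(seq_b, 3))
```

The same ambiguity appears at any k once a genome contains repeats of length k-1 or more, which real genomes have in abundance.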

What worked better was a variation of this method, where the magic of DNA synthesis was once again harnessed, together with the matrix layout. Millions of positions on a chip or other substrate have short DNA primers attached. The target DNA of interest, such as someone's genome, is chopped up and attached to matching primers, then hybridized to this substrate. Now a few amplification steps are done to copy this DNA a bunch of times, all still attached in place to the substrate. Finally, complementary strands are all melted off and the single DNA strands are put through a laborious step-by-step synthesis process across the whole apparatus, with reagents successively washed through. In the most widely used version, a polymerase adds a single fluorescently labeled, reversibly blocked nucleotide per cycle, so extension pauses after each base. Each step ends with a fluorescent signal that says what the base that just got added was at that position, and a giant camera or scanner reads the plate after each pass, adding +1 to the sequence of each position. The best chemical systems of this kind can go to 150 or even 300 rounds (i.e. base pairs), which, over millions of different DNA fragments from the same source, is enough to then later re-assemble most DNA sequences, using a lot of computer power. This is currently the leading method of bulk DNA sequencing.

A single DNA molecule being sequenced by detecting its progressive transit through a tiny (i.e. nano) pore, with corresponding electrical readout of which base is being wedged through.

Unfortunately, our DNA has lots of repetitive and junky areas which read sizes of even 300 bases cannot do justice to. We have thousands of derelict transposons and retroviruses, for instance, presenting impossible conundrums to programs trying to assemble a complete genome, say, out of ~200 bp pieces. This limitation of mass-sequencing technologies has led to a niche market for long-read DNA sequencing methods, the most interesting of which is nanopore sequencing. It is almost incredible that this works, but it is capable of reading the sequence of a single molecule of single stranded DNA at a rate of 500 bases per second, for reads going to millions of bases. This is done by threading the single strand through a biological (or artificial) pore just big enough to accommodate it, situated in an artificial membrane. With an electrical field set across the membrane, there are subtle fluctuations detectable as each base slips through, which are different for each of the four bases. Such is the sensitivity of modern electronics that this can be picked up reliably enough to read the single thread of DNA going through the pore, making possible hand-held devices that can perform such sequencing at reasonable cost.

All this is predicated on DNA being an extremely tough molecule, able to carry our inheritance over the decades, withstand rough chemical handling, and get stuffed through narrow passages, while keeping its composure. We thought we were done when we sequenced the human genome, but the uses of DNA sequencing keep ramifying, from forensics to diagnostics of every tumor and tissue biopsy, to wastewater surveillance of the pandemic, and on to liquid biopsies that promise to read our health and our future from a drop of blood.