Saturday, July 31, 2021

RAD51 and the DNA Hokey-Pokey

DNA repair and recombination rely on homology search between separate DNA molecules, one of which is double-stranded. How is that done?

When mutated, BRCA2 is one of the more significant cancer-causing genes. It is a huge protein of 3,418 amino acids, with lots of interactions and functions that are not, even at this late date, very well understood. Like many eukaryotic proteins, it does a lot of facilitation and organization of other proteins, roles which have clearly snowballed over evolutionary time. But its core function seems to be to bind right at the site of DNA breaks and load the recombination protein RAD51 onto the ragged single stranded end. RAD51 then coats the remaining single stranded DNA and does the important work of helping it find matching DNA elsewhere in the nucleus, which can then be copied to properly repair the break.

It is clear that DNA repair is a critical and highly regulated process, hence the continuing elaboration of proteins like BRCA2 that have managerial roles. But RAD51 has the more fascinating structural role to play. How does it accomplish a job that seems impossible: searching efficiently through a whole genome of 3 billion base pairs, crammed into a crowded and jostling nucleus and wound in double-stranded form onto nucleosomes and other chromosomal proteins, to find the exact partner with which to pair and perform the dance of filling in the missing bit of DNA?

RAD51 is, unlike BRCA2, highly conserved from bacteria to humans. Because it was found by different genetic methods, it is named RecA in bacteria (after a screen specifically for recombination mutants) but RAD51 in eukaryotes, after a screen in yeast cells for all sorts of mutants sensitive to high-energy radiation. Work over the last couple of decades has clarified the structure of RecA/RAD51, and thus how it functions.

Schematic of a DNA break, after processing, searching and finding a homolog to complete the repair. Not covered in this post: the two ends need to be held in a coordinated way to facilitate repair across the break, even while the single stranded ends engage in a nucleus-wide homology search.

RAD51/RecA coating DNA, in scanning electron microscopy. Note how linear and stiff it is. Comparison is with similar DNA coated with another protein, single-strand binding protein, which imposes much less structure.

As mentioned above, RAD51 coats the single stranded end left after a DNA break has been detected and processed (cleaned up) by the initial enzymes, and after BRCA2 binds to the recessed junction where the single strand starts. RAD51 forms a stiff and bulky filament, holding the DNA in a stretched conformation that is a thousand times stiffer than single stranded DNA and 20 times stiffer than double stranded DNA. Interestingly, the single stranded DNA is held deep within the RAD51 filament, quite hard to see from the outside. Only the bases peep out, in triplet sets, amongst the protein structure that holds it so tightly. RAD51 is an ATPase, using the energy of ATP to assemble the filament and also to disassemble it, but not to power the search itself.
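The two stiffness ratios can be cross-checked with simple arithmetic. This sketch assumes, beyond what is stated above, that "stiffness" means persistence length and that double stranded DNA has the textbook persistence length of about 50 nm; the constant names are hypothetical.

```python
# Hypothetical consistency check of the stiffness ratios quoted in the text.
# Assumption (not from the post): "stiffness" is read as persistence length,
# and dsDNA is given the textbook persistence length of about 50 nm.
DSDNA_PERSISTENCE_NM = 50.0
FILAMENT_VS_DSDNA = 20     # filament ~20x stiffer than dsDNA (from the text)
FILAMENT_VS_SSDNA = 1000   # filament ~1000x stiffer than ssDNA (from the text)

# Persistence length of the RAD51 filament implied by the 20x ratio.
filament_nm = FILAMENT_VS_DSDNA * DSDNA_PERSISTENCE_NM
# Persistence length of bare ssDNA implied by the 1000x ratio.
implied_ssdna_nm = filament_nm / FILAMENT_VS_SSDNA

print(f"filament persistence length ~ {filament_nm:.0f} nm")
print(f"implied ssDNA persistence length ~ {implied_ssdna_nm:.1f} nm")
```

The filament comes out at roughly a micrometre, and the implied single stranded value at about a nanometre, which is the order of magnitude usually quoted for flexible single stranded DNA, so the two ratios hang together.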

Structure of a RAD51/RecA filament: macro view above, close-up below. The single stranded DNA whose homolog is being sought is in orange, tucked deep within the protein filament. In the close-up, a slight opening of the incoming double stranded DNA (blue) allows its bases to sample a little bit of the target. The pinkish blobs are positively charged lysines and arginines, ready to mate with the negatively charged backbone of the incoming DNA.

So much for the single strand doing the homology search. What about the double stranded DNA being searched against? The RAD51 filament makes provision for that as well, binding it lightly (in the proper directional orientation) and using local splaying interactions that encourage its strands to separate slightly, binding the non-searching single strand and allowing the searching strand to pair with the triplets peeking out from the core RAD51 filament. At this atomic scale there is a lot of Brownian motion and jostling, and DNA naturally "breathes" a bit, so this is not hard to do rapidly. But RAD51 evidently optimizes the process.

Another structural view of the core sampling interaction, emphasizing the DNA strands. In brown is the target single strand DNA. In green is the slightly opened strand from the incoming double stranded DNA, sampling one target triplet (with its single strand complement, in red, held off a little to the side). Note how the target DNA is held in very stretched form, with triplets of bases separated by slight gaps occupied by RAD51 protein residues.

The binding of the invading double strand DNA is then heavily dependent on how well it pairs with the single strand triplets. Pairing with three exposed bases is not a big deal. But pairing with eight consecutive bases stabilizes the match, and pairing with 26 or more seals the deal, producing a long-lived match that can induce de-polymerization of RAD51 and the arrival of repair polymerases. It is clear that RAD51 coordinates a complex dance of on-off sampling of nearby double stranded DNAs, including non-specific capture of local DNA, detailed sampling by encouraging strand opening, and linear back-and-forth shifting that allows some linear scanning as well. These diffusion mechanisms somehow add up to a thorough search of the nucleus for the right partner.
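A toy sketch of the length thresholds just described: it scores a candidate donor only by its longest run of consecutive matched bases, using the 8 and 26 nucleotide cutoffs from the text. It deliberately ignores triplet stepping, mismatch tolerance and sampling kinetics; the function names, the pre-aligned sequences and the example strands are all hypothetical.

```python
# Toy model of length-dependent pairing stability (illustration only).
# Assumptions (hypothetical): the sequences are already aligned position by
# position, and a position "pairs" when the donor's same-sense strand carries
# the same base as the RAD51-bound strand, meaning the donor's other strand is
# complementary to it there. Thresholds are the ones quoted in the text.

STABLE_RUN = 8      # consecutive matched bases that stabilize a match
COMMITTED_RUN = 26  # consecutive matched bases that seal a long-lived match


def longest_matching_run(bound: str, donor: str) -> int:
    """Length of the longest stretch of consecutive identical positions."""
    best = current = 0
    for a, b in zip(bound.upper(), donor.upper()):
        current = current + 1 if a == b else 0
        best = max(best, current)
    return best


def pairing_state(bound: str, donor: str) -> str:
    """Classify a candidate donor by its longest uninterrupted match."""
    run = longest_matching_run(bound, donor)
    if run >= COMMITTED_RUN:
        return "committed"
    if run >= STABLE_RUN:
        return "stabilized"
    return "transient"


if __name__ == "__main__":
    bound = "ATGCGTACGTTAGCCATGGATCCGATCGATTGCA"   # hypothetical 34-nt strand
    one_mismatch = bound[:10] + "A" + bound[11:]  # T -> A at position 10
    # Every fourth position replaced by N, which never matches a real base.
    riddled = "".join("N" if i % 4 == 3 else b for i, b in enumerate(bound))
    for name, donor in [("perfect", bound),
                        ("one mismatch", one_mismatch),
                        ("every 4th mismatched", riddled)]:
        print(name, longest_matching_run(bound, donor),
              pairing_state(bound, donor))
```

Run as a script, this reports the perfect donor as committed (run of 34), the single-mismatch donor as stabilized (run of 23), and the riddled donor as transient (run of 3). A single interior mismatch is enough to keep an otherwise perfect 34-mer below the commitment cutoff in this simplified scoring.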

In bacteria, with genomes of a few million base pairs, sequences of 15 nucleotides are usually unique. In a genome of three billion bases, longer sequences are needed to be sure of true homology, nuclear volume is much larger, and there is more complex chromatin to deal with. Yet the homology search is not much slower, taking about an hour. Why this is remains unclear. In eukaryotes, homologous chromosomes may typically reside close to each other in a semi-stable nuclear architecture. Or, paradoxically, other aspects of the chromatin milieu may facilitate the search. And how damaging is an incorrect match? If a closely related sequence is chosen (and such sequences are common in eukaryotes, due to replication errors, recombination errors, gene amplification and duplication, and repetitive sequences of many other kinds), it may not matter at all, depending on the size of the span being copied from the intact homolog. Tract lengths repaired by copying from the other homolog are typically between 50 and 800 nucleotides long.
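The uniqueness argument can be made concrete with a back-of-the-envelope estimate, assuming (unrealistically) a random genome with equal base frequencies: a specific k-mer is then expected to occur by chance about 2G/4^k times in a genome of G base pairs, counting both strands. The helper names, the round 5 Mb bacterial size and the 1% cutoff are illustrative choices; real genomes, full of repeats, fare worse.

```python
# Back-of-the-envelope k-mer uniqueness estimate (illustration only).
# Assumption: a random genome with equal base frequencies, so any given
# position matches a specific k-mer with probability 4**-k. Both strands are
# searched, giving roughly 2 * G candidate positions.


def expected_chance_hits(k: int, genome_bp: float) -> float:
    """Expected chance occurrences of one specific k-mer, both strands."""
    return 2 * genome_bp / 4 ** k


def min_unique_length(genome_bp: float, max_hits: float = 0.01) -> int:
    """Smallest k whose expected chance occurrences fall below max_hits."""
    k = 1
    while expected_chance_hits(k, genome_bp) >= max_hits:
        k += 1
    return k


if __name__ == "__main__":
    # Hypothetical round genome sizes: "a few million" vs "three billion".
    for label, genome_bp in [("bacterium, ~5 Mb", 5e6), ("human, ~3 Gb", 3e9)]:
        print(label,
              "min unique k:", min_unique_length(genome_bp),
              "chance hits for a 15-mer:",
              round(expected_chance_hits(15, genome_bp), 3))
```

With these assumptions the 1% cutoff is reached at 15 nucleotides for a 5 Mb genome and at 20 for a 3 Gb genome, where a random 15-mer would be expected to turn up five or six times by chance. Because the expected count shrinks fourfold per added base, a thousandfold larger genome needs only about five more bases of match.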

An even more focused view of the evolving match between a RAD51-bound single strand (red) and an incoming DNA from a double-stranded sequence match (blue).

