We all want to crosslink proteins and do mass spec on the crosslinked peptides, right? We have a protein of interest and we want to know what other proteins are interacting with it. So the strategy is to throw in one of the 20 or so crosslinkers out there, pull down our protein of interest with an antibody or something, and then do MS/MS on everything OR specifically study the crosslinked peptides.
Problems? When we pull down one protein, we pull down tons of proteins. Part of the reason is that antibodies really aren't 100% specific, particularly given the incredible number of protein isoforms present in biological systems. Another part is that no protein system is composed of just a few proteins. Billions of years of evolution have produced an unbelievable level of intricacy: hundreds, if not thousands, of proteins working together simultaneously to achieve even the simplest tasks in the most energetically favorable manner. That isn't done by textbook pathway drawings of 6 proteins. Not when throwing in 100 more could cut the energy requirements of that reaction by 30%.
Another problem? False discovery rates for small protein complexes (or, heck, even big ones) suck. FDR estimation works best on large datasets. On small ones it just doesn't behave.
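To see why small datasets hurt, here is a minimal sketch of the simplest target-decoy FDR estimate: count decoy hits above a score cutoff and divide by target hits above the same cutoff. This is a generic textbook form, not necessarily what any particular search engine or the paper uses, and the scores are made up.

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR at a score cutoff as (#decoys >= cutoff) / (#targets >= cutoff)."""
    n_targets = sum(1 for s in target_scores if s >= threshold)
    n_decoys = sum(1 for s in decoy_scores if s >= threshold)
    if n_targets == 0:
        return 0.0  # nothing accepted at this cutoff, so nothing can be false
    return n_decoys / n_targets


# Hypothetical scores for a tiny pull-down: only a handful of hits.
targets = [12.0, 10.5, 9.8, 8.1, 7.7]
decoys = [8.0, 6.2]

for cutoff in (7.0, 8.0, 9.0):
    print(cutoff, target_decoy_fdr(targets, decoys, cutoff))
```

With only four targets above the 8.0 cutoff, a single decoy is worth 25 percentage points of FDR. The estimate can only move in steps of 1/n, so with a small n it jumps around wildly.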
A worse problem? Crosslinked peptides give horrendous FDR calculations. Awful. 'Cause you have to search with so many dynamic modifications per peptide sequence, the search space explodes. Add that to your small sample size and you're often looking at a random number generator.
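A back-of-the-envelope sketch of that explosion: if each of a peptide's modifiable residues can carry one of several dynamic modifications (or none), the number of modified forms grows combinatorially, and pairing peptides for a crosslink search roughly squares it. The residue and modification counts below are hypothetical, and real search engines cap mods per peptide in their own ways.

```python
from math import comb


def variant_count(n_sites, n_mod_types, max_mods):
    """Modified forms of one peptide: each of n_sites residues may carry one of
    n_mod_types dynamic mods (or none), with at most max_mods mods in total."""
    return sum(
        comb(n_sites, j) * n_mod_types ** j
        for j in range(min(n_sites, max_mods) + 1)
    )


def crosslink_pair_count(n_forms):
    """Unordered peptide pairs, counting a form linked to a copy of itself."""
    return n_forms * (n_forms + 1) // 2


# Hypothetical peptide with 4 modifiable residues.
for mod_types in (1, 3):
    forms = variant_count(4, mod_types, max_mods=4)
    print(mod_types, forms, crosslink_pair_count(forms))
```

Going from one to three modification types takes a single peptide from 16 forms to 256, and the pairwise candidates from 136 to 32,896. Every extra candidate is another chance for a random match to score well.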
This paper is badass, btw. You know what they did? They looked at crosslinking in a biologically relevant context. No kidding! They used the protein crystal structure to tell which residues are actually close enough to be crosslinked and threw that into the FDR!!!! 'Cause we have that data out there for most proteins (okay, not most, but for most important proteins).
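The core intuition can be sketched as a plain distance check. This is NOT the paper's algorithm, just a simpler filter in the same spirit. Assumptions: C-alpha coordinates have already been parsed from a structure file into a dictionary, the residue IDs are hypothetical, and the 30 Å cutoff is only a rough figure for lysine-reactive linkers like DSS/BS3 that you would adjust for your own crosslinker.

```python
import math


def split_by_distance(links, coords, max_dist=30.0):
    """Sort crosslinks into structure-consistent and distance violators.

    links:  iterable of (residue_id, residue_id) pairs from the search
    coords: residue_id -> (x, y, z) C-alpha coordinates in angstroms
    """
    consistent, violating, unmapped = [], [], []
    for r1, r2 in links:
        if r1 not in coords or r2 not in coords:
            unmapped.append((r1, r2))  # residue not resolved in the structure
            continue
        if math.dist(coords[r1], coords[r2]) <= max_dist:
            consistent.append((r1, r2))
        else:
            violating.append((r1, r2))
    return consistent, violating, unmapped


# Hypothetical residues and coordinates, purely for illustration.
coords = {"K12": (0.0, 0.0, 0.0), "K45": (10.0, 0.0, 0.0), "K88": (40.0, 0.0, 0.0)}
links = [("K12", "K45"), ("K12", "K88"), ("K45", "K99")]

ok, bad, unknown = split_by_distance(links, coords)
print(ok, bad, unknown)
print(len(bad) / (len(ok) + len(bad)))  # fraction of mappable links that are too long
```

If a big chunk of the crosslinks you accepted at "1% FDR" are physically too long for the linker to span, that is a strong hint the score-based FDR is optimistic.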
Okay, so there is a disconnect here, maybe. Yes. We have crystal structures for individual proteins. Lots and lots of them. And this process works for those (they prove it in this paper using RNA Polymerase II as an example). But what I'm more interested in is complexes, and we don't have nearly the same amount of structural data for those. So I guess I'm stretching the real power of this paper a little, but what a step forward! I can imagine extending this algorithm to eliminate binding sites we know are already in use, or residues buried deep in the internal structure of the protein. But, holy cow, this paper is really, really smart...
Read it (currently open access) here!