Monday, June 13, 2016
Correcting protein interaction databases by separating proteins into >5k fractions!
Yeast two-hybrid (Y2H) assays are a classic molecular biology technique. They were a huge leap forward in our ability to determine protein-protein interactions. They got even better once they could be cranked up with automation -- tons of robot arms whirling about and generating huge knowledge bases of which proteins interact with which others.
So....what if those knowledge bases turned out to be somewhat less than perfect? That would be exciting, right? What if other databases of protein-protein interactions also turned out to have a high false discovery rate (FDR)? It would at least be pretty controversial. So maybe you should cover your bases by doing some serious benchmarking. Maybe running more than 5,000 fractions(?!!) before you submit that one to MCP....
And that is what happens here in this new paper from Maxim Shatsky et al. These authors use a seriously convoluted methodology involving protein level fractionation and iTRAQ labeling to determine interacting protein partners. The paper is open access so I can borrow this image, I think.
They start by growing 400 liters(!!) of their opportunistic pathogen (D. vulgaris), which I imagine doesn't get chalked up as anyone's very favorite semester, so they can start with 10 grams or so of protein. Then they do the ammonium sulfate fractionation. Each fraction then goes into ion exchange (this is all still at the protein level). They shuffle up the fractions to make overlap less likely, run hydrophobic interaction chromatography (HIC), and then repeat that step with size exclusion chromatography (SEC).
The goal? End up with fractionation so complete that they should only see proteins hanging out together IF they are part of the same complex. At this point...I'm...skeptical....but curious enough that I keep reading.
Thank goodness, here they don't actually run 5,000 fractions. They digest the protein fractions and iTRAQ label them with either the 4plex or the 8plex reagent. The method section is very confusing here...unless...and I'm not ruling this out...they actually started this study before the iTRAQ 8plex reagent was released. I first used that in 2009 or 2010, I think. This might explain some other things here.
Next, iTRAQ labeled fractions are combined by mixing fractions from further down the fractionation scale to further minimize overlap. Then the iTRAQ labeled mixtures are (oh no!!) peptide fractionated and MALDI spotted. Seriously. I'm writing about this study because I think it's good. But I'm envisioning one of these authors completing his/her Ph.D. on this project after 13 years or so of working on it. Which is fine. If you're gonna spend 13 years in grad school, there are worse places to do it than Berkeley.
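To make the "mix fractions from further down the scale" idea concrete, here is a minimal sketch of one way to do it. This is my own illustration, not the paper's actual pooling scheme: the function name `pool_distant_fractions` and the fixed-stride rule are assumptions. The point is only that each iTRAQ channel in a pool comes from a fraction far away from the others, so the same protein is unlikely to show up in two channels of one mixture.

```python
from math import ceil

def pool_distant_fractions(n_fractions, plex):
    """Group fraction indices into pools of at most `plex` members,
    spacing the members of each pool `stride` fractions apart.
    Hypothetical scheme for illustration only."""
    stride = ceil(n_fractions / plex)
    # Pool i takes fractions i, i + stride, i + 2*stride, ...
    return [list(range(start, n_fractions, stride)) for start in range(stride)]

# Eight fractions labeled with a 4plex reagent: two pools, neighbors never share a pool.
pools = pool_distant_fractions(8, 4)
for pool in pools:
    print(pool)
```

Neighboring fractions (where a protein's elution peak is most likely to smear over) always end up in different pools under this rule.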
Joking aside, this is where it gets more confusing. So now you have 1,400 or so of these combined iTRAQ fractions, and you get quantification values for all of these peptides. With this amount of fractionation they are able to get about 1,400 proteins with the MALDI-TOF, which is about half the bacterium's proteome. Now they have to recombine the data to figure out which proteins are coeluting (and therefore complexing) with which others.
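If the coelution idea is fuzzy, a toy version looks something like the sketch below. Big caveat: this is not the authors' pipeline. The protein names, the intensity values, and the 0.9 cutoff are all made up, and plain Pearson correlation is just the simplest stand-in for "these two elution profiles look alike".

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length intensity profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no coelution information
    return cov / sqrt(var_x * var_y)

def coeluting_pairs(profiles, threshold=0.9):
    """Return (protein_a, protein_b, r) for pairs whose profiles correlate
    at or above `threshold`. The threshold is an arbitrary assumption."""
    hits = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r >= threshold:
            hits.append((a, b, r))
    return hits

# Made-up relative abundances across six fractions.
profiles = {
    "protA": [0, 1, 8, 9, 1, 0],
    "protB": [0, 2, 7, 9, 2, 0],
    "protC": [9, 7, 1, 0, 0, 0],
}
pairs = coeluting_pairs(profiles)
for a, b, r in pairs:
    print(a, b, round(r, 3))
```

Here protA and protB peak in the same fractions and get flagged as a candidate pair, while protC elutes early and pairs with neither. With only one separation dimension plenty of unrelated proteins would correlate by chance, which is presumably why the real study stacks so many fractionation steps.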
I mentioned some skepticism above. So they look for their model complexes. The ones that Y2H has seen (as well as other techniques). And they are there. With each other. This crazy thing works. Wow. They do mention that they previously did this on a much smaller scale. But there definitely had to be some relief after all of this work to see proteins showing up together that you know should be together.
The method section then has a lot of words I don't know, but this is probably where you figure out how 17 trillion MALDI-spot files work together. This is where they need to build their pipeline to work out these interactions and build up their database of what interacts with what in this bacterium. For an added complication they get some historic data and run it through their pipeline, and it turns out that some of this published stuff has a HUGE FDR, and some of these established databases built from Y2H and from other MS techniques have FDRs higher than we'd hope to see. In one dataset they suggest an FDR as high as 85%....yikes.
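For anyone who wants the FDR idea spelled out, here is a bare-bones sketch of scoring a list of reported interactions against a reference set. It is an assumption-heavy toy, not the statistical model in the paper: the helper names `pair` and `estimate_fdr`, the protein labels, and both reference sets are invented for illustration.

```python
def pair(a, b):
    """Order-independent key for an interaction between two proteins."""
    return frozenset((a, b))

def estimate_fdr(reported, known_true, known_false):
    """FDR = false positives / (false positives + true positives),
    counting only reported pairs that the reference sets can score."""
    true_pos = len(reported & known_true)
    false_pos = len(reported & known_false)
    scored = true_pos + false_pos
    return false_pos / scored if scored else None

# Toy data: five reported interactions, four of which the reference can judge.
reported = {pair("A", "B"), pair("A", "C"), pair("B", "D"),
            pair("C", "D"), pair("E", "F")}
known_true = {pair("A", "B"), pair("G", "H")}
known_false = {pair("A", "C"), pair("B", "D"), pair("C", "D")}

fdr = estimate_fdr(reported, known_true, known_false)
print(fdr)
```

The catch, of course, is that the estimate is only as good as the reference sets, which is exactly why a carefully benchmarked dataset like this one is valuable.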
Seriously. I do like this study. And they needed to cover their bases here. Especially with the MS technology they had to work with, the upfront workflow was going to need to be huge. And you can't start something this ambitious and then stop or change the methodology in the middle.
And then this guy gets to graduate! (Last joke. I promise)
BIG and ambitious study. Glad it was done and almost as glad that I didn't have to do it.
The big takeaway here: if your MS/MS data suggests that you've got some interacting proteins but the databases say it isn't likely, or isn't true, it might not be the mass spectrometer that is wrong. And hopefully, eventually we'll be able to replace some of these existing knowledge bases with something better and more accurate.