You know what I love? When people start applying nice statistics to proteomics data. A lot of these datasets are getting far too large for us to say "x is twice y". But we all have a lot on our plates. We can't just take a bunch of stats classes (believe me, I'm trying and I've already had to drop one that I paid for this summer...) in order to get caught up. We need good, trustworthy, time-tested stats built into our processing schemes.
Why not go for simple p-values?
Because, obviously, it isn't that simple, dummy!
HAHA! But it turns out that it is!
JJ Howbert and Bill Noble think it is and they have some really good evidence. Check out this paper (it appears to be open access) in press at MCP.
In this study, they went to the original Xcorr values assigned by Sequest and looked at the total score distribution across all the peptide-spectrum matches (PSMs). At this level, they were able to determine, for each match, the probability of seeing an Xcorr at least that high purely by chance (that is, if the match were wrong), 'cause that's what p-values do.
When they went back and ranked their peptides by p-value, rather than Xcorr, they found they had a much more accurate measurement of PSM validity than merely saying "anything above an Xcorr of 2.0 is trustworthy" (which is what most of us have been doing all along, be honest, and we've all secretly known it was silly. It's like saying a TMT fold change of 1.25 is significant. It's just us being lazy...)
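To make the "rank by p-value instead of Xcorr" idea concrete, here is a minimal Python sketch. Big caveat: this is NOT the calculation from the paper. It uses a plain empirical p-value (the fraction of null scores at least as high as the observed one) as a stand-in, and the function names, the `null_xcorrs` field, and all the numbers are made up for illustration.

```python
from bisect import bisect_left


def empirical_p_value(score, null_scores):
    """Fraction of null scores >= score, with a +1 pseudocount
    so the estimate is never exactly zero.

    Assumption: null_scores are scores of wrong candidate peptides
    against the same spectrum (hypothetical input for this sketch).
    """
    ordered = sorted(null_scores)
    # Count how many null scores are at least as large as the observed score.
    n_at_least = len(ordered) - bisect_left(ordered, score)
    return (n_at_least + 1) / (len(ordered) + 1)


def rank_psms_by_p_value(psms):
    """Return PSMs sorted from most to least convincing (smallest p first)."""
    scored = [
        {
            "id": p["id"],
            "xcorr": p["xcorr"],
            "p_value": empirical_p_value(p["xcorr"], p["null_xcorrs"]),
        }
        for p in psms
    ]
    return sorted(scored, key=lambda p: p["p_value"])


# Toy data (invented): spectrum A scores high against everything,
# spectrum B scores low against everything except its top match.
psms = [
    {"id": "A", "xcorr": 2.5,
     "null_xcorrs": [2.0, 2.1, 2.2, 2.3, 2.4, 2.45, 2.55, 2.6, 2.7]},
    {"id": "B", "xcorr": 1.8,
     "null_xcorrs": [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]},
]

ranked = rank_psms_by_p_value(psms)
for p in ranked:
    print(f"{p['id']}  xcorr={p['xcorr']}  p={p['p_value']:.3f}")
```

In this toy example a flat "Xcorr above 2.0" cutoff would keep A and throw out B, but B comes out ahead once each score is judged against its own null distribution. That is the whole point of calibrating the score per spectrum.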
Awesome, right? As proof of principle, they ran the same data set through a bunch of different search engines and, predictably, the p-value approach worked better than the other engines tested.
What about Percolator?!?!
This is where I don't know quite enough Greek letters... or at least when you're adding and dividing them it does a funny thing to my brain. What I know this morning? They were able to work this pipeline into Percolator, and I fell asleep. They come from the same place. Of course it works with Percolator!