Saturday, October 26, 2013
How does Percolator affect MSAmanda search results?
Here is a paraphrase of the question:
MSAmanda gives you more peptides than Sequest using a target decoy search on high resolution MS/MS data, but how does it fare when we use Percolator? I'm actually going to extend that question one step further: if both MSAmanda and Percolator give you more peptides than Sequest + target decoy alone, are these the same peptides? This analysis will come later.
Dataset: A 120 minute HeLa digest run on an Orbitrap Elite using a 25 cm EasySpray column and operating in standard high-high mode employing a Top15 methodology. So, an extremely complex high-high dataset with nice chromatography.
1) Sequest + target decoy
2) Sequest + Percolator
3) MSAmanda + target decoy
4) MSAmanda + Percolator
Conditions for processing were as similar and simple as possible: Uniprot/Swissprot database parsed on "Sapiens" alone, iodoacetamide as a static mod and M oxidation as a dynamic mod. FDR of 0.01 was the "strict" cutoff, and those are the only peptides I looked at. Mass tolerance was 10 ppm at the MS1 level and 0.02 Da at the MS/MS level.
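For anyone who hasn't seen how a target decoy cutoff like that 0.01 is usually applied, here is a minimal Python sketch. It uses the common decoys/targets ratio as the FDR estimate, which is an assumption on my part and not necessarily what the search software computes internally. The function name and the (score, is_decoy) input format are made up for illustration, and it assumes a higher score means a better match.

```python
def passing_at_fdr(psms, max_fdr=0.01):
    """Return the target PSMs that pass a target-decoy FDR cutoff.

    psms: iterable of (score, is_decoy) pairs; higher score = better match
    (hypothetical input format, not any particular software's export).
    """
    # Rank every PSM, targets and decoys together, from best to worst.
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    accepted = 0  # how many ranked PSMs fall inside the cutoff
    for i, (_score, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR at this depth = decoys / targets seen so far.
        # Remember the deepest position where the estimate still passes.
        if targets and decoys / targets <= max_fdr:
            accepted = i
    # Report targets only; the decoys were just there to keep score.
    return [p for p in ranked[:accepted] if not p[1]]
```

Calling it with max_fdr=0.01 and again with max_fdr=0.05 on the same list gives you the two populations to compare. Ties in score are ignored here to keep the sketch short.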
The following data is all at the Unique protein group level:
First, Sequest + target decoy vs. Sequest + Percolator:
As expected, Percolator ends up giving us more total unique protein groups. No surprise there.
Question #1 then, does Percolator + MSAmanda do the same thing?
Yup! Okay, I totally dig question #1. And I wonder why the heck I added more work to it, because if I hadn't, we'd be looking at an open-and-shut case. MSAmanda is definitely Percolator compatible and, at the end of the day, we are looking at more protein groups from MSAmanda than we get from Sequest, whether we use target decoy OR Percolator.
My conscience is saying that I need to check to see if these new peptides are any good (ugh...). This is my opinion on false discovery rate calculations (feel free to look at my other discussions on this site): they're a shortcut. Inherently, I do not trust them, and neither should you. They are a mechanism to help you, but manual verification is ALWAYS a good idea. Unfortunately, looking at tens of thousands of MS/MS spectra is a poor use of time.
My strategy: manually look at a sample of the worst scoring peptides at your 1% FDR cutoff and at your 5% FDR (please excuse my shorthand, you know what I mean, or you probably wouldn't be reading this unless you were really odd.) If you have crappy peptides at your 1% FDR, it isn't strict enough. If you have great looking peptides all over the place at your 5% FDR, you are too stringent. Adjust your cutoffs accordingly and re-evaluate.
This is how I do it on a Saturday afternoon on a dataset I'm not getting paid to analyze.
1) Go to the peptide tab for each analysis
2) Arrange the peptides in order of respective peptide score from worst to best
3) Double click on peptides 1, 5, 10, 15 and 20 to reveal the XICs with the overlaid fragment matches.
4) Rapidly score them by this point system: 10 points if you would publish that peptide spectral match as it is, 5 points if you think it is okay, and -5 points if it is some junk. Yes, I made this up, geez! But it works. Remind me and I'll show more evidence at some point.
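If you'd rather script the bookkeeping than do it by hand, the sampling and point system above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout, the function names and the grade labels ("publish", "okay", "junk") are all made up, and it assumes a higher score means a better match. The grading itself still has to be done by your eyes on the spectra.

```python
# Point values from the scoring scheme above; the labels are hypothetical.
POINTS = {"publish": 10, "okay": 5, "junk": -5}


def sample_worst(psms, ranks=(1, 5, 10, 15, 20)):
    """Sort PSMs from worst to best score and pick the given 1-based ranks.

    psms: list of dicts with at least a "score" key (assumed layout).
    Ranks beyond the end of the list are skipped.
    """
    ordered = sorted(psms, key=lambda p: p["score"])
    return [ordered[r - 1] for r in ranks if r <= len(ordered)]


def total_points(grades):
    """Add up the points for a list of manual grades."""
    return sum(POINTS[g] for g in grades)
```

With five spectra sampled, the best possible total is 50 points (five "publish" grades) and the worst is -25.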
Here is how they did:
Sequest + Target decoy: 45 points (one mediocre peptide match)
Sequest + Percolator: 5 points, several bad matches
MSAmanda + Target decoy: 35 points (3 mediocre ones; not bad, but I wouldn't publish them alone)
MSAmanda + Percolator: 50 points. In this small sample set, I would trust every one of these PSMs. Ummm...not exactly what I was expecting...but I'm cautiously excited about it!
Okay, this is getting out of hand. Now my conscience says: Is this due to the sample size? I looked at the next 20 spectra, spaced every 5 and I don't think so. You'll have to take my word on it. But at this default cutoff, there is NO doubt in my mind that the peptides scored by MSAmanda in conjunction with Percolator are significantly better than the peptides scored by Sequest + Percolator and are on par with, or are better(!?!?!), than the peptides scored by the much more conservative target decoy search.
In this sample set, the WORST PSM scored by MSAmanda + Percolator and passing default cutoffs:
By comparison, here is the lowest scoring peptide from the Sequest + Percolator search that passed default FDR cutoffs:
2 y ions? Seriously? This is why you CANNOT trust your default FDR cutoffs. Treat them as a shortcut. In case you were wondering, I gave this peptide a -5!
I'm going to cut this analysis off now. Enough data processing for a Friday evening.
Again: Question #1, does MSAmanda work with Percolator? My answer, based on 4 runs of 1 dataset: absolutely. In fact, Percolator seems to work a whole lot better with MSAmanda than it does with Sequest.
By the way, I'm not putting down Percolator + Sequest; I would simply tighten the FDR cutoff until I got to consistently good data. In this example Percolator simply overshot the mark a little and dug too hard trying to get us as many peptides as possible. In fact, that peptide may be a good match, but it is one that I certainly would not show someone to convince them that we found their protein of interest.
Disclaimer, because I'm still a little thrown off by this: This is one dataset. The results are surprisingly convincing, however, and the logic is beginning to make sense to me. Percolator is trying to dig into the data to pull out PSMs that we mistakenly threw out as false (oversimplification, but let's roll with it) and it can only do that based on the quality of the data that was originally identified. If MSAmanda is doing a superior job of making peptide to spectral matches, Percolator has more to work with.
TL;DR: Use MSAmanda for high resolution MS/MS spectra. Also use Percolator; they are compatible and give you more data. Always verify that your FDR cutoffs are giving you good data!