Okay, here is the question. What, if anything, is MSAmanda giving us that we aren't getting from Sequest + Percolator? In the previous entry I think I did a good job of highlighting 2 things: 1) MSAmanda and Percolator work VERY well together and 2) We get more proteins from high resolution MS/MS spectra with MSAmanda.
I guess the question is this: at the peptide level, how many are unique? Are we mostly scoring the same stuff, or is this really complementary data? In the end, does it really matter? More peptides is a great thing, right? But I want to have a solid metric (on one data set...) to say, "adding this search engine can give you XX% more results," or something. There is a lot of data out there regarding different complementary search engines used together, like this poster. In general, however, I tend to expect an extra search engine to boost my IDs by ~10%.
First of all: I mentioned last time that in this particular run, I was not happy with the Sequest + Percolator PSMs that made it through my filter. I need to narrow those down to peptides that I trust.
I used the method that I mentioned last time: I cut back my FDR cutoff at the "high" confidence level (since this is Percolator, this is based on q-value), until I got to consistently good peptides.
Here is a summary:
At q value 0.01, I had 11,600 peptides from Sequest + Percolator
At q value 0.009, I had 11,482, and they still didn't meet my cutoff
At q value 0.006, I had 11,108, but I still didn't trust all of the lowest scoring peptides, and so on.
I ended up cutting it to 0.001, which left me with 10,226 peptides, ~1,400 fewer than I started with, but still a big boost over what I got from Sequest + target decoy. On a related note, I did this a second way, by raising the minimum XCorr to 1.75. Using the same sampling technique, I liked the peptides and came up with close to the same numbers. Interesting, but maybe coincidental.
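For anyone who wants to repeat the cutoff walk outside the vendor software, here is a rough sketch of the idea in Python. It assumes you have exported your PSMs as (peptide, q-value) pairs; the `PSM` record, the function names, and the toy sequences are all made up for illustration and are not from any real export format. It only does the counting part. Deciding whether the lowest scoring peptides look trustworthy is still a manual step.

```python
from typing import Dict, Iterable, NamedTuple, Set


class PSM(NamedTuple):
    """Hypothetical PSM record: a peptide sequence and its Percolator q-value."""
    peptide: str
    q_value: float


def unique_peptides_at(psms: Iterable[PSM], q_cutoff: float) -> Set[str]:
    """Return the distinct peptide sequences with at least one PSM at or below the cutoff."""
    return {p.peptide for p in psms if p.q_value <= q_cutoff}


def sweep_cutoffs(psms: Iterable[PSM], cutoffs: Iterable[float]) -> Dict[float, int]:
    """Count unique peptides surviving each q-value cutoff."""
    psm_list = list(psms)  # allow several passes over the same data
    return {c: len(unique_peptides_at(psm_list, c)) for c in cutoffs}


# Toy data, invented for this example only.
toy_psms = [
    PSM("PEPTIDEK", 0.0005),
    PSM("PEPTIDEK", 0.004),   # second PSM for the same peptide
    PSM("SAMPLER", 0.008),
    PSM("ANOTHERK", 0.0095),
    PSM("LASTONER", 0.02),
]

counts = sweep_cutoffs(toy_psms, [0.01, 0.009, 0.006, 0.001])
for cutoff, n in counts.items():
    print(f"q <= {cutoff}: {n} unique peptides")
```

Counting peptides rather than PSMs matters here, since one peptide can have several PSMs on either side of the cutoff.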
Here are the peptide numbers from each analysis:
Sequest + Percolator: 10,226 (trusted)
MSAmanda + target decoy: 9,262
MSAmanda + Percolator: 12,241
And here is what it looks like:
Not a bad chart, right? By the way, I'm completely fascinated by the fact that target decoy search sometimes gives me peptides that I don't get from Percolator. It makes me wonder if we should be doing both in order to boost our ID counts. Remember from the last entry that my quick and lazy analysis said that the peptides from both Amanda runs seemed trustworthy.
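If you want the numbers behind a chart like this, plain set operations will do it. This is a minimal sketch, assuming each analysis has been reduced to a set of peptide sequence strings. The `venn_regions` helper and the single-letter "peptides" are hypothetical, used only to show the bookkeeping.

```python
from typing import Dict, FrozenSet, Set


def venn_regions(sets: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Group peptides by the exact combination of analyses that identified them."""
    regions: Dict[FrozenSet[str], Set[str]] = {}
    all_peptides = set().union(*sets.values())
    for pep in all_peptides:
        # The key is the set of analysis names that contain this peptide.
        key = frozenset(name for name, members in sets.items() if pep in members)
        regions.setdefault(key, set()).add(pep)
    return regions


# Toy sets, invented for this example only.
analyses = {
    "Sequest+Percolator": {"A", "B", "C"},
    "MSAmanda+TD": {"B", "C", "D"},
    "MSAmanda+Percolator": {"C", "D", "E"},
}

regions = venn_regions(analyses)
for names, peps in sorted(regions.items(), key=lambda kv: sorted(kv[0])):
    print(" & ".join(sorted(names)), "->", len(peps))

# Peptides MSAmanda + Percolator found that Sequest + Percolator did not.
gained = analyses["MSAmanda+Percolator"] - analyses["Sequest+Percolator"]
total_unique = len(set().union(*analyses.values()))
```

Each peptide lands in exactly one region, so the region sizes add up to the total number of unique peptides across all three analyses.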
Anyway, I guess I was looking for a hard number. So, if we add those up, it looks like we get 12,532 unique peptides from this one run. And if we just look at the unique ones from Amanda + Percolator, we get 2,086 (1,585 + 501). That is a 16% boost (2,086 of 12,532) in trustable (that isn't a word either? WV public education...) unique peptide IDs. It's actually a little better than that since the total I have here also has the additional peptides from the MSAmanda + TD search, but I'm not going to do that math. It's late, and this is reasonably close.
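For the record, here is that arithmetic spelled out, using only the counts quoted in this post. The variable names are mine. The second ratio is an extra way of looking at it that I'm adding here as an assumption about what "boost" could mean: the same 2,086 peptides measured against the 10,226 trusted Sequest + Percolator peptides instead of against the grand total.

```python
# Counts taken from the text above.
total_unique = 12_532          # unique peptides across all analyses
amanda_perc_new = 1_585 + 501  # unique ones from MSAmanda + Percolator
sequest_trusted = 10_226       # trusted Sequest + Percolator peptides

# Share of the combined total that comes from adding MSAmanda + Percolator.
share_of_total = amanda_perc_new / total_unique

# Alternative view (my assumption): gain relative to the Sequest + Percolator baseline.
gain_over_baseline = amanda_perc_new / sequest_trusted

print(f"new peptides: {amanda_perc_new}")
print(f"share of total: {share_of_total:.1%}")
print(f"gain over Sequest + Percolator alone: {gain_over_baseline:.1%}")
```

Either way you slice it, the number lands above the ~10% I usually expect from a second search engine.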
Okay, so I know running MSAmanda takes extra time. But so does adding extra time to your gradient. This is a 2 hour run that I'm analyzing. If we added an extra hour to it we might have boosted our peptide IDs by another 10-20% (just guessing, but I should do that analysis, I have data just like that on a hard drive somewhere). We could also have boosted this by running this same sample on a faster instrument like a Fusion. We could also run it with a longer gradient + DMSO on a Fusion and do this, and you know what, we'd get a ridiculous number of peptide IDs. The point is, this is free data right there in your RAW file, you just need to take the time to pull it out.
TL/DR: Are the peptides from MSAmanda unique? A lot of them sure are! When running vs Sequest in this dataset it gave us 16% new peptide IDs in exchange for a little extra processing time.