Wednesday, November 6, 2013
What is a TopN Peaks filter?
One at a time, I've been going through the PD nodes, new and old, and evaluating them in exactly the way that one should. Using my current favorite dataset, I simply add in the new node and run the same sample with and without it. It makes for some easy entries. On the down-side, using just one dataset may not be an accurate representation of what a node can do, as it may be more useful for more specialized datasets.
The TopN filter is an interesting one. It has two settings: 1) the number of MS/MS fragments to look at and 2) the window width in which to look for these fragments.
For example, the defaults are Top 6 with 100 Da. What this does is go through each and every MS/MS spectrum and break it into 100 Da windows. Within each window, it determines the 6 most intense ions and eliminates everything else. If you scanned from 400-1400, that is 10 windows, so you've reduced each MS/MS spectrum to at most 60 peaks (the 6 most intense in each window) and dropped a lot of noise from your spectra.
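To make the windowing concrete, here is a minimal Python sketch of the idea as described above. It is not PD's actual code: the function name top_n_filter, the representation of a spectrum as a list of (m/z, intensity) pairs, and the choice to align windows to multiples of the window width are all my assumptions for illustration.

```python
from collections import defaultdict


def top_n_filter(peaks, n=6, window_da=100.0):
    """Keep the n most intense peaks in each window_da-wide m/z window.

    peaks is a list of (mz, intensity) tuples. Windows are assumed to be
    aligned to multiples of window_da (e.g. 400-500, 500-600); the real
    node may place its window boundaries differently.
    """
    # Group peaks by the index of the window they fall into.
    windows = defaultdict(list)
    for mz, intensity in peaks:
        windows[int(mz // window_da)].append((mz, intensity))

    kept = []
    for window_peaks in windows.values():
        # Sort by intensity, most intense first, and keep the top n.
        window_peaks.sort(key=lambda p: p[1], reverse=True)
        kept.extend(window_peaks[:n])

    # Return the surviving peaks in m/z order.
    return sorted(kept)


if __name__ == "__main__":
    # A made-up spectrum: one peak per Da from 400 to 1399.
    spectrum = [(400.0 + i, float((i * 37) % 101)) for i in range(1000)]
    filtered = top_n_filter(spectrum)
    print(len(spectrum), "peaks before,", len(filtered), "after")
```

With the defaults and a 400-1400 scan range, no more than 10 x 6 = 60 peaks can survive, however noisy the original spectrum was.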
Sooooo... what does this button do!?!?
For one, it's fast. On my current favorite dataset, a 2 hour HeLa high-high dataset, it takes about 2 minutes to run. This is offset by the fact that the spectrum selector ends up taking less time. My search using a Target Decoy ran 6 minutes, whether I used this filter or not. Yes, my laptop knocks out PD searches in 6 minutes. Let me know if you want the specs on it, it wasn't very expensive at all.
Okay, so there are no apparent consequences, time-wise, to doing it! How are the peptides?
Well, in the case of both the Target Decoy and Percolator searches, we ended up with slightly fewer peptides and protein groups when we used the TopN filter. Yup, fewer. End of entry.
Nope! I'm joking. Not about there being fewer peptides. There are fewer, but remember a few entries back where I was talking about Percolator trying too hard on Sequest searches and letting some junk through? What if there was now less of that junk? That would be a perk, right?
And it is. The number of peptides drops (from ~11,600 to ~11,100 in this search), but when you look at the worst scoring peptides that made it through the Percolator cutoff, they aren't nearly as bad. The thing is that Sequest and Percolator are just digging too deep and making mis-assignments on what is essentially noise. But if you do a good job of eliminating that noise, then you're looking at fewer false positives.
I encourage you to check out this node. I'd love to know how it performs on a larger dataset. I would expect it to work much better, but who knows.