Saturday, November 23, 2019
Static Percolator allows application to smaller datasets!!
Okay -- if we've talked at a meeting about data processing -- we've talked about this. I'm at an amazing meeting right now and I was in 2 great conversations about this concept already.
Percolator is fantastic. It is the gold standard for false discovery rate calculations, but it was designed for global applications. If you've looked at your data you've seen this phenomenon where all of a sudden you can't seem to trust what it is giving you by default.
Some of those videos over there on Proteome Discoverer are like 7 years old now? And I ramble incoherently about it there. But what is the solution? Could it be this!?!?
I have a finite number of stored Obama Boom GIFs left, but this deserves one.
Static modeling Percolator!!
What's the difference? Normal Percolator is dynamic. It learns from that big ol' dataset you just gave it and that's what it uses to set your parameters.
Static modeling flips the switch. What if it learns from a big ol' dataset first, and then you take everything it just learned and apply those settings to the little dataset you just gave it?
Well -- it looks like you've got a smart Percolator right out of the box!
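If you want to see the dynamic vs. static idea in something you can actually run, here's a rough toy sketch. Big caveats: this is NOT Percolator's code or its algorithm. Percolator uses a semi-supervised SVM; I'm swapping in a tiny logistic regression as a stand-in, and the PSMs, the two features (search score and mass error), and every function name here are made up for illustration. The only point is the workflow: train on the big set, keep the weights, score the little set with them.

```python
import math
import random


def simulate_psms(n, rng):
    """Make n toy PSMs. Features = [search score, abs mass error]. All hypothetical."""
    rows, labels = [], []
    for _ in range(n):
        is_target = rng.random() < 0.5
        # Assumption: half of the target hits are correct; decoy hits never are.
        correct = is_target and rng.random() < 0.5
        score = rng.gauss(3.0, 1.0) if correct else rng.gauss(0.0, 1.0)
        mass_err = abs(rng.gauss(0.0, 1.0 if correct else 5.0))
        rows.append([score, mass_err])
        labels.append(1 if is_target else 0)
    return rows, labels


def train(rows, labels, epochs=20, lr=0.01):
    """Fit a small logistic regression (target vs. decoy) with plain SGD."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            z = max(-30.0, min(30.0, z))  # keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def score(rows, model):
    """Apply an already-trained model; nothing is learned from these rows."""
    weights, bias = model
    return [bias + sum(w * xi for w, xi in zip(weights, x)) for x in rows]


rng = random.Random(0)
big_rows, big_labels = simulate_psms(5000, rng)
small_rows, small_labels = simulate_psms(50, rng)

# "Dynamic": the model learns from the little dataset it is about to score.
dynamic_model = train(small_rows, small_labels)
dynamic_scores = score(small_rows, dynamic_model)

# "Static": the model learns from the big dataset, then gets reused as-is.
static_model = train(big_rows, big_labels)
static_scores = score(small_rows, static_model)

print("dynamic weights:", dynamic_model[0])
print("static weights: ", static_model[0])
```

Run it with a few different seeds and compare how much the two sets of weights move around. The static weights come from 5,000 toy PSMs and the dynamic ones from 50, which is the whole idea in miniature.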
As shown in the picture at the very top that I stole and then clipfarted over -- when your datasets are too small for Percolator to learn enough from, it gets "discordant" (their word). I like fuzzy better. I've always wondered if there was a fixed cutoff. It's great at 100,000 PSMs. It can be BAD at 1,000 PSMs. Where is the cutoff?
What we learn here is that there is no set number (which makes sense, it would be weird if there were) but you get progressively fuzzier as PSM numbers drop (which we've all seen. We're all totally smart. Just not smart enough to fix it). These guys (one of them, I hear, has some evidence he knows something about how the whole thing works) just pulled the whole thing together.
100% Recommended reading (or at least skimming). It has big implications for our field and how we will process ALL our data in the future.
Edit: Want to know more? Check out this blog post!