Thursday, September 3, 2015

Intelligent optimization of search parameters for best possible data!

This is a very heavy and extremely interesting paper from a search algorithm optimization perspective. Oh, the paper in question is from Sonja Holl et al., and is available here.

You should probably read it yourself (it's open access!), but I'm going to stumble through my layman-level interpretation of what I just read over Holiday Inn coffee that I think was some sort of homeopathic caffeine experiment...

The paper is essentially a meta-analysis of 6 data sets from three different types of instruments. Some come from ion traps, some come from Orbitraps, and some come from something called a Q-TOOF ;)

The goal of the study was to see how much changing the search parameters in a guided way would improve or hurt the results. And the effect is kind of drastic. What they came up with is a new optimization platform for a big and super interesting project called Taverna (will investigate!). The optimization platform in Taverna looks at your data and determines which search parameters you should be using to get the most high-quality peptide spectral matches (PSMs).
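To make "guided parameter optimization" a bit more concrete, here is the dumbest possible version of the idea: an exhaustive grid search that tries every combination of settings and keeps whichever one gives the most PSMs. This is NOT the optimizer from the Holl et al. paper or Taverna (I haven't dug into how theirs works). `grid_search`, `toy_search` and the parameter names are hypothetical, and the PSM counts are made up. In real life each call would be a full database search with FDR filtering.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def grid_search(
    run_search: Callable[[Dict[str, int]], int],
    grid: Dict[str, List[int]],
) -> Tuple[Dict[str, int], int]:
    """Try every parameter combination and return the one with the most PSMs.

    run_search is a hypothetical callback: given a dict of settings, it would
    run the database search and return the number of PSMs passing FDR.
    """
    keys = list(grid)
    best_params: Dict[str, int] = {}
    best_count = -1
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        count = run_search(params)
        if count > best_count:  # strict '>' keeps the first combination on ties
            best_params, best_count = params, count
    return best_params, best_count


def toy_search(params: Dict[str, int]) -> int:
    """Made-up PSM counts (not real data) that peak at 10 ppm / 1 missed cleavage."""
    return (
        1000
        - 20 * abs(params["precursor_ppm"] - 10)
        - 50 * abs(params["missed_cleavages"] - 1)
    )


if __name__ == "__main__":
    grid = {"precursor_ppm": [5, 10, 20], "missed_cleavages": [0, 1, 2]}
    print(grid_search(toy_search, grid))
```

Nine combinations is nothing, but every new parameter multiplies the number of full searches you'd have to run, which is presumably why a smarter optimizer that looks at the data first is worth having.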

The Taverna optimization platform looks at a number of variables, including mass accuracy, isotopic distributions, and more peptide-centric parameters like missed cleavages and enzyme fidelity. Up to this point, I was wondering why someone would rewrite Preview... but then they make a sharp right turn and incorporate retention time prediction into the algorithm!  Interesting, right?!?
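Two of the simpler ingredients in that list, mass accuracy and missed cleavages, can be sketched in a few lines. This is my own illustrative take, not the Taverna code or Preview's logic. The function names and the "mean plus or minus 3 standard deviations" rule for picking a precursor tolerance are assumptions. The missed-cleavage counter uses the textbook trypsin rule (cut after K or R unless the next residue is P).

```python
from statistics import mean, stdev
from typing import List, Tuple


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Precursor mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def suggest_ppm_window(errors_ppm: List[float], n_sd: float = 3.0) -> Tuple[float, float]:
    """Hypothetical rule of thumb: return (systematic offset, +/- half-width) in ppm.

    The offset is the mean error of confident PSMs; the half-width is n_sd
    sample standard deviations. Needs at least two PSMs.
    """
    return mean(errors_ppm), n_sd * stdev(errors_ppm)


def missed_cleavages(peptide: str) -> int:
    """Count internal K/R not followed by P (classic trypsin rule).

    The C-terminal residue is skipped because cleavage there is expected.
    """
    return sum(
        1
        for aa, nxt in zip(peptide[:-1], peptide[1:])
        if aa in "KR" and nxt != "P"
    )


if __name__ == "__main__":
    print(suggest_ppm_window([1.0, 2.0, 3.0]))
    print(missed_cleavages("AKRPEPK"))  # K->R counts, R->P does not
```

The rough logic: if your confident PSMs all sit within a couple of ppm of zero, a wide precursor window mostly adds random candidates, and a pile of hits with missed cleavages suggests the digest wasn't as complete as default settings assume.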

Another interesting plus? It appears to be designed for server-level applications!  A nice read even if your neurons aren't firing all the way!  Now it's time to figure out what this Taverna thing is all about!


  1. Hi Ben,

    Thanks for sharing this interesting paper.
    Although I still do not know exactly what it is, I will try to dig into the paper a little bit.
    Would you suggest we run this algorithm on our data before searching it in PD?

    Many Thanks and Warm Regards,

  2. Catherine,
    I think this is a super interesting concept, but I don't know if the capabilities really exist for Proteome Discoverer to take advantage of the power here. For example, the really novel thing to me is the integration of retention times into the prediction models. For Proteome Discoverer, you can't do better than the free Preview node from Protein Metrics. During the Proteome Discoverer workshops I show an example in which I run a dataset through Preview and then use Preview's advice to get almost 20% more PSMs. That dataset is honestly on the extreme end, but I commonly get 5-10% more IDs from running the Preview node and using its recommended settings.
    Hope this helps!