Thursday, June 6, 2013

False Discovery Rate Calculations Part 3: Do we gain anything by running two FDR algorithms in tandem?

You can read part 2 of this monologue here.


In part 1 I rambled a little about FDR.  In part 2 I demonstrated what happens when you search the same sample against a regular database and against a database with the reversed sequences concatenated onto it.

Here in part 3, I want to highlight an issue with FDR using a very extreme example.  What happens when you apply more than one level of FDR at the peptide level?  I mention this because there is some nice post-processing software out there.  Scaffold is the one I run into the most, often in conjunction with Proteome Discoverer.  People run their data through PD, then import the resulting MSF file into Scaffold.  In Scaffold, you have the option of running FDR.

Should you use it?

If you have used a target decoy database in PD, the answer is almost certainly NO.

To illustrate this point, I'm going to combine the experiment from part 2 with the Target Decoy PSM validator node in PD 1.4.

Experiment (same sample as yesterday, same parameters, etc.):

Run1:  Normal database:  2958 protein groups, 10,483 peptides
Run2:  Concatenated:  0 proteins, 0 peptides

Again, this is an extreme example but it highlights the main point here.  The job of the FDR calculator is to find bad peptides and throw them out.  In most cases, it will find bad peptides even when there are no bad peptides to see.

By using the same FDR method twice (reverse target decoy) we've eliminated ALL peptides.  Something very similar will occur if you stack two similar FDR methods, though it will be less extreme.
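
For anyone who wants to see the arithmetic, here is a minimal sketch of a generic target-decoy cutoff. This is NOT the code inside Proteome Discoverer or Scaffold; the function name passing_targets, the scores, and the 1% threshold are all made up for illustration. It estimates FDR as decoy hits divided by target hits at each score cutoff and keeps the targets above the loosest cutoff that still passes. The second call is a toy model of the extreme case above, under my assumption that the decoy search ends up matching just as well as the target search.

```python
def passing_targets(target_scores, decoy_scores, max_fdr=0.01):
    """Return the target scores kept at the loosest cutoff with estimated FDR <= max_fdr.

    Estimated FDR at a cutoff = (decoy hits above cutoff) / (target hits above cutoff).
    Generic illustration only; names and threshold are hypothetical.
    """
    labelled = [(s, False) for s in target_scores] + [(s, True) for s in decoy_scores]
    # Best score first; on ties, count the decoy first so the estimate is conservative.
    labelled.sort(key=lambda hit: (-hit[0], not hit[1]))

    targets = decoys = kept = 0
    for _score, is_decoy in labelled:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the largest number of targets for which the estimate still passes.
        if targets and decoys / targets <= max_fdr:
            kept = targets
    return sorted(target_scores, reverse=True)[:kept]


# Made-up scores: 100 good target hits, 2 poor decoy hits.
good_targets = list(range(50, 150))
weak_decoys = [10, 12]
print(len(passing_targets(good_targets, weak_decoys)))   # all 100 targets survive

# Toy version of the extreme case: every decoy scores exactly as well as a target,
# so the estimated FDR never drops below 100% and nothing survives.
print(len(passing_targets(good_targets, good_targets)))  # 0 targets survive
```

The point of the second call is only that an FDR filter is driven by how well the decoys score. If the decoys look as good as the targets, the filter has no way to tell them apart and throws everything out.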

Keep in mind that I'm not saying that you can't use the FDR in Scaffold.  In fact, I've heard very good things about this algorithm.  If you are going to use it, however, do not also use an FDR in Proteome Discoverer.  Instead, use the Fixed value PSM validator node and import the resulting file into Scaffold.

