Monday, March 18, 2013

Proteome Discoverer 1.4 vs MaxQuant 1.3.0.5

A couple of years ago, I wrote a short blurb on my experience comparing MaxQuant and Proteome Discoverer.  Turns out, it may have been the most-read thing I've ever written.  If I'd known how many people would read it, maybe I would have done a more thorough job!

Here is my vindication, though!  To celebrate last week's release of Proteome Discoverer 1.4, I took a very nice SILAC-labeled data set and ran it through PD 1.4 and MaxQuant 1.3.0.5 (the newest iteration, as of this posting date).

Dataset:  Human cancer cell line passaged in SILAC media with Lysine (6) and Arginine (10).  The data was analyzed in one go on a Q Exactive system on a 180 minute gradient using a Top20 approach.  ~600 MB file.

Software settings:  
Dynamic modifications:  Carbamidomethylation (C), Oxidation (M), N-acetylation, and SILAC labels
MS1 tolerance:  10 ppm (for MaxQuant, 20 ppm in the first search and 10 ppm in the second)
MS2 tolerance:  50 ppm
FASTA:   IPI Human 3.77 (originally downloaded from maxquant.org)
MaxQuant used Perseus 1.3.0.4 with an FDR of 0.01 and implemented 4 threads
PD used Sequest with the Percolator algorithm at default parameters

PC:  AMD Quad Core, clocked at ~3 GHz with 8 GB of RAM

Total search time:
MaxQuant:  109 minutes
PD:  23 minutes

Results:
MaxQuant:  425 total grouped IDs, 27 of which were contaminants and 12 were reverse sequences.
386 human protein IDs
286 quantifiable

Proteome Discoverer:
465 grouped proteins
380 quantifiable.
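
For anyone who wants to check the arithmetic behind these numbers, here is a quick sketch in Python. The values are copied straight from the lists above; the variable names are just illustrative. One assumption to flag: the post does not say whether the 465 PD groups still include contaminants, so the difference in total IDs is only a rough figure.

```python
# Numbers copied from the post; variable names are illustrative only.
maxquant_minutes = 109
pd_minutes = 23

mq_total_groups = 425
mq_contaminants = 27
mq_reverse = 12
mq_quantifiable = 286

pd_groups = 465
pd_quantifiable = 380

# MaxQuant human IDs = total groups minus contaminants and reverse hits.
mq_human_ids = mq_total_groups - mq_contaminants - mq_reverse

# How many times longer the MaxQuant search took on this PC.
speed_ratio = maxquant_minutes / pd_minutes

# Differences in favor of PD. Assumption: the PD count is comparable
# to the filtered MaxQuant count (the post does not confirm this).
extra_ids = pd_groups - mq_human_ids
extra_quant = pd_quantifiable - mq_quantifiable

print(f"MaxQuant human protein IDs: {mq_human_ids}")
print(f"MaxQuant / PD search time: {speed_ratio:.1f}x")
print(f"Additional grouped IDs in PD: {extra_ids}")
print(f"Additional quantifiable proteins in PD: {extra_quant}")
```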

I'll be honest.  I was scared at first.  MaxQuant has gone through some significant revisions since I last used it regularly.  Some of the new features, such as the ability to go back and re-search spectra, are crazy impressive.  That team contains some of the best researchers in our field and continues to innovate how we do proteomics and process MS/MS data.  However, I have met a lot of the team that writes PD and they are no slouches either.

I'm going to throw in a caveat here:  I am an expert at using Proteome Discoverer.  I've been using PD since version 1.0 and have been using the beta versions of PD 1.4 for about 6 months.  I'm less adept with MaxQuant.  For a quad core CPU, I don't know how many threads would be optimal.  4 seems the smartest, but I may have been able to optimize that number and speed it up (virtual threading, or whatever...).  It may also be possible to optimize first search/second search parameters to gain more IDs.  Would I have picked up almost 100 more quantifiable IDs?  I doubt it, but maybe the disparity wouldn't have been as large.

In time, I might do a follow-up article to this one.  It would be nice to see what the overlap in ID and/or quan is like.  It is a little difficult due to how differently MaxQuant and PD deal with protein grouping.  My guess, however, is that the majority of IDs and quan are the same, but that Andromeda and Sequest would each add complementary data to each other.
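
One rough way to approach that overlap, given the grouping differences, is to count two protein groups as the same hit if they share at least one accession. The sketch below is only an illustration of that idea, not how either program compares groups. It assumes each group has been exported as a semicolon-separated string of accessions, and the accessions shown are made up for the example.

```python
def split_group(group):
    """Split a semicolon-separated protein group into a set of accessions."""
    return frozenset(acc.strip() for acc in group.split(";") if acc.strip())


def group_overlap(groups_a, groups_b):
    """Return (shared, only_a): groups in A that do / do not share
    at least one accession with any group in B."""
    accessions_b = set()
    for group in groups_b:
        accessions_b |= split_group(group)

    shared, only_a = [], []
    for group in groups_a:
        if split_group(group) & accessions_b:
            shared.append(group)
        else:
            only_a.append(group)
    return shared, only_a


# Hypothetical accessions, for illustration only.
mq_groups = ["IPI00000001;IPI00000002", "IPI00000003", "IPI00000004"]
pd_groups = ["IPI00000002", "IPI00000003;IPI00000005", "IPI00000006"]

shared, mq_only = group_overlap(mq_groups, pd_groups)
_, pd_only = group_overlap(pd_groups, mq_groups)

print("Shared (MaxQuant view):", shared)
print("MaxQuant only:", mq_only)
print("PD only:", pd_only)
```

Matching on any shared accession is deliberately loose. It will merge groups that one program split and the other did not, which is usually what you want for a first-pass Venn-style comparison.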

But for now, for just pure depth of coverage and quan, Proteome Discoverer appears to be the winner, though I'd still encourage you to try running both.  The worst that would happen is that you'd get more data from that MS/MS experiment.


7 comments:

  1. Hi Ben,

    This is Eric from the PEAKS team. I'd like to send you an email but could not find it anywhere on your blog.

    I read your articles and noticed that you tried PEAKS back in 2012. I am glad that PEAKS is helpful for your work and I appreciate your praise for our product. I am curious how PEAKS will perform on this dataset. If this is something we can work on together, you can reach me at eric[.a.t.]bioinfor.com.

    We have participated in the ABRF iPRG studies in recent years and performed excellently. Here are the links to the results.

    http://www.bioinfor.com/peaks/corp/conferences/abrf-2013.html

    http://www.bioinfor.com/peaks/corp/conferences/abrf-2012.html

    http://www.bioinfor.com/peaks/corp/conferences/abrf-2011.html

  2. Dear Ben,
    please take care if you compare FDR at protein and PSM level.


    If you would like to find more phosphorylation sites please use MS Amanda.

    Karl

  3. Dr. Mechtler, thanks for the input! You are one step ahead of me. I've been dying to use MSAmanda but I've been one of the beta testers for PD 1.4 and I've had to dump everything to uninstall/reinstall the new versions every month. I have a beautiful QE phospho study we're trying to push out the door and I can't wait to use this node on it. On the next series of Proteome Discoverer 1.4 videos I'm going to have one where I install MSAmanda.

  4. Hi Ben,

    Can you give me more details on the computer setup you use for Proteome Discoverer 1.4? I'm currently using an AMD Opteron 4234 (3.1 GHz) processor with 3 GB of RAM and a 7200 rpm hard drive (60 GB) and it's taking nearly a week to analyse a standard iTRAQ mgf file (~0.8 GB). I see your search in this blog only took 23 minutes, which would be a substantial improvement on our times if we could get a similar system in place.

  5. I am not sure one would choose one program over the other based on search time, but rather quality. However, you do not objectively measure either.

    Speed: I am sure both programs can be sped up using a RAM disk (processing files from RAM), and all timing should be separated into CPU time and I/O wait time.

    Quality: Are the 260 quantifiable proteins found using MaxQuant all contained in the 380 quantifiable proteins found using Proteome Discoverer? Perhaps combining the output of both programs is the best possible approach. Too bad both are closed source and cannot be improved by the community (who also have impressive skills).

    Replies
    1. David,
      Yes, I still do run both. You never lose anything when you throw in an additional search engine, other than time, of course. And for a lot of people out there processing time is every bit as much of a bottleneck for them as run time. We could also speed up the PC using a rocket cache or a high speed SSD, but these aren't resources that I have in my house, or the PCs that I see in most labs I visit.
      It is a good point. When I get a chance to look at this data, I'll do an overlap study. The way the two parse the uniprot database makes it a little trickier to do than just dumping the results into Venny and getting my overlap.
      On the open source end, I definitely see your point. I try very hard to evaluate the new quality open source engines that are out there (time is limiting). There are 2 main reasons that I started using these two packages for my research. 1) Support. There are people out there who are paid to answer questions for PD and (it sure looks like) for MaxQuant. You're never on your own. 2) The software isn't a variable anymore. Every person can take this data set and run it through this version of MaxQuant or this version of PD with this database and get the same results. When the community makes improvements to the software, the software ends up becoming a variable. Considering the level of variability in sample prep methods and LC-MS/MS instruments and settings from lab to lab, it is nice to have at least one part of your workflow that isn't an X factor.
      I definitely appreciate your comments!

    2. Hello Ben,

      I also appreciate you taking the time to objectively evaluate these programs, and agree that it is a difficult task. I have the luxury of running the analysis on a machine with 32 cores and 64GB of memory, and have found that use of a RAM disk (ImDisk) speeds up all analysis programs considerably (while allowing other programs to run smoothly at the same time).

      We are all currently stuck with these closed source tools (and vendor libraries) because there are no viable alternatives (I have tried a few). Personally, I would like to see MaxQuant return to its open source roots (MSQuant) for both ethical and practical reasons (growing the community, improving the software, removing an X-factor). I guess in these commercial times scientists often forget how high the stack of giants we stand on is, and how much has been shared with us.

      Kind regards,
      David
