Saturday, January 14, 2017

Do you hate PD 2.1 and just want to run it like PD 1.4? I made you a template.


I was having lunch with 2 of the most skilled proteomics guys I personally know -- and both of them talked about how they're still using Proteome Discoverer 1.4 -- and hate PD 2.x. I understand, for real!  It is a new architecture -- and with that backlog of samples you have, there isn't a lot of time to learn new software interfaces.

So I put this together this morning and maybe it will help? I call it the "I Hate Proteome Discoverer 2.1" Analysis Template.  Disclaimer: This is for simple peptide ID runs. Maybe I'll do a quan one later.

Step 1: Open the accursed new version of Proteome Discoverer and start a new Study, name the study and choose your RAW files. Ignore the processing templates and other things.


Step 2: Download this template from my personal DropBox account here.  Depending on your browser, you may need to right click here to download (I have to say "save link as").

Step 3: Ignore all that junk in the big grey box. All of it -- just Open the Analysis Template you just downloaded, then Click the weird little button by the "Processing Workflow" text.


Now you are in a window that looks just like what you're used to in PD 1.0 - 1.4 SP1, right? The consensus workflow is set up and you don't need to bother with it. Go to your search engine, add your FASTA file, adjust your tolerances, all exactly the way you do in the version of PD that you don't hate and you're almost set.  All you need to do is get your RAW files into that little box below the number 5 (above) and hit the "Run" button.

Step 4: Click the Input tab near the top of the screen to get to the Raw files you added earlier. You can also add more raw files here. Click on the Raw file you want and drag it over to the window below the processing tab. It takes a few tries to figure out where you need to click (and where you can't) to drag the files over. You can hold down the Ctrl key while you click to highlight more than one file at once, just the way you would in Excel. Hit the Run button!


Step 5: Go to the Job Queue and open your processed files. You'll need to open the Consensus workflow for each file and you're golden.


Now...the output report might not look like what you're used to. I haven't used PD 1.x in a long time, and I don't remember what it looks like (and it's Saturday, I'm gonna get out and enjoy some of this snowy day, rather than work on this blog all day!). If you hate the output and have suggestions, email me what you want it to look like and I'll see if I can create a filter method that I can add to that download that will make the output closer to what you want.

While I might not have LOVED the PD 2.x architecture at first -- I immediately preferred the PD 2.x output reports -- but if it is bugging you, let me know, I can take a swing at it!

Wait --> One more disclaimer before I put on my boots --> the results from this template may not 100% match what you get in PD 1.4, because of 2 changes in PD 2.x --> one I could change to PD 1.4 format and one I don't think I can change.

1) PD 1.4 only does false discovery rate filtering at the Peptide Spectral Match (PSM) level. PD 2.x can also do FDR at the peptide group and protein (and protein group) levels. I left the peptide group FDR on. There is too much evidence in the literature that this step is essential to getting the best data for me to recommend turning it off. There is a video over there to the right where I discuss this and show you how to turn it off if you need to match your peptide IDs exactly.
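If it helps to see why the peptide group step can change your ID list, here is a minimal sketch of generic target-decoy filtering done twice, once on PSMs and once on peptide groups. This is not how Proteome Discoverer implements it internally; the function names, the scores, the peptide sequences and the 10% threshold are all made up for illustration.

```python
from typing import NamedTuple


class PSM(NamedTuple):
    peptide: str     # hypothetical peptide sequence
    score: float     # higher is better (made-up search engine score)
    is_decoy: bool   # True if the match came from the decoy database


def filter_by_fdr(items, threshold):
    """Generic target-decoy filter (an assumption, not PD's algorithm).

    Rank by score and keep the targets above the lowest-ranked position
    where decoys / targets is still <= threshold.
    """
    ranked = sorted(items, key=lambda x: x.score, reverse=True)
    targets = decoys = 0
    cutoff = 0  # number of top-ranked items to keep
    for i, item in enumerate(ranked, start=1):
        if item.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= threshold:
            cutoff = i
    return [x for x in ranked[:cutoff] if not x.is_decoy]


def collapse_to_peptide_groups(psms):
    """Keep only the best-scoring PSM for each peptide sequence."""
    best = {}
    for p in psms:
        if p.peptide not in best or p.score > best[p.peptide].score:
            best[p.peptide] = p
    return list(best.values())


# Two abundant peptides with five PSMs each, one decoy hit, one weak target.
psms = (
    [PSM("PEPTIDEA", s, False) for s in (20, 19, 18, 17, 16)]
    + [PSM("PEPTIDEB", s, False) for s in (15, 14, 13, 12, 11)]
    + [PSM("DECOYSEQ", 5, True), PSM("PEPTIDEC", 4, False)]
)

psm_level = filter_by_fdr(psms, threshold=0.1)
peptide_level = filter_by_fdr(collapse_to_peptide_groups(psms), threshold=0.1)

print(sorted({p.peptide for p in psm_level}))
print(sorted({p.peptide for p in peptide_level}))
```

In this toy data the redundant PSMs from the two abundant peptides dilute the single decoy hit, so the weak peptide gets through at the PSM level. Once every peptide counts only once, the same decoy weighs much more and the weak peptide is filtered out.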

2) Parsimony for protein group identification. In PD 1.3 and 1.4 (maybe the earlier ones -- it has been too long for me to recall) when we have equal evidence at the peptide level that would equally support the identity of 2 distinct proteins, the protein that would have the highest percent coverage is made the top hit in the protein group ID. In PD 2.x, under these same conditions, the most intact protein reference from the FASTA is chosen for the group ID. The protein (not protein group) list is unchanged. I absolutely love this change because most databases have alternative cleavage events and partial protein fragments that, in the older versions, would get a higher group ranking -- in big databases you'd almost never see your intact protein -- even though it is probably(?) the most likely one biologically to be actually present.
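Here is a toy sketch of the difference in point 2. It assumes "most intact" can be approximated by the longest FASTA entry in the group, which is my simplification and not a statement of what PD 2.x actually computes. The accessions, lengths and coverage numbers are hypothetical.

```python
from typing import NamedTuple


class Protein(NamedTuple):
    accession: str         # hypothetical FASTA accession
    length: int            # residues in the FASTA entry
    covered_residues: int  # residues covered by identified peptides

    @property
    def coverage(self) -> float:
        """Fraction of the sequence covered by identified peptides."""
        return self.covered_residues / self.length


def representative_by_coverage(group):
    """Older-style pick as described above: highest percent coverage wins."""
    return max(group, key=lambda p: p.coverage)


def representative_by_length(group):
    """Stand-in for 'most intact' (an assumption): longest entry wins."""
    return max(group, key=lambda p: p.length)


# Same peptide evidence (50 covered residues) supports both entries equally.
group = [
    Protein("FULL_LENGTH", length=500, covered_residues=50),
    Protein("FRAGMENT", length=100, covered_residues=50),
]

old_pick = representative_by_coverage(group)
new_pick = representative_by_length(group)

print(old_pick.accession, f"{old_pick.coverage:.0%}")
print(new_pick.accession, f"{new_pick.coverage:.0%}")
```

The same 50 covered residues are 50% of the fragment but only 10% of the full-length entry, so a coverage-based rule pushes the fragment to the top of the group even though the peptide evidence is identical.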

That's it for today-- Gusto is wearing a scarf and he's ready to go!



3 comments:

  1. Hi Ben,
    Thanks for all the information you are sharing with us. I am interested in seeing the peptide group FDR turned OFF. You mentioned in the above post that "there is a video over there to the right where I discuss this and show you how to turn it off if you need to match your peptide IDs exactly", but unfortunately I was not able to find that video. Could you provide the link to that video here?
    Thanks,
    Raj

  2. Hi Ben,
    The template no longer exists. Can you send me the link?

    Thanks

    Jinglei

  3. Hi Ben,
    I would like to have a basic consensus workflow to reprocess my PD1.4 files, and this might be a great starting point. Could you please provide the link for the WF you mention in this post?

    Many thanks,
    Diana
