I just want to process this weird mass spec data!


Do you just have some funny files from someone somewhere and have no idea where to start? I'm going to try and help.

Please keep in mind, though, that there are -- no joke -- over 1,001 different proteomics data processing packages in the world as of this paper in 2019. I'm for real going to try, though.

There are a lot of ways to get started. Let's start with finding out what you have. You probably have one of 3 things:
1) Processed data that is hard to interpret
2) Raw instrument files (vendor format, often proprietary indecipherable gibberish that they really shouldn't be allowed to use, but we haven't figured out a way to punish them for it yet)
3) Instrument files that have been converted to some sort of universal format. Except that, even after 10+ years of trying to come up with a conclusive universal format, we're all such obstinate jackasses that we STILL don't have one (sorry)

Step 1: Check the extension....


....the file extension....(the letters after the period in the name of the file -- if you can't see it, you are probably in Windows and you should unhide extensions of known file types -- this tutorial should help)

There is no better resource than this surprisingly current open article from Eric Deutsch from 2012.


(Click to expand, or go to the paper to see the highest resolution version).

Processed data is anything touching the "Analysis Results" bubble. I'll throw in a few more here.
.msf -- Proteome Discoverer PSM results
.pdresult -- Proteome Discoverer Protein Results
.tsv -- Morpheus or MetaMorpheus output (new builds of MM will automatically associate)
(I honestly thought I knew more off the top of my head....I need to add more!)
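
If you have a whole folder of mystery files, the extension check can be scripted. Here is a minimal sketch in Python, not a complete catalog: the "processed" entries are the ones listed above, while the vendor and open-format entries are my own additions of a few common extensions (assumptions, not taken from the figure), and the function name classify is just made up for this example.

```python
from pathlib import Path

# Processed-result extensions named in this post.
PROCESSED = {
    ".msf": "Proteome Discoverer PSM results",
    ".pdresult": "Proteome Discoverer protein results",
    ".tsv": "tab-separated table (e.g. Morpheus or MetaMorpheus output)",
}

# Assumed additions (not from the post): a few common vendor formats.
VENDOR = {
    ".raw": "Thermo (or Waters) raw data",
    ".wiff": "SCIEX raw data",
    ".d": "Agilent or Bruker raw data folder",
}

# Assumed additions (not from the post): a few common open formats.
OPEN = {
    ".mzml": "mzML",
    ".mzxml": "mzXML",
    ".mgf": "Mascot Generic Format peak list",
}


def classify(filename):
    """Return (category, description) based only on the file extension."""
    ext = Path(filename).suffix.lower()  # case-insensitive match
    for category, table in (("processed", PROCESSED),
                            ("vendor", VENDOR),
                            ("open", OPEN)):
        if ext in table:
            return category, table[ext]
    return "unknown", "extension not in this small lookup table"


if __name__ == "__main__":
    for name in ("run01.raw", "sample.mzML", "results.pdResult", "notes"):
        print(name, "->", classify(name))
```

This only looks at the name, not the contents, so a .tsv from some other tool will still come back as "processed". Treat the answer as a hint for where to look next, nothing more.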

If you've got Vendor Files or Open Spectrum files you're going to need to do the data processing yourself. There are SO MANY WAYS to do this that I'm only going to be able to scratch the surface.

Let's start with this question: What are you most comfortable with?

Linux?
Windows?
Macintosh (is it called Apple again?)

Next questions (maybe I should make a flow chart...):



Operating system agnostic because you're already comfortable with coding or data analysis stats:

1) How are you with R? If you're like "I love R!", click this link if it's metabolomics (XCMS!) or this one if it's a proteomics experiment (RforProteomics, FTW!)

2) What about Python? You're a pro already? Hmmm...this is a little tougher to narrow down....
Python Metabolomics -- I'd go with SECIM Tools first
Python Proteomics -- IdentiPy -- as part of the Pyteomics toolbox

3) Are you a math programmer at heart? By that I mean -- are you using C++? Do I ever have the solution for you! OpenMS -- proteomics AND metabolomics and all sorts of other weird stuff (RNA-protein interactions and on and on) -- and it's C++

4) Perl? Duuuuuude....I'm writing this in 2019....

Now that the nerds are out of the way, let's get to the biologists!

Are you on Linux or Apple/Macintosh? I really shouldn't combine these...I'll break them out later...

1) SearchGUI/PeptideShaker should probably be your first choice. I'm also going to recommend this for Windows people below. There is no way to use more open source engines with less effort.
SearchGUI searches the data
PeptideShaker combines it all

2) The Trans-Proteomic Pipeline does not natively support macOS, but 2 people wrote helpful guides (I've seen it work). You can find these here.



Windows -- and here comes the trouble -- there are so many options here. A lot of that comes from the fact that vendors have had such a hand in our development as a field.

In NO PARTICULAR ORDER -- I swear!

1) IMP-Proteome Discoverer (if you're a biologist and you're collaborating with me, this is how I'm going to show you and give you data) -- you get a vendor-designed scaffold for data processing with great open code filled in, so it's totally free. There are instructions for getting it and setting it up here. It is easy, and with all the cool code the IMP develops you can visualize results and make reports. It's seriously just remarkable what they do within the confines of this package. The downside is that you get just one search engine (MSAmanda). For PTM localization and validation it's an A+ in my book.

2) MetaMorpheus is THE combination of the best code in the easiest to use package. Period. This is my favorite engine for everything. I run every file with it whether I need PTMs or not.

3) If you really want to explore the dark matter of the proteome -- FragPipe. This is shockingly powerful, but the results can be a little intimidating. There is a PD node and I need to spend some time investigating it.

4) I mentioned SearchGUI and PeptideShaker in the Mac/Linux section above. The links are there.

5) MaxQuant -- honestly, this is the gold standard for proteomics analysis. The learning curve can be a little steep. Watch a few summer school videos to get yourself caught up. It has a tendency to do much smarter things behind the scenes than you would believe, but if you don't know that it's doing them, it can be a little stressful. For real -- no particular order. I've got MaxQuant open right now and am using it to massively amplify my results through MS1 libraries.


View Processed Data:

Proteome Discoverer tutorial here.
Download Scaffold Viewer here.
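
If your processed data is one of the plain .tsv tables, you don't strictly need a viewer to get a first look. Below is a rough sketch using only the Python standard library. The function name peek_tsv and the file name in the usage comment are hypothetical, and I'm assuming the first row of the file holds the column names, which may not be true for every tool's output.

```python
import csv
from itertools import islice


def peek_tsv(path, n_rows=5):
    """Return (column_names, first n_rows data rows) from a tab-separated file.

    Assumes the first line is a header row.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])            # empty list if the file is empty
        rows = list(islice(reader, n_rows))  # read at most n_rows more lines
    return header, rows


# Usage (hypothetical file name):
# header, rows = peek_tsv("my_results.tsv")
# print(header)
# for row in rows:
#     print(row)
```

Seeing the column names is usually enough to figure out which tool wrote the file and whether you are looking at PSMs, peptides, or proteins.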



