Thursday, December 14, 2017

A simple workflow to diagnose some sample/instrument quality issues using PD!

A couple people have asked me to look into this over the years and I thought I'd finally give it a try.

Here it goes!

Sometimes you just need a quick snapshot that will tell you if the samples that are running on your instrument for the next 2 weeks are worth your time. Could we just build a quick Proteome Discoverer template that would allow you to snapshot that first fraction to give you confidence that everything is okay?

To keep it simple for the first one I'm going to say these are the requirements:

1) A histogram of the relative mass discrepancy at the MS1 level
2) A measurement of the relative number of missed cleavages for determining your enzymatic digestion efficiency
3) A measurement of your alkylation efficiency
4) Complete data analysis in under 10 minutes on a normal desktop.
5) Must use either Proteome Discoverer normal nodes or IMP-PD (the free Proteome Discoverer version)

#1 is super easy. #2 requires some serious computational power to do correctly on a modern complete RAW file and will require a bit of data analysis reduction.

If you are working on human samples, I'll walk through it. I'll try to post the FASTA and templates somewhere here a bit later (out of time).

If you are working on something else -- tough luck. You'll have to do this yourself.

Step 1) Generate yourself a good limited FASTA. Something small enough to allow you to perform very large data permutations rapidly, but large enough to get a good picture of your data.

To get this, do a normal search of a representative data file. Feel free to use the default Proteome Discoverer templates. The only thing we're doing here is finding the most abundant proteins in your data. Fractionation may complicate this, but I ain't never seen a human offline fraction that didn't have an albumin or Titin peptide in it. I'd also use the cRAP database, but it isn't super important at this step as long as you do use it here somewhere.

I threw in Minora, but don't feel like you have to here. If you are using IMP-PD, use MS Amanda, Elutator, and PeakJuggler. Use normal tolerances for your search (10 ppm/0.02 Da for FT/FT and 10 ppm/0.6 Da for FT/IT).
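
If the ppm side of those tolerances feels abstract, here is a minimal Python sketch of what a ppm window works out to in absolute units. The function names and the m/z values are my own illustration, not anything from PD or from this data file.

```python
def ppm_to_da(mz, ppm):
    """Half-width of a ppm tolerance window, in the same units as mz."""
    return mz * ppm / 1e6


def tolerance_window(mz, ppm):
    """Return the (low, high) bounds for a +/- ppm tolerance around mz."""
    delta = ppm_to_da(mz, ppm)
    return mz - delta, mz + delta


# Illustrative precursor m/z values (made up, not taken from the post's data).
for mz in (400.0, 800.0, 1200.0):
    low, high = tolerance_window(mz, 10)
    print(f"m/z {mz:7.1f}: +/-10 ppm is +/-{ppm_to_da(mz, 10):.4f} ({low:.4f} to {high:.4f})")
```

The point is only that a ppm tolerance scales with the measured value, while a Da tolerance (like the 0.02 or 0.6 on the fragment side) stays fixed.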

Same thing for the consensus -- something normalish. I'd throw in the post-processing nodes as well as the ProteinMarker node so that you can clearly distinguish your contaminants from your matches.

Step 2: Run this full search.

Let's find the most abundant proteins and make a FASTA. You can do this a couple of different ways. I recommended using Minora and/or PeakJuggler so that you can sort your proteins by XIC abundance.

Interestingly, the most abundant protein is a cRAP entry. I'm starting to remember why I was asked to troubleshoot this file a few years ago and why I marked it "keep for example purposes".

Step 3: Make a small FASTA! What I'm going to do is filter out the contaminants and make a FASTA of the 150 most abundant proteins. Hover over the protein table with your mouse and use the down arrow key on your keyboard to scroll through the rows. Once you have the area covered, right-click and choose "check all selected in this table", then go to File > Export > Fasta > Checked only.
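
If you'd rather do this step outside the PD interface, the same idea can be sketched in a few lines of Python. This is not what PD does internally. It assumes you have exported a table of protein accessions with XIC abundances and a list of contaminant accessions, and that your FASTA headers are UniProt-style (`sp|ACCESSION|NAME`). The accessions, abundances, and sequences below are invented for illustration.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def accession_of(header):
    """'sp|A00001|NAME desc' -> 'A00001'; otherwise the first token."""
    tokens = header.split()
    first = tokens[0] if tokens else ""
    parts = first.split("|")
    return parts[1] if len(parts) >= 3 else first


def top_n_fasta(fasta_text, abundances, contaminants, n=150):
    """Keep the n most abundant non-contaminant proteins as FASTA text."""
    ranked = sorted(
        (acc for acc in abundances if acc not in contaminants),
        key=lambda acc: abundances[acc],
        reverse=True,
    )
    keep = set(ranked[:n])
    entries = [
        f">{header}\n{seq}"
        for header, seq in parse_fasta(fasta_text)
        if accession_of(header) in keep
    ]
    return "\n".join(entries) + ("\n" if entries else "")


# Toy inputs: hypothetical accessions, abundances, and sequences.
toy_fasta = """>sp|A00001|PROTA_HUMAN toy protein A
MKWVTFISLLK
>sp|A00002|PROTB_HUMAN toy protein B
MAGRPEPTIDEK
>sp|C00001|CONT_BOVIN toy contaminant
MKCCLLR
"""
toy_abundance = {"A00001": 5.0e8, "A00002": 1.0e7, "C00001": 9.0e9}
print(top_n_fasta(toy_fasta, toy_abundance, contaminants={"C00001"}, n=1))
```

Contaminants are dropped before ranking, so a cRAP entry at the top of the abundance list doesn't eat one of your 150 slots.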

Step 4: Process with this crazy FASTA! Now you have a FASTA to work with! Import it into PD through the Administration tab. Once it's in, make a crazy method.

I'm allowing up to 10 missed cleavages, with 100 ppm MS1 tolerance and 0.6 Da MS/MS tolerance for FT/FT (for ion trap, maybe try 2 Da?). Please note -- this database is likely too small for Percolator to work well on. I've turned it off here and am relying on XCorr alone (Fixed Value PSM Validator).

Even with 10 missed cleavages, my old 8 core Proteome Destroyer completed the file in 4 minutes.
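
To get a feel for why 10 missed cleavages is only sane against a tiny FASTA, here is a rough Python sketch of an in silico tryptic digest. It uses a simplified trypsin rule (cut after K or R unless the next residue is P) with no peptide length limits, which is my assumption and not necessarily how the search engine handles it. The sequence is a toy.

```python
def cleavage_pieces(sequence):
    """Split a protein into fully cleaved tryptic pieces (simplified rule)."""
    pieces, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])
    return pieces


def digest(sequence, max_missed):
    """All peptides made of 1 to (max_missed + 1) consecutive pieces."""
    pieces = cleavage_pieces(sequence)
    peptides = set()
    for i in range(len(pieces)):
        last = min(i + max_missed, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.add("".join(pieces[i:j + 1]))
    return peptides


# Toy sequence (invented) to show how the candidate list grows.
toy_protein = "AAKGGRCCKDDKEEPRFFKGG"
for allowed in (0, 2, 10):
    print(allowed, "missed cleavages:", len(digest(toy_protein, allowed)), "peptides")
```

Every extra allowed missed cleavage adds another layer of candidate peptides per protein, which is why this only finishes in minutes when the FASTA has 150 entries instead of 20,000.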

Step 5: Evaluate the data quality: Let's check the deltaM. This is the pic at the very top of this post and it looks kinda bad. However, this is mostly a histogram binning issue. Change the number of bins to 100 and it looks much better:
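
If you ever re-plot this outside of PD, the binning effect is easy to reproduce. Below is a small Python sketch (standard library only) that computes the relative mass error in ppm and bins it over a fixed range. The function names, the +/-100 ppm plotting range, and the example values are assumptions for illustration, not numbers from this file.

```python
def mass_error_ppm(observed, theoretical):
    """Relative mass discrepancy in ppm."""
    return (observed - theoretical) / theoretical * 1e6


def histogram(values, n_bins, low, high):
    """Count values into n_bins equal-width bins spanning [low, high]."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for value in values:
        if low <= value <= high:
            # Clamp so a value exactly equal to `high` lands in the last bin.
            index = min(int((value - low) / width), n_bins - 1)
            counts[index] += 1
    return counts


# Invented ppm errors clustered near zero, binned over the search window.
errors = [-2.0, -1.0, 0.0, 1.0, 2.0]
coarse = histogram(errors, 10, -100.0, 100.0)
fine = histogram(errors, 100, -100.0, 100.0)
print("10 bins, non-empty: ", [c for c in coarse if c])
print("100 bins, non-empty:", [c for c in fine if c])
```

With a 100 ppm search window and only a handful of bins, everything near zero gets lumped into one or two bars, so the histogram tells you very little about how the errors are actually distributed. More bins gives you the real shape.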

What about missed cleavages?

A few -- but it looks like you'd capture well over 90% if you used 2 missed cleavages on this data. I'd say the digestion was okay.
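
That "how much would 2 missed cleavages capture" estimate can also be computed from an exported peptide list. Here is a hedged Python sketch: it counts internal K/R residues not followed by P as missed cleavages (a simplified trypsin rule I'm assuming), and the peptide sequences are invented.

```python
from collections import Counter


def missed_cleavages(peptide):
    """Count internal K/R residues that are not followed by P."""
    return sum(
        1
        for i, residue in enumerate(peptide[:-1])
        if residue in "KR" and peptide[i + 1] != "P"
    )


def fraction_within(peptides, max_missed):
    """Fraction of peptides with at most max_missed missed cleavages."""
    if not peptides:
        return 0.0
    counts = Counter(missed_cleavages(p) for p in peptides)
    within = sum(n for missed, n in counts.items() if missed <= max_missed)
    return within / len(peptides)


# Invented peptide sequences standing in for an exported result table.
toy_peptides = ["AAK", "AAKGGR", "AKGRCKDDR", "GGR"]
print("captured with <= 2 missed cleavages:", fraction_within(toy_peptides, 2))
```

If that fraction comes out well above 0.9 on your own data, a normal 2 missed cleavage search is probably not leaving much on the table.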

Alkylation output:

In order to see your relative alkylation efficiency (please keep in mind I'm assuming iodoacetamide), you will need to make the alkylation (carbamidomethyl on cysteine) a dynamic modification rather than a static one.

In your output report you can see your relative alkylation efficiency by applying this filter:

Then go to and plot this data:

In this output we're looking at around 73% alkylation efficiency. A quick look shows me that about 49 of these peptides are from cRAP. Even if you take those out of consideration (which only makes sense for peptides introduced late in the process), this still is pretty low. I'd check the expiration date on this iodoacetamide, or see if it has spent a lot of time exposed to direct sunshine.
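
For anyone who wants to put a number on this from an exported table, here is one possible way to define it in Python: alkylated cysteines divided by all cysteines in the identified peptides, with an option to leave contaminants out. The per-residue definition, the tuple layout, and the example peptides are my assumptions, and this is not necessarily how the 73% above was arrived at.

```python
def alkylation_efficiency(peptides, include_contaminants=True):
    """Fraction of cysteine residues carrying carbamidomethyl.

    Each peptide is a tuple: (sequence, n_carbamidomethyl, is_contaminant).
    Returns None when no cysteines are present.
    """
    total_cys = 0
    alkylated = 0
    for sequence, n_carbamidomethyl, is_contaminant in peptides:
        if is_contaminant and not include_contaminants:
            continue
        n_cys = sequence.count("C")
        total_cys += n_cys
        # Guard against a modification count larger than the cysteine count.
        alkylated += min(n_carbamidomethyl, n_cys)
    return alkylated / total_cys if total_cys else None


# Invented peptides: (sequence, carbamidomethyl count, from contaminant DB?)
toy_peptides = [
    ("ACDK", 1, False),
    ("CCR", 1, False),
    ("GGK", 0, False),
    ("LCK", 0, True),
]
print("all peptides:     ", alkylation_efficiency(toy_peptides))
print("without cRAP hits:", alkylation_efficiency(toy_peptides, include_contaminants=False))
```

Running it both ways shows how much of the shortfall is coming from contaminants versus your actual sample.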

This is an evolving project (there is a lot more we can do here) but I'm going to stop here for now.

1 comment:

  1. "there is a lot more we can do here". Please, keep it up. Great post!! It is terrible to find out months later that your data is useless because IAA was not working.