Wednesday, November 1, 2017
Quick tutorial -- how to estimate your digestion efficiency in PD 2.x
I got an email from a reader who is having some trouble assessing digestion efficiency with the awesome free Preview node from Protein Metrics. While they get that sorted out, I suggested doing it the way we did before that team gave us all free software that would do it for us. Then I realized that it is even easier in the new Proteome Discoverererers than it was in the past.
Quick, lazy tutorial time!!
You need a couple of things.
A representative FASTA of the most abundant proteins in your samples (you will probably have to swap this out if you run lots of different organisms).
To build it, first process one or more of your representative RAW files in Proteome Discoverer and get a result report.
I just grabbed a quick file. This is some HeLa digest I recently ran on a CE-QEHF (ZipChip 8 minute runs), processed vs. UniProt/SwissProt and sorted by highest number of PSMs.
Alternatively (and probably more validly [wait..."validly" is a word?!?!]) you could sort by protein intensity or abundance, if you've got it from LFQ or a similar quan workflow.
Next you'll want to "Check" these highest hits. I just grabbed everything on the front page of the 46 inch TV someone was throwing away(!!) that I now use as a PC monitor. As long as you select more than 25 proteins, you'll be fine. You can draw a box around all of them and then right-click and choose "Check selected", or you can checkmark each one of them.
Then make a new .FASTA from these proteins: File > Export > To FASTA > Checked Proteins Only.
BOOM! Tiny FASTA file.
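If you would rather build the tiny FASTA outside of PD (say, from a list of accessions you copied out of the protein table), the same idea can be sketched in a few lines of Python. This is only a rough sketch, not anything PD does internally: the function names are made up for illustration, it assumes UniProt-style headers of the form sp|ACCESSION|NAME, and the demo entries are toy sequences, not real proteins.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def accession_of(header):
    """Assumes UniProt-style 'sp|ACC|NAME ...' headers; otherwise
    falls back to the first whitespace-delimited token."""
    tokens = header.split()
    token = tokens[0] if tokens else ""
    parts = token.split("|")
    return parts[1] if len(parts) >= 3 else token


def subset_fasta(text, wanted):
    """Return FASTA text with only the entries whose accession is in `wanted`."""
    wanted = set(wanted)
    out = []
    for header, seq in parse_fasta(text):
        if accession_of(header) in wanted:
            out.append(">" + header)
            # Wrap the sequence at 60 residues per line.
            out.extend(seq[i:i + 60] for i in range(0, len(seq), 60))
    return "\n".join(out) + ("\n" if out else "")


if __name__ == "__main__":
    # Toy entries for illustration only (hypothetical accessions).
    demo = (
        ">sp|P00001|TOY1_HUMAN toy one\nAAAK\nGGGR\n"
        ">sp|P00002|TOY2_HUMAN toy two\nCCCK\n"
    )
    print(subset_fasta(demo, {"P00002"}), end="")
```

For a real file you would read the full FASTA into a string, pass in the accessions of your 25+ most abundant proteins, and write the result to a new .fasta file to import into PD.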
Now you can import that FASTA and use it to search the data whose digestion efficiency you're concerned about.
With a database this small it doesn't matter if you allow your search engine to run with 10 missed cleavages. It still won't take very long.
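To get a feel for why missed cleavages blow up a big database but not a tiny one, here is a minimal in-silico digest sketch. It assumes the usual simplified trypsin rule (cut after K or R unless the next residue is P), ignores length and mass filters that a real search engine would apply, and uses a made-up toy sequence; the function names are hypothetical.

```python
import re


def cleavage_sites(seq):
    """Positions where trypsin would cut: after K or R, unless followed by P.
    A K/R at the very end of the sequence is not a cut site."""
    return [m.end() for m in re.finditer(r"[KR](?!P)", seq) if m.end() < len(seq)]


def digest(seq, max_missed=0):
    """Return every peptide with up to `max_missed` missed cleavages."""
    bounds = [0] + cleavage_sites(seq) + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        # j runs over the possible end boundaries for a peptide starting at i.
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            peptides.append(seq[bounds[i]:bounds[j]])
    return peptides


if __name__ == "__main__":
    toy = "AAKGGRCCKPDDKEE"  # toy sequence, not a real protein
    for n in (0, 1, 2, 10):
        print(n, "missed cleavages allowed:", len(digest(toy, n)), "peptides")
```

The peptide count per protein grows with every extra missed cleavage you allow, so the cost scales with the number of proteins in the FASTA. A few dozen proteins stays cheap; a whole proteome does not.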
Once you have an output report, find the button that lets you plot your data. The icon looks different in PD 2.2 than in the other versions, but it's at the top. Then toggle over to the histogram, choose PSMs, select #missed cleavages and hit Refresh (cut off in this screenshot).
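If you export the PSM list instead of (or in addition to) using the built-in histogram, the same tally can be sketched in Python. This is a rough sketch under some assumptions: the input is plain uppercase peptide sequences (any flanking residues or modification annotations from your export would need stripping first), the simplified trypsin rule of K/R not followed by P applies, and the function names and toy peptides are made up for illustration.

```python
import re
from collections import Counter


def missed_cleavages(peptide):
    """Count internal K/R residues followed by a non-proline residue.
    The C-terminal residue has nothing after it, so it is never counted."""
    return len(re.findall(r"[KR](?=[^P])", peptide))


def missed_cleavage_histogram(peptides):
    """Map number of missed cleavages to number of PSMs."""
    return Counter(missed_cleavages(p) for p in peptides)


def fraction_fully_cleaved(peptides):
    """Fraction of PSMs with zero missed cleavages (0.0 for an empty list)."""
    peptides = list(peptides)
    if not peptides:
        return 0.0
    return missed_cleavage_histogram(peptides)[0] / len(peptides)


if __name__ == "__main__":
    # Toy peptide list, one entry per PSM (not real data).
    psms = ["AAAGK", "AAKGGR", "KKAAR", "AAKPR", "GGGR"]
    hist = missed_cleavage_histogram(psms)
    for n in sorted(hist):
        print(n, "missed:", hist[n], "PSMs")
    print("fully cleaved fraction:", fraction_fully_cleaved(psms))
```

The fraction of PSMs with zero missed cleavages is one simple number to track from run to run, though what counts as "good" will depend on your protocol and sample.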
Now you have a simple representative FASTA and a quick way to use the search engine of your choice to get a picture of your sample digestion efficiency.
Of course -- this is all assuming that digestion efficiency affects every protein in the same way regardless of abundance, so that the most abundant proteins are representative of the rest. That assumption seems reasonably safe to make. To be super thorough you could just run your whole FASTA with 10 missed cleavages, but this could take a really long time....
Thanks to Dr. A.H. for the informed questions and really interesting problem I haven't seen before (and still don't know how to solve) that led me around to putting this together.