Wednesday, November 8, 2017

Peptide Prediction with Abundance (PPA)


Some peptides are invisible to mass spectrometry. One of my favorite pathways is a phosphorylation cascade where the active sites are something like KXS(p)XK. If you try to study this pathway and are smart enough to use trypsin, which cuts after each K, you'll enrich a lot of stuff that is singly charged and/or too small to ever identify.
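To make the problem concrete, here is a minimal sketch of an in silico tryptic digest. The protein sequence is made up for illustration (it just contains a K-A-S-A-K stretch to stand in for the KXSXK motif), the cleavage rule is the simple textbook one (cut after K or R unless the next residue is P, no missed cleavages), and the 6-residue cutoff for "too small" is my own assumption, not a number from any particular search engine.

```python
def tryptic_digest(sequence):
    """Cut after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal peptide when the protein does not end in K/R.
        peptides.append(sequence[start:])
    return peptides


# Hypothetical sequence containing a KXSXK-style motif (K-A-S-A-K).
protein = "MTEDLVKASAKGGLLQEWR"
MIN_LENGTH = 6  # assumed cutoff for peptides too short to identify

for peptide in tryptic_digest(protein):
    status = "ok" if len(peptide) >= MIN_LENGTH else "too small"
    print(f"{peptide:<10} {len(peptide):>2} residues  {status}")
```

The peptide carrying the serine comes out as ASAK, four residues long, which is exactly the kind of fragment that never gets identified.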

X!Tandem at the GPM has a really neat function: instead of getting percent coverage of your total protein, you can choose to get percent coverage of only the part of the protein that should be visible to the instrument (singly charged and super small peptides don't count against you).  It's cool to run it at least once to see that you really have been getting 100% coverage of BSA in every sample you've run, for years and years.
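The difference between the two coverage numbers can be sketched in a few lines. This is not how the GPM actually computes it; it is a simplified illustration that assumes fully tryptic peptides, treats anything shorter than a hypothetical 6-residue cutoff as invisible, and ignores charge state entirely. The sequence and the list of identified peptides are invented.

```python
def tryptic_digest(sequence):
    """Cut after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def coverage(sequence, identified, min_length=6):
    """Return (total coverage, coverage of the 'visible' residues only)."""
    peptides = tryptic_digest(sequence)
    # Residues sitting in peptides long enough to be detectable (assumed cutoff).
    visible_residues = sum(len(p) for p in peptides if len(p) >= min_length)
    found_all = sum(len(p) for p in peptides if p in identified)
    found_visible = sum(
        len(p) for p in peptides if p in identified and len(p) >= min_length
    )
    total = found_all / len(sequence)
    visible = found_visible / visible_residues if visible_residues else 0.0
    return total, visible


# Hypothetical protein and hypothetical search results.
protein = "MTEDLVKASAKGGLLQEWR"
identified = {"MTEDLVK", "GGLLQEWR"}

total, visible = coverage(protein, identified)
print(f"total coverage:   {total:.0%}")
print(f"visible coverage: {visible:.0%}")
```

Here the four-residue ASAK peptide drags total coverage down to about 79%, while coverage of what could realistically be seen is 100%.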

Check this out, though!


Please keep in mind that this tool, PPA, is far more powerful than what I'm about to use it for, and I'm going to return to use its more advanced functionality in the future, for sure. However, it just did a really neat trick and that's why I'm talking about it here.

PPA is a fancy machine learning algorithm that can figure out how likely your peptides, including modified versions of them, are to show up in your MS/MS analysis. The authors validate it with some really complex datasets using files from several instruments. You can load in your theoretical databases and your experimental data, and that's all the advanced stuff.

You can use PPA on the online portal here or you can download it to run it locally if you're good at Perl.

The neat trick that I'm very impressed by is that you can just give it a FASTA file and, using 15 known properties of peptides in general, it will predict how likely each individual peptide is to be detected by MS/MS, on a scale of 0 to 1 (with 1 being very good).

And...this could be small sample size...but I've got some data on my desktop that I've been trying to help troubleshoot in my off hours. The problem has been the decrease in total % coverage of the protein of interest as the experiment has progressed...and PPA is surprisingly predictive of the peptides that are still around late in the experiment. The authors of this software have loftier goals for this algorithm, but seeing it do something simple that matches experimental data really well lends it a ton of credibility in my mind.
