Saturday, March 28, 2020

DEqMS -- Statistical significance of proteomics data -- adjusting for variable PSM numbers matters!

One of the cool things about genomics being a decade or so ahead of us in many regards is that we can learn from the mistakes they made (...well...in theory...) hmmm.... I'm...hmm...okay...well... start over

can steal a lot of their cool ideas and programs! 

If you're trying to figure out what peptides or proteins are significantly different between your conditions, you're probably using a tool that was designed for transcriptomics data (RNA microarrays or RNA-seq), like:

edgeR or LIMMA.

They work great. We have loads of proof, but there is a huge difference between RNA microarrays and shotgun proteomics.
...besides the fact that RNA doesn't correlate with protein levels....
With an array, you always get the same number of measurements for each target! The old Affy arrays I used would have something like 46,000 RNA probes stuck to them. Each sample would hybridize (or whatever) to those 46,000 probes so, in theory, you're always getting back 46,000 measurements per sample.

Shotgun proteomics isn't like that at all! Some proteins will only get 1 or 2 PSMs, even when you go all out. Even for high-abundance proteins you'll have stochastic effects: you almost never see a set of technical replicates where the protein gets exactly 84 PSMs in every single run.

What if you adjusted for that in some way? Like, you purposely set up your model so that it expects the number of PSMs behind each protein to vary realistically, instead of assuming every protein is measured equally well?
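To see why the PSM count matters, here is a toy simulation in Python. It is only a sketch and definitely not the DEqMS code (DEqMS is an R package built on top of LIMMA). I'm assuming a protein's log2 ratio is the median of its PSM log2 ratios, that every PSM has the same made-up noise level, and that the true ratio is zero. The function names and numbers are all hypothetical.

```python
import random
import statistics


def simulate_protein_estimate(n_psms, psm_sd, rng):
    """Median of n_psms noisy PSM log2 ratios; the true log2 ratio is 0."""
    psm_ratios = [rng.gauss(0.0, psm_sd) for _ in range(n_psms)]
    return statistics.median(psm_ratios)


def variance_by_psm_count(psm_counts, n_proteins=2000, psm_sd=0.5, seed=1):
    """Variance of the protein-level estimate for each PSM count.

    psm_sd is an assumed per-PSM noise level, not a measured value.
    """
    rng = random.Random(seed)
    result = {}
    for n in psm_counts:
        estimates = [
            simulate_protein_estimate(n, psm_sd, rng) for _ in range(n_proteins)
        ]
        result[n] = statistics.pvariance(estimates)
    return result


variances = variance_by_psm_count([1, 2, 5, 20, 84])
for n, v in variances.items():
    print(f"{n:>3} PSMs -> variance of protein log2 ratio: {v:.4f}")
```

The variance drops steadily as the PSM count goes up, so a protein with 1 or 2 PSMs is a much shakier measurement than one with 84. A test that assumes the same prior variance for every protein ignores that. As far as I understand it, letting the expected variance depend on the PSM count is the gap DEqMS is trying to fill.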

I've rambled enough and I can't pretend I can follow the math, but the DEqMS group validates the crap out of this approach using a ton of different types of publicly deposited proteomics data and, across the board, it looks fantastic in every one of them. Here are the conclusions that I am the most excited about:
