Wednesday, April 5, 2017
What is a significant change in protein abundance?
I'll be honest, this one hurts my head more than a little, but I'm trying to get in front of it for a number of reasons. As I'm sure you're aware, very few people seem interested in just getting a list of peptide IDs anymore. Everyone wants a list of identified proteins with quantification values -- and probably quantification on individual modified peptides. None of our jobs are going to get easier as this trend develops.
It gets worse. As soon as you start talking about quantification -- people are gonna start asking weird stuff about adjusting Greek carrots with pea-values.
If, like me, this was the first thing you thought they said, maybe it's time for us to read some recent papers like this one!
This paper seems less intimidating to me than some of the others, primarily because it is short. There are plenty of Greek letters and formulas in it. It primarily deals with iTRAQ, but it introduces a lot of useful stats terms people have been throwing around, like LIMMA.
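For anyone wondering what "adjusting p-values" can look like in practice, here is a minimal sketch of one common approach, the Benjamini-Hochberg correction. To be clear, this is not taken from the paper and it is not LIMMA; the function name and the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    # Indices of the p-values, sorted from smallest to largest
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values never increase as the raw p-values get smaller
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for five hypothetical proteins (assumption, not real data)
p_values = [0.01, 0.04, 0.03, 0.005, 0.20]
adjusted = benjamini_hochberg(p_values)
for p, q in zip(p_values, adjusted):
    print(f"raw p = {p:.3f}  adjusted p = {q:.3f}")
```

The point of the adjustment is that when you test hundreds or thousands of proteins at once, some raw p-values will look small by chance alone, so each one gets scaled by how many tests were run and where it ranks among them.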
From the label free quan perspective (XIC-based, rather than spectral counting -- a whole 'nother topic) we have to go just a little bit further back in time to get some Open Access perspective on what everyone is talking about. There are others, but this is the first reference I'd send someone who asked about this topic.
A big reason for this is that this paper details how the heck to deal with fractionated samples in label free quan. This is not without challenges, because fractions overlap quite a bit: a peptide that comes off near the edge of fractions 5 and 6 can easily be present in both. Not to mention that a protein might easily have a peptide present in every single fraction. This is the earliest paper I'm aware of that deals with this head-on. It isn't trivial....
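To make the fraction overlap problem concrete, here is a deliberately naive sketch: add up a peptide's XIC intensity across every fraction it shows up in before comparing samples. This is an assumption for illustration only and not necessarily what the paper does; the sample names, peptide sequence, and intensities are all made up.

```python
import math
from collections import defaultdict


def sum_across_fractions(rows):
    """Collapse (sample, fraction, peptide, intensity) rows into
    one summed intensity per (sample, peptide) pair."""
    totals = defaultdict(float)
    for sample, _fraction, peptide, intensity in rows:
        totals[(sample, peptide)] += intensity
    return dict(totals)


# Hypothetical peptide that straddles fractions 5 and 6 in both samples
rows = [
    ("control", 5, "PEPTIDEK", 3.0e6),
    ("control", 6, "PEPTIDEK", 1.0e6),
    ("treated", 5, "PEPTIDEK", 6.0e6),
    ("treated", 6, "PEPTIDEK", 2.0e6),
]

totals = sum_across_fractions(rows)
log2_ratio = math.log2(
    totals[("treated", "PEPTIDEK")] / totals[("control", "PEPTIDEK")]
)
print(f"log2(treated / control) = {log2_ratio:.2f}")
```

If you only looked at fraction 5 or only at fraction 6 you would be working with part of the signal, and if the peptide split differently between fractions from run to run, the single-fraction ratios would wander even when the total did not.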
One more -- because we seriously can't even talk about statistics in any sense these days without talking about R packages. Statistics is what R does best, and if you are willing to think something like "hey, I bet the nice people who wrote this cool free stuff knew what they were talking about," this paper is the first step of what a lot of the more recent proteomics R quan stuff is built upon.
Honestly, the paper is a little low on content -- it introduces the concept and this extremely useful website! (http://msstats.org/)
At this site you'll find a breakdown of different statistical concepts, training data sets and instruction manuals -- mostly for the software -- but that will explain a lot of the concepts and why they're important.
Okay, so I don't know if any of this is useful at all. Honestly, I just type stuff. What I do know is that proteomics quan statistics is something we're all going to be hearing more and more about as this field develops and as more of the commercial software packages integrate these things -- and these papers/links are things I've found useful.
In case you are concerned that this blog and I are becoming far too serious, here is an angry Bugg puppy dressed as a flying monkey....
...I actually have a good picture of exactly when he bit me for this....