Saturday, September 24, 2016

How does the Precursor Ion Area Detector node work?

I'm procrastinating this morning and just when I was running out of excuses for not finishing an ongoing bathroom remodel, I realized there were a bunch of unapproved questions/comments on the blog!  This is the last one. After writing far too many lines in the little comment box about why the NIST antibody is so much better than the commercial sources that have been around, I didn't want to tackle this one the same way.

How does the Precursor Ion Area Detector node work? And a reference?

The reference might surprise you!

You can direct link to it here, and I think it's open access.  Look, I'm gonna give Q-TOFs a hard time. I've only had one in all my career and it was, on its best day, a turd sandwich not very good. [Completely redacted rant about Ben's hatred for Q-TOFs and sarcastic statements about their many uses as well as recently acquired facts regarding their value in scrap metal components].  Wow, I feel much better about publishing this post now -- and it is much much shorter!

Remember, though, that there was a day when this was the cutting edge and people were just as smart back then as they are now, and they did good research despite the limits of their instrumentation!  This paper is such a study. It is definitely intended to be a paper showing off a new (at the time) fragmentation technology, but in it they set the framework that most label free quantification is based on -- or, at least, influenced by.

The idea -- take the high resolution (here 10,000) extracted intensity of the 3 most intense peptides from a protein, each measured at its own maximum intensity, and you get a number that is very strongly correlated with the abundance of that protein.

This is the Proteome Discoverer interpretation  -- you're ticking along and identifying peptides and you assign each PSM (peptide spectral match) the intensity it had in the MS1 event that it was selected from.  When you compile the PSMs into the peptide, if there is more than one PSM, the peptide is assigned the intensity of the highest PSM. When the peptides are pulled into the protein or protein group, the intensities of the (up to) 3 most intense peptides (the number is adjustable in PD 2.1) are averaged into the protein area.

If you have a protein that has only one PSM, this is easy. The "area" of that protein is the intensity of the PSM.
If you have 3 PSMs that all go to one peptide and into one protein, still easy. The "area" of the protein is the intensity of the most intense PSM.
If you have 3 PSMs for each of 3 peptides, the protein "area" will be the average of the most intense PSM from each peptide.
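
The three cases above can be sketched in a few lines of Python. To be clear, this is just my reading of the description in this post, not the actual Proteome Discoverer code: the function name `protein_areas`, the `(protein, peptide, intensity)` tuple layout, and the `top_n` parameter are all made up for illustration, and it ignores things like peptides shared between protein groups.

```python
from collections import defaultdict


def protein_areas(psms, top_n=3):
    """Hypothetical sketch: psms is an iterable of
    (protein, peptide, ms1_intensity) tuples."""
    # Step 1: each peptide takes the intensity of its most intense PSM.
    peptide_intensity = defaultdict(float)
    for protein, peptide, intensity in psms:
        key = (protein, peptide)
        peptide_intensity[key] = max(peptide_intensity[key], intensity)

    # Step 2: gather the peptide intensities that belong to each protein.
    per_protein = defaultdict(list)
    for (protein, _peptide), intensity in peptide_intensity.items():
        per_protein[protein].append(intensity)

    # Step 3: average the (up to) top_n most intense peptides per protein.
    areas = {}
    for protein, intensities in per_protein.items():
        top = sorted(intensities, reverse=True)[:top_n]
        areas[protein] = sum(top) / len(top)
    return areas


# Made-up intensities covering the three cases in the list above.
psms = [
    ("P1", "PEPTIDEA", 4.0e6),                             # one PSM
    ("P2", "PEPTIDEB", 1.0e6), ("P2", "PEPTIDEB", 3.0e6),  # 3 PSMs, 1 peptide
    ("P2", "PEPTIDEB", 2.0e6),
    ("P3", "PEPTIDEC", 6.0e6), ("P3", "PEPTIDEC", 1.0e6),  # 3 peptides
    ("P3", "PEPTIDED", 3.0e6),
    ("P3", "PEPTIDEE", 9.0e6),
]
areas = protein_areas(psms)
```

With these toy numbers, P1 gets the intensity of its single PSM, P2 gets the intensity of its most intense PSM, and P3 gets the average of the best PSM from each of its three peptides.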

Important note here!  The protein "areas" will not always be calculated from the same peptides. If you've got something where you had 50-60% sequence coverage and have 200 PSMs, chances are it won't be the same peptides from run to run. But, seriously, this totally works at the protein level. You are going to need to go to the PSM or peptide level intensities if you want to say, for example, how this modified peptide changes from run to run, and that requires a good bit of extra work.

Michael Bereman, who knows a little something about protein quantification (SProCoP!), told me it worked, if I remember correctly, "surprisingly well". I use it in virtually every sample I process in PD. It has never once hurt me to have that extra information!

Are there better ways of getting relative quantification of proteins and peptides? Sure!  And these algorithms are coming -- and are going to absolutely change EVERYTHING about how we do proteomics -- Minora, PeakJuggler, and IonStar are all getting ready for prime time and are going to usher in something I think will finally be worthy of the title "next gen" proteomics by allowing us to finally see all the stuff in Orbitrap data that we've never seen before. Your Orbitrap, right now, is far better than you think it is.

1 comment:

  1. Howdy Ben,

    Thanks for the information!

    I wanted to know how PD handles technical replicates in protein quantification. I have samples that I've run in triplicate and grouped together by study factors. For example, sample 1 is actually the 3 MS runs, and they are all grouped together as "Sample 1". I get an area for proteins for sample group 1, and I can see in which replicates the protein was identified, i.e., Sample1_1 yes, Sample1_2 no, Sample1_3 yes. I'm just not sure what peptides it uses for quantification. Could it use three peptides from one replicate or does it use a single peptide from each replicate that finds the protein?