Friday, March 20, 2015

Let's do some quality control!

Thanks to my dumb body picking up a stupid virus, I'm not going ramping up in the mountains this weekend, so I figure it's time to work on a few big blog projects I've been wanting to get to.

This one I've been leading up to for a while, and I've gotten some emails from you guys about it along the way.  They go something like this: "Hey Captain Talks-Too-Much, you are always talking about quality control, but you never tell us what you run. P.S. Pugs are stupid."

Since you're all so nice about it, I should tell you.  For discovery proteomics my favorite QC is the PRTC peptides spiked into the HeLa digest.  For the Q Exactives, Orbitrap Elite and Orbitrap Fusion with nanoflow I run 200ng of HeLa with 100fmol of PRTC spiked in.

If you buy the 50fmol/uL PRTC you can add 450uL of 0.1% FA to it and then just put 200uL of that into the 100ng/uL HeLa vial.  Inject 2uL and you are there.  You have 100 QC injections for < $2 each.  If you keep it at -20C it's stable for something like 6 months.  You can also buy the bigger vials, aliquot them out and keep the aliquots at -80C.  I think you can get it down to about $0.50 per run that way.
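If you want to sanity check the dilution math, here is a minimal Python sketch. It assumes the diluted PRTC ends up at 50fmol/uL (which is what 100fmol on column from a 2uL injection implies) and that the HeLa vial holds 200uL at 100ng/uL once the PRTC mix goes in. The constant and function names are made up for illustration.

```python
# Working concentrations in the final QC vial.
HELA_NG_PER_UL = 100.0      # from the post: 100ng/uL HeLa
PRTC_FMOL_PER_UL = 50.0     # assumption: PRTC concentration after dilution
VIAL_VOLUME_UL = 200.0      # from the post: 200uL goes into the HeLa vial
INJECTION_UL = 2.0          # from the post: inject 2uL


def on_column_amount(conc_per_ul, injection_ul):
    """Amount loaded on column for a given concentration and injection volume."""
    return conc_per_ul * injection_ul


def injections_per_vial(vial_volume_ul, injection_ul):
    """Whole injections available, ignoring dead volume in the vial."""
    return int(vial_volume_ul // injection_ul)


if __name__ == "__main__":
    print("HeLa on column (ng):", on_column_amount(HELA_NG_PER_UL, INJECTION_UL))
    print("PRTC on column (fmol):", on_column_amount(PRTC_FMOL_PER_UL, INJECTION_UL))
    print("Injections per vial:", injections_per_vial(VIAL_VOLUME_UL, INJECTION_UL))
```

In real life you won't get every last microliter out of the vial, so treat the injection count as an upper bound.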

Okay, so you're asking now, "Hey, Smelly, why are you using this one!?! Why not just one or the other?" Excellent question (and I've got a cold, I'm supposed to smell bad)!  The HeLa digest gives you a nice quick metric.  Search it real quick and you have a good feel for where you are.  Is your number of peptides the same as it was at the last PM? When the instrument was new?  Bingo!  If it's down, then you can extract the PRTC peptides.  You'll always see them, they are equimolar and you know where they ought to elute.

Best of all?  I have reference files!  I stole this method from work being done by Tara Schroeder and Lani Cardasis, who use these to QC the instruments in their labs!  And I stole some of their RAW files!  So I know what I ought to be looking at.  If I load up the exact same method (ask me if you want it, I'll send it to you) on a QE, run the same gradient and don't get, I don't know, 16,000 unique peptides, then I know something is wrong.  I can line up the two files and see what happened.

This is the QC gradient we go with.  And here are the method parameters:

Why is there a T-SIM in there?  So you can test your isolation with multiplex SIM!  On select PRTCs!

This is cool because if you are having problems, you can quickly see if it is an isolation issue.  At 100fmol you're likely going to end up fragmenting the PRTCs.  So you will have a measurement of the PRTCs isolated and fragmented vs. just isolated.  Smart, right!?!!?

If you need to diagnose things, you can easily build 2 XIC layouts in Xcalibur.  This will allow you to see your PRTCs and peak shapes.  If your peaks start tailing and looking gross, then you know your column is getting old.  If your IDs are low and you are missing the most hydrophilic or hydrophobic PRTCs, then you know that you aren't trapping right or that your pump isn't putting out enough B.  Here is a table of what the extracted data should look like:
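On the peak shape point above: if you'd rather put a number on tailing than eyeball it, one common chromatography metric is the asymmetry factor at 10% of peak height (back half-width divided by front half-width, so roughly 1 for a symmetric peak and bigger as the tail grows). This is a rough Python sketch, not anything from the reference method. It assumes you have already exported one PRTC XIC as two plain lists (retention times and intensities) with a single peak in it; the function name and the example trace are invented.

```python
def asymmetry_factor(times, intensities, height_fraction=0.10):
    """Back half-width / front half-width at a fraction of apex height."""
    apex = max(range(len(intensities)), key=intensities.__getitem__)
    if intensities[apex] <= 0:
        raise ValueError("no signal in this trace")
    threshold = intensities[apex] * height_fraction

    def crossing(indices):
        # Walk away from the apex until the trace drops to the threshold,
        # then linearly interpolate the time at which it crossed.
        prev = apex
        for i in indices:
            if intensities[i] <= threshold:
                t0, t1 = times[prev], times[i]
                y0, y1 = intensities[prev], intensities[i]
                return t0 + (threshold - y0) * (t1 - t0) / (y1 - y0)
            prev = i
        raise ValueError("trace never falls to the threshold on one side")

    front = crossing(range(apex - 1, -1, -1))
    back = crossing(range(apex + 1, len(times)))
    return (back - times[apex]) / (times[apex] - front)


if __name__ == "__main__":
    # Invented trace: sharp front, long tail.
    rt = [0, 1, 2, 3, 4, 5, 6]
    signal = [0, 10, 8, 6, 4, 2, 0]
    print("Asymmetry factor:", asymmetry_factor(rt, signal))
```

Track the value for the same PRTC peptide over time on the same column. What counts as "gross" depends on your setup, so compare against your own baseline rather than a fixed cutoff.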

What else?  Okay, what might this look like from the processed level?

DISCLAIMERS (Critical, please don't get me in trouble with people who do this stuff professionally):

1) I'm only leaving up these protein/peptide/PSM numbers (below) as an example to show that there is always run to run variation, even if you run these back to back.

2) Please don't consider these numbers as what you should be getting in all your runs. Your column conditions, emitter age, buffer quality, ion transfer tube cleanliness, background contaminant ions from your deodorant (no joke), the compounds leaching out of your centrifuge tube, the pipettor you picked up the formic acid with, vacuum quality level (I've seen this for real), the amount of direct airflow in your room, the construction they're doing on the other side of the building, who prepped your sample and, especially, how, etc., etc., etc., can all affect your peptide ID numbers. I just looked for a link but couldn't find it: there was an old study that showed how 0.1% formic acid in your samples sitting on the autosampler caused a noticeable decrease in the detectable peptides when samples were queued up for a few days....not sure if that reproduced.

3) Even under perfect conditions (new LC, new column, new instrument), run-to-run variation exists in back-to-back samples.

Here are some representative processed runs.  Clean instrument (QE Plus), new column, clean LC, perfect spray stability.  Honestly, this is better data than I normally get, but you get the point.

You should establish yourself a baseline for when your instrument is ideal and go from there. A few years ago at ASMS there was some serious controversy regarding whether processed data should be used for QC. I use it to troubleshoot some things, but I'd much rather see the peak shape and intensity!

One thing I can use it for is checking dynamic exclusion, and you can get a feel for that by paying attention to the final column (or row?...whatever it is...), the % Unique.

This is the number of unique peptide groups divided by the number of PSMs.  Why is this so important?  If the unique % goes down then you, my friend, are oversampling.  You need to work with your peak widths and dynamic exclusion 'cause you are fragmenting the same peptide over and over.
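The % Unique math is simple enough to script if you are tracking it across runs. Here is a minimal Python sketch of the ratio described above, plus a check against your own baseline. The function names, the example counts and the 5 point drop threshold are all invented for illustration; pick a threshold that makes sense for your instrument.

```python
def percent_unique(peptide_groups, psms):
    """Unique peptide groups as a percentage of PSMs."""
    if psms <= 0:
        raise ValueError("need at least one PSM")
    return 100.0 * peptide_groups / psms


def oversampling_suspected(current_pct, baseline_pct, max_drop_points=5.0):
    """True if % Unique fell more than max_drop_points below the baseline.

    max_drop_points is a hypothetical threshold, not a recommended value.
    """
    return (baseline_pct - current_pct) > max_drop_points


if __name__ == "__main__":
    # Invented counts, only to show the calculation.
    baseline = percent_unique(peptide_groups=16000, psms=20000)
    today = percent_unique(peptide_groups=14000, psms=20000)
    print(f"baseline {baseline:.1f}%, today {today:.1f}%")
    print("oversampling suspected:", oversampling_suspected(today, baseline))
```

If the flag trips, that is the cue to go look at your peak widths and dynamic exclusion settings, not a diagnosis by itself.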

By the way, you can apply this method to other instruments as well. It is always easiest to explain ideas when you start with a QE, though!

Shoutout to my good friends Tara and Lani and Josh Nicklay 'cause I didn't do any of this.  They blew my mind with this at a meeting last summer and I've been using it ever since.  I run this at almost every lab I visit and I meant to tell you guys about it a while ago.


  1. Dear Ben.

    Thank you for sharing.

    By the way, would you by any chance have a method (FullMS-ddMS2 plus m/z-targeted for the spiked peptides) ready for an Orbitrap Elite?

    Thank you and cheers,

  2. Davi,
    At this point I don't have this exact method for the Orbitrap Elite. Someone else asked for it as well. I'll see if I can write up a good one. The problem would be that we wouldn't have historical data for it...

  3. Hello Ben.
    You wrote "if I'm within 15% of these numbers I think I've got a good run".
    I am a little confused by this. Do you mean "within a range of (plus or) minus 15% of these figures", i.e., between 85 and 100%, or do you mean "above 15%"?
    This would change your meaning considerably, but as values in proteomics may change a lot depending on the conditions (and some instruments are known to perform in this range of 15% / 500 proteins), I can't decide on either meaning.
    Would you mind clarifying?
