Monday, March 4, 2019

It's finally time to discuss MS1-based libraries! And how to use them for anything/everything!

Time for a story! Okay -- not a story -- but something that I think is going to surprise a lot of people here in 'Murica.

Just about everyone is using MS1-based libraries, except for you. You know how I know this? Because it surprised the holy Heck out of me.

Actually -- let's start with a study that I'm a little obsessed with that came from Yale, I think.

This paper is a big deal for a lot of reasons. One is that it's really hard to get human brains. My experience so far is that the proper course of action is a series of begging, trades, justifications, and begging. All this beats going out and getting a bunch of brains yourself, I assume.

This group got material from a bunch of different areas of a bunch of different brains. However -- even "a lot" of material from the stingy brain storage people generally isn't very much. And you aren't getting a paper in Nature Neuroscience with low coverage.


Okay -- so here was the mistake. This group didn't know that this is supposed to be a European secret strategy for kicking everyone else's asses in proteins/peptides identified per run. They spelled it all out. You're supposed to be vague about it and use the terms "match between runs" a lot.

I'm being facetious, of course. Just because I didn't know that everyone else is doing proteomics this way, doesn't mean anyone was hiding it! It just means you have to read a lot.

In fact, you can completely read about this strategy in this great protocol update on MaxQuant from a couple years ago -- it's described in painstaking detail....

Here's the idea. You generate a pool of all the samples you want to work with and you fractionate the holy heck out of it -- then you lie to your software and tell it that you didn't fractionate it.

Then you run all the samples you actually care about quantifying as single shots. It's very important that you use similar chromatography for your single shots and for your fractions.

Then feature identification matches up the stuff that you didn't fragment and ID in your single shot samples with the stuff you did fragment and ID in your deep fractionated "library" samples.
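The matching step above can be sketched in a few lines of toy Python. Everything here -- the tolerances, the made-up peptides, the function name -- is illustrative, not what Minora or MaxQuant actually does under the hood:

```python
# Toy sketch of "match between runs": transfer peptide IDs from a deep-fractionated
# library to unidentified MS1 features in a single-shot run.
# Tolerances and peptide names are illustrative only.

PPM_TOL = 5.0   # MS1 mass tolerance in ppm
RT_TOL = 1.0    # retention-time tolerance in minutes (assumes aligned chromatography)

# (m/z, RT in minutes, peptide) pairs identified by MS/MS in the library fractions
library = [
    (721.354, 42.1, "PEPTIDER"),
    (533.782, 18.7, "SAMPLEK"),
]

# MS1 features detected -- but never fragmented -- in the single-shot sample run
single_shot = [
    (721.356, 42.4),
    (950.112, 60.0),
]

def match_features(features, lib, ppm_tol=PPM_TOL, rt_tol=RT_TOL):
    """Assign a library peptide to each feature inside both tolerances."""
    matched = []
    for mz, rt in features:
        for lib_mz, lib_rt, pep in lib:
            ppm_err = abs(mz - lib_mz) / lib_mz * 1e6
            if ppm_err <= ppm_tol and abs(rt - lib_rt) <= rt_tol:
                matched.append((mz, rt, pep))
                break
    return matched

print(match_features(single_shot, library))
# the 721.356 feature picks up "PEPTIDER"; the 950.112 feature stays unmatched
```

The real tools obviously do much smarter things (feature detection, isotope patterns, nonlinear alignment), but this is the core bargain: accurate mass plus retention time stands in for a fragmentation spectrum.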

Bingo. Totally works -- only trick is you have to stop using whatever you're using and start using MaxQuant....

...or.....maybe your software isn't left handed either? {Groans...}

Can I lie to Proteome Discoverer the same way and get the same/similar results? Totally.

Back to the brains above. I have brain samples that we got as detailed above. We even got much better mass spectrometrists than me (shoutout to some winners!) to run them since my system was still en route, and the most we could get was seriously less than 5 ug of protein -- and, well, it wasn't the newest Orbitrap in the whole world (actually...well...the oldest...but still an Orbitrap!)

Let's lie to some software and get some phenomenal results! (Please note, this is all done in PD 2.2 -- you can do this in PD 2.1 (or any other version equipped with the apQuant nodes), but I haven't tried matching the results)

First off -- I'm going to download the proper section of the brain from the study above. It's at ProteomeXchange/PRIDE as PXD005445.

Now I have 15 awesome sample fractions from pooled samples from multiple patients from the correct brain area where our stuff came from. THIS IS MY MS1 library!

CRITICAL STEP 1: Add your library files as "Files," not as "Fractions" -- fractions will complicate things. Then use your study factors to group them together.

CRITICAL STEP 2: Use Minora (and if you have PD 2.2, also recalibrate your MS1s with SpectrumRC -- super useful, particularly when comparing newer to older instrument data). If you aren't lucky like me and can't talk someone into accurately replicating the chromatography conditions of your library samples, you'll want to widen your alignment properties in Minora. (This is actually in the Consensus steps here.)
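To see why the alignment tolerance matters, here's a toy sketch of the underlying idea: fit a simple linear RT mapping from peptides identified by MS/MS in both the library and the single-shot run, then use it to predict where library-only peptides should elute. (Minora's actual alignment is more sophisticated; the numbers below are invented for illustration.)

```python
# Toy RT alignment: learn sample_rt = a * lib_rt + b from peptides seen in BOTH
# runs, then predict elution times for library-only peptides. Invented data.

shared = [  # (library RT, single-shot RT) for peptides ID'd by MS/MS in both runs
    (10.0, 10.6), (25.0, 25.9), (40.0, 41.2), (55.0, 56.5),
]

def fit_linear(pairs):
    """Ordinary least-squares fit of y = a * x + b."""
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

a, b = fit_linear(shared)

def predict(lib_rt):
    """Where should a library peptide at lib_rt show up in the sample run?"""
    return a * lib_rt + b

# A library peptide eluting at 30 min should land near predict(30.0) in the
# sample run; the residual scatter of the fit is a guide to how wide your
# RT/alignment tolerance needs to be.
```

The worse your chromatography reproducibility, the bigger those residuals get -- and the wider you have to open the alignment window, which is exactly the knob the Minora consensus settings expose.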

STEP 3 (I need to investigate this further): These are my settings for the quantifier.

I think these are all important, but have no proof.

CRITICAL STEP 4: Use the Data Distributions post-processing node!

This makes finding your data so much easier!!

That might be it...maybe I'll just put the workflows up somewhere that they can be downloaded....

What do the results look like?

Let's pick one of my sad, sample-limited single-shot files!

Green is basically the stuff that you found by MS/MS and will roughly correspond (+/- statistical changes) to what you'd get if you ran this file alone. Blue is what I'm interested in here. That's the new stuff that was found thanks to the contributions of the other files present!

1263? That sounds much better! Without this "library," all of the single-shot patient samples put together didn't come up with as many hits as this single file alone does when compared against the library this way.
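If you export the peptide results to a table, counting that green/blue split is trivial. A minimal sketch -- note that the column name and labels below are made up, so check what your PD or MaxQuant export actually calls them (e.g., MaxQuant has an "Identification type" column):

```python
# Count IDs found by MS/MS vs. IDs gained purely by feature matching,
# from a hypothetical exported results table. Column/label names are invented.
from collections import Counter

rows = [
    {"peptide": "PEPTIDER", "found_by": "MS/MS"},
    {"peptide": "SAMPLEK",  "found_by": "MS/MS"},
    {"peptide": "LIBRARYK", "found_by": "Matched"},  # library-only gain
]

counts = Counter(r["found_by"] for r in rows)
gain_pct = 100.0 * counts["Matched"] / len(rows)
print(counts, f"-- {gain_pct:.0f}% of IDs came from matching alone")
```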

Okay -- this can obviously be a gamble. How do you estimate the FDR here? Do you weight the discoveries at the same level?

Let's go back to the Yale paper above --  I really think they did this part right as well.

1) They tried to estimate the false discoveries in the match-between-runs peptides using knowledge of the biological samples (honestly -- I can't entirely follow it, but they come up with a maximum of 3.8% "imposter" matches).
2) They provide all the data tables, clearly indicating what was identified by MS/MS and by match between runs. I thought they also did their pathway analysis stuff using both the small and big lists, but I might have that mixed up with another study.
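One crude sanity check you can run yourself (this is NOT what the paper did -- it's a toy decoy idea): re-run the matching with the library deliberately scrambled, e.g. RTs shifted far outside the tolerance, and count how many "matches" you still get. Those are imposters by construction, giving a rough empirical false-match rate. All numbers below are invented:

```python
# Toy decoy check for match-between-runs: matches against an RT-scrambled
# library are imposters by construction. Tolerances and data are illustrative.

PPM_TOL, RT_TOL = 5.0, 1.0

def count_matches(features, lib):
    """Count features that fall inside both tolerances of some library entry."""
    hits = 0
    for mz, rt in features:
        for lib_mz, lib_rt, _pep in lib:
            if abs(mz - lib_mz) / lib_mz * 1e6 <= PPM_TOL and abs(rt - lib_rt) <= RT_TOL:
                hits += 1
                break
    return hits

library = [(721.354, 42.1, "PEPTIDER"), (533.782, 18.7, "SAMPLEK")]
features = [(721.356, 42.4), (533.784, 18.5), (950.112, 60.0)]

real = count_matches(features, library)
# Decoy library: same masses, RTs shifted well past the tolerance window
decoy = count_matches(features, [(mz, rt + 30.0, p) for mz, rt, p in library])
print(f"{real} real matches, {decoy} decoy matches")
```

With real data you'd want many decoy permutations and a mass-shift decoy too, but the principle -- estimate the imposter rate empirically rather than trusting the transfer blindly -- is the same spirit as the paper's 3.8% bound.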

Downsides I almost forgot to mention!  HOLY COW. THIS TAKES FOREVER!  You're adding feature alignment and tons more files and spectra?!? You're talking about taking 50 files that take 4 hours to process on one of the OmicsPCs MaxDestroyer systems and adding a ton more files on top of that? It'll push a big data processing system like mine to as much as a day of sitting there crunching numbers and being tough to play video games on. This isn't PD exclusive -- it takes comparable time in my hands with MaxQuant as well....


  1. Hi Ben,
    there is another option. The apQuant-node (formerly known as Peak-Juggler) from the colleagues at IMP Vienna is able to do the same in PD 2.1 and PD 2.3, all free to download here:

  2. Eeeee......I am just working on my green card application and came across this really very nice discussion of our paper - thanks! So glad you appreciated the availability of the data and the two tiered approach - but apologies if we gave the secret European game away ;)
