Sunday, August 10, 2014

DeMix workflow. Identify the other peptides you accidentally isolated

This is really really cool and currently in press (open access!) at MCP and comes from work done at Roman Zubarev's lab.  Edit:  Here is the link to the abstract (left it out before).

In a DDA experiment we pick the ion we're interested in that looks like a peptide, based on the parameters we provide that say "this is a peptide and it is probably one that will fragment well with the method that I'm using right now".  Then we isolate it and, too often, a bunch of shit around it.  Typically, we try to eliminate as many of those co-isolated, interfering compounds as we possibly can.  One of the biggest improvements the Q Exactive Plus has over the Q Exactive is that we can move from isolation widths of 2.0 Da to as low as 1.2 Da with very little loss in signal.  (I have heard at least one account of the QE HF being used with a 0.7 Da isolation window, but I still haven't got my hands on one of those magnificent boxes so I can't confirm...)
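To put the isolation width point in concrete terms, here is a minimal sketch of what "a bunch of stuff around it" means. It assumes a symmetric window centered on the target m/z (real instruments may use offsets and have soft edges), and the function name and peak list are made up for illustration:

```python
def co_isolated(target_mz, width_da, ms1_mzs):
    """Return MS1 peaks (other than the target) that fall inside a
    symmetric isolation window of the given full width around target_mz."""
    half = width_da / 2.0
    return [mz for mz in ms1_mzs
            if mz != target_mz and abs(mz - target_mz) <= half]


# Hypothetical MS1 peaks near a target at m/z 500.0
peaks = [499.2, 500.0, 500.5, 500.9, 501.3]

wide = co_isolated(500.0, 2.0, peaks)    # 2.0 Da window
narrow = co_isolated(500.0, 1.2, peaks)  # 1.2 Da window
print(wide, narrow)
```

With these made-up peaks, narrowing the window from 2.0 to 1.2 drops two of the three interfering ions.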

For years I've heard people kicking around this idea:  What if we identify our peptide from our MS/MS spectra and then remove every MS/MS fragment that can possibly be linked to that peptide?  Then we're left with the fragments from the peptides we accidentally isolated.  Let's then database search that and find out what it is.

And that is exactly what DeMix does.
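A rough sketch of the "remove what you already explained" idea, in plain Python. This is not DeMix's code. It assumes unmodified peptides, singly charged b and y ions only, and a simple ppm match; the function names are mine:

```python
# Monoisotopic residue masses in Da
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_mzs(peptide):
    """Singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [RESIDUE[aa] for aa in peptide]
    total = sum(masses)
    mzs = []
    running = 0.0
    for m in masses[:-1]:
        running += m
        mzs.append(running + PROTON)                   # b ion
        mzs.append(total - running + WATER + PROTON)   # complementary y ion
    return sorted(mzs)


def strip_matched(peaks, peptide, tol_ppm=20.0):
    """Drop (m/z, intensity) peaks explained by the identified peptide."""
    theo = fragment_mzs(peptide)

    def matched(mz):
        return any(abs(mz - t) / t * 1e6 <= tol_ppm for t in theo)

    return [(mz, inten) for mz, inten in peaks if not matched(mz)]


# Two peaks belong to PEPTIDE (b2 and y1); the third is left over
spectrum = [(227.1026, 100.0), (148.0604, 50.0), (300.0, 10.0)]
leftover = strip_matched(spectrum, "PEPTIDE")
print(leftover)
```

Whatever survives is what you would hand to the second-pass search.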

Let me rant a little bit about how cool the workflow here is.  They ran this stuff on a QE with 70k MS1 and 17.5k MS2 and used isolation widths of 1 - 4 Da.  They converted everything over to centroid using TOPP (btw, they found better results when they used the high res conversion option for this data, so I'm using that from now on).  Next they ran their results through Morpheus using a 20 ppm window and a modified scoring algorithm.  The high scoring MS/MS fragments were used to recalibrate the MS/MS spectra (just like the Mechtler lab PSMR does) using a Pyteomics Python script.
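For anyone wondering what "recalibrate using the high scoring fragments" can look like, here is a minimal sketch. To be clear, this is not the authors' Pyteomics script. It assumes the simplest possible model, one global ppm shift per spectrum estimated as the median error of confidently matched fragments, and the function names are hypothetical:

```python
from statistics import median


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def recalibrate(mzs, matched_pairs):
    """Shift all m/z values by the median ppm error of matched fragments.

    matched_pairs: (observed_mz, theoretical_mz) tuples from a confident
    first-pass identification. Returns (corrected_mzs, shift_ppm).
    """
    shift = median(ppm_error(o, t) for o, t in matched_pairs)
    corrected = [mz / (1.0 + shift * 1e-6) for mz in mzs]
    return corrected, shift


# Made-up example: every fragment reads 5 ppm high
theoretical = [200.0, 500.0, 800.0]
observed = [t * (1.0 + 5e-6) for t in theoretical]
corrected, shift = recalibrate(observed, list(zip(observed, theoretical)))
print(round(shift, 3), [round(mz, 4) for mz in corrected])
```

Once the systematic offset is gone, a tighter fragment tolerance in a later search becomes much easier to justify.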

Interestingly, when they made their second pass runs they tightened all of their tolerances and processed the deconvoluted MS/MS fragmentation events where the previously matched fragments were ignored.  I should probably finish my coffee and then work my way through the discussion, because I would have done it the opposite way (and, when we do serial searches in PD, that is the default workflow).  I'm not knocking it, I just find it counter-intuitive.

So what?  Did it work?  Of course it did, or it wouldn't have made it into MCP!  Final stats?  The QE was knocking out about 7 MS/MS events for every MS1.  Using this approach, they IDENTIFIED 9 PSMs(!!!) out of each 7 spectra.  They didn't get 2 IDs per MS/MS event, but they got about 1.3 (9/7), which is a heck of a lot better than 1!

I cannot wait to try this and I've got the perfect data set to run it on sitting right in front of me.  I'll let y'all know how it goes.


  1. Hi Ben,

    I am happy to see you discussing our work on your blog.

    I presented it at ASMS and received some positive feedback, but I also received some sharp criticism of the scoring method during review. I have to admit that the scoring method we used in this study may not be perfect; it is over-simplified and biased against short or highly charged peptides. However, the major part of the improvement came from accurately detecting co-fragmented precursors, not from a single preceding MS1 scan, but from chromatographic feature detection (RT, intensity and isotopic modeling). Replacing the minimalist search engine with a more complex method (e.g. MS-GF+Percolator) might give an even greater number of unique sequences, although that seems too good to be true.

    We used a tightened mass tolerance rather than a loosened one in the second-pass search because each spectrum after deconvolution should uniquely represent a specific precursor; co-fragmented precursors with larger mass deviations should have separate spectral clones while also meeting the strict mass accuracy requirement.

    We think that the high performance of this workflow reflects the importance of properly utilizing the high S/N ratio and mass accuracy of Orbitraps. Thus, as you might have noticed, preprocessing high-res spectra may significantly affect the final result. We stored our raw spectra in profile mode, converted them into the even larger mzML format, then picked centroids using TOPP, and further deisotoped all MS/MS. The whole pipeline looks a bit complicated and time consuming. I also regret that those scripts, written by an amateur programmer, may be a bit confusing to execute. Anyway, please feel free to ask me if you have any difficulty reproducing the result.

    What I am curious about is how this workflow would work on the two human draft proteome datasets. Both studies used a 0.05 Da mass tolerance for Orbitrap MS2, which seems far too large compared to the less than 20 ppm (0.01 Da at 500 Th) observed in our data.

    Please keep blogging when you have any interesting thoughts. It is enjoyable to read your posts.

    Best regards,

  2. Bob,
    Thanks so much for your clarifications! I will definitely contact you if I run into issues getting through this. You know, I hadn't noticed the 0.05 Da cutoff in those 2 studies. I typically use 0.02 and am interested to see what happens if I cut down further. My pipeline will probably be a lot less convoluted, as I think I can do several of the earlier processing steps in Proteome Discoverer and export the data for later stages. Probably will be a while before I can get to it, however.
    Thanks again!