Tuesday, July 28, 2015

Process DIA data directly in Proteome Discoverer 2.0 with DIA Umpire

Alright, this DIA stuff is confusing.  There are methods all over the place.  WiSIMDIA on the Fusion, pSMART, multiplex-DIA, and even boring old SWATH.  Software-wise, there is tons of stuff out there.  This weekend I processed some DIA data from a Fusion directly through Proteome Discoverer 2.0...and it looks amazing.

The data in question was run through DIA-Umpire to convert it into a handy-dandy MGF file format.  You can find details on DIA-Umpire in this Nature Methods paper by Chih-Chiang Tsou et al., out of Alexey Nesvizhskii's lab.  Essentially, it is a (currently) command-line-driven program that takes your DIA data and "deconvolutes" it down to a format that is friendly to the proteomics processing pipelines we already know and trust.  How does it work?  No idea.  But it works, and the data looks amazing (did I say that once already?).

Here is a random high-scoring PSM I grabbed.  Looks pretty incredible, right?  They all do.  And I didn't have to change my workflow at all.  I used SequestHT, target-decoy and my normal basic consensus report.  I ended up with a ton of IDs and really nice true FDRs at every level I set them at (PSM, peptide, and protein).

If you are interested in identifying peptides via DIA and you are a little swamped by your software options, you might want to check this out.  I'm tired of learning new software interfaces -- let's put everything in Discoverer!


  1. Hi Ben,

    What versions of software did you use? I'm not having any luck getting any output (mzML or mzXML) from MSConvert to work in PD 2.0. I am able to use MSConvert to generate files for DIA-Umpire just fine. But I cannot get from the .MGF to a file format that I can put into PD 2.0 successfully. Thanks!

    1. Wow. That's weird. Normally MGFs are straightforward and go into PD no problem. Since the MGF loses all the scan header information, you may need to go into the Spectrum Selector and tell it what it's looking at. Otherwise, I'm not sure what to suggest.
      If you open the MGF file does it look normal? http://proteomicsnews.blogspot.com/2012/08/what-is-in-mgf-file.html

  2. Thanks, Ben. Yes, the MGF looks similar, but it is missing the scan number line. Not sure if this matters? In the Spectrum Selector parameters tab of PD 2.0 I have parameters defined for QE data (mostly default values except mass analyzer and activation type), but I am unsure which line items are critical to interpreting the MGF file. I'm also wondering if there is something about the SE params file for DIA-Umpire v2 that makes a difference? I used default values for that, with the exception of setting mass tolerances to 10 ppm for parent and fragment.
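
For anyone chasing the same "is my MGF missing something?" question from the comments above, here is a minimal Python sketch that tallies which header lines each spectrum in an MGF carries. It assumes the usual Mascot Generic Format layout (BEGIN IONS / END IONS blocks with KEY=value header lines such as TITLE, PEPMASS, CHARGE, SCANS and RTINSECONDS). The function names and the tiny example spectra are made up for illustration, and whether PD 2.0 actually needs any particular key is not something this script can tell you.

```python
from collections import Counter

# Header keys to look for. This list is an assumption for illustration,
# not a statement of what Proteome Discoverer requires.
WANTED_KEYS = ("TITLE", "PEPMASS", "CHARGE", "SCANS", "RTINSECONDS")


def summarize_mgf(lines):
    """Return (number of spectra, Counter of spectra carrying each header key)."""
    n_spectra = 0
    key_counts = Counter()
    seen = None  # header keys seen in the current spectrum, None when outside one
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            n_spectra += 1
            seen = set()
        elif line == "END IONS":
            key_counts.update(seen or ())
            seen = None
        elif seen is not None and "=" in line:
            # Header lines look like KEY=value; peak lines contain no "=".
            seen.add(line.split("=", 1)[0].upper())
    return n_spectra, key_counts


def missing_keys(lines, wanted=WANTED_KEYS):
    """Map each wanted key to the number of spectra that lack it."""
    n_spectra, counts = summarize_mgf(lines)
    return {k: n_spectra - counts[k] for k in wanted if counts[k] < n_spectra}


# Two made-up spectra: the second one has no SCANS line.
EXAMPLE_MGF = """\
BEGIN IONS
TITLE=example_spectrum_1
PEPMASS=500.2500
CHARGE=2+
SCANS=101
RTINSECONDS=1200.5
150.1 1000.0
300.2 2500.0
END IONS
BEGIN IONS
TITLE=example_spectrum_2
PEPMASS=612.3300
CHARGE=3+
RTINSECONDS=1302.1
175.1 800.0
420.3 1500.0
END IONS
"""

if __name__ == "__main__":
    print(missing_keys(EXAMPLE_MGF.splitlines()))
```

To run it on a real file, pass an open file handle instead of the example text, e.g. `with open("my_file.mgf") as fh: print(missing_keys(fh))` (the filename is a placeholder). An empty result means every spectrum carries all of the keys in the list.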