Sunday, May 19, 2024

Processing data from anything in Proteome Discovererererrererer!


Proteome Discoverer can be a controversial thing, I get it. But if you've put in the time to learn the best toolkit for Proteome Informatics 😇, and losing access to that is enough to scare you away from other systems, you're generally in luck. PD can take mass spectrometry data in virtually every format.

I've been using TOFs for the last 4 years after making fun of them and the people who would actively choose to use them pretty much nonstop for the 10 years previous to that. I'm probably exaggerating how long some of that was. 

TIMSTOF data? I'm processing it in PD (the free version of the commercial version)

ZenoTOF data? Same thing. 

The big thing you're lacking with these is MS1 feature detection. It exists for TIMSTOFs and you can get it from but I've never tried it. 

As fast as these TOFs are, one weakness is that they aren't as good at on-the-fly decision making as the Orbis. Sounds bad, but the upside is that they're fantastic spectral counting instruments. 

And....Proteome Discoverer is great at spectral counting! You don't need that core lab software that makes the very nice pretty plots, either. You just need -- 

This thing! (get it at

It might not work for your newest PD versions, I guess. I'm only on 2.5 personally. 
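If the term is new to you, spectral counting just means tallying how many identified MS/MS spectra (PSMs) point to each protein. Here is a minimal Python sketch of the idea. It is not what the PD node does internally, and the scan numbers and accessions (`PROT_A`, `PROT_B`) are made up for illustration.

```python
from collections import Counter


def spectral_counts(psms):
    """Count PSMs per protein.

    psms: iterable of (scan_id, protein_accession) pairs, one pair per
    identified MS/MS spectrum. Both fields are hypothetical placeholders.
    """
    return Counter(protein for _, protein in psms)


# Made-up example data: three spectra match PROT_A, one matches PROT_B.
example_psms = [
    (101, "PROT_A"),
    (102, "PROT_A"),
    (103, "PROT_B"),
    (104, "PROT_A"),
]

counts = spectral_counts(example_psms)
for protein, n_spectra in counts.most_common():
    print(protein, n_spectra)
```

More spectra for a protein roughly means more of that protein, which is why an instrument that fires off a ton of MS/MS scans suits this approach.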

If you are using external data in a universal format there are some things missing. PD probably can't tell what data you're giving it, so it helps to tell it --

Otherwise it will assume that every spectrum you give it is ion trap. Maybe it is. But if you'd like to see something behind the decimal point you probably want it to know what you're looking at. 

Is it a sector? 

That was meant to be a joke. It isn't really a sector, is it? If it is, you can search it! A dude on Reddit had told his PD workflow that his data came from a single quad. It didn't. It's only funny to me because I do know who you are. Keep at it, dude. We're all learning! 

Also - my gosh - the TopN peaks filter is awesome. It bins your data (for just your search engine steps) and throws out anything that isn't in the top N in that bin.

For example, I like a top 12 per 100 bin. So in an MS/MS spectrum that runs from 125-1500, your search engine only sees the 12 most intense peaks from 125-225, from 225-325, and so on. It's a super fast noise reducer. Very very good for TOFs, particularly those with extremely high intrascan linear dynamic ranges like the ZenoTOF. Blatant plug, but I don't think anyone else has published on the ILDR of that device. Having infinite dynamic range isn't always useful. 
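To make the binning concrete, here is a rough Python sketch of a top-N-per-window filter. It is my own illustration of the idea, not PD's actual implementation, and the function name, parameters, and the choice to start the first window at 125 are all assumptions.

```python
from collections import defaultdict


def top_n_per_window(peaks, n=12, window=100.0, start=125.0):
    """Keep only the n most intense peaks in each m/z window.

    peaks:  list of (mz, intensity) tuples for one MS/MS spectrum.
    n:      how many peaks to keep per window (12 in the example above).
    window: window width (100 in the example above).
    start:  where the first window begins (assumed, not a PD setting).
    """
    # Group peaks by which window they fall into.
    bins = defaultdict(list)
    for mz, intensity in peaks:
        index = int((mz - start) // window)
        bins[index].append((mz, intensity))

    # Within each window, keep the n most intense peaks.
    kept = []
    for window_peaks in bins.values():
        strongest = sorted(window_peaks, key=lambda p: p[1], reverse=True)[:n]
        kept.extend(strongest)

    # Return the survivors in m/z order, like a normal peak list.
    return sorted(kept)


# Made-up spectrum: 20 peaks crowded into 125-225, 3 peaks in 225-325.
spectrum = [(125.0 + 4 * i, float(i + 1)) for i in range(20)]
spectrum += [(230.0, 5.0), (250.0, 1.0), (300.0, 2.0)]

filtered = top_n_per_window(spectrum)
print(len(spectrum), "peaks in,", len(filtered), "peaks out")
```

The crowded window gets trimmed to its 12 strongest peaks while the sparse window is left alone, which is the whole point: noise goes, real fragments mostly stay.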

You also probably need to get your data into a universal format. MSConvert can do this, but I don't love the default settings. Here is a walkthrough for converting current SCIEX data to a nice centroided version. And if you're on one of those trapped ion thingies, I recommend these settings. 

However, if your instrument isn't in the damp confines of the basement of a condemned building, maybe you can maintain a 1/k0 fluctuation of 0.1. I currently cannot, but my mass accuracy is generally better than 0.05 at MS1. It definitely isn't within the 0.015 in the default dynamic exclusion settings. I set these to the same value, 0.03.

Next up, you'll want to think about your display settings. If you are generating 100+ MS2 scans a second you're going to have a very large MSF file. Do you need to see all of these data? Probably not. 

You can have a default layout that hides the things you don't need to see. What I do is run a couple of files from each study first (mostly for QC), arrange the tables and filters the way I want them, and then save them.

You can just put this little node in your consensus workflow and load those filters and layouts.

Boom. Indestructible bears. 

Never mind, that's something else. But you can have the data that makes sense for your sector or single quad proteomics instrument (or ultra fast and very sensitive TOF).
