Saturday, May 26, 2018

BoxCar/BoxFahrt real data and new mysteries!!


So...I'm confused. So far I've had exactly zero luck with forcing BoxFahrt to work on our QE HF using Thermo's factory-issued software. The great Dr. Antonius Koller (now of Northeastern University, if you can't reach him through his old CUMC account and want to bug him while he's getting set up) and I have been in touch a lot as he has been working on making use of the basic time-saving logic behind BoxCar to improve his results. He came up with a workaround this week (raising the default mass settings to match the width of the BoxCar!!) that I haven't been able to try yet, but so far...

While editing (I'm trying to do a 48 hours before posting rule now, so I seem slightly less odd, and don't tell you things like "I'm writing this from a 4 day death metal festival". I already like the blog less. P.S. I'm an adult, I'm definitely not blogging from a tablet and waiting for the Ruins of Beverast) I came across a reader comment -- one major problem with the QE manufacturer software is that you have just one inclusion list. If you use it for your targeted SIM -- it's now problematic for your T-SIM dd-MS2 -- which might be the main reason why people (like me) have always thought that method doesn't work. It isn't doing dd-MS2 within your window; it is doing T-SIM and then only doing MS2 if it sees what you're looking for in your T-SIM. Toni's workaround (essentially increasing the T-SIM inclusion mass accuracy cutoff to include the entire BoxCar) helps override this.

As a side note -- why hasn't a complete industry popped up of people selling software to alter instrument software? For real -- there are thousands of them out there that could be improved. There is only one company I know of -- and they might be closed now, I wrote them for quotes about a month ago... you can run the Q Exactive with Visual Basic for goth sake. In the back of a lot of our brains is Basic -- we had to use it in order to be able to play video games. Commodore 64, yo!

Back to the awesome Bill Murray meme!

I'm not kidding. And I'm not cheating. No MS1 or MS2 spectral libraries. No FASTA with 7e6 entries. Just Proteome Discoverer, UniProt Human (and cRAP) FASTA entries. And BoxFahrt.  Heck, the chromatography doesn't even look that great.


I'll post the method iterations. There is a lot to learn here on the Fusion -- and lots of room to improve from where I am right now.

However -- the approach isn't without some mysteries and drawbacks right now.

Mystery #1) I can't use Morpheus with these files. No idea why. I get loads of PSMs and peptide groups, but I only get 2 (possibly the same 2) proteins past 1% FDR. If it is the same 2 proteins, for real, we need to figure out what is special about them. I bet they're full of ANGST.

Edit: 5/28/18: The development team (If you've never been up to Wisconsin to see why they're so great at mass spec -- you should try to go visit. There are such great people up there doing such brilliant stuff -- plus that's a cool town) has reached out to see why this isn't working and I'm sending files now. Thanks for looking at this, Zach!

Mystery #2) Percolator in PD 2.1 HATES these files. HATES them. I only found out by accident, by using the default Thermo Fusion basic ID workflow (I think it only corrects by target decoy at the peptide group level). This is what gives me the almost 6,000 protein groups. Gotta check on that.

Throw in Percolator -- fewer than half the PSMs make it through the filter, knocking the BoxFahrt 400ng 90 minute HeLa runs down to fewer than 2,500 protein groups in 85 minutes.

Mystery #3) Are these spectra crap? Well -- they are ion trap -- so they are crap (kidding!!) -- but they aren't any worse than any other ion trap PSMs by eye -- let me know if you want to see and I'll send you the processed data. The image at the very top is my very worst MS/MS spectrum (the default workflow appears to require a minimum XCorr of 2.0 -- which -- back in the day when I'd totally spend multiple days at a death metal festival and wonder when I'd run into those fun guys from the Hunt lab, who are probably also too grown up for this stuff -- I'd have considered pretty darned good). However, I can't objectively say whether 2e5 MS/MS spectra are worse or better, but wouldn't it be cool to think that there is something important here that Percolator doesn't like about these spectra?

Maybe they're too large? Wait -- where is that picture I made last night...? I'll find it and add it in later. I tried to overlay histograms of the charge and MH+ for peptides ID'ed with each approach. It looks like the stuff that Percolator is throwing out and Target Decoy is keeping is considerably larger and higher charged peptides, but this is inconclusive with the amount of time I have right now.
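For anyone who wants to try the same comparison on their own exports, here is a minimal Python sketch of the idea: split the PSMs into the ones Percolator keeps and the ones only Target Decoy keeps, then compare charge state fractions and median MH+. Everything specific here is an assumption for illustration: the (charge, MH+) tuple layout, the function names, and above all the numbers, which are made-up placeholders and not results from these runs.

```python
from collections import Counter
from statistics import median


def charge_fractions(psms):
    """Fraction of PSMs at each charge state.

    psms is a list of (charge, mh_plus) tuples (hypothetical layout).
    """
    counts = Counter(charge for charge, _ in psms)
    total = sum(counts.values())
    return {z: counts[z] / total for z in sorted(counts)}


def median_mh_plus(psms):
    """Median MH+ (Da) across a list of (charge, mh_plus) tuples."""
    return median(mh for _, mh in psms)


# Placeholder values only, NOT real data from the BoxFahrt runs.
percolator_kept = [(2, 1200.6), (2, 1450.7), (3, 1800.9), (2, 1100.5)]
target_decoy_only = [(3, 2500.2), (4, 3100.5), (3, 2800.4), (2, 1900.0)]

for name, psms in [("Percolator kept", percolator_kept),
                   ("Target Decoy only", target_decoy_only)]:
    print(name, charge_fractions(psms), median_mh_plus(psms))
```

If the "Target Decoy only" group really does skew toward higher charge and higher MH+ on real exports, that would at least back up the eyeball impression from the histograms.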

Mystery #4) Minora doesn't work AT ALL. No traces, no quan and this is a major drawback for me.

I've got some samples in I've been dying to run all year and BoxFahrt gives me loads of peptide IDs, but I need quan -- I had to resort to spectral counts (yes, I died inside a little -- but I didn't throw up or anything...I'm an adult (warning! sound) -- a spectral counting hating adult....) and they lined up with what we know from the phenotype/RNASeq for these cells -- awesome -- but I need real quan -- so the samples went back to EasyStar (IonStar for people with EasyNano and EasySprays -- see -- I'll steal method names from anyone, including my friends and collaborators. IonStar is a much cooler name). Putting results here has been on my to-do list for a while. The 50cm is pretty darned close and limited runs with the 75cm EasySpray PepMap suggest that it might have more theoretical plates than the 100cm 3um column. But now I'm off topic.

Here is my best Fusion 1 BoxFahrt method iteration so far. 
Edit 5/28/18 -- here is the link. That would be useful, I guess.

It uses 60,000 resolution MS1 for 3 T-SIMs, with each T-SIM getting 1.5 seconds to do as many ddMS2 ion trap MS/MS scans as it can. I use the "use all parallelizable time" AGC target over-ride feature.
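A quick back-of-the-envelope on what that timing means for sampling, as a rough Python sketch. The 3 windows and 1.5 seconds come from the method above; the 20 second base-to-base peak width is an assumed, fairly generous number, and the function names are made up for illustration. It also ignores any scan overhead, so treat the result as a best case.

```python
import math


def cycle_time_s(n_tsim_windows: int, seconds_per_window: float) -> float:
    """Time before the method loops back to the same T-SIM window."""
    return n_tsim_windows * seconds_per_window


def points_across_peak(peak_width_s: float, cycle_s: float) -> int:
    """Whole number of full cycles that fit inside one chromatographic peak."""
    return math.floor(peak_width_s / cycle_s)


cycle = cycle_time_s(3, 1.5)             # 3 T-SIMs x 1.5 s each
print(cycle)                             # 4.5
print(points_across_peak(20.0, cycle))   # 4 (20 s peak width is an assumption)
```

Four points across a peak is probably fine for collecting IDs, but it is thin if you want to integrate peak areas for label-free quan.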

If you raise the T-SIM MS1 target any higher (actually, I only tried 5e6) you lose IDs: a ~10% loss (n=1).

I tried 120,000 resolution MS1 and it cost me 15% IDs.

I tried turning off the fill time over-ride and that cost me 6-8%.

If you have the Fusion 2, it may be possible to alter your MS/MS isolation windows for the msxT-SIMs. I can't do it on my Fusion 1 with this tune build....bummer....

Wow. That's a lot of words -- conclusion?!?  If I can deal with the temporary loss of some of my favorite tools -- if I use staggered msxTSIM-ddMS2 on my Fusion 1 with parallelization in the ion trap, I might possibly be getting the best results I've ever seen from any instrument.


  1. In doing 3 windows you will likely encounter many scenarios where you are not looping back to a given precursor mass for another 4.5 seconds. With a generous base-to-base peak width of 20 seconds for a given peptide, you will get 4 data points for it. For TMT I am ok with this, but for LFQ, maybe not.

    Try doing a tSIM on your Fusion as you have been doing, but only do a single multiplex experiment. More windows means a better chance of a given mass not succumbing to dynamic range issues, but it comes at the cost of time (have to do multiple tSIMs). With a single set of multiplexed mass ranges (one tSIM experiment with 10 multiplex windows of different isolation sizes...sharper windows in dense regions, broader ones in sparse regions, triggering MS2 scans for 2 seconds) you get the benefit of minimizing dynamic range suppression without as much of a sacrifice in scan rate.

    In regards to software and the files, there are indeed some odd things happening in these raw files (we are noticing some things as we examine them with RawQuant). For example, the isolation window output in the header seems to have some sort of character limit, so you can't see all of the injection times for all of the windows. This is not the case in earlier versions of tune. In addition, the 'precursor scans' seem to not track as they should. We observe many instances of an MS2 event that is associated with a trigger scan that happened >2 MS1 scans prior, which is odd, and I can imagine some software tools would not like this.

    1. Chris -- as always -- thanks for the feedback! I couldn't have gotten this far without you. I've taken a run with the single MSX event and -- you're right -- it's comparable. If I had control over the window widths on the Fusion 1 -- I'd probably run this way....
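
The precursor tracking oddity described in the comment above (an MS2 event tied to a trigger scan from more than 2 MS1 scans earlier) is easy to screen for once you have a scan table out of the raw file. Here is a minimal Python sketch. How you extract that table is up to you; the `Scan` fields, the function name, and the example scan list are all hypothetical and are not the output format of RawQuant or any other tool.

```python
from bisect import bisect_left, bisect_right
from typing import NamedTuple, Optional


class Scan(NamedTuple):
    number: int
    ms_level: int
    precursor_scan: Optional[int]  # trigger scan number; None for MS1-level scans


def stale_ms2_scans(scans, max_intervening=1):
    """Return (ms2_scan, trigger_scan, n_intervening) for suspicious MS2 scans.

    n_intervening counts MS1-level scans acquired strictly after the
    trigger scan and strictly before the MS2 scan.
    """
    ms1_numbers = sorted(s.number for s in scans if s.ms_level == 1)
    flagged = []
    for s in scans:
        if s.ms_level != 2 or s.precursor_scan is None:
            continue
        first_after_trigger = bisect_right(ms1_numbers, s.precursor_scan)
        first_at_or_after_ms2 = bisect_left(ms1_numbers, s.number)
        intervening = first_at_or_after_ms2 - first_after_trigger
        if intervening > max_intervening:
            flagged.append((s.number, s.precursor_scan, intervening))
    return flagged


# Made-up scan list: scan 6 claims scan 1 as its trigger, 3 MS1 scans later.
example = [Scan(1, 1, None), Scan(2, 2, 1), Scan(3, 1, None),
           Scan(4, 1, None), Scan(5, 1, None), Scan(6, 2, 1)]
print(stale_ms2_scans(example))  # [(6, 1, 3)]
```

The `max_intervening` threshold is a judgment call; with staggered T-SIMs some lag may be normal, so it is worth looking at the whole distribution before deciding what counts as odd.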