Thursday, May 31, 2018

Why is there a crab on the cover of JASMS?!?

What's this about?!? I believe some people are in need of these instructions!

Actually -- the study that scored the cover is really cool, so I'll give them a pass this time.

Multifaceted is a very nice word for -- this team did a ton of work -- including neuropeptide imaging?!? Check out how cool that looks (actually, it looks better in the paper).

Sure -- it's crabs -- but what a promising use of technology I didn't know was possible. Begs the question of how far away is mammalian neuropeptide imaging?

Wednesday, May 30, 2018

Phase constrained deconvolution -- can resolve TMT11 plex in 32 milliseconds!

Could this be a hint of things to come?!?!

Last year some people at Thermo, including some Makarov guy, published a paper on a phase constrained deconvolution alternative to the fast Fourier transform (FFT) (that you can find here). Honestly, it looked like kind of a thought experiment. The Fourier transform is a great math trick that allows you to go from the frequencies of the oscillating ions to masses in these instruments (but has, no joke, about a thousand other uses behind the scenes in today's world).

The phase constrained deconvolution goes beyond what the Fourier transform can do -- pushing into the limits of "Fourier uncertainty" where our instruments currently can't go, and improving both resolution and sensitivity.  Again -- cool paper, but ---

-- this is WAY COOLER ---

Keep in mind that this chart is biased (smaller m/z gets higher resolution in the Orbitrap -- even with the phase constrained model) -- but look at these flippin' numbers for the TMT reporter region resolution (this is an ultra high field Orbitrap -- I think this is the D20 with 5kV on the central electrode, but don't quote me -- if I'm right, this is what is in the Tribrids and HF/HF-X; D20 with 2.5(?)kV is Elite)

Only 32 ms --- that is 31 Hz (ignoring overhead) and they got 80,000 resolution at the 127 marker -- enough to resolve the 6mmu (0.0063 Da) separation between the TMT127 N and C reporters.
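If you want to check the arithmetic yourself, here is a minimal sketch. The two isotope mass differences are standard values; the function names are just mine for illustration, and "ignoring overhead" means I pretend the transient is the only thing that costs time.

```python
# Back-of-the-envelope numbers for the TMT reporter region.
# Isotope mass differences are standard values; the rest is plain arithmetic.

C13_MINUS_C12 = 1.0033548  # Da gained by swapping one 12C for 13C
N15_MINUS_N14 = 0.9970349  # Da gained by swapping one 14N for 15N


def max_scan_rate_hz(transient_ms: float) -> float:
    """Upper bound on scans per second if the transient were the only cost."""
    return 1000.0 / transient_ms


def resolving_power_needed(mz: float, delta_mz: float) -> float:
    """m / delta-m for two peaks whose centers sit delta_mz apart."""
    return mz / delta_mz


if __name__ == "__main__":
    spacing = C13_MINUS_C12 - N15_MINUS_N14  # N vs C reporter spacing
    print(f"N/C reporter spacing: {spacing * 1000:.2f} mDa")
    print(f"Scan rate at 32 ms:   {max_scan_rate_hz(32):.2f} Hz")
    print(f"m/dm at m/z 127:      {resolving_power_needed(127.0, spacing):,.0f}")
```

Keep in mind that m/dm (roughly 20,000 here) is only the point where the two peaks sit about one peak-width apart. Cleanly separating reporters of very different intensity takes a comfortable multiple of that, which is why the numbers people actually use are so much higher.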


Okay -- so I have to throw this in, because I've got a meeting with the developer of this thing planned --

--Cause -- Yury's booster system appears to be right at this same level --

The box on the right isn't the clearest, but the red peaks show complete baseline resolution of the 127 and 128 TMT C/N isotopes using 15,000 resolution on a Fusion/Lumos. Given that we use 50,000 resolution to resolve these in our lab -- 15,000 translates to roughly 3 times faster!!

Plus, I've been told that it's really easy to just put it out of sight in case your service engineer is all nosy. If he's coming to fix one system you can just take the FTMS Booster and plug it into another one and use it on that one. Whichever one needs a ton of resolution that day.

I'm not trying to stir anything up this morning -- the Phase constrained deconvolution looks amazing -- but the FTMS booster is something that I could buy today (if I had the budget for it) and it costs less than an HPLC that I really want.

EDIT: MassSpecPro posted this link yesterday -- understanding the Fast Fourier Transform. If you want to know more about all of this stuff, this is a great starting point!

Tuesday, May 29, 2018

Let's check out the new PASEF TOF thing!

Okay -- I finally did it. A little more than 2 years later, I finally got to a point where I felt guilty about my rambling about the relatively low quality of the first TIMSTOF data that I had access to and took the original post down.

I'll save it for historic purposes, but Bruker has improved or just fixed every problem that I had with the original instrument. The mass accuracy is better, the speed is even faster and there are options for data processing.

Also, the post was just kind of crappy. This placeholder is worse.

Monday, May 28, 2018


You know what I could really use? A --

Super Lazy phosphoproteomics protocol!

[Super Laid-back was suggested this morning. I dig it.]

[Stream Lined? That also works 😺 ]

11 channels for quan? That sounds like a good start!

Spin column phosphopeptide enrichment and elution?

SPS-MS3? Okay -- I can work around that. Especially if the results are really this good....

Sunday, May 27, 2018

Even more features added to SearchGUI/PeptideShaker!

About time someone published an updated paper on what you can do with SearchGUI! I didn't even know about all the stuff on the right side of this picture, but you can read about it all in this new study at JPR here.

SearchGUI is an amazing central interface for a ton of different open search engines -- including the deNovoGUI (which now has DirecTag and the super crazy no-way-it's-that-fast Novor de novo engine)

Now this paper reveals all the work that is going on downstream -- links to Galaxy!??

Saturday, May 26, 2018

BoxCar/BoxFahrt real data and new mysteries!!


So...I'm confused. So far I've had exactly zero luck with forcing BoxFahrt to work on our QE HF using Thermo's factory issued software. The great Dr. Antonius Koller (now of Northeastern University if you can't reach him through his old CUMC account and want to bug him while he's getting set up) and I have been in touch a lot as he has been working on making use of the basic time saving logic behind BoxCar to improve his results. He came up with a workaround this week (raising the default mass settings to match the width of the BoxCar!!) that I haven't been able to try yet, but so far...

While editing (I'm trying to do a 48 hours before posting rule now, so I seem slightly less odd, and don't tell you things like "I'm writing this from a 4 day death metal festival". I already like the blog less. P.S. I'm an adult, I'm definitely not blogging from a tablet and waiting for the Ruins of Beverast) I came across a reader comment -- one major problem with the QE manufacturer software is that you have just one inclusion list. If you use it for your targeted SIM -- it's now problematic for your T-SIM dd-MS2 -- which might be the main misconception for why people (like me) have always thought that method doesn't work. It isn't doing dd-MS2 within your window, it is doing T-SIM and then only doing MS2 if it sees what you're looking for in your T-SIM.  Toni's work-around (essentially increasing the T-SIM inclusion mass accuracy cutoff to include the entire BoxCar) helps override this.

As a side note -- why hasn't a complete industry popped up of people selling software to alter instrument software? For real -- there are thousands of them out there that could be improved. There is only one company I know of -- and they might be closed now, I wrote them for quotes about a month ago... you can run the Q Exactive with Visual Basic for goth sake. In the back of a lot of our brains is Basic -- we had to use it in order to be able to play video games. Commodore 64, yo!

Back to the awesome Bill Murray meme!

I'm not kidding. And I'm not cheating. No MS1 or MS2 spectral libraries. No FASTA with 7e6 entries. Just Proteome Discoverer, UniProt Human (and cRAP) FASTA entries. And BoxFahrt.  Heck, the chromatography doesn't even look that great.


I'll post the method iterations. There is a lot to learn here on the Fusion -- and lots of room to improve from where I am right now.

However -- the approach isn't without some mysteries and drawbacks right now.

Mystery  #1) I can't use Morpheus with these files. No idea why. I get loads of PSMs and Peptide groups, but I only get 2 (possibly the same 2) proteins past 1% FDR. If it is the same 2 proteins, for real, we need to figure out what is special about them. I bet they're full of ANGST.

Edit: 5/28/18: The development team (If you've never been up to Wisconsin to see why they're so great at mass spec -- you should try to go visit. There are such great people up there doing such brilliant stuff -- plus that's a cool town) has reached out to see why this isn't working and I'm sending files now. Thanks for looking at this, Zach!

Mystery #2) Percolator in PD 2.1 HATES these files. HATES them. I only found out by accident by using the default Thermo Fusion basic ID workflow (I think it only corrects by target decoy at the peptide group level). This is what gives me the almost 6,000 protein groups. Gotta check on that.

Throw in Percolator --- less than half the PSMs make it through the filter, knocking the BoxFahrt 400ng 90 minute HeLa runs down to less than 2,500 protein groups in 85 minutes.

Mystery #3) Are these spectra crap? Well -- they are ion trap -- so they are crap (kidding!!) -- but they aren't any worse than any other ion trap PSMs by eye -- let me know if you want to see and I'll send you the processed data. The image at the very top is my very worst MS/MS spectrum (the default workflow appears to require a minimum XCorr of 2.0 -- which -- back in the day when I'd totally spend multiple days at a death metal festival wondering when I'd run into those fun guys from the Hunt lab -- who are probably also too grown up for this stuff -- I'd have considered pretty darned good).   However, I can't objectively say whether 2e5 MS/MS spectra are worse or better, but wouldn't it be cool to think that there is something important here that Percolator doesn't like about these spectra?

Maybe they're too large? Wait -- where is that picture I made last night...? I'll find it and add it in later. I tried to overlay histograms of the charge and MH+ for peptides ID'ed with each approach. It looks like the stuff that Percolator is throwing out and Target Decoy is keeping is considerably larger and higher charged peptides, but this is inconclusive with the amount of time I have right now.

Mystery #4) Minora doesn't work AT ALL. No traces, no quan and this is a major drawback for me.

I've got some samples in I've been dying to run all year and BoxFahrt gives me loads of peptide IDs, but I need quan -- I had to resort to spectral counts (yes, I died inside a little -- but I didn't throw up or anything...I'm an adult (warning! sound) -- a spectral counting hating adult....) and they lined up with what we know from the phenotype/RNASeq for these cells -- awesome -- but I need real quan -- so the samples went back to EasyStar (IonStar for people with EasyNano and EasySprays -- see -- I'll steal method names from anyone, including my friends and collaborators. IonStar is a much cooler name). Putting results here has been on my to do list for a while. The 50cm is pretty darned close and limited runs with the 75cm EasySpray PepMap suggest that it might have more theoretical plates than the 100cm 3um column. But now I'm off topic.

Here is my best Fusion 1 BoxFahrt method iteration so far. 
Edit 5/28/18 -- here is the link. That would be useful, I guess.

It uses 60,000 resolution MS1 for 3 T-SIMs with each T-SIM getting 1.5 seconds to do as many ddMS2 ion trap MS/MS scans as it can. I use the "use all parallelizable time" AGC target over-ride feature.
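To put a number on what that timing costs you in MS1 sampling, here is a tiny sketch. The 3 x 1.5 second figures are from the method above; the 30 second chromatographic peak width is a made-up value for illustration, and I'm ignoring the time the MS1 scans themselves take.

```python
def cycle_time_s(n_segments: int, seconds_per_segment: float) -> float:
    """Total time before the method comes back around to the same T-SIM."""
    return n_segments * seconds_per_segment


def points_across_peak(peak_width_s: float, cycle_s: float) -> float:
    """How many times one T-SIM gets sampled while a peak elutes."""
    return peak_width_s / cycle_s


if __name__ == "__main__":
    cycle = cycle_time_s(n_segments=3, seconds_per_segment=1.5)
    # 30 s peak width is a hypothetical number, not measured on my column
    print(f"cycle time: {cycle} s")
    print(f"points across a 30 s peak: {points_across_peak(30.0, cycle):.1f}")
```

For reporter ion quan that sparse MS1 sampling doesn't bother me much. For MS1-based label free quan you'd want to think harder about it.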

If you raise the T-SIM MS1 target any higher (actually, I only tried 5e6) you lose about 10% of your IDs (n=1).

I tried 120,000 resolution MS1 and it cost me 15% IDs.

I tried turning off the fill time over-ride and that cost me 6-8% of my IDs.

If you have the Fusion 2, it may be possible to alter your MS/MS isolation windows for the msxT-SIMs. I can't do it on my Fusion 1 with this tune build....bummer....

Wow. That's a lot of words -- conclusion?!?  If I can deal with the temporary loss of some of my favorite tools -- if I use staggered msxTSIM-ddMS2 on my Fusion 1 with parallelization in the ion trap, I might possibly be getting the best results I've ever seen from any instrument.

Friday, May 25, 2018

Chemical mediated proteolysis opens a whole new realm in middle-down proteomics!

Okay -- so -- as much as I love the concept of top-down proteomics, I don't actually do it in our lab. When a new and really difficult protein or proteomics project comes in I try to see if top-down is the solution. I really do. Like every time. With our best-suited equipment to the task (QE HF with protein mode and Fusion gen 1) I've got some size and complexity limitations that always make top-down seem ----just----this----far--- out of reach.

Conversely, I'm seeing more and more projects where trypsin or GluC or any of the other awesome enzymes in our freezer I always forget the keycode to -- just aren't right either.

Is it middle-down time??  It can't be. That only works for monoclonal antibodies, right?

No longer, thanks to this awesome paper from Kristina Srzentic et al.!

This team details the use of a number of chemicals that can be used for middle down applications all the way down to plotting the sizes of the fragments produced and -- so important, and often overlooked --- provide a table of the mass shifts incurred by such reagents.

Okay -- at the risk of annoying my dear friends at ACS and JPR, I'm putting this table here as my civic duty to people to maybe make it just a little easier to find (ACS, please contact me directly if this is a real problem. P.S., you're the best!)

This is just another great new tool for our utility belts for when the next weird protein (or, more likely, PTM) comes through the door!

Thursday, May 24, 2018

New bioRxiv paper shows how the EvoSep works!

I've mentioned the EvoSep on here at least once before, but it has been a mystery how this ultrafast and low-to-zero carryover system works. I know that it uses disposable columns for each sample but this new open preprint finally shows how it works.

It's really smart and a nice step forward for clinical proteomics or anything where any carryover is going to sink you.

Wednesday, May 23, 2018

Jailbreak 2.0!

Are you feeling a little limited with your awesome and super easy to use EasySpray source? Want to power it up?

Check out EasySpray JailBreak 2.0! (unless it's illegal -- then don't. and don't tell anyone I sent you to this site)

The original JailBreak kit let you put any nano columns into your EasySpray that you wanted to -- 2.0 lets you go up to MicroFlow -- WITH SHEATH GAS CONTROL!

We're hearing lots about MicroFlow right now -- which -- depending on how you define it, your high flow LC or your nanoflow LC can probably do it (even EasyNanoLCs -- just make sure your gradient is short enough that you don't have to restroke the buffer pumps).

This source addition is the missing link.

Worth noting, the manufacturer does have MicroFlow columns and emitters now, but if they don't have the solution that works for you this looks like a viable option.

Saturday, May 19, 2018

Analysis of PNGase F-resistant glycopeptides with SugarQb!

Is it glycoproteomics week? Sacred Bos taurus, there have been so many awesome papers this week.

If this is your field -- or if you know it is coming your direction, you'll be happy to know that every aspect of it appears to be developing rapidly!

I recommend checking out --

The GlycoPeptide Decoy Generator (and a new Glycopeptide CID library!)

This new metal based enrichment strategy for glycopeptides!

This new proteoglycan deep sequencing paper (they use some neat enrichment with a QE HF and process the data with PEAKS in conjunction with an awesome software package GlycReSoft that they developed. While you're there, check out the gold mine of other neat little algorithms they have posted!)

Told you!! One heck of a week for glycoproteomics!!

The title of the post is about this one, though! 

I'm out of blogging time today, but if that title doesn't interest you (cryptic specificities? what?) you're probably here for the jokes -- but if nothing else it's great to see SugarQb being put to use. In case you don't have this free node installed in your copy of Proteome Discoverer 2.1 -- you should -- you can get it here.

Friday, May 18, 2018

Addressing more BoxCar/BoxFahrt comments!

It can be hard to both post and to find comments that are made on this dumb blog. Especially when they go on separate posts. It helps to address them directly sometimes! In no particular order

Q1) Is the Xcalibur add-in available yet?

--Not yet, I don't think. I'll probably run around screaming when I find out it is.

Q2) How can you process the data currently?

-- For the TMT stuff we've been doing (LFQ runs are on this weekend when I did the math and realized there were a few (very rare in our lab) open hours on something!) Proteome Discoverer 2.1 has no problems with the data. Actually -- I know the peptide ID is great, but I need to do the quan comparisons later. I haven't tested PD 2.2 (I'm using IMP-PD nodes for this project and they aren't all available in PD 2.2 yet)

-- The MaxQuant version in the BoxCar paper is specifically equipped for real BoxCar data. I don't know yet (maybe next week) if it can handle BoxFahrt.

-- Testing is in progress right now for RAWQuant --- which, honestly, deserves its own post. It's REALLY cool and I think it is something that we need to integrate into our data processing immediately.

NOT A Q!!!)  Okay -- so -- thank you Chris -- I didn't know that the Fusion has features to allow you to select individual isolation windows. I will evaluate this immediately.  If you can optimize your isolation windows to spread out the densest regions of ion current (like BoxCar does) -- the results I'm getting on the Fusion right now might just be the beginning of the improvements I'm seeing!!

Q3) How does this differ from WiSIMDIA? It's got some similarities in that we're doing gas phase fractionation for the MS1 -- and WiSIMDIA is probably a good starting template for how BoxCar can be adapted to DIA. BoxCar staggers the isolation in the MS1 and allows for a more even distribution of MS1 ions than WiSIM -- and that even distribution allows lower intensity ions to come up out of the noise and be selected for fragmentation.
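If the staggering part is hard to picture, here is a minimal sketch of the idea: chop the mass range into boxes and deal them out round-robin to a few scans, so each scan only collects every Nth box and the scans together cover everything. Big assumption: I use equal-width boxes to keep it short. The m/z range, the number of scans and the boxes per scan below are illustrative values, not the published BoxCar settings or my Fusion method.

```python
def boxcar_windows(mz_start, mz_end, n_scans, boxes_per_scan):
    """Split [mz_start, mz_end] into equal boxes and deal them out round-robin.

    Returns one list of (low, high) windows per scan. Equal widths are a
    simplification made for this sketch only.
    """
    total_boxes = n_scans * boxes_per_scan
    width = (mz_end - mz_start) / total_boxes
    scans = [[] for _ in range(n_scans)]
    for i in range(total_boxes):
        low = mz_start + i * width
        scans[i % n_scans].append((low, low + width))
    return scans


if __name__ == "__main__":
    # Hypothetical settings: 400-1200 m/z, 3 staggered scans, 10 boxes each
    for n, windows in enumerate(boxcar_windows(400.0, 1200.0, 3, 10), start=1):
        first = ", ".join(f"{lo:.1f}-{hi:.1f}" for lo, hi in windows[:3])
        print(f"scan {n}: {first}, ...")
```

Because no single scan holds two neighboring boxes, one obnoxious abundant ion can only eat the fill budget of its own box instead of the whole MS1.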

Thursday, May 17, 2018

Finally learned the XlinkX workflow and nodes!

High on my "to do" list for 2017 was to learn how to use the new XlinkX workflow and nodes. I didn't get to it -- and there I was just wandering through Mount Ember minding my own business and --

I hadn't battled crosslinked proteins in half a decade, but we recently got the XlinkX workflow added into one of our PD boxes --- 2 hours later (mostly because I was reading stuff I didn't actually need to)

(Not my sample -- just being cautious) but HOW COOL IS THAT OUTPUT? Here is what we think it is (very top) -- here is the MS2 evidence (first spectrum). Bottom panels -- Here is the MS3 evidence for each side of the DSSO crosslink!

If you did crosslinking in the past and you still wake up from nightmares of the experience, I really recommend you check out this new generation of crosslinking reagents, instrument methods and data processing software. For real. I think the nodes for Proteome Discoverer are $500 in the US. The DSSO reagent set us back $100?

Wednesday, May 16, 2018

EASI-tag -- some lab in Germany is working on new reporter ion technology!

What a busy week! Some labs would be content with identifying a major point of inefficiency in every mass spectrometer in the world and demonstrating a strategy to confront it -- and maybe take some time to sit around and feel smart about it.  This new preprint shows that this isn't how they do things. 

If you're thinking "hey! we have lots of reporter ion tagging technologies already. what could this one add?"

What if I said that you could take 3 samples and label them and mix them in a ratio of 1 to 12 to 144 -- and when you process the data that your output was 1 to 12 to 144? In MS2 -- no funny, time wasting, MS3 tricks, no isotope suppression!

It is worth keeping in mind that this was a single shot of yeast digest on a 95 minute 45cm Dr. Maisch column on a QE HF -- this setup alone (sharp peaks, sample with only around 4,000 proteins on a fast instrument) would probably cause some reduction in ratio suppression, but when TMT10 first came out, Dr. Min Du and I 2D fractionated some human protein digest we labeled in 1:2:4:8:16 -- and 16 doesn't look like 16 -- it looks like 8-10 on a QE Plus. (This is the example set in any of the TMT/iTRAQ Proteome Discoverer processing videos I've made over the years.) If you could really see a 144 fold difference in MS2 scans? This is huge.
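For anyone who hasn't been bitten by ratio suppression yet, here is a toy model of it. This is my own simplification, not anything from the preprint: I assume some fraction of the reporter signal comes from co-isolated background peptides that are 1:1 across every channel, and `interference_fraction` is a made-up knob for how bad that is.

```python
def observed_channels(true_ratios, interference_fraction):
    """Blend the true reporter signal with a flat (1:1:...:1) background.

    interference_fraction is a hypothetical parameter: the share of total
    reporter signal contributed by co-isolated peptides.
    """
    flat = sum(true_ratios) / len(true_ratios)  # background, equal in each channel
    return [
        (1.0 - interference_fraction) * r + interference_fraction * flat
        for r in true_ratios
    ]


def fold_change(channels):
    """Highest-labeled channel over lowest-labeled channel."""
    return channels[-1] / channels[0]


if __name__ == "__main__":
    for label, mix in (("1:2:4:8:16", [1, 2, 4, 8, 16]), ("1:12:144", [1, 12, 144])):
        for f in (0.0, 0.05, 0.10):
            fc = fold_change(observed_channels(mix, f))
            print(f"{label}  interference={f:.2f}  observed fold change={fc:.1f}")
```

In this toy model a 10% flat background is already enough to make a true 16-fold difference read as roughly 10-fold, which is about what we saw. That is why reading 144 as 144 in MS2 is such a big deal.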

How's EASI-Tag do it? The reporter fragments off at significantly lower energy than it takes to break the peptide backbone. The authors use a 2 stage collision energy, one low, and one normal.

They also do two things to the QE HF I'm not sure I understand the logic behind.

#1 -- They offset the isolation window for MS/MS
#2 -- They use a special setting they've developed for the QE HF software to preferentially isolate the monoisotopic peak. Since we're adding a tag to these peptides, this is about the opposite of what we normally do -- for example --

When looking at a peptide greater than around 1600Th, the M+1 peak becomes the most intense species. Statistically, on something as large as this peptide, the M+2 is almost as abundant as the monoisotopic.  Since I'm signal starved and the heavy isotopes are distributed evenly across the peptide (and the Orbitrap onboard computer can identify the monoisotopic -- regardless of what is fragmented), I generally want the M+1 to be fragmented....
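You can rough out where that crossover sits with the averagine model. Assumptions, loudly: I use the widely quoted averagine composition (per 111.1254 Da of peptide), standard natural abundances for the +1 isotopes, a Poisson approximation, and I ignore the +2 isotopes (18O, 34S) and any tag you've bolted on. Input is neutral peptide mass in Da, not m/z. With these simplifications the M+1 overtakes the monoisotopic peak a bit under 1,900 Da, so treat the exact number as a ballpark that shifts with composition.

```python
import math

# Averagine: average atoms per 111.1254 Da of peptide (widely used values)
AVERAGINE = {"C": 4.9384, "H": 7.7583, "N": 1.3577, "O": 1.4773, "S": 0.0417}
AVERAGINE_MASS = 111.1254

# Natural abundance of the +1 isotope: 13C, 2H, 15N, 17O, 33S
PLUS_ONE_ABUNDANCE = {"C": 0.0107, "H": 0.000115, "N": 0.00364, "O": 0.00038, "S": 0.0075}


def expected_heavy_atoms(mass_da):
    """Mean number of +1 isotopes in an averagine peptide of this mass."""
    units = mass_da / AVERAGINE_MASS
    return units * sum(AVERAGINE[e] * PLUS_ONE_ABUNDANCE[e] for e in AVERAGINE)


def isotope_envelope(mass_da, n_peaks=4):
    """Poisson approximation of the M, M+1, M+2, ... relative abundances."""
    lam = expected_heavy_atoms(mass_da)
    return [math.exp(-lam) * lam**k / math.factorial(k) for k in range(n_peaks)]


if __name__ == "__main__":
    for mass in (1000, 1600, 2000, 2500):
        envelope = isotope_envelope(mass)
        pretty = "  ".join(f"M+{k}: {p:.2f}" for k, p in enumerate(envelope))
        print(f"{mass} Da  {pretty}")
```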

OH -- They describe the reason why they did this in Figure 1 c. Both the preferential selection of the monoisotopic and the offset. I think it has a lot to do with the particular characteristics of this tagging technology and doesn't mean I should start reoptimizing my other reporter ion experiments.

The co-first author on this great new study is one of my fellow instructors at the Advanced Proteomics summer school in Vienna in July! I can compartmentalize it in my brain (forget about this entirely -- there isn't all that much space in here...) and ask a million questions this summer!

Tuesday, May 15, 2018

BoxFahrt (BoxCar-ish on a Fusion -- no hacking required!)

I'm on something like iteration 48 -- but I think I've got it.

First off -- if you haven't seen BoxCar yet, you should check it out.  Once we all can do it, I think it will change how shotgun proteomics is done from here on out.

That is the kicker, though. It's going to be a bit before we can all do it -- and a bit longer before all of our data processing software can handle the output (mucking about with the MS1s is rough on label free quan.)

You know what doesn't have a downside -- just a massive potential upside? REPORTER ION QUAN.

I'm on iteration 48 and (with some false starts) getting massively better data using the native Fusion software (whichever version came out in December -- the one that adds the "30Hz" and the funny quad isolation glitch if you use IC).

I won't walk you through all the stuff -- but if you click on the picture above you'll see what I've gotten around to method-wise.

3 Instrument methods (or segments) consisting of just these things:

The three segments are identical -- with the exception of the fact that each segment has a separate set of 10 T-SIM scans. For TMT, where my care for the MS1 scans across the peak is not the highest, a total cycle time of 4.5 seconds is just fine for me.

What I'm running right this minute uses 60k MS1 scans, so this is the screenshot, but what I've been running has primarily been 120k -- it just hurts me to use 600ms on MS1 scans -- even when they're this good (and HOLY COW, the MS1 scans are good -- see below).

The individual segments are easy to set up. Important to note, the AGC target on the Fusion is the target in the box DIVIDED by the number of MSX events (took 5 runs to figure that out from the scan header -- I was still impressed by the quality of the data).

It's really easy to see the improvements in the MS1 signal distribution as shown in the BoxCar paper (I've got a bunch of examples just like this -- I'm not picking and choosing). In the top, this obnoxious singly charged peak uses up the entire AGC target -- you can't even see the 843 or 722 or a bunch of other really good ions -- msx-T-SIM it -- there they are!  Then -- you start to wonder -- if dropping from 1e9 to 1e7 is gonna work out real well and you realize you divided 20ms fill time between 10 T-SIMs...
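Here is that division spelled out, since it took me 5 runs to notice. The divide-by-number-of-MSX-events behavior is what I read off the scan headers. The 1e6 box value is a hypothetical example, and splitting the fill time evenly is a simplification (the real injections are not equal).

```python
def per_window_agc(box_target, n_msx_events):
    """Effective AGC target per window: the value in the box / MSX events."""
    return box_target / n_msx_events


def per_window_fill_ms(total_fill_ms, n_windows):
    """Average fill time per window if the budget were split evenly (it isn't)."""
    return total_fill_ms / n_windows


if __name__ == "__main__":
    # 1e6 is a made-up box value; 20 ms across 10 T-SIMs is the case above
    print(f"AGC per window:  {per_window_agc(1e6, 10):.0e}")
    print(f"fill per window: {per_window_fill_ms(20.0, 10):.1f} ms")
```

So if you want each window to actually get the target you had in mind, multiply what you type in the box by the number of windows, and give the fill time the same treatment.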

Looking at the Fusion scan headers has caused people to eat Tide Pods (I can't prove this) but it helps if you think about them historically.

I highlighted the ion injection time -- it brings the first one to the top. This isn't a sum of all the multiplex injections, just the first or shortest one (difficult to discern because the first one is almost always the shortest). You'll see in the second line, the injection times of the first(?) 6 injections. Your fill time is a sum of those (plus the other 4 that can't be displayed).

Thinking about it historically -- how old is Xcalibur? Somebody probably knows. I don't, but if you told me that if you dug really deep into it you'd find Xeroxes of the punch cards that were used to code it, I wouldn't be shocked. 

How long have we been able to multiplex? 2012, when the QE came out? And even on the great Q Exactive Classic, I've never successfully multiplexed >5 ions. If the scan header only has enough room for 6 multi-injection spaces -- it makes sense to me.

Getting MS1 improvement is easy. The tricky part was getting the Fusion to fragment the stuff that I tell it to -- the most abundant, MIPS passing, peptides it sees. If you use an MSX and MSX control, it seems to get confused, cause this is what I tried first --

Don't get me wrong, the MS1 scans look great! But it didn't select anywhere near the number of MS/MS scans that appeared available (to me -- admittedly limited measurements). It is quite likely someone smarter will have better success, but --

Breaking them out into separate segments appears to get me more IDs -- even just using big T-SIM windows in a single MS1 scan appears to help!

Let's sum this up.

#1) I'm a BoxFahrt believer
#2) OMGauss, I can't wait to be able to do BoxCar right
#3) Umm....I'm getting an impressive boost in my TMT labeled peptide IDs with BoxFahrt -- can you imagine what BoxCar can do!??! 
#4) I'm doing TMT11! I'm, of course, not even using the ion trap on the instrument!   Very next thing for me (when I get the excuse to tinker again) BoxFahrt with parallelization for MS/MS in the Ion Trap!! I swear, the more I look at it, the more I think it will work better than the Q Exactive methods.... it could be filling and doing low res MS/MS scans simultaneously!

Monday, May 14, 2018

Become a power user (HACK YOUR FUSION!!)

Do you have an awesome idea for a better way to run your Orbitrap? There have been great examples over the years -- like Gygi lab adding TMT MS3 to their Orbitrap Elite and S. Gallien et al., adding IS-PRM to their Q Exactive and this recent BoxCar thing.

The manufacturer has always offered developer's kits and APIs for those of us who can find somebody with the skills to write some code and change things, but this is the first time I've heard of a course about it!

Conveniently released about 2 hours after my employer's travel people set up my flight, but starting early enough in the morning that I'd never make the beginning of it anyway (winning!) you early birds can register for this cool ASMS (Saturday June 2nd) workshop here!

Sunday, May 13, 2018

BoxCar updates!

So...ummm...BoxCar is a popular topic. I've hardly had a conversation since the paper came out that didn't end up with us talking about the paper and/or our independent reanalyses of the RAW files.

I'd like to bring everyone's attention to this comment on the blog by Florian Meier, the first author on the BoxCar study: 

Dear Ben, all, Thank you very much for your excitement about the BoxCar acquisition method. We are about to release an Xcalibur plug-in that will enable BoxCar scans without the hassle of tweaking the Xcalibur method editor or extra software from the vendor. Please give us some time to fix last bugs and follow for updates and download details once available. Regards Florian

(comment left on the post "BoxFahrt-- BoxCar for people who can't alter their instrument software.")


The fact someone from this team wrote a post here (!!!) and with news THIS good? A Chuck Norris thumbs up is quite literally the most powerful approval I could come up with. (I've heard that on this take the camera survived, but the cameraman and key grip did not). 

Thanks, Florian! We can't wait to try it!! 

Saturday, May 12, 2018

IonStar -- Global proteomics with reproducibility as #1 priority!

Okay, ya'll. This is going to look a little self-serving, because I've been lucky enough to contribute in some small way to this amazing project, but I'm on a mission.

This mission is to prove that:
1) Proteomics CAN be reproducible
2) Proteomics CAN contribute to clinical studies
3) Proteomics CAN be part of what we use to diagnose patients -- to find out when they're sick before it's too late and to help pick the drugs that they need to use to get better the fastest.
4) If proteomics focuses on what it can do to COMPLEMENT genomics and transcriptomics, rather than trying to beat them all the time (an exome sequencing is under $350, y'all, and a full 30-50x transcriptome might drop under $1k really really soon) at things they can do better and cheaper -- we can do great amazing things together. Do we really want to try and compete with that -- when they can't do any PTMs and have basically no ability to do proteoforms!?!?

I think an awful lot of people in our field are on this same mission -- but sometimes it doesn't seem like it -- because we don't seem to be able to stop messing around with the settings on our instruments and settle on methods that will make our experiment not only impactful for us for singular studies -- but also impactful for anyone who wants to go to ProteomeXchange and look at our data and compare those to other datasets.

I'm guilty of the same thing. Why have 112 settings I can change on my Fusion IF I DON'T TRY EVERY COMBINATION OF THEM!?!?!?!  I stayed up basically all night last night trying to make a QE HF do BoxCar (follow-up post coming -- I think I got it)

Jun Qu is also on this same mission. To prove it, his lab pretty much stopped changing their sample prep and mass spec methods a couple years ago -- and it's reaping some amazing dividends (more papers published -- just since 2017 -- than I've published in my career...).  So I'm going to present yet another great IonStar paper here.


7,000 proteins ID'ed and quantified in mammal cell lysates with no missing values
Introduces IonStaR Stats which can be downloaded here so you, too, can have all the tools shown in this paper.

The suggestion of -- you know what?!?! if we just do great chromatography and ultra-high resolution MS1 scans -- maybe that triply charged peptide that we've got at 104 +/- 0.5min with 1ppm mass accuracy is the same peptide from run1 to run 8,412.  Maybe we can use MS1 libraries (not presented here -- but it sure sounds like it might work)...
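The matching logic in that thought is simple enough to sketch. To be clear, this is NOT the IonStar algorithm, just my illustration of "same charge, within 1 ppm, within 0.5 min". The dictionary keys and the example m/z values are hypothetical.

```python
def ppm_error(observed_mz, reference_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def same_feature(a, b, ppm_tol=1.0, rt_tol_min=0.5):
    """True if two MS1 features agree on charge, m/z (ppm) and retention time."""
    return (
        a["charge"] == b["charge"]
        and abs(ppm_error(a["mz"], b["mz"])) <= ppm_tol
        and abs(a["rt_min"] - b["rt_min"]) <= rt_tol_min
    )


if __name__ == "__main__":
    # Made-up features from two runs of the same sample
    run_1 = {"mz": 750.4000, "charge": 3, "rt_min": 104.0}
    run_2 = {"mz": 750.4005, "charge": 3, "rt_min": 104.3}
    print(f"mass error: {ppm_error(run_2['mz'], run_1['mz']):.2f} ppm")
    print("same feature?", same_feature(run_1, run_2))
```

The tighter your chromatography and mass accuracy, the tighter you can make those two tolerances, and the fewer wrong matches sneak through. That is the whole argument for not touching the method once it works.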

Another advantage -- and maybe just because I'm in a little bit of a funk because of some disappointment the last 2 days -- if you've got a TriBrid you're good to go. You don't have to hack your instrument or anything like that. All you need is the vendor's software and a seriously nice column (Jun's lab uses 100cm columns). In limited experiments at my facility with 50cm and 75cm EasySprays, the performance doesn't look that far off. The bit I lose, I'd trade for the ease of NanoViper. I'm lazy -- sue me.  The important part is 1) pick a sample prep method, best you can, and follow it exactly 2) Run the same exact instrument method. As much as you want to try that higher AGC target -- DON'T. 3) And consider that if you are just using peak finding (match between runs) that there is a certain number of false discoveries that will occur -- use some method of FDR to control it a little!

Friday, May 11, 2018

BoxFahrt-- BoxCar for people who can't alter their instrument software. Might even work!

EDIT 5/13/18: Please ignore this post. BoxCar is under testing and is coming to all of us soon! I'm only leaving this post here as a reminder to myself to think before I hit the "Publish" button -- and in case anyone wants to look at what the BoxCar RAW files look like to help understand the instrument method logic.

In my old neighborhood in Baltimore there is a hilarious race each year. It started as a soapbox derby, but due to all the artists and weirdos, it rapidly descended into chaos.

Now -- a bunch of people race down a hill riding old toilets.  I'm not making this up (proof). Am I on the right blog? I am!

Okay -- so --- I was really really excited about BoxCar, then I saw a Tweet and blog comment that made me realize -- I can't natively multiplex >10 isolation windows in any Exactive Tune I have -- and they use 16!!   I went to ProteomeXchange and got some of the RAW data -- and...umm... this is completely custom written software on the QE HF...ugh.... However -- I don't think doing something similar is impossible -- maybe I just have to make some compromises!

I've got the RAW files in front of me, a notebook, a pen that is a monkey with googly eyes, a tablet that thinks it is a Q Exactive HF -- working on this late at night -- so some ethanol might have made it into this espresso.

Time to build something that should simulate BoxCar!  In honor of  something very vulgar I said very loudly when I saw the .XML stuff where the instrument method is supposed to be - which made me think of the Toilet Derbies, I'm going to call this method BoxFahrt.

Disclaimers -- yo. if you've read this far and you think that I'm about to do something smart, shame on you. However, just to be sure -- no guarantees this will work, I won't have a chance to start testing it until at least Monday. But, don't you worry, I'll let you know how it goes.

First off, let's look at the RAW files from ProteomeXchange (you can get them here) and try to diagnose what everything is doing (without trying to read the .XML used as an instrument file).

Using the plasma samples (smallest set at 3.6GB) as an example this is the method as I see it:

1) MS1 scan at 120,000 resolution from 300-1650
2) 16 BoxCar isolations with a 120,000 (?) resolution MSX orbitrap "full scan"
3) Same as 2, but with altered overlapping windows
4) MS/MS scans -- as I flip through the RAW file, it appears that we're looking at something realistically approaching a "Top5"

First question I have -- how important is #1? 3 MS1 scans at 120,000 resolution, even on the HF, is a lot of time. Let's assume it is important and I'll throw it in later. However --- my first attempt at BoxFahrt is going to be --

Step 1:  Set up MSX TSIM-ddMS2 runs (in this example 2)

In BoxCar, the authors run from 400-1200 (m/z). BoxFahrt will do the same thing. To get this in 2 MSX runs I'm going to need to do 80 Da (m/z) windows. If I do 3 x 10, it's going to be smaller isolation windows. For example purposes, I'm going to go with just the two here.  CORRECTION -- 80 Da windows.  Should be corrected in image above.
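If you'd rather not do the window math by hand, here is a rough sketch that tiles an m/z range into equal boxes and deals them out alternately to each MSX scan. Both 40 and 80 Da show up in this post for the box width, so the width is left as a parameter, and the alternating layout is just one plausible way to split boxes between scans (an assumption, not taken from the BoxCar method file). All function names are hypothetical.

```python
def tile_windows(mz_start, mz_stop, width):
    """Return (low, high) boxes of equal width covering mz_start..mz_stop."""
    span = mz_stop - mz_start
    if span % width != 0:
        raise ValueError("width must divide the m/z range evenly")
    n_boxes = span // width
    return [(mz_start + i * width, mz_start + (i + 1) * width)
            for i in range(n_boxes)]


def split_alternating(windows, n_scans=2):
    """Deal boxes out to scans in turn: box 0 -> scan 0, box 1 -> scan 1, ..."""
    return [windows[k::n_scans] for k in range(n_scans)]


def window_centers(windows):
    """Center m/z of each box, the value an inclusion list would need."""
    return [(low + high) / 2 for low, high in windows]


# 400-1200 m/z in 40 Da boxes gives 20 boxes, 10 per scan across 2 scans.
boxes = tile_windows(400, 1200, 40)
scan_a, scan_b = split_alternating(boxes, n_scans=2)
print(len(scan_a), window_centers(scan_a)[:3])
```

With 40 Da boxes the two scans each get 10 boxes, which fits under the MSX limit of 10 mentioned above. With 80 Da boxes the same range only needs 10 boxes in total, so change the numbers to match whatever layout you actually settle on.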

Downsides of this way of doing things (BoxFahrt wasn't entirely meant to be a huge compliment or anything to this parody of a great method) -- if I set it up this way we're looking at the first round of MSX-t-SIM followed by the MS/MS scans selected from that Orbitrap scan. THEN we're looking at the MS/MS selected from the second round of MSX-t-SIM scans.

BoxCar appears to pick them from the two together, but I don't think that makes a ton of difference. The problem here may be the challenge in AGC control.

I don't have fine tune control over the AGC targets I'm going to be using. I can set just one number for BoxFahrt. I'm going to say 5e5 and 20ms for my 40Da isolation windows.

The logic behind my settings --

We know the QE family can handle 5e6 charges in the C-trap with limited ill effects (no reference, I just have friends who run above 3e6 for MS1 -- I'm sure there are references -- however, I've been working on this for a long time already and I'm getting sleepy.)

If we MSX 10 windows equally, that would allow us to run 5e5 ions per BoxFahrt window.
On a QE Classic or Plus, the 140k scan is something like 512ms. That would allow us to have a Maximum IT per MSX-SIM of 50 ms, give or take (overhead is around 14ms -- so maybe shoot for 40?)
On the QE HF, 120,000 resolution is about half that. I'm erring on the side of caution and going to 5e5 and 20ms. It might be smarter to raise the target. Again -- this is where I'll start when I can actually have free time on our massively overworked instruments.
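The back-of-the-envelope math above, written out as a sketch. It just divides the C-trap charge budget and the transient time evenly across the MSX windows. The function name is made up, and overhead is deliberately not modelled, since all I have is the rough "around 14ms" figure.

```python
def per_window_budget(total_charges, n_windows, transient_ms):
    """Split a total charge budget and a transient time evenly over windows.

    Returns (AGC target per window, maximum injection time per window in ms).
    Overhead is NOT subtracted here; round the injection time down yourself.
    """
    agc_per_window = total_charges / n_windows
    max_it_ms = transient_ms / n_windows
    return agc_per_window, max_it_ms


# QE Classic / Plus: ~512 ms transient at 140k, 10 MSX windows, 5e6 charges
print(per_window_budget(5e6, 10, 512))

# QE HF: the post assumes 120k takes about half that, so ~256 ms
print(per_window_budget(5e6, 10, 256))
```

For the HF case this lands at about 25 ms per window before any overhead, which is why 20ms is the cautious starting point.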

Now you need to build an inclusion list.

BoxCar alternates the overlapping windows. Please keep in mind that quad isolation isn't truly symmetrical on any quadrupole, but the Q Exactive classic has an older style (non-segmented) quad and the isolation at the edges is particularly far off of symmetrical -- the QE Plus and HF have segmented quads that are much closer to symmetric. The BoxCar paper goes into how to best deal with the quad isolation issues on the edges. Considering they use the HF -- just keep in mind that you might be looking at some loss in signal at the edges if you use a QE Classic -- or, to a lesser degree, Fusion 1 systems.

What we need to do in BoxFahrt is build smart windows and (possibly -- can't say for sure yet) control our MSX ID #s(?)

Don't quote me on this (or anything I write here. that goes without saying, right?!?!) -- but I'd probably first try to run with no MSX ID filled in. If that doesn't work great, I'll next put in some MSX ID numbers. Even with an MSX of 10, you can't put in #1 for all the ones in the first batch and #2 for all the ones in the second. I think, therefore, that this feature just allows you to keep your scans in order.

EDIT number 4,212: In older versions of the QE tune software (I definitely think up to 2.2) if I put in an inclusion list like the one below and walked away and came back to the method, I'd find that the list had reorganized itself in increasing order. I...believe....that this is no longer the case, but I've never verified. If you go here, I've put links to Planet Orbitrap where you can get the Vendor notes on Tune versions. If you are on an older version of QE Tune -- you'll have an issue setting up inclusion lists like this. You'll end up getting a 2 phase, over-complicated gas phase fractionation method. That might still work, but will be less cool.

I've spent way too much time on this last night and today -- so I'm going to stop here for now. I won't know anything until I actually try shooting some standard protein mixtures on an instrument or 5, but this is where I plan to start.  You'll note I started with a Loop Count of 5 -- in this setup this would be 5 from each MSX-TSim -- so we're really looking at a spaced out simulation of a Top10.

Honestly -- I think this is going to be easier to simulate on the Tribrids, but I'll probably leave this alone until I have some real data.

WAY WAY too much time spent on this the last 24 hours. Gonna have to save it and end here.

EDIT 5/11/18 later in the day:  What? I'm back to this. I want to address another reader comment -- if you did want to do BoxCar right what would you need?

I presume you'd need the API. You need to contact the vendor to get it, I think. It was on the BRIMS portal for a long time, but I don't think it's there now. The API is a Windows Visual Studio interface that allows you to completely control your Q Exactive.  Some really cool stuff can be done with the Q Exactive when you get the API.

Warning, though, it is a LOT of work to use. It is kind of a blank slate. I'd presume, however, that if you cut the instrument method text out of the BoxCar RAW data that it would have most of the things necessary.  There is a PDF talking about the API (directly opens from this link.)

More Edits late the next day:  I've heard, from a reputable source, a reputable sounding rumor that the vendor is investigating making BoxCar available to the rest of us. Some legal review needs to be done to see if distribution can be done. Stay tuned.

Also -- it was pointed out I had misspelled the name of my method repeatedly in the post. I have made these corrections.

Thursday, May 10, 2018

BOXCAR -- It's time to stop and redesign most proteomics experiments!!

Okay. I'm going to try not to freak out. This is so simple and brilliant that every one of us sitting in front of an Orbitrap should have come up with it on our own.

We've spent so much of our time focused on how to get more MS/MS signal that we've ignored the fact that our 120,000 resolution MS1 scan (that might take 250-500ms to actually do) only had ONE millisecond of fill time.

What happens if we stop being dumb and use all that wasted time for something? 

10,000 proteins in a single injection?  Hmmm....that wouldn't be too bad. That's, I dunno, 5 to 10 times what I'm normally getting per injection.  WHAT!?!?!?!?!?!?!?

That's what happens when you unbias your MS1 acquisition and distribute it across the mass range. The authors use the quadrupole to isolate a series of little windows across the mass range. The next MS1 scan will have a different set of windows. All of a sudden that one single albumin peptide isn't consuming the full 3e6 charges that the C-trap can take. It wasn't isolated in one of those scans, and in the scan where it was isolated it only got an equal share alongside the other BoxCar windows. It CAN'T waste all of your AGC!
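To convince myself of the logic, here is a toy model (my own simplification, not the authors' code). Each window has an ion flux in charges per ms. In a normal full scan the fill stops when the total hits the AGC target, so one hog sets the fill time for everybody. In the BoxCar-style fill each window gets an equal share of the target and its own fill time, capped at a maximum injection time. The flux numbers in the example are invented purely for illustration.

```python
def full_scan_charges(flux, agc_target):
    """Charges collected per window when one fill serves the whole range."""
    fill_ms = agc_target / sum(flux)  # the fill stops once the total hits target
    return fill_ms, [f * fill_ms for f in flux]


def boxcar_charges(flux, agc_target, max_it_ms):
    """Charges per window when each window gets an equal share of the target."""
    per_box_target = agc_target / len(flux)
    charges = []
    for f in flux:
        # each box fills until its own share is reached or time runs out
        fill_ms = min(max_it_ms, per_box_target / f)
        charges.append(f * fill_ms)
    return charges


# Invented example: one "albumin" window at 1e6 charges/ms, three quiet ones.
flux = [1e6, 1e3, 1e3, 1e3]
fill_ms, full = full_scan_charges(flux, agc_target=3e6)
box = boxcar_charges(flux, agc_target=3e6, max_it_ms=25.0)
gains = [b / f for b, f in zip(box, full)]
print(round(fill_ms, 2), [round(g, 1) for g in gains])
```

In this made-up case the full scan fills in about 3 ms and the quiet windows pick up several times more charges with the BoxCar-style fill, while the hog window is held to its share. Real gains depend on the actual ion distribution, the window layout and the injection time limits, so don't read anything quantitative into it.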

What they get? An average of 20-60x higher signal to noise! Which kind of results in more peptide ID's per unit time than anything I've ever seen.

If you're thinking -- wait -- they just butchered the MS1 scan by cutting it into little sections -- you're right. However, the authors are somewhat involved in the MaxQuant project and have written some clever code and optimized the isolation windows to make the quan reproducible and reliable.  This might mean, however, that if you aren't using MaxQuant you're going to need to tweak your LFQ software to make this work. I'll look at the ones I use and let y'all know if I can make it work.

In a human clinical study they quantify around 6,200 proteins with no missing values across 10 samples. That's -- impressive.... On a less limited sample (mouse brains) they crack 10,000. Single shot. 10,000 proteins.

Man -- this paper is awesome, logical and I still feel really dumb -- but I'd feel a lot dumber if I wasn't using this method for some projects next week when I'm back in the lab....

EDIT: Thanks for the comments guys -- I'm out of the office till Monday. I'll be working on standardizing BOXCAR methods so that they can be distributed through   I hope to have something so people not on this great new study can be running next week.

Wednesday, May 9, 2018

Proteoforms as the next (and most important?) proteomics currency!

Currencies have been in the news a lot lately -- mostly thanks to the newfound public awareness of so-called "cryptocurrencies" like Bitcoin. Some of the most interesting questions have been why things like the Dollar have value while unique 256 bit alphanumeric codes permanently preserved on a decentralized global network do not. Or, more recently, since people are agreeing that they do -- how many dollars they could be worth.

This short note in Science today makes me think there are plenty of parallels in the idea that -omics is similar to dealing in currencies. The authors summarize some of the mounting evidence that if we aren't focusing on proteins at the proteoform level -- we might be missing the most important aspects of how proteins regulate cells.

I'm feeling thoughtful and rambly -- so what about this question?

We know that the RNA quantitative level doesn't correlate well with the amount of a protein that is present. Does that mean the RNA level present isn't useful information? My wife probably won't read this but just in case I'm going to say "of course not. RNA quantification is definitely still important".

But if you had the choice -- regardless of what your background is I think that in most situations every researcher would rather know how much of the protein was present in a cell over the amount of DNA/RNA present.  I'd argue that is the reason why there are so many western blots in genetics studies for "validation" of their observations.

But -- what if you had the choice of knowing how much whole p53 was around as well as how much S15phospho p53 AND how much S392 phospho p53 was around AND how much S149glyco p53 (these proteoforms of p53 do different things to this important protein -- nope, I don't know what).

With shotgun proteomics, we have to do a global analysis, a phospho enrichment, and likely a glycopeptide enrichment -- and for the PTMs we have to trust quan with just one (probably poorly ionizing) peptide measurement. I think every researcher in the world would want the proteoform level quantification.

Unfortunately -- I picked kind of a poor example because it's around 44kDa -- and it's still currently pretty hard with the instruments most labs have to do global proteoform analysis with proteins over 40kDa, but it has been getting easier every generation of instruments!

Now here is the big question, I think. Say your Q Exactive can only get you global quantitative proteoform analysis of the proteins in your tissue up to 30kDa. What is more biologically relevant to your disease state? The small protein proteoform analysis, or the relative quantification of 1 or more peptides from 30-70% of the proteins present in your sample of interest and the complete loss of information of what proteoforms those peptides came from?

Holy cow -- there might be such a thing as too much coffee for me -- but it's an interesting idea, right? When we can realize the proteoform currency, this little note might be like Satoshi's whitepaper -- hopefully less like the guy who spent 10,000 bitcoins on a pizza delivery.