News in Proteomics Research: March 2021

Wednesday, March 31, 2021

Scanning SWATH with ultrashort gradients-- 2,000 proteins in 1 minute?

I'm gonna drop this here because everyone else is talking about it.

Scanning SWATH goes way back to almost the beginnings of SWATH as an idea. I think it is very similar to SONAR from Waters in that the quad is not a stationary bin like we use for quadrupole Orbitrap based DIA.

The trick here is fully on an informatics level, executed through the impressively easy to use DIA-NN software.

There was an informal ABRF wrap up meeting with a lot of smart people from around the world that I somehow got invited to and one thing we talked about at length was the new generation of LCMS software that doesn't give you easy access to MS/MS fragments. DIA-NN falls in that group. This isn't going to make everyone comfortable, but it is something to be aware of. There is a whole generation of proteomics people using new software and getting stupendous numbers of identifications that are verified by software but are not (or at least, not easily) verified by checking to see that MS/MS fragments actually exist.

If the software is right? Who cares! If the software is wrong, for many of these tools I don't know how you ever find out. I've been trying to look at the RAW data from this study but MSConvert won't recognize the .wiff.scan files and I'll probably just assume the reviewers did their due diligence on this fantastic sounding study.

Tuesday, March 30, 2021

More improvements for MSAmanda -- for all operating systems!

MSAmanda has continued to improve, and if you haven't taken a look at this free search tool in a while, now might be the time to take another look. Check out a short summary here!

For people using Macintoshes there aren't exactly a ton of options for proteomics. You can say the same for Linux probably unless you are adept at pulling tools from Github and figuring out what steps the developer thought you were smart enough to know about without them telling you.

Other highlights include more compatibility with file types recommended by the Proteomics Standards Initiative which must be composed of the most tolerant people who have ever lived on earth. Most normal people after 1 year of trying to get proteomics to standardize anything:

They've been coming up with ideas for almost 20 years!

Monday, March 29, 2021

Proteomics needs more spectral library formats!

At both the conveniently overlapping USHUPO and ABRF a couple weeks ago a big story was the emerging new alternative proteomics techniques. Illumina is getting into the game and is chasing the SOMASCAM technology to see who can be the fastest to accurately quantify 1,500 proteins in human samples.

Four new companies launched last year alone raising huge amounts of money with basically the same pitch "we want to be the Illumina of proteomics", which probably led to Illumina wondering why it couldn't be the "Illumina of proteomics" too. Google the term in quotation marks. You'll find them and their huge successful investment raises.

The outside world is excited and ready to invest in proteomics and it's becoming readily apparent that LCMS is not part of the conversation. Could someone raise anywhere near that kind of investment capital on a pitch based on LCMS? No way. If you take a step outside our little community, spin around in place 3 times, and look back in, you can probably see why the scientific community is fatigued with us and all the dumb shit we spend our time on.

For example: I think that a substantial percentage of people in the field right now are spending their time trying to come up with completely new and completely incompatible spectral library formats. Is that what you're planning to do today?

Why do we need a tool like EncyclopeDIA to have converters for 7 different spectral library formats? Why is that not even enough?

I downloaded the files from a new study from a single word journal this morning to see the results for myself and I'm absolutely fucking thrilled to see a new spectral library format this morning. Even Pinnacle, which can natively load practically any format of MS data and has options for accepting a scrollable list of input formats (did you know that some ultralarge biopharma companies have their own internal spectral library formats? they do, because we've all absorbed too much acetonitrile through our skin and it has done something awful to our brains. I know because Pinnacle has the option to accept those as well on its pulldown list of options for input) just closes when I try to feed it this amazing new spectral library format. I'm sure that OptyTech could fix that for me today, but why should they have to?

When you apply for your next grant and you're beaten by the genomics core across town because they can now use their NovaSeq to quantify 1,500 proteins, don't be bitter.

Go back to your lab and get back to working on a completely new and unnecessary way to extract proteins from your cell types that we already have 15 ways to effectively work with. Alternate between 100mM TEAB for resuspending your trypsin today and swap over to 50mM AmBiC on Friday, heck, mess with the ratios of protein to trypsin while you're at it. Tinker with that gradient to get that one extra albumin peptide you've always wanted to see in your global runs, you know you want to. Hell, put a grad student on making an entirely new spectral library format. We probably don't have enough anyway.

Just keep in mind that every step along the way we have done basically everything we could think of to make proteomics inaccessible to the greater scientific community and make it as challenging to reproduce our results as we possibly could. If Illumina pulls off 3,000 protein identifications in their next generation of technology as they have promised, we'll be lucky if LCMS proteomics exists anywhere outside of Cambridge and Munich, because by and large the scientific community is tired of our circular tail-biting craziness. And they should be.

Sunday, March 28, 2021

Mesh -- select and fragment multiple charge states of intact proteins -- in real time!

And the winner for understated study so far for 2021 goes to.....

Man, this is why I'm still here doing this stuff, I guess. From the title and abstract alone, I bet you probably didn't want to read this. Why would you?

Well, you can get a program developed for this study from Github here that will allow you to operate your Q Exactive system with some new super powers!

MetaDrive? What's that do? Well, it's nothing to yell about really, it allows you to look at intact proteins on the fly and deconvolute them and then it selects multiple precursors for your intact protein (like targeted multiplex quantification) and will hit them with stepped CE allowing you to get massively more signal on your intact proteins. That's all.

When you do intact proteins on an Orbitrap and select an ion for fragmentation the best case scenario is that it looks like this and you selected that one charge state for your MS/MS fragmentation.

You're only getting a fraction of the total protein signal for fragmentation. It's a large part of the reason that MS/MS scans of proteins look so lousy.

What if when this signal came off your QE, there was a program sitting there that was smart enough to deconvolute that protein and recognize multiple intense charge states? Keep in mind that the example above is a pure myoglobin standard. Top down doesn't look like that normally because you'll have multiple protein envelopes overlapping.

Just imagine if you were able to select even these three below? You've effectively tripled your protein signal for fragmentation!

Even better, since the Orbitrap is by far the slowest part of this process, there may be effectively no increase in the amount of time each cycle takes (like BoxCar, you're accumulating multiple isolated ion windows, holding them and then scanning, but here you can fragment them!)
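If you want to see the arithmetic behind "select more charge states, get more signal", here is a minimal sketch. It is not the program from the study, just the textbook m/z relationship for a multiply protonated protein. The mass is only roughly that of horse apomyoglobin and the relative intensities are completely made up for illustration.

```python
PROTON = 1.007276  # mass of a proton in Da


def charge_state_mz(neutral_mass, charges):
    """Return {z: m/z} for a neutral mass observed at each charge state z."""
    return {z: (neutral_mass + z * PROTON) / z for z in charges}


def fraction_selected(intensities, selected):
    """Fraction of the summed envelope intensity carried by the selected charge states."""
    total = sum(intensities.values())
    return sum(intensities[z] for z in selected) / total


if __name__ == "__main__":
    mass = 16951.5  # roughly horse apomyoglobin, illustrative only
    # Made-up relative intensities for a five-peak charge envelope
    envelope = {17: 0.6, 18: 0.9, 19: 1.0, 20: 0.9, 21: 0.7}
    for z, mz in charge_state_mz(mass, envelope).items():
        print(f"z={z:2d}  m/z={mz:.3f}")
    print("one charge state:", round(fraction_selected(envelope, [19]), 2))
    print("three charge states:", round(fraction_selected(envelope, [18, 19, 20]), 2))
```

With a real envelope the gain depends entirely on how the signal is spread across charge states, so treat the numbers this prints as a cartoon.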

You know, not a big deal. The authors might have understated how freaking cool this is because they do note that the real time calculations are a little slow. They're working on implementing the screaming fast FlashDeconV program into this to help alleviate the bottleneck.

The files are up on PRIDE, but still locked. I already put in a request to open them up if you want them!

Saturday, March 27, 2021

MS2Go Reports for MSFragger open search results!

I had a couple of Zoomie Team things this week where the conversations landed on MSFragger. Part of that is due to the fact that it is currently the only free way to process a big TIMMYTOF file in reasonable time. (As an aside we had OmicsPCs build us a new box with a water cooled Ryzen 5950X for this purpose. When we ordered it in December I think it was the fastest desktop CPU in the world. Probably isn't now, but we're getting MSFragger searches completed in about 1/3 the time this paper described.)

If you're using FragPipe much, you're probably familiar with this box.

And if you get distracted because, for example, you have an abnormally large 10 week old human who just discovered he has hands and his favorite thing to do when you look away for 10 seconds is punch himself in his own eyeballs, you might forget which peptide.tsv you have open. In this example you should close them all and go try and show him the dozens of frogs that just came out of hibernation that are having a loud jumping and splashing party. Oh. Punching yourself in the face is more interesting than frogs climbing up an 8 foot rock wall just to jump off into a little pond? Cool.

Okay, but the real challenge people point out is that it can be a little challenging to get from your protein to the peptide and PSM level evidence with all the sheets FragPipe generates.

Would a single output sheet like this help?

A single sheet with the proteins....

...and coverage map...

....and your open search reports all hyperlinked together and sorted out with premade filters?

There are some steps to this, but nothing that will cost you any money. You won't get all the superpowers that FragPipe has, but if you're just searching DDA data with MSFragger in Open or Closed search you probably won't notice (if you've got something other than .RAW you'll need to convert it to a universal format first).

You'll need PD or a PD demo version.

I'll do it on this one.

And you'll need MS2Go (it says it only works in versions up to PD 2.3, but it totally works in 2.4, so I assume 2.5 will be just fine as well.)

I made a tutorial for installing MSFragger in PD a while back; you can find that here, and I think I put in links to better instructions from their lab somewhere there as well.

MS2Go is super simple. Just point it at your .pdresult file.

And export your condensed report. If you didn't use Feature based quan you'll see this warning that it couldn't find it.

Obviously, this isn't a solution for everyone or every project, but if you or people you work with are used to seeing this kind of a condensed output it will save you a ton of time over assembling this from the FragPipe output yourself!

Friday, March 26, 2021

Want to learn the annoying field of single cell proteomics?

Are you tired of hearing the words "single cell"?

Not yet? Okay, so you should totally try getting a single cell into the bottom of a 96-well plate, or floating on the surface of a tiny droplet on a glass slide (NAnOPot) for yourself, and then successfully digest that cell and get it into a mass spec.

Don't know how? There are 2 great meetings coming up!

Chronologically, the first one is only 1 month away!

Busy in April?

ASMS has been moved to the fall, so what about the Boston meeting in June?

Thursday, March 25, 2021

Our field lost a great scientist -- and an even better person.

Wow, is this one hard to write. If you weren't aware, Michael Bereman of NC State passed away last week. As an ASMS member you might remember Michael receiving one of the awards that are given to extraordinary young researchers just a few years ago (2015). He was brilliant and far too young for me to be writing this.

Michael's stuff has been on this blog a lot. AutoQC, SProCoP and other tools and we reused the amazing CSF proteomics files that his lab generated for #ALSMinePTMs, partly because I wanted to make sure the files we used for reanalysis were generated by a real expert.

A GoFundMe page just went up for Michael's family that you can find here, with additional info on this great guy that we'll all miss.

Wednesday, March 24, 2021

CIDer -- Account for the differences between CID and HCD!

Were you comfortably just sitting there pretending that CID and HCD fragmentation were the same until someone worked it out for you? So was I! Shame on both of us.

Well, they clearly are not the same. Yes, they both predominantly result in b/y ion fragmentations when hitting nice unmodified tryptic peptides, but you don't have to go too far off those friendly ions before it is clear they are very different.

I'm comfortable saying this now because these nice people worked it out.

Tuesday, March 23, 2021

MS1-only proteomics is back with some machine learning boosts!

How much evidence do you need to make a peptide identification?

If you're using match between runs and your identification is by "Match only", you are relying on your aligned retention time, mass and your isotopes to make that identification.
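To make it concrete how little goes into a "Match only" call, here is a minimal sketch of the idea. This is not how any particular search tool implements match between runs. It assumes retention times are already aligned, ignores isotope patterns entirely, and the function names and the tolerances (5 ppm, 0.5 min) are hypothetical.

```python
def ppm_error(observed_mz, reference_mz):
    """Signed mass error of observed vs. reference in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def match_features(identified, unidentified, ppm_tol=5.0, rt_tol=0.5):
    """Transfer peptide IDs onto MS1 features that lack an MS/MS identification.

    identified:   list of (peptide, mz, rt) from a run where MS/MS made the ID
    unidentified: list of (mz, rt) MS1 features from another run
    Returns a list of (peptide, mz, rt) for features inside both tolerances,
    picking the candidate with the smallest absolute ppm error.
    """
    matches = []
    for mz, rt in unidentified:
        candidates = [
            (abs(ppm_error(mz, ref_mz)), peptide)
            for peptide, ref_mz, ref_rt in identified
            if abs(ppm_error(mz, ref_mz)) <= ppm_tol and abs(rt - ref_rt) <= rt_tol
        ]
        if candidates:
            matches.append((min(candidates)[1], mz, rt))
    return matches


if __name__ == "__main__":
    ids = [("PEPTIDEK", 500.0, 30.0)]            # toy identification
    features = [(500.001, 30.2), (500.1, 30.0)]  # toy MS1 features
    print(match_features(ids, features))
```

Two numbers and a window. That's the whole identification in this cartoon version, which is why the question of whether it is enough matters.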

Is that enough?

MS1-only proteomics says черт возьми it is! (Google said that is how you say "hell yeah" in Moscow)

What if you had aligned retention time, and mass and some ion mobility hardware as well? Is that enough to make an identification?

Is that enough? It's certainly more.

What if you then toss in some sophisticated FDR based on machine learning?

Some of the work is performed on a FAIMS equipped Fusion Lumos, and some other files are from a QE HF.

This is one of an increasing number of studies that are questioning the foundations of "what makes a confident peptide identification?" and is definitely worth thinking about.

Tuesday, March 9, 2021

Downloads for ReCal Offline and other old FTPrograms!

Are you looking for RecalOffline? For some reason it can be hard to find.

Some person I ran into at this week's conferences has a link set up for downloading three separate versions.

If you aren't aware, this allows you to take data acquired on a Thermo FTICR or Orbitrap and apply a recalibration factor. It can be quite slow for today's files, but sometimes it's a lifesaver.
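I have no idea what RecalOffline does internally, so the following is only a sketch of the simplest possible version of the concept: estimate a constant ppm error from a few ions whose true m/z you know, then divide it back out. The function names and the example masses are hypothetical.

```python
from statistics import median


def estimate_ppm_shift(observed, theoretical):
    """Median ppm error (observed minus theoretical) across known calibrant ions."""
    return median((o - t) / t * 1e6 for o, t in zip(observed, theoretical))


def recalibrate(mz_values, ppm_shift):
    """Remove a constant systematic ppm error from a list of m/z values."""
    return [mz / (1 + ppm_shift * 1e-6) for mz in mz_values]


if __name__ == "__main__":
    true_mz = [400.0, 800.0, 1200.0]               # toy calibrant masses
    measured = [t * (1 + 3e-6) for t in true_mz]   # everything reads 3 ppm high
    shift = estimate_ppm_shift(measured, true_mz)
    print(f"estimated shift: {shift:.2f} ppm")
    print(recalibrate(measured, shift))
```

Real mass error usually drifts with time and m/z, so a single constant is the optimistic case.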

Our mysterious benefactor put up separate links for:

FTPrograms 4.0

FTPrograms 3.0

FTPrograms 2.0

Happy recalibrating!

Monday, March 8, 2021

Effects of column and gradient lengths on proteomics!

I spent time on a lot of references, and eventually found the absolute worst one as a follow-up from some ramblings from this weekend.

Want to ruin your lunch break reading about stuff you don't want to think about? I recommend this one.

We convert biological molecules into ionized gases and move them around however we want -- even blowing them to bits -- in vacuum chambers with increasingly sophisticated tools --- and if I don't think about peptide solubility I can't maximize my results? BORING. However, that's how it works.

I dislike this paper the most because it makes it seem so simple.

I strongly recommend you don't read this paper or my main takeaways.

Takeaway #1) When using a 10cm column there is basically no difference in the number of peptides identified when increasing from 60 to 90 minute gradient. Too much time, too little chromatography material.

Takeaway #2) When looking at gradients of up to 90 minutes in total length, there was basically no difference in the number of peptide IDs when moving from a 40cm to a 60cm column. You've got more chromatography material, but to really take advantage of that you'll need a longer gradient.

This one I'm not sure would reproduce the same way today. An LTQ FT Ultra system was used and we're talking about probably 5 MS/MS scans acquired per second? I've never actually run one of those myself, but I thought it was comparable to the Orbi XL. I'd expect if your instrument was 10x faster that you'd see a difference here, but at 90 minutes it might not be a pronounced one.

Hopefully this is the last I have to discuss this for a while, because it's booooring, but it's stuff we do have to consider. I've seen a couple of great studies recently with important samples and expensive instruments and a fraction of the #PSMs/peptides/proteins that I'd expect. It's such a bummer to pull down the RAW files and see that something as boring as chromatography held the whole thing back.

Sunday, March 7, 2021

Now, More Than Ever, Proteomics Needs Better Chromatography!

Okay, I've definitely rambled about this before, but I learned from @DrJeanita (how have we not met before, I literally just go to NIST in Gaithersburg to hang out, and not just to lure people to the microbrewery across from the gate with me...but what happens pre-COVID, happens...if you don't know her, you should follow her) at the USHUPO mentoring day that it's okay to repeat really important stuff and this paper is SO SO SO important...

I'll go ahead and say it, CHROMATOGRAPHY....

And I've gotten away with pretty much ignoring it (for proteomics) my whole career. My QTrap got a decent MS/MS scan every second or so, my Orbi XL realistically got around 5 Scans/second, my Orbi Velos probably 8 scans/second and a QE Classic/Plus gets 10 or so.

If you are only getting 10 scans/second....your peaks can look like the Appalachian mountains. The resolution of your chromatography isn't holding you back. You can probably only fragment the things at the very top anyway, unless your sample is very simple.

However, as this amazing paper describes in painstaking detail -- at some point your scan speed is no longer your limiting factor -- and many of today's instruments now are at that point. What IS your limiting factor?

Ugh...as much as I hate to say it...it might absolutely be your chromatography.... We're seeing an increasing number of papers IN PROTEOMICS that are discussing chromatographic resolution, peak capacity and (...puked right on my keyboard...took a minute to replace it....) theoretical plate counts.
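Since those terms are unavoidable now, here is a minimal sketch of the standard textbook definitions so nobody has to look them up: gradient peak capacity as 1 + gradient time / average peak width at base, and plate count as 16 × (retention time / base peak width)². The example numbers are hypothetical, not taken from the paper.

```python
def peak_capacity(gradient_min, peak_width_min):
    """Gradient peak capacity: 1 + gradient time / average base peak width."""
    return 1 + gradient_min / peak_width_min


def plate_count(retention_time, base_peak_width):
    """Theoretical plates from retention time and base peak width (same units)."""
    return 16 * (retention_time / base_peak_width) ** 2


def scans_per_peak(peak_width_s, scans_per_s):
    """How many MS/MS scans fit across one chromatographic peak."""
    return peak_width_s * scans_per_s


if __name__ == "__main__":
    # Hypothetical: 90 minute gradient, 30 second peaks vs. 10 second peaks
    for width_s in (30, 10):
        print(
            f"{width_s:>2d} s peaks:",
            f"peak capacity {peak_capacity(90, width_s / 60):.0f},",
            f"{scans_per_peak(width_s, 10):.0f} scans per peak at 10 scans/s",
        )
```

Narrower peaks raise peak capacity but give each peak fewer scans, which is the tradeoff between instrument speed and chromatography in two lines of algebra.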

I can't stay on this topic much longer without thinking hard about whether I was happier the 6 years of my life I spent washing dishes in the back of a bar-b-que joint in the south that didn't have air conditioning in the kitchen, so I'm going to leave you with a really staggeringly boring powerpoint from Agilent (direct link here) that shows how you can use Algebra to understand chromatography.

If you're wondering why you can't get the results that the labs that write the papers that you hate your collaborators carrying into your office are getting, at least part of it might be chromatography. Part of it might be that they're now famous enough that they can publish literally anything, but this sentence is probably a joke. We all know that's not how science works!

Saturday, March 6, 2021

The best free software for protein deconvolution!

Have I posted all this before? I feel like I have, but the question comes up all the time.

Scenario -- you have a beautiful spectrum of an intact protein like the thing above and you just want the deconvoluted mass and you don't want to spend $6k for one seat of software to get something like this?

This is the NIST mAb standard with 2 clear glycoforms (148,080 Da)

If you can pick out one single peak and get one pretty spectrum like the top panel, there is software from 1998 wrapped in a nice GUI that works in Windows 7 and 10 that will do this in under 10 seconds.

This is the paper that explains it.
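To be clear, what follows is not the algorithm from that paper or that software. It is just the back-of-the-envelope arithmetic for pulling a charge and a neutral mass out of two neighboring peaks in a charge envelope, assuming every charge is a proton. The peak values in the example are computed from a 148,080 Da mass for illustration, not read off a real spectrum.

```python
PROTON = 1.007276  # mass of a proton in Da


def charge_from_adjacent(mz_at_z, mz_at_z_plus_1):
    """Charge z of the higher m/z peak, given its neighbor carrying one more proton."""
    return round((mz_at_z_plus_1 - PROTON) / (mz_at_z - mz_at_z_plus_1))


def neutral_mass(mz, z):
    """Neutral mass from an m/z value and its charge, assuming protonation."""
    return z * (mz - PROTON)


if __name__ == "__main__":
    # Illustrative pair of adjacent peaks generated from a 148,080 Da mass
    true_mass = 148080.0
    peak_50 = true_mass / 50 + PROTON
    peak_51 = true_mass / 51 + PROTON
    z = charge_from_adjacent(peak_50, peak_51)
    print("charge:", z)
    print("mass:", round(neutral_mass(peak_50, z), 1))
```

Real deconvolution tools use the whole envelope and handle noise and overlapping species, which is exactly what two peaks and a calculator can't do.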

And someone, not me, put up a link to the secret, word-of-mouth software that you can only get at ASMS from a friend of a friend if you bring a clean USB stick and meet behind the piano bar at a specific time.

The legality of the software is something everyone always seems worried about. Everyone made me concerned enough that I've never once put up a link for it. But someone else put up a link with the intention of mentioning it to people a lot at the 2 overlapping conferences this week (ABRF & USHUPO simultaneously? Wut?)

You can get it here. There is an old video from some rambly nasally person using an older version on Vimeo that eventually shows you how to use it.

Okay -- so that works for extremely simple stuff. What if your protein isn't fantastic? You can go crazy sorting through and averaging MS/MS spectra.

Here is the question you need to ask next; Do you:

1) Want to see if your protein is there?

2) Do you want to look at the difference in the proteins between sample A and sample B?

If you answered #1, then I go straight to UniDec.

If you answered #2 then I fire up FlashLFQ, which I find easiest to use through the Proteoform Suite. Proteoform Suite is designed for big top-down, but I'd argue that this is just as good for antibody characterization as any of the expensive software packages out there. You need to convert your files to mz(X?)ML first and then you can deconvolute all the proteins in your file and do quantitative comparisons between files.

Angry baby, gotta go. Hope this is helpful somehow!

Friday, March 5, 2021

ProteaseGuru -- Take the guesswork out of picking your enzyme!

I just downloaded this and it's going on my desktop right now (if I ever close it).

I feel confident saying that you need ProteaseGuru too!

What is it? It's a really simple little program that will tell you what protease (trypsin, lysC, etc., etc., it has everything I can think of preloaded) is a good decision for your experiment and what you're trying to accomplish!

There probably aren't too many proteins that people on this planet have spent more money studying than KRAS --

I highlighted the part that everyone I've ever met seems to care about. Trypsin ain't gonna do it. What will?

You could sit there with a calculator and look at what the 7 different enzymes you have sitting around from past experiments might do -- or -- you could load your FASTA (with isoforms, etc, heck -- your ENTIRE ORGANISM FASTA) -- and just pick from an amazing array of digestion conditions

(This thing is super easy to use, you don't need instructions if you're a lazy person)

And -- we have a winner! AspN (considering up to 2 missed cleavages) totally covers the whole lysine rich domain.
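For anyone curious what the calculator version of this looks like, here is a minimal in silico digestion sketch. It is not ProteaseGuru's code, and the cleavage rules are the usual simplified ones (trypsin after K/R but not before P, LysC after K, AspN before D). The sequence is a made-up lysine-rich toy, not KRAS.

```python
import re

# Simplified cleavage rules written as zero-width regex patterns
RULES = {
    "trypsin": r"(?<=[KR])(?!P)",  # C-terminal to K or R, not before P
    "lysc": r"(?<=K)",             # C-terminal to K
    "aspn": r"(?=D)",              # N-terminal to D
}


def cleavage_sites(seq, enzyme):
    """Internal positions where the enzyme cuts (between seq[i-1] and seq[i])."""
    return [
        m.start()
        for m in re.finditer(RULES[enzyme], seq)
        if 0 < m.start() < len(seq)
    ]


def digest(seq, enzyme, missed_cleavages=0, min_len=1):
    """Return peptides from a digest, allowing up to N missed cleavages."""
    sites = [0] + cleavage_sites(seq, enzyme) + [len(seq)]
    peptides = []
    for i in range(len(sites) - 1):
        last = min(i + 2 + missed_cleavages, len(sites))
        for j in range(i + 1, last):
            peptide = seq[sites[i]:sites[j]]
            if len(peptide) >= min_len:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    toy = "AAEDGKKKKKKSKTKAADLL"  # hypothetical lysine-rich toy sequence
    for enzyme in RULES:
        print(enzyme, digest(toy, enzyme, missed_cleavages=2, min_len=6))
```

On a lysine-rich stretch, trypsin and LysC chop everything into pieces too short to be useful, while an enzyme that ignores lysine leaves the region in one peptide. That is the whole argument in about 30 lines.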

Worth noting, there are other helpful tools here including handy histograms to visualize theoretical output.

Don't feel like reading more than this? You can download this great piece of software at Github here.

Thursday, March 4, 2021

RawBeans -- Ultrafast, ultrasimple, ultrapowerful QC software for shotgun proteomics!

Once upon a time we had a cool program called RawMeat from VastScientific that was free and could rapidly look at any Thermo RAW file and provide all sorts of insight into your experiment. Vast got bought up by some corporation and all their software eventually disappeared (I'd still argue that peak finding software was competitive with anything out there right now -- it was designed for much smaller data files and can take months on today's massive data files). If you've got an older instrument RawMeat is still great and I can help you get it, but it doesn't work for anything beyond a certain Foundation version.

I've mentioned RawBeans before, but wooo -- it's been updated and it's unbelievably easy and powerful now. And it's got a paper now!

For real, it's super simple. You can get it here. And this is all you have to do (finding where Windows10 hides your MSConvert.exe is the hardest part).

What can you do with a Lumos file in 101 seconds? (I have a weird mouse glitch but I ziptied all my PC wires together so they look really neat and I can't talk myself into replacing my infuriating mouse and redoing it all).

(Click to expand, if you need to see the amazing output reports you can get). All of this populates into an HTML file.

Did you get low IDs and you're wondering why? Run your file through RawBeans. If you maxed out your ion injection time then rerun the sample with a higher max ion injection time (also called fill time on other instruments).

Best of all, it's vendor neutral!

Tuesday, March 2, 2021

Tired of buying columns?!? -- pack hundreds a day yourself!

Packing your own columns might not be for everyone....

...but if you are someone who does this, you know that when you get that stupid slurry just right it sometimes makes sense to just keep making them until you have to stop.

As an aside, I'm not sure if I've mentioned this, but due to COVID spacing we moved most of our instruments into Bob Cotter's lab space which has been used little over the years. One thing I found was the results of someone's great slurry day because there were around 20 packed columns taped to the wall!

What if you had a great slurry day and you knocked out enough columns for yourself for the next 10 years?

This preprint has all the plans to set yourself up to pack 50 columns in 100 minutes!