Monday, July 13, 2020

PEPPI-MS -- In-gel separation of intact (and NATIVE??) proteins!

What if you could make top-down (and NATIVE!) mass spectrometry a whole lot more accessible -- with gel-based separation? Would that make you peppy?

The P stands for Passive and I'm not sure why, but if more than a few of us are ever (really) going to adopt top-down and native proteomics (I don't mean super pure antibodies, I mean the complex stuff) we need to simplify the front end and this looks like a great new tool for it!

You can check it out here! (Open access!)

Sunday, July 12, 2020

New and improved tools for the Omics Crew!

I'm a couple weeks behind...on...well....everything....but I really dig July's special edition cover -- and, in particular, this great summary piece!

Want a summary of some new -- or newly improved -- tools for your workflow? Here are 37 new ones, all organized for you!

If nothing else, bookmark it, and the next time someone comes by your office with something that you feel like you could probably pull off, but aren't sure how with your typical workflows -- maybe it's here!

The summary is called --

-- and if you see a technique on the list that you've never ever heard of -- it's probably in this special edition!

Someone remind me to read up on this -- it's not in my folder, and I think I've retweeted it 4 times so I'll remember....

Saturday, July 11, 2020

ProVision -- ShinyApp for Quick Analysis of MaxQuant Output!

"....Google Images, making weird mass spectrometry blogs even weirder, since 2009.." 

I'm running a lot of MaxQuant right now because I'm running lots of really complex samples with short BoxCar runs and -- if you want to quantify BoxCar data, MaxQuant has a little box where you put a checkmark and then it does magic. There are 2 other options for processing BoxCar data.

1) Gibberish quan

(Please correct me if something has changed recently, but as long as MaxQuant works, I'm honestly cool with it for these projects)

For MaxQuant output, you have loads of options.
1) A persistently hideous spreadsheet format that Excel has even more persistently refused to open 100% successfully for nearly a decade.
2) Perseus -- the ultra powerful, can do absolutely everything if you're smart enough, will do nothing whatsoever if you are not quite smart enough, constantly improving for people smart enough to run it, constantly infuriating for everyone else option. At the end of the day....some of us aren't getting smarter....that train has sailed....
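For what it's worth, option 1 is at least scriptable. A minimal pandas sketch of the usual first cleanup step, assuming the standard tab-separated proteinGroups.txt layout (the rows here are invented, and exact column names vary a little between MaxQuant versions):

```python
import io
import pandas as pd

# A tiny stand-in for a proteinGroups.txt (tab-separated, as MaxQuant
# writes it). Real files have hundreds of columns; only a few are shown.
demo = io.StringIO(
    "Protein IDs\tReverse\tPotential contaminant\tIntensity\n"
    "P12345\t\t\t100000\n"
    "REV__P99999\t+\t\t5000\n"
    "CON__P00761\t\t+\t200000\n"
)
pg = pd.read_csv(demo, sep="\t")

# Standard first step: drop decoy (Reverse) and contaminant entries,
# which MaxQuant flags with a "+" in their own columns.
clean = pg[(pg["Reverse"] != "+") & (pg["Potential contaminant"] != "+")]
print(len(clean))
```

No Excel mangling of gene names into dates, and it scales to however many columns your run produced.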

There are other options out there that try to bridge this gap -- and here is a new web-based one (which was what I meant to type about, forgot, and then the espresso finally got to my brain).

You can read the paper (it's open access!) or you can just hop over and try out the program online here.

Now -- it is Shiny -- so, like all R/Shiny programs, it is very particular about what your columns are named. There is a function in the app to rename them, but I can't get it to work. I can only proceed if I go into my proteinGroups.txt and rename them there first.
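If you'd rather script the rename than hand-edit a giant text file, something like this works. The target column names below are invented purely for illustration -- check what ProVision actually expects:

```python
import io
import pandas as pd

# Stand-in for a proteinGroups.txt; real files are tab-separated with
# hundreds of columns. The new names here are hypothetical -- substitute
# whatever headers the app asks for.
demo = io.StringIO("LFQ intensity Ctrl_1\tLFQ intensity Treat_1\n10\t20\n")
pg = pd.read_csv(demo, sep="\t")

rename_map = {
    "LFQ intensity Ctrl_1": "Ctrl_1",
    "LFQ intensity Treat_1": "Treat_1",
}
pg = pg.rename(columns=rename_map)

# Write a renamed copy so the original MaxQuant output stays untouched.
pg.to_csv("proteinGroups_renamed.txt", sep="\t", index=False)
print(list(pg.columns))
```

Keeping the rename in a script also means the next MaxQuant run gets fixed the same way with zero clicking.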

What do you get from it? Fast and pretty output. (Example plots from paper shown)

There are a funny number of warnings in the online version about imputing missing values -- probably an appropriate number of them -- which gives you a feel for the authors' thoughts on the topic.
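For context, the usual trick those warnings are about is imputing missing values from a normal distribution downshifted from the observed intensities. A rough sketch of the idea, assuming log-transformed intensities and the commonly used 1.8/0.3 shift/width defaults (the numbers here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def impute_downshifted_normal(values, shift=1.8, width=0.3):
    """Sketch of downshifted-normal imputation: replace missing
    log-intensities with draws from a narrow normal centered below
    the observed distribution (missing ~ low abundance)."""
    values = np.asarray(values, dtype=float)
    observed = values[~np.isnan(values)]
    mu = observed.mean() - shift * observed.std()
    sigma = width * observed.std()
    out = values.copy()
    missing = np.isnan(out)
    out[missing] = rng.normal(mu, sigma, missing.sum())
    return out

log_intensities = np.array([25.0, 26.0, np.nan, 24.5, np.nan])
filled = impute_downshifted_normal(log_intensities)
print(filled)
```

The assumption baked in -- that a missing value means the peptide was below the detection limit, not that the instrument hiccuped -- is exactly why all those warnings are appropriate.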

You can also run it locally through R in case some weird blogger is uploading tons of stuff to it and it's running really really slow today. You can get it here.

Friday, July 10, 2020

FeMS Transition Talks! Submissions open!

FeMS is at it again, coming up with innovative ways to get interesting science out that we might not otherwise end up seeing -- despite the challenges of 2020.

If you're near the end of a stage in your career -- or know someone who might be interested in showing some cool stuff to people at the ultracool Females in Mass Spectrometry organization, send them to this link! 

Thursday, July 9, 2020

Make sense of what all these new glycopeptide software packages do with this new review!

This brand new review at MCP is a tremendous resource. And someone is very good at making figures.... 

By separating out the different types of software into distinct categories it helps you to figure out what you actually need based on what you want to do with the data you've acquired.

I won't lie and say I had time to read this, but it's here so I'll remember to read it later.

Wednesday, July 8, 2020

MSFragger tears through TIMMYTOF data -- processing each 2 hour run in about 70 minutes!

MSFragger is FAST. Like really really fast. The original command line version could finish processing a Q Exactive file in about the amount of time it took me to push the enter button. As features have piled up on MSFragger to make it more useful, it has slowed down, but it is still a program that is darned fast for a desktop CPU program. It says something....I'm not sure what....something, for sure....when a 2 hour TIMMYTOF run with PASEF takes 70 minutes, on average, to process with MSFragger -- and that is massively outstripping the competition.

Good news -- there is another tool to process PASEF-TOF data with!
Bad news -- you still have to be patient. This data is dense!

You can check all this out here.

Friday, July 3, 2020

What's an "instrument qualification" and should you think about getting one?

Whoa. This blog has suffered due to the fact that I'm just tired of writing. Or just tired. One of those. My last shift in clinical chemistry was over a dozen years ago and while the instruments have gotten much much much better, the amount of paperwork necessary to prove that your assays are valid when it really counts has....gotten better if you are thinking about thoroughness (the most important part) and...hmmm.... nah...not sure where I was going with that. I'm sleepy even though I'm staying within walking distance of my lab so I can be here ALL THE TIME! WOOOOOHOOOO!

What I thought would be interesting to ramble about over lunch is INSTRUMENT QUALIFICATION (sounds the opposite of interesting, right?). However -- did you know that when you get a new instrument you can purchase a whole ton of extra installation things to verify that your instrument does everything that you assume that it should?

It's called an "Instrument Qualification Package" and basically every vendor offers these. If you aren't in a clinic or some other regulated environment, your sales rep might not bring it up, but I think it is something everyone should at least consider with their new instrument purchase:

Why would you do this? A bunch of reasons:
1) You get a bunch of stickers that are signed by a special Field Service Engineer who is approved to check all these aspects of your instrument
2) You get this special FSE around for days. After my installs I got 2-4 extra days of onsite time with an FSE who reeeeeeaaaaaally knows that instrument inside and out.
3) You get hundreds of pages of extra documentation on your instrument and its specifications:

This is just the extra documentation on the Q Exactive. What's in it? Tons of great information! 

What about a pressure test of your LCMS system and its linearity? Do you get that during a regular install?

(Maybe they do it, but I've never gotten a hand signed report and comparison to factory data!)

Sensitivity metrics compared to factory specifications? Heck yes, you get that too!  My new QE has a really great S/N vs spec (due, primarily to low noise levels, which is likely temporary, but how cool is it to know that?)

Even cooler, maybe, is the fact that you get your software installed and pressure tested!

This might only be for targeted or for EFS/Clinical stuff, but I'm not sure. The FSEs brought their own data and processed it through the versions of software that they installed to verify 1) the software all works right and 2) it produces the data that it should AND 3) they left the data. You can quickly master the software by trying to replicate their results. How often have you wished you had a good file set for learning your new instrument software with? Turns out you can just buy that.

On top of all of this -- there are certain lab certification processes where all this information is required. Without it, you can't get contracts or jobs for some government agencies, etc.,

Worth noting, this is not free. For 3 instruments this was around $30,000 USD extra, something that I was initially annoyed about --- but I got close to 12 extra days to annoy some of the vendor's top engineers with days of uninformed questions about the inner workings of these boxes. And they installed all my software and gave me over 1,000 pages of paper, much of which they personally signed. In the end, I'll definitely do it again. Maybe soon, cause the new Exploris systems are surprisingly affordable (no joke, if you're thinking about any mass spec this year you should get some new Exploris quotes, you can score one [with a D20!] for less than most vendor triple quads. You're giving up some functionality as the number on the front of the box decreases....but I'll take a quad-Orbi that has recently been on actual fire (or possibly currently is on actual fire) over most unit resolution instruments, but if you've had the misfortune of being on this blog much, you already knew that)

Tuesday, June 23, 2020

Spritz -- Proteogenomics for everyone!


Preprint is here!

You can get the software here.

You do need the Docker Desktop thing, and if you were forced into the mandatory Microsoft Edge thing in the last 5 days, Docker may have issues (my 2 PCs that had that update throw an error that my Windows version isn't new enough; my PCs that have updates disabled are just fine).

Like most Smith lab software it is really straight-forward.  You do need to make sure that you allow Docker access to the hard drive where you have Spritz.

You can either download your FASTQ files from your "next gen" stuff or you can pull directly from SRA/SRX at NCBI here.

One surprise is that (being a dummy) I thought that Docker was just for GPU based software, so I wanted to make sure to run it on a PC with a decent old GPU, but the software recognizes how many CPU cores you have and defaults to using all but 1 of them.

THEN -- Spritz does all the proteogenomics magic stuff you've heard of itself --
It makes a snake (a Snakemake workflow) and then it calls the variants on your cellphone (okay, just regular variant calling) AND --

If the authors aren't lying (it is a preprint and I don't have time to verify this) -- you know how there is a version of the UniProt databases in XML format that holds PTM information? What if you could cross-reference the variants with this PTM information?
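If you've never looked inside that XML, the PTMs live in "modified residue" feature elements, and the standard library is enough to pull them out. A minimal parsing sketch -- the entry below is a heavily trimmed illustrative fragment (real entries are far bigger), though Ser-20 of P04637 phosphorylated by CHEK2 is a genuine UniProt annotation:

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative slice of a UniProt XML entry. UniProt really does
# record PTMs as <feature type="modified residue"> elements.
uniprot_xml = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry><accession>P04637</accession>
    <feature type="modified residue" description="Phosphoserine; by CHEK2">
      <location><position position="20"/></location>
    </feature>
  </entry>
</uniprot>"""

ns = {"u": "http://uniprot.org/uniprot"}
root = ET.fromstring(uniprot_xml)

ptms = []
for entry in root.findall("u:entry", ns):
    acc = entry.findtext("u:accession", namespaces=ns)
    for feat in entry.findall("u:feature[@type='modified residue']", ns):
        pos = feat.find("u:location/u:position", ns).get("position")
        ptms.append((acc, int(pos), feat.get("description")))
print(ptms)
```

Once PTMs are in (accession, position, description) tuples like this, checking whether a called variant lands on or near an annotated site is just a coordinate comparison.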

The authors demonstrate the use of Spritz in conjunction with both bottom up AND top down data!!

Monday, June 22, 2020

What is up with all the ubiquitin memes??

My inbox is pretty weird sometimes.....time to share!

I couldn't even find the one I was thinking about when I started this....I'll add it later if I run into it.

Found it!  Someone texted it to me. (Weird people have my phone number....)

Sunday, June 21, 2020

Advances in Proteomics (and metabolomics) symposium 6/25/2020!

 You know what you need right after your 3 weeks of ASMS videos??

Another Proteomics and Metabolomics Symposium!!!  You're in luck, there is one this week. It's free and you can register for it here.

John Yates is the keynote and he's a great speaker, I'm not sure I've ever heard Kathryn Lilley speak (bonus!), and Birgit Schilling always has something cool to talk about -- she's working on stopping people (only cool people) from aging.

I also get to ramble a little about the #ALSMinePTMs project finally!! But this kind of snuck up on me so the data hasn't all been crunched yet.

Saturday, June 20, 2020

The Power of Three -- More Enzymes, More Search Engines, More Databases!

When I read the catchy title of this new paper, I thought of some TV commercial jingle, but if Google images gives you an old Doctor Who image, you use it....

...particularly if it's a cannabis paper. Why? I dunno. I just thought it was funny to type.

I haven't done tons of plant proteomics. Some cannabis stuff, a few grape vines, a scumbag arabidopsis or three, maybe a tomato study that I consciously try to forget? It all seems familiar -- and besides it being a pain to get the proteins out, and to see anything but RuBisCO, the proteins are typically really short.

This group does a stellar job of trying to overcome the two relevant obstacles for them by painstakingly optimizing (in previous studies you'll find in the text) the extraction and digestion methods. They demonstrate here that the use of complementary enzymes is critical to getting good sequence coverage because of the stupid short proteins. They also scrounge up protein sequences from just about everywhere they can for the plant, and throw in Mascot and SeQuest to sort it all out.

The only dubious part might be the use of a really old version of Proteome Discoverer for FDR. When you get into larger numbers of PSMs -- either because you've generated tons of MS/MS spectra, or just have a tremendous number of sequences to sort through -- we know that PSM level FDR has a tendency to under-filter on its own. This is probably best highlighted by the Ezkurdia et al. paper, which caused most software packages to begin implementing peptide group level FDR steps as well.
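The mechanics of why PSM-level filtering can be too optimistic are easy to see in a toy target-decoy sketch (all numbers invented): multiple PSMs pile up on the same confident target peptides, which dilutes the decoy fraction at the PSM level relative to the peptide level.

```python
# Each tuple is (peptide_sequence, score, is_decoy). Note that three
# spectra all matched PEPTIDEA -- common for abundant peptides.
psms = [
    ("PEPTIDEA", 95, False), ("PEPTIDEA", 90, False), ("PEPTIDEA", 88, False),
    ("PEPTIDEB", 80, False), ("DECOYPEP", 78, True),
    ("PEPTIDEC", 60, False), ("DECOYTWO", 55, True),
]

def fdr_at_threshold(items, threshold):
    """Decoy-based FDR estimate among items scoring >= threshold."""
    kept = [x for x in items if x[1] >= threshold]
    decoys = sum(1 for x in kept if x[2])
    targets = len(kept) - decoys
    return decoys / max(targets, 1)

# PSM level: count every spectrum match.
psm_fdr = fdr_at_threshold(psms, 60)

# Peptide level: collapse to the best-scoring PSM per sequence first.
best = {}
for seq, score, is_decoy in psms:
    if seq not in best or score > best[seq][1]:
        best[seq] = (seq, score, is_decoy)
pep_fdr = fdr_at_threshold(list(best.values()), 60)

print(psm_fdr, pep_fdr)
```

At the same score cutoff the PSM-level estimate (0.20) looks better than the peptide-level one (0.33), purely because the repeat PSMs pad the target count -- which is exactly why the extra peptide group level FDR step matters when the search space gets big.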

Friday, June 19, 2020

Even more SARS-CoV-2 proteomics! (with phosphoproteomics via DIA!)

Rapid highlights?

-EvoSep 1 ultra rapid gradients + QE HF-X
-Fractionation + DDA for libraries
-HF-X DIA for (most of? again, moving fast, busy day ahead) the global proteomics
-Interestingly -- DIA for the phosphoproteomics and the # of isolation windows changes for the phosphoproteome, but the collision energy does not.
-Some of the prettiest downstream interpretation data you'll see this week. Are y'all keeping artists around for this stuff? My stuff comes out of Perseus looking like this.

(I actually Googled "ugliest R plots" and was relieved that most of the things that popped up are even worse than my histograms)

Thursday, June 18, 2020

The Proteome Landscape of the Kingdoms of Life!

Article titles are something that I think can go one of two directions. 
Direction 1) How many words can I squeeze into this box the publishers provide in order to make sure that I describe everything I did in the most thorough way possible, while ensuring that no one will ever actually look forward to reading this paper.
Direction 2) BOOM. This is what we did and we've got a name for it. 

The problem with direction #2 is that you can sometimes get into the paper with the flashy title and feel like it was a marketing job. "Like....oh....yeah....of course I read it, but it wasn't that good." 

Every once in a while, though, you run into Article title/type #3 where you get a really good and catchy title for something that makes you want to stop your car at a closed rural highway truck weighing station so you can read it -- and even after you have to explain to a police officer why you parked at 4:30 AM at a closed weighing station  (defund the police) you're going to move your car out of the special space that says "emergency services only" and the paper is good enough that you're still going to finish your blog post anyway. 

What I'm typing about right now is the 3rd type. 

You can read about it here. Or you can go dig around in the data at

What's it about? Well, they did proteomes of 100 taxonomically distinct organisms. For one study. Even if it was bacteria, that's a buttload of proteomes. And it's not all bacteria. It's all sorts of organisms. 

Quick takeaways in case I have to move a hypothetical car again. 

1) uPAC columns have somehow come in and vindicated what Jun Qu has been doing for upwards of 10 years now. Columns over 1 meter are all the rage! In this study 2 meter long uPAC columns were used. 

2) This is how you set them up! (due to the extra emphasis in this paper on the importance of grounding, I'm imagining there is a story that someone somewhere knows....) 

3) How would you do 100 proteomes for one study? 

You could:
-Digest with preOmics robot/Bravo
-Spider Fractionate (8 fractions) (yup -- there's over 800 HF-X RAW files!) 
-Run uPAC columns at 750nL/min with QE HF-X with DDA
-You could also mess around with MaxQuant targeted (further investigation warranted here)
-Process with MaxQuant
-Develop some new tools that will be necessary to deal with this much data (put them up on GitHub!)
-Put all the data up on ProteomeXchange Partners (PXD014877)
-Set up a snazzy website for interpreting all the data
-And wait till June for Nature to publish the paper that was accepted back in April

Monday, June 15, 2020

piNET -- Downstream data analysis with PTM to MODIFYING ENZYME connections!

Okay --- maybe THIS is finally it? Maybe this is the tool that can link all the phosphoproteomics data together into something that saves us from having to look up each and every one, or relying on that one "pathway analysis guru" that you might or might not be lucky enough to have at your work, or occasionally go rock climbing with? (I've been trying to figure out a social distancing rock climbing solution, to no avail....booooo....2020....booo.....)

I haven't test driven it yet, but a quick check of the figure above (what I highlighted) shows MAPK1/3 are definitely linked to the phosphorylations on Q15418. Whether those are linked to TP53 phosphorylation regulation, UniProt doesn't shout out on the first page, but it seems very likely that they'd be linked by one of the many data sources that piNET is actively referencing!
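At its core, what piNET is automating is a big curated lookup from (protein, site) pairs to upstream modifying enzymes. A toy sketch of the idea -- the single kinase-substrate pairing below is illustrative (a real tool pulls thousands of these from curated resources like PhosphoSitePlus), and I'm making no claims about piNET's actual internals:

```python
# Hypothetical (accession, site) -> upstream kinases table. The MAPK1/3 ->
# Q15418 (RPS6KA1) link mirrors the figure discussed above; the exact
# residue shown is for illustration only.
kinase_substrate = {
    ("Q15418", "T573"): ["MAPK1", "MAPK3"],
}

def annotate(site_list):
    """Attach known upstream kinases to each (accession, site) hit;
    sites with no curated enzyme get an empty list."""
    return {site: kinase_substrate.get(site, []) for site in site_list}

hits = annotate([("Q15418", "T573"), ("P04637", "S20")])
print(hits)
```

The hard part isn't the lookup -- it's curating the table and, as the next paragraph gets at, deciding which source studies to trust.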

The trick is filtering out the studies where people were obviously going for peak bagging (trying to grab bragging rights for the highest number of phosphorylation sites they could identify, something that was mistakenly thought to be cool for about 2 weeks one year long long ago, that we all universally agree is not at all cool to do now and was just an unfortunate mixup) but with some filtering capabilities in this software? This looks like an insanely powerful and useful new tool for our utility belts!