Saturday, July 31, 2021

If you want a mouse phosphopeptide and it isn't here, it might not exist!


Wow...okay....I figured when I got to one of the world's largest medical research institutions, we'd laugh about the stops on my career where we only had access to mice to study. Turns out you can't knock out genes in humans to test hypotheses, so there are actually reasons to study mice (there are others and you probably know them!)

NIST has nicely assembled a great resource of 70,000 or so phosphopeptides from humans in label free and a bunch in iTRAQ labeled form, but what about the mice? 

Welcome to the mouse phosphopeptide compendium!

That's the paper, but there is also a really nice ShinyApp that links you to every peptide and phosphopeptide they've positively identified. Got a peptide of interest and wondering what FAIMS compensating voltage you should use? It's in the App! Wondering the dominant charge state when you TMTPro label it? It's there too! 

Wednesday, July 28, 2021

IceR -- Easy to use Shiny tool for next level alignment and peptide identification!

I'd left myself a helpful note a while back that had a link to this preprint and just the scribbled words "this looks blank killer!!"

I'd like to thank past me for leaving this very helpful note in such a conveniently odd file in my documents folder, and after stumbling upon it, I have to agree. 

This is what past me was so excited about.

You can get IceR here with very helpful install directions. (The authors are very candid about their installation instructions for Macintoshes.)

There are bunches of ways to get great LFQ alignments to improve your IDs, right? Why would you be interested in this one? 

...someone has run a lot of nanoflow LCMS (or at least looked at it very very closely)! This is some next level alignment from some people who are really thinking about actual sample level variability and the results seriously show. 

The ShinyApp is fully compatible with both Orbitrap and TIMMYTOF data. 


Tuesday, July 27, 2021

Get that assay to the clinic/regulated environment by moving it to a single quad!

I was watching a talk by Matthias Mann recently where he described his main focus right now as being clinical applications. It's super cool for one of our field's top innovators to focus on boring practical work (where we desperately need new approaches!). And with the work from this group that I've excitedly rambled about (example, example2), it's clear that the group isn't messing around. So far, these developments have utilized LC-high resolution accurate mass (LC-HRAM) instruments.

Last year I went through the process of ISO qualifying and getting Department of Health approval for an LC-HRAM assay. I'm excited to say that the assay I developed is still in use and has completed thousands of samples since the clearance about a year ago and is approved for continued use by both bodies for at least another year. There were 2 huge challenges with getting this assay through. 

1) Instrument cost! 

2) The extra steps necessary to qualify a Q Exactive for use in a regulatory environment. (I detailed a super important and expensive step here.)

If I had to develop another regulatory assay would I go the same route? Or would I try something with less of both challenges?  I'm not sure, I really like HRAM, but here is a really good argument! 

You can probably guess by some of the author names that this is small molecule focused, but it doesn't take much imagination to extend this to peptides and proteins.

This group investigates how good a single quad based assay can really be when you design the experiments and downstream analysis really well. The answer? In many cases almost as good as a triple quad! Now, this does assume that your single quad is designed to be good, rather than designed with silly limitations to make it clear that the vendor really wants you to buy the triple quad. 

In this case the authors put a Waters TSQ up against an Agilent 6130 (which seems like a powerful little quad system). 

How does that affect the challenges that I mentioned above?

1) Cost? A COVID discount bundle on a stripped down Vanquish Q Exactive was still $275k. You can easily pick up a good LC-single quad for under $100k, even if your sales rep doesn't like you because maybe you talk too much. 

2) You can avoid LOTS of qualification steps because you can get a single quad that is an approved medical device. Of course, I think every major vendor has triple quads that have been through CFAR and FD&C clearance as medical devices, but a good LC-QQQ is about the same price as an LC-HRAM. 

Here is an approved medical device single quad from Shimadzu that could speed your assay out of your lab and out into the world helping patients. 

Monday, July 26, 2021

Remember spectral clustering? It's back and improved!

A while back a bunch of people got together and said something smart like "wait, what if we just took all the MS/MS spectra that are associated with disease X and clustered them together?" 

The end result was supposed to be a file that is spectra that you'd know are disease X spectra and you can focus on figuring them out. Super smart, right? 

I rambled about it a couple times over the years. Here is a blog post from 5 years ago. These tools still exist and are still active, but I haven't seen them in use in a long time.

It's still smart, but maybe the concept needs an upgrade? That's what Falcon is! 

As these authors point out, speed is an issue for the original code. How fast were the mass specs when the first paper came out? Not as fast as they are now! And this approach only really makes sense when you look at a lot of files. 

Falcon is fast because it reduces MS/MS spectra down to vectors and with the data massively reduced it can do smart things with it, like nearest neighbor searches. 
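To make the "spectra as vectors" idea a little more concrete, here is a minimal sketch. It is NOT Falcon's code: the simple m/z binning, the 1.0 bin width, the brute-force search, and every function name below are assumptions made up for illustration (a real tool would use a much more compact vector and a proper nearest neighbor index). It only shows why a spectrum turned into a vector becomes easy to compare against lots of other spectra.

```python
import math
from collections import defaultdict


def spectrum_to_vector(peaks, bin_width=1.0):
    """Bin (m/z, intensity) peaks into a sparse, unit-length vector.

    bin_width is a hypothetical choice for this sketch.
    """
    vec = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(mz // bin_width)] += intensity
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def nearest_neighbor(query, library):
    """Brute-force search: index and score of the most similar vector."""
    best = max(range(len(library)), key=lambda i: cosine(query, library[i]))
    return best, cosine(query, library[best])


# Three toy spectra: the first two share fragments, the third does not.
spectra = [
    [(175.1, 100.0), (288.2, 50.0), (401.3, 25.0)],
    [(175.1, 90.0), (288.2, 60.0), (401.3, 20.0)],
    [(120.1, 80.0), (650.4, 70.0), (900.5, 10.0)],
]
vectors = [spectrum_to_vector(s) for s in spectra]

# Find the closest match to the first spectrum among the other two.
idx, score = nearest_neighbor(vectors[0], vectors[1:])
print(idx, round(score, 3))
```

Spectra that land close together in vector space are the ones you would put in the same cluster. The brute-force loop here compares everything to everything, which is exactly the part that gets painfully slow when you have a lot of files.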

Want to try it? You can get it here!

Sunday, July 25, 2021

Tutorial -- Proteomic biomarker discovery and validation!

If y'all haven't noticed, there are warning signs everywhere that Proteomics is going to have to grow up soon. The real world is starting to pay attention to us and that means really good and really bad things, depending on your perspective.

Bad things?

There is going to be increasing pressure against us playing mad scientist (prepping every sample a different way and messing with those irresistible buttons on those instruments). 

The people best able to resist pushing buttons are the ones that are going to win all the grants and external funds.

We're all going to have to use robust statistics (or collaborate with someone who does). I tutored undergrad statistics, so it's tempting to think it's in this here brain somewhere...nah... I'm going collaborator hunting! 

Good things! 

We're about to get our shot to prove that we can be as good as, if not better, (better!) than all the other -omics technologies. 

We've got a shot to be THE driving force for the next phase of medicine! 

If you're a crazy mad scientist (or analytical chemist) and you want to know how to be on the front wave of how proteomics is going to change the world, I can't point you toward a better starting point than this killer tutorial. 

Saturday, July 24, 2021

Isobaric tags alter phosphopeptide enrichment -- new method to fix it!

 In depressing news, this new study demonstrates that isobaric tagging reagents (at least the Tandem Mass Tags) cause problems with some common phosphopeptide enrichment strategies.

Fortunately, they point this out along with a super efficient way to get around it and prove it with ridiculously efficient recovery of phosphopeptides from 50 micrograms of treated human cell lysate! 

Friday, July 23, 2021

Proteomics takes on Fish Fraud!

This isn't the first use of proteomics for fish fraud, but this is the first one I know of that went head to head with genomics based fish fraud! 

What's fish fraud you ask?  Here is a CNN article. Sorry if it's stupid. Here is the thing, though, it makes a lot of sense to take a cheap food product and market it as a more expensive one. 

In the end food counterfeiting is relatively common and mostly benign, unless you have allergies to one specific food and are served another, or you have cultural reasons for avoiding it, etc. And, btw, what is the tuna salad at Subway actually made of? I know they keep failing tests for actual tuna in the tuna salad.

This is what we've found when we've investigated whether products are actually what they say they are: we generally find a correlation between lies and other bad things. For example, when we examined CBD oil products in the US, we found that for the products that were the most ridiculously labeled, including statements like "approved by the FDA", there was a pretty good chance the Q Exactive would find something in that oil that doesn't belong. And products that were adulterated were also more likely to have industrial contaminants, etc.

Wait. This was about this paper! 

More proof that proteomics can at least supplement (if not completely replace) DNA technologies! 

AlphaFold AI solved all the human 3D protein structures! (Sort of...)


Who else saw this on their news feed in the middle of the night and realized they had to try a super easy and intuitive way to find 3D structures of...everything from humans?

Whoa, I must have been busy or something. It's been out for a week! Here is the paper (it's 1 page!)

More importantly -- here is the interface. I use the 3D modeling tools in PeptideShaker a lot....I wonder how hard it would be to interface with this data source. Feature request time? 

Okay -- so this is probably worth noting -- you know that wonky peptide that sometimes passes through Percolator's semi-supervised machine learning filters -- it might also happen with AlphaFold. Again, not a 3D expert, but that structure looks a little off....
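If you want something more objective than "looks a little off", the PDB files from the AlphaFold database carry a per-residue confidence score (pLDDT, 0 to 100) in the column where a crystal structure would have its B-factor. Here is a rough sketch that pulls those scores out and lists the shaky residues. The function names and the toy four-residue "structure" are made up for illustration, and the cutoff of 70 is the database's usual boundary for "low" confidence, so treat it as a starting point rather than a rule.

```python
def plddt_per_residue(pdb_text):
    """Map residue number -> pLDDT, read from the B-factor column of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, residue number 23-26,
        # temperature factor (pLDDT in AlphaFold files) 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def low_confidence(scores, cutoff=70.0):
    """Residue numbers whose pLDDT falls below the cutoff."""
    return sorted(res for res, s in scores.items() if s < cutoff)


def demo_line(serial, res_num, plddt):
    """Build a correctly spaced ATOM record for a toy alanine CA atom."""
    return "ATOM  %5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, " CA", "", "ALA", "A", res_num, "", 0.0, 0.0, 0.0, 1.0, plddt)


# Toy input: four residues with made-up confidence values.
toy_pdb = "\n".join(
    demo_line(i, i, p) for i, p in enumerate([95.2, 88.0, 61.5, 35.1], start=1)
)

scores = plddt_per_residue(toy_pdb)
print(low_confidence(scores))
```

On a real download you would read the file into a string and pass it to `plddt_per_residue` the same way. Long stretches of low scores are usually the parts of the model that look like spaghetti.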

Thursday, July 22, 2021

Corrections on some TIMSTOF posts!

It has been 8 months since our TIMSTOF Flex was installed. I missed more than a month when my kid was born and I honestly can't remember since then until recently due to sleep deprivation, but it's starting to get a whole lot cooler around here. 

However, with 2 papers from the TIMSTOF  in review it's clearly working. Wiebke (pronounced close to Veebkeh) may even be integrating some clarifications into new software builds based on things that we've found very tricky to understand with the (3?? 3?) of these things we now have on campus since this Flex landed here in November. 

I'm going back over some of the previous TIMSTOF posts and making some corrections that were due to either my own inexperience, or things that have improved since their initial posts. I've put dates on the posts where edits have been made. 

I've made minor edits to my first impressions post here

100% worth mentioning, AlphaTIMS is software from Matthias Mann's lab that is much faster than Bruker Data Analysis, and has a lot of functionality that you might be looking for. 

Major edits need to go into the "Where did all my scans go" post, as well as a new Python script that accurately calculates how many scans that you will get when setting your target and intensity threshold values (thanks Wiebke!). You can get it from my Github here
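To be clear, what follows is NOT the script from my Github. It is a back-of-the-envelope sketch of the general idea under a big assumption: that a precursor below the intensity threshold is never picked, and that a precursor above it keeps getting picked until its summed intensity reaches the target value. The function name and the example numbers are hypothetical.

```python
import math


def scans_needed(precursor_intensity, target, threshold):
    """Estimate how many times one precursor is fragmented.

    Assumption for this sketch only: precursors below the threshold are
    skipped, and others are repeated until summed intensity hits the target.
    """
    if precursor_intensity < threshold:
        return 0
    return math.ceil(target / precursor_intensity)


# Hypothetical settings: target value 20000, intensity threshold 2500.
for intensity in (1000, 2500, 50000):
    print(intensity, scans_needed(intensity, target=20000, threshold=2500))
```

Under that assumption, raising the target relative to the threshold means your weakest selected precursors eat a lot more of your scans, which is one way the scan counts can end up looking very different from what you expected.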

I feel like there was another one that could use some work. Maybe I'll find it later! 

To anyone out there who has found issues with anything I've posted here and would like to correct or clarify, please feel free to reach out! My contact info is over there somewhere if you don't have it.  I don't have an abundance of free time, but it is important to me that what is here has at least some semblance of accuracy. 😉

Wednesday, July 21, 2021

Webinar -- Process Proteomics Samples with an OpenTrons!


I've probably rambled about OpenTrons here before. It's a $3500, $4000, $5000 autopipetting robot. If you google it and don't have a great AdBlocker enabled you will never ever forget about it. Their advertising game is fierce. I've bought a couple over the years at various stops for various purposes. (We've actually got a preprint out and paper submitted on another way to use one).

From previous posts you might find on the blog, I can say that the interface has improved markedly since the first one I ordered (which was Python only). There is a really intuitive little interface for it and the library of methods is improving all the time. 

These people (one you might recognize) are going to show you how to process a ton of proteomics samples with one! 

Tuesday, July 20, 2021

MS-PIANO -- A nice step forward in N-glycopeptide fragment annotations!


Gotta move fast, and -- full disclaimer -- I haven't downloaded this to run it yet, but Sandy Markey and Steve Stein are on this paper and I'm gonna just assume that the software works as advertised. I WILL be using it soon, though, I've got a bunch of N-glycopeptides to annotate, I've got a talk to give soon! 

My one problem with the MS-Piano paper is the story about the name. 

I may only have a West Virginia 'Murican public school education, but I know the name of a famous piano player when I see one, that dude wrote the song they play at US graduation ceremonies. 


Don't feel like reading? You can get MS-PIANO here

Thursday, July 15, 2021

Black sheep -- How to handle proteins with extreme ratios (R/Python)!


It would be super convenient if every protein that is differential in your model would fit nicely into your volcano plot in such a way that it is clearly differential without messing up your nice visualization.

However, sometimes you get a whole list of proteins that say something like 100 because that is your software's default maximum fold-change to report. How do you deal with those? Do you look at each one manually to see if it is a 112-fold change? Or do you wonder if you should have used imputation or something so you didn't have as many missing values (even if you divide by a quan value that is made up)? 
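Here is a tiny sketch of why those 100s are so annoying. This is not how Black sheep works and it is not any vendor's actual logic. The cap of 100, the function name, and the flag text are all assumptions for illustration. The point is that a capped value can mean "really 112-fold" or "the control was missing entirely", and those are very different things.

```python
MAX_FOLD = 100.0  # hypothetical software default for the reporting cap


def fold_change(treated, control, cap=MAX_FOLD):
    """Return (reported_ratio, flag). None or 0 is treated as a missing value."""
    treated_missing = not treated
    control_missing = not control
    if treated_missing and control_missing:
        return None, "no data"
    if control_missing:
        return cap, "capped: control missing"
    if treated_missing:
        return 1.0 / cap, "capped: treated missing"
    ratio = treated / control
    if ratio > cap:
        return cap, "capped: measured ratio was %.0f" % ratio
    if ratio < 1.0 / cap:
        return 1.0 / cap, "capped: measured ratio was below 1/%.0f" % cap
    return ratio, "ok"


# Toy quan values: a real 112-fold change, a missing control, a normal 2-fold.
examples = [(1.12e8, 1.0e6), (5.0e6, None), (2.0e6, 1.0e6)]
for treated, control in examples:
    print(fold_change(treated, control))
```

The first two proteins both show up as 100 in the table, but only one of them was ever a measured ratio. Keeping a flag next to the number saves you from opening each one manually.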

Black sheep is for the more serious informatics mass spectrometrists, maybe, but it is flexibly provided in both R and Python (through Conda) with really clear documentation. The test cases are really smart, including a phosphopeptide data set! 

Wednesday, July 14, 2021

Integrated multiomics to understand yeast alcohol tolerance!


Remember when ethanol seemed like a really smart way to help reduce greenhouse gas production? I'm not saying it's not (I don't know) and I do see the warning signs on some gas pumps in the US that say "warning, 10% ethanol" (presumably because old cars can't handle it well). If yeast had a higher tolerance to alcohol and could make a lot more of it, producing alcohol becomes a lot cheaper and easier to do. 

If we could understand it better, maybe we could mess with it and crank it up, right? 

Time for some super smart multi-omics (I'd argue the experimental design might be the star here, though)! 

Since this is a blog (supposedly) about proteomics, we'll focus on that. A Q Exactive HF was used for the proteomics with MaxQuant doing the data processing on these strains that were selected as they were forced to evolve increasing tolerance to ethanol in bioreactors!  How cool is that? 

By selecting multiple different clones as tolerance evolved, they could rule out a lot of the noise of this pressurized selective process, landing on 25 proteins of interest.

I won't lie and say I understand the multi-omics gene copy number stuff but you can check that out. But if you immediately needed to make a yeast strain with a higher tolerance to alcohol than the ones you already have -- there is a short list in this paper of what genes/proteins to start messing with! 

Tuesday, July 13, 2021

ASMS 69 Abstract deadline is today, slackers!

For the first time ever, all of my ASMS abstracts are in -- with hours to spare! 

I know not everyone is in. Get on it, yo! We gotta get to Philly for ASMS 69

Monday, July 12, 2021

Time resolved proteomics of COPD of cigarette smoking!


This is a really interesting new study on a lot of levels. Now, it is required under the rules of this blog that I insert at least the following gif. 

This is a BIG study, though, and smoking mice were used to get started and to build a set of presumptive targets before moving into human tissue. Oh yeah! This is the study.

There is a ton of work here, but why would you want to read it? Well, it does a really good job of integrating a ton of iTRAQ 8-plex data from a Q Exactive Plus. Offline fractionation is involved as well as a ton of different time points and conditions. 

Interestingly, the authors do a lot of this with Perseus after processing with Proteome Discoverer which, now that I'm thinking about it, makes a lot of sense to do. 

Around all the great proteomics stuff, the biology comes off as super interesting as well. It's been pretty well established that smokers are biologically older than they are chronologically and this work lands on a really interesting observation I'll just steal from the text here. 

I found this just a really enjoyable work with a complicated series of goals that succeeded thanks to a rigorous experimental design, that pays off in the end with some cool biology. Highly recommended! 

Saturday, July 10, 2021

Proteometabolomics identifies protein bound metabolites in a fungus!


Oh. Raise your hand if you hadn't been considering how your metabolites might be directly interacting with your proteins and that it might be really powerful to do so. 

A really cool part of this method is how this can be a one pot solution. Despite how we seem to be divided into different camps between proteomics and metabolomics, basically all of today's high resolution instruments can do both really really well. In this case the authors do both great metabolite ID and proteomics on a Q Exactive Plus instrument. 

Friday, July 9, 2021

Rapid analysis of small amounts of heart tissue with Azo and PASEF!


I've got to move fast and can't do this great new study justice. Have you ever tried heart proteomics? It isn't a ton of fun. There are just a couple of proteins that make up just about the entire proteome. Unless there have been new developments, there aren't easy depletion kits. Most high coverage proteomics is long 2D experiments starting with tons of material -- or -- just 4 million Titin peptides. 

But if you want to understand what is going on in the heart, you have to go to the protein level. A lot of the cells rarely divide so you aren't exactly dealing with a lot of genomic instability issues like in cancer, and that's why this new study is so cool!

1) Low amounts of sample (they work down to 1 milligram of heart tissue. Not protein. Starting material!) 

2) It's fast! They optimize a sample prep method that gets the digestion conditions both reproducible and down to 30 minutes? What?

3) It's 1 dimensional! (or 4, depending on how you count, I guess) Two hour gradients on a TIMSTOF Pro. 

And it gets some serious depth of coverage. 4,000 proteins? 

Tuesday, July 6, 2021

Pioneer -- A clear pipeline for generating spectral libraries!

I think I've been using Skyline for close to a decade. Is that possible? I think it must be! And I know how to do exactly 2 things with it. And if I need to make even the slightest modification to those 2 pipelines, I'm more likely to beat the Oregon Trail (which I've never done) than to get Skyline to do it. 

Today I successfully pulled off a 3rd thing in Skyline in only like 11 tries, all thanks to Pioneer! 

You ready? This paper is super awesome

I've never heard of this journal, but if this is the kind of stuff they publish, I'm bookmarking it. This is step by step how to make a spectral library from like 12 search engines if you want. AND with the secret locations of everything in Skyline to allow you to compile spectral libraries WITHOUT the original RAW data! There are legitimately 6 steps highlighted in Pioneer that I'd never have guessed were remotely linked to making a library. 

If you're doing DIA or need spectral libraries for any reason at all, I can't recommend this great paper enough. 


Thursday, July 1, 2021

Requested repost -- how to install Fragger and other community nodes in PD!


Would you like to have a fully functioning version of the Proteome Discoverer environment on your PC at home (or on multiple PCs throughout your lab, which is a much more normal thing to do)? You obviously can't have the commercial nodes that the manufacturer has to pay royalties on, but you can have lots of tools including:

!!!MS-FRAGGER!!! operating in PD on any Windows PC. 

The SugarQB glycoproteomics workflow (which...I'd argue is as good as ANYTHING available for glycoproteomics for any price right now, with the caveat that you have to have your glycan mod in your database) 


Here is a link to a 20-ish slide walkthrough for setting up PD with a bunch of cool free nodes. I tried to include every relevant link, so your browser or security settings might be mad about all the extensions in the slide deck. 

This might not be completely accurate (duh), but I tried and I hope it helps. 

I put up the Version info in the image above, because maybe I'll update it going forward. 

As an aside, having PD installed at home helps me morally justify having a badass OMICSPCs system in my basement. It's only coincidental that it can run Crysis.