Wednesday, February 12, 2020

Is a peptide quantitatively measurable? Here's how you find out!

Okay....are you guys ready for this one? I wish I could say I was, but it's too important for us as a field to not think about....

Matrix matching?
"Analytical figures of merit"??  Hey! This is the proteomics party, don't you come in here with all your boring analytical chemistry validation stuff....oh.....ugh...okay....

(Yes. I had to make that. You're welcome.)

Why is this (study) important? In part because it addresses two concepts that need to be kept apart -- and they're right in the abstract:

"....Our results demonstrate that increasing the number of detected peptides in a proteomics experiment does not necessarily result in increased numbers of peptides that can be measured quantitatively....." 


First of all, this study is like 4 pages or something and it represents an absurd amount of work. SRMs and DIA experiments (QE HF, I think) and a bunch of different HPLCs and the matrices are all sorts of fun -- CSF and FFPE and yeast digest and maybe I missed one.

What's the point? Well, I think the goal was to set out and develop some powerful standard curves without heavy standards, but the quote above suggests a really powerful fundamental truth was kind of a side effect and it kind of steals the show.

We do a lot of relative quan stuff in proteomics. It's seriously just relative....and a lot of the results make no sense at all. And this study looks at an absurd amount of data and -- look -- some peptides are just not quantifiable in their background matrix. Real quan has things like linear dynamic range and other boring terms like LOQ/LOD/LLOQ/LLLLOQ and if you really dig into them the way this team did, there is only one conclusion --
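If those boring terms are fuzzy, here is a minimal sketch of how an LOD and LOQ can fall out of a dilution curve. I'm assuming the common textbook convention (LOD = 3.3 x residual SD / slope, LOQ = 10 x residual SD / slope from a straight-line fit), which is not necessarily what the authors of this study did, and the dilution series below is made-up data.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def lod_loq(x, y):
    """Return (LOD, LOQ) as 3.3*s/slope and 10*s/slope, s = residual SD."""
    slope, intercept = fit_line(x, y)
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    # n - 2 degrees of freedom because two parameters were fitted
    s = (sum(r ** 2 for r in residuals) / (len(x) - 2)) ** 0.5
    return 3.3 * s / slope, 10 * s / slope

# Hypothetical dilution series: percent of sample in matrix vs. peak area
percent = [1, 5, 10, 25, 50, 100]
area = [130, 480, 1050, 2400, 5100, 9900]
lod, loq = lod_loq(percent, area)
print(f"LOD ~ {lod:.1f}%  LOQ ~ {loq:.1f}%")
```

A peptide whose LOQ lands above anything you actually load is "detected" but not quantifiable, which is the distinction the quote is making.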

"....Our results demonstrate that increasing the number of detected peptides in a proteomics experiment does not necessarily result in increased numbers of peptides that can be measured quantitatively....." 

Same quote twice....? Why not.

Tuesday, February 11, 2020

The single cell proteomics revolution!

There are seriously 10 papers open on my desktop that I want to blog about -- and will! -- but I'm busy, so time for another super lazy post.

Last year some cool people asked me if I'd be interested in doing some articles about things happening in proteomics that I absolutely thought the outside world should know about. My first thought?!?? Single cell proteomics (by SCoPE-MS).

This is the best I could come up with.

(Of course, I love to type, so I also talked about the study I credit with making proteomics a reality for the rest of us.)

On this topic, I recently was so sleepy that I went through all the "comments" on my blog. There were around 2,000 spam messages suggesting all sorts of terribleness, but there were also some legit comments. And -- I tell you what -- SCoPE-MS gets some comments. Particularly regarding aspects of the RAW data in the public repositories, and I think that is something we will really need to talk about at some point.

My opinion is that we've been really lucky as a field in that we....mostly haven't actually been sample limited. Ten years ago the people doing cell culture would look at me like I was a tyrant when I said I needed 1mg of protein for global + PTMs. I get the same exact look now when I ask for 50 micrograms.

With the exception of PTMs on tyrosine, glycopeptides and a few other weird things, I'd feel comfortable saying that >90% of the peptide MS/MS spectra reported in the literature have looked like this --

>80% sequence coverage thanks to
1) An abundance of signal
2) Really really friendly charge distribution thanks to basic residues

In SCoPE-MS we don't have #1. There is a limit to how much you can load your carrier channel without fogging your single cell signal (as an aside, I have a crazy hypothesis that this limit is very different depending on whether you are using a D20 or D30 Orbitrap). So...the spectra are always flirting with the background noise. With low signal, nothing is all that pretty.

Here is the big question though:
How many fragments do you actually need for confidence in that identification?

Another question: If you were doing targeted peptide stuff with SRMs how many do you need to trust an identification? 3? With unit resolution? And a good reproducible retention time?

I think we've got a philosophical hurdle at some level for this one, particularly for people in our field with Analytical Chemistry as their background. If you look at who got really comfortable with the SCoPE-MS stuff and jumped on it first, I think it has been the people who are coming from the genomics or informatics world.

I promise, if you had been looking at microarrays yesterday, the SCoPE-MS data is a huge and beautiful upgrade. But, if you are used to loading 1ug of peptides on your Q Exactive....SCoPE-MS data is going to take some getting used to.

Monday, February 10, 2020

Purple -- Pick unique peptides for viral (and other?) experiments from FASTA!

Hey you! Are you looking for a tool to help you select viral peptides for targeted assays? 

Unrelated --- what is the best color of dinosaur? 

I got you, yo. Check this out. 

Before you panic, when they wrote the paper "Purple" was just a Python script that you can get here. I assure you this is no longer the case. There is a very straight-forward (to install) executable that will set you up with a GUI that looks just like this --

-- that you can get here.

What does it do? Well, it helps you select peptides that are ideal for targeted assays from the databases you feed it. Imagine Picky, but you can load stuff that isn't human into it. (If you are doing human proteomics -- you should be using Picky, btw. It's amazing).

Purple: Feed it your peptide sequences you're interested in: Feed it your contaminating background. Choose your rules. Get your peptides!
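To make the idea concrete, here is a minimal sketch of that "target minus background" logic. This is not Purple's code or its rule set: the function names, the simple tryptic rule (cut after K/R, not before P, no missed cleavages) and the toy sequences are all my own assumptions for illustration.

```python
import re

def read_fasta(text):
    """Parse FASTA text into {header: sequence}."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header and line:
            records[header] += line
    return records

def tryptic_peptides(protein, min_len=7, max_len=25):
    """Cut after K or R unless the next residue is P; no missed cleavages."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in pieces if min_len <= len(p) <= max_len}

def unique_targets(target_fasta, background_fasta, **limits):
    """Per target protein, peptides that never occur in the background digest."""
    background = set()
    for seq in read_fasta(background_fasta).values():
        background |= tryptic_peptides(seq, **limits)
    return {name: sorted(tryptic_peptides(seq, **limits) - background)
            for name, seq in read_fasta(target_fasta).items()}

# Toy sequences, not real proteins
target = ">toy_viral\nAAAAAKCCCCCRDDDDDK\n"
background = ">toy_host\nGGGGGKCCCCCRHHHHHK\n"
picked = unique_targets(target, background, min_len=5)
print(picked)
```

A real tool layers more rules on top (missed cleavages, modifiable residues, isobaric I/L, and so on), which is the "choose your rules" part.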

Sunday, February 9, 2020

UniProt has a page and resources set up for 2019-nCoV now!

A lot of people downloaded my ugly FASTA for 2019-nCoV after I posted it. UniProt has done their normal crazy meticulous job of assembling all the data and is a much better resource.

You can check it all out here.

Thursday, February 6, 2020

Peptide biomarkers for bacterial pathogens!

I've only got a few minutes, but -- wow -- is this ever worth reading!

Microbial ID by shotgun proteomics is NOT new. But promising study after promising study seems to end up with -- no new clinical assays.

MALDI-TOF with a BioTyper is easier in the clinic, I guess, but maybe we just need the right technologies to get us over the hump. Clearly, the insistence of researchers on continuing to use NanoLC is a big hurdle, but maybe innovative sample prep methods would also help bridge the gap?

They use some crazy technology in this one. A flow cell digestion method that allows a tryptic digest of bacterial proteins in one hour? And a depletion technology that removes "host" (human!) biomass??

I have to mention that this study is a big collaboration between groups in Stockholm (where HUPO 2020 is!) and Gothenburg, a city blessed by some dark metal gods or something to be the birthplace of the greatest bands that have ever walked this earth. Yup, I definitely had to mention that.

Tuesday, February 4, 2020

22 Phosphoproteomics Data Analysis solutions go head to head!

Sometimes I take a dataset and compare 2 different data processing pipelines. One time, maybe I compared 3? 

22? What? Wow! Why do we even have 22 pipelines? The abstract suggests that there are very good reasons, actually -- the results aren't the same....and they propose a solution for this. Only a paywall and a biological requirement for sleep stand in my way of reading this right now!

As a reminder -- there is a super epic community proteomics PTM challenge coming up in less than 2 weeks and I think maybe 10 labs have signed up for it so far.

I think that this is probably a great resource to help set the stage.

Covalent Protein Painting to measure in vivo protein misfolding!

If there is an easier looking experimental method to measure protein misfolding in vivo, I've never seen it.

If you are interested in structural proteomics stuff at all, I highly recommend this preprint.

Formaldehyde is pretty efficient at binding to proteins! Turns out that:

1) you can get heavy stable isotopically labeled formaldehyde
2) in your cells the formaldehyde can only get access to the outside of your protein 3D structures, effectively "painting" the surface of them.
3) You can compare different biological conditions by using "heavy" and "light" formaldehyde.

Digest your proteins with chymotrypsin and voilà -- you can quantitatively compare the outside of your proteins and protein-protein complexes!

The downside here is that you have to think hard about the peptide identifications as -- CDH2 : 13CH3 , 13CH3 : CDH2 , 13CHD2 : CD3 , CD3 : 13CHD2 -- could correspond to Disaster Level: "deuterated deamidation" study.

To fully eliminate this as an issue, these authors acquired MS/MS at 120,000 resolution! In my opinion this is overkill, but on the instrument they used, they've got 60,000 or 120,000 to choose from and 60,000 is going to get a little sketchy on the larger fragment ions. (Loosely related...I commonly run at 90,000 resolution on another instrument...)

Despite the decreased number of scans possible on an LC time scale, they come back with a tremendous amount of data.

In case any of the authors see this -- unless I'm completely misunderstanding what I'm seeing -- Extended Data Figure #4 is possibly my favorite visualization I've seen of anything so far this year. (Maybe I should put this comment on the bioRxiv thing like I'm supposed to....)

Oh yeah! I almost forgot! On top of how cool the technique is, the authors make some interesting findings regarding protein folding and Alzheimer's!

Sunday, February 2, 2020

Remember that Prosit thing everyone was talking about? It is super easy to use!

It's about time that we talked about how to add....

...well...deep learning...(but...come on, I HAD to use that when I found it, right?!?) to your proteomics workflow!

Don't want to read my rambling about why Prosit is awesome and just want to do it? Skip to Part 2 below!

I almost guarantee that there is someone at your facility who drops all sorts of words like this around -- and maybe that same person has given you reason to question their intelligence in other matters, but as long as they keep saying things about "neural networks" and "semi-supervised" whatevers it seems like everyone wants to talk to them, and maybe give them lots of money. Follow this easy walkthrough and THAT COULD BE YOU.

I jest, because Prosit is the real deal and has real world advantages, including more and higher confidence identifications right now.

For a biomolecule, the peptide bond is a joy to work with -- energetically -- crudely optimize the collision energy and you'll break most of them. Our friends in the small molecule world, where I continue to dabble, don't have it anywhere near as good. There seems to be no rhyme or reason to what energy will break which bonds. When I do QE metabolomics, I step my CE, typically with 10, 30, 100. Just to come close. The ID-X even has something called "assisted" where it tries to help. Most of the time when you've got a molecule you really want to study, it makes sense to run it 10 times with different energies....

However -- just because peptides are better than most molecules at fragmenting, that doesn't make them consistent. Look at them. Why on earth would you miss the y7 in this peptide or the y4 in that one? It's just not there. And -- at some level it must make sense -- energetically.

Prosit was described here last year:

In as few words as I appear capable of writing -- Prosit looks at the ProteomeTools database (you know that thing where they are synthesizing EVERY human peptide and then fragmenting them and making libraries?) and it models the peptides YOU give it against that library with this deep learning thingy.

PART 2: How to use Prosit! 

You will need:
1) A protein .FASTA database.
2) The EncyclopeDIA (you can get it here)
3) That's it. I just felt dumb making a list with 2 entries in it.

EncyclopeDIA can do all sorts of smart stuff (some of which I wrote not smart stuff about here) -- and it also has awesome utilities. Such as "Create Prosit CSV from FASTA"

As an aside, I heard from the Prosit team -- they'll have this integrated soon, but if you wanted to put the words "deep learning" on your ASMS abstract that is due tomorrow you have to do what I am doing.

This is ridiculously easy. Add your FASTA. It will make you a Prosit .CSV file. I believe very strongly in you and your abilities. You'll definitely be able to do it!
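If you're curious what a peptide list like that roughly looks like inside, here's a minimal sketch that builds one from protein sequences. Loud assumptions: the three column names (`modified_sequence`, `collision_energy`, `precursor_charge`) are what I recall the Prosit input format using, so check them against the file EncyclopeDIA actually makes for you; the digestion rule, length limits and charge states are my own illustrative choices and not EncyclopeDIA's settings.

```python
import csv
import io
import re

def tryptic(protein, min_len=7, max_len=30):
    """Cut after K or R unless the next residue is P; no missed cleavages."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in pieces if min_len <= len(p) <= max_len]

def prosit_rows(proteins, collision_energy=27, charges=(2, 3)):
    """Yield one row per unique peptide and charge state."""
    seen = set()
    for protein in proteins:
        for peptide in tryptic(protein):
            if peptide in seen:
                continue
            seen.add(peptide)
            for z in charges:
                yield {"modified_sequence": peptide,
                       "collision_energy": collision_energy,
                       "precursor_charge": z}

def to_csv(rows):
    """Serialize rows to CSV text (column names are an assumption, see above)."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["modified_sequence", "collision_energy", "precursor_charge"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

# Toy sequence, not a real protein
text = to_csv(prosit_rows(["AAAAAAAKCCCCCCCR"]))
print(text)
```

Seriously though, just use the EncyclopeDIA utility. It handles the details for you.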

Now -- go to and load that CSV you just made.

Hit next and then tell Prosit the format of your output library:

I'm using MSP because I can't afford Spectronaut yet. Then submit your job!

Now -- this is important. When you submit the job you'll go into the queue. You'll want to copy the link URL it gives you and/or the Task ID number. You will not want to close your browser without remembering to do this, because you won't get your library. When it's ready you'll get a download link!

If you want to check the quality of your MSP library -- the PDV is a nice, lightweight, java program that will allow you to flip through all of them. If you've already got the NIST MS Interpreter installed it will also load them. PDV will look something like this!

For this peptide, Prosit predicts that for a CE of 27 I'm not going to see every b/y ion. There are some bonds that it thinks, from the hundreds of thousands of real peptides it has studied, just won't fragment well.

And if, for example, you are looking at that real peptide. And it's right? Then you aren't penalized for missing that fragment when using this library!
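Here's a minimal sketch of why that matters for scoring. I'm using plain cosine similarity as a generic stand-in for a library-match score (not a claim about how any particular search engine scores), and every intensity below is made up.

```python
from math import sqrt

def cosine(predicted, observed):
    """Cosine similarity between two {fragment: intensity} dicts."""
    ions = set(predicted) | set(observed)
    dot = sum(predicted.get(i, 0.0) * observed.get(i, 0.0) for i in ions)
    norm_p = sqrt(sum(v * v for v in predicted.values()))
    norm_o = sqrt(sum(v * v for v in observed.values()))
    return dot / (norm_p * norm_o) if norm_p and norm_o else 0.0

# Hypothetical observed spectrum: y6 and y7 simply aren't there
observed = {"b2": 0.3, "y3": 0.4, "y4": 1.0, "y5": 0.7}

# Naive expectation: every b/y ion shows up at full intensity
flat = {ion: 1.0 for ion in ("b2", "y3", "y4", "y5", "y6", "y7")}

# Hypothetical predicted library entry that expects y6/y7 not to fragment
predicted = {"b2": 0.3, "y3": 0.4, "y4": 1.0, "y5": 0.7, "y6": 0.0, "y7": 0.0}

print("flat expectation:", round(cosine(flat, observed), 3))
print("predicted library:", round(cosine(predicted, observed), 3))
```

The missing y6/y7 drag the score down against the flat expectation, but cost nothing against a library that never expected them.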

Saturday, February 1, 2020

Predicting PTMs in 2019-nCoV Wuhan Coronavirus

Yeah....maybe I need a hobby....but I think this stuff is cool AND I've learned how to use some new tools thanks to my curiosity about this new virus and thinking about how I would analyze proteomics data from the virus if I could get my hands on it....

Here is the question: PTMs don't typically just happen indiscriminately. There are particular motifs that are the targets of the enzymes that add the PTMs. So...can we start with just some unknown linear proteins and predict what PTMs that we would find?

And...are those predictions any good? I can't yet answer that part directly, but I'm trying.
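The simplest possible version of "predicting from a motif" is a pattern scan. A minimal sketch, using the well-known N-linked glycosylation sequon (N-X-S/T, where X is anything but proline) on a made-up sequence. Real prediction tools model far more context than a three-residue pattern, so treat this as the idea and nothing more.

```python
import re

# N-X-S/T with X != P; the lookahead lets overlapping sequons all be reported
SEQUON = re.compile(r"N(?=[^P][ST])")

def n_glyc_sites(protein):
    """Return 1-based positions of Asn residues sitting in an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein)]

# Toy sequence, not a real protein
sites = n_glyc_sites("MKNGSAANPTLLNNTS")
print(sites)  # [3, 13, 14]
```

The N at position 8 is skipped because it is followed by a proline.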

There are a LOT of tools that predict PTM sites. After two late nights of trying a few of them and doing a lot of failing -- this older one is my current leading favorite -- and you can read about it here.

If you've got better things to do on a Saturday than read, I got you, yo! 

You can also just go and dump stuff into their server at The interface is super straight-forward. Put in your protein FASTA entry (one at a time), pick your mods and hit the button. (You can also install it locally, but I'd rather use their electricity.)

You are capped at 5,000 amino acids per model with the web interface of their server.  And you are definitely penalized for longer sequences. At 1,000 amino acids, I recommend walking your dog.

Okay -- so only one protein from the 2019-nCoV translated FASTA is over the cap, so I broke it into 5 separate translated regions in order to have a large overlap in peptide sequences (in case the domains it is modeling against for PTM prediction are large ones). And -- it took basically all morning.
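If you'd rather not cut the sequence up by hand, here is a minimal sketch of splitting a long protein into overlapping windows that each stay under a length cap. The 5,000 residue cap is the server's; the 500 residue overlap is a number I made up for illustration, not what I used above.

```python
def overlapping_chunks(sequence, max_len=5000, overlap=500):
    """Split into windows of at most max_len residues sharing `overlap` residues.

    Returns (start, end, subsequence) tuples with 1-based inclusive coordinates,
    so predicted site positions can be mapped back to the full protein.
    """
    if overlap >= max_len:
        raise ValueError("overlap must be smaller than max_len")
    step = max_len - overlap
    chunks = []
    start = 0
    while True:
        end = min(start + max_len, len(sequence))
        chunks.append((start + 1, end, sequence[start:end]))
        if end == len(sequence):
            break
        start += step
    return chunks

# Toy example: 12 residues, windows of 5, overlap of 2
toy = overlapping_chunks("ACDEFGHIKLMN", max_len=5, overlap=2)
for first, last, piece in toy:
    print(first, last, piece)
```

A site predicted at position p in a chunk sits at position (start + p - 1) in the full-length protein.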

You get a pretty output that you can keep or have it kick you out a Tab(?) delimited text file. I spent a lot of time swearing while combining everything into a single Excel file (I need to grow up and stop using Excel. It always seems like it will be easier -- even though it increasingly is not the easiest solution.)

Okay -- and here I'm talking smack about Excel -- and the Ideas button just did something smart!! Normally, it's just funny to hit the button, but -- darn -- it made a decent Pivot Table!

If you're interested in the actual motifs predicted to be modified, you can download them from my Google drive here.

Okay -- so -- that's all nice and all. Predicted PTMs are a pretty big step away from actual PTMs.

..and rightly so...

Can we test this?

I mentioned a couple of days ago that there was some cool unpublished MERS-CoV proteomics data on MASSIVE.

Now -- this is CID ion trap MS/MS data -- not my favorite source of data for identifying PTMs. It also kind of rules out some of my favorite tools, because they were designed with HRAM MS/MS data in mind. So...back in the time machine to the 1990s to fire up SeQuest and take a minute to polish up my sense of skepticism....

Okay -- this will take more than a minute or two....I forgot how long CID MS/MS takes to search with a couple of PTMs.

I broke it up into queues and only one has finished -- aaaaaaannnnnddddd....nothing! I do actually need another hobby....maybe something I can do inside, in case I screw up my knee and have to do a lot of sitting around for a while.

However -- there is A LOT wrong with this system. One -- we're looking at single shot analysis from 2009's best mass spectrometer -- in a human cell background. We're not exactly digging to the full depth of the proteome -- and PTMs rarely want to announce themselves. Two -- I'm using a prediction model of one virus that is similar to another, but we are definitely reaching when trying to make predictions off the little data across the board. Three through 41 --? I didn't even look to see if that region of the similar protein is even digested by trypsin. Maybe that is for next Saturday.

Posting a friendly reminder from Dr. Yates.

One of the laziest posts I've ever made...but I've got a lot of stuff to do this weekend.....