Sunday, May 1, 2016

Differential proteomics for unsequenced species!


This comes up a LOT in conversation, particularly as I'm wandering about preaching the gospel of modern proteomics' superiority over... well... everything else... 'cause that is just what you have to do sometimes.

What do we do if we've got no genome sequence for the organism we're interested in?

My general answer is de novo! If we're looking at high-resolution accurate-mass MS/MS spectra, PEAKS and DeNovoGUI away and BLAST-search that output. Or do the BICEPS thing.
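For the "de novo, then BLAST" route, the glue step is usually just turning confident de novo reads into a FASTA query file. Here's a minimal sketch of that step. It assumes each de novo call is a plain `(spectrum_id, peptide, score)` tuple (real PEAKS or DeNovoGUI exports are tables with their own column names, so you'd map those first) and that modifications are written as mass shifts like `M(+15.99)`. The function name and both thresholds are hypothetical and purely illustrative.

```python
from typing import Iterable, Tuple


def denovo_to_fasta(calls: Iterable[Tuple[str, str, float]],
                    min_score: float = 80.0,
                    min_length: int = 6) -> str:
    """Turn (spectrum_id, peptide, score) de novo calls into FASTA text.

    Assumptions: calls are plain tuples, and modifications appear as
    mass-shift annotations such as "M(+15.99)". Thresholds are illustrative.
    """
    records = []
    for spectrum_id, peptide, score in calls:
        # Keep only residue letters; drop annotations like "(+15.99)".
        residues = "".join(ch for ch in peptide if ch.isupper())
        # Short or low-confidence reads make poor homology-search queries.
        if score < min_score or len(residues) < min_length:
            continue
        records.append(f">{spectrum_id} score={score:.1f}\n{residues}\n")
    return "".join(records)


calls = [
    ("scan_101", "LVNELTEFAK", 92.5),
    ("scan_102", "PEPK", 99.0),           # too short, dropped
    ("scan_103", "M(+15.99)QDLLK", 85.0),  # annotation stripped to MQDLLK
    ("scan_104", "GASPVDLLK", 40.0),       # low score, dropped
]
print(denovo_to_fasta(calls), end="")
```

This prints two records, `scan_101` (`LVNELTEFAK`) and `scan_103` (`MQDLLK`), ready to hand to a BLAST search. Which score cutoff is sensible depends entirely on the de novo tool's own scoring scale.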

Or take a look at this new (paywalled) paper in JPR from Sule Yilmaz et al., here.

Quick and interesting note from the paper:

In 2014 there were:

2,334 completed genomes!
21,471 genome drafts

Which sounds like a ton, right? Until you consider estimates of 2-8 million species...
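To put a number on "a ton," here's the back-of-the-envelope arithmetic using the paper's 2014 genome counts and the 2-8 million species range. It generously treats every genome as a different species, which overstates coverage (many genomes are extra strains of the same species), so the real fractions are even smaller.

```python
COMPLETED = 2_334  # completed genomes in 2014 (figure quoted from the paper)
DRAFTS = 21_471    # draft genomes in 2014 (figure quoted from the paper)


def coverage(n_genomes: int, n_species: int) -> float:
    """Fraction of species covered, assuming one genome per distinct species."""
    return n_genomes / n_species


for n_species in (2_000_000, 8_000_000):
    done = coverage(COMPLETED, n_species)
    any_genome = coverage(COMPLETED + DRAFTS, n_species)
    print(f"{n_species:,} species: completed {done:.2%}, "
          f"completed + drafts {any_genome:.2%}")
```

That works out to roughly 0.03-0.12% of species with a completed genome, and only about 0.30-1.19% even if you count every draft.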

Okay, so how do these guys do it? By massively reducing the number of spectra they have to think about. I'm a little fuzzy on the details (not sure if I accidentally had decaf or if the method is just a little unclear), but this is my interpretation.

They run two different samples. In this case, they're looking at two similar parasites. One has a partial (or unannotated) genome and the other has none. By running both, you can compare the samples pairwise. What is unique to one can then be evaluated, and what is shared can be eliminated, so you end up with far fewer MS/MS spectra to worry about.
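Here's roughly what that elimination step could look like in code. To be clear, this is my own minimal sketch of the idea, not the paper's pipeline. I'm assuming two spectra count as "shared" when their precursor m/z values fall within a hypothetical 0.01 Da tolerance and the cosine similarity of their binned, square-root-scaled fragment intensities is at least 0.7; anything in sample A with no such partner in sample B is kept as "unique." The `Spectrum` class, function names, and every threshold are made up for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Spectrum:
    spectrum_id: str
    precursor_mz: float
    peaks: List[Tuple[float, float]]  # (fragment m/z, intensity) pairs


def binned_vector(peaks: List[Tuple[float, float]],
                  bin_width: float = 0.02) -> Dict[int, float]:
    """Bin fragment peaks into a unit-length vector of sqrt intensities."""
    vec: Dict[int, float] = {}
    for mz, intensity in peaks:
        b = int(mz / bin_width)
        vec[b] = vec.get(b, 0.0) + math.sqrt(intensity)
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {b: v / norm for b, v in vec.items()} if norm else {}


def cosine(u: Dict[int, float], v: Dict[int, float]) -> float:
    """Dot product of two unit vectors stored as sparse dicts."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(b, 0.0) for b, w in u.items())


def unique_spectra(sample_a: List[Spectrum], sample_b: List[Spectrum],
                   precursor_tol: float = 0.01, min_cosine: float = 0.7,
                   bin_width: float = 0.02) -> List[Spectrum]:
    """Return spectra from sample_a with no similar spectrum in sample_b."""
    vecs_b = [(s.precursor_mz, binned_vector(s.peaks, bin_width))
              for s in sample_b]
    unique = []
    for s in sample_a:
        va = binned_vector(s.peaks, bin_width)
        shared = any(
            abs(s.precursor_mz - mz_b) <= precursor_tol
            and cosine(va, vb) >= min_cosine
            for mz_b, vb in vecs_b
        )
        if not shared:
            unique.append(s)
    return unique


sample_a = [
    Spectrum("a1", 500.250, [(175.12, 100.0), (262.15, 50.0), (375.24, 80.0)]),
    Spectrum("a2", 612.300, [(147.11, 90.0), (248.16, 40.0)]),
]
sample_b = [
    Spectrum("b1", 500.255, [(175.12, 90.0), (262.15, 60.0), (375.24, 70.0)]),
]
print([s.spectrum_id for s in unique_spectra(sample_a, sample_b)])
```

Here `a1` has a close partner in sample B and gets eliminated, leaving only `a2` to chase down. The all-against-all comparison is fine for a toy but slow for real runs (indexing B by precursor m/z would fix that), it ignores charge state, and peaks straddling a bin edge won't match.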

By using the genome sequence from the one and homology-searching the unique spectra from the other, you end up with a feel-good story that this works well.
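For the homology side, the real workhorses would be tools like BLAST. As a much cruder stand-in, here's a sketch of error-tolerant peptide matching: slide a de novo peptide along each protein from the related, sequenced species and allow a small number of substitutions, treating I and L as the same residue since they have identical mass. It's ungapped and uses no substitution matrix; `tolerant_hits`, its parameters, and the toy protein sequences are hypothetical.

```python
from typing import Dict, List, Tuple


def normalize(seq: str) -> str:
    # I and L are isobaric, so de novo sequencing can't tell them apart.
    return seq.upper().replace("I", "L")


def tolerant_hits(peptide: str, proteins: Dict[str, str],
                  max_mismatches: int = 1) -> List[Tuple[str, int, int]]:
    """Ungapped matches of peptide with up to max_mismatches substitutions.

    Returns (protein_name, 0-based start, mismatch_count) tuples.
    """
    query = normalize(peptide)
    k = len(query)
    hits = []
    for name, seq in proteins.items():
        target = normalize(seq)
        for start in range(len(target) - k + 1):
            mismatches = 0
            for q, t in zip(query, target[start:start + k]):
                if q != t:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break  # this window can't qualify; stop early
            if mismatches <= max_mismatches:
                hits.append((name, start, mismatches))
    return hits


proteins = {
    "protA": "MKTAYIAKQRQISFVKSHFSRQ",
    "protB": "GGGLLVNELTEFAKGGG",
}
print(tolerant_hits("IVNELTEFAK", proteins))  # I/L swap still matches exactly
print(tolerant_hits("LVNELTEFAR", proteins))  # one K->R substitution
```

Allowing a mismatch or two is what lets a peptide from the unsequenced organism land on its counterpart in the relative; real tools do this far better with scoring matrices and gapped alignment.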

I really like the elimination approach here. I have some small concerns about how many of the "unique" spectra are really artifacts of the intrinsic undersampling still prevalent in LC-MS/MS runs. By that, I mean that we don't fragment every peptide present in every run, so you might end up with a bunch of stuff that shows up in only one sample and looks unique to one organism, but really is just a sampling issue. Also, the 4% FDR cutoff here initially hurt my brain, but considering the variables employed and the relatively wide mass tolerance filters they need to use (presumably due to instrument limitations? I don't know which instrument they used), 4% is pretty tight control.
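For anyone wondering what a 4% cutoff means mechanically, here's a minimal sketch of the common target-decoy estimate: rank matches by score, estimate FDR at each cutoff as decoys divided by targets above it, and take the lowest cutoff that stays at or below 4%. I can't vouch that this is exactly how the paper computed its FDR; the function name, the `(score, is_decoy)` input shape, and the decoys/targets formula (other variants exist) are my assumptions.

```python
from typing import List, Optional, Tuple


def score_threshold_at_fdr(psms: List[Tuple[float, bool]],
                           max_fdr: float = 0.04
                           ) -> Tuple[Optional[float], int]:
    """Lowest score cutoff whose estimated FDR (decoys/targets) <= max_fdr.

    psms: (score, is_decoy) pairs, higher score = better match.
    Returns (threshold, accepted_target_count), or (None, 0) if none qualifies.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best: Tuple[Optional[float], int] = (None, 0)
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Only evaluate a cutoff once every PSM tied at this score is counted.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and targets and decoys / targets <= max_fdr:
            best = (score, targets)
    return best


# Toy data: 25 targets, a decoy, 24 more targets, two decoys, 10 targets.
psms = ([(s, False) for s in range(100, 75, -1)] + [(75, True)]
        + [(s, False) for s in range(74, 50, -1)] + [(50, True), (49, True)]
        + [(s, False) for s in range(48, 38, -1)])
print(score_threshold_at_fdr(psms))
```

On this toy list the cutoff lands at a score of 51 with 49 targets accepted; letting in the next two decoys would push the estimate above 4%. Taking the lowest qualifying cutoff is equivalent to accepting every match with a q-value of 0.04 or less.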

My minor concerns aside, I think the approach described here is smart and unique, and one that should be amenable to several software packages. I'd love to give it a try!

I may revisit this one later. Where is that hyperbolic time chamber again?
