Thursday, March 27, 2014

Getting beyond parsimony

Wanna feel dumb sometime?  Pick up a paper with Oliver Serang listed as an author.  But it's important, very, very important, that we start thinking about the things he has been thinking about, because what Oliver has been working on is our problem with parsimony.

(I've stolen the following screenshots from a lecture posted by the Tabb lab at Vanderbilt.)

Parsimony is the thing (how do you grammar?) we fall back on when we identify a bunch of peptides that could have come from more than one protein.
We don't have any evidence to say whether Protein A or Protein B contributed these peptides, or even whether both proteins are present and both contributed the same LC-MS compatible peptides.  So what we do is group them together.  If Protein A is shorter than Protein B, we call this a protein group and report it in the grouped table under the accession number of the shorter protein.
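For the Greek-letter-averse, here is roughly what that grouping step boils down to. This is a minimal sketch, assuming the simplest possible rule: proteins whose identified peptide sets are identical get lumped together and the group is named after the shortest member (the rule described above). The accessions, lengths and peptide strings are made up, and Proteome Discoverer and other tools have their own, more detailed grouping rules.

```python
from collections import defaultdict


def group_indistinguishable(protein_peptides, protein_lengths):
    """Group proteins whose identified peptide sets are exactly identical.

    protein_peptides: dict of accession -> set of identified peptide sequences
    protein_lengths:  dict of accession -> protein sequence length
    Returns a dict of representative accession (shortest member, ties broken
    alphabetically) -> sorted list of every accession in that group.
    """
    by_peptides = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        # frozenset makes the peptide set hashable so it can serve as a dict key
        by_peptides[frozenset(peptides)].append(protein)

    groups = {}
    for members in by_peptides.values():
        representative = min(members, key=lambda p: (protein_lengths[p], p))
        groups[representative] = sorted(members)
    return groups


# Hypothetical toy data: ProtA and ProtB share exactly the same peptides
peptides = {
    "ProtA": {"PEPTIDEK", "SAMPLER"},
    "ProtB": {"PEPTIDEK", "SAMPLER"},
    "ProtC": {"OTHERK"},
}
lengths = {"ProtA": 350, "ProtB": 512, "ProtC": 200}
print(group_indistinguishable(peptides, lengths))
# {'ProtA': ['ProtA', 'ProtB'], 'ProtC': ['ProtC']}
```

Notice that nothing in here tells you whether ProtB is actually in the sample; it just gets folded under ProtA because ProtA is shorter.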

Another stolen image from the Tabb lab (I love you guys, and I owe you a beer!).  This one shows how much worse this problem can get:

In this case (not as rare as we might hope), two proteins can explain all of these peptides, but that doesn't truly mean that either one is there.  Other proteins could explain them too, but it is much simpler to say it is these two and not some other four.  And I think that really illustrates the problem here.
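The "it is much simpler to say it is these two" logic is basically a set cover problem: find the smallest set of proteins that explains every peptide. Finding the true minimum is computationally hard in general, so here is a minimal sketch assuming a greedy approximation (keep picking the protein that explains the most still-unexplained peptides). The proteins and peptides are made up, and I'm not claiming this is exactly what any particular search engine does.

```python
def greedy_parsimony(protein_peptides):
    """Pick a small set of proteins that together explain every peptide.

    Greedy set cover: repeatedly take the protein that explains the most
    still-unexplained peptides. This is an approximation and is not
    guaranteed to find the true minimum set.
    """
    unexplained = set().union(*protein_peptides.values())
    chosen = []
    while unexplained:
        # max() returns the first best candidate, so iterating over sorted
        # accessions breaks ties alphabetically and keeps results deterministic
        best = max(sorted(protein_peptides),
                   key=lambda p: len(protein_peptides[p] & unexplained))
        chosen.append(best)
        unexplained -= protein_peptides[best]
    return chosen


# Hypothetical toy data: P1 + P2 explain everything, but so do P3 + P4 + P5
proteins = {
    "P1": {"pep1", "pep2", "pep3"},
    "P2": {"pep4", "pep5", "pep6"},
    "P3": {"pep1", "pep4"},
    "P4": {"pep2", "pep5"},
    "P5": {"pep3", "pep6"},
    "P6": {"pep2", "pep6"},
}
print(greedy_parsimony(proteins))  # ['P1', 'P2']
```

P3, P4 and P5 together explain all six peptides just as well; parsimony simply prefers the shorter list, which is exactly the assumption we should be nervous about.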

What if there was a way of telling which protein contributed this peptide?  Imagine the possibilities here.  Wait, don't imagine it.  If you are using Proteome Discoverer, go into one of your protein reports, turn off protein grouping, and look at how many more proteins are on your front page!  Anything that could get us closer to that point would most certainly be a win, right?!?!

This is why we need to be thinking about this.  The potential for free data.  The potential to separate keratin 77 (a potential cancer biomarker) from keratin 90 (crap floating in the air all over the place).

How do we do this?  Crazy advanced statistics.  Probabilistic networks (I think that is the term)... the stuff that Oliver does!
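To get a feel for what a probabilistic network buys you, here is a toy version I put together. This is a minimal sketch, loosely in the spirit of the Bayesian-network style of protein inference, with parameters that are all my own assumptions: each protein is present with prior probability `gamma`, each present protein emits each of its peptides with probability `alpha`, and any peptide can show up as noise with probability `beta`. It brute-forces every present/absent combination, which is only feasible for a handful of proteins. It is NOT what PD 2.0 or any published tool actually implements.

```python
from itertools import product


def protein_posteriors(protein_peptides, observed, alpha=0.9, beta=0.01, gamma=0.5):
    """Posterior probability that each protein is present, by full enumeration.

    Assumed toy model (hypothetical parameters):
      gamma: prior probability that any given protein is present
      alpha: chance a present protein yields each of its peptides
      beta:  chance a peptide is observed from noise alone
    Peptides in the graph that are not in `observed` are treated as absent.
    """
    proteins = sorted(protein_peptides)
    peptides = sorted(set().union(*protein_peptides.values()))
    totals = {p: 0.0 for p in proteins}
    evidence = 0.0
    for states in product([False, True], repeat=len(proteins)):
        present = {p for p, s in zip(proteins, states) if s}
        # Prior: each protein is independently present with probability gamma
        weight = 1.0
        for s in states:
            weight *= gamma if s else (1.0 - gamma)
        # Likelihood: a peptide is seen unless noise AND every present parent miss it
        for pep in peptides:
            k = sum(1 for p in present if pep in protein_peptides[p])
            p_seen = 1.0 - (1.0 - beta) * (1.0 - alpha) ** k
            weight *= p_seen if pep in observed else (1.0 - p_seen)
        evidence += weight
        for p in present:
            totals[p] += weight
    return {p: totals[p] / evidence for p in proteins}


# Hypothetical toy data: ProtB has a unique peptide (PEP3) that we never saw
proteins = {"ProtA": {"PEP1", "PEP2"}, "ProtB": {"PEP1", "PEP2", "PEP3"}}
post = protein_posteriors(proteins, observed={"PEP1", "PEP2"})
print(post)
```

Instead of a yes/no group, each protein gets its own probability; here ProtA comes out well above ProtB, because ProtB "should" have produced PEP3 and didn't.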

I'll direct you to two different papers.
One I've been puzzling over for a while, from the Steen and Steen lab, where non-parametric thing-a-ma-jigs are evaluated (open access).
and
A new one in PLOS ONE that demonstrates how this kind of network analysis need not eat all of the processing power on the planet for every RAW file (it is brand new and helped remind me how much I was procrastinating on this subject...).
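Why could this eat so much processing power?  Exact enumeration over N proteins means checking 2^N present/absent combinations.  One generic trick (I'm not claiming it is the method in that paper) is to notice that proteins which share no peptides, even through a chain of other proteins, don't influence each other, so you can solve each connected chunk of the protein-peptide graph separately.  A minimal sketch, with made-up accessions and peptides:

```python
from collections import defaultdict, deque


def connected_components(protein_peptides):
    """Split proteins into groups linked by shared peptides (directly or via a chain)."""
    peptide_to_proteins = defaultdict(set)
    for protein, peptides in protein_peptides.items():
        for pep in peptides:
            peptide_to_proteins[pep].add(protein)

    seen = set()
    components = []
    for start in sorted(protein_peptides):
        if start in seen:
            continue
        # Breadth-first search across protein -> peptide -> protein links
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            protein = queue.popleft()
            component.append(protein)
            for pep in protein_peptides[protein]:
                for neighbor in peptide_to_proteins[pep]:
                    if neighbor not in seen:
                        seen.add(neighbor)
                        queue.append(neighbor)
        components.append(sorted(component))
    return components


# Hypothetical toy data: A-B-C are chained by shared peptides, D and E stand alone
proteins = {
    "ProtA": {"PEP1", "PEP2"},
    "ProtB": {"PEP2", "PEP3"},
    "ProtC": {"PEP3"},
    "ProtD": {"PEP9"},
    "ProtE": {"PEP10"},
}
parts = connected_components(proteins)
print(parts)  # [['ProtA', 'ProtB', 'ProtC'], ['ProtD'], ['ProtE']]
print(sum(2 ** len(c) for c in parts), "vs", 2 ** len(proteins))  # 12 vs 32
```

Twelve combinations instead of 32 isn't much on a toy, but with thousands of proteins the difference between summing small 2^n terms and one giant 2^N is the difference between "done before lunch" and "never".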

So...how is this useful to us biologists out there who are scared of Greek letters in general, or is this another one of my useless tangents?  Well, on the other side of the screen where I'm typing this I have the Proteome Discoverer 2.0 Alpha version open.  Now, it is an Alpha, so I can't guarantee everything in this thing is going to be in the full PD 2.0 release...AND...I can't guarantee that I'm allowed to talk about it.  But considering the number of empty Stellas in front of me right now, this isn't my biggest concern at the moment.

BUT...some of these equations appear in PD 2.0 and I'm about to test them...as soon as I find my plane.

