Wednesday, September 24, 2014

MixGF -- A solution for coisolated peptides?

Every once in a while we get an MS/MS spectrum that matches more than one peptide in our database. Maybe the two sequences have exactly the same amino acids, but the order is different, or maybe two combinations of different amino acids will give you extremely similar masses.  With high resolution / accurate mass measurements at the MS1 level this is a whole lot rarer than with lower resolution devices, but it can happen.
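To put some numbers on that, here is a minimal Python sketch (mine, not from the paper) that compares peptide masses using standard monoisotopic residue masses. The example sequences are made up purely for illustration, and modifications and charge states are ignored.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def ppm_difference(mass_a, mass_b):
    """Signed difference of mass_b relative to mass_a, in parts per million."""
    return (mass_b - mass_a) / mass_a * 1e6


# Case 1: same amino acids, different order (hypothetical sequences).
scrambled = ppm_difference(peptide_mass("PEPTIDEK"), peptide_mass("PEPTDIEK"))

# Case 2: different amino acids, nearly identical mass (G+G versus N).
gg_vs_n = ppm_difference(peptide_mass("SAMPLEGGR"), peptide_mass("SAMPLENR"))

# Case 3: K versus Q, close enough to confuse a low resolution instrument.
k_vs_q = ppm_difference(peptide_mass("PEPTIDEK"), peptide_mass("PEPTIDEQ"))

print(f"scrambled order: {scrambled:.3f} ppm")
print(f"GG vs N:         {gg_vs_n:.3f} ppm")
print(f"K vs Q:          {k_vs_q:.3f} ppm")
```

The first two cases are indistinguishable by precursor mass no matter how good the instrument is. The K/Q swap differs by roughly 0.036 Da, which a tight ppm tolerance separates easily but a wide low resolution window does not.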

MixGF is a probability algorithm from some guys at the proteomics powerhouse UCSD.  A description of the algorithm is currently in press at MCP, and you can read it here.

In the introduction to this paper, the authors introduced me to a very scary statistic (and the references to back it up).  50% of "isolated" peptides for MS/MS are (at the very least) coisolated.  Sure, I knew the number was high, but 50% of what we're looking at, and assuming is a single peptide, is 2 or more with signal high enough to be statistically relevant to our scoring?  The more I think about it the more it makes sense, but I do find it scary.

MixGF is a program that attempts to turn this seeming disadvantage into an advantage for us.  What if we could identify the co-isolated peptides?

A similar analysis program, with a similar name, was recently released.  When two or more top-end labs work on tackling the same problem, you know it's a big one.  I don't have time to really dig through the math from the two to tell you the differences, but it sure puts this issue in perspective.  If we ignore it, we're missing a lot of information.  If we tackle it head-on, however, there may be lots to gain!

Fortunately, there is light at the end of the tunnel here.  This study finds a much lower level of statistically relevant coisolated peptides than the studies they cite, and estimates around 30% for the digests that they analyze.  (This would match surprisingly well with the running averages I have for most Proteome Discoverer analyses...simply by averaging the "coisolation" column on big datasets).
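If you want to try that averaging trick yourself, here is a rough Python sketch. It assumes you have exported your results as tab-delimited text and that the column is literally named "coisolation"; both are assumptions on my part, so check the header in your own export and swap in the real column name. The inline table is fake data just to make the example run.

```python
import csv
import io
from statistics import mean


def mean_column(rows, column):
    """Average the numeric values in one column, skipping empty cells."""
    values = []
    for row in rows:
        cell = (row.get(column) or "").strip()
        if cell:
            values.append(float(cell))
    if not values:
        raise ValueError(f"no numeric values found in column {column!r}")
    return mean(values)


# Fake tab-delimited export; the column name is hypothetical.
example_export = (
    "sequence\tcoisolation\n"
    "PEPTIDEK\t10\n"
    "SAMPLER\t50\n"
    "ANOTHERK\t\n"
    "LASTPEPTIDER\t30\n"
)

reader = csv.DictReader(io.StringIO(example_export), delimiter="\t")
average = mean_column(reader, "coisolation")
print(f"mean coisolation: {average:.1f}")
```

For a real file, replace the `io.StringIO(...)` part with `open("your_export.txt", newline="")`.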

And they also show that the use of MixGF leads to a great big boost in peptide ID numbers!  Great study.  Let's all get thinking about these things (and if you want to write a free node for Proteome Discoverer I'll do what I can to help you out!)


  1. It's unfortunate there's no software implementation available to download. The method seems to be important for all MS/MS experiments, so it's surprising that the inventors would expect everyone to implement the algorithm themselves.

  2. Hi Dario,

    I have found this link for you.