Tuesday, March 31, 2015

Find cancer mutations easily with Proteome Discoverer 2.0 and XMAn!!!



Wow! I haven't posted in days. I've been busy! I've got lots of cool stuff to write about, thanks to some reader emails and some neat new studies in the literature, but this one has to go first. I hosted a Proteome Discoverer workshop last week, and the coolest idea occurred to me partway through. Possibly it was prompted by a participant's comment (there were a lot of smart people in the room!), but I'm going to claim it as my own semi-original thought.

If you are a cancer researcher, you have a big interest in which mutations are present in your genes or proteins. What if I told you that PD 2.0 can tell you whether cancer mutations are present and detectable in your proteomics sample? Maybe you already have an awesome and easy pipeline, but I think this is pretty darned cool (and it's new to me!).

What you need:
1) PD 2.0 (sorry...)
2) The XMAn database
3) A good contaminant database (go for cRAP)

Import all three databases (your normal human database, XMAn, and cRAP) into PD, and then set up a normal Processing Workflow:


In PD 2.0, you can select as many databases as you want for any search. Here I selected my contaminants, my normal human database, and my mutants (the XMAn database).
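If you're curious what "search several databases, but remember where each entry came from" looks like outside of PD, here's a minimal Python sketch. It is not how PD works internally. The database labels, the toy accessions and sequences, and the assumption that the accession is the first word of each FASTA header are all mine (real UniProt, cRAP, and XMAn headers each have their own format).

```python
from io import StringIO
from typing import Dict, Iterator, TextIO, Tuple


def parse_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (accession, sequence) pairs from a FASTA text stream.

    Assumption: the accession is the first whitespace-delimited word of each
    header line. Real UniProt, cRAP and XMAn headers each have their own format.
    """
    accession, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if accession is not None:
                yield accession, "".join(chunks)
            words = line[1:].split()
            accession, chunks = (words[0] if words else ""), []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if accession is not None:
        yield accession, "".join(chunks)


def load_labeled_databases(sources: Dict[str, TextIO]) -> Dict[str, Dict[str, str]]:
    """Return {database label: {accession: sequence}}, one entry per input FASTA."""
    return {label: dict(parse_fasta(handle)) for label, handle in sources.items()}


if __name__ == "__main__":
    # Toy in-memory stand-ins for the three FASTA files (made-up entries).
    databases = load_labeled_databases({
        "contaminant": StringIO(">TOY_CONT\nGGGGSAAK\n"),
        "normal": StringIO(">TOY1 toy protein\nMKTAYIAKQR\nQISFVKSHFSR\n"),
        "cancer mutation": StringIO(">TOY1_MUT toy variant\nMKTAYIAKQRQISFVKSHFWR\n"),
    })
    for label, proteins in databases.items():
        print(label, proteins)
```

Keeping a separate dictionary per database (rather than one big merged one) means an accession that shows up in two files can't silently overwrite the other, and the label travels with every entry.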

Next, set up a consensus workflow. The critical piece is the Consensus node called Protein Marker. This node keeps track of which database each ID came from AND lets you label the output:



Here I've labeled my contaminants as such, proteins that match UniProt exactly as "normal," and anything from the XMAn database as "cancer mutation." This is what you get in the PD output:


As you'd expect, most proteins aren't mutated. Thanks to the Protein Marker node, however, you get three new columns, with an X in each column whose database matched your spectra. Again, most proteins are going to be normal. I cheated here, though: I used a HeLa digest, and it's got mutations all over the place. By clicking on the top of the column, I can sort for proteins that have one or more mutations.
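Conceptually, those marker columns boil down to a membership check per protein plus a sort. Here's a minimal sketch that assumes you already know which labeled databases each protein matched. The `matches` input, the `marker_rows` name, and the accessions are hypothetical; PD works all of this out for you from the search results.

```python
from typing import Dict, List, Set

# Column labels mirroring the ones I typed into the Protein Marker node.
MARKER_COLUMNS = ["contaminant", "normal", "cancer mutation"]


def marker_rows(matches: Dict[str, Set[str]]) -> List[Dict[str, str]]:
    """Build one row per protein, with "X" under each labeled database it matched."""
    rows = []
    for accession, labels in matches.items():
        row = {"protein": accession}
        for column in MARKER_COLUMNS:
            row[column] = "X" if column in labels else ""
        rows.append(row)
    # Stable sort: proteins marked "cancer mutation" first, original order otherwise,
    # roughly like clicking the column header in the PD results table.
    rows.sort(key=lambda r: r["cancer mutation"] != "X")
    return rows


if __name__ == "__main__":
    # Hypothetical match sets; PD derives these from the search results.
    example = {
        "TOY1": {"normal"},
        "TOY2": {"normal", "cancer mutation"},
        "TOY_CONT": {"contaminant"},
    }
    for row in marker_rows(example):
        print(row)
```

A protein can carry an X in more than one column, which is exactly why the "cancer mutation" column alone is the thing to sort on.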

This gets really interesting when you go to the peptide group level.


Here, I did a quick filter for identified peptides that were NOT present in the "normal" UniProt database. It turns up several peptide sequences that were found only in XMAn. Known cancer mutations in HeLa, what do you know? And what can I do next? Well, at the top level you can pull out the XMAn nomenclature for the protein ID (the one I highlighted is O00203).
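That peptide-level filter is basically asking: does this peptide sequence occur in some XMAn entry but in no normal UniProt entry? Here's a minimal sketch using plain substring matching. The function name and toy sequences are made up, modifications are ignored, and treating I and L as the same residue (they have identical masses, so MS/MS alone can't tell them apart) is my own design choice, not necessarily what PD does.

```python
from typing import Dict, Iterable, List


def _normalize(seq: str, equate_il: bool) -> str:
    # Leucine and isoleucine have identical masses, so optionally collapse I -> L.
    seq = seq.upper()
    return seq.replace("I", "L") if equate_il else seq


def mutant_only_peptides(
    peptides: Iterable[str],
    normal: Dict[str, str],
    mutant: Dict[str, str],
    equate_il: bool = True,
) -> List[str]:
    """Return peptides found in at least one mutant protein but in no normal protein.

    `normal` and `mutant` map accession -> full protein sequence.
    """
    normal_seqs = [_normalize(s, equate_il) for s in normal.values()]
    mutant_seqs = [_normalize(s, equate_il) for s in mutant.values()]
    hits = []
    for peptide in peptides:
        p = _normalize(peptide, equate_il)
        in_mutant = any(p in s for s in mutant_seqs)
        in_normal = any(p in s for s in normal_seqs)
        if in_mutant and not in_normal:
            hits.append(peptide)
    return hits


if __name__ == "__main__":
    # Toy sequences: the "mutant" differs from the "normal" protein by one S -> W swap.
    normal = {"TOY1": "MKTAYIAKQRQISFVKSHFSR"}
    mutant = {"TOY1_MUT": "MKTAYIAKQRQISFVKSHFWR"}
    print(mutant_only_peptides(["QISFVK", "SHFWR", "SHFSR"], normal, mutant))
```

With the toy data, only the peptide spanning the swapped residue survives the filter; peptides shared by both versions of the protein drop out, just like the "normal" ones do in PD.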

I figured the easiest place to get info was COSMIC. I typed the nomenclature (exactly as I got it from the PD column) into the search bar... and BOOM!


Tons of info! This mutation has been noted in over 100 previous studies. I get references to all of them, and I can look at the genome-level structural info that leads to this mutation. Now, the obvious next test (running right now!) is: can annotation and/or ProteinCenter make sense of the XMAn nomenclature? If not, I bet they could get it going pretty quickly!

Okay.  Again, maybe you have a cooler way of doing this.  But I didn't.

P.S. The data looked better when I used MS Amanda instead of SequestHT: better FDRs. Maybe that's because I'm searching a single file with Percolator.

2 comments:

  1. Hi Ben! Thanks for a very nice blog! I tried to find out when PD 2.0 will be available, but couldn't find the information. Do you know the time frame?

  2. I wish I knew. The beta tag was removed in the newest version. I think I'm playing with the full commercial release right now. I think we're currently waiting for the lawyers to sign off on it.
