Wednesday, February 17, 2016

Should you narrow down your database to realistically observable peptides?


Hmmm....I'm kinda liking this idea, even if at first it seems a little like cheating...

Here is the paper (JPR, paywalled) from Avinash Shanmugam and Alexey Nesvizhskii.  I know this first-hand -- if you are studying mouse proteins and you search your MS/MS spectra against the sequences of every organism ever sequenced...


...you're gonna have a bad time. (Sure, there are exceptions, but in general, why would you search mouse proteins against an Archaea FASTA?)

So let's take this thought a little further. What if you built your FASTA database more intelligently by using additional data at your fingertips? For example, what if you went to the GPM, took a swing at both targeted (peptides likely to be present in your LC-MS/MS runs) and untargeted (all sorts of stuff, whether it's likely to be there or not) databases, and saw how that affects your results?

Turns out you end up with better data by making your databases even more realistic (i.e., biologically relevant)!!!

Sorry if I've been misusing the "i.e." thing. I just Googled it and one of the first entries was a long thing that explains the difference between i.e. and e.g. and it was way too many words.
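If you want a feel for what "targeted" could look like in practice, here is a minimal sketch in Python. To be clear, this is my own toy illustration and not the method from the paper: it assumes you already have a plain list of previously observed peptide sequences (say, exported from the GPM somehow) and it simply keeps any protein that contains at least one of them. The function names (`parse_fasta`, `targeted_fasta`) and the toy sequences are all made up for the example.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def targeted_fasta(fasta_text, observed_peptides):
    """Keep only proteins containing at least one previously observed peptide.

    Assumption: a simple substring match is good enough for a sketch.
    A real workflow would also think about digestion rules, I/L ambiguity,
    modifications, and how often each peptide was actually observed.
    """
    peptides = {p.strip().upper() for p in observed_peptides if p.strip()}
    kept = []
    for header, seq in parse_fasta(fasta_text):
        if any(pep in seq.upper() for pep in peptides):
            kept.append(f">{header}\n{seq}")
    return "\n".join(kept) + ("\n" if kept else "")


# Toy data, invented purely for illustration
FULL_FASTA = """>prot_A hypothetical protein A
MKTAYIAKQRQISFVK
SHFSRQLEER
>prot_B hypothetical protein B
GGGGSSSSAAAA
>prot_C hypothetical protein C
MLLAVLYCLK
"""
OBSERVED = ["QISFVK", "AVLYC"]  # hypothetical "previously observed" peptides

targeted = targeted_fasta(FULL_FASTA, OBSERVED)
print(targeted)
```

With the toy inputs above, prot_A and prot_C survive and prot_B gets dropped, since none of the "observed" peptides map to it. You would then search against the smaller file and compare the results to a search against the full FASTA.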

I'm pretty fascinated by this idea and I think I'll give it a shot once I come up with a good example file and database. Definitely check this out when you get a chance!
