Sunday, September 6, 2015
GOFDR! Analyzing proteomics data from the gene ontology level
Shotgun proteomics is amazing at identifying peptide spectral matches (PSMs). This is what we get out of the instrument: an MS/MS spectrum that we can match with high confidence to something in our database. The tricky part is getting relevant biological data back out. Figuring out exactly which PSM belongs to which peptide, and which peptide belongs to which protein, is the hard part. Evolution is working against us here: it is much easier from a biological standpoint to make proteins with new functions from a similar protein than it is to make a new one from scratch.
There are some really clever people thinking about other ways of inferring biological data, and I think we'll be hearing about a lot of it soon. One new (to me!) approach is called GOFDR. It's from Qiangtian Gong et al. and is described in this new paper here.
The idea is this: cut out the middlemen. That is, we've got the PSM confidently identified. If it is from a conserved region of a protein, why would we bother going all the way through trying to infer which peptide and protein it is from? Chances are that if it's a PSM that matches multiple different proteins, those proteins are at least similar in their function. That's the gene ontology part.
Example: This drug leads to upregulation of this peptide that can be linked to one of 60 different actin variants? Who cares which one it is, it sounds like this drug has a cytoskeletal component!
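To make the actin example concrete, here is a minimal Python sketch of the "skip protein inference" idea: take every protein a peptide could have come from and keep only the annotations they all share. The peptide names, protein names, and annotations below are made-up toy data, and `shared_go_terms` is a hypothetical helper for illustration, not the GOFDR pipeline itself.

```python
# Toy data (hypothetical): which proteins each identified peptide could map to.
peptide_to_proteins = {
    "PEP_ACTIN": ["actin_variant_1", "actin_variant_2", "actin_variant_3"],
    "PEP_UNIQUE": ["carrier_protein_1"],
}

# Toy annotations (hypothetical): functional terms attached to each protein.
protein_to_go = {
    "actin_variant_1": {"cytoskeleton", "cell motility"},
    "actin_variant_2": {"cytoskeleton"},
    "actin_variant_3": {"cytoskeleton", "muscle contraction"},
    "carrier_protein_1": {"transport"},
}


def shared_go_terms(peptide):
    """Return the terms common to every protein the peptide could come from."""
    proteins = peptide_to_proteins.get(peptide, [])
    term_sets = [protein_to_go.get(protein, set()) for protein in proteins]
    if not term_sets:
        return set()
    # We never decide which protein the peptide belongs to;
    # we only keep what all the candidates agree on.
    return set.intersection(*term_sets)


print(shared_go_terms("PEP_ACTIN"))
print(shared_go_terms("PEP_UNIQUE"))
```

The ambiguous peptide still tells us something useful (here, "cytoskeleton") even though we never worked out which variant it came from.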
That's the "GO" part. The "FDR"? It's because that's the level where they want to apply the false discovery rate: at the gene [protein] ontology level.
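I won't try to reproduce how the paper actually computes its FDR, but a common way to estimate FDR in proteomics is target-decoy counting, and the same arithmetic can be applied to term-level assignments instead of PSMs. The sketch below assumes each GO-term assignment carries a score and a flag saying whether it came from a decoy; the function name, scores, and data are all hypothetical.

```python
def go_level_fdr(scored_terms, threshold):
    """Estimate FDR as decoys / targets among assignments at or above threshold.

    scored_terms: list of (score, is_decoy) tuples, one per GO-term assignment.
    """
    targets = sum(
        1 for score, is_decoy in scored_terms
        if score >= threshold and not is_decoy
    )
    decoys = sum(
        1 for score, is_decoy in scored_terms
        if score >= threshold and is_decoy
    )
    if targets == 0:
        # Nothing passes the threshold, so there is nothing to estimate.
        return 0.0
    return decoys / targets


# Hypothetical scored assignments: (score, came_from_decoy)
hits = [(0.9, False), (0.8, False), (0.7, True), (0.6, False), (0.5, True)]

print(go_level_fdr(hits, 0.6))
print(go_level_fdr(hits, 0.75))
```

Raising the threshold trades sensitivity for a lower estimated FDR, which is exactly the kind of knob that ends up needing manual tuning.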
Is it simple in this form? Not at all. To run this pipeline, the data is run through multiple programs, including PSI-BLAST. At the end, they find that they really have to spend time manually adjusting their scores and thresholds. Is it an interesting way to look at and think about our data? Absolutely.