Tuesday, December 19, 2017
ProteomeGenerator -- Another step closer to comprehensive proteogenomics!
Wow. Um...this is a solid study. Awesome new tool? Top notch mass spec work? The best gains I've ever seen in incorporating transcriptomics into a proteomics workflow? Check. Check. Check.
It is on bioRxiv here.
We've seen some big proteogenomics papers and I'm sure that we've only seen the tip of the iceberg, but there is some important stuff in this paper. First off -- yeah -- we have to do some bioinformatics to make this work, but this one is laid out pretty well.
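Second -- the pipeline here runs in something new to me called Snakemake. It appears to be a workflow framework: you describe each step by the files it needs and the files it produces, and it works out what to run and in what order, so the same pipeline can scale from a laptop to a cluster. Scalable isn't a word? Too bad.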
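To get a feel for the idea, here is a toy sketch in plain Python. This is not Snakemake's syntax or API, and the file names are hypothetical, not the actual steps in the ProteomeGenerator pipeline. It only shows the core trick as I understand it: each output lists the inputs it needs, and the order of steps falls out of those dependencies.

```python
from typing import Dict, List, Set

# Hypothetical rules: each output file maps to the input files it needs.
# These names are made up for illustration only.
RULES: Dict[str, List[str]] = {
    "aligned.bam": ["reads.fastq", "genome.fa"],
    "transcripts.gtf": ["aligned.bam"],
    "proteome.fasta": ["transcripts.gtf", "genome.fa"],
}


def build_order(target: str, rules: Dict[str, List[str]], existing: Set[str]) -> List[str]:
    """Return the outputs that must be built, in order, to produce `target`."""
    order: List[str] = []
    seen: Set[str] = set()

    def visit(path: str) -> None:
        # Files already on disk (or already planned) need no work.
        if path in seen or path in existing:
            return
        if path not in rules:
            raise FileNotFoundError(f"no rule produces {path!r} and it is not on disk")
        seen.add(path)
        # Plan every dependency before the file that needs it.
        for dep in rules[path]:
            visit(dep)
        order.append(path)

    visit(target)
    return order


plan = build_order("proteome.fasta", RULES, existing={"reads.fastq", "genome.fa"})
print(plan)
```

Because the plan is derived from the dependencies, steps that don't depend on each other could be handed to separate cores or cluster nodes, which is where the scaling comes from.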
A really interesting finding is the size of the databases this tool generates from the transcriptomic data -- it isn't huge amounts of FASTA coming out of this tool and skewing your FDR all over the place. It is compact. Smaller than the canonical databases, because your cell type isn't producing every theoretical human protein at every point in time. It's only producing transcripts for the ones it needs. Smaller databases, lower FDR, and more peptide matches!
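A quick sketch of why a sample-specific database ends up smaller. To be clear, this is not how ProteomeGenerator builds its database (the tool generates its entries from the transcriptomic data itself). The filter below is a simplified stand-in, and the protein names, sequences, and the `EXPRESSED` set are all made up.

```python
from typing import Dict, Set


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The identifier is the first word after '>'.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def sample_specific(db: Dict[str, str], expressed: Set[str]) -> Dict[str, str]:
    """Keep only the entries with evidence of expression in this sample."""
    return {name: seq for name, seq in db.items() if name in expressed}


# Hypothetical "canonical" database with three proteins.
CANONICAL = """>protA
MKTAYIAK
>protB
MSEQNNTE
>protC
MGLSDGEW
"""

# Hypothetical set of proteins with transcripts detected in this cell type.
EXPRESSED = {"protA", "protC"}

canonical = parse_fasta(CANONICAL)
subset = sample_specific(canonical, EXPRESSED)
print(len(canonical), "entries in the canonical database")
print(len(subset), "entries in the sample-specific database")
```

Fewer entries means fewer candidate sequences competing for each spectrum, which is the intuition behind the smaller search space described above.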