Question: If we went all out, threw every protease we had at a cell line, did offline 2D fractionation, and then really optimized the nLC-MS/MS parameters, could we finally match or beat RNA-Seq?
Answer: Heck yeah we can!
Question 2: Would we need 100 fractions? Would we need 72 fractions? At what point do we stop gaining PSMs/peptides/proteins? Pushing this question further, could we compete on the same time scale as "next gen" DNA/RNA sequencing?
Both answers are in this brand new (open access) Cell Systems paper here!
<start ramble 1> In the genomics vs proteomics argument, these are what I perceive to be our key disadvantages, based on talking to people on the outside.
1) Proteomics has reproducibility issues (we know we're getting better, I gave a talk to a bunch of physicians recently -- this was the #1 thing people wanted to talk about afterward...for a long long time afterward...). We'll get the QC thing down, new data processing algorithms are demonstrating (and will keep demonstrating) that we have fewer reproducibility problems than it has looked like, and better sample prep kits that require fewer compromises will help a lot.
2) Proteomics is still intimidating to researchers outside our field. I think part of this has to do with the fact we've been in our exponential growth phase in terms of capabilities. I'm running into more researchers who had proteomics covered in class, or even sent some samples off to a core at some point for PMF or something -- but so much has changed, so fast, that even these most experienced outsiders have trouble keeping up with what we're doing today....
3) A few people have shown coverage levels in human proteomics that come close to what you'd get with "next gen" sequencers, but it took a whole lot more than 1-3 days to get the data!
<end ramble 1>
If someone says to you "I need the deepest possible characterization of this mammalian cell line or tissue that you can possibly get me" -- read this paper before you start optimizing!
They tried different numbers of fractions -- all the way up to 70!!
They used 5 different enzymes
They painstakingly optimized the second dimension nLC to match the ideal fractionation method.
You'll never guess what they came up with as optimal. I need to add a GIF then I'll ruin
46 high-pH reversed phase offline fractions
33 minute nLC gradients (15cm C-18 columns) 33 min!! Are you as surprised as that majestic animal above?!?! I sure am!!!
That is under 36 hours! Depending on the sequencer arrangement, this is coverage approximating what "next gen" gets -- but at the PROTEIN LEVEL -- on a benchtop mass spectrometer -- in possibly half the time!
There is a lot more in this paper, but I can't spend my whole Sunday raving about how much I love this paper. They show PTMs identified en masse without enrichment. They also do phospho- and acetyl- peptide enrichments. Tons and tons of work, and obviously weeks of runtime, all so we don't have to.
Did I even mention that once they optimized this with one cell line, they used this methodology on a whole bunch of other ones!??! If the authors happen to see this -- THANK YOU!!!
All the data is also available via PRIDE at this ProteomeXchange link (PXD004452)
If you are doing global proteomics and some sorceress places a curse on you that you can only read one proteomics paper that has been published so far this year -- I recommend this one.
In the most minor minor minor of drawbacks I'm going to have to mention with this ridiculously awesome work and resource for our community...I'm just a little stressed about keeping 46 separate fractions organized and checking each one for corruption (in 2017, so far I have downloaded exactly 1 file from ProteomeXchange partners that wouldn't load. Re-downloading fixed it. But...man, it is no fun to be 48 hours into a queue and find out one file is dead...)
Side note: Does anyone have a tool that would look at a big folder full of RAW files and tell me if all of them are functional? I just open every one of them in Xcalibur and close it. If there is a better way, or if you'd like to work on making one, please let me know! I know I sound seriously ungrateful for someone benefiting from the Golden Age of proteomic bioinformatics but at that step I feel like I'm doing this...
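In the meantime, here is a minimal Python sketch of a partial workaround. To be clear about what it is not: it does not open or parse the RAW format at all, so it cannot tell you a file is truly "functional" the way Xcalibur can. It only flags zero-byte files and compares MD5 checksums against reference values, assuming you can get those from somewhere (the repository, a collaborator, or a second download of the same file). The function names `md5sum` and `check_folder` and the status labels are my own inventions for illustration.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks so multi-GB files never sit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_folder(folder, expected=None):
    """Return {filename: status} for every .raw/.RAW file in a folder.

    expected is an optional {filename: md5_hex} dict of reference checksums.
    Statuses: EMPTY, OK, MISMATCH, or NO_CHECKSUM (nothing to compare with).
    A status of OK means the bytes match the reference, nothing more.
    """
    expected = expected or {}
    report = {}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix.lower() != ".raw":
            continue
        if path.stat().st_size == 0:
            report[path.name] = "EMPTY"
            continue
        want = expected.get(path.name)
        if want is None:
            report[path.name] = "NO_CHECKSUM"
        elif md5sum(path) == want.lower():
            report[path.name] = "OK"
        else:
            report[path.name] = "MISMATCH"
    return report
```

Calling `check_folder("my_downloads", expected=reference_md5s)` before queuing a search would at least catch truncated or mangled downloads up front, which is the failure I actually hit. It would not catch a file that was already broken when it was uploaded.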
TL/DR: AN AMAZINGLY AWESOME study out of some great minds in Denmark shows us how to get the most comprehensive proteomes we've ever seen -- rapidly and with somewhat counterintuitive methodology!!!
Edit 7/4/17: UCDProteomics pointed out on Twitter that this method is not entirely novel. It is similar to the Fast-Seq/Fast-Quan workflow from the Qu Lab, a study that is news to me.