We kicked this around really hard back when there was a Proteomics Old Time Radio Hour. Not this paper, but the base concept of mixed proteome digests for quantitative studies. I'm still uncomfortable with it as a concept, but let's talk about the paper first.
The main part of the study simulates single cell digests and runs them through proteomics people's favorite software toolkits. They took a cancer digest and spiked in an E. coli digest at one concentration and a yeast digest at another.
Then they used a really cool robot to do some actual single cells. It doesn't appear to be commercially available, they've designed it themselves over the last 16 years or so, and I really, truly wouldn't mind having one. Most of the paper is on the first part, though: the low-level, mixed-organism quantitative digest.
Since they knew what the ratios should be, they ran a bunch of samples and replicates, processed them with DIA-NN, Spectronaut, and PEAKS under different settings, and came up with some interesting findings.
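The whole point of a spike-in benchmark is that every protein has a known expected ratio based on which species it came from, so you can score each software/settings combination by how close the observed ratios land. Here's a minimal sketch of that scoring, assuming you already have per-protein intensities for two conditions tagged by species. The expected log2 ratios in `EXPECTED_LOG2` and the function name `score_species` are hypothetical placeholders, not the paper's actual spike levels or code.

```python
import math
from statistics import median

# Hypothetical expected log2(A/B) ratios per species.
# Placeholders only, NOT the spike-in levels used in the paper.
EXPECTED_LOG2 = {"human": 0.0, "yeast": 1.0, "ecoli": -2.0}


def score_species(rows):
    """Score observed vs. expected ratios per species.

    rows: iterable of (species, intensity_a, intensity_b), one per protein.
    Missing quant values are assumed to be encoded as 0.0 and are skipped.
    Returns {species: (n_proteins, median_observed_log2, error_vs_expected)}.
    """
    by_species = {}
    for species, a, b in rows:
        if a > 0 and b > 0:  # only proteins quantified in both conditions
            by_species.setdefault(species, []).append(math.log2(a / b))

    results = {}
    for species, ratios in by_species.items():
        med = median(ratios)
        results[species] = (len(ratios), med, med - EXPECTED_LOG2[species])
    return results


if __name__ == "__main__":
    demo = [
        ("human", 100.0, 100.0), ("human", 50.0, 52.0),
        ("yeast", 200.0, 100.0), ("yeast", 40.0, 21.0),
        ("ecoli", 10.0, 40.0), ("ecoli", 5.0, 19.0),
    ]
    for sp, (n, med, err) in score_species(demo).items():
        print(f"{sp}: n={n} median log2={med:.2f} error={err:+.2f}")
```

Using the median rather than the mean per species keeps a handful of badly quantified proteins from dragging the whole species off target; a real comparison would also look at the spread of ratios, not just the center.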
Begin concerns about multi-species proteomic mixtures as benchmarks
Here is where my concern always comes in with these things, though. The yeast proteome is something like 4,000 proteins, and you'll basically always see the 1,500-2,000 most abundant ones. E. coli can produce something like 3,000 proteins, but some are only for anaerobic growth and the like, so I think you'll normally see something like 600-800 E. coli proteins from an aerobic digest without trying very hard at all.
I love the concept of a mixed-species digest, but is it a realistic biological model? When in human biology would you ever have 1) an extra 30% of proteins present, 2) 30% of the proteome significantly altered, and 3) all of it altered in the same way?
It's weird, right? If I were writing a normalization algorithm, I think I'd write an IF/THEN statement like:
IF 30% of the proteome is at 1/10 of the base peak
THEN you f'ed up somewhere, PRINT gibberish.
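Half joking, that rule is easy enough to write for real. A minimal sketch, assuming you have one fold change per quantified protein: flag the run if at least 30% of proteins moved 10-fold or more. Checking both directions (up and down) is my own assumption, and `looks_broken` is a hypothetical name, not a function in any real normalization tool.

```python
def looks_broken(fold_changes, max_fraction=0.30, fold_cutoff=10.0):
    """Return True if at least `max_fraction` of quantified proteins changed
    by `fold_cutoff`-fold or more in either direction.

    fold_changes: per-protein linear ratios (condition A / condition B).
    None, zero, or negative values are treated as missing and skipped.
    Thresholds mirror the tongue-in-cheek rule above; they are not from
    any published method.
    """
    ratios = [fc for fc in fold_changes if fc is not None and fc > 0]
    if not ratios:
        return False  # nothing quantified, nothing to judge

    # Compare in linear space: up by >= cutoff, or down to <= 1/cutoff.
    n_big = sum(1 for fc in ratios if fc >= fold_cutoff or fc <= 1.0 / fold_cutoff)
    return n_big / len(ratios) >= max_fraction


if __name__ == "__main__":
    drug_treatment = [1.0] * 95 + [0.05] * 5  # a few big movers: plausible biology
    spike_in = [1.0] * 70 + [0.1] * 30        # 30% of everything down 10-fold
    print(looks_broken(drug_treatment), looks_broken(spike_in))
```

A mixed-species benchmark trips this check by design, which is exactly the problem: any algorithm that assumes most proteins don't change is being tested on data that violates that assumption.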
That's just me, and I don't know what the real answer is, but I sure haven't seen a comparison of two drug-treated cells where 1,000 proteins were significantly altered. If you compared a noncancerous colon biopsy from a patient with one taken from the tumor, I doubt you'd see more than 1,000 proteins significantly altered. So I'm not sure this is the best possible way to test an algorithm.
Back to the paper!
There is solid gold in this study, by the way: which normalization approaches to use, and which post-analysis R packages seemed to work and which seemed to distort things further. Totally worth reading even without the bit about the 5 papers on their microscope-based sample pickup and prep robot.
Also, just noting: the instrument used for the label-free single cell proteomics is a Pro2, not an SCP or Ultra, etc., and they still get some legitimately useful numbers.
