Monday, December 8, 2025

Benchmarking algorithms for single cell proteomics - is multi-proteome the right way to do it?

 


We kicked this around really hard back when there was a Proteomics Old Time Radio Hour. Not this paper, but the base concept of mixed proteome digests for quantitative studies. I'm still uncomfortable with it as a concept, but let's talk about the paper first.


The main part of the study is simulating single cell digests with proteomics people's favorite toolkits. They took a cancer digest and spiked in an E. coli digest at one concentration and a yeast digest at another.

Then they did some actual single cells using a really cool robot that they have designed themselves over the last 16 years or so. It doesn't appear to be commercially available, and I really truly wouldn't mind having one. Most of the paper is on the first part, the low level mixed organism quantitative digest.

Since they now knew what the ratios should be, they ran a bunch of samples and replicates, used DIA-NN, Spectronaut and PEAKS with different settings, and came up with some interesting findings.
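If you haven't played with one of these mixed-proteome benchmarks, the scoring idea goes roughly like this: you know the ratio each species should show between two samples, so you check how far the measured ratios land from it. Here's a minimal sketch. The expected ratios, the function name and the toy intensities are all made up by me for illustration; none of them come from the paper.

```python
import math
from statistics import median

# Hypothetical expected sample B / sample A ratios per species.
# These are illustrative assumptions, NOT the spike-in ratios from the paper.
EXPECTED_RATIO = {"human": 1.0, "yeast": 2.0, "ecoli": 0.25}


def log2_error_by_species(proteins):
    """proteins: iterable of (species, intensity_a, intensity_b) tuples.

    Returns {species: median measured log2(B/A) minus expected log2 ratio}.
    A value near 0 means the software recovered the known ratio.
    """
    measured = {}
    for species, a, b in proteins:
        if a <= 0 or b <= 0:
            continue  # treat zero/negative intensities as missing values
        measured.setdefault(species, []).append(math.log2(b / a))
    return {
        sp: median(vals) - math.log2(EXPECTED_RATIO[sp])
        for sp, vals in measured.items()
    }


# Toy intensities, invented purely to exercise the function.
toy = [
    ("human", 100.0, 100.0),
    ("human", 50.0, 55.0),
    ("human", 80.0, 72.0),
    ("yeast", 10.0, 20.0),
    ("yeast", 8.0, 12.0),
    ("ecoli", 40.0, 10.0),
    ("ecoli", 0.0, 5.0),  # missing in sample A, so it gets skipped
]
errors = log2_error_by_species(toy)
```

Real benchmarking goes further than a median (spread, missing values, false positives in the differential analysis), but the known-ratio comparison is the core of it.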

Begin concerns about multi-species proteomic mixtures as benchmarks

Here is where my concern always comes in for these things, though. The yeast proteome is like 4,000 proteins, and you'll basically always see 1,500-2,000 of the higher concentration ones. E. coli can produce like 3,000 proteins, but some are for anaerobic growth and whatever, so I think you'll normally see something like 600-800 E. coli proteins from an aerobic digest without trying too hard at all.

I love the concept of a mixed species digest, but is that a realistic biological model? At what point in human biology 1) are there going to be an extra 30% of proteins available, 2) is 30% of the proteome going to be significantly altered, and 3) is it all going to be altered in the same way?

It's weird, right? Like if I was writing a normalization algorithm I think that I'd write an IF/Then statement that is like 

IF 30% of the proteome is at 1/10 of the base peak

THEN you f'ed up somewhere, PRINT gibberish. 
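If I actually typed that out in Python it might look something like the sketch below. I'm reading "1/10 of the base peak" as "a protein's ratio between the two samples is off by 10-fold or more in either direction", which is my own interpretation, and the function name and thresholds are made up for illustration. It isn't from any real normalization package.

```python
def looks_broken(ratios, fold=10.0, max_fraction=0.30):
    """Return True if a suspicious share of proteins shifted by `fold` or more.

    ratios: per-protein ratios (condition B / condition A).
    fold, max_fraction: hypothetical thresholds, not from any published tool.
    """
    valid = [r for r in ratios if r > 0]  # drop missing/zero ratios
    if not valid:
        return True  # nothing quantified at all is its own kind of broken
    extreme = sum(1 for r in valid if r >= fold or r <= 1.0 / fold)
    return extreme / len(valid) >= max_fraction


# 3 of 10 proteins are down 10-fold, so the check fires.
if looks_broken([1.0] * 7 + [0.1] * 3):
    print("you messed up somewhere")
```

A three species benchmark would trip a check like this every single time, which is sort of my whole point.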

That's just me, and I don't know what the real answer is, but I sure haven't seen a comparison of two drug treated cells where 1,000 proteins have been significantly altered. I doubt that if you had a biopsy of a patient colon that was noncancerous and one that was tumor that you'd see over 1,000 proteins that are significantly altered. So I'm not sure that's the best possible way to test an algorithm.

Back to the paper! 

 - there is solid gold in this study, btw. What normalization things to use, what post-analysis R packages seemed to work and what seemed to distort things worse. Totally worth reading even without the bit about the 5 papers about their microscope based sample pickup and prep robot.

Also - just noting - the instrument used for label free single cell proteomics is a Pro2. Not an SCP or Ultra, etc., and they get some legitimately useful numbers. 

Sunday, December 7, 2025

Finally! A ready-to-run human plasma proteomics standard!

 


Disclaimer: I'm going to ramble about a new commercial product that was totally my idea and if you buy it I'll probably get money back for a whole lot of enzymes I personally bought. This was actually a tough post to write that I deleted and re-typed several times because it seems antithetical (which might be a thing) to this whole blog thing. Meh.

Ramble: 

I had a few months between my academic appointments which ended up being a top notch sabbatical, and that's what I'm going to call it from now on. I consulted for some really cool companies, found time to gracefully exit the CRO thing I founded several years ago, and got a really up-to-date view of what dozens of companies in proteomics are doing these days. During the consulting bit I'd sometimes go places or remote log in to instruments and help with experiment optimization. 

Everyone had the K562 proteomic digest from Promega or the HeLa digest from Thermo/Pierce. Add formic acid, inject it, it should look the same on identical instrument configurations regardless of where you are. 

Unfortunately, almost everyone actually wanted to do blood/plasma proteomics. And these things couldn't be more different. Roughly half of the protein in plasma is 1 protein (albumin) and 95% of it is composed of like 14 proteins. That's not what the proteome is of cancer cells with 150 chromosomes which are full almost to bursting trying to express every protein in their entire genome. A great K562 method might give you plasma proteins, but it's not going to be great. It's tough to find 2 things in proteomics that are more different.

So I went and batch prepped some plasma so I had a standard that I could use to compare things for the companies I was working with - and it was awesome. I also had comparator data because it was a sample I'd used before on multiple instruments over the years, and I ain't changed my bulk proteomics sample prep method since 2017. 

Then I was like - wait. WTF. Shouldn't there be a commercially available one? Why isn't there a commercially available plasma proteome tryptic digest?? 

How hard and expensive could that be? 

Oh. Oh ye of excessive confidence. 

But now you can just buy the first successful attempt at a standard - Equalizer I - from ESI source solutions! It's just a neat plasma digest, so it's ridiculously insanely hard to see anything besides albumin and immunoglobulins and about 100 other things, which is the exact opposite of the cancer cell line digest. Again, very clearly biased, but if no one ever buys it, I honestly don't care because I won't ever have to prep a plasma proteome digest ever again in my life and I've personally got something to do method development on. If anyone else finds it useful, we tried hard to keep the price down and $375 will get you 100x 200ng injections along with comparator data from 6 different instruments or something (a number I hope will grow soon). 

Saturday, December 6, 2025

DancePartner - Use Python wizardry to mine multi-omics from...PubMed?

 


I saw this one 3 times, loved the logo, but questioned whether it was anything useful to me and finally just read most of it. I moved to the Github halfway and started trying to install it.

Paper link


Is it the easiest thing I've tried to do today? No, but I also had a 4 year old pumped full of hot chocolate in a Sporting Goods store when dude decided football cleats WERE MISSION CRITICAL and we ended up leaving with nothing at all. 

But....could you....hypothetically have DancePartner dig through PubMed and find you a list of proteins, transcripts, lipids and metabolites that have been associated with the blood brain barrier? I don't know, but my cat keeps screwing with my mouse and if I put typos in some Python code in Spyder nothing works, where I can put typos in this box and just hit the publish button and it's just normal.

Friday, December 5, 2025

Frustrated by TIMSTOF chromatography limitations? FREE THE CAPTIVESPRAY!

 


I ran across this looking for something else.... Honestly, I really like the Ultra2 source, but if I still had one of the older ones I'd look into this, for real. 

Tuesday, December 2, 2025

opt-TMT - scale down everything so you aren't wasting so much reagent!

 


There is another optTMT, but that one doesn't have a dash and it's for designing smart multi-batch multiplexed experiments. You can read about that one here.

This new one is about how a lot of TMT labs are labeling 400 pounds of peptide (181 kg) and then injecting 200 micrograms per injection on their Orbitraps and 1000 micrograms on their Astrals. 

If you wanted to just label 10x more peptide than you'd possibly use instead of 10 million times more peptide, how would you do it? That's what the dash is for! 


While this might seem just a little silly since there are protocols out there that have been replicated dozens of times for labeling single human cells, an in-between protocol is actually a lot more convenient than you'd think. We know how much reagent in our lab to use for 1 cell or 25 cells and it's a drag when we have to break out the peptide quan kits and borrow someone's plate reader. This study gives you that in-between concentration fully optimized.

Monday, December 1, 2025

Another funny solvent is better than formic acid for proteomics?

First off -- 

CHECK WITH YOUR HPLC MANUAL OR MANUFACTURER!!



Is the resolution of GIFs getting worse all the time? If so, it's the only change I've personally seen from this whole "AI revolution", except people saying "I asked ChatGPT" when they would have said "I did a Google search" back before Google reorganized and put their search algorithm teams under the control of their marketing teams. True story, that's why Google really doesn't work well anymore and AskJeeves is back, but now it needs more electricity than all of Spain will use this year to look up stuff on Wikipedia for you.

Okay, so someone at some time decided formic acid was a pretty good compromise. Pretty sure it was people in the John Yates lab. TFA gave you the best possible HPLC peaks for peptides, but it lowered your ionization efficiency. Acetic acid gave you the best ionization efficiency, but if you were doing MudPIT (which was a 2D chromatography system for proteomics best left forgotten today, but it provided unprecedented proteomic coverage with the awful HPLCs we had at the time), acetic acid messed up your peaks too badly. So...formic acid it is.

Worth noting, formic acid has some drawbacks like poor stability in light, particularly when diluted. So when a lab dropped a paper showing acetic acid should be revisited, we jumped on it. My lab doesn't use formic acid in our HPLCs at all. We do have vendor permission and we have several thousand runs to demonstrate it hasn't been a bad idea at all.

So when I was contacted by a researcher who was like - "yo, we have something better!" we borrowed someone else's HPLC and tested it out. In our hands (on nanoflow) it's only marginally better than acetic acid, and possibly so marginal that on the sub-nanogram loads it wasn't significant by Student's t-test. I forget, and Cameron actually did the work while I was visiting collaborators. But when you crank up the flow rates?

