Monday, December 8, 2025

Benchmarking algorithms for single cell proteomics - is multi-proteome the right way to do it?

We kicked this around really hard back when there was a Proteomics Old Time Radio Hour. Not this paper, but the base concept of mixed proteome digests for quantitative studies. I'm still uncomfortable with it as a concept, but let's talk about the paper first.


The main part of the study is simulating single cell digests with proteomics people's favorite software toolkits. They took a cancer cell digest and spiked in an E. coli digest at one concentration and a yeast digest at another.

Then they used a really cool robot that doesn't appear to be commercially available - one they've designed themselves over the last 16 years or so, and one I truly wouldn't mind having - and they did some actual single cells. Most of the paper is on the first part, the low level mixed organism quantitative digest.

Since they now knew what the ratios should be, they ran a bunch of samples and replicates through DIA-NN, Spectronaut and PEAKS, tried different settings, and came up with some interesting findings.

Begin concerns about multi-species proteomic mixtures as benchmarks

Here is where my concern always comes in for these things, though. The yeast proteome is like 4,000 proteins, and you'll basically always see 1,500-2,000 of the higher concentration ones. E. coli can produce like 3,000 proteins, but some are for anaerobic growth and whatever, so I think you'll normally see something like 600-800 E. coli proteins from an aerobic digest without trying too hard at all.

I love the concept of a mixed species digest, but is that a realistic biological model? At what point in human biology 1) are there going to be an extra 30% of proteins available, 2) is 30% of the proteome going to be significantly altered, and 3) all altered in the same way?

It's weird, right? Like if I was writing a normalization algorithm I think that I'd write an IF/Then statement that is like 

IF 30% of the proteome is at 1/10 of the base peak

THEN you f'ed up somewhere, PRINT gibberish. 

That's just me, and I don't know what the real answer is, but I sure haven't seen a comparison of two drug treated cells where 1,000 proteins have been significantly altered. I doubt that if you had a biopsy of a patient colon that was noncancerous and one with a tumor that you'd see over 1,000 proteins significantly altered. So I'm not sure that's the best possible way to test an algorithm.
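Just for fun, here's what that tongue-in-cheek IF/THEN might actually look like as a sanity check - a minimal Python sketch, with thresholds I made up purely for illustration, that flags a sample where a big block of the proteome sits at a fixed offset below everything else (the signature of a multi-species spike-in, not of most real biology):

```python
from statistics import median
import math

def suspicious_spike_in(log10_intensities, fraction_threshold=0.25, fold_drop=10):
    """Return True if a suspiciously large fraction of proteins sits at
    least fold_drop below the median intensity - in a real biological
    comparison you rarely see 25%+ of the proteome shifted in lockstep.
    Both thresholds are illustrative, not from the paper."""
    med = median(log10_intensities)
    cutoff = med - math.log10(fold_drop)          # e.g. 10-fold below median
    low = sum(1 for v in log10_intensities if v < cutoff)
    return low / len(log10_intensities) >= fraction_threshold
```

Run it on a simulated "700 human proteins plus 300 spiked at 1/10" sample and it yells; run it on a flat proteome and it doesn't.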

Back to the paper! 

 - there is solid gold in this study, btw. What normalization things to use, what post-analysis R packages seemed to work and what seemed to distort things worse. Totally worth reading even without the bit about the 5 papers about their microscope-based sample pickup and prep robot. 

Also - just noting - the instrument used for label free single cell proteomics is a timsTOF Pro2. Not an SCP or Ultra, etc., and they get some legitimately useful numbers. 

Sunday, December 7, 2025

Finally! A ready-to-run human plasma proteomics standard!

Disclaimer: I'm going to ramble about a new commercial product that was totally my idea, and if you buy it I'll probably get money back for a whole lot of enzymes I personally bought. This was actually a tough post to write - I deleted and re-typed it several times because it seems antithetical (which might be a word) to this whole blog thing. Meh.

Ramble: 

I had a few months between my academic appointments which ended up being a top notch sabbatical, and that's what I'm going to call it from now on. I consulted for some really cool companies, found time to gracefully exit the CRO thing I founded several years ago, and got a really up-to-date view of what dozens of companies in proteomics are doing these days. During the consulting bit I'd sometimes go places or remote log in to instruments and help with experiment optimization. 

Everyone had the K562 proteomic digest from Promega or the HeLa digest from Thermo/Pierce. Add formic acid, inject it, it should look the same on identical instrument configurations regardless of where you are. 

Unfortunately, almost everyone actually wanted to do blood/plasma proteomics. And these things couldn't be more different. More than 90% of the protein in plasma is 1 protein, and 95% of it is like 14 proteins. That's nothing like the proteome of a cancer cell line with 150 chromosomes, packed almost to bursting trying to express every protein in its entire genome. A great K562 method might give you plasma proteins, but it's not going to be great at it. It's tough to find 2 things in proteomics that are more different. 
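To put numbers on how brutal that is: here's the back-of-envelope split for a single plasma injection, using the rough fractions above (these are my approximations of the post's numbers, and real plasma varies):

```python
def plasma_injection_breakdown(total_ng=200.0,
                               albumin_fraction=0.90,
                               top14_fraction=0.95):
    """Back-of-envelope: how the protein mass in one plasma digest
    injection splits up, given ~90% albumin and ~95% in the top 14
    proteins. Returns (albumin, rest of top 14, everything else) in ng."""
    albumin = total_ng * albumin_fraction
    rest_of_top14 = total_ng * (top14_fraction - albumin_fraction)
    everything_else = total_ng * (1.0 - top14_fraction)
    return albumin, rest_of_top14, everything_else
```

For a 200 ng injection that leaves roughly 10 ng of material for the entire rest of the plasma proteome - which is why a method tuned on a cell line digest falls apart here.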

So I went and batch prepped some plasma so I had a standard that I could use to compare things for the companies I was working with - and it was awesome. I also had comparator data because it was a sample I'd used before on multiple instruments over the years, and I ain't changed my bulk proteomics sample prep method since 2017. 

Then I was like - wait. WTF. Shouldn't there be a commercially available one? Why isn't there a commercially available plasma proteome tryptic digest?? 

How hard and expensive could that be? 

Oh. Oh ye of excessive confidence. 

But now you can just buy the first successful attempt at a standard - Equalizer I - from ESI Source Solutions! It's just a neat plasma digest, so it's ridiculously, insanely hard to see anything besides albumin, immunoglobulins and about 100 other things - the exact opposite of the cancer cell line digest. Again, very clearly biased, but if no one ever buys it, I honestly don't care, because I won't ever have to prep a plasma proteome digest again in my life and I've personally got something to do method development on. If anyone else finds it useful, we tried hard to keep the price down: $375 will get you 100 x 200 ng injections along with comparator data from 6 different instruments or so (a number I hope will grow soon). 

Saturday, December 6, 2025

DancePartner - Use Python wizardry to mine multi-omics from...PubMed?

I saw this one 3 times, loved the logo, questioned whether it was anything useful to me, and finally just read most of it. I moved to the GitHub halfway through and started trying to install it.

Paper link


Is it the easiest thing I've tried to do today? No, but I also had a 4 year old pumped full of hot chocolate in a Sporting Goods store when dude decided football cleats WERE MISSION CRITICAL and we ended up leaving with nothing at all. 

But....could you....hypothetically have DancePartner dig through PubMed and find you a list of proteins, transcripts, lipids and metabolites that have been associated with the blood brain barrier? I don't know yet - my cat keeps screwing with my mouse, and if I put typos in some Python code in Spyder nothing works, whereas I can put typos in this box, just hit the publish button, and it's fine. 

Friday, December 5, 2025

Frustrated by TIMSTOF chromatography limitations? FREE THE CAPTIVESPRAY!

I ran across this looking for something else.... Honestly, I really like the Ultra2 source, but if I still had one of the older ones I'd look into this, for real. 

Tuesday, December 2, 2025

opt-TMT -scale down everything so you aren't wasting so much reagent!

There is another optTMT, but that one doesn't have a dash and it's for designing smart multi-batch multiplexed experiments. You can read about that one here

This new one is about how a lot of TMT labs are labeling 400 pounds of peptide (181 kg) and then injecting 200 micrograms per injection on their Orbitraps and 1000 micrograms on their Astrals. 

If you wanted to just label 10x more peptide than you'd possibly use instead of 10 million times more peptide, how would you do it? That's what the dash is for! 


While this might seem just a little silly, since there are protocols out there that have been replicated dozens of times for labeling single human cells, it's actually a lot more convenient than you'd think. We know how much reagent to use in our lab for 1 cell or 25 cells, and it's a drag when we have to break out the peptide quan kits and borrow someone's plate reader. This study gives you that in-between concentration, fully optimized. 
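If you did want to scale reagent to the peptide you actually have, the arithmetic is just a proportional scale-down with a cushion. The 8:1 (w/w) label-to-peptide ratio and 25% excess below are placeholder numbers I made up for illustration - use the paper's optimized values, not mine:

```python
def tmt_reagent_ug(peptide_ug, label_ratio=8.0, excess=1.25):
    """Scale TMT reagent to the peptide actually on hand instead of
    labeling the full-vial protocol amount. label_ratio is the w/w
    reagent:peptide ratio and excess is a safety cushion - both are
    illustrative placeholders, not values from the opt-TMT paper."""
    return peptide_ug * label_ratio * excess
```

So 10 micrograms of peptide would call for 100 micrograms of reagent under these made-up numbers - the point being you label ~10x what you need, not 10 million times.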

Monday, December 1, 2025

Another funny solvent is better than formic acid for proteomics?

First off -- 

CHECK WITH YOUR HPLC MANUAL OR MANUFACTURER!!



Is the resolution of GIFs getting worse all the time? If so, it's the only change I've personally seen from this whole "AI revolution", except people saying "I asked ChatGPT" when they would have said "I did a Google search" back before Google reorganized and put their search algorithm teams under the control of their marketing teams. True story, that's why Google really doesn't work well anymore and AskJeeves is back, but now it needs more electricity than all of Spain will use this year to look up stuff on Wikipedia for you. 

Okay, so someone at some time decided formic acid was a pretty good compromise. Pretty sure it was people in the John Yates lab. TFA gave you the best possible HPLC peaks for peptides, but it lowered your ionization efficiency. Acetic acid gave you the best ionization efficiency, but if you were doing MudPIT (a 2D chromatography system for proteomics best left forgotten today, though it provided unprecedented proteomic coverage with the awful HPLCs we had at the time), acetic acid messed up your peaks too bad. So...formic acid it is.

Worth noting, formic acid has some drawbacks, like poor stability in light, particularly when diluted. So when a lab dropped a paper showing acetic acid should be revisited, we jumped on it. My lab doesn't use formic acid in our HPLCs at all. We do have vendor permission, and we have several thousand runs to demonstrate it hasn't been a bad idea at all.

So when I was contacted by a researcher who was like - "yo, we have something better!" - we borrowed someone else's HPLC and tested it out. In our hands (on nanoflow) it's only marginally better than acetic acid, and possibly so marginal that on the sub-nanogram loads it wasn't significant by Student's t-test. I forget, and Cameron actually did the work while I was visiting collaborators. But when you crank up the flow rates? 



Sunday, November 30, 2025

New Nature Genetics study comparing pQTLs is....worth reading....

Ummm.....so...Imma just leave this here and not talk about it any more, maybe. Wait. Maybe just this - if your technology is producing results that can be validated 30% of the time, then you could save a lot of time by just picking a gene or protein, flipping a coin, and going to read up on other technologies....



Saturday, November 29, 2025

DIA Multiplexed proteomics with off-the-shelf TMTPro reagents!

This is obviously interesting - and surprisingly easy to pull off. The data is processed in FragPipe and one of the output sheets is put into these Python tools to identify the complementary fragment ions. 

I like the figure above because they use 2 very similar peptides labeled with TMT and demonstrate that they can clearly find clean complementary fragment ion pairs. Oh yeah, here is the paper

They really, really don't want to do any spectral deconvolution, so they only used 3 of the TMTProC tags, which give them clusters of complementary ions 4 Da apart. The open suggestion here the whole time is that if you aren't afraid of deconvoluting your complementary ion clusters - you can obviously do more than a 3-plex DIA experiment. 
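As a rough sketch of what "finding complementary ion clusters" means computationally: scan a peak list for runs of peaks spaced ~4 Da apart. I'm assuming singly charged complementary ions and a 3-plex here for simplicity - this is my toy version, not their actual implementation:

```python
def find_4da_clusters(mzs, spacing=4.0, n_channels=3, tol=0.01):
    """Find every run of n_channels peaks spaced ~spacing Da apart in a
    peak list - the pattern a 3-plex of singly charged complementary
    ions would leave. Returns the expected m/z ladder for each hit.
    Spacing, plex count, and tolerance are illustrative assumptions."""
    mzs = sorted(mzs)
    clusters = []
    for m0 in mzs:
        ladder = [m0 + k * spacing for k in range(n_channels)]
        # keep the cluster only if every rung has a matching observed peak
        if all(any(abs(m - e) <= tol for m in mzs) for e in ladder):
            clusters.append(ladder)
    return clusters
```

Feed it a spectrum with peaks at 500, 504 and 508 plus some unrelated ions and it pulls out just that 3-peak ladder.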

This is a really nice read with the appropriate controls included as well as a way to dramatically increase the throughput of some DIA proteomics workflows on basically any mass analyzer. Worth a read for sure. 

If you type "TMTc" into the blog search bar you'll find a lot of stuff over the years. This is one old post that goes more into what this is and why it can be valuable. 

Wednesday, November 26, 2025

Y-MRT - a new prototype TOF with 1 million resolution and 300 Hz?!?

Ummmmm......okay so....these specs are amazing....


How do you increase mass resolution? Generally just increase the flight path, right? But you can only go so far before there isn't enough electricity on earth to maintain the appropriate vacuum. Reflectrons double the path, and the W-TOFs from the Pegasus line that a big vendor acquired recently can really push those numbers up with multiple reflections.

The Y-TOF takes that concept to 11. It's one thing to say "I can make my instrument do 1 million resolution". Give me 45 minutes with your Q-Exactive and I can make it do 1 million resolution. Each scan will take about 8 minutes (more like 4 seconds, I forget), but it's completely impractical. 
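The back-of-envelope for that claim: Orbitrap resolving power at a fixed m/z scales roughly linearly with transient length, so from a reference point you can estimate the transient a 1 million resolution scan would need. The 140k @ 512 ms reference below is my recollection of Q Exactive specs at m/z 200 - treat it as approximate:

```python
def transient_for_resolution(target_res, ref_res=140_000, ref_transient_s=0.512):
    """Estimate the Orbitrap transient length (seconds) needed to hit
    target_res at a fixed m/z, assuming resolution scales linearly with
    transient time. Reference point is approximate, from memory."""
    return ref_transient_s * target_res / ref_res
```

That puts a 1M scan at roughly 3.7 seconds - squarely in "completely impractical for chromatography" territory, which is why averaging 600k-800k across a 30 minute run is the actual headline here.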

AND you can tune a Time of Flight to get really good mass resolving power at one particular m/z. My Q-TOF gets incredible resolving power in a mass range that isn't exactly where I need it.

The Y-TOF did a 30 minute proteomics run and averaged 600,000 to 800,000 resolution across the usable peptide range!!!  

AND sub-PPM mass accuracy. Parts per BILLION mass accuracy. ON A TOF. 

Obviously a prototype, but more obviously something we should keep our eyes on. Worth noting, they do have to use Astral level loads for bulk proteomics (1 microgram of peptides for the best data) and that this prototype isn't going to smoke your recently purchased $1M instrument, but it's starting in a very nice spot. 

Tuesday, November 25, 2025

Prosit-PTM! Deep learn modified peptides???

We all know other great protein informatics teams are working on the holy grail for DIA proteomics - deep learning and prediction of modified peptides.

Am I extra excited because the team that gave us Prosit is working on it? Yes. Yes, I unfairly am, when I should be evaluating this preprint purely on its own merits and not the track record of one of our field's most historically reliable teams. And not just because of their informatics skills - what makes me most excited is their long history of making tools that anyone can use. 

Check out this preprint here!