Sunday, May 24, 2026

From peaks to power - scans/peak still really truly matters!

 


If you've been on this blog much recently, I am sorry.

Also, you have probably seen me in some level of outrage about some recent studies where people have gotten anywhere from 1-4 measurements of the peptides they are looking at. Is it better than Illumina ProteinCrap? Absolutely. But is it good for mass spectrometry data? No. 

Why is it bad? Because some blogging academic says so? 

This new preprint looks at the problem in depth and finds that for high abundance proteins in blood, the 1-4 measurements per peak is actually not all that bad. Unfortunately....the cancer biomarker you are looking for is probably not albumin, transferring, or immunoglobulins. For low abundance proteins, getting fewer scans per peak means you miss any changes between healthy and cancer patient blood. So....honestly... what's the fucking point of doing the study anyway? 

They say it nicer than this here! 



Thursday, May 21, 2026

Nanopores are coming with 150,000 peptide libraries!

There is some replication is flattery quote, right? I forget what it is.



You might need a free account to read this, though. And the stuff from the article that I found most interesting was a link to another GenomeWeb article. Not sure what the rules are for taking screenshots from it.... But the point is that the Oxford people have taken a page out of the ProteomeTools project and have 150,000 peptides multiplexed labeled that they're currently running through nanopores! Smart, right?

Which seems similar to what this group recently published on here, except they aren't working from synthetic peptides, rather LysC digested proteins. 



Wednesday, May 20, 2026

Nanosplit the transcriptome and proteome from single cells (without the hard part!)

 


When I first saw this I thought - okay, so someone copied the nanosplits paper but they had an Asstral.

And it's almost what this is


...but nanosplits requires a technically tough step where you split the droplet containing your mostly lysed single cells. This protocol gets around that step. They still use the same silly robot to isolate the cells, but you absolutely don't need it here (where you basically do need it for nanosplits, it's tough to print that droplet array in a FACs core), and that's a huge win for anyone who doesn't have the slow silly robot. 

Tuesday, May 19, 2026

Library biases still remain in proteomics hardware particularly for low input TIMSTOF data!

 


I was first going to start with something like this - 


When I read this title 

But I realized that 

1) That's sorta mean.

2) I bet a lot of people thought that all the work that has been done to adjust spectral libraries and deep learning algorithms has been successful

3) Not everyone is doing loads of weird cell types by single cell proteomics on TIMSTOFs and probably doesn't run into this every single day that their TIMSTOF happens to be working.

4) The giant red light on the whole front of my instrument is bumming me out. 

Here is the thing. The Orbitraps had a HUGE head start on data on public repositories. And in the libraries we used to train deep learning algorithms. And every other data type is just different. Especially when you're going down to low load. Even there, we know the Orbitraps struggle against high load libraries. I should put a link in but I can't find it. 

We absolutely find that having reference libraries in single cells helps a lot. On an Ultra2 we like a 25, 50 or even a 100 (for very small cells) cell pool that we run a couple of times and include that in our data analysis workflows. For big studies I've had luck making the library with those 100 cell pools and then just searching the single cells against those new libraries. Now...you'll probably miss that rare cell type and what makes it special, but you might not care about that in every experiment. 

Anyway - this group has some really smart tips for how to build these libraries and the observations in different software. Ultimately they report a 90%(!!!!!) improvement in low load peptide ID rates, so...that's absolutely worth looking at!



Sunday, May 17, 2026

S100P levels are linked to recurrence in cholangiocarcinoma

 


It might be easier to make a list of things S100 proteins don't appear involved in at this point.

This paper is going to be posted here because I'm personally interested in it and I wish my lab had access to these samples. 


The samples were digested with some amount of trypsin. You'll never find out how much, but I bet it is fine. They were also labeled with some kind of TMT reagents. The TMT labeled (and, presumably, pooled) samples were analyzed with a Q Exactive of some kind, probably, despite the Agilent high flow coupled Fusion system in the diagram above. The files are on ProteomeXchange if you cared to look. A secret length and flow rate of a gradient of some length you could extract from the .raw files if you wanted, was used for what was most likely a very reasonable DDA method. They couldn't share the resolution of the MS/MS because that might tip you off to what TMT reagents were used. And if they said they used a 1.4Da isolation window someone would complain about it, as would another group if they used a 0.4 Da isolation window. The authors avoid all that controversy by not sharing any of the steps necessary to repeat this analysis of these same tissues.

That being said - the files are publicly available. It could be one of those things where a core ran the samples and the group never paid them, and the core subsequently couldn't find the hours to contribute meaningful corrections to the paper. Also, the downstream analysis seems compelling and it looks like they really thought about their stats in this little cohort. We can probably assume that the mass spec stuff was done right. We can also assume that the reviewers and editors had a lot on their plates when this one slid through peer review. And that happens, we're all busy.

Thursday, May 14, 2026

Taxonomy source identification from proteomics of hair!

 


Are you an investigator who was assigned a bank heist? 

Do you suspect a certain goat, recently out of the pen, with an alibi that seems a little too good to be true? 

If you can find just one hair at the scene of the crime, this is the study and  these are the resources you need! 


Is that goat still baaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaad, or has 20 years on the streets stripped you completely of your idealism about the system and it's ability to reform animals? 

Find out with proteomics! 



Wednesday, May 13, 2026

More ultrafast proteomics with thermolysin!

 


Wait. Where does thermolysin cut? Does everyone know that and that's why it isn't listed in the methods section of the paper? It's so common knowledge that the Wikipedia page doesn't list them explicitly either? Gemini, which is apparently now installed in my browser without my permission says it's Leu, Phe, Ile, Val, Ala, and Met, so I'll bet you that is NOT where it cuts. However, this is really cool, for real. 


I love fast cheap enzymes for proteomics! Let's go. However, the reason this paper is great is because they went the extra mile and developed a stable isotope labeled protein standard for this enzyme! 

You chuck in the protein they set up and then you throw in your thermolysin and do your fast digestion and now you've got a pile of internal heavy labeled peptide standards. If you don't see them...the digestion didn't work! If you do see them, you can use them for your quantification! This group does a pile of targeted PRMs on an Orbitrap Exploris with an EvoSep on it. So high resolution discriminatory power of the mass spec and reproducible run times with no tinkering allowed. 

Tuesday, May 12, 2026

TIMSTOF Ultra2 after 1 year - still incredible when it works!

 


I received some questions about my longer term impressions of the amazingly sensitive and somewhat fast TIMSTOF Ultra2. 

It's been a year already? Yikes. I guess I'll reference my earlier TIMSTOF reviews.

Great stuff!

1) The sensitivity of this thing is still absolutely unreal. Multiple people are writing up papers on single cells ran on it with DIA label free and the numbers are cell type specific and amazing.

Even with a high loss, but very inexpensive and very fast sample prep, most cancer cell check in at well above 2,000 protein groups per cell and human hepatocytes are around 2,500. Throw in a big cohort and actual spectral libraries (non-predicted) and you can add 50% to those numbers. Most studies of >100 cells will have 5,000+ protein groups across the study. Some big cells like cultured human neurons will come in >7,000 on their own. Crazy.

Some small blood cells <10 microns have been a challenge, but if you've only got 80 pg of protein and we know our fast sample prep loses about 50pg, that's...involved..... someone FACs isolated 12 different cell types from patient blood and we struggled with getting 500 protein groups in 2 of the cell types. It's tough to tell what's going on from those levels of sample.

Not so great stuf! 

1) Compass data analysis is still the only way you can do things like extract an XIC. Looking for a weird peptide? Good luck. As an aside, Biognosys sells a targeted quan package that we had for several months because I thought it would do that part for us. It will if you're doing prmPASEF, but it won't help you find a peptide in DIA or DDA data. It also doesn't help you predict the IMS so you basically have to identify your peptide to target it. So we traded that software back for more months of SpectroNaut.

2) We've went through more internal electrical components in the last 6 months than HPLC columns. That's not an exaggeration. This board, that board, that other board, the second board's power supply, the TIMS cartridge, etc., etc.,. This is rough because I knew we didn't have a very local FSE here. They have to come in from DC or Jersey or fly in from Bremen or California. So the downtime can be on the long side when they go down. I thought going into 2026 we basically had a brand new instrument thanks to the sleigh load of electric components we got over the holidays so I wasn't super stressed about a short lapse between warranty and service plan starting up but - there went another pile of electronics.....whoops. To be fair they are building one of the biggest hospitals in the country next door to us and we've had a couple of power outages. But...you start to wonder if they're getting these circuit boards from Alibaba....

3) There appear to be even less support people in the field today. This one is a real surprise given the fact that it looks like the company has been doing really well and the prices of some instruments have increased more than 200%. You'd think that would equal a higher number of apps scientists and engineers in the field, but it appears to be just the opposite. The ones we have are INCREDIBLE. But you get the impression that they never sleep or get a day off because there are two of them for planet earth. 

4) Look, I'm going to complain about absolutely everything. And I have absolutely zero regret for this fit-for-my-lab purpose instrument. It's an incredible precision instrument for what it does. We locked down a full workflow 8 months ago and we just do biological discovery on this thing. Same prep, same column, same method and the papers in preparation are a million times better than anything I've ever done. It's nice to have time to write because your one instrument is down, but there is a point where you have to look back at 50-ish weeks of ownership and start counting the weeks that the front of the instrument has been bright red doom and panic a little. 

5) I did a lot of searching as well, and unlike other instruments there still appears to be no second party field apps support. That's a bummer because in some places you can go to a second party like ZefSci and get superior field apps support than what the vendor offers for a lot less money. If you know of someone who services these things PLEASE LET ME KNOW! I'd switch in a heartbeat. 

Monday, May 11, 2026

Single cell metabolomics (by infusion??) with Medusa!


This new paper maybe isn't for everyone, but I'm excited to look at these scripts


There are a very small number of supported software packages out there in the world (approaching zero) that can make sense out of direct infusion of flow injection analysis quantitative data. 

We used to have a pile of them. Ion A is 10x higher than ion B in these two matrices, you could extract that a bunch of ways, but that's largely fallen off.

I'm also excited to see how many metabolites look real in direct injection in an Exploris 480. The intrascan linear dynamic range of a D20 Orbitrap is one of the lowest you can find in mass spectrometry so if this looks like it works you could do a lot with it. (Assuming the package works and is supported, of course!) 

Friday, May 8, 2026

This week's podcast on SNOT with Dr. Jennifer Mulligan is way cooler than you'd guess!

 


For real, I've been telling people about this conversation since we recorded it a while back. Snot is way cooler than you'd guess.


Thursday, May 7, 2026

Variable (geographic!) distribution of ion species in electrospray ionization!

 


The fact that some labs have some weird background ions that other labs don't (check my repositories for Pug keratin! There's tons of it!) isn't news. Someone a while back showed they could tell when a study was done in the winter due to the amount of wool peptides ionized in their deposited data. I forget who that was.

But this new paper in JASMS shows that it's not just proteomics and peptides. It can even be those nasty adduct things that everyone outside of one group in Madison, Wisconsin ignores is even a thing in proteomics. 

This is the first time I've seen a multi-lab controlled study. It's been more of a "wow..that's fucking weird...wonder why they have axolotls there...?  Totally worth thinking about, though. I wonder if we went back through the tightly controlled and super smart CPTAC studies if we'd see the same things with tools that actually consider such things? Probably! 

Tuesday, May 5, 2026

Peptide cross-sections are bi-modal?

 


Maybe this was here before? I'm not going to look, but it's definitely out now in JPR.


It makes sense in my head, though. The same way that a single population of a million ions ionized at the exact same second might end up being distributed between mostly +2 charged, some +3 and maybe a barely detectable number of +4. Why wouldn't that population of peptides dissolved in acidic buffer also have 2 different possible shapes (or more?) Is that charge linked in some way? Would make sense. The authors suggest a simple calculator for predicting both modes - which would be amazing - but it doesn't appear to be in the Github. https://github.com/cox-labs/CCS - maybe it's coming? Or maybe I don't have nearly enough time today and all the maths in the paper scared me a little? Probably.

Monday, May 4, 2026

Dissecting honey bee differential development!

 


I'm legitimately knocking out a couple of blogposts to get my brain fired up for writing and my hands used to the new (quieter) keyboard I brought to a super intensive 3 day writing camp. R01 resubmit peer pressure time! As you might guess, both R01s I should be writing on are about the human liver and not honey bees, but you probably have a dumb way of doing things as well. 

Where the f' is the control key? I'd rather look for it here. AND honey bees are super cool! 


Did you know that worker and drones (which I thought were the same thing) develop at very different rates? Neither did I. Do I care? Right now I do. And these authors did and that's what really matters. I'm pretty sure it isn't a great time to be a honey farmer person. 

Want to talk about an experimental sampling procedure that doesn't sound like fun? These authors collected 1,000 developing workers and the same number of developing drones from at least 8 different time points, up to 70 hours. I feel like a gif should go in here, but that would definitely make it clear to everyone around me that I'm not working on my grants. I'm warming up my brain! 

The sample prep is ...interesting....and kind of old fashioned, but that's how they've been doing it in their group. Acetone precipitation and a lot of urea. Probably there's lots of weird stuff in the developing bees. Would I have put them in liquid N2, smacked them with a hammer and S-trapped it and gotten the same or better results? We'll never know, but that's how I'd do it.

The boring stuff is well-described, which is a refreshing change of pace this year. QE HF ran in top20 mode and a gradient I could reproduce without guessing. Yay MCP reviewers! Downstream analysis in PEAKS against a surprisingly complete sounding FASTA. Solid work all around and - screw it. - 

8 time points! 



Thursday, April 30, 2026

OmicsMLMentor - A web app for machine learning in -omics data!

 


Interesting! When this group talks about -omics they even include lipids and metabolites. Worth taking a look at for sure. 


Figure 2 is one of the clearest descriptions I've ever seen of machine learning classifiers. 

The link to the web portal in the paper appears to need a user name and passcode, but I ain't got time for that.

Probably faster to pull the code from this Github anyway

Wednesday, April 29, 2026

What is a token? Running AI /LLMs locally for proteomics people?

 


I had a really weird conversation this week when people were talking about how many "tokens" they were using for making AIs do things poorly for them.

Look, I'm also getting AIs to poorly do things for me that I don't know how to do. What I'm not doing is 
1) Paying for them...
2) Letting some money hoarding corporate weirdos see what I don't know how to do by sending my prompts off to some AI datacenter they knocked down a park to build.

And the LLMs on modern hardware can run faster than the cloud based ones because the upload/download speed can be the bottleneck. 

So! Ben's short and poorly written guide to running an AI / LLM thing locally on a new or old PC.

Disclaimer and clarification: I know people have to use these for their jobs and they have their own local instances that are on their own HPCs so their work can control data access, etc., This isn't shade for you at all. I was surprised by all of this and I'm sharing it. 

For this example I'm going to use my GTX 1080T video card I purchased to run PacMan on a really really big screen in/around 2017/2018. Possibly longer ago than that. 

Since I'm dumb, I use a Graphical User Interface (GUI) called 
LM Studio




Once you install it, you need a Model. For this example I'm just going to use the first one that's famous. It rhymes with Chutney. 


No joke, it's seriously that easy. I like this big old PC that will be retired soon because it 
1) Doesn't have a wifi card
2) I can just disconnect the ethernet cable from it. 
3) It has trouble telling what the year is. I have the same problem. 

Once I know it's offline and I've confirmed I haven't had another head injury or something and I do know what year it is, then I ask it things that I know stuff about. In this example I asked it about single cell proteomics. The answers are seriously no worse than what the ones on the Cloud will give you. It did blow my mind when I realized this. 

For real, if you're paying for one of these things you should try it. The reason I like to have a PC I can physically disconnect is that some of the available AI models written for data centers can't tell if they're online or not. ChutNeyPT will INSIST sometimes that it is running on a GPU farm in Arkansas when I know it's running on a GPU that is roughly 80% cat and Pug fur by actual weight. 

Honestly, the 8GB model that runs on this old GPU does have some very noticeable lag. And the total data it is drawing from is significantly smaller than other models. It's got to squeeze into 8GB so some things have to go. 

If you want it to run faster than the internet/cloud versions you need to get something newer. The 1080 video card is ooooooold.... 5090 is on the market now and they haven't released a new generation every year. More like every 1.5-2 years. An M4 Mac with 24GB of unified memory that I got last year for $1300 is legitimately lag free. So. Fast. 

Which brings up this question. What are all the huge data centers for? 

When I say that I'm doing dumb things with these AIs, I'd like to humbly consider that - as a scientist without any real hobbies except...proteomics.... the stuff I'm doing with these LLMs might be harder than what the average person typing prompts is doing. And....like....I'm also blasting the new At the Gates album on this same PC. I think I've got 40 tabs open and I've got 2 separate Python APIs open because I don't know where the default folders are located and I don't want to save the side scroller I've been tinkering with for 8-10 years and will likely never finish with the work scripts that I'll likely also never finish. So....like what are the 40 zillion core data centers doing other than accelerating the collapse of our climate?  

Is this a tutorial or a rant by someone who is ultimately very confused. 

Monday, April 27, 2026

Temporal dynamics of gastruloid development!

 

I love when a proteomics study makes my newsfeed! 

Did I know what a gastruloid was before yesterday? Related, do you have gastroids? 


Here is a link and there are reasons this ultracool study is making the popsci popups!


This is one of the earliest stages of mammalian development - studied at ridiculously high depth here by RNA-Seq, proteomics (by TMT SPS RT MS3) and phosphoproteomics by the same.

Don't feel like reading? Check out this awesome interactive webpage with protein networks and protein by protein visual analysis


 


Edit: I thought it had phosphopeptide interactions mapped, but I think I just clicked on a bunch of phosphoproteins coincidentally. I also implied that protein-protein interactions were performed in the study, but when I got to the methods I realized that this was a complex and multi-level meta-analysis. It's easier for me to copy pasta here. There is a Github up for reproducing this analysis as well. 

Solid and very interesting work, even if RTS was employed. 😇 



Sunday, April 26, 2026

What is in Fetal Bovine Serum?!?

 


Okay, so here we go - a real question for proteomics scientists.

WTF is in that weird yellow stuff you put in the cell culture media? Apparently it comes from a cow. And - even if you don't have it in your database to look for it, it probably has an effect...

Super cool idea for a study. 

https://pubs.acs.org/doi/10.1021/acs.jproteome.5c01097



Friday, April 24, 2026

Single bacterium proteomics - round 2 - label free!

 


Whew...what a month..... if only the highest numerical % of your grant was the one that got you funded, I'd be looking at catching my breath and starting a deep dive into some amazingly cool single cells for a couple of years. It is, however, the lowest number that gets funded, which is both seemingly weird (totally weird....nerds....) and it's funny to joke about it and not funny to be a little sad.

While I was doing ALL THE THINGS the world kept moving and I kept mostly meeting my daily reading goal, so I'll back print some things like -

SINGLE BACTERIUM PROTEOMICS - ROUND 2 - LABEL FREE??? Yikes. That's crazy.

I can't remember, but I think Akos's group got 12 good solid E.coli proteins

IMP-Vienna got 50 without TMT!   That's crazy. It's so so so little protein. I'm really impressed that it all didn't end up permanently trapped to the plastic of the 384 well plates they used. Super cool to see what we could do if we really really wanted to make a statement. 



Wednesday, April 22, 2026

Deeper is not always better in plasma proteomics!

 


So...this came up with some incredible scientists I met at the University of North Carolina this week...

And here is a really cool review/perspective on the same issues. 

UNC's core is getting WAY higher plasma proteome coverage than I ever have with their amazing robots and magic nanoparticle things. But when they do quantitative comparisons and have rigorous restrictions on their quantitative accuracy, the numbers drop.

Is it as bad as an aptamer? Of course not. Nothing is as bad at measuring the abundance of a protein as an aptamer. Might as well flip a coin ;) 

But this is a smart look at different proteomics technologies for plasma enrichment that...wait....did they only give 5 stars to the one they developed...? Hmmm.... I mean...I'm not going to make fun of the stuff I developed either.... hmmm.... okay, but they make some incredible points about a whole lot of this stuff.

AND - BTW - when you're drawing blood where does the stuff go that you stabbed a needle through? Does the needle just perfectly part it's way through skin and blood vessels? It must, right? There's not just a big chunk of human skin floating around in there, right? 

Tuesday, April 14, 2026

GlycoDiveR - Actually make sense of glycoproteomics data?

 


We were JUST talking about this in lab meeting last week! I swear.

I said something like "well...sure...we can generate loads of good glycoproteomics data (I've got a tattoo that is almost old enough to drive that shows I've successfully pulled it off at least once on some pretty crappy instrumentation)....but you can't actually interpret what that big pile of glycopeptide stuff means....


And....well...there went that argument! 


Monday, April 13, 2026

Deep Visual Proteomics of Pain!

 


Wow.... I do just have to leave this here and move on. I've already forwarded the paper to a bunch of people, though, and can't wait to spend more time on it. 

We need to figure out how much DDM you can use before it's a bad thing, though! This group used 6-8x more than what we use, and they get a lot more membrane proteins.....

Totally worth taking a look at! 



Sunday, April 12, 2026

DIA-NN 2.5! Now with 70% more ....70%?? ...more peptides!?!?

 

Okay, we have to take a look at this for real. I do like the color scheme on these plots, though...

As an aside, I ran a commercial program for some people recently and it gave me 20% more protein groups than the ones I currently use. Those extra 20% really annoyed my collaborators. They were ...like... biologically very very unlikely...? Not DIA-NN, a commercial thing, but I did re-learn a lesson that more peptides isn't always a better. But DIA-NN has built enough credibility for me to be hesitantly optimistic that I will like this new version. 

Get it where you get DIA-NN! Probably here https://github.com/vdemichev/diann

Saturday, April 11, 2026

Troubleshoot your EvoSep step by step with this cool online thing!

 

For the first time in a long time, I had to do some EvoSep troubleshooting. Turns out that ceramic needle thing can get clogged! 

Gabriel at EvoSep led me to this super useful online resource that walks you through step by step to get it all worked out. 

It's amazingly clear with pictures and "did it work? click here!" AND 4 MILLION PERCENT BETTER than letting Adobe's class trailing LLM help you dig through the user manual. If you see a button to turn that pile of poo off, please let me know where that is!