Tuesday, May 5, 2026

Peptide cross-sections are bi-modal?

Maybe this was here before? I'm not going to look, but it's definitely out now in JPR.


It makes sense in my head, though. A single population of a million ions ionized at the exact same second might end up distributed as mostly +2, some +3, and maybe a barely detectable number of +4. Why wouldn't that population of peptides dissolved in acidic buffer also have two (or more) possible shapes? Is shape linked to charge in some way? It would make sense. The authors suggest a simple calculator for predicting both modes, which would be amazing, but it doesn't appear to be in the GitHub: https://github.com/cox-labs/CCS - maybe it's coming? Or maybe I don't have nearly enough time today and all the maths in the paper scared me a little? Probably.
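Just to picture what a bimodal cross-section distribution would look like in practice, here's a toy sketch. Everything below is made up for the example (fake CCS values, a crude 1-D two-means split); it is not the authors' calculator and not anything from the Cox lab repo.

```python
def two_means(values, iters=50):
    """Crude 1-D two-cluster split (Lloyd's algorithm) to find two modes."""
    c1, c2 = min(values), max(values)  # start the centers at the extremes
    for _ in range(iters):
        g1 = [v for v in values if abs(v - c1) <= abs(v - c2)]
        g2 = [v for v in values if abs(v - c1) > abs(v - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return c1, c2

# Made-up CCS values (in A^2) for one peptide: a compact and an extended mode
ccs = [340, 345, 350, 355, 360, 420, 425, 430, 435, 440]
compact, extended = two_means(ccs)
print(compact, extended)  # two clearly separated modes
```

If a peptide really does fly in two conformations, a single-mode CCS predictor is aiming at the average of two humps, which is a spot where nothing actually lands.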

Monday, May 4, 2026

Dissecting honey bee differential development!

I'm legitimately knocking out a couple of blog posts to get my brain fired up for writing and my hands used to the new (quieter) keyboard I brought to a super intensive 3-day writing camp. R01 resubmit peer pressure time! As you might guess, both R01s I should be writing are about the human liver and not honey bees, but you probably have a dumb way of doing things as well.

Where the f' is the control key? I'd rather look for it here. AND honey bees are super cool! 


Did you know that workers and drones (which I thought were the same thing) develop at very different rates? Neither did I. Do I care? Right now I do. And these authors did, and that's what really matters. I'm pretty sure it isn't a great time to be a honey farmer person.

Want to talk about an experimental sampling procedure that doesn't sound like fun? These authors collected 1,000 developing workers and the same number of developing drones from at least 8 different time points, up to 70 hours. I feel like a gif should go in here, but that would definitely make it clear to everyone around me that I'm not working on my grants. I'm warming up my brain! 

The sample prep is...interesting...and kind of old-fashioned, but that's how they've been doing it in their group. Acetone precipitation and a lot of urea. Probably there's lots of weird stuff in the developing bees. Would I have put them in liquid N2, smacked them with a hammer, and S-trapped them, and gotten the same or better results? We'll never know, but that's how I'd do it.

The boring stuff is well described, which is a refreshing change of pace this year. The QE HF was run in top 20 mode with a gradient I could reproduce without guessing. Yay MCP reviewers! Downstream analysis was in PEAKS against a surprisingly complete-sounding FASTA. Solid work all around and - screw it -

8 time points! 



Thursday, April 30, 2026

OmicsMLMentor - A web app for machine learning in -omics data!

Interesting! When this group talks about -omics they even include lipids and metabolites. Worth taking a look at for sure. 


Figure 2 is one of the clearest descriptions I've ever seen of machine learning classifiers. 

The link to the web portal in the paper appears to need a username and passcode, but I ain't got time for that.

Probably faster to pull the code from this GitHub anyway.

Wednesday, April 29, 2026

What is a token? Running AI/LLMs locally for proteomics people?

I had a really weird conversation this week when people were talking about how many "tokens" they were using for making AIs do things poorly for them.

Look, I'm also getting AIs to poorly do things for me that I don't know how to do. What I'm not doing is 
1) Paying for them...
2) Letting some money hoarding corporate weirdos see what I don't know how to do by sending my prompts off to some AI datacenter they knocked down a park to build.

And local LLMs on modern hardware can run faster than the cloud-based ones, because the upload/download speed can be the bottleneck.

So! Ben's short and poorly written guide to running an AI / LLM thing locally on a new or old PC.

Disclaimer and clarification: I know people have to use these for their jobs, and they have their own local instances on their own HPCs so their work can control data access, etc. This isn't shade for you at all. I was surprised by all of this and I'm sharing it.

For this example I'm going to use the GTX 1080 Ti video card I purchased to run PacMan on a really really big screen in/around 2017/2018. Possibly longer ago than that.

Since I'm dumb, I use a Graphical User Interface (GUI) called LM Studio.




Once you install it, you need a model. For this example I'm just going to use the first one that's famous. It rhymes with Chutney.


No joke, it's seriously that easy. I like this big old PC that will be retired soon because:
1) It doesn't have a wifi card.
2) I can just disconnect the ethernet cable from it.
3) It has trouble telling what year it is. I have the same problem.

Once I know it's offline, and I've confirmed I haven't had another head injury or something and I do know what year it is, then I ask it things that I know stuff about. In this example I asked it about single cell proteomics. The answers are seriously no worse than the ones the cloud versions will give you. It did blow my mind when I realized this.
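If you'd rather script it than type into the chat window, LM Studio can also run a local server that speaks an OpenAI-style chat API. The port, path, and model name below are assumptions from my setup (check the server tab in your install); a minimal sketch:

```python
import json
from urllib import request

# Assumed LM Studio local endpoint; verify in the app's server tab
LOCAL_URL = "http://localhost:1234/v1/chat/completions"

def build_chat_request(prompt, model="local-model", temperature=0.2):
    """Assemble an OpenAI-style chat completion body for the local server."""
    return {
        "model": model,
        "temperature": temperature,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask_local_llm(prompt):
    """POST the prompt to the local server; nothing leaves the machine."""
    body = json.dumps(build_chat_request(prompt)).encode()
    req = request.Request(LOCAL_URL, data=body,
                         headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# Inspect exactly what would be sent before anything is sent
payload = build_chat_request("What is single cell proteomics?")
```

The request builder is split out on purpose: on a machine that's unplugged half the time, it's nice to see the full payload before any bytes try to move.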

For real, if you're paying for one of these things you should try it. The reason I like to have a PC I can physically disconnect is that some of the available AI models written for data centers can't tell if they're online or not. ChutNeyPT will INSIST sometimes that it is running on a GPU farm in Arkansas when I know it's running on a GPU that is roughly 80% cat and Pug fur by actual weight. 

Honestly, the 8GB model that runs on this old GPU does have some very noticeable lag. And the total data it is drawing from is significantly smaller than other models. It's got to squeeze into 8GB so some things have to go. 
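The "squeeze into 8GB" part is just arithmetic: number of weights times bits per weight. A quick sanity check (this ignores the KV cache and other runtime overhead, so treat these as lower bounds; the 7B model size is just a common example, not a specific model I tested):

```python
def model_footprint_gb(params_billion, bits_per_weight):
    """Approximate weight storage in GB: params * bits / 8, no overhead."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

full = model_footprint_gb(7, 16)  # 7B weights at 16-bit: 14.0 GB, no chance on 8 GB VRAM
q4 = model_footprint_gb(7, 4)     # same weights quantized to 4-bit: 3.5 GB, fits
print(full, q4)
```

That 4x squeeze is exactly why quantized models run on a 2017 gaming card, and also exactly why "some things have to go."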

If you want it to run faster than the internet/cloud versions you need to get something newer. The 1080 video card is ooooooold.... The 5090 is on the market now, and they haven't released a new generation every year - more like every 1.5-2 years. An M4 Mac with 24GB of unified memory that I got last year for $1,300 is legitimately lag free. So. Fast.

Which brings up this question. What are all the huge data centers for? 

When I say that I'm doing dumb things with these AIs, I'd like to humbly consider that - as a scientist without any real hobbies except...proteomics.... the stuff I'm doing with these LLMs might be harder than what the average person typing prompts is doing. And....like....I'm also blasting the new At the Gates album on this same PC. I think I've got 40 tabs open and I've got 2 separate Python APIs open because I don't know where the default folders are located and I don't want to save the side scroller I've been tinkering with for 8-10 years and will likely never finish with the work scripts that I'll likely also never finish. So....like what are the 40 zillion core data centers doing other than accelerating the collapse of our climate?  

Is this a tutorial or a rant by someone who is ultimately very confused?

Monday, April 27, 2026

Temporal dynamics of gastruloid development!


I love when a proteomics study makes my newsfeed! 

Did I know what a gastruloid was before yesterday? Related, do you have gastroids? 


Here is a link and there are reasons this ultracool study is making the popsci popups!


This is one of the earliest stages of mammalian development - studied at ridiculously high depth here by RNA-Seq, proteomics (TMT SPS MS3 with real-time search), and phosphoproteomics by the same.

Don't feel like reading? Check out this awesome interactive webpage with protein networks and protein-by-protein visual analysis.


Edit: I thought it had phosphopeptide interactions mapped, but I think I just clicked on a bunch of phosphoproteins coincidentally. I also implied that protein-protein interactions were performed in the study, but when I got to the methods I realized that this was a complex and multi-level meta-analysis. It's easier for me to copy pasta here. There is a GitHub up for reproducing this analysis as well.

Solid and very interesting work, even if RTS was employed. 😇 



Sunday, April 26, 2026

What is in Fetal Bovine Serum?!?

Okay, so here we go - a real question for proteomics scientists.

WTF is in that weird yellow stuff you put in the cell culture media? Apparently it comes from a cow. And, even if you don't have it in your database to look for, it probably has an effect...

Super cool idea for a study. 

https://pubs.acs.org/doi/10.1021/acs.jproteome.5c01097



Friday, April 24, 2026

Single bacterium proteomics - round 2 - label free!

Whew...what a month... If only the highest numerical percentile on your grant was the one that got you funded, I'd be looking at catching my breath and starting a deep dive into some amazingly cool single cells for a couple of years. It is, however, the lowest number that gets funded, which is both seemingly weird (totally weird....nerds....) and funny to joke about and not funny to be a little sad about.

While I was doing ALL THE THINGS the world kept moving and I kept mostly meeting my daily reading goal, so I'll back print some things like -

SINGLE BACTERIUM PROTEOMICS - ROUND 2 - LABEL FREE??? Yikes. That's crazy.

I can't remember, but I think Akos's group got 12 good solid E. coli proteins.

IMP-Vienna got 50 without TMT! That's crazy. It's so so so little protein. I'm really impressed that it didn't all end up permanently trapped on the plastic of the 384-well plates they used. Super cool to see what we could do if we really really wanted to make a statement.



Wednesday, April 22, 2026

Deeper is not always better in plasma proteomics!

So...this came up with some incredible scientists I met at the University of North Carolina this week...

And here is a really cool review/perspective on the same issues. 

UNC's core is getting WAY higher plasma proteome coverage than I ever have with their amazing robots and magic nanoparticle things. But when they do quantitative comparisons and have rigorous restrictions on their quantitative accuracy, the numbers drop.

Is it as bad as an aptamer? Of course not. Nothing is as bad at measuring the abundance of a protein as an aptamer. Might as well flip a coin ;) 

But this is a smart look at different proteomics technologies for plasma enrichment that...wait....did they only give 5 stars to the one they developed...? Hmmm.... I mean...I'm not going to make fun of the stuff I developed either.... hmmm.... okay, but they make some incredible points about a whole lot of this stuff.
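For anyone wondering what "rigorous restrictions on quantitative accuracy" can look like in practice, one common flavor is a coefficient-of-variation cutoff across replicates. This is a generic sketch with made-up numbers, not the paper's or UNC's actual pipeline, and the 20% cutoff is just a popular default:

```python
from statistics import mean, stdev

def cv_percent(reps):
    """Coefficient of variation (%) across replicate intensities."""
    return 100 * stdev(reps) / mean(reps)

def proteins_passing(quant, cv_cutoff=20.0):
    """Keep only proteins whose replicate CV is under the cutoff."""
    return sorted(p for p, reps in quant.items() if cv_percent(reps) <= cv_cutoff)

# Toy plasma data: both proteins are identified in every run,
# but only one of them quantifies reproducibly
quant = {
    "ALBU": [100.0, 102.0, 98.0],  # CV = 2%  -> passes
    "CRP":  [100.0, 150.0, 50.0],  # CV = 50% -> identified, not quantifiable
}
print(proteins_passing(quant))
```

Run that over a few thousand plasma proteins and you see exactly the effect described above: the identification number stays big and shiny, and the "actually usable for quant" number quietly drops.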

AND - BTW - when you're drawing blood, where does the stuff go that you stabbed a needle through? Does the needle just perfectly part its way through skin and blood vessels? It must, right? There's not just a big chunk of human skin floating around in there, right?

Tuesday, April 14, 2026

GlycoDiveR - Actually make sense of glycoproteomics data?

We were JUST talking about this in lab meeting last week! I swear.

I said something like "well...sure...we can generate loads of good glycoproteomics data (I've got a tattoo that is almost old enough to drive that shows I've successfully pulled it off at least once on some pretty crappy instrumentation)....but you can't actually interpret what that big pile of glycopeptide stuff means...."


And....well...there went that argument!