Thursday, April 30, 2026

OmicsMLMentor - A web app for machine learning in -omics data!

Interesting! When this group talks about -omics they even include lipids and metabolites. Worth taking a look at for sure. 


Figure 2 is one of the clearest descriptions I've ever seen of machine learning classifiers. 

The link to the web portal in the paper appears to need a user name and passcode, but I ain't got time for that.

Probably faster to pull the code from this GitHub anyway.
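If you do pull it down, the core of most of these omics classifier tools boils down to something like this - a generic scikit-learn sketch on made-up data, not OmicsMLMentor's actual pipeline:

```python
# Generic omics classifier workflow (stand-in data, hypothetical numbers):
# a samples x features matrix (proteins/lipids/metabolites) with group labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(60, 500))   # 60 samples, 500 features
y = rng.integers(0, 2, size=60)  # two groups, e.g., case vs. control

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
clf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
print(f"Test accuracy: {accuracy_score(y_test, clf.predict(X_test)):.2f}")
```

On random noise like this you should get roughly coin-flip accuracy, which is sort of the point - hold out test samples so you know when your classifier is only as good as a coin.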

Wednesday, April 29, 2026

What is a token? Running AI/LLMs locally for proteomics people?

I had a really weird conversation this week when people were talking about how many "tokens" they were using to make AIs do things poorly for them.

Look, I'm also getting AIs to poorly do things for me that I don't know how to do. What I'm not doing is:
1) Paying for them...
2) Letting some money-hoarding corporate weirdos see what I don't know how to do by sending my prompts off to some AI datacenter they knocked down a park to build.

And local LLMs on modern hardware can run faster than the cloud-based ones, because upload/download speed can be the bottleneck.

So! Ben's short and poorly written guide to running an AI/LLM thing locally on a new or old PC.

Disclaimer and clarification: I know people have to use these for their jobs, and they have their own local instances on their own HPCs so their work can control data access, etc. This isn't shade for you at all. I was surprised by all of this and I'm sharing it.

For this example I'm going to use the GTX 1080 Ti video card I purchased to run Pac-Man on a really really big screen in/around 2017/2018. Possibly longer ago than that.

Since I'm dumb, I use a Graphical User Interface (GUI) called LM Studio.
Once you install it, you need a model. For this example I'm just going to use the first one that's famous. It rhymes with Chutney.


No joke, it's seriously that easy. I like this big old PC that will be retired soon because:
1) It doesn't have a wifi card.
2) I can just disconnect the ethernet cable from it.
3) It has trouble telling what year it is. I have the same problem.

Once I know it's offline, and I've confirmed I haven't had another head injury or something and I do know what year it is, then I ask it things that I know stuff about. In this example I asked it about single cell proteomics. The answers are seriously no worse than the ones the cloud versions will give you. It did blow my mind when I realized this.
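If you'd rather script it than click around, LM Studio can also serve whatever model you have loaded over a local OpenAI-compatible API (you have to turn the server on in the app; it defaults to port 1234). A minimal sketch - the model name below is a placeholder, LM Studio shows you the real identifier:

```python
# Chat with the locally served model via LM Studio's OpenAI-compatible endpoint.
# Assumes the local server is enabled in LM Studio (default port 1234).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")  # no key for a local server

response = client.chat.completions.create(
    model="local-model",  # placeholder; use the identifier LM Studio displays
    messages=[{"role": "user", "content": "What is single cell proteomics?"}],
)
print(response.choices[0].message.content)
```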

For real, if you're paying for one of these things you should try it. The reason I like to have a PC I can physically disconnect is that some of the available AI models written for data centers can't tell if they're online or not. ChutNeyPT will INSIST sometimes that it is running on a GPU farm in Arkansas when I know it's running on a GPU that is roughly 80% cat and Pug fur by actual weight. 

Honestly, the 8GB model that runs on this old GPU does have some very noticeable lag. And the total data it's drawing from is significantly smaller than what the bigger models carry. It's got to squeeze into 8GB, so some things have to go.
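The back-of-envelope math on why it has to shrink, assuming (purely for illustration) an 8B-parameter model quantized to 4 bits per weight:

```python
# Rough VRAM estimate for a quantized local model - napkin math, not exact.
params = 8e9         # hypothetical 8B-parameter model
bits_per_weight = 4  # common quantization level for local model builds
weights_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB
print(f"~{weights_gb:.0f} GB for the weights alone")  # ~4 GB; context cache and overhead eat the rest
```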

If you want it to run faster than the internet/cloud versions you need to get something newer. The 1080 video card is ooooooold.... The 5090 is on the market now, and they haven't released a new generation every year - more like every 1.5-2 years. An M4 Mac with 24GB of unified memory that I got last year for $1300 is legitimately lag-free. So. Fast.

Which brings up this question. What are all the huge data centers for? 

When I say that I'm doing dumb things with these AIs, I'd like to humbly suggest that - as a scientist without any real hobbies except...proteomics.... - the stuff I'm doing with these LLMs might be harder than what the average person typing prompts is doing. And....like....I'm also blasting the new At the Gates album on this same PC. I think I've got 40 tabs open, plus 2 separate Python IDEs, because I don't know where the default folders are located and I don't want to save the side scroller I've been tinkering with for 8-10 years and will likely never finish next to the work scripts that I'll likely also never finish. So....like....what are the 40-zillion-core data centers doing other than accelerating the collapse of our climate?

Is this a tutorial, or a rant by someone who is ultimately very confused?

Monday, April 27, 2026

Temporal dynamics of gastruloid development!

I love when a proteomics study makes my newsfeed! 

Did I know what a gastruloid was before yesterday? Related, do you have gastroids? 


Here is a link, and there are reasons this ultracool study is making the pop-sci popups!


This is one of the earliest stages of mammalian development - studied at ridiculously high depth here by RNA-Seq, proteomics (TMT with real-time search SPS-MS3), and phosphoproteomics by the same.

Don't feel like reading? Check out this awesome interactive webpage with protein networks and protein-by-protein visual analysis.


Edit: I thought it had phosphopeptide interactions mapped, but I think I just clicked on a bunch of phosphoproteins coincidentally. I also implied that protein-protein interactions were performed in the study, but when I got to the methods I realized that this was a complex and multi-level meta-analysis. It's easier for me to just copy pasta here. There is a GitHub up for reproducing this analysis as well.

Solid and very interesting work, even if RTS was employed. 😇 



Sunday, April 26, 2026

What is in Fetal Bovine Serum?!?

Okay, so here we go - a real question for proteomics scientists.

WTF is in that weird yellow stuff you put in the cell culture media? Apparently it comes from a cow. And - even if you don't have it in your database to look for it, it probably has an effect...
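One fix that's implied here: bolt the bovine proteome onto your human FASTA so the FBS proteins are at least in the search space. A minimal sketch - file names are placeholders, grab the actual proteomes from UniProt:

```python
# Concatenate human + bovine FASTAs into one search database (placeholder file names).
with open("human_plus_bovine.fasta", "w") as out:
    for fasta in ("uniprot_human.fasta", "uniprot_bovine.fasta"):
        with open(fasta) as f:
            out.write(f.read())
```

Fair warning that human and bovine share a lot of tryptic peptides, so expect your protein grouping to get messier when you do this.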

Super cool idea for a study. 

https://pubs.acs.org/doi/10.1021/acs.jproteome.5c01097



Friday, April 24, 2026

Single bacterium proteomics - round 2 - label free!

Whew...what a month..... If only the highest numerical percentile on your grant were the one that got you funded, I'd be catching my breath and starting a deep dive into some amazingly cool single cells for a couple of years. It is, however, the lowest number that gets funded, which is seemingly weird (totally weird....nerds....), funny to joke about, and not so funny to actually be a little sad about.

While I was doing ALL THE THINGS, the world kept moving and I mostly kept meeting my daily reading goal, so I'll back-post some things, like -

SINGLE BACTERIUM PROTEOMICS - ROUND 2 - LABEL FREE??? Yikes. That's crazy.

I can't remember exactly, but I think Akos's group got 12 good solid E. coli proteins.

IMP-Vienna got 50 without TMT! That's crazy. It's so so so little protein. I'm really impressed that it all didn't end up permanently stuck to the plastic of the 384-well plates they used. Super cool to see what we could do if we really really wanted to make a statement.



Wednesday, April 22, 2026

Deeper is not always better in plasma proteomics!

So...this came up with some incredible scientists I met at the University of North Carolina this week...

And here is a really cool review/perspective on the same issues. 

UNC's core is getting WAY higher plasma proteome coverage than I ever have, with their amazing robots and magic nanoparticle things. But when they do quantitative comparisons and put rigorous restrictions on their quantitative accuracy, the numbers drop.
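For a feel of the kind of filter that makes the numbers drop, here's a hedged sketch - not their actual criteria - keeping only proteins with a coefficient of variation under 20% across replicates (file and column names are made up):

```python
# Filter a proteins x replicates quant matrix by CV - illustrative cutoff, not UNC's.
import pandas as pd

df = pd.read_csv("protein_quant_matrix.csv", index_col="Protein")  # placeholder file
cv = df.std(axis=1) / df.mean(axis=1)  # per-protein coefficient of variation
reliable = df[cv < 0.20]
print(f"{len(df)} proteins quantified, {len(reliable)} pass the 20% CV cutoff")
```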

Is it as bad as an aptamer? Of course not. Nothing is as bad at measuring the abundance of a protein as an aptamer. Might as well flip a coin ;) 

But this is a smart look at different proteomics technologies for plasma enrichment that...wait....did they only give 5 stars to the one they developed...? Hmmm.... I mean...I'm not going to make fun of the stuff I developed either.... hmmm.... okay, but they make some incredible points about a whole lot of this stuff.

AND - BTW - when you're drawing blood, where does the stuff go that you stabbed a needle through? Does the needle just perfectly part its way through skin and blood vessels? It must, right? There's not just a big chunk of human skin floating around in there, right?

Tuesday, April 14, 2026

GlycoDiveR - Actually make sense of glycoproteomics data?

We were JUST talking about this in lab meeting last week! I swear.

I said something like, "Well...sure...we can generate loads of good glycoproteomics data (I've got a tattoo that is almost old enough to drive proving I've successfully pulled it off at least once on some pretty crappy instrumentation)....but you can't actually interpret what that big pile of glycopeptide stuff means...."


And....well...there went that argument! 


Monday, April 13, 2026

Deep Visual Proteomics of Pain!

Wow.... I do just have to leave this here and move on. I've already forwarded the paper to a bunch of people, though, and can't wait to spend more time on it. 

We need to figure out how much DDM you can use before it's a bad thing, though! This group used 6-8x more than what we use, and they get a lot more membrane proteins.....

Totally worth taking a look at! 



Sunday, April 12, 2026

DIA-NN 2.5! Now with 70% more ....70%?? ...more peptides!?!?

Okay, we have to take a look at this for real. I do like the color scheme on these plots, though...

As an aside, I ran a commercial program for some people recently and it gave me 20% more protein groups than the programs I currently use. Those extra 20% really annoyed my collaborators. They were ...like... biologically very very unlikely...? Not DIA-NN, a commercial thing, but I did re-learn the lesson that more peptides isn't always better. But DIA-NN has built enough credibility for me to be hesitantly optimistic that I will like this new version.
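The sanity check that saved me there is cheap to script, something like this (file and column names invented for the sketch):

```python
# Compare protein group lists from two search programs and eyeball the extras.
import pandas as pd

a = set(pd.read_csv("program_A_proteins.csv")["Protein.Group"])  # placeholder files/columns
b = set(pd.read_csv("program_B_proteins.csv")["Protein.Group"])

extra = b - a
print(f"Shared: {len(a & b)}, extra in B: {len(extra)}")
for pg in sorted(extra)[:20]:  # look at the 'bonus' IDs before celebrating them
    print(pg)
```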

Get it where you get DIA-NN! Probably here: https://github.com/vdemichev/diann