Wednesday, April 29, 2026

What is a token? Running AI/LLMs locally for proteomics people?

 


I had a really weird conversation this week when people were talking about how many "tokens" they were using to make AIs do things poorly for them.
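Quick aside, since the title asks: a "token" is the unit these things actually read and bill you by. It's roughly a word-chunk. Common words are usually one token, while rare words (like most proteomics vocabulary) get chopped into several sub-word pieces. Real tokenizers use learned byte-pair encoding vocabularies; this is just a toy sketch to show the idea:

```python
# Toy illustration of tokenization. Real LLM tokenizers use learned
# byte-pair encoding (BPE) merge tables, not hand-written rules like this.
def toy_tokenize(text):
    """Split text into word pieces, crudely mimicking how common words
    stay whole while rare words get chopped into sub-word chunks."""
    common = {"single", "cell", "proteomics", "of", "the"}
    tokens = []
    for word in text.lower().split():
        if word in common:
            tokens.append(word)  # frequent word -> one token
        else:
            # rare word -> 4-character chunks (a stand-in for BPE merges)
            tokens.extend(word[i:i + 4] for i in range(0, len(word), 4))
    return tokens

print(toy_tokenize("single cell proteomics of mitochondria"))
# -> ['single', 'cell', 'proteomics', 'of', 'mito', 'chon', 'dria']
```

So "mitochondria" costs you three tokens while "of" costs one, which is why jargon-heavy prompts burn through a paid token budget faster than you'd guess.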

Look, I'm also getting AIs to poorly do things for me that I don't know how to do. What I'm not doing is 
1) Paying for them...
2) Letting some money-hoarding corporate weirdos see what I don't know how to do by sending my prompts off to some AI datacenter they knocked down a park to build.

And LLMs on modern hardware can run faster than the cloud-based ones, because upload/download speed can be the bottleneck. 

So! Ben's short and poorly written guide to running an AI / LLM thing locally on a new or old PC.

Disclaimer and clarification: I know people have to use these for their jobs and they have their own local instances on their own HPCs so their work can control data access, etc. This isn't shade for you at all. I was surprised by all of this and I'm sharing it. 

For this example I'm going to use the GTX 1080 Ti video card I purchased to run PacMan on a really really big screen in/around 2017/2018. Possibly longer ago than that. 

Since I'm dumb, I use a Graphical User Interface (GUI) called LM Studio.




Once you install it, you need a Model. For this example I'm just going to use the first one that's famous. It rhymes with Chutney. 


No joke, it's seriously that easy. I like this big old PC that will be retired soon because it 
1) Doesn't have a wifi card
2) I can just disconnect the ethernet cable from it. 
3) It has trouble telling what the year is. I have the same problem. 

Once I know it's offline, and I've confirmed I haven't had another head injury or something and I do know what year it is, I ask it things that I know stuff about. In this example I asked it about single cell proteomics. The answers are seriously no worse than the ones the cloud models will give you. It did blow my mind when I realized this. 
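If you'd rather skip the GUI chat window, LM Studio can also expose the downloaded model through a local OpenAI-compatible server, so a script can ask it questions without anything leaving the machine. A minimal sketch, assuming the server is running at its usual default address (the URL, endpoint path, and model name here are assumptions, so check what your install actually shows):

```python
# Sketch of querying a locally hosted model through an OpenAI-compatible
# chat-completions endpoint. Assumes LM Studio's local server is running;
# the URL and model name below are placeholders, not guaranteed defaults.
import json
import urllib.request


def build_request(prompt, model="local-model"):
    """Build the JSON body for a chat-completions call to a local server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }


def ask_local_llm(prompt, url="http://localhost:1234/v1/chat/completions"):
    """Send the prompt to the local server and return the reply text."""
    body = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]


# ask_local_llm("What is single cell proteomics?")  # needs the local server running
```

The nice part: since the whole round trip is localhost, you can yank the ethernet cable and this still works.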

For real, if you're paying for one of these things you should try it. The reason I like to have a PC I can physically disconnect is that some of the available AI models written for data centers can't tell if they're online or not. ChutNeyPT will INSIST sometimes that it is running on a GPU farm in Arkansas when I know it's running on a GPU that is roughly 80% cat and Pug fur by actual weight. 

Honestly, the 8GB model that runs on this old GPU does have some very noticeable lag. And the total knowledge it's drawing from is significantly smaller than in bigger models. It's got to squeeze into 8GB, so some things have to go. 
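The "squeeze into 8GB" part is just arithmetic, as far as I can tell: the model's weights have to fit in GPU memory, so small cards force you onto smaller models, more aggressive quantization (fewer bits per weight), or both. A back-of-envelope sketch (rounded numbers, ignoring the extra memory the context itself eats):

```python
# Back-of-envelope VRAM math for why small cards need small/quantized models.
# This only counts the weights; the KV cache and overhead eat more on top.
def model_vram_gb(n_params_billion, bits_per_weight):
    """Approximate weight storage in GB for a given parameter count."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

print(model_vram_gb(7, 16))  # 7B params at 16-bit -> 14.0 GB: won't fit in 8 GB
print(model_vram_gb(7, 4))   # 7B params at 4-bit  -> 3.5 GB: fits with headroom
```

Which is why the stuff people actually run on consumer cards is mostly 4-bit-ish quantized versions of 7B-class models, and why "some things have to go."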

If you want it to run faster than the internet/cloud versions, you need to get something newer. The 1080 video card is ooooooold.... The 5090 is on the market now, and they haven't released a new generation every year; more like every 1.5-2 years. An M4 Mac with 24GB of unified memory that I got last year for $1300 is legitimately lag free. So. Fast. 
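My understanding of why newer hardware wins so hard: once the model fits in memory, generating each token means streaming essentially the whole model through the chip, so generation speed is mostly capped by memory bandwidth, not raw compute. A rough sketch (the bandwidth numbers below are illustrative assumptions, not benchmarks of any specific machine):

```python
# Rough speed intuition: token generation is largely memory-bandwidth bound,
# so an upper bound on tokens/sec is bandwidth divided by model size in memory.
# Bandwidth figures here are made-up round numbers for illustration only.
def rough_tokens_per_sec(bandwidth_gb_s, model_size_gb):
    """Crude tokens/sec ceiling: how many times per second the hardware
    can stream the full set of weights out of memory."""
    return bandwidth_gb_s / model_size_gb

print(rough_tokens_per_sec(320, 4))  # older-GPU-class bandwidth -> ~80 tok/s ceiling
print(rough_tokens_per_sec(800, 4))  # newer-hardware bandwidth  -> ~200 tok/s ceiling
```

So doubling bandwidth roughly doubles the ceiling on how fast words come out, which matches the "lag free, So. Fast." experience on newer unified-memory machines.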

Which brings up this question. What are all the huge data centers for? 

When I say that I'm doing dumb things with these AIs, I'd like to humbly consider that, as a scientist without any real hobbies except...proteomics...., the stuff I'm doing with these LLMs might be harder than what the average person typing prompts is doing. And....like....I'm also blasting the new At the Gates album on this same PC. I think I've got 40 tabs open, and I've got 2 separate Python IDEs open because I don't know where the default folders are located and I don't want to save the side scroller I've been tinkering with for 8-10 years and will likely never finish with the work scripts that I'll likely also never finish. So....like what are the 40 zillion core data centers doing other than accelerating the collapse of our climate?  

Is this a tutorial or a rant by someone who is ultimately very confused? 
