Monday, May 19, 2025

MAP-MS! Get more Orbitrap dynamic range for free!

 


I had to sleep on this one and then find out that the coolest new toy I received for my lab needs to charge before I can try it out --


(Not this paper - the fact that I have to charge a device that I think will dramatically improve life in my lab, and that while it was charging I realized it has instructions I'm supposed to read.) And now I think I get it.


Remember BoxCar? I vaguely do, but it worked in sort of a similar way. Both try to take advantage of the fact that ion injection times on MS1 scans are really fast, but Orbitrap transients are comparatively slow - so there's fill time going to waste.


BoxCar works by chopping up your MS1 into lots of little DIA windows and then alternating between MS1 scans. 

Could you do something very similar but with great big boxes that wouldn't slow anything down at all? And if they were static could it be done with absolutely no obvious consequences? 

That's what MAP-MS appears to be! They study the distribution of tryptic peptides in humans to make their big window cuts, then multiplex their MS1 acquisitions to boost the lower abundance ions and knock down the highest abundance ones - and everything seems to just work.
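The way I read it (and this is just my own toy illustration, not their actual workflow), the window placement boils down to cutting the MS1 range into a few big static boxes that each hold a roughly equal share of the tryptic peptide population - something like this, assuming you already have a list of predicted tryptic peptide m/z values:

```python
# Toy sketch of static MS1 window placement (my illustration, not the MAP-MS code).
# Assumes `peptide_mz` is an array of predicted tryptic peptide m/z values
# for the organism of interest (e.g., from an in silico digest of the human proteome).
import numpy as np

rng = np.random.default_rng(0)
peptide_mz = rng.normal(750, 200, 100_000).clip(400, 1600)  # stand-in distribution

def static_ms1_boxes(mz_values, n_boxes=4, low=400.0, high=1600.0):
    """Cut the MS1 range into n_boxes windows holding ~equal peptide counts."""
    mz = np.sort(mz_values[(mz_values >= low) & (mz_values <= high)])
    edges = np.quantile(mz, np.linspace(0, 1, n_boxes + 1))
    edges[0], edges[-1] = low, high
    return list(zip(edges[:-1], edges[1:]))

for lo, hi in static_ms1_boxes(peptide_mz):
    print(f"MS1 box: {lo:7.1f} - {hi:7.1f} m/z")
```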

On DIA experiments they get 11% more coverage without changing anything else. A lot of people will trade in a proteomics instrument for 11% more IDs, so that's pretty appealing. I do have concerns that maybe not every piece of software will love the data, but we won't know until we try it!

Friday, May 16, 2025

DESI profiling of 165 proteoforms across over 10,000 single rat brain cells!

 


Wow. Okay, so I was a little (lot) less excited about this technology when it was first described here on sea slug neurons. Those sea slug neurons can be really convenient to work with since they can grow out on a plate to be easily 100x larger than typical cells. However - these are rat brain cells, and these are going to be biologically relevant to more than just sea slug biology!

They start with rapidly murdering the rats, getting to their brains, and dissociating the cells with papain. The cell suspension is then allowed to sink down and adhere to plates (fuzzy on this procedure, but it has been detailed in previous studies). Then those cells are fixed(?) in glycerol and some in ethanol (?) on the slides and they're ready for DESI analysis.

DESI is like MALDI in that you're moving spatially across a slide, but the ionization is very different - from a mass spectrometrist's standpoint the most important part is that you're multiply charging things. Here they can get the charges up high enough that proteins can be detected in an Orbitrap (Exploris) running single ion methods.
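The practical upshot of piling charges onto proteins is that even a big proteoform lands at an m/z the Orbitrap can actually scan - the conversion is just arithmetic (quick sketch, standard proton mass assumed):

```python
# Back-of-the-envelope: neutral mass from a multiply charged ion.
# m/z = (M + z * m_proton) / z  =>  M = z * (m/z) - z * m_proton
PROTON = 1.007276  # Da

def neutral_mass(mz, z):
    """Neutral mass implied by an observed m/z and charge state."""
    return z * mz - z * PROTON

# A ~15 kDa proteoform carrying 15 charges shows up near m/z 1000,
# comfortably inside a typical Orbitrap scan range.
print(f"{neutral_mass(1001.0, 15):.1f} Da")  # ~14999.9 Da
```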

It's a brief read, and a really interesting one. There is a really cool supplementary video and if you want to find what proteoforms were actually detected you'll want Supplemental Data 5.4. All the files are up on MASSIVE, but I suspect given the unique nature of the data it might be tough to make sense of them with the tools that I have.

Intact protein analysis of 10,000 freaking rat brain cells?!? 


Tuesday, May 13, 2025

PFly - Is this the missing link in LCMS proteomics deep learning models?




Okay - so this one has bugged me (and a lot of other people) for a long time - we can do a pretty great job now of predicting peptide fragmentation (unless the vast majority of PTMs are involved). Supposedly we can do a solid job of predicting peptide elution patterns (exclusively from C-18 reversed phase chromatography).

What has been missing is predicting what peptides from each protein will actually ionize (or fly). 

This has been tried before, btw -


- however, as is often the case in academic software development, many of these lead to 404 errors or software that only runs in Windows 95 - or....well....they aren't very good.

I'm a little sad to say this, but when I did my normal round of sending out a paper that I'd just found and was reading at lunch yesterday, the responses were universally ...skeptical at best.... but maybe this is finally it!

Introducing pFLY! (I read it at lunch yesterday and it's faded in my mind a little, but I'm just about 99.0% sure that the p stands for Pug)


??
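To be clear about what the problem even is: you've got peptides from a protein that show up in every run and peptides from the same protein that never do, and you want a model that guesses which bucket a new peptide falls into from its sequence alone. A bare-bones toy version of that (my own sketch, nothing to do with pFly's actual architecture) could look like this:

```python
# Toy "does this peptide fly?" classifier - my own illustrative sketch, NOT the pFly model.
# Assumes you already have peptide sequences labeled observed (1) vs never-observed (0).
from sklearn.linear_model import LogisticRegression

KD = {  # Kyte-Doolittle hydropathy values
    'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5, 'Q': -3.5,
    'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5, 'L': 3.8, 'K': -3.9,
    'M': 1.9, 'F': 2.8, 'P': -1.6, 'S': -0.8, 'T': -0.7, 'W': -0.9,
    'Y': -1.3, 'V': 4.2,
}

def features(seq):
    """Crude physicochemical features: length, mean hydropathy, basic/acidic counts."""
    return [
        len(seq),
        sum(KD[a] for a in seq) / len(seq),
        sum(seq.count(a) for a in "KRH"),
        sum(seq.count(a) for a in "DE"),
    ]

# Hypothetical labeled examples (placeholders, not real training data).
peptides = ["LVNELTEFAK", "AGLCQTFVYGGCR", "GGGGGGGGGG", "DDDDDEEEEE"]
labels   = [1, 1, 0, 0]

model = LogisticRegression().fit([features(p) for p in peptides], labels)
print(model.predict_proba([features("YLYEIARR")])[0, 1])  # P(peptide "flies")
```

Swap in real labels from your own runs and a real model and you have the general shape of the thing - the hard part, as always, is whether it generalizes across instruments and methods.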

Friday, May 9, 2025

Benchmarking SILAC workflows in 2025! Wait. What?




Okay - for just a second I thought I'd mistakenly scheduled posting this article about 10 years in the future, but apparently this really is new


Mandatory - 


For any of you scientists out there who aren't getting up in the morning complaining about your joints: SILAC was something we used to do in proteomics all the time. We did it to the point that it was called the "gold standard for proteomics quantification" - and not just by the companies that sold us the heavy labeled amino acids, which made every plate of cells cost $183,000. At the time that was a big deal, because doing good comprehensive proteomics on an Orbitrap XL in 2009, when I was doing it, required 15 plates of cells. If you did a great job, 3 weeks of run time would get you 3,000 proteins quantified. Please note that some of these numbers are exaggerated.

Anyway, you'd grow cells passage after passage in media with heavy lysine and arginine until you were pretty sure that all your proteins had heavy labels. Then you'd pretend that cells are way too dumb for something like 18 passages in very strange isotopically labeled amino acids to have any possible phenotypic effects. Then you'd take cells grown without it, treat one with drug and one with DMSO, then pretend that DMSO has no phenotypic effects. Then you'd lyse your cells and mix your light and heavy proteins or digested peptides (I forget, I last used it for a paper in 2010? 2009?) and run them. At the MS1 level you'd see your heavy/light pairs and quantify off those.
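For anyone who never had the pleasure, the MS1 arithmetic is simple: a fully labeled peptide is shifted up by a fixed mass per heavy Lys/Arg, so the heavy partner sits at a predictable m/z offset from the light one, and the ratio of the two intensities is your quant value. A quick sketch (assuming the standard K8/R10 labels):

```python
# Where the SILAC heavy/light pair shows up at MS1, and the ratio you quantify from.
# Assumes standard labels: 13C6,15N2-lysine (+8.0142 Da) and 13C6,15N4-arginine (+10.0083 Da).
LYS8, ARG10 = 8.0142, 10.0083

def heavy_mz(light_mz, charge, n_lys, n_arg):
    """m/z of the heavy partner of a light peptide ion."""
    shift = n_lys * LYS8 + n_arg * ARG10
    return light_mz + shift / charge

# Tryptic peptide ending in K, 2+ light ion at m/z 550.30:
print(f"heavy partner at m/z {heavy_mz(550.30, 2, n_lys=1, n_arg=0):.3f}")

heavy_intensity, light_intensity = 1.2e6, 3.0e6
print(f"H/L ratio = {heavy_intensity / light_intensity:.2f}")  # drug vs DMSO in the classic setup
```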

There were drawbacks, some of which could probably be inferred from my description of the method above, but a lot? at least some? good science came from it. I can't think of any off the top of my head, but you've probably heard my philosophy that it's best to ignore everything in proteomics before 2017 - and this technique was largely gone by then.

However - if you did have some reason to do SILAC in 2025 - I bet you'd wonder what could and should process the data! And here you go! 

Silliness aside, I've never considered doing SILAC DIA. 

Oh yeah, you can do some really cool stuff with SILAC by introducing it and then changing the media. That can provide measurements of protein half-life and protein turnover and things like that. There are reasons. Just don't use it for pairwise drug treatment stuff. There are much better ways to do those things now! 
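That pulse(-chase) flavor is where the math gets fun: switch the media, track the heavy fraction of each protein over time, fit a first-order rate constant, and the half-life is ln(2)/k. A minimal sketch of that fit (illustrative only, with made-up numbers, not any particular tool's method):

```python
# Minimal dynamic-SILAC style half-life fit (illustrative sketch).
# Assumes `labeled_fraction[i]` is the heavy fraction of a protein at time `hours[i]`
# after switching to heavy media, following L(t) = 1 - exp(-k*t).
import numpy as np

hours = np.array([0.0, 4.0, 8.0, 16.0, 24.0])
labeled_fraction = np.array([0.0, 0.18, 0.33, 0.55, 0.70])

# Linearize: -ln(1 - L) = k*t, then fit k through the origin.
t = hours[1:]
y = -np.log(1.0 - labeled_fraction[1:])
k = np.sum(t * y) / np.sum(t * t)          # least-squares slope, intercept fixed at 0
half_life = np.log(2.0) / k

print(f"k = {k:.3f} /h, protein half-life = {half_life:.1f} h")
```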

Thursday, May 8, 2025

Top-down proteomics of Alzheimer's tissues reveals entirely new targets!

 


I've got a lot to do today, but this new study is a jaw-dropper.


Quantitative top-down (intact protein!) proteomics. Of over 100 individuals (I think over 1,000 samples?!?!?) and multiple proteoform-level alterations that appear differential in Alzheimer's? I will come back to this, but you should absolutely check this out.

Wednesday, May 7, 2025

Lessons learned migrating proteomics to HPC (high performance computing) environments!

 

Sometimes this blog is just what I learn as I'm going through learning something for myself - and this is clearly one of those posts. 

One thing that was not emphasized nearly as well as it could have been during my interviews at Pitt was the absolutely amazing, world class High Performance Computational / Computer / Cluster (HPC) framework that we have.

It took a little work and me bugging colleagues with dumb questions, but I've got some workflows going that need a lot of firepower! 

Namely things like FragPipe open search - and R packages that almost inevitably require ludicrous amounts of RAM. 



Things I've learned so far. 

1) The time to get my data to the HPC can be a bottleneck worth considering. My TT Ultra2 is generating around 160GB of data/day right now - around 1.5GB per single cell and closer to 4GB for libraries and QC samples, which seems to average out pretty close to 160GB. Transferring 1 day of files to the HPC seems to take around 1-2 hours. Not a big deal, but something to consider if you're the person prepping samples, running the instruments, writing the grants and papers, writing blogposts and picking your kid up on time from daycare every day. Worth planning those transfers out (there's a quick back-of-the-envelope sketch of this after the list).

2) NOT ALL PROGRAMS YOU USE WORK IN LINUX. FragPipe, SearchGUI/PeptideShaker, MaxQuant are all very very pretty in Linux. Honestly, they look nicer and probably run better than in Windows. DIA-NN will run in Linux, but you do lose the GUI. You have to go command line. But what you can do is set up your GUI runs and then export those from DIA-NN. Maybe I'll show that later. 

3) You may need to have good estimates of your time usage. In my case I currently get a 50,000 core hour allotment. If I am just doing 80 FragPipe runs, I need to think about the cores I need x the number of hours I need those cores (see the sketch after this list). I can't request more than 128 cores simultaneously right now (for some reason, yesterday I could only request 64 with FragPipe, I should check). But if I need 128 cores - do I need those for 10 hours? If so, that's 1,280 core hours I will blow through.

Since MSFragger itself is ultra-fast, but match between runs and MS1 ion extraction are less fast and cap out at fewer cores per file, there isn't much difference on a small dataset between using 32 cores and more. Your bottlenecks aren't the steps that scale up forever.

4) Things that are RAM dependent may be WAY WAY FASTER. I think we scale to 8GB of RAM/core on our base clusters here. 32 cores gives me 256 GB of RAM! If your program normally has to read/write to disk to offset a lack of RAM, or can use every bit of available RAM to maximum effect, those steps can be much, much faster.

5) Processes that depend on per-core clock speed may be slower. For a test, I gave FragPipe 22 14 cores on a desktop in my lab and 14 cores on the HPC with the same 2 LFQ files. Unsurprisingly, you can really crank up the GHz on desktop PCs, whereas it makes sense to run lower clock speeds when you have 10,000 cores sitting around.

6) You probably need help with installations and upgrades. Most of us are used to that by now, though. I can upgrade my lab PCs to FragPipe 23 today; I need to put in a service request to have someone upgrade me on the HPC.

7) You may have to wait in line. I tried to set up some FragPipe runs before bed and requested the HPC allotments, then dozed off in my chair waiting for my turn. When I woke up, the clock had already started ticking. I wasn't using my cores, but I had blocked them so no one else could use them, so they did count against me.
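Since points 1, 3 and 4 all come down to the same back-of-the-envelope arithmetic, here it is written down - the data volumes and allotment are mine from above, but the transfer bandwidth is a placeholder you'd want to measure on your own network:

```python
# Back-of-the-envelope HPC planning numbers (my own scribbles, not official guidance).
daily_data_gb = 160            # ~1.5 GB/single cell, ~4 GB/library or QC run
bandwidth_mb_per_s = 30        # hypothetical sustained transfer rate - measure yours
transfer_hours = (daily_data_gb * 1024 / bandwidth_mb_per_s) / 3600
print(f"~{transfer_hours:.1f} h to move one day of files")

# Core-hour budgeting against a 50,000 core-hour allotment.
allotment = 50_000
cores, hours_needed = 128, 10
job_core_hours = cores * hours_needed
print(f"{job_core_hours} core hours per job, "
      f"{allotment // job_core_hours} such jobs before the allotment is gone")

# RAM you get by asking for cores (8 GB/core on our base nodes).
print(f"{32 * 8} GB of RAM with a 32-core request")
```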

I'll probably add to this later - but I highly recommend this recent study out of Harvard, which has been my go-to guide.

Also this study trying to address the LFQ bottlenecks! 

Monday, May 5, 2025

6,500 proteins per (LARGE) mouse cell (oocyte) and studying age related changes!

 


I wasn't sure if I liked this study or not, but then I got to an interesting and very counter-intuitive observation, and then to the biology, and decided that I did.


It's not the first single cell oocyte paper we've seen, and it should be noted that they are quite big cells. These authors estimated them at about 2 nanograms of protein, which seems right based on what I remember from another study. 

One thing that I find really surprising here is that - unlike previous studies - this group tried the reduced volume of 384 well plates and found autosampler vials more reproducible. I'm stumped on this one. It is contrary to everything I've seen and to what Matzinger et al. found, and is frankly just counterintuitive across the board.

The surface area of an autosampler vial is huge compared to the bottom of a 384 well plate. I do find it a complete pain in the neck to calibrate some autosamplers to accurately pick up out of 384 well plates, but I don't know how much that plays in here. Also, some glass binds fewer peptides than some plastics. Insert shrug.

That aside, the authors dispense one oocyte at a time with the CellenOne and then add digest. Incubate and inject. 60 min run to run on a 50 µm x 20 cm column running diaPASEF with a 166 ms ramp time.

Data analysis was done in Spectronaut.

Okay, and the reason this is escaping the drafts folder is because the biology is really cool. They look at both artificial (handling) and natural (aging-linked) conditions and how they affect single oocytes. There are a lot of people out there who care about how those things (probably not in mice, but maybe?) change throughout the aging process!

Editors make statement on proteomics transparency AND a video for how to make your data available!

 


I wonder if this was inspired by some of the same things that I was just complaining about? 

Okay, so rather than just complain about it, I also went crowdsourcing to find resources - and here is a 4 minute video showing you how to make your data publicly available on PRIDE! 



Sunday, May 4, 2025

Use single cell proteomics (SCP) to add biological relevance to single cell sequencing (scSeq) data!

 


Transcript abundance tells you what a cell wants to do.

Peptide/protein abundance tells you what the cell is actually doing.

You can get measurements of the transcripts of tens of thousands of cells with a few hours of effort by passing the samples off, with reports coming back in a few days.

Each single cell proteome is a lot slower and a lot more expensive, but worth it for the whole... biological relevance... thing.... 

What if you could do a scSeq on tons and tons of cells - and single cell proteomics (SCP) on a small number to correct all that scSeq data? Would you be downloading that as fast as you possibly could?